SITIS Topic Details |
| Proposals Accepted: | |
| Program: | SBIR |
| Topic Number: | AF103-036 (Air Force) |
| Title: | Multi-Modal Interactions for Multi-RPA (Remotely Piloted Aircraft) Supervisory Control |
| Research & Technical Areas: | Human Systems |
| The technology within this topic is restricted under the International Traffic in Arms Regulations (ITAR), which control the export and import of defense-related material and services. Offerors must disclose any proposed use of foreign nationals, their country of origin, and what tasks each would accomplish in the statement of work in accordance with section 3.5.b.(7) of the solicitation. |
| Objective: | Develop and demonstrate techniques for allowing naturalistic, multi-modal human-machine interactions that include a shared understanding about plans and goals in a multi-RPA supervisory control environment.
| Description: | As RPA operations continue to mature and expand into a variety of operational contexts, traditional ground control station technologies may be inappropriate for dismounted warriors. Today, RPA operators must navigate multiple complex menu structures, memorize keystroke sequences, and visually search for interface control elements such as icons and buttons. This leads to potential mode confusion as well as increased workload, error rates, and response times. New interface technologies are required to enable efficient human-machine interactions, because man-portable systems may be limited to PDAs, ruggedized laptops, or mobile devices. This issue is exacerbated by the push for multi-RPA supervisory control by a single operator. Some research suggests that interfaces based on a natural language approach may enhance the effectiveness of traditional input devices, such as keyboard and mouse.
Complete natural language understanding and fully natural, human-like interactions have long been unachievable goals in human-machine interactions. While progress has been made in speech recognition, natural language understanding, and sketch and gesture recognition, the state of the art still falls well short of complete, natural multi-modal human input, robust and deep machine understanding of human instructions, and human-like system response. Most current systems (a) emphasize one modality or form of interaction to the exclusion of others (with consequences for the speed and ease of human interaction), (b) require the human to learn and use a specific vocabulary of utterances and/or gestures (with consequences for training, naturalistic interaction and possibly for human-machine error rates), (c) require extensive system training where the system learns the human’s unique behaviors (with consequences for ease and speed to utility, as well as brittleness and lack of transferability of the system to different users or contexts) and/or (d) restrict themselves to an extremely narrow set of operations (with a highly restricted set of vocabulary, utterances, and gestures).
The reason for this lack of broader success in integrated multi-modal interaction understanding is not so much a failure of the individual recognition techniques in each modality as it is the lack of an integrative framework around which to organize what is understood from those modalities. Even humans who are untrained in RPA operations will have trouble understanding, in any deep sense, what is being discussed or requested of them, because so much of the domain knowledge about plans, operations, constraints, and restrictions in the RPA domain is implicit.
If multi-modal human-machine interaction systems are to advance to the next level of robust and extended functionality, it is critical that they be able to understand and clearly convey the operational implications of communications between the human and machine. To accomplish this, multiple modes of communication must be integrated into a framework of knowledge about RPA operations. The multi-modal interaction system must be aware of the operational domain so that the operator's input may be naturalistic yet reliably interpreted by the system to match the operator's intent. The R&D challenge is to develop a framework for multi-modal human-machine interaction that enables reasonable, yet restricted, inferences for a wide range of contexts and alternate multi-modal inputs/responses. The outcome should be more natural (i.e., resembling human-to-human) and efficient interaction, resulting in reduced errors, operator workload, and time to perform tasks for RPA planning, monitoring, and/or decision aiding systems.
| PHASE I: For a representative RPA mission planning, control, or ISR application, develop an architecture for integrated plan-aware multi-modal interaction recognition. Demonstrate aspects of the component technologies and illustrate how they will be integrated to provide enhanced benefits in Phase II. Develop an experimental plan to establish improvements in usability in Phase II.
| PHASE II: Develop and demonstrate a prototype system for integration with a representative application domain simulation. Evaluate the human-machine interactions to demonstrate payoffs in interaction speed, error reduction, workload, training time reduction, and/or interaction flexibility.
| PHASE III DUAL USE APPLICATIONS:
MILITARY APPLICATION: Successful enhancements in multi-modal human-machine interaction would have application in a variety of complex military and commercial monitoring, planning and control domains.
COMMERCIAL APPLICATION: RPA control and Air Operations Center operations are immediate application areas, but utility would also be present for domains such as commercial air traffic control, complex manufacturing operations, and smart grid power generation and distribution.
| References: |
1. Allen, J., et al. (1994). The Trains project: A case study in building a conversational planning agent. Journal of Experimental and Theoretical AI, 7:7-48.
2. Carberry, S. (2001). Techniques for plan recognition. User Modeling and User-Adapted Interaction, 11(1-2).
3. Goldman, R. P., Geib, C. W., & Miller, C. A. (1999). A new model of plan recognition. In Proceedings of the Conference on Uncertainty in Artificial Intelligence, pp. 245-254.
4. Jaimes, A. & Sebe, N. (2005). Multimodal Human Computer Interaction: A Survey. In Computer Vision and Image Understanding, 108(1-2), 116-134.
5. Lesh, N., Rich, C., & Sidner, C. (1999). Using Plan Recognition in Human-Computer Collaboration. In Proceedings of the Conference on User Modelling, Banff, Canada. NY: Springer Wien.
6. Rouse, W., Geddes, N., & Curry, R. (1987). An architecture for intelligent interfaces: Outline of an approach to supporting operators of complex systems. Human-Computer Interaction, 3, 87-122.
7. Sharma, R., Yeasin, M., Krahnstoever, N., Rauschert, C., Brewer, I., MacEachren, A., & Sengupta, K. (2003). Speech-gesture driven multimodal interfaces for crisis management. Proceedings of the IEEE, 91: 1327–54.
8. Rowe, A. J., Liggett, K. K., & Davis, J. E. (2009). Vigilant spirit control station: a research testbed for multi-UAS supervisory control interfaces. In Proceedings of the Fifteenth International Symposium on Aviation Psychology. Dayton, OH: WSU. |
| Keywords: | multi-modal interaction, human-machine interaction, plan recognition, intent inference, supervisory control, RPA, gesture and sketch recognition |
Additional Information, Corrections, References, etc. |
Ref #8: NOTE: Ref. 8 document uploaded in SITIS 8/20/10 and now available for view/download. AF103_036 Ref 8 Vigilant Spirit Control Station.doc |
Questions and Answers: |
Q: Would a real-time voice to text engine for continuous speech as a component of the multi-RPA supervisory control environment be considered under this SBIR? Thank You |
A: Yes, voice to text and voice recognition are candidates for a multi-modal system for multi-RPA supervisory control. The focus of this topic should not be the development of yet another voice system, but rather integration of the current state-of-the-art. |
As of midnight September 1, questions for solicitations SBIR 10.3 and STTR 10.B will no longer be accepted.