|Acquisition Program: ||PMA 281, PM-SOMPE, PEO-C4I|| Objective: ||The main objective of this SBIR topic is to develop techniques for automatic extraction and representation of the contents of images that help in the automatic understanding of complex scenes.
|| Description: ||Automatic scene understanding is essential to reduce the workload of a commander or warfighter in maritime or urban security operations, where the environment is demanding, complex, and dynamically changing. Currently, security operations rely mainly on imaging sensors such as visual and/or IR cameras. Several algorithms have been developed for image enhancement, segmentation, and object recognition. Scene understanding, however, requires automatic extraction and representation of image contents in terms of semantics, syntax, perception, and grammar. This kind of extraction and representation is similar to the language modeling (i.e., probabilistic grammar and knowledge representation) used in automatic speech understanding through natural language processing techniques [1-2]. A similar trend can currently be seen in the areas of vision and relational database systems [3-5]. Nevertheless, rigorous algorithm development is still needed for image content extraction, representation, and grammar development for automated image understanding. This SBIR topic addresses that need.
|| ||PHASE I: (a) develop algorithms for image content extraction, representation (e.g., temporal-spatial), grammar (e.g., context-sensitive), and text generation (e.g., generating a sentence that describes the image) and (b) provide a proof-of-concept demonstration on a chosen example image/scene.
|| ||PHASE II: (a) extend algorithms for disparate distributed data sources (e.g., audio and video) and (b) extend and demonstrate the performance of algorithms on complex urban or maritime surveillance/security scenes.
|| ||PHASE III: Extend Phase II efforts to incorporate the algorithms as part of a Navy surveillance/security operations system. Collaborate with Navy laboratories and industry to transition the algorithms to naval and/or other DoD systems and to commercial applications such as digital libraries and the description of points of interest using wireless devices such as cellular phones and Personal Digital Assistants (PDAs).
|| ||PRIVATE SECTOR COMMERCIAL POTENTIAL/DUAL-USE APPLICATIONS: Automatic generation of descriptions of image content is needed in many information retrieval, navigation, and digital library systems. The algorithms developed under this SBIR topic will have a direct impact on these commercial applications. For example, in a navigation system, these algorithms can be used to annotate the scene currently captured by the system's visual camera with points of interest (e.g., the type of a museum, based on an icon or text on the building; restaurants by food type; etc.).
|| References: ||
1. Zue, V.; Seneff, S.; Glass, J.R.; Polifroni, J.; Pao, C.; Hazen, T.J.; Hetherington, L., “JUPITER: a telephone-based conversational interface for weather information,” IEEE Transactions on Speech and Audio Processing, Volume 8, Issue 1, Jan. 2000, Page(s): 85–96.
2. Zue, V.W.; Glass, J.R., “Conversational interfaces: advances and challenges,” Proceedings of the IEEE, Volume 88, Issue 8, Aug. 2000, Page(s): 1166–1180.
3. S. Kumar, “Models for learning spatial interactions in natural images for context-based classification,” Ph.D. Thesis, CMU-RI-TR-05-28, CMU, August 2005.
4. S. Casadei, “Hierarchical estimation of image features with compensation of model approximation errors,” International Conference on Computer Vision Theory and Applications, 25–28 February 2006, Setúbal, Portugal.
5. F. Han and S.-C. Zhu, “Bottom-up/top-down image parsing with attribute grammar,” Preprint, 2005.|
|Keywords: ||Image content extraction; Image parsing; Probabilistic grammar; Text generation; Image content representation.|
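
To make the Phase I idea of representation, grammar, and sentence generation concrete, the following is a minimal illustrative sketch, not a method prescribed by this topic. It assumes that image content has already been extracted into subject/predicate/object relations, and it uses a toy rule set (S -> NP VP, NP -> Det N) to turn them into text. The `Relation` class, the function names, and the example scene are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Relation:
    """One extracted fact about the scene (labels are hypothetical)."""
    subject: str    # e.g. a detected object
    predicate: str  # e.g. a spatial or temporal relation
    obj: str        # the object the relation points to


def noun_phrase(noun: str) -> str:
    """NP -> Det N, choosing 'a' or 'an' from the first letter."""
    det = "an" if noun[0].lower() in "aeiou" else "a"
    return f"{det} {noun}"


def sentence(rel: Relation) -> str:
    """S -> NP VP, where VP -> predicate NP."""
    text = f"{noun_phrase(rel.subject)} {rel.predicate} {noun_phrase(rel.obj)}"
    return text[0].upper() + text[1:] + "."


def describe(scene: List[Relation]) -> str:
    """Generate one sentence per relation and join them."""
    return " ".join(sentence(rel) for rel in scene)


# Hypothetical output of an upstream content-extraction stage.
scene = [
    Relation("small boat", "is approaching", "pier"),
    Relation("person", "is standing on", "pier"),
]
description = describe(scene)
print(description)
```

A real system would replace the fixed rules with a learned or probabilistic grammar and would need reference resolution (for example, "the pier" on second mention); the sketch only shows how a structured scene representation can drive text generation.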