SITIS Topic Details
Program: SBIR
Topic Number: AF103-032 (Air Force)
Title: Multi-camera real-time Feature Recognition, Extraction & Tagging Automation (McFRETA)
Research & Technical Areas: Information Systems
The technology within this topic is restricted under the International Traffic in Arms Regulation (ITAR), which controls the export and import of defense-related material and services. Offerors must disclose any proposed use of foreign nationals, their country of origin, and what tasks each would accomplish in the statement of work in accordance with section 3.5.b.(7) of the solicitation.
Objective: Develop an open and scalable framework/tool that performs automated feature recognition on multiple streaming sources, extracts metadata, and makes that metadata available for both ongoing operations and forensics.
Description: Burgeoning numbers of battlespace video sensors require machine assistance and warfighter interface engineering to enable identification of important features, including behavior, faces, vehicles, and construction activities. These cameras are mounted on vehicles and at fixed placements, with data feeds often available on a “post before process” basis. Currently, this flood of video inundates battlefield decision-makers without informing them. Incorporating all sources into an orchestrated event processing system, referenced to archival military “YouTube” databases, will necessitate real-time extraction and metadata tagging of features. While many existing algorithms can be applied to individual feature extraction tasks, there is no way for operators to perform ad-hoc queries on what entities are within the real-time field of regard of a sensor or sensor set. New algorithms need to be developed that will point out possibly important activities from the multitude of live sensors in real time. Invariants in space, time, illumination, and sensor resolution/band must be extracted automatically to account for varying perspectives, distances, transmission latencies, and sensors. Novel live video information processing techniques such as adaptive multi-spectral sensor fusion, viewpoint invariant matching (VIM), and inter-camera image point cloud correlation need to be refined and extended, and new techniques suitable for dynamically moving multiple cameras need to be invented. The user interface in this architecture is critical: a human cannot look at all the information from a multi-camera plus archival yottabyte surveillance system and pull out what is important. Presentation to the operator might be a 3D space rendered on 2D screens, comprising animated graphics, cartoons, and avatars for tracked objects (person, vehicle, or group), with embedded fused video appearing on mouse-over of dynamic symbols.
Bandwidth and connectivity considerations may argue for pre-processing on or near collection platforms, transmitting only salient information while keeping live video available on demand.
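One way to picture the edge pre-processing idea is a node on or near the collection platform that keeps a short ring buffer of raw frames locally and forwards only compact metadata for detections above a saliency threshold, serving buffered video only when asked. The sketch below is illustrative only: the class `EdgeNode`, its parameters `buffer_frames` and `min_score`, and the record fields are hypothetical, and detections are assumed to arrive as (label, score) pairs from some upstream detector.

```python
from collections import deque


class EdgeNode:
    """Hypothetical edge node: buffers recent frames locally and forwards
    only compact metadata records for salient detections."""

    def __init__(self, sensor_id, buffer_frames=300, min_score=0.5):
        self.sensor_id = sensor_id
        self.min_score = min_score  # assumed saliency threshold
        # Ring buffer: once full, each append silently drops the oldest frame.
        self.frames = deque(maxlen=buffer_frames)
        self.outbox = []  # metadata records queued for transmission

    def ingest(self, frame_no, frame, detections):
        """Store the raw frame locally; queue metadata for salient detections."""
        self.frames.append((frame_no, frame))
        for label, score in detections:
            if score >= self.min_score:
                self.outbox.append({"sensor": self.sensor_id, "frame": frame_no,
                                    "label": label, "score": score})

    def fetch_live(self, start_frame):
        """Serve buffered frames on demand, starting at start_frame."""
        return [frame for n, frame in self.frames if n >= start_frame]


node = EdgeNode("cam-7", buffer_frames=3)
for n in range(5):
    dets = [("vehicle", 0.9)] if n == 4 else [("person", 0.2)]
    node.ingest(n, f"frame{n}", dets)

print(node.outbox)          # only the confident vehicle record is queued
print(node.fetch_live(0))   # only the three most recent frames remain
```

The design choice being illustrated is the split between what is always transmitted (small metadata records) and what is transmitted only on request (the raw frames still held in the buffer).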
Existing tools are almost exclusively based on off-line processing and are not adequate for real-time execution. The tool sought in this topic comprises the definition of an open framework for integrating real-time feature recognition and extraction algorithms, the generation of a stream of standardized metadata associated with the content source, and the design and demonstration of an open, scalable system that supports queries and event/alert notification based on rule sets. Operators provided with automatic extraction and posting of features could run machine queries for features of interest, instead of following the current time-consuming, error-prone procedure of asking individual sensor operators what they see or have recently seen. Additionally, event processing would allow rule sets to be defined that trigger alerts. The metadata needs to be in a consistent format (e.g., Community of Interest defined schemas). The proposed methodology must accommodate diverse sensors and integrate feature recognition and extraction algorithms with an asynchronous event and querying capability. Because the content capture and storage systems, as well as the operations (or forensics) systems, are heterogeneous, the integrating framework must be open and user-friendly so that it supports broad queries.
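To make the combination of a standardized metadata stream and rule-based alert notification concrete, here is a minimal sketch in which each detection becomes a uniform record and an event processor tests every incoming record against operator-defined rules. The record fields (`FeatureTag`, `x_m`, `y_m`, and so on), the `Rule` and `EventProcessor` classes, and the example zone and threshold are all hypothetical; they are not taken from any Community of Interest schema, and a real system would consume the stream asynchronously rather than through direct calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class FeatureTag:
    """One record in the metadata stream. Field names are hypothetical,
    not taken from any Community of Interest schema."""
    source_id: str     # camera or sensor that produced the detection
    timestamp: float   # seconds since the start of collection
    feature: str       # e.g. "person", "vehicle", "face"
    confidence: float  # detector score in [0, 1]
    x_m: float         # position in an assumed local grid, meters
    y_m: float


@dataclass
class Rule:
    name: str
    predicate: Callable[[FeatureTag], bool]


class EventProcessor:
    """Checks every incoming tag against every rule and records alerts."""

    def __init__(self, rules: List[Rule]):
        self.rules = rules
        self.alerts: List[Tuple[str, FeatureTag]] = []

    def on_tag(self, tag: FeatureTag) -> None:
        for rule in self.rules:
            if rule.predicate(tag):
                self.alerts.append((rule.name, tag))


def in_zone(tag: FeatureTag, x0: float, x1: float, y0: float, y1: float) -> bool:
    return x0 <= tag.x_m <= x1 and y0 <= tag.y_m <= y1


# Example rule: a confident vehicle detection inside a watched zone.
rules = [Rule("vehicle-in-zone",
              lambda t: t.feature == "vehicle" and t.confidence >= 0.8
              and in_zone(t, 0.0, 50.0, 0.0, 50.0))]

proc = EventProcessor(rules)
proc.on_tag(FeatureTag("cam-1", 0.0, "vehicle", 0.9, 10.0, 20.0))   # matches the rule
proc.on_tag(FeatureTag("cam-2", 1.0, "person", 0.95, 10.0, 20.0))   # wrong feature type
proc.on_tag(FeatureTag("cam-1", 2.0, "vehicle", 0.5, 10.0, 20.0))   # confidence too low
print(proc.alerts)
```

Keeping rules as plain predicates over a fixed record type is one simple way to let operators add or change rules without touching the recognition algorithms that produce the records.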
PHASE I: Identify algorithms for feature recognition and extraction suitable for real-time application; identify suitable metadata tags that support search criteria from both humans and machines; devise a framework that would function within orchestration and event processing frameworks. Design a prototype system.
PHASE II: Prototype and demonstrate automated identification, tagging, and tracking of humans and vehicles from multiple real-time video feeds. Develop a test framework and demonstrate how existing and new algorithms can be incorporated and tested. Show how an operator can develop queries and rules that assist assessment and execution. Demonstrate scalability from tactical to regional areas of interest.
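A test framework into which existing and new algorithms can be incorporated might, in its simplest form, be a plug-in registry plus a scoring harness. The sketch below assumes a hypothetical interface in which a detector is any callable mapping a frame to a set of labels, and scores it with micro-averaged precision and recall against ground-truth label sets. The registry `DETECTORS`, the `register` decorator, and the toy detector are illustrative names only; per-frame label sets ignore localization and track identity, which a real evaluation would need.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

Detector = Callable[[dict], Set[str]]

# Hypothetical plug-in registry: new algorithms join by registering a callable.
DETECTORS: Dict[str, Detector] = {}


def register(name: str) -> Callable[[Detector], Detector]:
    def wrap(func: Detector) -> Detector:
        DETECTORS[name] = func
        return func
    return wrap


def evaluate(detector: Detector,
             samples: Iterable[Tuple[dict, Set[str]]]) -> Tuple[float, float]:
    """Micro-averaged precision and recall over (frame, ground-truth labels) pairs."""
    tp = fp = fn = 0
    for frame, truth in samples:
        predicted = detector(frame)
        tp += len(predicted & truth)   # correct labels
        fp += len(predicted - truth)   # spurious labels
        fn += len(truth - predicted)   # missed labels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


@register("toy-person")
def toy_person(frame: dict) -> Set[str]:
    # Stand-in for a real algorithm; a "frame" here is just a dict of flags.
    return {"person"} if frame.get("has_person") else set()


samples = [({"has_person": True}, {"person"}),
           ({"has_person": False}, {"vehicle"}),
           ({"has_person": True}, {"person", "vehicle"})]

for name, detector in DETECTORS.items():
    print(name, evaluate(detector, samples))
```

Because every registered algorithm is scored by the same harness on the same samples, swapping in a new detector only requires writing a function with the agreed signature.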
PHASE III DUAL USE COMMERCIALIZATION:
Military Application: Military applications include high-value target location, improvised explosive device detection and prevention, and automated generation of alerts by applying rules to metadata.
Commercial Application: Commercial applications include homeland security, police, and industrial site surveillance.
References:
1. Meichun Hsu and Tao Yu, “An In-Database Streaming Solution to Multi-camera Fusion,” in Data Management in Grid and Peer-to-Peer Systems, Lecture Notes in Computer Science, Vol. 5697, pp. 136ff (Springer, Berlin Heidelberg, 2009); http://www.springerlink.com/content/2ql0052m32633116/
2. M. Andriluka, S. Roth, and B. Schiele, “People-tracking-by-detection and people-detection-by-tracking,” IEEE Conference on Computer Vision and Pattern Recognition (2008); http://ieeexplore.ieee.org
3. Workshop on Multi-camera and Multi-modal Sensor Fusion Algorithms and Applications, 10th European Conference on Computer Vision (ECCV) (2008); http://www.elec.qmul.ac.uk/staffinfo/andrea/dwnld/Abstracts.M2SFA2.2008.pdf
4. Mike Hanion, Panoptic C-Thru 3D Video Surveillance System, www.panopticsystems.com (accessed 7 December 2009); provides an example 3D graphical/animated cartoon presentation of multi-sensor and video fusion.
5. A. Senior, “An Introduction to Automatic Video Surveillance,” Chapter 1 in Protecting Privacy in Video Surveillance (Springer, London, 2009).
Keywords: realtime video surveillance, people and vehicle tracking, multi-camera multi-sensor fusion algorithms, automated identification and tagging, animated graphical interface, metadata, extraction, framework, database, query, event, asynchronous
Questions and Answers:
Q: Is a proposal for an ITAR solicitation (and the computer used to prepare it) considered to be ITAR-controlled? Thank you.
A: No.
Q: 1. For Phase I work, is it sufficient to provide feature recognition and extraction algorithms for video sensors only? Should the framework support sensors other than video?
A: 1. Yes. No.
Q: It is difficult to visualize what you are expecting from this statement --
A: Single integrated display (fusion of all video feeds).
Q: Is a Phase 1 option allowable for this solicitation?
A: No.
Q: What is expected from the Phase 1 option portion of the solicitation?
A: Phase 1 is mandatory; it is not an option. The requirements for the Phase 1 effort are described in the Phase I section of the topic.
Q: Regarding the identification of suitable metadata tags: are these to be used for both human identification and vehicle identification?
A: Yes.
Q: 1. What type of video streams should we design for?
A: Q1. Design for standard video interfaces/streams.
Q: Are there any performance metrics we should consider for the framework?
A: Based on commercially available video cameras.
Q: Would it be safe to assume that the framework/tool could be designed around a single format for simplicity's sake (say, AVI), with a pluggable framework so that the video format can be switched?
A: Yes, so long as the approach addresses the vision laid out in the topic description and the proposed statement of work addresses the Phase I section of the topic.
Q: Are there any software requirements regarding which platform is leveraged? For example, would it be safe to design the software for Microsoft Windows, or is a specific operating system targeted?
A: Propose the OS that you think works best. MS Windows is an option. An open systems architecture is needed.
Q: Are there any hardware restrictions in terms of bandwidth, processing power, memory consumption, storage consumption, and so on?
A: No, but the system must not waste resources.
Q: Is the framework expected to support a query language, similar to SQL in database servers, or a user-interface-driven query in which the end user selects from a series of options or enters search text?
A: The latter.
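A user-interface-driven query of the kind described in this answer could reduce to a small query object populated by form controls (a feature drop-down, a sensor list, a time window, a free-text box) and matched against stored metadata. The sketch below is a hypothetical illustration: `Tag`, `UiQuery`, `run_query`, and the `description` field are assumed names, unset fields mean "any", and free text is matched naively as case-insensitive substrings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tag:
    source_id: str
    timestamp: float
    feature: str
    description: str  # free-text attributes, e.g. "white pickup truck"


@dataclass
class UiQuery:
    """Fields a hypothetical query form would populate; None means 'any'."""
    feature: Optional[str] = None    # chosen from a drop-down
    source_id: Optional[str] = None  # chosen from a sensor list
    start: Optional[float] = None    # time-window start
    end: Optional[float] = None      # time-window end
    text: str = ""                   # free-text search box

    def matches(self, tag: Tag) -> bool:
        if self.feature is not None and tag.feature != self.feature:
            return False
        if self.source_id is not None and tag.source_id != self.source_id:
            return False
        if self.start is not None and tag.timestamp < self.start:
            return False
        if self.end is not None and tag.timestamp > self.end:
            return False
        # Every search word must appear as a case-insensitive substring.
        words = self.text.lower().split()
        return all(w in tag.description.lower() for w in words)


def run_query(q: UiQuery, store: List[Tag]) -> List[Tag]:
    return [t for t in store if q.matches(t)]


store = [Tag("cam-1", 100.0, "vehicle", "white pickup truck"),
         Tag("cam-2", 130.0, "vehicle", "blue sedan"),
         Tag("cam-1", 160.0, "person", "man carrying bag")]

hits = run_query(UiQuery(feature="vehicle", start=90.0, end=140.0, text="White"), store)
print(hits)
```

Because the operator only fills in form fields, no query language has to be learned; the same `UiQuery` object could later be translated into whatever storage back end a proposer chooses.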
Q: If this is to be deployed to an existing platform, what are the software and hardware restrictions?
A: Response pending.
Q: What is expected of the orchestration and event processing framework?
A: Propose your own framework to address the topic.
Q: It is understood that we need only identify algorithms for the purposes of the solicitation.
A: Q1. Yes.
Q: Thank you for all the answers. I think there was a miscommunication somewhere in one of the questions...
A: There is no such thing as a Phase 1 option.
As of midnight September 1, questions for solicitations SBIR 10.3 and STTR 10.B will no longer be accepted.