SITIS Topic Details

Proposals Accepted:  
Program:  SBIR
Topic Number:  A10-172 (Army)
Title:  Obstacle Detection and Awareness via High-Resolution Monocular Video
Research & Technical Areas:  Sensors, Electronics

Acquisition Program:  
  Objective:  This SBIR aims to build an electro-optic mobility aid for indirect driving and situational awareness applications that allows Soldiers to detect and analyze distant obstacles from a moving manned or unmanned ground vehicle with high-resolution monocular video.
  Description:  The United States Army increasingly aims to integrate advanced sensor technologies onto its manned and unmanned ground vehicles to advance Soldiers’ situational awareness and indirect driving capabilities in the field. In this regard, monocular video is a critical sensing technology because it is fairly inexpensive and requires relatively little power. In addition, monocular camera technologies are reliable, and they emit almost no electromagnetic signature. In recent years, monocular cameras have offered increased spatial resolution; however, the application of these high-resolution imaging technologies on military ground vehicles is hindered by the limited resolution of the vehicles’ display panels and the limited bandwidth of their content delivery systems. Often, engineers alleviate this issue by down-sampling the original monocular signal to match the system’s display resolution or data transfer capability; however, with this approach, Soldiers cannot view a local Region of Interest (ROI) in the highest captured resolution. Soldiers in the field increasingly use high-resolution monocular video to operate their systems, sometimes beneath a closed hatch. These vehicle operators have no access to windows that could improve situational awareness, and they increasingly require capabilities to understand the threat of potential obstacles from a large distance. The Army therefore desires the capability to use existing high-resolution monocular cameras to their fullest potential to detect and understand the threat of obstacles while driving. Virtual Pan-Tilt-Zoom (PTZ) methods may be used to partially address this issue. Virtual PTZ algorithms allow Soldiers to pan, tilt, and zoom through a high-resolution video feed in real time without mechanical components. These algorithms thereby allow Soldiers to view potential obstacles within some local ROI from a large distance in the highest captured resolution. 
Unfortunately, no known technologies exist that meld Virtual PTZ methods with those that detect and understand obstacles from a vehicle moving at speeds of up to sixty miles per hour. Such algorithms would benefit ongoing Army operations. For instance, current capabilities to detect and optically zoom in on an obstacle of interest are limited by the number of zoom-capable cameras on the vehicle. A versatile Virtual PTZ framework would permit one high-resolution monocular camera to simultaneously detect, track, and zoom in on several obstacles of interest from a large distance. A robust Virtual PTZ framework must allow random access to any local ROI with little latency. Obstacles may be dynamic or stationary. The obstacle detection and tracking system must be robust to physical occlusion and to environmental conditions (for instance, varying light levels, shadows, and precipitation). The tracking system must account for changes in the apparent orientation of obstacles as the vehicle moves, and the system must at a minimum offer a method to measure the threat an obstacle poses to the vehicle and alert the operator as appropriate. The solution must provide all of this information and capability to the Soldier in an intuitive manner.
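
As a rough illustration of the Virtual PTZ idea described above, the sketch below maps a pan, tilt, and zoom request to a crop window in the captured frame, so that the window can be shown at native resolution instead of down-sampling the whole frame. The function name, the Rect type, and the normalised pan/tilt convention (0.0 to 1.0 across the frame) are assumptions made for illustration; the topic does not prescribe any particular interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int  # left edge in source pixels
    y: int  # top edge in source pixels
    w: int  # width in source pixels
    h: int  # height in source pixels


def virtual_ptz_roi(src_w, src_h, pan, tilt, zoom):
    """Return the source-pixel crop window for a virtual PTZ view.

    pan, tilt: desired view centre as fractions of the frame (0.0..1.0);
               this normalised convention is an assumption of the sketch.
    zoom:      magnification >= 1.0; zoom 1.0 shows the full frame.
    """
    if zoom < 1.0:
        raise ValueError("zoom must be >= 1.0")
    w = max(1, round(src_w / zoom))
    h = max(1, round(src_h / zoom))
    # Centre the window on the requested point, then clamp it so the
    # whole window stays inside the captured frame.
    x = round(pan * src_w - w / 2)
    y = round(tilt * src_h - h / 2)
    x = min(max(x, 0), src_w - w)
    y = min(max(y, 0), src_h - h)
    return Rect(x, y, w, h)
```

With a 1920x1080 source, a centred request at zoom 2 selects a 960x540 window; because no mechanical parts are involved, several such windows can be computed from the same frame at once.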

  PHASE I: Design the Obstacle Detection and Awareness System with a High-Resolution Monocular Camera. Provide a report that describes the intended implementation. At a minimum, it must explain the algorithm, provide a user interface concept, describe significant design trade-offs, estimate an implementation schedule, and include a risk mitigation matrix. In addition, applicants must conduct proof-of-principle experiments to support their concept and provide evidence of its viability.

  PHASE II: Completely develop the Obstacle Detection and Awareness System for a High-Resolution Monocular Camera. A prototype shall be integrated onto a commercial vehicle or robot; however, care must be taken to ensure its eventual integration on a manned or unmanned military ground vehicle. It shall be demonstrated in an outdoor urban environment of the Government’s choice. Reports shall be delivered that document Project-related activities, the system’s technical specifications, and a User’s Guide.

  PHASE III: The Obstacle Detection and Awareness System described herein may be integrated onto a fleet of manned and unmanned military ground vehicles to improve operators’ situational awareness and indirect driving capabilities in combat. These capabilities are particularly important for urban operations wherein obstacles are dispersed throughout cluttered and dynamic environments, but they may also be employed in off-road mountainous terrains wherein obstacles are hidden throughout natural landscapes. Virtual PTZ algorithms may be employed in various commercial applications, such as interactive television, wide-area surveillance, and interactive streaming media. Combined with the obstacle detection and awareness technologies detailed herein, the complete Obstacle Detection and Awareness System may be used in the luxury automobile and emergency response (e.g., police, fire, and SWAT) markets to improve operators’ awareness of obstacles while driving. The IMOPAT ATO strongly supports this effort and, if the effort is successful, aims to provide non-SBIR funding after Phase II to integrate the technology onto future military vehicle platforms.

  References:  1. Mavlankar, A., Baccichet, P., Varodayan, D., and Girod, B., Optimal Slice Size for Streaming Regions of High Resolution Video with Virtual Pan/Tilt/Zoom Functionality, Proc. of 15th European Signal Processing Conference (EUSIPCO), Poznan, Poland, Sept. 2007.

2. Sinn, R., Virtual Pan-Tilt-Zoom for a Wide-Area-Video Surveillance System, Master’s Thesis, Massachusetts Institute of Technology, September 2008.

3. Ulrich, I., and Nourbakhsh, I., Appearance-Based Obstacle Detection with Monocular Color Vision, Proc. of AAAI 2000.

4. Regensburger, U., and Graefe, V., Visual Recognition of Obstacles on Roads, Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems, 1994.

Keywords:  Obstacle Detection, Indirect Driving, Situational Awareness, Tracking, Pan, Tilt, Zoom, Threat Evaluation

Questions and Answers:
Q: 1. What kinds of obstacles are of interest? For example, specific types of potential threats such as humans, trucks, etc. or more generally anything that might pose a collision threat, including fallen tree branches, rocks, etc.
2. What kinds of threats are of interest for automatic detection? For example, general collision threats, or more specific threats such as a human carrying a grenade launcher.
3. Are there any specific system requirements, such as minimum/maximum detection range, field of view (forward facing, 180 degrees, 360 degrees etc), processing limitations, and nighttime operation?
4. Is the system intended to work on urban roads, highways, rough off-road terrain or all of the above?
A: Here are the responses to your questions:
1) I'd say anything that might pose a collision threat (e.g., fallen tree branches, rocks, etc.).

2) General collision threats.

3) I'd say a detection range between 200 m and 500 m would be sufficient. The camera should be a forward-facing color daytime camera w/ a resolution of at least 1920x1080. At this point, there is no desire for nighttime capability (i.e., with an IR sensor). And at this point, there are no specific processing limitations. That said, I think the solution should be as lean in hardware as possible; and, as agnostic to specific sensor, display, and processor types as possible. This SBIR Topic primarily supports an R&D vehicle program; but if we'd like to transition it to a fielded platform, it'd be much easier to do if the hardware footprint is small. From our perspective, it is much easier to port software than to integrate additional hardware into vehicles that have limited SWAP as it is.

4) I'd focus primarily on urban, cityscape environments. If it can work on off-road terrains - great! But, the focus should be on urban areas.
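
To give a feel for what the 200 m to 500 m detection range in answer 3 implies for a 1920-pixel-wide image, the sketch below estimates how many pixels an obstacle spans under a simple pinhole camera model. The 40-degree horizontal field of view and the 0.5 m obstacle width are assumed values chosen only for illustration; the topic specifies neither a lens nor a minimum obstacle size.

```python
import math


def focal_length_px(image_width_px, hfov_deg):
    """Pinhole-model focal length in pixels for a given horizontal FOV."""
    return (image_width_px / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)


def pixels_on_target(obstacle_width_m, range_m, image_width_px=1920, hfov_deg=40.0):
    """Approximate horizontal pixel extent of an obstacle facing the camera.

    hfov_deg=40.0 is an assumed lens; the topic does not specify one.
    """
    return focal_length_px(image_width_px, hfov_deg) * obstacle_width_m / range_m


# Assumed 0.5 m wide obstacle at the two ends of the stated range.
for r in (200.0, 500.0):
    print(f"{r:.0f} m: {pixels_on_target(0.5, r):.1f} px")
```

Under these assumptions a 0.5 m obstacle spans roughly 6.6 pixels at 200 m and 2.6 pixels at 500 m, which suggests why viewing the ROI at the full captured resolution matters and why lens choice is a significant design trade-off.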
Q: What is the definition of an "obstacle"? Do obstacles have any pattern that can be detected or it could be anything like a person, tree, vehicle, building, etc. Also, do you have any sample video from the monocular camera?
A: 1) From our perspective, an obstacle is anything that can impede a vehicle's movement through an environment. We will focus on an urban cityscape environment - but beyond that, there are no specific patterns or signatures that are any more or less valuable to us. I'd focus on things like: the size of the obstacle; is it on the road in front of you?; is it heading toward you - or, is it stationary?; etc.

At this point, I don't have a sample video from a monocular camera. But, there may be some possibility to obtain or generate one before work begins on the project.
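
One possible way to turn the attributes listed in the answer above (is the obstacle on the road ahead, is it approaching or stationary) into a simple threat ordering is time to collision. The sketch below is only an illustration of that idea: the Obstacle fields are hypothetical, and it assumes range and closing speed have already been estimated, which is itself a hard problem with a single monocular camera.

```python
from dataclasses import dataclass
import math


@dataclass
class Obstacle:
    label: str
    range_m: float            # estimated distance to the obstacle
    closing_speed_mps: float  # positive when the gap is shrinking
    in_path: bool             # True if it lies on the road ahead


def time_to_collision(ob):
    """Seconds until contact at constant closing speed; inf if not closing."""
    if not ob.in_path or ob.closing_speed_mps <= 0.0:
        return math.inf
    return ob.range_m / ob.closing_speed_mps


def rank_by_threat(obstacles):
    """Most urgent first (smallest time to collision)."""
    return sorted(obstacles, key=time_to_collision)
```

A stationary obstacle on the road still has a positive closing speed equal to the vehicle's own speed, so it is ranked alongside moving obstacles rather than ignored. A fuller measure would probably also weigh obstacle size, which the answer above lists as a factor.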
Q: 1. What are the maximum slew rates at which a camera may be virtually Panned, Tilted or Zoomed by an operator?

2. Is the system supposed to maintain track on objects removed from the virtual field of view by such operator actions?

3. Is the system supposed to detect objects of interest in regions of the physical field of view that are outside the current virtual field of view?

4. Is the primary goal: (a) automated cueing of the operator by the system; (b) cueing of the system by the operator's PTZ actions; (c) mere simultaneous PTZ viewing and detection/tracking; or (d) something else?
A: 1) I don't have a maximum slew rate in mind; but, it should be set to ensure effective usability for the Warfighter. It shouldn't be any less than 70 degrees for pan and 30 degrees for tilt.

2) Yes.

3) Yes.

4) I would say, "automated cuing of the operator by the system". When the system cues the operator, it could use virtual PTZ as a means to allow the operator to interrogate the object of interest. As such, there is both an automated obstacle detection/tracking component to this SBIR and an interface design component, i.e., how can we best use a high-resolution sensor on a low-resolution display to achieve the objectives described in the SBIR?
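
As a minimal sketch of the cueing step described in answer 4, the function below grows a detection box into a crop window that matches the display's aspect ratio, so that a cued obstacle can be presented on a lower-resolution panel without distortion. The function name, the (x, y, w, h) box convention, and the 25% margin are assumptions for illustration only.

```python
def roi_for_display(box, src_size, display_size, margin=0.25):
    """Grow a detection box into a crop window matching the display aspect ratio.

    box:          (x, y, w, h) of the cued obstacle in source pixels.
    src_size:     (width, height) of the captured frame.
    display_size: (width, height) of the display panel.
    margin:       fractional padding around the box (an assumed default).
    Returns (x, y, w, h) of the crop window, clamped to the frame.
    """
    bx, by, bw, bh = box
    sw, sh = src_size
    dw, dh = display_size
    aspect = dw / dh
    # Pad the box, then widen or heighten it to reach the display aspect ratio.
    w = bw * (1.0 + 2.0 * margin)
    h = bh * (1.0 + 2.0 * margin)
    if w / h < aspect:
        w = h * aspect
    else:
        h = w / aspect
    # Never request more than the captured frame can provide.
    scale = min(1.0, sw / w, sh / h)
    w, h = round(w * scale), round(h * scale)
    # Centre on the box, then clamp to the frame.
    cx, cy = bx + bw / 2.0, by + bh / 2.0
    x = min(max(round(cx - w / 2.0), 0), sw - w)
    y = min(max(round(cy - h / 2.0), 0), sh - h)
    return x, y, w, h
```

For example, a 40x40 pixel detection centred at (920, 520) in a 1920x1080 frame, shown on a 640x480 display, yields an 80x60 window at (880, 490); the window can then be scaled up to fill the panel.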
Q: Will there be multiple users, that is, multiple simultaneous local ROIs?
A: I would target this solution for the operator of the vehicle. That said, I think a framework to transmit useful information from the system to the vehicle commander and rear occupants would be useful.

Also - multiple simultaneous ROIs are desired for this project, assuming you choose to design an interface through the recommended virtual PTZ. Note that if other kinds of interfaces could be used to better assist the user as described in the Topic, I would be open to that. That said, one benefit of using a single high-resolution camera to detect and track obstacles (rather than a camera on a mechanical PTZ) is that we can track multiple objects simultaneously and present them all to the user for further interrogation. So in that sense, multiple ROIs would allow the operator to analyze multiple obstacles; and, we would therefore like that functionality.
Q: Is the report which describes the intended implementation geared towards the actual development of the system in phase?
A: I don't exactly understand the question - but, the report at the end of Phase I should describe the intended implementation in Phase II.
Q: What is expected in the risk mitigation matrix?
A: Essentially, I want to know about the potential difficulties or vulnerabilities of the proposed system ahead of time. i.e., have certain portions of your solution never been done before?, what is the likelihood of failure for critical portions of the system based on a reasonable assessment of the solution?, what are the probable impacts of failure?, etc.

I couldn't find an Army-specific example; but, the FAA uses the same Risk Mitigation matrix that we do - and, I think it'd be a good example to work from:

http://www.faa.gov/about/office_org/headquarters_offices/ato/service_units/operations/isse/items/e-Dev-Prem-Vul-Risk-assessment.cfm

Q: Is this solicitation permitted to have a phase 1 option?
A: Yes - a Phase I Option is permitted for this solicitation. Definitely include a Phase I Option in your proposal.
Q: What can be considered as viable work to be included in the phase 1 option of the solicitation?
A: The Phase I Option is meant to provide a bridge between Phase I and Phase II. It is worth $50K; and, it would last approximately 4 months. It can include any work that moves the project forward during the contract negotiation process.

For the purposes of this project, I'd say additional simulation would be relevant Phase I Option work; and, the start of software development would be great. Note, the Phase I Option will not be awarded unless Phase II is approved for your proposal.

Q: Are you expecting to see any high level plans for the phase 2 portion of this solicitation? If so, what would you like us to address?
A: The only documentation that I expect from Phase II is a report that details the design and implementation of the system, a report that provides the technical specifications of the system, and a User's Guide to operate the system.
Q: Is there a need to provide full motion video (e.g., 30 fps) for all tracked targets in their respective virtual zoom windows, or would it be acceptable if tracked targets are updated at a slower rate (e.g., 3 fps)? This question seeks to understand whether fluid video motion is more important or whether an ability to track a larger number of simultaneous targets is more important.
A: I think there needs to be a balance between these issues. For instance, I don't think the number of targets to be tracked should be completely unbounded - first, this would increase the possibility of false alarms; and also, too large a number of tracked and displayed objects could render the system unusable. On the other hand, we expect the vehicle to move at speeds of up to sixty miles per hour; and as such, a 3 fps tracking system might be too slow - especially for objects near the vehicle.

It's probably a little frustrating to hear that "I want both"; but, I'd encourage you to look for ways to determine a "sweet spot" between these considerations. It's hard for me to say which is more important because they're both important. That said, I'd probably favor fluid video motion over tracking a larger number of targets - but, not by very much. Work for an effective balance between these considerations and do your best to justify the designs you propose.
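
The concern about a 3 fps tracker at sixty miles per hour can be made concrete with a short calculation of how far the vehicle travels between tracker updates. This is plain arithmetic rather than a requirement from the topic; the function name is illustrative.

```python
MPH_TO_MPS = 0.44704  # exact: 1 mile = 1609.344 m, 1 hour = 3600 s


def metres_per_frame(speed_mph, fps):
    """Distance the vehicle covers between consecutive tracker updates."""
    return speed_mph * MPH_TO_MPS / fps


for fps in (3, 30):
    print(f"{fps} fps: {metres_per_frame(60.0, fps):.2f} m per update")
```

At 60 mph the vehicle moves about 8.94 m between updates at 3 fps, compared with about 0.89 m at 30 fps, which is consistent with the answer's point that a slow update rate is most problematic for objects near the vehicle.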

As of midnight September 1, questions for solicitations SBIR 10.3 and STTR 10.B will no longer be accepted.
