SITIS Archives - Topic Details
Program:  SBIR
Topic Num:  A10-090 (Army)
Title:  Visualization Tools for Causal Data Mining
Research & Technical Areas:  Information Systems

Acquisition Program:  
  Objective:  Develop an interrelated extensible tool suite that is highly graphical and permits users to better understand causal relationships between large data sets. This toolkit should function by allowing a user to discover relationships between seemingly unrelated events in order to solve problems.
  Description:  A number of Army problems center around the need to discover patterns in large quantities of data that act as causal markers for subsequent events. Attempting to find relationships in intelligence data is typical of this type of data searching. For example, an Intelligence Officer may not really know specifically what he is looking for, but he is looking for causal patterns in what appears to him to be unrelated information. This is also the same problem maintenance engineers have when searching vehicle health information when performing Condition Based Maintenance. Data mining, a tactic frequently used in retail and other commercial areas, involves analyzing data from different perspectives and discovering information on a topic through the relationships found between the data. In the commercial sector, this type of system could be used in order to discover customer patterns. With this information, the company would then be able to advertise or sell their product in the most effective way targeted toward their customers’ habits. These same concepts of data mining are applicable for Army use in cases such as those previously mentioned. For this SBIR, the contractor will develop a graphical tool suite that allows users to more effectively find and understand causal relationships between large data sets. The solution should be interoperable with existing Army C4ISR (Command, Control, Communications, Computers, Intelligence, Surveillance and Reconnaissance) systems, such as TIGR (Tactical Ground Reporting), CPOF (Command Post of the Future), and DCGS (Distributed Common Ground System). The contractor should be able to demonstrate system effectiveness in the two main areas of Intel and Condition Based Maintenance. 
In Intel applications, the system should allow the user to graphically “see” causal patterns in intelligence data leading to specific outcomes or events, enabling them to prepare for the events when they notice those patterns occurring at later points. The system’s application to Condition Based Maintenance would involve allowing the user to find relationships and patterns in data leading up to a breakdown or malfunction in some equipment, thereby enabling the system user to recognize the precursors to such an event occurring on the actual equipment. This allows maintenance engineers to service such equipment before it breaks down, since a breakdown in the field can be disastrous. The contractor will need to ensure that the algorithms used provide predictive maintenance capabilities for smaller populations of equipment (under a few thousand) with the same functionality as those used commercially in the auto industry, which normally require larger populations to function effectively. In addition to tools with Intelligence and Condition Based Maintenance applications, the Army will also accept proposals that include other applicable areas for this type of causal data mining tool.

  PHASE I: Phase I of this project would involve extensive research of work already conducted in this area, followed by an analysis of alternatives and documentation of the strengths and weaknesses of the possible approaches. An approach will be selected, and the system will be designed accordingly. This design must include tools to conduct the data mining and pattern finding, as well as a graphical interface to allow users to easily see the relationships between the data. The deliverables from this phase will include the analysis of alternatives and the system design.
  PHASE II: Phase II activities would include building and demonstrating the system which is able to detect patterns and relationships between seemingly unrelated sets of data. The demonstration must show how the system will function in actual Army-based situations using relevant and representative data. To accomplish this, the contractor must propose and demonstrate use of the system using a realistic Army use case. The contractor must work with TRADOC (U.S. Army Training and Doctrine Command) and relevant Project Managers and Subject Matter Experts to design the use case. The scenario must demonstrate the system’s capability to find relationships and patterns between large sets of data, consistently leading up to the occurrence of a particular event. The solution will also be tested in a live experiment, for example at PM C4ISR OTM ("On The Move"). The deliverables for this phase will include the functional tool suite, the use case, and the demonstration, as well as the runtime and design documentation.

  PHASE III: Phase III involves the commercialization and deployment of the system. The ability to find patterns and relationships between large sets of data, and exhibit these relationships in a way that is easy for a user to see is applicable to and helpful for gleaning information from intelligence data and performing Condition Based Maintenance on other Army equipment. The concepts used in commercial data mining can be applied toward this effort. Data mining is frequently used in the area of retail sales to predict consumer responsiveness to a product or advertising scheme. The same principles used for pattern finding in these areas can be applied to the Army’s Intel and maintenance needs. The proposed system, however, must be able to take smaller populations of equipment into account (as similar commercial systems often require a larger sample) in order to meet Army-specific predictive maintenance needs.

  References:  1) Thearling, Kurt. “An Introduction to Data Mining” 2) Silverstein, Craig; Brin, Sergey; Motwani, Rajeev; Ullman, Jeff. (1998) “Scalable Techniques for Mining Causal Structures” 3) “Tactical Ground Reporting (TIGR)” 4) “Command Post of the Future”, General Dynamics (2006) 5) “Distributed Common Ground System – Army (DCGS-A)”, General Dynamics (2008) 6) “PM C4ISR On-The-Move”

Keywords:  Data Mining, Pattern Finding, Condition Based Maintenance

Additional Information, Corrections, References, etc:
Ref #2: Weblink corrected 5/14/10.
Ref #7: "Fixing Intel: A Blueprint for Making Intelligence Relevant in Afghanistan", MG Flynn

Questions and Answers:
Q: Should we assume that the data is already collected and structured in a database, or is this a requirement for the system as well?
A: The data that will be utilized for this effort will consist of both structured and unstructured types. We are not asking that the proposers make sense of and classify the entire set of unstructured data; however, pulling some information out of those sets would probably be helpful in working towards a solution.

Q: With respect to target platforms for the client application, is there a type and range of handheld devices to which the client app must conform?
A: We do not have a single specific type of platform in mind for this application. That said, a more multi-platform or -purpose solution would probably prove to be more useful than one with a more limited scope.
Q: 1. Is the visualization one of the key focuses of the topic?
2. As the visualization will depend on the data used, can you elaborate more on the type of data that may be used?
3. For example, is an image sequence a valid example of the applicable datasets?
A: 1. Enhancing the ability of users to understand relationships is key. So, yes, a visual component must be included as a key element to any solution.

2. Solutions that can work on a variety of data are preferred to those that are specifically built towards one data product/standard. This will both help the proposer when working to commercialize the solution and our team when seeking transition partners. Unfortunately, we cannot provide specific information to any team prior to being under contract. A simple web search on the systems we've listed (TIGR, CPOF, DCGS, etc.) will give you a better feel for what types of data we are interested in working with.

3. I'm a bit confused, so the answer is probably no. Are you referring to analysis of raw imagery?

Q: For Phase I execution, would the Army provide a corpus of relevant data for the development of the intended causal relationships?
A: That data is not publicly releasable. Once the winning team is under contract, we will work with that organization and PMs/PORs to provide relevant data.
Q: TIGR appears to be developed in C#/Visual C#. Since the expectation for the visual discovery tool suite is to integrate with TIGR, will solutions implemented using the LAMP stack be entertained?
A: As long as the systems are interoperable with (at least) the systems mentioned in the topic, there is no specific implementation method that we are looking for; that would be up to the developers.
Q: 1) Condition Based Maintenance is mentioned multiple times in the solicitation. Is it an implied requirement that a successful Phase I proposal MUST show a capability in CBM?

2) The Topic and references seem to imply that visualization/analysis toolkit capabilities that can be utilized by the War-fighter all the way to Command & Control are of high value. Is this true?

3) What weight/effect would the ability to analyze remotely and display locally (analytics done at C&C, results displayed on the battlefield device) be in the evaluation of a proposal?
A: 1) Intel and Condition Based Maintenance are the two examples of applications we gave in the topic. While this by no means signals a limitation to these two areas, a promising solution would demonstrate capability in at least these areas.

2) Visualization and analysis of causal relationships are integral parts of this topic. A tool that could be utilized by a broader range of users would be more useful than one restricted to those of a certain area or skillset.

3) The proposal needs to cover at a minimum the requirements mentioned in the topic. We do not have a specific implementation for the system in mind; the inclusion of other capabilities would be up to you as the developer.
Q: 1. Considering the emphasis on battalion-scale intel collection in Ref 7, would it be out of scope to include handheld collection/display devices which would aid the data mining/visualization task?

2. Do you see this application working into the field or is it intended for Intelligence Centers?
A: 1) For this topic, we are seeking a solution with a wide range of applicability. In keeping with this, there is not really a limit on the type of platforms to be designed for.

2) Again, a more general-purpose tool would be considered more useful than one slotted for a single, specific purpose. Allowing for a broader user or location base would be more helpful than limiting the proposed application area.
Q: Is data typically logged for successful missions/non-events or only those resulting in a "failure"?
A: Not all non-events can fit clearly into the categories of successful or non-successful. Therefore, we can assume that the logging of data will not be determined by such a status.
Q: 1. Are you looking for solutions that specifically highlight relationships between data sets of different formats (e.g. imagery and text), relationships within the same dataset (e.g. all within a TIGR database of geolocated events), or both? Should there be a greater focus on one or the other?

2. Is the interest for this system to operate in real-time or on historical data, or both?

3. Is there interest in spatial, temporal relationships if they are "causal", meaning is there interest in visualization methods that show temporal relationships and/or spatial relationships between events/items that are linked causally?
A: 1. The scope of this project is not constrained to finding relationships in any subset of data; we would like to be able to find relationships between any sort of data ingested by the system.

2. The system should be able to work both in real-time and with historical data.

3. Yes, this solution would be intended as a way to find and view any sort of causal relationships between source data.
Q: Is there a particular emphasis on "causal" relationships, or is the desire to find correlations in general? There is debate in data mining as to whether you can define anything as causal vs. statistically significant. Pearl bases much of his Bayesian work on the notion of causality (although this is also debated); however, there are inherent limitations to this approach.
A: In the topic write-up, we discussed the area of Condition-Based Maintenance as one possible application area for the desired tool. The principles behind this are to discover particular "events" that lead up to a significant event, such as a system failure, in order to avoid or put off such occurrences in the future. So the focus of this project can be considered to be on finding any sort of relationships that consistently seem to imply causality, though we are not stuck on semantics.
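A minimal sketch of the kind of precursor analysis this answer describes, under assumptions not stated in the topic: events are hypothetical (timestamp, label) pairs, the function name precursor_scores and the event labels are invented for illustration, and the score is simply the fraction of occurrences of each label that are followed by the target event within a fixed time window. A high score suggests a consistent precursor; it does not by itself establish causality.

```python
from collections import defaultdict

def precursor_scores(events, target, window):
    """For each event label, estimate how often it is followed by `target`
    within `window` time units. `events` is a list of (timestamp, label)."""
    events = sorted(events)
    target_times = [t for t, label in events if label == target]
    seen = defaultdict(int)      # occurrences of each candidate label
    followed = defaultdict(int)  # occurrences followed by the target in time
    for t, label in events:
        if label == target:
            continue
        seen[label] += 1
        # Count this occurrence if any target event falls in (t, t + window].
        if any(t < tt <= t + window for tt in target_times):
            followed[label] += 1
    return {label: followed[label] / seen[label] for label in seen}

# Hypothetical vehicle-health log: (hour, event label).
log = [
    (1, "oil_temp_high"), (2, "vibration_spike"), (3, "engine_failure"),
    (10, "oil_temp_high"), (20, "vibration_spike"), (22, "engine_failure"),
    (30, "oil_temp_high"),
]
scores = precursor_scores(log, target="engine_failure", window=5)
for label, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{label}: {score:.2f}")
```

In this made-up log, every vibration spike is followed by a failure within five hours, while only one of three high oil temperature readings is, so the former would rank as the stronger candidate precursor. With small equipment populations such counts are noisy, so a real tool would also need to report how many occurrences each score is based on.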
Q: 1. The topic synopsis refers to smaller populations of equipment (a few thousand) for condition based maintenance. Is there information you can share on the Intel data set regarding the number of data points / size?

2. Is there the potential (based on research and emerging new industry standards) to suggest additions or changes to the type of data that may be captured (e.g. new variables, frequency of data capture, etc.), or must the vendor work only with what is currently available?

3. Can you provide any additional information on the nature of the dependent/independent variables that are currently captured? Are they continuous, discrete, ordinal, etc.?

Thank you for your service and time.
A: 1. While we cannot give a definite size of the likely Intel datasets, it is reasonable to assume that the amount of Intel information will probably be significantly larger than the amount that would be provided for condition-based maintenance.

2. As mentioned in the solicitation, the solution should be able to work with data received from existing Army systems. Therefore, the implementation need not be concerned with how to collect the data, but should instead focus on how to find relationships among previously discovered information.

3. The nature of the information that will be captured is highly varied. The desired solution should be able to take all different types into account in order to provide a more general solution that will be able to work with many different sources of data.
