SITIS Archives - Topic Details
Program:  SBIR
Topic Num:  AF071-041 (AirForce)
Title:  Rapid Development Techniques for Spoken Language Translation
Research & Technical Areas:  Information Systems, Human Systems

  Objective:  Develop techniques for spoken language translation that do not require substantial amounts of new training data when being adapted for use in a new language and/or domain.
  Description:  DoD personnel are operating all over the world in support of the Global War on Terror, humanitarian relief operations, and various coalition operations. Much of the information needed to operate effectively in these situations is found in foreign language speech; however, there is a critical lack of linguists to translate this speech. To help address this problem, the DoD has funded the development of various spoken language translators (SLTs); however, to date, these have been of limited utility. They are available for only a small number of languages and domains, and those that do exist have not performed as well as desired. Efforts are currently underway to improve the performance of these SLTs and to extend their support to additional languages and domains. However, the development process has been slow and costly, as the standard method employed by developers to improve performance has been to collect, transcribe, and translate ever larger amounts of training data. While this methodology involves low technical risk, it cannot meet rapid turn-around requirements in new languages and domains of interest to the DoD within a reasonable time frame and at a reasonable cost. Thus, proposals are sought for innovative techniques (i.e., algorithms) for spoken language translation that can meet the Phase II target of performing at least on par with standard techniques but that do not require substantial amounts of new training data when being adapted for use in a new language and/or domain. In particular, proposals should be for innovative techniques that could work with training data in a new language and domain consisting of no more than ten hours of speech data and no more than 20,000 words of text data. Proposals may focus on improving techniques in speech recognition or translation individually; however, novel ideas that would lead to more rapid development of both components are especially encouraged. Techniques that could leverage data already available in other languages and/or domains are also especially encouraged. Note that this topic is in the Key Technology Areas of Human Systems (System Interfaces and Cognitive Processing) and Information Systems Technology (Knowledge and Information Management), with the Human Systems Technology Area being primary.

  PHASE I: Develop innovative techniques for spoken language translation and evaluate their performance for a single foreign language and a single domain. The evaluation should show the performance of the proposed techniques in two ways. The first is relative to that of a standard technique using the same limited amount of training data as used to develop the proposed technique. The second is relative to that of the standard technique when it has been trained on a much larger amount of training data. Due to the short time period of Phase I, it is preferable that currently available databases be used in the evaluation.
  PHASE II: Further develop the proposed techniques and evaluate their performance for multiple languages and/or domains to show the generality of the techniques. The evaluations should follow the same format as described under the Phase I description but for the new languages and domains. Any databases collected for development and/or evaluation should be delivered to the contract sponsor.
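
The two comparisons described under Phase I (and reused in Phase II) could be tabulated along the following lines. This is only a minimal sketch and not part of the topic itself: the topic does not name an evaluation metric, so word error rate is assumed here purely for illustration, and the function names and the system labels `proposed_limited`, `standard_limited`, and `standard_large` are hypothetical.

```python
from typing import Dict, List, Sequence


def word_errors(reference: Sequence[str], hypothesis: Sequence[str]) -> int:
    """Word-level edit distance (substitutions + insertions + deletions)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(references: List[str], hypotheses: List[str]) -> float:
    """Corpus-level word error rate: total errors / total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_errors = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        total_errors += word_errors(ref_words, hyp.split())
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("reference set contains no words")
    return total_errors / total_words


def compare_systems(references: List[str],
                    outputs: Dict[str, List[str]]) -> Dict[str, float]:
    """Score three systems on one test set and report both comparisons.

    Assumed (hypothetical) keys in `outputs`:
      proposed_limited - proposed technique, limited training data
      standard_limited - standard technique, same limited training data
      standard_large   - standard technique, much larger training data
    """
    rates = {name: error_rate(references, hyps) for name, hyps in outputs.items()}
    return {
        "proposed_limited": rates["proposed_limited"],
        "standard_limited": rates["standard_limited"],
        "standard_large": rates["standard_large"],
        # First comparison: same limited data (positive = proposed is better).
        "gain_vs_standard_limited":
            rates["standard_limited"] - rates["proposed_limited"],
        # Second comparison: remaining gap to the large-data standard system
        # (zero or negative = at least on par).
        "gap_to_standard_large":
            rates["proposed_limited"] - rates["standard_large"],
    }


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    refs = ["where is the clinic", "do you need water"]
    systems = {
        "proposed_limited": ["where is the clinic", "do you need the water"],
        "standard_limited": ["where the clinic", "you need water"],
        "standard_large": ["where is the clinic", "do you need water"],
    }
    for name, value in compare_systems(refs, systems).items():
        print(f"{name}: {value:.3f}")
```

Any other corpus-level score could be substituted for `error_rate`; the point of the sketch is only that all three systems are scored on the same held-out test set so that both comparisons are directly readable from one report.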

  PHASE III DUAL USE APPLICATIONS: Military applications include: SLT for refugee processing, medical triage, force protection, and coalition C2; information retrieval from foreign audio sources; and computer-aided language learning. Commercial applications are similar to military applications but generally for different domains, such as: law enforcement, business, and travel.

  References:
  1. Tanja Schultz and Alex Waibel, "Fast bootstrapping of LVCSR systems with multilingual phoneme sets," in Proceedings of Eurospeech'97, (Rhodes, Greece), 1997.
  2. Bill Byrne, et al., Towards Language Independent Acoustic Modeling, Final Report of the 1999 Johns Hopkins University Language Engineering Workshop, (available at: http://www.clsp.jhu.edu/ws99/projects/asr/).
  3. Alan Black, et al., "Rapid Development of Speech-to-Speech Translation Systems," in Proceedings of ICSLP, (Denver, CO), 2002.

Keywords:  spoken language translation, speech recognition, speech synthesis, machine translation, foreign language

Additional Information, Corrections, References, etc:
Ref #1: schultz_euro97_gp.pdf
Ref #3: available at: http://www.cs.cmu.edu/~awb/papers/HLT2002/tongues/tongues.html

Questions and Answers:
Q: 1. Do you have target languages and domains for speech translation?

2. Is the goal of the SBIR to have other languages translated into English, or is it to be bi-directional?

3. What level of translation depth do you need? Do you need language modeling in the target language, or do you want a surface translation of the words with accurate syntax as a secondary goal?

4. Do you have sample corpora we can study to develop our proposal?

5. At the end of Phase I, do you expect a working prototype system/tool that translates speech from one language to another?
A: 1. Is your question really whether the language and domain chosen for Phase I matter? If so, then the short answer is: if you need us to provide data to you, then the language and domain do matter (but it's negotiable); otherwise, they do not matter.

It is clearly the case that some languages and domains are of greater interest to the Air Force and the Department of Defense today than others; however, tomorrow, new languages and domains may be of interest.
For this reason, this topic is not intended to be a procurement of a working end-to-end system for a particular customer in any specific language or domain. Rather, it is intended to foster innovative research into techniques that allow components for spoken language translation to be rapidly developed in many languages and domains that might be of interest to various customers. I'm reluctant to specify any particular language(s) and/or domain(s) as respondents might feel pressured to address those, even though they might be in a better position to address other languages or domains due to factors such as the data that they have on hand. You should pick a language and domain for Phase I that you think you can address with your technique in order to show that it works. There is time in Phase II to address other languages and domains.

The availability of data is an important factor to consider in your choice of language (and domain), and the following are some issues to consider. If you need us to provide the data, and you pick a language for which we don't already have data that we can release, there is probably no way we could get some to you in time for you to complete a Phase I contract. (Note: Should you receive a Phase I award and need us to provide data to you, you will have to remove it from your systems at the completion of the Phase I contract.) Also, it would be unwise to propose to collect a speech database of your own in Phase I; generally, such a database collection requires an approved human subjects protocol, and the time required for this approval would negatively impact your ability to complete the Phase I work. A possibility (but by no means a requirement) is to consider doing an evaluation track in the 2007 International Workshop on Spoken Language Translation (IWSLT: see http://www.slt.atr.jp/IWSLT2006/). Finally, we have been members of the Linguistic Data Consortium (LDC: see http://www.ldc.upenn.edu/) for a number of years, so we have the vast majority of the databases that they have available. If you were to propose to purchase an LDC database on this contract, you would have to turn it over to us at the end of the contract. As might be expected, we are not very interested in receiving yet another copy of a database that we already have; we'd much rather see money spent on research. So the language you choose does matter to us if you need us to provide the data, but again, it's negotiable.

2. Both are of interest: Foreign language to English for things like broadcast news translation and bi-directional translation for speech-to-speech (dialogue) type applications. Which type of application you focus on is up to you. We are especially interested in development techniques that can be applied to either type of application.

3. In general, we are more interested in techniques that would produce accurate translations, both semantically and syntactically, not just simple word-for-word substitutions. To what level you need to go to do this is up to you. Whether or not language modeling in the target language would be needed is a function of the particular type of problem you intend to address and the type of techniques you intend to investigate. Thus, the need for language models is up to you as well.

4. No, we do not have corpora that we can make available for you to study in order to develop your proposal.

5. Not necessarily. However, it's important to know that generally there are fewer Phase II awards than there are topics. Thus, your goal in Phase I should be to make a strong case for why you should be given a Phase II award. The closer you can come to a spoken language translator, necessary component, or development tool developed using your techniques, the better the case to be made for a Phase II award.
Also, the better the performance is, the better the case for a Phase II award. You might consider using open source or other available tools for components that you are not focusing on. Note that it is better to focus on improving the performance of the algorithms or tools than on shrinking them to fit some resource-constrained computing platform.
