Project Brief
Open Competition 5 - Information Technology
A Phrase-Based Statistical Approach to Understanding and Translating Natural Language
Develop and demonstrate technologies that will enable accurate machine understanding of human languages by isolating statistically significant phrases and mapping equivalencies in their usage.
Sponsor: Sehda, Inc.
1040 Noel Drive, Suite 100
Menlo Park, CA 94025
Machines are currently unable to fully "understand" human language. Highly restricted vocabularies of individual words may be recognized in a specific context, but machines do not understand those words the way humans do. New methods are needed to resolve semantic, syntactic, and even pragmatic ambiguities. Conventional approaches focusing on keywords, grammar rules, and simple probabilistic modeling appear to have reached their limits.
In a two-year project, Sehda plans to develop and demonstrate novel technologies, usable by anyone with or without specialized linguistic knowledge, that automate the understanding of text and spontaneous conversation by mapping equivalencies in the usage of phrases instead of focusing on the meaning of individual words. Sehda's approach is based on statistical modeling of human conversations. Research has shown that children learn their native language phrase by phrase rather than word by word; preliminary tests suggest this concept has promise for machine understanding as well as machine translation.
The company's goal is to construct a network of equivalent phrases of conversational English using algorithmic procedures that automatically extract a significant number of phrases from text and organize them into semantically and syntactically equivalent classes. The same procedure will be applied to either French or Spanish, and a mapping between the two languages will be used to produce valid translations. The overall challenge is to build and validate a viable system despite the very large scale of natural language usage, and to verify the heuristics for measuring the closeness between phrase meanings. ATP support is needed because Sehda is a small company and the project is too risky for external private investors, who are wary of the limited success of other natural language translation systems.
If successfully developed and deployed, the new technology would provide the core language engine for a variety of applications in addition to language translation, such as speech recognition and data mining. Speech interfaces could be built quickly and inexpensively, companies could translate product information easily, and customer service costs could be reduced through the use of automatic question-answering systems. The technology would reduce the cost of developing a natural speech recognition system by an estimated 50 to 80 percent.