
A universal translator in your pocket
[April 28, 2006]



(New Scientist via Thomson Dialog NewsEdge) You are in a foreign country, looking for an art gallery, so you stop a passer-by in the street to ask for directions. Instead of talking loudly and slowly in your own language in the vain hope of making yourself understood, or attempting to communicate using an improvised form of sign language, you simply speak into your handheld computer, which instantly translates what you are saying. Then it translates the person's response for you.



That is the vision of computer translation researchers, who have been struggling to develop a real-time, easy-to-use system capable of accurately turning speech into other languages and back again. The technology has so far fallen far short of expectations.

However, in the next couple of years we can expect an explosion in translation technologies, including camera cellphones that can capture text on road signs, say, and translate it into another language, and real-time automatic dubbing to enable anyone to watch any movie or TV programme in their native tongue. Ultimately, we may even see an electronic version of the fictional Babel Fish, the universal translator in Douglas Adams's The Hitchhiker's Guide to the Galaxy, in the form of a device like a hearing aid that whispers a translation in your ear as someone speaks to you in another language.


In the past, translation systems have tended to disappoint, as anyone who has ever used an online service will testify. Although useful for simple words or phrases, web translators tend to fall down when it comes to anything remotely complex. In one famous example, a standard online translator transformed an Arabic sentence meaning "The White House confirmed the existence of a new Bin Laden tape," into "Alpine white new presence tape registered for coffee confirms Laden."

This is because traditional translation programmes tend to break languages down into a set of rules covering their basic grammar and word order. Linguists identify the grammatical rules for each language, and a programmer laboriously hand-codes the rules into the software. When the software scans a new piece of text in a particular language, it assumes it will conform to these rules and that words will always be in a certain order within each sentence. This means the programmes are easily confused by exceptions to grammatical rules, as well as by bad grammar.
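To see why this approach is brittle, here is a minimal sketch, in Python, of how a hand-coded, rule-based translator works. The three-word lexicon and the single reordering rule are invented purely for illustration and are not drawn from any real system; anything outside the coded rules or vocabulary passes through untranslated or misordered.

```python
# Toy rule-based translator: a hand-coded dictionary plus one word-order rule.
# The vocabulary and the rule are invented for illustration only.

LEXICON = {"the": "la", "white": "blanche", "house": "maison"}

def translate_rule_based(sentence: str) -> str:
    words = sentence.lower().split()
    # Hand-coded rule: in the target language, adjectives follow the noun,
    # so swap any (adjective, noun) pair the rule recognises.
    ADJECTIVES = {"white"}
    NOUNS = {"house"}
    reordered = words[:]
    for i in range(len(reordered) - 1):
        if reordered[i] in ADJECTIVES and reordered[i + 1] in NOUNS:
            reordered[i], reordered[i + 1] = reordered[i + 1], reordered[i]
    # Word-for-word lookup; anything outside the hand-coded lexicon is left
    # untouched, which is exactly where such systems start to break down.
    return " ".join(LEXICON.get(w, w) for w in reordered)

print(translate_rule_based("the white house"))  # -> "la maison blanche"
```

Every exception, idiom or grammatical slip has to be anticipated by another hand-written rule, which is why these systems cope so badly with real-world text.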

Researchers are moving away from rule-based systems towards software that is trained to learn new languages for itself. This change should make such systems far more useful. "After decades of stagnation, something major is happening to create the technologies we have always dreamed about," says Alex Waibel, director of the International Centre for Advanced Communication Technology, based jointly at the University of Karlsruhe, Germany, and Carnegie Mellon University in Pittsburgh, Pennsylvania.

The software is trained on huge amounts of text, and learns to identify different types of word, such as nouns, by their position within sentences. It also uses word groupings to help it work out the meaning of a word: "apple", for example, is likely to appear alongside words such as "juicy" and "fruit". This approach means the software learns the language as it is actually used on a day-to-day basis, rather than how grammar dictates it should be used. It also means the software can learn and translate languages whose rules we don't know. Researchers at the University of Southern California's Information Sciences Institute in Los Angeles, for example, are testing translation software on a mysterious 15th-century manuscript, written using an unknown alphabet.
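As a rough sketch of the word-grouping idea, and nothing like the researchers' actual software, simply counting which content words share a sentence in a small training corpus is already enough to start linking "apple" with "juicy" and "fruit":

```python
from collections import Counter, defaultdict

# Toy corpus; a real system would be trained on millions of sentences.
corpus = [
    "the juicy apple is a fruit",
    "a ripe apple is a juicy fruit",
    "the train left the station",
]

STOPWORDS = {"the", "a", "is"}  # very common words carry little meaning of their own
cooccur = defaultdict(Counter)

for sentence in corpus:
    content_words = [w for w in sentence.split() if w not in STOPWORDS]
    for word in content_words:
        for other in content_words:
            if other != word:
                cooccur[word][other] += 1

# Words that frequently share a sentence with "apple" ("juicy", "fruit")
# hint at its meaning, with no grammar rules coded in by hand.
print(cooccur["apple"].most_common(3))
# -> [('juicy', 2), ('fruit', 2), ('ripe', 1)]
```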

Poring over ancient texts is one thing, but if translation systems are ultimately to be used more widely, they will need to be capable of translating spoken conversations in real time. The US military is testing one system capable of doing just that. Originally designed for translating between English and Iraqi Arabic, the system was created by researchers at Carnegie Mellon University as part of the Defense Advanced Research Projects Agency's TransTac (Translation Systems for Tactical Use) programme. It consists of a handheld computer fitted with speech recognition, translation and voice synthesis software.

Unlike previous translation gadgets, the TransTac device can translate more than just a few hundred phrases stored in its memory, because it is based on language-learning software. This could make a huge difference in peacekeeping missions. Earlier devices precluded any kind of dialogue: they could not recognise spoken words and translate them back into English, so the people you were speaking to could answer only with a nod or shake of the head. "Now they can answer back, describing things," says Alan Black, one of the researchers involved. And if you want to be sure the device has translated what you have said correctly, you simply translate it back into English and see how closely it matches your original sentence, says Black.
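Black's round-trip check is easy to sketch. In the Python fragment below, the forward and backward translators are hypothetical placeholders for whatever engines the device actually uses, and the string-similarity ratio is just one simple way to score the match:

```python
import difflib
from typing import Callable

def round_trip_confidence(sentence: str,
                          forward: Callable[[str], str],
                          backward: Callable[[str], str]) -> float:
    """Translate out and back again, then score how closely the result matches the original."""
    back_translation = backward(forward(sentence))
    return difflib.SequenceMatcher(None, sentence.lower(), back_translation.lower()).ratio()

# Trivial stand-in "translators" for demonstration; the device's real
# English/Arabic engines would be plugged in here instead.
forward = lambda s: s[::-1]    # pretend translation: reverse the string
backward = lambda s: s[::-1]   # reversing again recovers the original exactly
print(round_trip_confidence("where is the art gallery", forward, backward))  # 1.0
```

A score near 1.0 suggests the meaning probably survived the round trip; a low score is the cue to rephrase and try again.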

However, the handheld device's limited processing power means the system can be slow, and cannot translate all phrases. "You may have to repeat yourself or change the question," he says. This is because the accuracy of such translation programmes depends on the amount of text they are trained on, but those fed on large amounts of text need considerable computing power to run. To run the system on a handheld device, the team had to severely limit its vocabulary, meaning it can only translate conversations on particular topics.

As the processing power of handheld computers increases, such systems should improve. Alternatively, the software could be stored on a central server rather than the handheld device. When you want to talk to someone in another language, you would simply connect to the server wirelessly, says Waibel.

Indeed, programmes capable of translating speech on any topic are already being developed to run on more powerful computers. And researchers are tapping into a variety of sources to feed their voracious programs with sample text. In Germany, Waibel's team has developed a system to translate lectures on any subject in real time. The system, called Lecture Translator, was trained on speeches taken from sessions of the European Parliament. This is ideal, says Waibel, as the speakers tend to talk clearly on a broad range of subjects. Meanwhile, Franz Och, Google's machine translation researcher, said recently that the search engine's huge repository of text is likely to play a key role in creating better translation systems.

Europe's answer to Google, a search engine being developed under the codename Quaero, is planning to offer a range of translation services. Surfers will be able to carry out text and audio searches, and then translate the files into their language of choice. They will also be able to translate live audio feeds, or enter a search term in one language and have the software search for references in several languages at once.

So how accurate will such systems become? "There is no reason why they should not become as good, if not better, than humans," says Waibel.

From ancient symbols to dolphin whistles

Duncan Graham-Rowe

Translation programs that learn new languages are being tested out on everything from ancient manuscripts to dolphin sounds.

Kevin Knight, a computer language researcher at the University of Southern California, Los Angeles, is studying an ancient document known as the Voynich manuscript using software he has trained to spot similarities between languages. The 15th-century manuscript is about 20,000 words long and written in an unknown alphabet. Cryptographers have studied it intensively, but so far no one has been able to translate it. "A lot of people thought it was a hoax, but it has such regular patterns to it that it would have to be a very elaborate hoax if it were one," says Knight.

One leading theory is that it was written in an ancient form of a familiar language. For example, some researchers have suggested it is a form of ancient Ukrainian that lacks vowels. But using his software, Knight has been able to show that the order and frequency of symbols in the text do not match any of those used in Ukrainian.
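A crude version of that kind of comparison, sketched here in Python purely to illustrate the idea rather than to reproduce Knight's software, is to compare how often each symbol occurs in the manuscript with how often each letter occurs in a candidate language:

```python
from collections import Counter
import math

def symbol_frequencies(text: str) -> dict:
    """Relative frequency of each symbol, ignoring whitespace."""
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

def distribution_distance(freq_a: dict, freq_b: dict) -> float:
    """Euclidean distance between two symbol-frequency distributions."""
    symbols = set(freq_a) | set(freq_b)
    return math.sqrt(sum((freq_a.get(s, 0.0) - freq_b.get(s, 0.0)) ** 2 for s in symbols))

# The smaller the distance, the more plausible it is that the two texts
# use their symbols in statistically similar ways.
sample_a = "otedy qokeedy daiin"   # stand-in: a few transliterated manuscript words
sample_b = "prvt drzhe svt"        # stand-in: Ukrainian words with the vowels stripped
print(distribution_distance(symbol_frequencies(sample_a), symbol_frequencies(sample_b)))
```

Real analyses also compare the order in which symbols follow one another, not just their overall frequencies, which is what lets Knight rule out candidate languages.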

Meanwhile, a team at Carnegie Mellon University in Pittsburgh, led by Alan Black, is applying the same techniques to dolphin noises. It is not as daft as it sounds: recent research suggests that humpback whales have their own form of syntax (Journal of the Acoustical Society of America, DOI: 10.1121/1.2161827).

The team is picking up the high-frequency whistles made by a group of Atlantic spotted dolphins off the coast of the Bahamas, and then applying software to the whistles to identify the equivalent of dolphin words. So far, though, they have only managed to detect the signature whistles dolphins regularly use to identify themselves to each other.
