TMCnet - World's Largest Communications and Technology Community

February 27, 2006

Finally, A Listener

TMCnet News

(Design News Via Thomson Dialog NewsEdge) Voice recognition technology is broadening its reach, addressing the challenges of the harsh automotive environment with systems that attempt to understand what drivers want. As vendors focus on this high-volume market, they are also making voice recognition systems that are more conversational, understanding meaning instead of requiring specific commands.

Voice is considered an important control option for vehicles as drivers bring in more portable consumer products, adding to the driver distraction problem. As legislators mandate hands-free phones, reliable voice-controlled dialing is viewed as a near-necessity for drivers controlling their electronic hardware.

Many observers feel that voice and Bluetooth wireless connections will be nearly inseparable in cars. Chrysler's uConnect Bluetooth link is already merging the two. "uConnect utilizes voice recognition technology to allow the customers' hands to remain on the steering wheel," said Vance Peacock, senior manager vehicle entertainment and communications at the Chrysler Group.

Voice makes it simpler to dial phone numbers and provide addresses for navigation systems, and consumers who want to pick from thousands of MP3 files are also expected to prefer voice input. The push to address the auto industry comes as voice technology is seeing solid growth.

Nuance Communications, Inc. of Burlington, MA, formerly known as ScanSoft Inc., notes that its sales are increasing at greater than 30 percent, predicting a rise to $325 million in 2006. IBM, rarely a company to focus on small niches, is deploying voice recognition in many of its business systems and is expanding into automotive. In 2004, it signed deals to provide software for GM's OnStar and Honda's navigation system.

As recognition moves into the mainstream, vendors are beginning to look at the next step for the technology. As they develop products that are now being tested for the long auto industry development cycle, the focus is shifting to conversational techniques that understand common phrases instead of requiring a rigid syntax or word structure.

"Speech technology is finally coming of age. The current trend is toward conversational style commands or natural language understanding," says Mark Kady, software engineering group manager at Delphi Automotive of Kokomo, IN.

He notes that vocabulary capacity for automotive embedded systems has grown rapidly, from 2,000 words in 2003 to 30,000 words in 2004 and more than 100,000 words in 2005. This capability permits voice destination input for navigation and media player song selection for hard drive MP3 systems with more than 10,000 songs.

As vocabularies grow, developers are also striving to understand not merely words, but what users want. The transition to conversational speech is prompting some companies to alter the way they benchmark their systems. The existing approach of counting recognized words isn't very effective if the overall meaning of the combined words is not understood. "We measure ourselves not around a recognition rate, but on a task completion rate," says Tom Freeman, marketing vice president at Voice Box Technologies. The young company from Kirkland, WA, plans to begin shipping its product next year.
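The difference between the two benchmarks can be illustrated with a small sketch. This is not Voice Box's actual measurement method; the session records, field names, and both functions are hypothetical, and the word comparison is a simplified position-by-position match rather than a real alignment.

```python
def word_recognition_rate(said, heard):
    """Fraction of spoken words recognized at the same position (simplified)."""
    ref = said.lower().split()
    hyp = heard.lower().split()
    if not ref:
        return 0.0
    matches = sum(1 for r, h in zip(ref, hyp) if r == h)
    return matches / len(ref)


def task_completion_rate(sessions):
    """Fraction of sessions in which the system did what the user intended."""
    if not sessions:
        return 0.0
    done = sum(1 for s in sessions if s["intended"] == s["performed"])
    return done / len(sessions)


# Hypothetical sessions, invented for illustration only.
sessions = [
    {"said": "play some jazz", "heard": "play some jazz",
     "intended": "play_genre:jazz", "performed": "play_genre:jazz"},
    {"said": "call home", "heard": "call home",
     "intended": "call:home", "performed": "call:home"},
    # One word misrecognized, but the task still succeeds.
    {"said": "i want to hear the beatles", "heard": "i want to hear the beetles",
     "intended": "play_artist:beatles", "performed": "play_artist:beatles"},
    # Every word recognized, but the meaning is missed and the task fails.
    {"said": "do not call bob", "heard": "do not call bob",
     "intended": "none", "performed": "call:bob"},
]

for s in sessions:
    rate = word_recognition_rate(s["said"], s["heard"])
    ok = s["intended"] == s["performed"]
    print(f"{s['said']!r}: word rate {rate:.2f}, task completed: {ok}")

print(f"task completion rate: {task_completion_rate(sessions):.2f}")
```

In this made-up data, the last session scores perfectly on word recognition yet fails the task, which is the gap a task completion rate is meant to expose.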

These systems won't be attempting to decode conversations between passengers, but will sit idle until drivers wake them up. "You can use a push to talk button or use an attention name. The system ignores the conversation until you say 'computer' or whatever word you pick," Freeman said.
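The wake-up behavior Freeman describes can be sketched as a simple gate in front of the recognizer. This is an illustrative sketch only, not any vendor's implementation: the `AttentionGate` class and its methods are hypothetical, and it assumes utterances have already been transcribed to text, which sidesteps the audio processing a real system would need.

```python
class AttentionGate:
    """Ignore speech until a push-to-talk press or a chosen attention word."""

    def __init__(self, attention_word="computer"):
        self.attention_word = attention_word.lower()
        self.awake = False

    def push_to_talk(self):
        """Simulate the driver pressing the push-to-talk button."""
        self.awake = True

    def hear(self, utterance):
        """Return the command text to process, or None if it is ignored."""
        words = utterance.split()
        if self.awake:
            # Already woken: treat the whole utterance as the command.
            self.awake = False
            return utterance
        if words and words[0].lower().strip(",.!?") == self.attention_word:
            rest = " ".join(words[1:])
            if rest:
                return rest
            # Attention word spoken alone: wait for the next utterance.
            self.awake = True
        return None


gate = AttentionGate("computer")
print(gate.hear("turn left at the light"))   # passenger chatter, ignored
print(gate.hear("Computer, call home"))      # attention word plus command
gate.push_to_talk()
print(gate.hear("play jazz"))                # accepted after button press
```

After each accepted command the gate goes back to sleep, so ordinary conversation in the cabin is ignored until the driver wakes the system again.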

Though conversational systems are the coming wave, there will be a market for some time for simpler programs that use specific instruction words. "Command and control systems will still be available in the lower end products for functions such as phone dialing and other cost sensitive systems," Kady says.

Hardware issues

Though many users don't like the constraints of limited command structures, many of them adapt once they realize the added complexity of conversational speech brings higher pricing.

Pricing may keep automakers from rapidly moving conversational voice control into mid-range vehicles. "People talk about conversational speech, but that requires high speed DSPs and more memory. It's very expensive. It will definitely come, but not in the near future," says Stephan Thaler, director of marketing for National Semiconductor's Device Connectivity Division.

That processing capability already exists in today's luxury cars. "Conversational speech input will be available on higher-end automotive systems where there are more than 100 MIPS available for voice recognition," Kady says.

Today's chip technologies run software fast enough to provide what seems like a real-time response for the drivers of these premium vehicles. "We can process an utterance in about 1.2 times the speed of speech, which is fast enough to do a lot of processing before the person realizes there's any delay," Freeman said.

But there are still reliability concerns. Some observers note that voice recognition systems used for phone services, such as telephone information or credit card services, operate in quiet environments but still have failures. "Voice recognition hasn't shown enough reliability. Even in quiet environments, there are errors," says Tarun Gupta, product & innovation manager for Siemens VDO's North American Infotainment Systems Division.

Beyond the noisy environment, another change from office-based voice is that the automotive industry requires small memory sizes, often just a few Mbytes, compared to the almost unlimited capacities for office systems. Microcontrollers rated for automotive environments typically have far lower speeds than PC-based systems. Today's auto industry controllers typically have 60-120 MIPS, well below the capability of low-end PCs.

Additional technologies

The global market poses additional challenges for developers, who must alter programs for each language. That's a particularly difficult challenge for those who focus on conversational speech. "Each language is a project. Most of what our technology is about is semantics and word order," Freeman said.

Then there's the task of determining which systems require voice control. Early voice recognition systems for autos offered radio features and even some air control features. But developers soon realized some features like setting a heater fan level are easily and accurately accessed by turning one knob or pushing one button. Voice offers little advantage, especially when activating voice recognition can also require pushing a button.

Those who focus on the automotive market also have to make long-term plans, which doesn't always fit in with a small company's immediate needs. "Lead times are very long for automakers. Selling aftermarket through Best Buy is much more exciting," Freeman said.

In the automotive environment, existing voice recognition must be augmented. Voice recognition can work well in offices or homes, but it's far harder to understand utterances of a distracted driver on a bumpy road during a rainstorm.

Suppliers note that voice-activated phones often have problems in noisy vehicles. "The voice recognition in cell phones works in quiet conditions, but in a car, you generally need a hands free kit that has noise suppression and echo canceling," says Tom Houy, a vice president at CSR plc.

The solutions go beyond the voice recognition software. Operating system vendors are working closely with companies that focus on acoustic echo canceling and noise suppression to make sure these programs operate well in vehicles. "From the operating system side, there's a lot involved in echo canceling and noise suppression. Once that is addressed, speech engines work well in the car," said Andrew Poliak, automotive business manager at QNX Software Systems.

When the many functions controlled by voice are integrated within the infotainment system, vendors must also make sure that a problem with one application doesn't bring down other programs. That's also being addressed by RTOS providers. "In our operating system, applications never share memory, so we can guarantee there's no way a program can get outside of its padded cell," says Dan Mender, business development director at Green Hills Software Inc.

