TMCnet Feature
May 08, 2013

HD Voice, Voice-as-Data Complement Each Other

By Doug Mohney, Contributing Editor

As the world moves to more voice-driven applications, such as speech-to-text for email and hands-free SMS, speech-driven IVR systems – c'mon, you've used them, you hate them just like I do – and super-search personal assistants like Siri, voice quality becomes much more important. HD voice and "Full HD voice" quality are needed to improve the data going into these apps. Otherwise, it's GIGO (garbage in, garbage out).



Technology to clean up voice input is already front and center for chip manufacturers, since cell phone manufacturers and automobile companies want clearer voice, and a number of these companies have been added to the 2013 HD Voice Report. Dedicated DSP processing power, combined with specialized algorithms, takes sound input from one or more microphones and "scrubs" it to remove background noise and echo. For in-car systems, voice filtering and enhancement become more challenging because of all the background noise a car generates when it starts, when it drives slowly and at speed, and when you roll the windows down.

Audience's earSmart voice processors are being rolled into everything from smartphones and tablets to ultrabook computers, but vendors such as CEVA are cutting licensing deals to incorporate their DSPs into ARM-based designs, reducing part count and helping battery life.

Other fine-tuning happens at the network core, where voice input may be run against one or more customized libraries based on geography and tailored to recognize specific accents and phrases. There are many different dialects and unique phrases; just consider the differences among the Queen's English, U.S. English, and New Zealand and Australian usage. Multiply that by the number of languages and dialects around the globe, and there's a definite need for "clean," clear voice input, be it from a cell phone or a landline.

Real-time tasks in the cloud, such as personal assistants and voice ID, will require voice processing to happen immediately. Other applications, such as speech-to-text for email or hands-free SMS, can tolerate some delay without disappointing users who expect near-instant gratification. All of these processes need high-quality voice to work effectively.

If people get past their hang-ups, voice recording has enormous potential for consumer and enterprise use. Google's Keep note application transcribes speech to text before posting a note in the cloud, but it might make sense to keep the sound file for reference. For anyone taking notes, be it a college student or a writer who wants to make sure he gets a quote right, being able to record and store audio would be a boon. The line between traditional phone calls and recording blurs here, but the technology applies equally to both.

In the business world, all calls should be recorded and transcribed, starting with conference calls as a priority and then moving on to sales, customer service, and any billing or accounting transactions. Large call centers already record calls and run voice analytics programs to spot marketing and business trends, and as costs come down, smaller businesses will be able to afford the same capability as a cloud-based service.

Finally, there's the Hypervoice movement, which promotes detailed indexing of voice conversations so anyone can rapidly go back through a mass of calls to mine and reference conversations in one big, big database. I'm not sure whether Hypervoice will be the end state of voice recording and analytics, but given how young the concept is, I'm sure we'll be hearing a lot more about it.




Edited by Alisen Downey