


August 2008 | Volume 27 / Number 3
CALL CENTER Technology

Speech Rec: The New Leader of Automated Voice?

Speech recognition applications are not so much revolutionizing IVR, defined for this article as enabling dual-tone multi-frequency (DTMF) or TouchTone™ interactions, as supplanting it as the key means of automated voice interaction.

That victory, which may be in sight, sets the stage for integrating voice with web, e-mail, and SMS to provide a unified, user-friendly automated solution that will reduce agent engagement time and, for a growing number of interactions (though far from all), eliminate agent involvement altogether.

The value proposition of speech rec is that it offers superior usability compared with DTMF at a much lower cost than live agents: typically 50 cents per transaction, compared with $5-$9 for an agent-handled transaction.
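To put the per-transaction gap in concrete terms, the following Python sketch computes gross savings from shifting calls from live agents to speech self-service, using the 50-cent and $5-$9 figures above. The monthly call volume is a hypothetical number chosen only for illustration, and the sketch ignores software, integration, and maintenance costs, so it yields a gross figure rather than an ROI calculation.

```python
# Per-transaction costs quoted in the article, in US dollars.
SPEECH_COST = 0.50
AGENT_COST_LOW = 5.00
AGENT_COST_HIGH = 9.00


def gross_savings(calls_shifted: int, agent_cost: float,
                  speech_cost: float = SPEECH_COST) -> float:
    """Savings from handling calls with speech rec instead of live agents.

    Ignores up-front software, integration, and maintenance costs.
    """
    return calls_shifted * (agent_cost - speech_cost)


if __name__ == "__main__":
    monthly_calls = 100_000  # hypothetical volume, not from the article
    low = gross_savings(monthly_calls, AGENT_COST_LOW)
    high = gross_savings(monthly_calls, AGENT_COST_HIGH)
    print(f"Gross monthly savings: ${low:,.0f} to ${high:,.0f}")
```

With the hypothetical 100,000 calls a month, the sketch prints a range of $450,000 to $850,000, which is the gross figure that implementation and integration costs must be weighed against.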

Implemented correctly, speech rec, along with text and web applications, can cut costs while maintaining, if not increasing, customer satisfaction and retention.

Yet while speech rec technology has come a long way, it still scores significantly lower than live agents in customer satisfaction surveys, though higher than DTMF.

"Speech rec has a customer satisfaction ranking of about 4.5 on a 10-point scale while DTMF IVR is typically between 1 and 2," points out Bob Lyons, General Manager and Vice President of Avaya's customer service business. "In contrast, live agent interactions score a 7 on average. The real opportunity is in finding a way to get speech interactions to begin approaching scores seen by live interactions."

Achieving that goal will, however, require resources. Speech rec software and integration can cost upward of several hundred thousand dollars and can take nine to 12 months to implement, followed by a year of operation before the deployment achieves return on investment (ROI).

It will also require speech applications to be more responsive to users and integrated with multiple sources of rich content such as web, voice, and user-specific data. These sources typically sit in application silos, and integrating them so that they present the data in the appropriate context, at the right time, will require large investments.

"The question becomes can the technology reach a level where customer satisfaction is high enough to offset the investments," says Lyons.

Speech technology developments

Speech recognition technology is slowly but inexorably moving in that direction. Aaron Fisher, Director of Professional Services at West Interactive, has seen marked improvements in the overall performance of speech recognition software.

Applications are now better able to recognize callers with accents. Speech engines are also more effective at screening out ambient noise that is not generated by the caller's voice. These developments have led to increased automation rates and fewer agent opt-outs.

"In the old days, like 2003 and earlier, if a dog barked any time during a call, the speech rec application would think this noise might have been from the caller but wouldn't be able to make sense of it," recounts Fisher. "Now if you have a loud dog or child, the system has the ability to analyze the difference between spoken noise and ambient noise, and callers can achieve their tasks with higher success rates."

To illustrate, Loquendo ASR 7.5 includes a new noise compensation feature, and Loquendo has retrained all supported languages with additional material recorded in the presence of background noise, including mobile environments. The release also supports more complex and multilingual grammars and large vocabularies, and it offers differentiated timeouts to accommodate utterances of fixed format and length, such as credit card numbers.

Users are continuing to shift away from directed dialogue speech rec, where callers speak one or two words in response to a DTMF-like menu, toward natural language speech rec, which enables callers to speak to the computer as if they were conversing with a person.

Natural language lets callers obtain what they want more quickly and easily. They can, for example, barge into the application and have their requests understood, because the speech engine parses their words and retrieves the right responses from its libraries. This functionality leads to higher automated completion rates and fewer zero-outs to live agents. Yet natural language solutions are more expensive and complex to install.

"Natural language is preferable because it more closely aligns with the users' need to have the system respond to them," explains Lyons. "The challenge is that natural language is not mature enough yet to deal with the general public. You have to build extensive libraries focused on the things that a person might ask. When you think about the many language options along with the many accents and slang options, it is easy to see why natural language is rather difficult to implement successfully in many situations."

There are many applications where directed dialogue is extremely useful. Voxeo, a provider of premises-based and hosted IVR and VoIP applications, points to the example of a mail order firm where 20 percent of callers dial in to find out their order status.

The marketer uses Voxeo's Prophecy platform to ask callers to say their order numbers, which is less restrictive than having them use DTMF. The platform then queries an existing web-based order status application and receives XML instructions to inform customers that their orders have shipped. Rather than ending the call, the platform then queries the shipper's package tracking web application and tells callers where their packages are.
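The order-status flow just described amounts to two chained lookups. The Python sketch below models it with in-memory stand-ins: the `ORDER_STATUS_XML` and `TRACKING_LOCATIONS` tables, the order and tracking numbers, the location, and the function names are all hypothetical, and nothing here uses Voxeo's Prophecy API. It assumes the speech engine has already recognized the spoken order number as a string.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical stand-ins for the retailer's web order-status service and the
# shipper's package-tracking application; a real system would query them over HTTP.
ORDER_STATUS_XML = {
    "10452": "<order id='10452'><status>shipped</status><tracking>1Z999</tracking></order>",
    "10453": "<order id='10453'><status>processing</status></order>",
}
TRACKING_LOCATIONS = {"1Z999": "Memphis, TN"}


def lookup_order(order_number: str) -> Optional[ET.Element]:
    """Fetch and parse the XML order-status response, or None if unknown."""
    xml_text = ORDER_STATUS_XML.get(order_number)
    return ET.fromstring(xml_text) if xml_text else None


def build_prompt(order_number: str) -> str:
    """Compose what the voice platform says after the caller speaks an order number."""
    order = lookup_order(order_number)
    if order is None:
        return "Sorry, I could not find that order number."
    status = order.findtext("status")
    if status != "shipped":
        return f"Order {order_number} is currently {status}."
    # Rather than ending the call, follow up with the shipper's tracking lookup.
    location = TRACKING_LOCATIONS.get(order.findtext("tracking"))
    if location is None:
        return f"Order {order_number} has shipped."
    return f"Order {order_number} has shipped and was last scanned in {location}."


if __name__ == "__main__":
    print(build_prompt("10452"))
```

Keeping the second lookup conditional on a "shipped" status means the extra query is made only when it can add information for the caller.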

Speech rec engines, especially those that use natural language, will benefit from increased chip processing power, driven principally by strong demand for increasingly sophisticated computer games, reports Ian Jacobs, senior analyst at Frost & Sullivan.

"The faster and more affordable chipsets will enable speech rec applications to route calls quicker and handle more complex interactions," he explains.

Advantages of speech over DTMF

These improvements are making speech rec a more effective automated voice solution compared with DTMF-enabled IVR for most if not all interactions.

Speech rec can improve customers' experience with automated voice by accommodating their requests directly, so they complete transactions or obtain information and assistance more quickly instead of working through long hierarchical menus as with DTMF.

Speech rec also enhances CRM by permitting customer personalization. When the system recognizes callers, it can, based on the rules you create, address them by their first names, cut through the menus, and present customized information and offers.

One literal driver of the shift from DTMF to speech rec is mobile commerce. Andrea Holko, Senior Vice President of Global Consulting Services at Intervoice, cites the growing number of jurisdictions with hands-free cellphone laws.

"In environments where for safety reasons you cannot use your hands to use a phone, like driving a car, speech rec is a necessity," says Holko.

Also, the conversational flow in natural language speech recognition keeps older customers in the automated applications longer before contacting live agents.

Speech recognition also enhances security because it allows for more complicated, less easily faked passwords. These are migrating from the common mother's maiden name to the names of high schools attended and of first pets.

There are still places for DTMF. It can provide a high degree of accuracy for low-level security, such as the entry of 4-digit or 6-digit PINs. It also lets customers enter confidential information in public places without it being overheard and possibly stolen by others. In addition, it can process vast numbers of simple calls requiring only numeric input, highly reliably and at low cost.

If you do retain DTMF, avoid upgrading the host IVR with speech rec, recommends Avaya's Lyons. Instead, integrate the speech applications directly with the routing and switching solutions, and connect the IVR only for those customers who wish to use DTMF.

The Avaya executive explains that the IVR's hierarchical call flow conflicts with the natural conversational flow of speech rec and live agent interactions. Firms that install speech rec on the IVR therefore risk failing to achieve the expected returns, such as improved customer retention and satisfaction and shorter live agent calls, because more callers will zero out than projected.

"Installing speech rec on the IVR is the worst solution for your customers because it doesn't allow you to change the paradigm and create customer-friendly call flows," Lyons points out. "All you will have is more expensive DTMF with poor satisfaction or call containment rates."

Speech rec as live agent adjunct or replacement

Banishing IVR to the periphery leaves speech recognition free to take on work now done by live agents. It is already handling more transaction types that sit at the edge of DTMF IVR's competence but are too expensive for live agents, such as ordering movies, products, and tickets.

Speech rec can also reduce call lengths and handling costs by obtaining basic, routine information from callers and passing it to live agents. As speech applications become more robust, they will be able to gather more data and handle more tasks, leaving less work for live agents.

"Speech is taking increasingly sophisticated calls from live agents, including diagnostics, tech support, and account verification," reports Keith Dawson, senior analyst at Frost & Sullivan. "I estimate that 35 percent, maybe 50 percent of calls are so routinized that they can go to speech self-service in the near future."

Thanks to greater integration between speech rec and other interaction types on platforms and services, customers can more easily move between speech rec applications, live agents, and other channels without re-entering data.

Intervoice's new CTI-enabled Intervoice Contact Portal lets customers choose the channel (phone, e-mail, SMS, or web chat) and the resource (self-service or a live agent). All of the information is routed with the contact throughout the entire session.

Similarly, Genesys Telecommunications Laboratories' newly released intelligent Customer Front Door™ solution incorporates Nuance's natural-language-based Nuance Call Steering application to determine caller intent and direct the caller to either a live agent or an automated system.

Avaya's Lyons sees automated speech and the web merging, with customers sometimes talking with machines and sometimes interacting with them over the web, while the applications tap the same rules, response engines, and databases.

He gives the example of a customer who receives a voicemail or SMS from the airline saying that his flight is delayed and that he must rebook to reach his destination on time. He calls back and reaches the speech engine, where he finds out flight times and listens to alternative departures. Once he selects the desired flight, he can handle seat selection, payment, and any other tasks on the web.

"By merging speech and web together, customers gain more control, which could reduce the need for live agents, either by avoiding them altogether or by limiting the length of conversations by moving the interactions farther into the automated process," Lyons points out. "This will open up automated functionality to many more applications, such as help desk, stock trades, and healthcare claims processing, without the need for more phone calls."

Choosing and Implementing Speech Rec

To see whether speech recognition is right for your contact center, and to get the most out of the investment, Ian Jacobs recommends that you understand your organization's most common interaction types, what callers are calling about, and whether speech can automate those interactions. If it can, speech rec is worth considering; if not, it isn't.

As part of that examination, look at the industries where speech rec makes a lot of sense and consider whether their benefits apply to your case. The big speech rec adopters include the wireless, financial services, and transportation and travel industries.

When selecting suppliers, which for speech rec can include not just software vendors but also consultants and systems integrators, choose those that understand call flow and where the technology fits in the total customer interaction picture.

At the same time, examine the option of hosted speech, whether from dedicated hosting firms or from teleservices firms that offer speech either standalone or integrated with their other solutions. This alternative lowers up-front costs and provides you with pre-installed applications.

"Hosted speech is a good opportunity to access this technology in a pay-as-you-go model, allowing a company to better align costs with benefits," explains Lyons. "The drawback is that without an integrated solution of live voice, speech, and web, it will be difficult to move the satisfaction index beyond where it currently is."

Hosting is also a good option for DTMF IVR and for high-volume directed dialogue applications, because it minimizes technology investments and enables contact centers to handle spikes in traffic. Examples include new credit card launches, holiday-season order tracking, and providing basic information, including with password support, such as in the event of a disaster. Hosting firms typically have thousands of ports on machines at disaster-protected sites.

The principal challenge in putting speech recognition to work is not so much making the base technology work, which is reasonably reliable, as implementing it with customers in mind so that they will not mind using it.

The key is creating analytics that give customers more control of their interactions. That includes using the speech rec engine to determine customers' intent quickly, with minimal prompts, and mapping out the customer taxonomy so that high-value customers go directly to live agents.

"Unfortunately there is very little analytics out there on a production scale," reports Lyons. "This must be provided by systems integrators or internally by people who understand the analytics, the application, the underlying technology, and the clients' business."

When you design your speech application, avoid trapping customers inside the automated system. While this technique lowers costs by cutting agent opt-outs, it also drives down satisfaction, retention, and, in most cases, revenues.
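The routing guidance above, sending high-value customers straight to live agents while never trapping anyone in automation, can be sketched as a small decision function. The segment labels, the `CallerProfile` type, and the returned destination names are hypothetical; a real customer taxonomy would come from CRM data and the kind of analytics work Lyons describes.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical customer segments that bypass self-service entirely.
HIGH_VALUE_SEGMENTS = {"platinum", "gold"}


@dataclass
class CallerProfile:
    account_id: str
    segment: str  # hypothetical labels such as "platinum", "gold", "standard"


def route_call(profile: Optional[CallerProfile], asked_for_agent: bool = False) -> str:
    """Pick the first destination for a call.

    Any caller who asks for an agent gets one, so no one is trapped in the
    automated system; recognized high-value callers go directly to a live
    agent; everyone else, including unrecognized callers, starts in speech
    self-service.
    """
    if asked_for_agent:
        return "live_agent"
    if profile is not None and profile.segment in HIGH_VALUE_SEGMENTS:
        return "live_agent"
    return "speech_self_service"


if __name__ == "__main__":
    print(route_call(CallerProfile("A-100", "gold")))
    print(route_call(CallerProfile("A-200", "standard")))
    print(route_call(None, asked_for_agent=True))
```

Checking the explicit agent request first keeps the escape hatch available regardless of how the taxonomy classifies the caller.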

One technique to consider is designing menus and applications that reflect knowledge of the caller, the caller's account information, and the caller's most recent transactions. For instance, if your firm makes furniture and a retailer calls every day only to check the status of its orders, you can incorporate "smart menuing": you offer the order status prompt first, and if the caller wants to do something different, you back off to the main menu.

"This is a simple but good example of how to demonstrate to your customers that you value their business and time, and that you're giving them a type of custom-tailored treatment," says West Interactive's Fisher.
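A minimal Python sketch of the smart menuing idea follows. It assumes the caller's recent transactions are available as a list of menu labels; the menu items, the `plan_menu` function, and the 50 percent threshold for offering a shortcut are hypothetical choices, not West Interactive's implementation.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical main menu for the furniture maker in the example.
MAIN_MENU = ["place an order", "order status", "billing", "speak to a representative"]


def plan_menu(recent_transactions: List[str],
              min_share: float = 0.5) -> Tuple[Optional[str], List[str]]:
    """Return (shortcut, fallback_menu).

    The shortcut is the caller's most frequent recent transaction, offered
    first only if it accounts for at least `min_share` of recent calls.
    If the caller declines it, the application backs off to the full menu.
    """
    if not recent_transactions:
        return None, MAIN_MENU
    favorite, count = Counter(recent_transactions).most_common(1)[0]
    if favorite in MAIN_MENU and count / len(recent_transactions) >= min_share:
        return favorite, MAIN_MENU
    return None, MAIN_MENU


if __name__ == "__main__":
    shortcut, menu = plan_menu(["order status"] * 9 + ["billing"])
    print(shortcut)  # order status
```

Returning the unchanged main menu as the fallback, rather than a reordered list, mirrors the "back off to the main menu" behavior described above.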

The following companies participated in the preparation of this article:


Frost & Sullivan






West Interactive


Technology Marketing Corporation

2 Trap Falls Road Suite 106, Shelton, CT 06484 USA
Ph: +1-203-852-6800, 800-243-6002



© 2023 Technology Marketing Corporation. All rights reserved.