April 2009 | Volume 27 / Number 11
CALL CENTER Technology
Enabling The Automated Speech Dialogue
There continues to be a compelling case for deploying
and sourcing advanced speech technologies, namely
automated speech recognition (ASR) self-service, in
contact centers. By eliminating annoying queues, ASR
arguably enhances customer satisfaction and retention
while slicing per-interaction costs to 50 cents,
compared with $5 for a live agent.
Joe Outlaw, principal analyst with Frost &amp; Sullivan, reports that firms continue to buy into speech technology. In a December 2008 contact center end-user survey by the research firm, 20 percent of respondents said all of their new applications would be speech enabled, while 50 percent said some of their new applications would have that feature. The survey also found that larger contact centers, and those in travel, hospitality, communications, and outsourcing, expressed the strongest plans to build all new applications with speech interfaces.
"Upon closer examination, and facing tight budgets, many businesses are finding that the case for speech as an upgrade from traditional DTMF is not as clear as it could be," explains Outlaw. "The ROI in terms of higher call deflection rates is not a guarantee. You can have a well-designed DTMF application that has high acceptance which, when converted to speech, may not deliver increased usage." For all its benefits and feasibility, speech recognition has been deployed in only a minority of contact centers, with adoption estimates ranging from 4 percent to 25 percent.
"Speech recognition should have been deployed much more widely now than it has been because of its cost containment and customer service benefits," says Ravi Narayanan, Vice President, Convergys Product Management. "Yet high costs and slow installation times have crippled it. We must lower the barriers to speech recognition."
Enabling the technology
Convergys has been doing just that. It has been designing and deploying statistical language models for its clients' verticals, based on the words, phrases, and vocabulary commonly used in those domains and drawn from the firm's hundreds of deployed speech applications. This has accelerated speech implementations by avoiding 'reinventing the wheel', and the time and money that entails, for each new application; the teleservices firm utilizes ASR engines from various vendors.
Convergys can refine and render dynamic speech applications with a combination of its Dynamic Decisioning Solution (DDS) platform and the Intervoice Voice Portal (IVP), an interactive multi-modal self-service platform. DDS enables dialogs to adapt to user preferences, business requirements, and other dynamic conditions without re-writing the dialog application. This lets firms adapt their speech recognition applications to changing conditions, and to new products and services, without creating new libraries and dialogs.
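In spirit, a dialog that adapts to business conditions without code changes can be driven by a rules table rather than hard-coded flow. The sketch below is purely illustrative; the names and structure are hypothetical and do not reflect the actual DDS API.

```python
# Illustrative sketch of a data-driven dialog: behavior changes by editing
# the rules table, not the dialog code. All names here are hypothetical.

# Rules map a (caller segment, business condition) to a prompt and grammar.
DIALOG_RULES = {
    ("new_caller", "normal"): {
        "prompt": "Welcome! Say 'sales' or 'support'.",
        "grammar": ["sales", "support"],
    },
    ("returning_caller", "normal"): {
        "prompt": "Welcome back. Sales, support, or account balance?",
        "grammar": ["sales", "support", "balance"],
    },
    ("returning_caller", "outage"): {
        "prompt": "We are aware of a service outage. Say 'outage info' or 'support'.",
        "grammar": ["outage info", "support"],
    },
}

def next_turn(segment, condition):
    """Pick the next dialog turn from the rules table, with a safe default."""
    default = DIALOG_RULES[("new_caller", "normal")]
    return DIALOG_RULES.get((segment, condition), default)
```

Supporting a new product or an outage scenario then means adding a row to the table, not rewriting the application.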
Convergys fine-tunes its speech applications by analyzing the mistakes its recognizers make. For example, if a recognizer seems to 'hear' 'Houston' when the caller says 'Boston', the tuning process updates the recognition parameters to correct the problem, so that on subsequent calls the caller's utterance is interpreted correctly.
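One simple way such tuning can work is to log each (heard, meant) correction pair and raise the grammar weight of words the recognizer keeps missing. This is a minimal sketch of the idea, not Convergys's actual tuning pipeline; the function and weight scheme are assumptions.

```python
from collections import Counter

# Hypothetical sketch: boost the prior weight of words that were frequently
# the caller's intended utterance when the recognizer heard something else.
def tune_weights(weights, corrections, boost=0.1):
    """Return updated word weights given (heard, meant) correction pairs."""
    missed = Counter(meant for _heard, meant in corrections)
    tuned = dict(weights)
    for word, count in missed.items():
        tuned[word] = tuned.get(word, 1.0) + boost * count
    return tuned

weights = {"Houston": 1.0, "Boston": 1.0}
correction_log = [("Houston", "Boston"), ("Houston", "Boston"), ("Austin", "Boston")]
tuned = tune_weights(weights, correction_log)
# "Boston" now carries more weight, so ambiguous audio resolves to it more often.
```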
Convergys has also developed unique technologies that bring its large pool of live agents into the automated dialog process. When the recognition engine is having trouble understanding a caller, the DDS can automatically bring an agent into the call to monitor the customer's responses and correct any mistakes the ASR makes. This avoids frustrating callers by asking them to repeat their responses. Meanwhile, the speech recognition parameters are updated so that problems like the ones the agent solved do not recur.
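The control flow of such an agent-assist loop can be sketched as follows. This is an assumed simplification: the threshold, function names, and logging scheme are illustrative, not the DDS implementation.

```python
# Sketch of an agent-assist fallback: when recognition confidence is low,
# a monitoring agent silently supplies the correct transcript, which is
# logged for later tuning instead of re-prompting the caller.
CONFIDENCE_FLOOR = 0.6   # assumed threshold, not a product setting
corrections_log = []

def interpret(hypothesis, confidence, agent_correction=None):
    """Return the accepted utterance, escalating to an agent when unsure."""
    if confidence >= CONFIDENCE_FLOOR:
        return hypothesis
    if agent_correction is not None:
        # The agent's fix feeds the tuning loop described above.
        corrections_log.append((hypothesis, agent_correction))
        return agent_correction
    return None  # no agent available: the dialog must re-prompt the caller
```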
These processes and enhancements have enabled Convergys to cut deployment times by 30 percent and costs by a similar amount. They have also significantly improved customer satisfaction rates.
Boosting call completion rates strengthens the case for ASR, and arguably the single biggest hurdle to completion is coping with out-of-grammar 'ahhs' and 'umms' and statements of intent such as 'I'd like to...' that applications pick up and then stall on because they do not understand what is being said.
To scale these obstacles, one speech vendor has come out with SmartListener, a tool whose adaptive grammar engine automatically builds a grammar containing the words, phrases, and hesitations people utter before and after the main thrust of the call. This 'teaches' speech applications the language of customers via standard (SRGS) grammars, so that they can separate the key statements from the supporting verbiage.
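In spirit, filler-tolerant recognition can be sketched as stripping known hesitations and carrier phrases before matching against the active grammar. This is an illustrative approximation of the idea, not the SmartListener implementation; the filler list and function are assumptions.

```python
import re

# Assumed filler list: hesitations and carrier phrases that surround the
# key statement ("Um, well, I'd like to check my balance").
FILLERS = ["uh", "um", "ah", "well", "please", "i'd like to", "i want to"]

def extract_key_phrase(utterance, grammar):
    """Strip filler verbiage, then match the residue against the grammar."""
    text = re.sub(r"[^\w\s']", " ", utterance.lower())  # drop punctuation
    for filler in sorted(FILLERS, key=len, reverse=True):
        text = re.sub(r"\b" + re.escape(filler) + r"\b", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text if text in grammar else None

extract_key_phrase("Um, well, I'd like to check my balance", {"check my balance"})
```

Removing the verbiage rather than treating it as an error means fewer retries and confirmations, which is the effect the article attributes to the tool.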
The results from SmartListener are impressive. It dramatically decreases retries and confirmations through an immediate error rate reduction of up to 30 percent out of the box, which leads to fewer costly transfers to live agents.
Loquendo is addressing poor completion rates caused by out-of-grammar utterances in Loquendo ASR 7.7. An improved confidence score algorithm increases the separation between in-grammar and out-of-grammar utterances, resulting in more robust dialogs.
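The point of wider confidence separation is that a dialog can act on the score instead of acting on every hypothesis blindly. The sketch below shows that decision logic with illustrative thresholds; it is not Loquendo's algorithm, and the names and values are assumptions.

```python
# Hedged sketch of confidence-based rejection: a result below the rejection
# threshold is treated as likely out-of-grammar and re-prompted, while a
# middling score triggers a confirmation ("Did you say ...?").
REJECT_BELOW = 0.45   # assumed thresholds for illustration only
CONFIRM_BELOW = 0.75

def decide(confidence):
    """Map a recognizer confidence score to a dialog action."""
    if confidence < REJECT_BELOW:
        return "reprompt"   # probably out-of-grammar
    if confidence < CONFIRM_BELOW:
        return "confirm"    # in-grammar but uncertain
    return "accept"
```

The better the separation between in-grammar and out-of-grammar scores, the less often the dialog lands in the uncertain middle band.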
ASR has long been available through premises licenses bought directly from specialized suppliers or through contact center platforms that resell those applications; its capabilities can also be obtained by outsourcing the functionality to specialized companies. Microsoft offers another approach: enabling its partners to offer ASR via the Speech Server in its Office Communications Server 2007 (OCS). Partner IVR vendor Aumtech reports that one of its customers, JetBlue Airways, realized significant cost savings and shorter integration time with the Microsoft solution than with other offerings. Aumtech tapped its Media Resource Control Protocol (MRCP) Connector, an open-standards-based tool that allows Microsoft's speech engines to be used with any MRCP-compliant IVR system. The speech engines will also support more languages, growing from 12 currently to 26 by 2010.
New applications are also aimed at increasing ASR's ROI and cost effectiveness. Avaya has built into its new Voice Portal 5.0 release a tracing tool that detects where speech system users are calling from; it can be used to fine-tune targeted direct response campaigns or to respond to power outages.
West Interactive is developing caller experience personalization applications that can determine, for example, which products particular callers have, so that offers can be tailored to them. It has also created several multilingual and multicultural speech solutions.
"English speakers globally interact with dialogs differently depending upon their country of origin, culture and cultural norms," explains Aaron Fisher, Director of Speech Services, West Interactive. "Our new solutions are aimed at helping them use speech applications so that they will complete more interactions with them."