TMCnet - The World's Largest Communications and Technology Community

Innovative Management Information
October 2001

Automatic Speech Recognition Fine-Tunes Self-Service


Imagine you are a customer and you have just completed a call with your insurance company. For the first time you can recall, you hang up smiling. You received all the information you requested about your policy quickly, without spending an eternity on hold or enduring those long, annoying touch-tone menus, and you never had to talk to a live agent.

Does this sound like science fiction? Welcome to the new millennium, where many of the classic touch-tone IVR systems that have dominated the enterprise for 30 or more years are getting a major facelift. More to the point, perhaps, they are getting a new set of vocal cords.

Voice recognition technology has come of age and, when integrated into business-critical systems such as IVR and office automation systems, it can provide a new level of service at a surprisingly reasonable cost. From replacing basic hierarchical dual-tone multifrequency (DTMF) menus to powering complex dialogs that employ natural language understanding, automatic speech recognition (ASR) is finding its way into enterprises conducting e-business and into carriers deploying voice-activated dialing and automated directory assistance.

Thanks to language modeling, sophisticated grammars and accuracy tuning tools, speaker-independent ASR engines can attain an accuracy rate of 97 percent or better, rivaling that of a live agent. Combined with natural language understanding, this allows callers to navigate an application without following the strict menu structure of a typical IVR system. For instance, a caller who wants to transfer $100 between his or her bank accounts need not listen to a series of prompts such as "for transfer, press one" and "for checking account, press two."

All that is required is a spoken request such as, "I'd like to transfer $100 from my savings account to my checking account, please." The application responds by prompting the caller to say the account number and, upon validation, completes the transaction.
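To make the contrast with touch-tone menus concrete, here is a deliberately simplified Python sketch that pulls the amount and both accounts out of the spoken request in one step, once the ASR engine has turned the speech into text. Commercial natural language understanding relies on grammars and statistical models rather than a single regular expression; the `TransferRequest` type, the `parse_transfer` function and the one phrasing it accepts are hypothetical illustrations.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class TransferRequest:
    """Hypothetical structured result of understanding one caller utterance."""
    amount: float
    from_account: str
    to_account: str


# One hand-written pattern covering a single phrasing (illustrative only).
_TRANSFER = re.compile(
    r"transfer\s+\$?(?P<amount>\d[\d,]*(?:\.\d{2})?)\s+"
    r"from\s+(?:my\s+)?(?P<src>savings|checking)(?:\s+account)?\s+"
    r"to\s+(?:my\s+)?(?P<dst>savings|checking)(?:\s+account)?",
    re.IGNORECASE,
)


def parse_transfer(utterance: str) -> Optional[TransferRequest]:
    """Extract amount, source and destination from recognized text, or return None."""
    match = _TRANSFER.search(utterance)
    if match is None:
        return None  # not understood: a real system would re-prompt
    src = match.group("src").lower()
    dst = match.group("dst").lower()
    if src == dst:
        return None  # a transfer from an account to itself is not meaningful
    amount = float(match.group("amount").replace(",", ""))
    return TransferRequest(amount, src, dst)


request = parse_transfer(
    "I'd like to transfer $100 from my savings account to my checking account, please."
)
print(request)  # TransferRequest(amount=100.0, from_account='savings', to_account='checking')
```

Where a DTMF menu needs one prompt per choice, the caller here supplies all three items in a single utterance. An unrecognized phrasing returns `None`, at which point a real application would fall back to a directed prompt.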

ASR is only part of this remarkable achievement. Text-to-speech (TTS), a technology that converts ordinary ASCII text into synthesized speech, now produces quality very close to the natural human voice. This human-like automated response lets callers listen and understand with ease rather than struggle through the tedious, monotone sound that has been the hallmark of TTS for 35 or more years. Modern TTS can simulate actual speech while maintaining the appropriate prosody, speed, inflection and other characteristics that are important to human communication.

While businesses experiment with voice technology, the industry has accelerated the work of linguists, dialog designers and speech technology engineers to create a broader selection of vastly improved products. To aid this effort, tools are available for everything from grammar and vocabulary development to call flow and dialog design, enabling the creation of extensive, complex and accurate applications. In addition, speaker verification, a biometric technology, provides a high level of security over the telephone when coupled with traditional passwords or account numbers.

In this global business environment, supporting multiple languages is vital. Currently, speech recognition supports most of the languages spoken in North and South America, Western and Eastern Europe and Asia, while languages for regions such as India, the Middle East and Africa are either available or in development.

VoiceXML: The Emerging Standard
Voice technology is bridging yet another gap: access to the enormous expanse of information on the Web. VoiceXML, an XML-based markup language from the same family as HTML, allows voice applications to be served to speech browsers in the same way that HTML pages are served to traditional Web browsers. Because VoiceXML is so similar to HTML, developers can easily create voice applications that leverage the existing Web infrastructure, letting companies build on their existing investments to provide voice access to information on the Web.
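As a rough illustration of the HTML analogy, the sketch below holds a minimal VoiceXML page in a Python string and checks its structure with the standard library's XML parser. A speech browser would fetch such a page over HTTP, speak the `<prompt>`, listen using the referenced grammar and submit the recognized value back to the Web server, much as a browser submits an HTML form. The element names follow the later VoiceXML 2.0 specification; the grammar file `account_number.grxml`, the `/ivr/balance` URL and the `list_prompts` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

VXML_NS = {"v": "http://www.w3.org/2001/vxml"}

# Minimal VoiceXML 2.0 page; the grammar file and submit URL are hypothetical.
BALANCE_PAGE = """<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="balance">
    <field name="account">
      <prompt>Please say your account number.</prompt>
      <grammar src="account_number.grxml" type="application/srgs+xml"/>
      <filled>
        <!-- Send the recognized value to the Web server, like an HTML form post. -->
        <submit next="/ivr/balance" namelist="account"/>
      </filled>
    </field>
  </form>
</vxml>
"""


def list_prompts(page: str) -> list[str]:
    """Parse a VoiceXML page and return the text of every <prompt> element."""
    root = ET.fromstring(page.encode("utf-8"))
    return [(p.text or "").strip() for p in root.findall(".//v:prompt", VXML_NS)]


print(list_prompts(BALANCE_PAGE))  # ['Please say your account number.']
```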

VoiceXML is a particularly compelling emerging technology, and it represents the first potential standard for voice applications. Originating with the VoiceXML Forum, a consortium of companies that includes Motorola, IBM, Lucent and AT&T, the VoiceXML standard is now the responsibility of the World Wide Web Consortium (W3C), an organization with a strong record of establishing technology standards. The involvement of the W3C, coupled with widespread developer acceptance of the VoiceXML specification, will promote interoperability among diverse voice applications and businesses around the globe.

Connecting The Voice Application
Voice recognition over the telephone has created some challenges for telephony equipment vendors. When a caller's voice is transmitted over the traditional public switched telephone network (PSTN) to an ASR engine, it can arrive garbled and difficult to understand. In addition, satellite repeaters and other telephone equipment can introduce echo, static or noise, all of which reduce the accuracy of the speech recognition engine.

Voice over IP (VoIP) technology has become prevalent in large enterprises, but IP networks can introduce packet loss, latency and jitter that degrade the voice signal. To counter these effects, equipment vendors are designing telephone network interfaces with superior echo cancellation, noise filters, jitter buffers and caching, improving voice quality and, in turn, speech recognition. The accuracy of the ASR, of course, remains the major factor in user adoption of voice applications.
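The jitter buffer mentioned above can be sketched in a few lines of Python. Packets arrive out of order and with varying delay; the buffer holds them for a fixed playout delay, then releases frames in sequence order and marks missing frames so that a decoder could conceal them. This is a simplified, fixed-delay model that assumes 20 ms frames, sequence numbers starting at 0 and a clock shared by sender and receiver; production buffers typically adapt the delay to measured jitter, and the `JitterBuffer` class is hypothetical.

```python
from typing import Dict, List, Optional


class JitterBuffer:
    """Fixed-delay jitter buffer: reorders packets and releases them on a steady clock.

    Assumes frame n was sent at n * frame_ms on a clock shared with the receiver.
    """

    def __init__(self, playout_delay_ms: float, frame_ms: float = 20.0) -> None:
        self.playout_delay_ms = playout_delay_ms
        self.frame_ms = frame_ms
        self._pending: Dict[int, bytes] = {}
        self._next_seq = 0  # next sequence number due for playout

    def push(self, seq: int, payload: bytes) -> None:
        """Store an arriving packet; packets whose slot has already played are dropped."""
        if seq >= self._next_seq:
            self._pending[seq] = payload

    def playout(self, now_ms: float) -> List[Optional[bytes]]:
        """Return all frames due by now_ms in sequence order; None marks a lost frame."""
        frames: List[Optional[bytes]] = []
        while self._next_seq * self.frame_ms + self.playout_delay_ms <= now_ms:
            frames.append(self._pending.pop(self._next_seq, None))
            self._next_seq += 1
        return frames


jb = JitterBuffer(playout_delay_ms=60)
jb.push(1, b"B")  # arrives ahead of packet 0
jb.push(0, b"A")
jb.push(3, b"D")  # packet 2 is delayed in the network
print(jb.playout(100))  # [b'A', b'B', None]  (frame 2 missed its deadline)
jb.push(2, b"C")  # arrives too late and is discarded
print(jb.playout(120))  # [b'D']
```

A longer playout delay rescues more late packets at the cost of added latency, which is the trade-off equipment vendors tune when balancing voice quality against conversational delay.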

The Session Initiation Protocol (SIP) standard is also emerging throughout VoIP networks to handle call control in a distributed network. Easier to use and more flexible than other protocols such as H.323 and Megaco, SIP is becoming the preferred protocol for voice application developers. SIP has won widespread adoption by high-profile organizations such as Microsoft, which has integrated the SIP standard into its latest operating system.
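Part of SIP's reputation for ease of use comes from its being a text-based, HTTP-like protocol, whereas H.323 messages are binary ASN.1 encodings. The Python sketch below assembles a bare-bones INVITE request carrying the headers RFC 3261 requires. The addresses are placeholders from reserved documentation ranges, the `build_invite` helper is hypothetical, and a real INVITE would normally also carry an SDP body describing the media session.

```python
import uuid


def build_invite(caller: str, callee: str, host: str, port: int = 5060) -> str:
    """Assemble a minimal SIP INVITE (no SDP body) with the headers RFC 3261 requires."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # RFC 3261 branch IDs start with this magic cookie
    from_tag = uuid.uuid4().hex[:8]
    call_id = f"{uuid.uuid4().hex}@{host}"
    user = caller.split("@")[0]
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {host}:{port};branch={branch}",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag={from_tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{user}@{host}:{port}>",
        "Content-Length: 0",
    ]
    # SIP, like HTTP, uses CRLF line endings and a blank line before the (empty) body.
    return "\r\n".join(lines) + "\r\n\r\n"


print(build_invite("alice@example.com", "ivr@example.com", "192.0.2.10"))
```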

Adoption Is The Key
Voice recognition over the telephone has reached critical mass, as businesses respond to dramatically increased end-user adoption. For service providers, voice applications provide a competitive differentiation that can drive revenue. For wireless carriers, additional benefits include promoting the safe use of cell phones while driving and increasing the usable "minutes" they sell on their networks. Carriers recognize that speech technology represents more than another enhanced service: speech recognition improves the usability of existing services and enables new revenue-generating applications such as instant conferencing and instant messaging.

Increased employee productivity, higher customer satisfaction, sales automation and more efficient service centers all contribute to the bottom line of any enterprise. A wide range of revenue-generating voice applications is available today, including customer relationship management (CRM) systems, sales automation, e-business systems such as stock trading and voice banking, and more sophisticated interactive voice response (IVR) systems.

IVR vendors are now scrambling to integrate speech recognition into their products, driven largely by competitive pressure, although this differentiation could be short-lived once speech becomes commonplace in IVR systems. Many IVRs have reached the functional limits imposed by DTMF. Fortunately, speech can broaden the services that an IVR can provide.

Voice recognition and voice-enabled applications have hit the mainstream, and we should expect explosive growth in these services now and in the near future.

Steve Parsons is director of product management for the New Network Services division of NMS Communications (formerly Natural MicroSystems). In this position he is responsible for product marketing and management of HearSay, the company's high-density voice portal platform, integrating NMS telephony hardware with best-of-breed speech products.

Technology Partners Offer New Services To Ensure Customer Success


Network-based speech recognition has proven itself in many corporations and industries, bringing real business benefits to contact centers: cost savings, increased revenue, differentiation and the ability to expand upon and complement Web-based services. Through natural, automated dialogs, callers can obtain information ("Here are your directions .... start by driving north on Route 205"), conduct transactions ("Transfer $2,000 from savings to checking") and route their calls ("Shipping information is available") instantly, 24 hours a day, from any phone.

However, if you're like many of today's managers, even though you're intrigued by the potential results, you still want concrete information about what speech can do for you. You probably have questions such as:

  • How would I start?
  • With what application?
  • What results can I expect?
  • What will my payback be?
  • What infrastructure is required?
  • Should I outsource this system?
  • How would I create and launch this service to my customers in a way that will make them want to try it, and come back to use the service again and again?
To answer these questions, start with the pre-sales evaluation process and continue on through system development, deployment and the final service launch.

Choosing The Right Application
In today's economic climate, getting off to the right start is critical. Companies want to realize returns as soon as possible and they dare not jeopardize customer service in the process. Choose a good technology partner that can help you analyze your contact center objectives, existing customer service channels, cost structures and current business challenges to determine the best "fit" for speech in your company, now and in the future. You'll want to look at industry trends, ROI, corporate priorities, the competitive landscape and internal resource demands. An experienced partner will understand the potential of the technology, and can help you analyze your business considerations to determine a first application, and a path of future applications, in a way that makes good business sense for your organization.

Providing ROI Measurement Tools
Once you select a first application, you will likely have a good idea of the benefits you expect. However, it is also extremely important to test those expectations against hard numbers by analyzing your new solution's ROI. By looking closely at ROI, you and your partner will share consistent expectations and goals. You will also both better understand what it will take to succeed, from a carefully scripted user interface and executive-level "buy-in" to adjustments to back-end systems and new customer educational materials. This analysis begins by focusing on results.
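The core of such an ROI analysis is simple arithmetic. The Python sketch below estimates monthly savings from calls the speech service completes without transferring to an agent, then divides the upfront cost by those savings to get a payback period. Every input name and figure here is hypothetical, and the model deliberately ignores the ongoing maintenance, tuning and financing costs that a real analysis with your technology partner would include.

```python
from dataclasses import dataclass


@dataclass
class RoiInputs:
    """Hypothetical inputs for a first-pass speech ROI estimate."""
    monthly_calls: int            # calls routed to the speech application each month
    containment_rate: float       # share completed without "zeroing out" to an agent
    agent_cost_per_call: float    # fully loaded cost of an agent-handled call ($)
    speech_cost_per_call: float   # cost of an automated call ($)
    upfront_cost: float           # platform, development, tuning and launch ($)


def monthly_savings(x: RoiInputs) -> float:
    """Savings from calls the speech service completes instead of an agent."""
    automated_calls = x.monthly_calls * x.containment_rate
    return automated_calls * (x.agent_cost_per_call - x.speech_cost_per_call)


def payback_months(x: RoiInputs) -> float:
    """Months needed for monthly savings to recover the upfront cost."""
    savings = monthly_savings(x)
    return float("inf") if savings <= 0 else x.upfront_cost / savings


example = RoiInputs(
    monthly_calls=100_000,
    containment_rate=0.60,
    agent_cost_per_call=4.00,
    speech_cost_per_call=0.50,
    upfront_cost=630_000,
)
print(monthly_savings(example), payback_months(example))
```

With these illustrative figures, 60,000 automated calls at a $3.50 per-call difference save $210,000 a month and recover the $630,000 investment in three months. Varying the containment rate shows how directly call completion drives payback.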

Every organization has different resource needs. Your technology partner can help you assemble a solution that's right for you. Following are some questions you may wish to ask: Will your current call center and/or interactive voice response (IVR) infrastructure suffice? Do you need a new in-house platform? Do you need an outsourced hosting environment? What about application developers and system integrators? Will TTS (text-to-speech) or speaker verification be required? Do you need to adopt the latest standards? What are the cost constraints?

Get help to navigate through your implementation options and come up with the most effective and efficient plan for your company. Don't forget to take the time you need to get the appropriate people in your organization involved. Your technology partner can help you initiate key dialogs and keep them moving forward.

Consulting On UI Design For Ease Of Use And Branding
The process of implementing a speech application, in particular, is unique, and will be quite different from previous touch-tone or agent-based call center systems you may have managed. With speech recognition, callers can say anything and that creates both an opportunity and a challenge -- a challenge that is directly related to ROI.

When a user interface is designed well, callers will happily and successfully use your automated speech service, and need not "zero out" to speak with high-cost customer service representatives for routine inquiries. Touch-tone applications notoriously frustrate callers not only because the choices are limited (0-9, # and *), but also because little attention is paid to the caller experience. Speech has the power and potential to change all that -- when it's done right.
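To put a number on how confining the touch-tone keypad is, the Python sketch below computes the minimum depth of a DTMF menu tree needed to reach a given number of services, assuming (hypothetically) that keys 1 through 9 carry choices while 0, * and # are reserved for the operator and navigation. A spoken request, by contrast, can name any of those services in a single turn. The `min_menu_depth` function and the nine-choice convention are illustrative assumptions.

```python
def min_menu_depth(services: int, choices_per_menu: int = 9) -> int:
    """Fewest touch-tone menu levels whose option tree can reach every service.

    Assumes keys 1-9 carry choices and 0, * and # are reserved (a hypothetical convention).
    """
    if choices_per_menu < 2:
        raise ValueError("a menu needs at least two choices")
    depth, reachable = 0, 1
    while reachable < services:
        reachable *= choices_per_menu  # each extra level multiplies the reachable leaves
        depth += 1
    return depth


for n in (9, 50, 500):
    print(n, "services ->", min_menu_depth(n), "menu level(s)")
```

Fifty services already need two levels and 500 need three, and at every level the caller may sit through up to nine prompts before hearing the right one.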

During development, a user interface design specialist will work closely with you and your call center agents to understand the typical conversations between caller and agent. The specialist will look closely at any installed IVR or Web-based systems to understand callers' experiences and expectations, and will talk to you about the "personality" and brand of your new speech service to help identify the voice talent and the style of dialog. Some systems inject humor; others are friendly but straightforward.

Launching The Service To Market
You're coming down the "home stretch" of your application deployment and your team may be swirling in technical implementation and logistics details. The success of your new service can be positively influenced by marketing and education efforts surrounding the introduction of your service to the market.

You'll want to analyze and segment your user base, and understand the various behaviors you're hoping to change with the introduction of new technology. Explore answers to questions such as: How can your own call center representatives assist in making the automated service a success? Will special promotions help increase awareness? Is a "caller guide" an appropriate tool? How are you communicating with your customers today (e.g., newsletters, statements) and can you use these vehicles to support the introduction of speech in order to get maximum usage and maximum ROI from your new speech service?

Whether your contact center resides within a financial institution, a manufacturing company, a travel agency or a retail enterprise, speech recognition, TTS and speaker verification can deliver bottom-line results. Beyond the product offerings, take advantage of services that complement the technologies and make sure that they are implemented in ways that make you and your company a success.
Don't let tighter budgets prevent you from exploring new opportunities. Difficult times breed creative solutions. In the case of speech recognition, there's a host of new services available to you. Your job, simply, is to use them.

Lauren Richman is director of corporate marketing at SpeechWorks, headquartered in Boston.
