
December 1997


Speech Driven IVR: The Customer Is Almost Always Right

BY DESMOND PIERI, VOICE CONTROL SYSTEMS, INC.

Speech recognition technology can provide tremendous advantages when automating the customer support and service function. While industry professionals know speech-driven IVR can do the job, there are still many skeptics — those technology users at large organizations who are reluctant to take the step forward and embrace change. However, over the past year alone, there have been a number of successful speech recognition deployments, which point to a bright future for this very exciting industry.

The primary means of increasing the level of speech recognition use is through education. This is not to say (potential) users are not smart — on the contrary — but certain misapprehensions lead them to believe speech recognition technology is less capable than it really is. Some of these misconceptions revolve around the issues of capacity, accuracy, novelty, and expectations. This article will address these common customer and implementation myths. In drawing from the lessons learned, vendors can streamline costs and deployment time, while companies gain information that helps them overcome their reluctance to use speech recognition to automate customer support and service applications.

CAPACITY
Many customers believe that speech recognition is still not able to handle the demands of their business. They feel they should wait until solutions are robust enough to handle an application, such as an airline reservation/ticketing system, before investing in the technology. Mistakenly, they might think they require a 10,000-word vocabulary for their system.

While a customer may think a 10,000-word vocabulary is required for an airline reservation/ticketing system, the task can actually be handled naturally and accurately with a series of small, flat menus rather than one very large vocabulary.

In the case of a speech-activated IVR system for airline reservations and ticketing, where a large number of words must be recognized, the project can be broken down into natural segments such as departure city, seating preference, departure time, or seating class. This obviates the need for one large database capable of recognizing 10,000 words. In short, accuracy need not be sacrificed for a complex application. Successful navigation of the system (or throughput) can be maintained, or even increased, through pooling menu selections.

Continuing with this example, departure/arrival cities and times are noted with likely transcriptions. It is relatively straightforward to develop a speech recognition system for these parameters where the user’s universe of possible cities or times is limited. To make the case that an extremely large vocabulary is not required, look at the number of possible transcriptions for each field. In the case of an airline reservation/ticketing system, the largest self-contained application must recognize approximately 250 cities (for domestic flights), most of which are close variations on the city name or airport. With different recognizers in use for each menu selection (field), it can be seen that 10,000 words are not required for a successful full-featured system.
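The segmentation argument can be illustrated with a small sketch. The field names follow the airline example; the vocabulary entries and the helper names are hypothetical placeholders, not data or code from a real deployment. The sketch only shows that the recognizer active at any one prompt needs the vocabulary of a single field, not a single combined vocabulary.

```python
# Hypothetical per-field vocabularies for an airline reservation dialog.
# The entries are illustrative placeholders, not a real city or time list.
FIELD_VOCABULARIES = {
    "departure_city": [f"city_{i}" for i in range(250)],  # about 250 domestic cities
    "departure_time": ["morning", "afternoon", "evening", "night"],
    "seating_preference": ["window", "aisle", "middle"],
    "seating_class": ["first", "business", "coach"],
}


def active_vocabulary_size(field):
    """Number of entries the recognizer must distinguish at one prompt."""
    return len(FIELD_VOCABULARIES[field])


def largest_active_vocabulary():
    """Worst-case vocabulary size across all prompts in the dialog."""
    return max(len(words) for words in FIELD_VOCABULARIES.values())


def combined_vocabulary_size():
    """Size of a single flat vocabulary covering every field at once."""
    return sum(len(words) for words in FIELD_VOCABULARIES.values())
```

Under these assumed sizes, the largest vocabulary active at any prompt is the city list, and every other prompt distinguishes only a handful of entries.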

ACCURACY
A general lack of knowledge about IVR and speech recognition leads some potential purchasers to believe that these systems possess a low level of accuracy. This issue stems from a basic misunderstanding of throughput, which measures the level of accuracy of a system. Sometimes a standard testing database may not exist, or purchasers will set up a testing system that merely echoes the results of the recognition after a user provides input. In this scenario, users may read from a script or “parrot” back words dictated by prompts. The only effective means of predicting performance over time is through an environment that mirrors the intended market conditions. At a minimum, purchasers should test a prototype of the system under realistic conditions — but this is rarely done.

Another source of confusion revolves around the recognition rate. Management may decide to randomly sample a system by listening in on live calls. The problem is that managers may then claim a system’s accuracy rate is 75 percent based on a sample of four calls. A sample this small is not statistically meaningful and is not a valid representation of the total population of calls.
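To see why four calls say so little, one can compute a confidence interval for the observed rate. The sketch below uses the Wilson score interval, a standard statistical method chosen here for illustration; the article does not prescribe any particular test, and the function name is hypothetical.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    )
    return center - half_width, center + half_width


# Three correct recognitions out of four monitored calls.
low_small, high_small = wilson_interval(3, 4)

# The same 75 percent rate observed over 400 calls.
low_large, high_large = wilson_interval(300, 400)
```

With three successes in four calls, the interval runs from roughly 30 percent to roughly 95 percent, so the sample is consistent with both a poor system and an excellent one. The same rate observed over 400 calls narrows the interval to a few points on either side of 75 percent.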

When testing the system, it is important for people to speak clearly and enunciate. Customers also create confusion in that they “ping-pong” between wanting to reject any extraneous speech and wanting to recognize all utterances. They want the system to work for confused or uncooperative users and are thus unwilling to accept the lower levels of throughput for the expert user. In fact, the best design rewards cooperation and success, and penalizes uncooperative behavior.

NOVELTY
“IVR is nothing new; speech-driven IVR is just another technology wrapped in new packaging, so I’ll implement it that way.” This complaint is frequently heard from customers who are not really familiar with speech recognition. They implement the technology with reluctance, and therefore introduce a system that is not a proper implementation of speech recognition. The result ends up being nothing more than a DTMF menu with a great deal of window dressing. This stems from the cautious project manager who wants the technology to prove itself before becoming reliant upon it. By taking this route, the application becomes cumbersome. Successfully implemented speech recognition applications are enabling technologies and do not simply showcase what can be done with existing resources.

EXPECTATIONS
The issue of managing a customer’s expectations is clearly visible in the misunderstanding that technology of this type might allow them to downsize their call centers by up to 80 percent. Customers and end users alike will adopt technological innovations at different paces. Because a certain segment can only be weaned off traditional channels over time, the return on investment will also grow over time.

The 80/20 rule also applies to speech recognition implementations, especially where there is a finite set of options that the universe of users generally agrees upon. An example would be an IVR system for a bank. Most of the time, customers will request checking or savings transactions. A small portion of customers may wish to purchase CDs or update personal information. The solution should not be expected to solve organizational or customer woes, for it can only create efficiencies for existing processes. However, by looking at all the possible uses of speech-activated IVR within a customer support environment, an organization can recover the cost of the system quickly.
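The bank example can be made concrete with a short sketch that picks the few options accounting for most requests. The request counts below are invented for illustration, and the function name and the 0.8 threshold are assumptions rather than anything specified in the article.

```python
def options_covering(request_counts, target_share=0.8):
    """Return the most-requested options that together reach target_share."""
    total = sum(request_counts.values())
    if total == 0:
        return []
    chosen, covered = [], 0
    ranked = sorted(request_counts.items(), key=lambda kv: kv[1], reverse=True)
    for option, count in ranked:
        if covered / total >= target_share:
            break  # the options chosen so far already cover the target share
        chosen.append(option)
        covered += count
    return chosen


# Hypothetical monthly request counts for a bank IVR.
SAMPLE_COUNTS = {
    "checking": 500,
    "savings": 350,
    "purchase_cd": 100,
    "update_personal_info": 50,
}

top_options = options_covering(SAMPLE_COUNTS)
```

With these assumed counts, checking and savings alone cover 85 percent of requests, so they are natural candidates for the first-level prompt, with the remaining options offered further down or routed to an agent.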

THE ROLE OF SPEECH INTERFACE DESIGN
Additional concerns developers and deployment teams should keep in mind include: usability, cost, choice of platform, and scalability.

Usability
In computer development labs across the world, usability is an important concept. No matter how well-coded software is, it will not satisfy users’ demands if the user cannot successfully navigate an application to complete tasks. A low throughput rate is evidenced by users seeking alternative ways to process the task or declining to repeat the task at a future date. The first step is to understand the role of human interaction through the telephone.

Cost
Directed natural language exploits a middle ground by allowing users a limited number of options, or constraints on speech, similar to those identified in the airline example. The system can learn over time, moving the user away from directed queries to a more unconstrained speech pattern. The benefits include significant cost savings and accuracy over pure natural language. With a minimum of resources required for development, a robust system can be developed to provide tremendous flexibility to IVR customers.
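The idea of constraining what a caller may say at each prompt can be sketched as follows. This is a simplified illustration that works on already-transcribed text as a stand-in for a speech recognizer; the function name, the option table, and the accepted phrases are all hypothetical, and a production system would apply the constraint inside the recognizer itself.

```python
import re


def match_directed_response(utterance_text, allowed_options):
    """Return the option whose phrase appears in the utterance, or None.

    allowed_options maps an option name to the phrases accepted for it.
    None is returned when no option, or more than one option, matches,
    so the caller can re-prompt.
    """
    words = " ".join(re.findall(r"[a-z']+", utterance_text.lower()))
    padded = f" {words} "  # pad so phrases match only on word boundaries
    matches = {
        option
        for option, phrases in allowed_options.items()
        for phrase in phrases
        if f" {phrase.lower()} " in padded
    }
    return matches.pop() if len(matches) == 1 else None


# Hypothetical options for a seating-class prompt.
SEATING_CLASS = {
    "first": ["first class", "first"],
    "business": ["business class", "business"],
    "coach": ["coach", "economy"],
}
```

A caller who answers "I'd like business class, please" is matched to the business option even though the reply contains extra words, while an ambiguous or unrelated reply yields no match and triggers a re-prompt.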

Traditionally, speech recognition developers thought of the interface as similar to that of a PC. In fact, speech recognition is based on human-human interaction, not human-computer interaction. By recognizing this, programmers can think conceptually of what it would be like to interact with a computer assuming the role of a live operator. The best design will lead customers to their choice with a minimum of menus.

Platform Issues
Developers need to be aware of hardware and software issues as they relate to timing, which affects the duration and placement of when the recognizer will be listening. Platforms support varying requirements and can alter the throughput rate if not accurately calibrated. Interoperability among platforms is also a concern.

The Enterprise Computer Telephony Forum (ECTF) has developed two standards: S.100, a software standard which defines a set of computer telephony APIs that provide an effective means to develop CT applications in an open environment, and H.100, a hardware standard which provides information to implement a CT bus interface for computers. These standards go a long way in developing open systems that will allow all developers to ensure compatibility in the market.

Scalability
Scalability becomes a concern when working in a host-based environment. Because the speech-activated IVR system relies on the host CPU, its scalability is directly proportional to the number of DSPs that can be added to the system. Under a client/server architecture, the IVR system can reside on a dedicated server, allowing tremendous scalability.

CONCLUSION
Many vertical industries are realizing the benefits of developing and deploying speech-activated IVR systems. With falling costs and great advances in technology, directed natural language and other IVR-based applications have tremendous potential for customer service and support centers in various industries. Compared with resource-intensive pure natural language or the limited functionality of DTMF, directed natural language offers the benefits of speech recognition technology without the burdensome costs and development obstacles.

Desmond Pieri is vice president of marketing at Voice Control Systems, Inc. (VCS), a leading global supplier of speech recognition and related technologies which enable computers and electronic devices to understand human speech. Serving the telecommunications, personal computing, and consumer electronics markets, VCS products are used worldwide, with commercial installations in the Americas, Europe, Asia, and Australia. The author may be contacted by sending e-mail to [email protected]. For more information, visit the company’s Web site at www.voicecontrol.com.


Common Speech Recognition Terms

Throughput: The successful navigation (and completion) of an IVR transaction, leading to resolution of the customer query.

Density: A measurement of efficiency in speech recognition. It is the ratio of computer resource efficiency to the number of ports.

Transcription: The number of variations of a specific utterance (e.g., December 25, 25 December, or Christmas).

DTMF: Dual tone multi-frequency. Activation of menu selections through push-button (touchtone) keypad dialing.






