Speech-Driven IVR: The Customer Is Almost Always Right
By Desmond Pieri, Voice Control Systems, Inc.
Speech recognition technology can provide tremendous advantages when automating the
customer support and service function. While industry professionals know that speech-driven IVR
can do the job, there are still many skeptics: technology users at large
organizations who are reluctant to take the step forward and embrace change. However, over
the past year alone, there have been a number of successful speech recognition
deployments, which point to a bright future for this very exciting industry.
The primary means of increasing the use of speech recognition is education. This is
not to say that potential users are not smart; quite the contrary. Rather, certain
misapprehensions lead them to believe speech recognition technology is
less capable than it really is. Some of these misconceptions revolve around the issues of
capacity, accuracy, novelty, and expectations. This article will address these common
customer and implementation myths. By drawing on the lessons learned, vendors can
streamline costs and deployment time, while companies gain information that helps them
overcome their reluctance to use speech recognition to automate customer support and
service applications.
CAPACITY
Many customers believe that speech recognition is still not able to handle the demands of
their business. They feel they should wait until solutions are robust enough to handle an
application, such as an airline reservation/ticketing system, before investing in the
technology. Mistakenly, they might think they require a 10,000-word vocabulary for their
system.
While a customer may think a 10,000-word vocabulary is required to fulfill the needs of
an airline reservation/ticketing system, the task can actually be handled naturally and
accurately with a series of flat menus rather than one very large one.
In the case of a speech-activated IVR system for airline reservations and ticketing,
where a large number of words must be recognized, the project can be broken down
into natural segments such as departure city, departure time, seating class, or
seating preference. This obviates the need for one large recognition vocabulary capable
of recognizing 10,000 words. In short, accuracy need not be sacrificed for a complex
application. Successful navigation of the system (or throughput) can be maintained, or
even increased, by pooling the results of these smaller menu selections.
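To make the segmentation concrete, here is a minimal Python sketch of a directed dialog in which each step activates only its own small vocabulary. The field names and word lists are hypothetical, and `recognize` is a text-based stand-in: a real engine scores audio against a grammar, whereas this sketch fuzzy-matches a transcription using the standard library's `difflib.get_close_matches`.

```python
from __future__ import annotations

import difflib

# Hypothetical per-field vocabularies. Each dialog step activates only its own
# short list instead of one combined, very large vocabulary.
FIELD_VOCABULARIES: dict[str, list[str]] = {
    "departure_city": ["boston", "chicago", "dallas", "denver", "seattle"],
    "departure_time": ["morning", "afternoon", "evening"],
    "seating_class": ["first", "business", "coach"],
    "seating_preference": ["aisle", "window"],
}


def recognize(transcription: str, vocabulary: list[str]) -> str | None:
    """Stand-in recognizer constrained to a single field's vocabulary.

    Fuzzy-matches a text transcription against the active word list only and
    returns None (a rejection) when nothing is close enough.
    """
    matches = difflib.get_close_matches(
        transcription.strip().lower(), vocabulary, n=1, cutoff=0.6
    )
    return matches[0] if matches else None


def run_dialog(answers: dict[str, str]) -> dict[str, str | None]:
    """Walk the fields in order, recognizing each answer against its own menu."""
    return {
        field: recognize(answers.get(field, ""), vocabulary)
        for field, vocabulary in FIELD_VOCABULARIES.items()
    }


if __name__ == "__main__":
    booking = run_dialog({
        "departure_city": "Denvr",  # slightly garbled input still resolves
        "departure_time": "morning",
        "seating_class": "coach",
        "seating_preference": "window",
    })
    print(booking)
```

Because each recognizer only has to choose among a handful of entries, a slightly garbled "Denvr" still resolves to "denver" without any competition from words belonging to other fields.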
Continuing with this example, each departure/arrival city and time is listed with its likely
transcriptions. It is relatively straightforward to develop a speech recognition system
for these parameters, where the user's universe of possible cities or times is
limited. To see that an extremely large vocabulary is not required, look at the
number of possible transcriptions for each field. In the case of an airline
reservation/ticketing system, the largest self-contained component must recognize
approximately 250 cities (for domestic flights), and most of the transcriptions are close
variations on a city or airport name. Because a different recognizer is used for each menu
selection (field), a successful full-featured system does not require 10,000 words.
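The vocabulary arithmetic can be sketched the same way. In this hypothetical Python example, several spoken forms of a city or airport name map to one canonical city, and the per-field sizes are illustrative: only the figure of roughly 250 domestic cities comes from the text above, while the other counts are assumptions.

```python
from __future__ import annotations

# Hypothetical city grammar: several spoken forms (city or airport names)
# resolve to one canonical city value.
CITY_ALIASES: dict[str, list[str]] = {
    "New York": ["new york", "kennedy", "laguardia"],
    "Chicago": ["chicago", "o'hare", "midway"],
    "Houston": ["houston", "hobby"],
}

# Illustrative number of entries per field. Only the ~250 domestic cities
# figure comes from the article; the other counts are assumed.
FIELD_SIZES: dict[str, int] = {
    "departure_city": 250,
    "arrival_city": 250,
    "departure_time": 24,
    "seating_class": 3,
    "seating_preference": 2,
}


def build_city_grammar(aliases: dict[str, list[str]]) -> dict[str, str]:
    """Flatten the alias table into a phrase -> canonical city lookup."""
    return {phrase: city for city, phrases in aliases.items() for phrase in phrases}


def resolve_city(transcription: str, grammar: dict[str, str]) -> str | None:
    """Map a recognized phrase to its canonical city, or None if unknown."""
    return grammar.get(transcription.strip().lower())


def largest_active_vocabulary(field_sizes: dict[str, int]) -> tuple[str, int]:
    """Each field has its own recognizer, so the largest single field sets the
    size of the vocabulary that is active at any one time."""
    field = max(field_sizes, key=field_sizes.__getitem__)
    return field, field_sizes[field]


if __name__ == "__main__":
    grammar = build_city_grammar(CITY_ALIASES)
    print(resolve_city("O'Hare", grammar))         # Chicago
    print(largest_active_vocabulary(FIELD_SIZES))  # ('departure_city', 250)
    print(sum(FIELD_SIZES.values()))               # 529 entries across all fields
```

Under these assumptions, even adding every field together yields only a few hundred entries, and no single recognizer ever has to listen for more than the largest field.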
ACCURACY
A general lack of knowledge about IVR and speech recognition leads some potential
purchasers to believe that these systems possess a low level of accuracy. This issue stems
from a basic misunderstanding of throughput, which measures the level of accuracy of a
system. Sometimes a standard testing database does not exist, or purchasers set up a
testing system that merely echoes the recognition result after a user provides
input. In this scenario, users may read from a script or parrot back words
dictated by prompts, which reveals little about how real callers will behave. The only
effective means of predicting performance over time is through an environment that mirrors
the intended market conditions. At a minimum, purchasers should test a prototype of the
system under realistic conditions; however, this is rarely done.
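The difference between per-utterance recognition accuracy and throughput (successful navigation of the system) can be shown with a small Python sketch over hypothetical call records from a realistic prototype test. The `CallRecord` fields are assumptions about what such a log might capture, not a standard format, and four records are used only to keep the arithmetic readable.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CallRecord:
    """Hypothetical log entry for one call in a prototype or field test."""
    attempts: int         # recognition attempts made during the call
    correct: int          # attempts that were recognized correctly
    task_completed: bool  # caller finished the transaction in the system


def utterance_accuracy(calls: list[CallRecord]) -> float:
    """Share of individual recognition attempts that were correct."""
    total = sum(c.attempts for c in calls)
    return sum(c.correct for c in calls) / total if total else 0.0


def throughput(calls: list[CallRecord]) -> float:
    """Share of calls in which the caller successfully completed the task."""
    return sum(c.task_completed for c in calls) / len(calls) if calls else 0.0


if __name__ == "__main__":
    calls = [
        CallRecord(attempts=4, correct=4, task_completed=True),
        CallRecord(attempts=5, correct=3, task_completed=True),  # recovered after reprompts
        CallRecord(attempts=3, correct=1, task_completed=False),
        CallRecord(attempts=4, correct=3, task_completed=True),
    ]
    print(f"utterance accuracy: {utterance_accuracy(calls):.4f}")  # 0.6875
    print(f"throughput:         {throughput(calls):.2f}")          # 0.75
```

A call with a misrecognition can still finish after a reprompt, so the two figures answer different questions; an echo test that reports only what was recognized measures neither under market conditions.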
Another source of confusion revolves around the recognition rate. Management may decide
to sample a system at random by listening in on live calls. The problem is that managers
may then claim a system's accuracy rate is 75 percent based on a sample of
four calls. A sample this small is not statistically significant and is not a valid
representation of the total population of calls.
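To quantify how little four calls reveal, the following Python sketch computes a Wilson score interval, one standard way to put an approximate 95 percent confidence interval around an observed proportion. The choice of the Wilson method is ours, not the article's.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for a success rate (Wilson score method).

    z = 1.96 corresponds to roughly 95 percent confidence.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    for correct, total in [(3, 4), (300, 400)]:
        low, high = wilson_interval(correct, total)
        print(f"{correct}/{total}: {low:.2f} to {high:.2f}")
```

Three correct calls out of four is consistent with a true rate anywhere from about 30 to 95 percent, while 300 out of 400 narrows the plausible range to roughly 71 to 79 percent.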
When testing the system, it is important for people to speak clearly and enunciate.
Customers also create confusion by ping-ponging between wanting the system to
reject all extraneous speech and wanting it to recognize every utterance. They want the
system to work for confused or uncooperative callers, yet they are unwilling to accept the
lower throughput for expert users that such accommodation entails. In fact, the best design
rewards cooperation and success, and penalizes uncooperative behavior.
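The tension between rejecting extraneous speech and recognizing every utterance is commonly managed with a confidence threshold. The Python sketch below uses hypothetical confidence scores to show how moving that threshold trades false rejections of valid speech against false acceptances of out-of-grammar speech; it simplifies by treating every accepted in-grammar utterance as correctly recognized.

```python
from __future__ import annotations

# Hypothetical test utterances: (recognizer confidence, caller actually said
# something in the active grammar).
SAMPLES: list[tuple[float, bool]] = [
    (0.92, True), (0.85, True), (0.74, True), (0.61, True),
    (0.55, False), (0.48, True), (0.40, False), (0.22, False),
]


def error_rates(samples: list[tuple[float, bool]], threshold: float) -> tuple[float, float]:
    """Return (false-rejection rate, false-acceptance rate) at a threshold.

    Utterances scoring below the threshold are rejected (the caller is
    reprompted); those at or above it are accepted.
    """
    in_grammar = [score for score, valid in samples if valid]
    out_of_grammar = [score for score, valid in samples if not valid]
    false_reject = sum(score < threshold for score in in_grammar) / len(in_grammar)
    false_accept = sum(score >= threshold for score in out_of_grammar) / len(out_of_grammar)
    return false_reject, false_accept


if __name__ == "__main__":
    for threshold in (0.3, 0.5, 0.7):
        fr, fa = error_rates(SAMPLES, threshold)
        print(f"threshold {threshold}: false reject {fr:.2f}, false accept {fa:.2f}")
```

With these sample scores, a threshold of 0.3 rejects no valid speech but accepts two of the three extraneous utterances, while 0.7 rejects all extraneous speech at the cost of turning away two of five valid answers. No single setting drives both numbers to zero, which is why the design has to decide which callers to favor.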
NOVELTY
"IVR is nothing new; speech-driven IVR is just another technology wrapped in new
packaging, so I'll implement it that way." This complaint is frequently heard
from customers who are not really familiar with speech recognition. They implement the
technology reluctantly and therefore introduce a system that is not a
proper implementation of speech recognition. The result is nothing more than a
DTMF menu with a great deal of window dressing. This stems from the cautious project
manager who wants the technology to prove itself before relying on it. Taking this
route makes the application cumbersome. Successfully implemented speech
recognition applications are enabling technologies; they do not simply showcase what can be
done with existing resources.
EXPECTATIONS
The issue of managing a customer's expectations is clearly visible in the
misunderstanding that technology of this type might allow them to downsize their call
centers by up to 80 percent. Customers and end users alike adopt technological
innovations at different paces. Because some callers can only be weaned off
traditional channels gradually, the return on investment grows over time.
The 80/20 rule also applies to speech recognition implementations, especially where there
is a finite set of options that the universe of users generally agrees upon.
An example of this would be an IVR system for a bank. The majority of the time, customers
will request checking or savings transactions. A small portion of customers may wish to
purchase CDs or update personal information. The solution should not be used to solve
organizational or customer woes, for it can only create efficiencies for existing
processes. However, an organization that looks at all the possible uses of speech-activated
IVR within its customer support environment will find that the system repays its cost quickly.
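The 80/20 pattern in the bank example suggests ordering the top-level menu by call volume. This Python sketch uses invented monthly call counts to find the smallest set of options that covers a target share of calls; the option names and figures are hypothetical.

```python
from __future__ import annotations

# Hypothetical monthly call counts per banking option (10,000 calls in total).
CALL_COUNTS: dict[str, int] = {
    "checking": 5000,
    "savings": 3000,
    "transfers": 600,
    "recent transactions": 500,
    "certificates of deposit": 250,
    "update personal information": 200,
    "order checks": 150,
    "loan payments": 150,
    "stop payment": 100,
    "branch hours": 50,
}


def options_covering(counts: dict[str, int], percent: int) -> list[str]:
    """Smallest set of most-requested options covering `percent` of all calls.

    Uses integer arithmetic (covered * 100 vs percent * total) to avoid
    floating-point rounding at the boundary.
    """
    total = sum(counts.values())
    covered = 0
    chosen: list[str] = []
    for option, count in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        if covered * 100 >= percent * total:
            break
        chosen.append(option)
        covered += count
    return chosen


if __name__ == "__main__":
    top = options_covering(CALL_COUNTS, 80)
    print(top)  # ['checking', 'savings']
    print(f"{len(top)} of {len(CALL_COUNTS)} options handle 80% of calls")
```

With these assumed counts, two of ten options carry 80 percent of the traffic, so placing them first lets most callers finish quickly while less common requests such as CDs remain reachable further down the menu.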
THE ROLE OF SPEECH INTERFACE DESIGN
Additional concerns developers and deployment teams should keep in mind include
usability, cost, choice of platform, and scalability.
Usability
In computer development labs across the world, usability is an important concept.
No matter how well coded software is, it will not satisfy users' demands if the user
cannot successfully navigate the application to complete tasks. A low throughput rate is
evidenced by users who seek alternative ways to complete the task or who do not repeat the
task at a later date. The first step is to understand the role of human interaction over the
telephone.
Cost
Directed natural language occupies a middle ground by allowing users a limited
number of options, or constraints on speech, similar to those identified in the airline
example. The system can learn over time, moving the user away from directed queries toward a
more unconstrained speech pattern. Compared with pure natural language, the benefits include
significant cost savings and higher accuracy. A robust system can be developed with minimal
resources, giving IVR customers tremendous flexibility.
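A minimal Python sketch of directed natural language, assuming a hypothetical banking grammar: the caller may answer with one of a few natural phrasings per option, recognition is reduced to simple keyword spotting over a transcription, and a caller with a history of successful turns is offered an open prompt instead of the directed one. The phrases, prompts, and the `open_after` threshold are illustrative assumptions, not a vendor API.

```python
from __future__ import annotations

# Hypothetical constrained grammar: a few accepted phrasings per option.
INTENT_PHRASES: dict[str, list[str]] = {
    "checking": ["checking", "checking account"],
    "savings": ["savings", "savings account"],
    "transfer": ["transfer", "move money"],
}

DIRECTED_PROMPT = "Please say checking, savings, or transfer."
OPEN_PROMPT = "How may I help you?"


def choose_prompt(successful_turns: int, open_after: int = 3) -> str:
    """Start callers on the directed prompt; switch to an open prompt once a
    caller has completed enough turns successfully (assumed policy)."""
    return OPEN_PROMPT if successful_turns >= open_after else DIRECTED_PROMPT


def spot_intent(transcription: str) -> str | None:
    """Return the single intent whose phrase appears in the transcription.

    Returns None when no phrase matches or more than one intent matches,
    so the dialog can fall back to the directed prompt.
    """
    text = transcription.lower()
    hits = {
        intent
        for intent, phrases in INTENT_PHRASES.items()
        if any(phrase in text for phrase in phrases)
    }
    return hits.pop() if len(hits) == 1 else None


if __name__ == "__main__":
    print(choose_prompt(successful_turns=0))              # directed prompt
    print(spot_intent("Um, my savings account please"))  # savings
    print(spot_intent("I'd like to order a pizza"))      # None
```

A request that mentions several options, such as "move money from checking to savings", is treated as ambiguous here; a fuller design might resolve it with a short follow-up question rather than a full reprompt.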
Traditionally, speech recognition developers thought of the interface as similar to
that of a PC. In fact, speech recognition is based on human-human interaction, not
human-computer interaction. With this distinction in mind, programmers can design the
dialog by imagining the computer in the role of a live operator. The best design will lead
customers to their choice with a minimum of menus.
Platform Issues
Developers need to be aware of hardware and software issues as they relate to
timing, which determines when the recognizer starts listening and for how long.
Platforms have varying timing requirements, and these can alter the throughput rate if not
accurately calibrated. Interoperability among platforms is also a concern.
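Timing calibration can be pictured as a listening window placed relative to the end of the prompt. The Python sketch below classifies a caller's response by when speech starts and stops; the parameter names and default values are hypothetical and do not correspond to any particular platform.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ListenWindow:
    """Hypothetical timing settings, in seconds after the prompt ends."""
    barge_in: bool = True          # listen while the prompt is still playing
    no_input_timeout: float = 3.0  # give up if speech has not started by then
    max_speech: float = 8.0        # longest utterance the recognizer accepts


def classify(speech_start: float | None, speech_end: float | None,
             window: ListenWindow) -> str:
    """Classify one response; negative times mean speech began during the prompt."""
    if speech_start is None or speech_end is None:
        return "no_input"
    if speech_start < 0 and not window.barge_in:
        return "clipped"   # caller spoke before the recognizer was listening
    if speech_start > window.no_input_timeout:
        return "no_input"  # window closed before the caller began
    if speech_end - speech_start > window.max_speech:
        return "too_long"
    return "complete"


if __name__ == "__main__":
    slow_caller = (2.2, 3.4)  # starts speaking 2.2 s after the prompt ends
    print(classify(*slow_caller, ListenWindow(no_input_timeout=1.5)))  # no_input
    print(classify(*slow_caller, ListenWindow(no_input_timeout=3.0)))  # complete
    print(classify(-0.5, 1.0, ListenWindow(barge_in=False)))           # clipped
```

The same caller behavior yields a failure or a success depending only on how the window is calibrated, which is one way a mistimed platform lowers throughput.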
The Enterprise Computer Telephony Forum (ECTF) has developed two standards: S.100, a
software standard which defines a set of computer telephony APIs that provide an effective
means to develop CT applications in an open environment, and H.100, a hardware standard
which provides information to implement a CT bus interface for computers. These standards
go a long way toward enabling open systems in which all developers can ensure
compatibility in the market.
Scalability
Scalability is another issue that becomes a concern when working in a host-based
environment. Because a host-based speech-activated IVR system relies on the host CPU, its
scalability is directly proportional to the number of DSPs that can be added to the system.
Under a client/server architecture, the IVR system can reside on a dedicated server, allowing
tremendous scalability.
CONCLUSION
Many vertical industries are realizing the benefits of developing and deploying
speech-activated IVR systems. With falling costs and great advances in technology,
directed natural language and other IVR-based applications have tremendous potential for
customer service and support centers in various industries. Compared with
resource-intensive pure natural language or the limited functionality of DTMF, directed
natural language offers the benefits of speech recognition technology without the burdensome
costs and development obstacles.
Desmond Pieri is vice president of marketing at Voice Control Systems, Inc. (VCS), a
leading global supplier of speech recognition and related technologies which enable
computers and electronic devices to understand human speech. Serving the
telecommunications, personal computing, and consumer electronics markets, VCS products are
used worldwide, with commercial installations in the Americas, Europe, Asia, and
Australia. The author may be contacted by sending e-mail to [email protected]. For more information, visit the
company's Web site at www.voicecontrol.com.