Advanced Speech Recognition: New Self-Service Options
For Call Centers
BY ROGER REECE, SYNTELLECT, INC.
Advanced speech recognition (ASR) is transforming the way callers interact with
self-service interactive voice response (IVR) systems. According to GartnerGroup, within
four years, 30 percent of all new IVR installations will employ ASR technology. Deployment
of self-service platforms of all kinds is growing faster than ever before, and speech
recognition is opening doors for applications that until now could only be accomplished
through live agents.
IVR has come a long way over the past 20 years and has overcome countless technological
obstacles. One of the most difficult obstacles to overcome, however, was not technological
in nature -- it was convincing top management that customers would not be insulted by
reaching a prerecorded voice rather than a call center agent. Clearly, self-service is
here to stay, not only in the world of call centers, but nearly everywhere else as well:
- Automated teller machines have become the cash withdrawal method of choice. Now you can
find ATMs virtually everywhere people use cash. Soon, smart cash cards and PC cash card
readers (home ATMs) will begin replacing the old paper-based technology, offering even
more self-service options.
- Automated gas pumps are now handling nearly as many retail gasoline purchases as
cashiers. The new, automated gas kiosks feature full video displays and allow gas
purchasers to watch CNN or ESPN while pumping gas and make a wide variety of purchases,
including lottery tickets.
- The Internet and electronic commerce are threatening retail stores in every business
area, from books and CDs to automobiles and airline tickets. With choices like amazon.com
and ebay.com, consumers are finding better value and service through online self-service
options. Soon, consumers with an aversion to PCs will be able to take advantage of
Internet-based browser phones. These phones will offer a combination of voice and screen
prompting with touch-screen and soft key input, making e-commerce accessible to more
consumers.
The old concerns that self-service options eliminate the personal touch have diminished
in light of demands for new, more powerful and flexible self-service choices. IVR systems
using touch-tone voice menus can typically offload 20 to 30 percent of the live agent
calls in a call center. Until recently, it was assumed that the remaining 70 to 80 percent
of the call volume was best handled by agents. Now, advanced speech recognition provides
the option of offloading an additional 20 to 30 percent of a call center's telephone
calls.
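As a rough illustration of that arithmetic, the sketch below assumes a hypothetical call center receiving 10,000 calls a day and reads both percentage ranges as shares of total call volume. Neither assumption comes from the figures above; they are there only to make the numbers concrete.

```python
def offload_range(total_calls, low_pct, high_pct):
    """Return the (low, high) number of calls offloaded for a percentage range."""
    return (total_calls * low_pct // 100, total_calls * high_pct // 100)


DAILY_CALLS = 10_000  # hypothetical volume, for illustration only

# Touch-tone IVR: 20 to 30 percent of calls.
touch_tone = offload_range(DAILY_CALLS, 20, 30)
# Speech recognition: an additional 20 to 30 percent (assumed to be of total volume).
asr_extra = offload_range(DAILY_CALLS, 20, 30)

combined = (touch_tone[0] + asr_extra[0], touch_tone[1] + asr_extra[1])
# Calls still reaching agents: the best case uses the high offload figure.
agent_calls = (DAILY_CALLS - combined[1], DAILY_CALLS - combined[0])

print("touch-tone IVR offloads:", touch_tone)
print("with speech recognition:", combined)
print("calls left for agents:  ", agent_calls)
```

Under these assumptions, the share of calls handled without an agent roughly doubles, from 2,000 to 3,000 a day to 4,000 to 6,000 a day.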
Methods Of ASR User Interaction
Traditional IVR user interaction, in which callers press the keys on a touch-tone phone, is relatively
straightforward in terms of application design. The basic rules include limiting choices
on a given voice menu to four, and always allowing the user to press zero to reach an
agent. Speech recognition application development is more complex and can be divided into
three methods of user interaction:
- Touch-tone replacement,
- Forms fill-in, and
- Natural language.
In essence, IVR speech recognition applications can be compared to the quizzes we have
all taken in school. Touch-tone replacement apps are similar to true/false and multiple
choice quizzes, forms fill-in apps are like fill-in-the-blank quizzes and natural language
apps are like short-answer or essay quizzes.
Touch-tone replacement applications have been around for years, and the technology does
not qualify as advanced speech recognition. They are handy when the
caller has a rotary phone (a collector's item in the U.S.). Touch-tone replacement apps
are generally scripted exactly like touch-tone IVR applications ("press or say
one"). These applications can be compared to multiple-choice and true/false quizzes.
The caller wants something and the IVR makes him or her answer with the correct choice
from a short menu. If you currently use IVR and are looking for ways to offload more calls
from your agents, you need to look to the two more advanced methods of user interaction.
In school, we all liked true/false and multiple-choice questions because it was easier
to guess at the right answer. If there are hundreds of possible choices, multiple choice
is far too cumbersome to use. This method can be made to work by asking the caller to wade
through a decision tree of menu choices, but generally the resulting application is
cumbersome and time-consuming for the caller.
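A "press or say one" menu can be pictured as a single lookup that accepts either a key press or a spoken digit. The following is a minimal sketch, not a real IVR platform's API: the menu entries and function names are hypothetical, and the recognizer is assumed to have already turned the caller's speech into a word.

```python
# Spoken forms a touch-tone replacement app would need to recognize.
SPOKEN_DIGITS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

# Hypothetical menu: a few choices, with zero always reaching an agent.
MENU = {
    "1": "account balance",
    "2": "recent transactions",
    "3": "branch hours",
    "0": "agent",
}


def normalize_choice(caller_input):
    """Map a key press ("1") or a spoken digit ("one") to a menu key, else None."""
    token = caller_input.strip().lower()
    digit = token if token in MENU else SPOKEN_DIGITS.get(token)
    return digit if digit in MENU else None


def route(caller_input):
    """Return the menu destination, or "reprompt" when the input is not on the menu."""
    choice = normalize_choice(caller_input)
    return MENU[choice] if choice is not None else "reprompt"


print(route("1"))        # key press
print(route("One"))      # spoken digit, same destination
print(route("balance"))  # anything off the menu is rejected
```

The vocabulary is only about a dozen words, which is why this kind of recognition is inexpensive, and also why it cannot do more than a touch-tone menu already does.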
Forms fill-in offers the most cost-effective method of user interaction for IVR
applications where true/false and multiple-choice methods are inappropriate. In school,
fill-in-the-blank quizzes require the student to know the answer. They are also easier for
the teacher to grade than short-sentence or essay quizzes because they are more objective
and specific. Think of the ASR-based IVR system as a teacher grading a quiz. The amount of
CPU and speech recognition processing resources required to handle a forms fill-in
application is far less than for a natural language application. The complexity in writing
and perfecting a forms fill-in application is also far less than for natural language.
Forms fill-in applications are the most common types of ASR apps being deployed in call
centers today. They are relatively straightforward and economical to develop and can
provide a means for handling large volumes of calls that currently require agent
interaction. The applications are scripted so that callers are clearly prompted to
"fill in the blanks" with specific information. The more specific the prompting,
the less complex the speech recognition task will be. ASR software can recognize words and
understand grammatical syntax. If the application is written so that the need for grammar
recognition is eliminated or minimized, the result is a true forms fill-in application.
The ability of voice response systems to reliably recognize spoken letters and numbers
provides a tremendous opportunity for self-service applications. A large percentage of
catalog, product I.D., shipment tracking, serial and part numbers contain both letters and
numbers and have not been easy to automate in the past. Certain automated attendant
applications are workable when the number of employees is small. Callers can enter the
first three or four letters of a last name by pressing the associated number keys on the
telephone. Although three letters are printed on each key, this input is enough to
narrow a list of a few hundred employees to three or four. This method is unworkable,
however, when a specific alphanumeric combination, such as a part number, must be entered.
Callers do not appreciate being required to enter three keystrokes per letter, and IVR
apps that require that kind of user interaction generally get bypassed for the "press
zero for a CSR" option.
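The name-lookup technique can be sketched in a few lines. The example below assumes the standard modern keypad lettering (which puts four letters on the 7 and 9 keys) and uses a made-up employee directory; it shows how one key press per letter narrows the list without identifying a single name.

```python
# Standard keypad lettering (assumed here; older phones omitted Q and Z).
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def name_to_keys(name, length=3):
    """Return the key presses for the first `length` letters of a name."""
    digits = [LETTER_TO_DIGIT[ch] for ch in name.upper() if ch in LETTER_TO_DIGIT]
    return "".join(digits)[:length]


def match_directory(keys, directory):
    """Return every name whose leading letters map to the keys entered."""
    return [name for name in directory if name_to_keys(name, len(keys)) == keys]


# Hypothetical directory, for illustration only.
DIRECTORY = ["Reece", "Reed", "Sefton", "Smith", "Pogue", "Jones"]

print(match_directory("733", DIRECTORY))
```

Here the keys 7-3-3 match Reece, Reed and Sefton alike, so the caller must still choose from a short list. That ambiguity is tolerable for names, but not for a part number, where every character has to be exact.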
Alphanumeric speech recognition applications are relatively simple forms fill-in
applications for today's ASR-based IVR platforms. The IVR can prompt, "please speak
the number of the catalog item you would like to order." Then, when the customer
fills in the blank, the IVR performs a database lookup and can speak back the name of the
catalog item. "You have selected the Premier mahogany desk clock. Is this correct?"
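The catalog exchange described here reduces to three steps: turn the recognized letters and digits into an item number, look it up, and read the result back for confirmation. The sketch below assumes the speech recognizer has already produced a list of spoken tokens; the item number and function names are hypothetical, and the wording of the confirmation question is an assumption.

```python
SPOKEN_DIGITS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

# Stand-in for the catalog database; the item number is made up.
CATALOG = {"MC4071": "Premier mahogany desk clock"}


def tokens_to_item_number(tokens):
    """Join spoken letters and digit words into an item number, or return None."""
    chars = []
    for token in tokens:
        word = token.lower()
        if word in SPOKEN_DIGITS:
            chars.append(SPOKEN_DIGITS[word])
        elif len(word) == 1 and word.isalpha():
            chars.append(word.upper())
        else:
            return None  # not a letter or a digit: outside the fill-in vocabulary
    return "".join(chars)


def confirmation_prompt(tokens):
    """Look up the spoken item number and build the prompt the IVR speaks back."""
    name = CATALOG.get(tokens_to_item_number(tokens))
    if name is None:
        return "Sorry, I could not find that item. Please speak the number again."
    return f"You have selected the {name}. Is this correct?"


print(confirmation_prompt(["M", "C", "four", "zero", "seven", "one"]))
```

Reading the item name back is what makes the exchange safe: a misrecognized character usually produces either no match or an obviously wrong product, and the caller can correct it before the order is placed.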
Because ASR systems can recognize large vocabularies, forms fill-in apps that prompt
callers for specific words can allow callers to perform a wide range of self-service
transactions that would be nearly impossible using the true/false or multiple-choice
methods. For example, with an ASR forms fill-in app, travelers making airline reservations
receive prompts to speak the departure and destination cities, dates and times. When the
caller is directed to speak the name of the departure city, any of several hundred names
will be recognized, but the app does not attempt to recognize other information that may
be needed later. Each directed fill-in-the-blanks question is an independent application
segment with limited recognition and processing requirements.
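One way to picture these independent segments is as a list of prompts, each paired with its own small vocabulary. The sketch below is illustrative only: the city list is far shorter than a real one, the recognizer is replaced by a simple membership test, and all names are hypothetical.

```python
CITIES = ["atlanta", "boston", "chicago", "denver", "new york"]
DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]

# Each blank on the form: (slot name, prompt, vocabulary active for that prompt).
FORM = [
    ("departure_city", "Please say the name of your departure city.", CITIES),
    ("destination_city", "Please say the name of your destination city.", CITIES),
    ("travel_day", "Please say the day you would like to travel.", DAYS),
]


def recognize(utterance, vocabulary):
    """Stand-in for an ASR engine: accept only words in the active vocabulary."""
    word = utterance.strip().lower()
    return word if word in vocabulary else None


def fill_form(utterances, max_attempts=2):
    """Fill each slot in turn from simulated caller replies, reprompting on a miss."""
    replies = iter(utterances)
    filled = {}
    for slot, _prompt, vocabulary in FORM:
        filled[slot] = None
        for _ in range(max_attempts):
            reply = next(replies, None)
            if reply is None:
                break  # the caller hung up or stopped answering
            value = recognize(reply, vocabulary)
            if value is not None:
                filled[slot] = value
                break
    return filled


print(fill_form(["Atlanta", "New York", "Saturday"]))
```

Because only one vocabulary is active at a time, a caller who answers the departure-city prompt with "Saturday" is simply reprompted; the app never has to work out which blank an answer belongs to.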
Natural language applications are similar to short-answer or essay quizzes. The questions
asked by the IVR/ASR system are broader than forms fill-in, and recognition requires a
greater degree of subjectivity. Poorly written natural language apps leave callers
frustrated because they hear frequent "I don't understand" responses from the
system. Well-written natural language applications provide agent-like dialog and are very
caller-friendly. They are also complex to develop, difficult to fine-tune and expensive to
deploy.
Using the travel example, a natural language short-answer application might ask the
caller to speak the departure city, date and time as a single answer. In order to process
the speech, the ASR system must understand enough grammar to recognize the key elements
from the response, "I'd like to depart from Atlanta around 4:00 p.m. on June
23rd." Since people structure sentences differently, all possible variations (proper
and poor grammar) must be recognized. The IVR may recognize the city name and date, but
not the time, so the caller might hear a request for clarification, such as, "What
time on June 23rd would you like to depart from Atlanta?"
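The clarification step can be sketched as follows: pull whatever key elements can be found out of the sentence, then ask only for what is missing. This is a deliberately simplified illustration that works on text and uses regular expressions in place of a real grammar; the city list, patterns and function names are all assumptions.

```python
import re

CITIES = {"atlanta": "Atlanta", "new york": "New York", "chicago": "Chicago"}
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

CITY_RE = re.compile(r"\b(" + "|".join(CITIES) + r")\b", re.IGNORECASE)
DATE_RE = re.compile(r"\b(?:" + MONTHS + r") \d{1,2}(?:st|nd|rd|th)?\b", re.IGNORECASE)
TIME_RE = re.compile(r"\b\d{1,2}(?::\d{2})? ?(?:a\.m\.|p\.m\.|am|pm)", re.IGNORECASE)


def extract_slots(utterance):
    """Find the departure city, date and time wherever they appear in the sentence."""
    city = CITY_RE.search(utterance)
    date = DATE_RE.search(utterance)
    time = TIME_RE.search(utterance)
    return {
        "city": CITIES[city.group(1).lower()] if city else None,
        "date": date.group(0) if date else None,
        "time": time.group(0) if time else None,
    }


def next_prompt(slots):
    """Ask for the first missing element, or confirm when everything is known."""
    if slots["city"] is None:
        return "What city would you like to depart from?"
    if slots["date"] is None:
        return f"What date would you like to depart from {slots['city']}?"
    if slots["time"] is None:
        return f"What time on {slots['date']} would you like to depart from {slots['city']}?"
    return (f"You would like to depart from {slots['city']} at {slots['time']} "
            f"on {slots['date']}. Is this correct?")


print(next_prompt(extract_slots("June 23rd from atlanta")))
```

Even this toy version has to anticipate word order, capitalization and several ways of writing a time, which hints at why real natural language apps take so much more development and tuning than forms fill-in.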
Taking natural language further, applications can be set up with very broad questions
such as, "What would you like to do?" This type of app is similar to an essay
quiz, and just as teachers must spend more time grading essay quizzes, these ASR apps
require the largest amount of processing resources and development time. The application
should allow the caller to speak in any order he or she chooses. For example, in the
travel scenario, the caller may say, "I'd like to book a flight to New York from
Atlanta... let's see, I guess to LaGuardia next Saturday. I need to get there by four in
the afternoon and then return the following morning as early as possible." The more
open-ended the natural language questions are, the more varied the answers will be, and
the higher the likelihood that the app will have to reel the caller in with a response
like, "You would like to fly from Atlanta to LaGuardia on Saturday, June 24th. Is
this correct?" (The IVR then waits for a response.) "What time on Saturday would
you like to leave Atlanta?"
Essentially, the natural language application trains the frequent caller to speak in
complete sentences that are understood with minimal prompting. The system responds to new
callers with specific questions that guide them toward the appropriate responses, but the
best apps do this in a natural way that feels much like a conversation with an agent. In
some cases, open-ended essay-type natural language applications are the most effective way
to provide ASR-based self-service, but implementers must be prepared to supply the needed
processing power as well as development, debugging and fine-tuning efforts in order to do
so.
How Should Your Call Center Deploy ASR Technology?
If it is your first ASR application, make it a forms fill-in app. You will be successful
more quickly for a smaller investment with simpler applications, and you can always
progress to natural language in a second phase after you experience your first level of
success. With the simpler approach, the price tag is lower, management buy-in will be
easier and you will have the fastest possible return on investment. Callers want more
self-service options -- and advanced speech recognition technology is proven and ready for
deployment in your call center now.
Roger Reece is Syntellect's corporate vice president of marketing and is
responsible for strategic planning, product planning, new business development and
corporate communications. He has 25 years of experience in voice and data networks,
software development and call center technology. Formerly, Reece was vice president of
marketing for Telecorp Systems for five years prior to the company's 1996 merger with
Syntellect. Prior positions include director of marketing for Melita International,
director of product marketing for Syntrex Corporation and director of product development
for the Lanier Worldwide division of Harris Corporation.