Natural Language Speech Recognition:
Expanding The Dimensions Of IVR
BY DR. VAL MATULA, AVAYA
The emergence of natural language speech recognition (NLSR) is a major
step toward fulfilling visions we saw portrayed in popular science fiction
more than 30 years ago. Astronaut Bowman pleaded with HAL to open the pod
bay doors in 2001: A Space Odyssey; Dr. McCoy demanded a medical synopsis
from a database in Star Trek.
More importantly, NLSR is becoming a useful and exciting tool for
supporting customer relationship management (CRM) applications. At a given
prompt, NLSR recognizes a spoken sentence drawn from a vocabulary of tens
of thousands of words, then analyzes the sentence structure to extract
specific meanings. The technology enables
customers to speak naturally into telephones and computers in order to
send and retrieve information and even complete entire transactions. It is
ideal for IVR call flows in which callers need to provide more complex
information than can be captured by pressing keys on a telephone key pad.
Furthermore, as an innovative way of using networks to conduct business,
NLSR is at the forefront of the e-business movement.
NLSR represents a logical progression from "basic" speech
recognition, which businesses have been implementing with increasing
regularity over the last few years. Today, as consumers, we barely notice
a prompt to "press or say one." But if prompted to say something
like, "I want to speak to someone in sporting goods," we
consciously approve. By giving customers this option to communicate
naturally, businesses and organizations are creating favorable impressions
and increasing the likelihood customers will return. Additionally, NLSR
provides opportunities to automate new types of transactions, potentially
reducing costs and increasing revenues.
While NLSR presents great opportunities, organizations should approach
it with the same care they would use while examining any leading-edge
technology. Importing "technology for technology's sake" is
never a good idea. Compounding the considerations for NLSR are so-called
"human factors": designers must anticipate customers'
logical thoughts and assumptions during the process of making a
transaction.
When we're conversing with organizations about possibly adopting NLSR,
we strongly recommend the following application development cycle. It
helps avoid the "common pitfalls" companies run into when
dealing with technology -- i.e., it may "work great," but it's
not helping your business.
Project Definition
This is the first step in any telecommunications/IT project. It
establishes the scope of the project, the anticipated business case with
estimated call volumes, the development community and the general
timeline. A good candidate for an enterprise's first use of NLSR is
an application that has high call volumes, does not demand too much
technical complexity and has a payback period of no
more than 6 to 12 months.
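
As a rough illustration of the payback test above, the following Python
sketch estimates how many months an application needs to recover its
up-front cost. Every figure and parameter name here (up-front cost,
automation rate, saving per call, running cost) is a hypothetical
placeholder, not data from an actual deployment.

```python
def payback_months(upfront_cost, monthly_calls, automation_rate,
                   saving_per_call, monthly_running_cost=0.0):
    """Months needed for cumulative net savings to cover the up-front cost."""
    monthly_net = (monthly_calls * automation_rate * saving_per_call
                   - monthly_running_cost)
    if monthly_net <= 0:
        return float("inf")  # the project never pays for itself
    return upfront_cost / monthly_net


# Hypothetical inputs, chosen only to show the arithmetic.
months = payback_months(upfront_cost=200_000, monthly_calls=100_000,
                        automation_rate=0.25, saving_per_call=1.00,
                        monthly_running_cost=5_000)
print(f"Estimated payback: {months:.1f} months")
```

With these placeholder inputs the net saving is 20,000 per month, so the
estimate falls inside the 6-to-12-month window suggested above.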
Define Call Flow And System Architecture
Here, we examine which questions will be asked of the caller, and what
results will be presented back. Human factors are important
considerations in defining this flow. For example, if an account number or
customer information is to be transferred to CTI applications, SNA hosts,
etc., the call center infrastructure must be in place to support this
transfer. Self-service applications require access to corporate data
servers. Asking a caller for his or her account number "for better
routing" should also transfer the account number to the target agent
so the agent need not ask for the number again before assisting the
caller. If NLSR is to be implemented using a client/server architecture
with the IVR system(s), then the local area network infrastructure must be
specified to assure the bandwidth between the IVR system and the NLSR
engine servers.
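
To give a feel for the bandwidth question, here is a back-of-the-envelope
Python sketch. It assumes each active call streams uncompressed
telephone-quality audio (8 kHz sampling, 8 bits per sample, or 64 kbit/s)
from the IVR system to the NLSR engine servers, and it pads the total with
a hypothetical overhead factor for packet headers. A real product may
compress the audio or do part of the processing on the IVR system, so
treat the result as an upper-end estimate only.

```python
# 8 kHz sampling x 8 bits per sample = 64 kbit/s per voice channel.
VOICE_BITS_PER_SECOND = 8_000 * 8


def required_lan_kbps(concurrent_channels, overhead_factor=1.25,
                      bits_per_second=VOICE_BITS_PER_SECOND):
    """Estimate LAN load in kbit/s for streaming every channel at once.

    overhead_factor is an assumed allowance for network framing.
    """
    return concurrent_channels * bits_per_second * overhead_factor / 1000


# Hypothetical site: 96 IVR ports all in recognition at the same time.
load = required_lan_kbps(concurrent_channels=96)
print(f"Peak audio load: {load:.0f} kbit/s")
```

Under these assumptions, 96 busy ports would need roughly 7.7 Mbit/s,
which is worth knowing before deciding what kind of LAN segment should
carry the traffic.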
Customer Interaction Testing
Organizations can begin creating the basic flow of an application by
listening to live agents handle calls. A great deal can be learned about
the types of questions agents must ask to elicit information from callers,
as well as the different questions and vocabulary that customers might use
when making a request or discussing an issue with an agent.
Offline testing with human factors experts can be valuable in
determining the best way to ask questions of callers. For example, an
appliance manufacturer we worked with wanted to use voice response to
separate calls at a help desk/repair center. Originally, the first
question was, "Do you have a large or small appliance?" But
human factors testing found that consumers could consistently place almost
all products into one of those two categories, the only exception being
microwaves. Using focus groups, the company determined that a better
question was, "Do you have a large appliance, a small appliance or a
microwave?" It significantly increased the percentage of correct user
responses.
Other tools and methods are available for customer interaction testing.
For example, designers frequently do not know how callers will react to a
NLSR-enabled IVR application. Furthermore, senior management may not have
a feel for what a proposed application might sound like, or what the
potential call flow might be. Situations like these can be addressed by
creating a mock-up of the proposed application.
Using PC-based tools, organizations can create a live storyboard of the
proposed application flow, and record prompts accordingly. With a common
telephone interface card in the PC, the application works with a live
agent controlling the flow. When a call is brought to the PC, the agent
uses a mouse and clicks to play the first prompt to the caller. The caller
hears the prompt(s) and responds as if it were a completely automatic IVR
system. The agent hears the caller through a headset, analyzes the
response and then chooses the next prompt to play to the caller. Along the
way, the caller's responses are recorded for later analysis.
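
The agent-driven mock-up described above can be pictured with a small
Python sketch. It is not the PC-based tool itself: the class name, prompt
identifiers and prompt wording are all hypothetical, text strings stand in
for recorded audio, and the "agent" is simply the code that decides which
prompt to play next.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MockSession:
    """One mock-up call: an agent plays prompts, caller replies are logged."""
    prompts: dict                       # prompt id -> text (stand-in for audio)
    log: list = field(default_factory=list)
    current: Optional[str] = None

    def play(self, prompt_id):
        """The agent clicks a prompt; return what the caller would hear."""
        if prompt_id not in self.prompts:
            raise KeyError(f"no such prompt: {prompt_id}")
        self.current = prompt_id
        return self.prompts[prompt_id]

    def record_response(self, utterance):
        """Store the caller's reply alongside the prompt that elicited it."""
        self.log.append((self.current, utterance))


# Hypothetical storyboard with two prompts.
session = MockSession(prompts={
    "greeting": "How may I direct your call?",
    "transfer": "One moment while I connect you.",
})
session.play("greeting")
session.record_response("I want to speak to someone in sporting goods")
session.play("transfer")  # the agent's choice after hearing the caller
print(session.log)
```

The log of prompt-and-response pairs is the useful by-product: it is the
raw material for the later analysis of how callers actually phrase their
requests.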
This method enables testing and demonstration of a process without
significant IVR application development. Senior management can get a feel
for the application, designers can re-record prompts to be more precise,
and application flow can be modified for smoother operation, all before
full application development.
This step can be skipped for simple applications, but it can provide
significant benefits to more complex applications and assist in the
decision-making process before proceeding with a project.
IVR Application Development
In this step, the script or call flow is specified, and the IVR
application is developed. Host interfaces or CTI links are programmed, if
appropriate. If the customer interaction testing phase was executed, the
prompts recorded during that phase are used in the application development
-- if not, prompts are recorded.
In addition to these "standard IVR" activities, the
vocabulary and semantics for responses to application prompts must be
specified to the system. Anticipated caller responses are determined from
the recordings captured during customer interaction testing or are
estimated from past experience. In either case, tools available from the
IVR provider or from the NLSR engine provider will be used to specify the
grammar and vocabulary for the engine's recognizer.
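
What "specifying vocabulary and semantics" means can be shown with a toy
Python sketch. It works on text transcripts rather than audio, and it uses
simple phrase spotting in place of a real recognizer. The grammar format,
the phrase lists and the function names are illustrative assumptions; an
actual project would use the grammar tools supplied by the IVR or NLSR
engine provider.

```python
import re

# Hypothetical grammar: each meaning lists the phrases that signal it.
GRAMMAR = {
    "sporting_goods": ["sporting goods", "sports equipment"],
    "large_appliance": ["large appliance", "refrigerator", "washer"],
    "small_appliance": ["small appliance", "toaster", "blender"],
    "microwave": ["microwave"],
}


def normalize(utterance):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z ]+", " ", utterance.lower())
    return " ".join(cleaned.split())


def interpret(utterance, grammar=GRAMMAR):
    """Return the first meaning whose phrase appears as whole words."""
    text = f" {normalize(utterance)} "
    for meaning, phrases in grammar.items():
        for phrase in phrases:
            if f" {phrase} " in text:
                return meaning
    return None  # out of grammar: the application should re-prompt


print(interpret("I want to speak to someone in sporting goods."))
print(interpret("My microwave is broken"))
print(interpret("Do you sell jet skis?"))
```

The third call returns None because nothing in this toy grammar covers
it, which is the situation the recordings from customer interaction
testing are meant to reduce.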
First-Use Customer Trial
Generally, IVR applications benefit from a first-use trial with
customers in which callers' reactions to the application are monitored. If
necessary, changes are made to the call flow, prompts and options
presented. It's true that many DTMF or basic speech applications shorten
this step due to budget or project timeline constraints. While this may
still result in an acceptable (if lower) success rate with a DTMF
application, general industry experience at this time is that NLSR-based
applications strongly benefit from this step. It helps ensure that the
general calling population interprets the prompts as designed, that the
responses are in the specified vocabulary and thus are recognizable, and
that the overall application works as intended.
It is not unusual to begin this phase with a success rate for an
application at a relatively low number, perhaps 50 percent, and see it
raised to 60, 70 or even 80 percent within a matter of a few weeks.
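
One way to track this kind of improvement during a trial is sketched
below in Python. The log format (a week number plus the prompt at which
the call failed, or None for a successful call), the prompt names and the
call data are all made up for illustration.

```python
from collections import Counter


def weekly_report(call_log):
    """Return {week: (success_rate, most_common_failing_prompt)}.

    call_log is a list of (week, failed_prompt) tuples, where
    failed_prompt is None for a call that completed successfully.
    """
    by_week = {}
    for week, failed_prompt in call_log:
        by_week.setdefault(week, []).append(failed_prompt)
    report = {}
    for week, results in sorted(by_week.items()):
        failures = Counter(p for p in results if p is not None)
        rate = 1 - sum(failures.values()) / len(results)
        worst = failures.most_common(1)[0][0] if failures else None
        report[week] = (rate, worst)
    return report


# Made-up trial data: 10 calls in each of two weeks.
call_log = (
    [(1, None)] * 5 + [(1, "ask_department")] * 4 + [(1, "ask_account")]
    + [(2, None)] * 7 + [(2, "ask_department")] * 2 + [(2, "ask_account")]
)
report = weekly_report(call_log)
for week, (rate, worst) in report.items():
    print(f"week {week}: {rate:.0%} success, most failures at {worst}")
```

Knowing which prompt accounts for most failures tells the designers where
to re-record a prompt or widen the vocabulary first.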
Customer Lifecycle Maintenance
Many IVR applications, once installed, are not strongly monitored or
updated. Again, while this may be acceptable for DTMF applications,
NLSR-based applications need to be monitored and, if necessary,
adjusted to maintain a high success rate for
users. Changes that may affect the application include
the following scenarios.
- As users become more familiar with the application, they may provide
shorter responses and may use barge-in (speaking over a prompt) more
frequently.
- Changes in the company's catalog, billing format or product
descriptions may confuse callers or may require changes in the
acceptable vocabularies.
- Callers may assume that additional services beyond those offered
(because the company has acquired new businesses, offered new
products, etc.) are available through the application, and request
them by name.
- Through marketing and market expansion, new sets of users from
different parts of the country or different demographics may engage
the system. This can change the vocabulary presented to the
recognizers. A simple example: A bank may have customers call about
"car loans." If the bank expands into the Gulf Coast region,
or if time passes and winter turns to summer, the bank may now receive
calls for loans on jet-skis.
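
Vocabulary drift of the kind listed above can be watched for with a very
simple check, sketched here in Python. It assumes that utterances the
application could not handle are transcribed to text for review; the
vocabulary set and the sample utterances are invented for illustration.

```python
from collections import Counter


def out_of_vocabulary(utterances, vocabulary):
    """Count words in the utterances that the vocabulary does not cover."""
    counts = Counter()
    for utterance in utterances:
        for word in utterance.lower().split():
            word = word.strip(".,?!")
            if word and word not in vocabulary:
                counts[word] += 1
    return counts


# Hypothetical vocabulary for a bank's loan application.
vocabulary = {"i", "need", "a", "car", "loan", "loans", "for", "my", "on"}
utterances = [
    "I need a car loan",
    "A loan for my jet-ski",
    "Loan on a jet-ski?",
]
missing = out_of_vocabulary(utterances, vocabulary)
print(missing.most_common(3))
```

A word that keeps turning up in this tally, such as "jet-ski" in the
sample data, is a candidate for addition to the grammar.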
Natural language speech recognition is an exciting capability with
myriad positive consequences for e-business and CRM. It can nourish the
bottom line by controlling costs and increasing revenue. It can attract
new customers and encourage existing customers to return. As with any
leading-edge technology, organizations must be prudent in their approaches
to evaluation and implementation. But NLSR is gaining enough marketplace
traction that there are useful levels of insight and experience about
deploying the technology. Data exist to provide a reliable road map, and
any organization that handles high volumes of customer interactions and
wishes to serve its customers well could benefit by examining NLSR's
possibilities.
Dr. Val Matula is a distinguished member of the technical staff at
Avaya (formerly Lucent Technologies Enterprise Networks Group). He is
responsible for advanced speech and systems architecture planning, which
includes speech recognition, text-to-speech and speech coding. Previous
work at Lucent's Bell Labs unit has supported systems engineering and
planning for ACD management systems.
[return to the August 2000 table of contents]