
May 1999
Listen Up! Eight Criteria For Selecting A Speech
Recognition Vendor For Your Call Center
BY LAUREN RICHMAN, SPEECHWORKS
If you have been following trends in over-the-telephone speech recognition, you may be
convinced that this once "futuristic" technology has much to contribute to your
call center today. Automated speech recognition (ASR) makes possible the widest range of
self-service e-commerce applications through the most ubiquitous device -- the telephone
-- and the most natural interface of all -- the spoken word. ASR technology provides an
extremely cost-effective way to offer friendly, personalized customer service 24 hours a
day, while eliminating the cumbersome routine of "press 1," "press 2"
commands.
Early adopters of over-the-telephone speech recognition, including United Airlines,
E*TRADE, FedEx, BellSouth and Hewlett-Packard, are already realizing the benefits of ASR.
With innovative applications, they are taking customer service to the next level, offering
advanced solutions that were not possible a few years ago. Many other companies across
industries such as health care, manufacturing, insurance and banking are following suit.
Selecting The Right Solution
As with many tasks, the first step is the most difficult. When thinking about getting
started in speech, the challenge is to understand the basic technology and the key
criteria to consider when reviewing the products of various vendors. The following pages
present an overview of eight key criteria that should be evaluated in a vendor selection
process. If you do your job right, you will find a solutions partner that will not only
deliver a finished application that your callers will love, but will also support you
through a development process that is easier than you ever thought possible.
Let's get started. What criteria should you assess, and what results should you demand?
State-Of-The-Art Accuracy. The best recognition engines on the market today
are recognizing simple commands (like numbers and yes/no responses) with more than 98
percent accuracy. Even complex and larger vocabularies are handled accurately more than 95
percent of the time in some real-world applications. Today's state-of-the-art engines
recognize speaker-independent, continuous speech with vocabularies of more than 50,000
words and understand the more than 1 billion ways people combine them. Leading vendors are
supporting their engines with teams of speech scientists who are continually incorporating
new techniques to improve recognition accuracy for today's over-the-phone applications.
What appears to be a straightforward concept, however, may often be complicated to
explain. Many vendors test and report recognition accuracy using different approaches and
assumptions. Be sure to compare apples to apples and probe for details when making your
own comparisons.
In addition to accuracy ratings, vendors should also provide measures of
"transaction completion." This rating lets you know the percentage of callers
who successfully complete their transactions using the ASR interface, without transferring
to a customer service representative (CSR) or hanging up in frustration. Transaction
completion rates, which should reach levels of 98 percent, are a good measure of accuracy
and will reflect a sound application design and an effective user interface.
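As a rough illustration of how transaction completion can be tallied, the sketch below computes the rate from a list of call outcomes. The outcome labels and the sample counts are assumptions made up for this example; they do not come from any vendor's reporting tool.

```python
from collections import Counter

# Hypothetical outcome labels (illustrative assumptions, not a vendor format).
COMPLETED = "completed"
TRANSFERRED = "transferred_to_csr"
HUNG_UP = "hung_up"


def completion_rate(outcomes):
    """Return the fraction of calls completed entirely in the ASR interface."""
    if not outcomes:
        raise ValueError("no calls to measure")
    counts = Counter(outcomes)
    return counts[COMPLETED] / len(outcomes)


# Made-up sample: 47 completed calls, 2 transfers to a CSR, 1 hang-up.
sample = [COMPLETED] * 47 + [TRANSFERRED] * 2 + [HUNG_UP]
print(f"Transaction completion: {completion_rate(sample):.0%}")
```

When comparing vendors, it helps to confirm that each one counts transfers and hang-ups the same way, since a different denominator changes the figure.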
Natural Language Processing. Natural language processing (NLP) allows your
callers to speak complex commands in complete sentences or phrases. Demonstrated in
research labs for years, NLP is now being used in many real-world applications.
NLP should be used to enhance the user friendliness of the speech interface. For
example, an experienced travel customer should be able to say, "I want to fly from
Boston to San Francisco next Thursday morning." The most advanced speech systems
enable this capability.
On the other hand, if most users of a speech system are typically going to be
"newcomers" (such as in a package rate-finding application), then NLP may not be
appropriate at all, and a "directed dialog" approach (prompt/response,
prompt/response) may be the most comfortable option.
When selecting a vendor, evaluate both its NLP technology and its user interface design
expertise, which will enable you to use NLP most effectively.
Proven Techniques For "Barge-In". Experienced callers will not
tolerate speech systems that force them to listen to prompts in full, or "wait for
the beep" before responding. Like IVR applications that let callers "type
ahead," "barge-in" lets users interrupt the system whenever they are ready
and know what they want to do next.
Barge-in is a challenging technology to implement. Discuss this concept with your
vendors, and make sure your callers will be able to take advantage of this advanced
capability.
Advanced Development Tools. To facilitate and speed implementation, look for
well-designed, proven development tools that are available to you and/or your systems
integrator. "Building block" components allow developers to create applications
by linking objects, and setting parameters, with little or no coding required.
The most advanced toolkits are also integrated into the development environments of
other IVR vendors, including graphical user interfaces (GUIs), enabling speech services to
be built quickly and easily using the tools with which your developers are most familiar.
Be sure to query vendors about how extensively their tools are being used today by
their customers and partners, and how much code they really save developers from writing.
A Well-Designed And Documented User Interface Design Process. In the world of
over-the-telephone ASR, you will hear a great deal about the speech user interface -- far
more than you did in the days of touch-tone IVR. Consider this: with IVR, the end-user has
a universe of 12 options, 1-9, 0, # and *. If the caller makes an invalid entry, you can
easily reroute them to a basic menu structure.
Speech recognition is a more powerful interface, allowing callers to route themselves
from topic to topic more freely and speak multiple inputs in one sentence, such as
"buy 100 shares at the market price."
Because of this power and flexibility, the user interface must be carefully crafted.
Callers can say "anything" and your system needs to respond accurately,
effectively and pleasantly at all times. In addition, call center managers and vendors
have all learned from the past. Although touch-tone systems offered many benefits, they
were not generally hailed for the friendly experience they provided to callers. (What are
your "transfer to CSR" figures?) ASR gives us all an opportunity to do better --
to give your most important constituents the highest level of customer service
satisfaction.
The key to good user interface design is a proven development process including
research, design and user testing to match your application needs to your caller
population. In addition, look for tools to aid the testing and tuning process. Do the
vendors you are considering have human factors specialists on staff? Do they have in-house
capabilities for testing and prompt development? Most important, check their track records
of satisfied customers.
Can You Build A Prototype? Because speech recognition is a new technology,
many corporations are looking for low-risk ways to explore its capabilities. By building
an application prototype, companies can "hear" just how their own application
will sound and can obtain feedback from trial users.
By building application prototypes, you and your team can quickly gain experience with
all aspects of the speech development process. In addition, you have a model
system to share with your management team. When you are ready to move forward, the
building blocks of a prototype can be rolled into a full-scale, deployed system.
Reliable, Scalable And Efficient Systems. When comparing speech vendors, you
and your technical advisors must evaluate the various system configuration approaches that
will be proposed. A central server or server farm approach, for example, may be vulnerable
to single points of failure. An "n+1" architecture, on the other hand, in which
the system is set up as a number of independent, identical units of processing power,
provides high reliability since even if a single unit fails, there are extra processors to
keep the system up and running.
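The arithmetic behind an n+1 configuration can be sketched in a few lines. The function names, the per-unit capacity and the peak call volume below are hypothetical figures chosen for illustration, not measurements from any real system.

```python
import math


def units_needed(peak_calls, calls_per_unit, spares=1):
    """Units required to carry the peak load, plus spare units (the "+1")."""
    if peak_calls < 0 or calls_per_unit <= 0 or spares < 0:
        raise ValueError("invalid sizing inputs")
    base = math.ceil(peak_calls / calls_per_unit)
    return base + spares


def survives_failures(total_units, peak_calls, calls_per_unit, failed=1):
    """True if the units still running can carry the peak load."""
    return (total_units - failed) * calls_per_unit >= peak_calls


# Assumed example: 200 simultaneous calls at peak, 24 calls per unit.
units = units_needed(peak_calls=200, calls_per_unit=24)
print(units, survives_failures(units, 200, 24))
```

Asking each vendor for its own per-unit capacity figure lets you run the same calculation against their proposed configuration.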
Be sure to ask how speech suppliers plan to scale your system upwards as your market
expands. What additional costs will be involved? Again, some system setups require
complicated load-balancing across multiple systems and networks, whereas the n+1 approach
used in large-scale telecom services can scale simply by adding more identical units.
Finally, as you look at cost structures, get a feel for vendors' speech processing
efficiency. The most efficient system configurations will clearly give you more "bang
for the buck," and it is important to understand the options available to you. The
"Moore's Law" trends in processing power are having a huge impact on the
viability of speech applications for today's call centers -- and the amount of speech
recognition possible on single, open systems platforms. You should make sure you are
working with a provider whose software is designed to benefit from these trends.
The Availability Of Ongoing Support. Supporting a new system -- and constantly
improving it -- is an important part of the entire process. Unlike other applications your
call center may have implemented, speech recognition systems can be improved dramatically
through careful analysis of usage patterns and results.
Even if you have done internal and pilot group testing, the feedback from early users
"in the field" is critical. Ask vendors how they manage this process and find
out what tuning tools they have to track results and make improvements quickly and easily.
By exploring these eight criteria, you will soon understand the capabilities needed to
develop a speech application quickly and easily. Then, you can launch your own
speech-activated application, which your call center customers will enjoy using and will
return to happily, again and again.
Lauren Richman is director of marketing communications at Boston,
Massachusetts-based SpeechWorks (formerly Applied Language Technologies). SpeechWorks (www.speechworks.com) enables people to talk to
computers over the telephone in a natural way. The company's solution is based on research
conducted at and licensed from the Massachusetts Institute of Technology. SpeechWorks
simplifies and speeds the development of speech-activated services for information
delivery and e-commerce through its patent-pending DialogModules. The company also
provides system integration and support services to clients and distributes its solution
through a network of resellers and integrators worldwide.