In November 2001, Amtrak replaced its touch-tone train information
system with a redesigned one using speech recognition
(1-800-USA-RAIL). Within a month, exits from the automated system to live
agents plunged from 75 percent to less than 30 percent. Voice technologies
(speech recognition, text-to-speech, and speaker verification) are now
mature enough to create a vital third mode of customer contact, as
powerful as live agents and the Web. They have the potential to
dramatically reduce the number of routine inquiries and transactions
handled by agents and boost customer satisfaction by offering easy-to-use,
always-available access from any landline or mobile phone.
Like other technologies, these voice solutions have unique
capabilities, but also limitations. For which types of services and
transactions will they yield the most benefits? How can you effectively
integrate them with your live agent and Web environments? Before answering
these questions, we have to think about how people interact with automated
voice systems and how voice differs from live agents and the Web.
Voice Vs. Live Agents
Although many agents conduct routine, scripted exchanges with callers,
they are always able to respond to an out-of-the-ordinary request.
Automated voice systems, on the other hand, are limited to very structured
dialogs. But the dialogs can be designed to replicate many transactions
handled by agents. They can be faster and more convenient (no waiting on
hold, and callers can interrupt with a response without being impolite),
provided well-designed user interfaces make them easy to understand and use. The
user interface is critical. It can make the difference between a satisfied
caller and a slow burn. And it can form the basis for a consistent user
experience and reliable, uniform branding messages because, unlike agents,
it's the same every time.
Voice Vs. The Web
In contrast to the text and graphics of the Web, voice is a medium of
sound and speech. It's perfect for applications where requests can be
spoken in a few words, and information can be read back in chunks of no
more than a few sentences at a time. Interactions are quite different from
those on the Web, but similar results can often be achieved: although a map
can't be displayed with a voice system, driving directions work fine. In
fact, voice systems can in some cases be better than Web browsers; you
can get those directions while you're driving, for example.
Choosing And Implementing Voice Transactions
So how do you choose which transactions are the most suitable (and
profitable) for voice automation? Clearly, the first step is to look at
transactions that will give the biggest bang for the buck. Which have the
highest agent call volumes? Do you have touch-tone applications that are
typically bypassed in favor of live agents (or would be if it were easy to
do so)? Think also about those with the most Web hits. Can they be
accomplished by phone, or might the phone be complementary to them? For
example, travelers may make reservations via either the Web or phone, but
once on the road, they'll most likely make changes by phone alone.
Next, evaluate how well each transaction can be automated with voice.
Can it be structured into a conceptually simple prompt-and-response
dialog? Can the requested information or content be played back in
reasonably sized chunks? A good voice user interface designer can be a big
help; even surprisingly complex transactions can be structured to meet
these goals. Many Web transactions can be implemented in voice versions,
as can many agent-handled ones that aren't practical with touch-tone
menus.
Then identify any potentially difficult speech recognition tasks. For
example, speech recognition would seem ideal for capturing alphanumeric
account IDs (ones containing both numerals and letters), since letters
can't easily be entered with touch-tones. But the spoken letters "m" and
"n" are hard to distinguish, as are the letters whose names rhyme with
"e": "b", "c", "d", "e", "g", and so on. These problems can usually be
resolved, but you have to plan for them. In this example, speech
recognition should work well if the letters in account numbers can be
constrained, either by business rules that limit which characters may
appear in a given position (e.g., the eighth character is always "d",
"m", or "q") or by an algorithmic rule, such as a checksum, that can
validate the recognized character string.
Finally, think about content and maintenance. The phrases the system
must understand and the audio played back in response may depend on the
content of the transaction. Will the content be updated frequently? If so,
will the speech recognition vocabulary have to be updated as well? For
example, applications offering movie listings will have to recognize
current movie titles, whereas weather reports, although constantly
changing, will always be requested using the same phrases. Vocabularies
can usually be updated by loading the new words as text from a database.
But it can be a bit tricky if the new vocabulary contains words or phrases
for which the pronunciations aren't straightforward. The speech recognizer
(and text-to-speech engine, if present) needs to know how words are
pronounced. Normally, it gets pronunciations from its internal dictionary.
But in some cases, hand-tuning or addition of alternate pronunciations is
needed. For example, how do you pronounce "Walukiewicz" or part
number 1061-40? The "correct" pronunciation isn't important; what matters
is the range of ways callers will actually say it.
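The Python sketch below shows one way to expand database entries into the phrases callers might actually say. A plain dictionary stands in for whatever grammar format your recognizer accepts (W3C SRGS is one common standard), and all names are hypothetical. Part numbers such as 1061-40 get generated variants (digit by digit with "zero" or "oh", or read in pairs, with or without "dash"); other entries use a hand-tuned table of alternates, whose sounds-like spellings here are invented purely for illustration, or else their lowercase text.

```python
# Hypothetical sketch: expand database entries into the spoken phrases a
# recognizer should accept. A plain dict stands in for a real grammar format.
ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty",
        "fifty", "sixty", "seventy", "eighty", "ninety"]

# Hand-tuned alternates for words the recognizer's dictionary may not handle.
# These sounds-like spellings are invented for illustration only.
HAND_TUNED = {
    "Walukiewicz": {"walukiewicz", "wallu kevitch", "walla kee wits"},
}


def pair_words(pair: str) -> str:
    """Say two digits as a pair: '61' -> 'sixty one', '05' -> 'oh five'."""
    tens, ones = int(pair[0]), int(pair[1])
    if tens == 0:
        return "oh " + ("oh" if ones == 0 else ONES[ones])
    if tens == 1:
        return TEENS[ones]
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def spoken_variants(part_number: str) -> set:
    """Generate ways a caller might read a part number such as '1061-40'."""
    groups = part_number.split("-")
    variants = set()
    # Digit by digit, saying 0 as either "zero" or "oh", with or without "dash".
    for zero in ("zero", "oh"):
        digits = [" ".join(zero if d == "0" else ONES[int(d)] for d in g)
                  for g in groups]
        for sep in (" dash ", " "):
            variants.add(sep.join(digits))
    # In pairs ("ten sixty one forty"), when every group has an even length.
    if all(len(g) % 2 == 0 for g in groups):
        pairs = [" ".join(pair_words(g[i:i + 2]) for i in range(0, len(g), 2))
                 for g in groups]
        for sep in (" dash ", " "):
            variants.add(sep.join(pairs))
    return variants


def build_vocabulary(items):
    """Map every accepted spoken phrase to the canonical database entry."""
    vocab = {}
    for item in items:
        if item.replace("-", "").isdigit():
            phrases = spoken_variants(item)
        else:
            phrases = HAND_TUNED.get(item, {item.lower()})
        for phrase in phrases:
            vocab[phrase] = item
    return vocab
```

Rebuilding this table whenever the database changes is the "load new words as text" step; only entries the dictionary handles poorly need hand-tuned alternates.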
For content played back to the caller, how often will it be updated?
Can it be pre-recorded or composed of concatenated recordings? An account
balance is usually played as a sequence of recordings: "Your balance
is...", "two", "hundred", "dollars", "and", "thirty", "cents". These recordings
rarely, if ever, need to be changed. But, as with our example of movie
listings, some content may have to be recorded on a regular basis,
entailing time and expense. And you want to maintain an ongoing
relationship with the voice talent who makes the recordings so the voice
is consistent; changes of voice within a transaction can be confusing and
sound unprofessional.
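For the account-balance case, a small function can map an amount to the sequence of recordings to play. This Python sketch assumes prompt IDs named after the words they contain (a hypothetical convention; each ID would correspond to one recorded audio clip) and amounts below one million dollars.

```python
# Hypothetical prompt IDs: each word names one pre-recorded audio clip.
ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty",
        "fifty", "sixty", "seventy", "eighty", "ninety"]


def under_thousand(n: int) -> list:
    """Prompt IDs for 1..999; empty list for 0."""
    words = []
    if n >= 100:
        words += [ONES[n // 100], "hundred"]
        n %= 100
    if n >= 20:
        words.append(TENS[n // 10])
        if n % 10:
            words.append(ONES[n % 10])
    elif n >= 10:
        words.append(TEENS[n - 10])
    elif n > 0:
        words.append(ONES[n])
    return words


def number_words(n: int) -> list:
    """Prompt IDs for 0..999,999."""
    if n == 0:
        return ["zero"]
    thousands, rest = divmod(n, 1000)
    words = (under_thousand(thousands) + ["thousand"]) if thousands else []
    return words + under_thousand(rest)


def balance_prompts(total_cents: int) -> list:
    """Sequence of recordings for an account balance given in cents."""
    if not 0 <= total_cents < 100_000_000:
        raise ValueError("sketch only handles $0.00 to $999,999.99")
    dollars, cents = divmod(total_cents, 100)
    prompts = ["your_balance_is"] + number_words(dollars)
    prompts.append("dollar" if dollars == 1 else "dollars")
    if cents:
        prompts += ["and"] + number_words(cents)
        prompts.append("cent" if cents == 1 else "cents")
    return prompts


print(balance_prompts(20030))
# ['your_balance_is', 'two', 'hundred', 'dollars', 'and', 'thirty', 'cents']
```

A few dozen short recordings cover every amount in range, which is why this kind of prompt rarely needs re-recording.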
For even more varied or frequently updated information, recordings may
not be practical. Here, text-to-speech is an option. As the name implies,
text-to-speech engines synthesize speech from text. Virtually no
maintenance is needed. Although great advances have been made in
naturalness of synthesized speech, most text-to-speech products retain an
artificial quality. And if the text-to-speech voice is different from the
one used in other recordings, switching between the two can again be
confusing.
Complementing, Not Replacing, Agents And The Web
Now that you've identified some transactions for voice automation, how can
they best be integrated with your other customer contact methods?
Customers grow more comfortable as systems become familiar, so
you want voice applications to have as much as possible in common with
other customer interactions via agents and the Web. A starting point is to
mimic the way those transactions are handled by live agents. And, to the
greatest extent possible, offer the same transactions as your Web site,
use the same terminology, and require the same passwords.
A more subtle point is that voice applications, through their tone of
voice, pace, and sound effects, have the power to present a company
identity and reinforce brand images. So it's important to craft a
"sound and feel" that imparts the desired marketing messages and
is consistent with the "look and feel" of the Web site.
Conclusion
Where do the new voice technologies fit? Right up there with agents and
the Web. Voice-driven systems are a powerful way of gaining operational
efficiencies while providing new contact options for the customer and
developing a new marketing and branding channel for your company.
Mark Levinson is principal of VoxMedia
Consulting. VoxMedia offers business and technology consulting,
focusing on voice technologies for call centers, interactive voice
response (IVR), voice portals, automated assistants, embedded devices, and
voice-over-IP, PBX, and telco platforms.