TMCnet - World's Largest Communications and Technology Community


[June 10, 2002]

Voice Technologies Complement Call Center Agents, Web Sites

BY MARK LEVINSON


In November 2001, Amtrak replaced its touch-tone train information system with a new, redesigned one using speech recognition (1-800-USA-RAIL). Within a month, exits from the automated system to live agents plunged from 75 percent to less than 30 percent. Voice technologies -- speech recognition, text-to-speech, and speaker verification -- are now mature enough to create a vital third mode of customer contact, as powerful as live agents and the Web. They have the potential to dramatically reduce the number of routine inquiries and transactions handled by agents and boost customer satisfaction by offering easy-to-use, always-available access from any landline or mobile phone.

Like other technologies, these voice solutions have unique capabilities, but also limitations. For which types of services and transactions will they yield the most benefits? How can you effectively integrate them with your live agent and Web environments? Before answering these questions, we have to think about how people interact with automated voice systems and how voice differs from live agents and the Web.

Voice Vs. Live Agents
Although many agents conduct routine, scripted exchanges with callers, they are always able to respond to an out-of-the-ordinary request. Automated voice systems, on the other hand, are limited to very structured dialogs. But the dialogs can be designed to replicate many transactions handled by agents. They can be faster and more convenient (no waiting on hold, you can interrupt to give your response without being impolite) if well-designed user interfaces make them easy to understand and use. The user interface is critical. It can make the difference between a satisfied caller and a slow burn. And it can form the basis for a consistent user experience and reliable, uniform branding messages because, unlike agents, it's the same every time.

Voice Vs. The Web
In contrast to the text and graphics of the Web, voice is a medium of sound and speech. It's perfect for applications where requests can be spoken in a few words, and information can be read back in chunks of no more than a few sentences at a time. Interactions are quite different from those on the Web, but similar results can often be achieved: although a map can't be displayed with a voice system, driving directions work fine. In fact, voice systems can in some cases be better than Web browsers -- you can get those directions while you're driving, for example.

Choosing And Implementing Voice Transactions
So how do you choose which transactions are the most suitable (and profitable) for voice automation? Clearly, the first step is to look at transactions that will give the biggest bang for the buck. Which have the highest agent call volumes? Do you have touch-tone applications that are typically bypassed in favor of live agents (or would be if it were easy to do so)? Think also about those with the most Web hits. Can they be accomplished by phone, or might the phone be complementary to them? For example, travelers may make reservations via either the Web or phone, but once on the road, they'll most likely make changes by phone alone.

Next, evaluate how well each transaction can be automated with voice. Can it be structured into a conceptually simple prompt-and-response dialog? Can the requested information or content be played back in reasonably sized chunks? A good voice user interface designer can be a big help; even surprisingly complex transactions can be structured to meet these goals. Many Web transactions can be implemented in voice versions, as can many agent-handled ones that aren't practical with touch-tone menus.

Then identify any potentially difficult speech recognition tasks. For example, speech recognition would seem to be ideal for capturing alphanumeric account IDs -- ones with both numerals and letters -- as letters can't easily be entered with touch-tones. But the spoken letters "m" and "n" are hard to distinguish, as are the letters that rhyme with "e": "b", "c", "d", "e", "g", etc. These issues can usually be resolved, but you have to plan for them. In this example, speech recognition should work fine if the letters in the account numbers can be constrained, either by business rules that limit what can appear in a given position (e.g., the eighth character is always "d", "m", or "q") or by an algorithmic rule (like a checksum) that can validate the recognized character string.
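To make the checksum idea concrete, here is a minimal sketch in Python. Everything specific in it is an assumption for illustration: the confusion groups, the position-weighted mod-36 check character, and the function names are hypothetical, not taken from any particular recognizer or account-numbering scheme. It tries the acoustically confusable alternatives for each recognized character and keeps only the strings that pass the check.

```python
from itertools import product

# Hypothetical confusion groups: letters a recognizer may mix up.
CONFUSABLE = [set("MN"), set("BCDEGPTVZ")]

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def check_char(body: str) -> str:
    """Hypothetical checksum: position-weighted sum of character values, mod 36."""
    total = sum((i + 1) * ALPHABET.index(ch) for i, ch in enumerate(body))
    return ALPHABET[total % 36]


def is_valid(account_id: str) -> bool:
    """An ID is valid when its last character matches the checksum of the rest."""
    return len(account_id) >= 2 and check_char(account_id[:-1]) == account_id[-1]


def alternatives(ch: str) -> list[str]:
    """The recognized character first, then any letters it is easily confused with."""
    for group in CONFUSABLE:
        if ch in group:
            return [ch] + sorted(group - {ch})
    return [ch]


def resolve(recognized: str) -> list[str]:
    """Return every valid ID reachable by swapping confusable letters."""
    options = [alternatives(ch) for ch in recognized]
    candidates = ("".join(combo) for combo in product(*options))
    return [c for c in candidates if is_valid(c)]


if __name__ == "__main__":
    true_id = "4M7D2" + check_char("4M7D2")
    heard = "4N7D2" + true_id[-1]  # the recognizer heard "N" instead of "M"
    print(resolve(heard))          # the true ID is among the candidates
```

If more than one candidate survives the check, the dialog can confirm with the caller or look the candidates up against the account database.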

Finally, think about content and maintenance. The phrases the system must understand and the audio played back in response may depend on the content of the transaction. Will the content be updated frequently? If so, will the speech recognition vocabulary have to be updated as well? For example, applications offering movie listings will have to recognize current movie titles, whereas weather reports, although constantly changing, will always be requested using the same phrases. Vocabularies can usually be updated by loading the new words as text from a database. But it can be a bit tricky if the new vocabulary contains words or phrases for which the pronunciations aren't straightforward. The speech recognizer (and text-to-speech engine, if present) needs to know how words are pronounced. Normally, it gets pronunciations from its internal dictionary. But in some cases, hand-tuning or addition of alternate pronunciations is needed. For example, how do you pronounce "Walukiewicz" or part number 1061-40? The "correct" pronunciation isn't important -- what counts are the various ways callers would say it.
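As a rough illustration of planning for "the various ways callers would say it," the Python sketch below enumerates digit-by-digit spoken forms of a part number such as 1061-40. The word lists and the function name are assumptions made for this example; a real deployment would load such variants into whatever lexicon or grammar format its recognizer uses, and would also need rules for paired readings such as "ten sixty-one forty," which this sketch does not cover.

```python
from itertools import product

# Assumed spoken forms for each digit; "0" is commonly said "zero" or "oh".
DIGIT_WORDS = {
    "0": ["zero", "oh"], "1": ["one"], "2": ["two"], "3": ["three"],
    "4": ["four"], "5": ["five"], "6": ["six"], "7": ["seven"],
    "8": ["eight"], "9": ["nine"],
}

# A hyphen may be skipped (empty string) or spoken as "dash".
SEPARATOR_WORDS = {"-": ["", "dash"]}


def spoken_variants(part_number: str) -> set[str]:
    """Enumerate digit-by-digit ways a caller might say a part number."""
    choices = []
    for ch in part_number:
        if ch in DIGIT_WORDS:
            choices.append(DIGIT_WORDS[ch])
        elif ch in SEPARATOR_WORDS:
            choices.append(SEPARATOR_WORDS[ch])
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    # Join the chosen words, dropping the empty string used for a silent hyphen.
    return {" ".join(w for w in combo if w) for combo in product(*choices)}


if __name__ == "__main__":
    for phrase in sorted(spoken_variants("1061-40")):
        print(phrase)
```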

For content played back to the caller, how often will it be updated? Can it be pre-recorded or composed of concatenated recordings? An account balance is usually played as a sequence of recordings: "Your balance is...", "two", "hundred", "dollars", "and", "thirty", "cents". These recordings rarely, if ever, need to be changed. But, as with our example of movie listings, some content may have to be recorded on a regular basis, entailing time and expense. And you want to maintain an ongoing relationship with the voice talent who makes the recordings so the voice is consistent; changes of voice within a transaction can be confusing and sound unprofessional.
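The account balance example can be sketched as a small Python routine that turns an amount into the ordered list of recordings to play. The prompt names (such as "your_balance_is") and the function names are hypothetical; an actual system would map each name to an audio file recorded by its voice talent. The sketch assumes balances below one million dollars.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_prompts(n: int) -> list[str]:
    """Return the recordings needed to speak an integer from 0 to 999,999."""
    if not 0 <= n < 1_000_000:
        raise ValueError("number out of supported range")
    if n < 20:
        return [ONES[n]]
    if n < 100:
        tens, ones = divmod(n, 10)
        return [TENS[tens]] + ([ONES[ones]] if ones else [])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return [ONES[hundreds], "hundred"] + (number_prompts(rest) if rest else [])
    thousands, rest = divmod(n, 1000)
    return number_prompts(thousands) + ["thousand"] + (number_prompts(rest) if rest else [])


def balance_prompts(total_cents: int) -> list[str]:
    """Build the playback sequence for a balance given in cents."""
    if total_cents < 0:
        raise ValueError("negative balances are not handled in this sketch")
    dollars, cents = divmod(total_cents, 100)
    return (["your_balance_is"]
            + number_prompts(dollars)
            + ["dollar" if dollars == 1 else "dollars", "and"]
            + number_prompts(cents)
            + ["cent" if cents == 1 else "cents"])


if __name__ == "__main__":
    print(balance_prompts(20030))
    # ['your_balance_is', 'two', 'hundred', 'dollars', 'and', 'thirty', 'cents']
```

Because every balance is assembled from the same few dozen recordings, only the carrier phrase and the number words ever need to be recorded.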

For even more varied or frequently updated information, recordings may not be practical. Here, text-to-speech is an option. As the name implies, text-to-speech engines synthesize speech from text, so virtually no maintenance is needed. Although great advances have been made in the naturalness of synthesized speech, most text-to-speech products retain an artificial quality. And if the text-to-speech voice is different from the one used in other recordings, switching between the two can again be confusing.

Complementing, Not Replacing, Agents And The Web
Now that you've identified some transactions for voice automation, how can they best be integrated with your other customer contact methods? Customers' comfort levels are raised as systems become more familiar, so you want voice applications to have as much as possible in common with other customer interactions via agents and the Web. A starting point is to mimic the way those transactions are handled by live agents. And, to the greatest extent possible, offer the same transactions as your Web site, use the same terminology, and require the same passwords.

A more subtle point is that voice applications, through their tone of voice, pace, and sound effects, have the power to present a company identity and reinforce brand images. So it's important to craft a "sound and feel" that imparts the desired marketing messages and is consistent with the "look and feel" of the Web site.

Conclusion
Where do the new voice technologies fit? Right up there with agents and the Web. Voice-driven systems are a powerful way of gaining operational efficiencies while providing new contact options for the customer and developing a new marketing and branding channel for your company.

Mark Levinson is principal of VoxMedia Consulting. VoxMedia offers business and technology consulting, focusing on voice technologies for call centers, interactive voice response (IVR), voice portals, automated assistants, embedded devices, and voice-over-IP, PBX, and telco platforms.






