TMCnet - World's Largest Communications and Technology Community




May 1999

Advanced Speech Recognition: New Self-Service Options For Call Centers


Advanced speech recognition (ASR) is transforming the way callers interact with self-service interactive voice response (IVR) systems. According to GartnerGroup, within four years, 30 percent of all new IVR installations will employ ASR technology. Deployment of self-service platforms of all kinds is growing faster than ever before, and speech recognition is opening doors for applications that until now could only be accomplished through live agents.

IVR has come a long way over the past 20 years and has overcome countless technological obstacles. One of the most difficult obstacles, however, was not technological in nature -- it was convincing top management that customers would not be insulted by reaching a prerecorded voice rather than a call center agent. Clearly, self-service is here to stay, not only in the world of call centers but nearly everywhere else as well:

  • Automated teller machines have become the cash withdrawal method of choice. Now you can find ATMs virtually everywhere people use cash. Soon, smart cash cards and PC cash card readers (home ATMs) will begin replacing the old paper-based technology, offering even more self-service options.
  • Automated gas pumps are now handling nearly as many retail gasoline purchases as cashiers. The new, automated gas kiosks feature full video displays and allow gas purchasers to watch CNN or ESPN while pumping gas and make a wide variety of purchases, including lottery tickets.
  • The Internet and electronic commerce are threatening retail stores in every business area, from books and CDs to automobiles and airline tickets. Consumers are finding better value and service through online self-service options. Soon, consumers with an aversion to PCs will be able to take advantage of Internet-based browser phones. These phones will offer a combination of voice and screen prompting with touch-screen and soft key input, making e-commerce accessible to more people.

The old concerns that self-service options eliminate the personal touch have diminished in light of demands for new, more powerful and flexible self-service choices. IVR systems using touch-tone voice menus can typically offload 20 to 30 percent of the live agent calls in a call center. Until recently, it was assumed that the remaining 70 to 80 percent of the call volume was best handled by agents. Now, advanced speech recognition provides the option of offloading an additional 20 to 30 percent of a call center's telephone traffic.
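As a rough worked example of these percentages, the sketch below assumes a hypothetical call center that receives 10,000 calls a day (the volume is an illustration, not a figure from this article) and reads both the touch-tone share and the ASR share as percentages of total traffic:

```python
def agent_calls(total_calls, touch_tone_pct, asr_pct):
    """Calls still reaching live agents after both self-service layers.

    Both percentages are taken against total traffic, which is one
    reading of the figures quoted above.
    """
    remaining_pct = 100 - touch_tone_pct - asr_pct
    return total_calls * remaining_pct // 100


DAILY_CALLS = 10_000  # hypothetical daily volume

# Conservative case: 20 percent touch-tone plus 20 percent ASR.
conservative = agent_calls(DAILY_CALLS, 20, 20)
# Optimistic case: 30 percent touch-tone plus 30 percent ASR.
optimistic = agent_calls(DAILY_CALLS, 30, 30)

print(conservative, optimistic)
```

Under those assumptions, agents would handle somewhere between 40 and 60 percent of the calls, rather than the 70 to 80 percent left over by touch-tone menus alone.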

Methods Of ASR User Interaction
Traditional IVR user interaction of pressing the keys on a touch-tone phone is relatively straightforward in terms of application design. The basic rules include limiting choices on a given voice menu to four, and always allowing the user to press zero to reach an agent. Speech recognition application development is more complex and can be divided into three primary categories in terms of user interaction:

  • Touch-tone replacement,
  • Forms fill-in, and
  • Natural language.

In essence, IVR speech recognition applications can be compared to the quizzes we have all taken in school. Touch-tone replacement apps are similar to true/false and multiple choice quizzes, forms fill-in apps are like fill-in-the-blank quizzes and natural language apps are like short-answer or essay quizzes.

Touch-Tone Replacement
These speech recognition applications have been around for years and the technology does not qualify as advanced speech recognition. Touch-tone replacement apps are handy when the caller has a rotary phone (a collector's item in the U.S.). Touch-tone replacement apps are generally scripted exactly like touch-tone IVR applications ("press or say one"). These applications can be compared to multiple-choice and true/false quizzes. The caller wants something and the IVR makes him or her answer with the correct choice from a short menu. If you currently use IVR and are looking for ways to offload more calls from your agents, you need to look to the two more advanced methods of user interaction.

In school, we all liked true/false and multiple-choice questions because it was easier to guess at the right answer. If there are hundreds of possible choices, multiple choice is far too cumbersome to use. This method can be made to work by asking the caller to wade through a decision tree of menu choices, but generally the resulting application is cumbersome and time-consuming for the caller.
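A touch-tone replacement menu can be sketched in a few lines. The example below is a minimal illustration, not any vendor's API: it assumes the recognizer hands back either a keypress or a spoken digit word as a string, and the menu options are hypothetical. It enforces the two design rules mentioned earlier (no more than four choices, and zero always reaches an agent).

```python
AGENT = "agent"
MAX_CHOICES = 4  # rule of thumb: at most four options per voice menu

# Spoken digit words the recognizer is assumed to return.
SPOKEN_DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4"}


def build_menu(options):
    """Number the options 1..n, refusing menus that are too long."""
    if len(options) > MAX_CHOICES:
        raise ValueError(f"a menu may offer at most {MAX_CHOICES} choices")
    return {str(i): name for i, name in enumerate(options, start=1)}


def prompt_text(menu):
    """Build the 'press or say' script for the menu."""
    parts = [f"For {name}, press or say {key}." for key, name in menu.items()]
    parts.append("To reach an agent, press or say zero.")
    return " ".join(parts)


def route(menu, caller_input):
    """Map a keypress or spoken digit to a destination; None if unrecognized."""
    token = caller_input.strip().lower()
    key = SPOKEN_DIGITS.get(token, token)
    if key == "0":
        return AGENT
    return menu.get(key)


# Hypothetical banking menu.
menu = build_menu(["account balance", "recent transactions", "branch hours"])
print(prompt_text(menu))
print(route(menu, "two"))
```

Because the recognizer only has to tell a handful of digit words apart, the speech path adds very little over the keypad path, which is why this style does not offload many additional calls.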

Forms Fill-In
This method of user interaction offers the most cost-effective solution for IVR applications where true/false and multiple-choice methods are inappropriate. In school, fill-in-the-blank quizzes require the student to know the answer. They are also easier for the teacher to grade than short-answer or essay quizzes because they are more objective and specific. Think of the ASR-based IVR system as a teacher grading a quiz. The amount of CPU and speech recognition processing resources required to handle a forms fill-in application is far less than for a natural language application. The complexity in writing and perfecting a forms fill-in application is also far less than for natural language.

Forms fill-in applications are the most common types of ASR apps being deployed in call centers today. They are relatively straightforward and economical to develop and can provide a means for handling large volumes of calls that currently require agent interaction. The applications are scripted so that callers are clearly prompted to "fill in the blanks" with specific information. The more specific the prompting, the less complex the speech recognition task will be. ASR software can recognize words and understand grammatical syntax. If the application is written so that the need for grammar recognition is eliminated or minimized, the result is a true forms fill-in application.

The ability of voice response systems to reliably recognize spoken letters and numbers provides a tremendous opportunity for self-service applications. A large percentage of catalog, product I.D., shipment tracking, serial and part numbers contain both letters and numbers and have not been easy to automate in the past. Touch-tone entry of letters works only in limited cases. Certain automated attendant applications are workable when the number of employees is small. Callers can enter the first three or four letters of a last name by pressing the associated number keys on the telephone. Although three letters are printed on each key, the key sequence provides enough input to narrow a list of a few hundred employees to three or four. This method is unworkable, however, when a specific alphanumeric combination, such as a part number, must be entered. Callers do not appreciate being required to enter three keystrokes per letter, and IVR apps that require that kind of user interaction generally get bypassed for the "press zero for a CSR" option.
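The automated attendant lookup described above can be illustrated with a short sketch. The employee names are hypothetical, and the keypad table follows the layout found on current phones, where the 7 and 9 keys carry four letters each:

```python
# Letters printed on each key (current layout; 7 and 9 carry four letters).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: digit for digit, letters in KEYPAD.items() for ch in letters}


def to_digits(name):
    """Translate a name into the key sequence a caller would press."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in name.lower() if ch in LETTER_TO_DIGIT)


def match_directory(keys, last_names):
    """Return every last name whose key sequence starts with the keys pressed."""
    return [name for name in last_names if to_digits(name).startswith(keys)]


# Hypothetical employee directory.
DIRECTORY = ["Reed", "Reece", "Sefton", "Perez", "Smith"]

print(match_directory("733", DIRECTORY))   # three keys leave a short list
print(match_directory("7333", DIRECTORY))  # a fourth key narrows it further
```

The lookup works because the system only needs to shorten a known list. A part number has no such list of likely spellings to fall back on, so each key press stays ambiguous, which is the gap spoken letters and digits fill.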

Alphanumeric speech recognition applications are relatively simple forms fill-in applications for today's ASR-based IVR platforms. The IVR can prompt, "please speak the number of the catalog item you would like to order." Then, when the customer fills in the blank, the IVR performs a database lookup and can speak back the name of the catalog item. "You have selected the Premier mahogany desk clock. Is this correct?"
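A minimal sketch of that prompt, lookup and confirmation cycle follows. It assumes the recognizer returns the caller's utterance as a list of letter and digit-word tokens. The catalog numbers are invented for illustration, and a plain dictionary stands in for the database.

```python
# Hypothetical catalog; a real application would query a database.
CATALOG = {
    "DC4417": "Premier mahogany desk clock",
    "LP2090": "brass desk lamp",
}

# Digit words the recognizer is assumed to return.
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def normalize(tokens):
    """Join recognized tokens into a catalog number.

    Digit words become digits; anything else is treated as a spoken letter.
    """
    return "".join(DIGIT_WORDS.get(t.lower(), t.upper()) for t in tokens)


def confirmation_prompt(tokens):
    """Look up the spoken catalog number and build the next prompt."""
    item = CATALOG.get(normalize(tokens))
    if item is None:
        return "I could not find that item. Please speak the catalog number again."
    return f"You have selected the {item}. Is this correct?"


print(confirmation_prompt(["d", "c", "four", "four", "one", "seven"]))
```

Speaking the item name back gives the caller a chance to catch a misrecognized letter or digit before the order is placed.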

Because ASR systems can recognize large vocabularies, forms fill-in apps that prompt callers for specific words can allow callers to perform a wide range of self-service transactions that would be nearly impossible using the true/false or multiple-choice methods. For example, with an ASR forms fill-in app, travelers making airline reservations receive prompts to speak the departure and destination cities, dates and times. When the caller is directed to speak the name of the departure city, any of several hundred names will be recognized, but the app does not attempt to recognize other information that may be needed later. Each directed fill-in-the-blanks question is an independent application segment with limited recognition and processing requirements.
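The idea that each directed question is an independent segment with its own limited vocabulary can be sketched as follows. The `recognize` function is a stand-in for a real recognizer, the city list is a hypothetical three-entry vocabulary, and prompt playback is omitted for brevity.

```python
CITIES = {"atlanta", "chicago", "new york"}  # hypothetical vocabulary

# Each slot carries only the vocabulary needed to answer its own question.
FORM = {
    "departure_city": CITIES,
    "destination_city": CITIES,
}


def recognize(utterance, vocabulary):
    """Stand-in recognizer constrained to a single slot's vocabulary."""
    heard = utterance.strip().lower()
    return heard if heard in vocabulary else None


def fill_form(form, utterances, max_attempts=2):
    """Fill each slot in turn, reprompting up to max_attempts times.

    A slot left as None would be handed to an agent in practice.
    """
    responses = iter(utterances)
    filled = {}
    for slot, vocabulary in form.items():
        filled[slot] = None
        for _ in range(max_attempts):
            value = recognize(next(responses, ""), vocabulary)
            if value is not None:
                filled[slot] = value
                break
    return filled


# The caller answers the second question off-topic once, then correctly.
print(fill_form(FORM, ["Atlanta", "tomorrow", "New York"]))
```

Because the destination question never tries to interpret dates or times, an off-topic answer such as "tomorrow" is simply rejected and the question is asked again.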

Natural Language
Natural language applications are similar to short-answer or essay quizzes. The questions asked by the IVR/ASR system are broader than forms fill-in, and recognition requires a greater degree of subjectivity. Poorly written natural language apps leave callers frustrated because they hear frequent "I don't understand" responses from the system. Well-written natural language applications provide agent-like dialog and are very caller-friendly. They are also complex to develop, difficult to fine-tune and expensive to deploy.

Using the travel example, a natural language short-answer application might ask the caller to speak the departure city, date and time as a single answer. In order to process the speech, the ASR system must understand enough grammar to recognize the key elements from the response, "I'd like to depart from Atlanta around 4:00 p.m. on June 23rd." Since people structure sentences differently, all possible variations (proper and poor grammar) must be recognized. The IVR may recognize the city name and date, but not the time, so the caller might hear a request for clarification, such as, "What time on June 23rd would you like to depart from Atlanta?"
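The short-answer case can be approximated with a sketch that pulls several slots out of one utterance and asks only for what is missing. Regular expressions stand in here for a real grammar-based recognizer, and the city list and question wording are hypothetical.

```python
import re

CITIES = ("Atlanta", "New York", "Chicago")  # hypothetical vocabulary
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

CITY_RE = re.compile(r"\b(" + "|".join(CITIES) + r")\b", re.IGNORECASE)
DATE_RE = re.compile(r"\b(" + MONTHS + r")\s+(\d{1,2})(?:st|nd|rd|th)?\b", re.IGNORECASE)
TIME_RE = re.compile(r"\b(\d{1,2}:\d{2})\s*([ap])\.?m\b", re.IGNORECASE)

# Follow-up questions, asked in this order for whichever slot is missing.
QUESTIONS = {
    "city": "Which city would you like to depart from?",
    "date": "What date would you like to depart?",
    "time": "What time would you like to depart?",
}


def extract_slots(utterance):
    """Pick the city, date and time out of a free-form sentence."""
    city = CITY_RE.search(utterance)
    date = DATE_RE.search(utterance)
    time = TIME_RE.search(utterance)
    return {
        "city": city.group(1).title() if city else None,
        "date": f"{date.group(1).title()} {int(date.group(2))}" if date else None,
        "time": f"{time.group(1)} {time.group(2).lower()}.m." if time else None,
    }


def next_prompt(slots):
    """Return a clarifying question for the first missing slot, if any."""
    for name, question in QUESTIONS.items():
        if slots[name] is None:
            return question
    return None


full = extract_slots("I'd like to depart from Atlanta around 4:00 p.m. on June 23rd.")
partial = extract_slots("Leaving Atlanta on June 23rd")
print(full, next_prompt(full))
print(partial, next_prompt(partial))
```

Even this toy version has to anticipate different word orders and phrasings for every slot at once, which hints at why natural language applications cost more to develop and tune than forms fill-in.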

Taking natural language further, applications can be set up with very broad questions such as, "What would you like to do?" This type of app is similar to an essay quiz, and just as teachers must spend more time grading essay quizzes, these ASR apps require the largest amount of processing resources and development time. The application should allow the caller to speak in any order he or she chooses. For example, in the travel scenario, the caller may say, "I'd like to book a flight to New York from Atlanta... let's see, I guess to LaGuardia next Saturday. I need to get there by four in the afternoon and then return the following morning as early as possible." The more open-ended the natural language questions are, the more varied the answers will be, and the higher the likelihood that the app will have to reel the caller in with a response like, "You would like to fly from Atlanta to LaGuardia on Saturday, June 24th. Is this correct?" (The IVR then waits for a response.) "What time on Saturday would you like to leave Atlanta?"

Essentially, the natural language application trains the frequent caller to speak in complete sentences that are understood with minimal prompting. The system responds to new callers with specific questions that guide them toward the appropriate responses, but the best apps do this in a natural way that feels much like a conversation with an agent. In some cases, open-ended essay-type natural language applications are the most effective way to provide ASR-based self-service, but implementers must be prepared to supply the needed processing power as well as development, debugging and fine-tuning efforts in order to do it correctly.

How Should Your Call Center Deploy ASR Technology?
If it is your first ASR application, make it a forms fill-in app. You will be successful more quickly for a smaller investment with simpler applications, and you can always progress to natural language in a second phase after you experience your first level of success. By taking the simpler approach, the price tag is lower, management buy-in will be easier and you will have the fastest possible return on investment. Callers want more self-service options -- and advanced speech recognition technology is proven and ready for deployment in your call center now.

Roger Reece is Syntellect's corporate vice president of marketing and is responsible for strategic planning, product planning, new business development and corporate communications. He has 25 years of experience in voice and data networks, software development and call center technology. Reece was vice president of marketing for Telecorp Systems for five years before the company's 1996 merger with Syntellect. Prior positions include director of marketing for Melita International, director of product marketing for Syntrex Corporation and director of product development for the Lanier Worldwide division of Harris Corporation.
