TMCnet - World's Largest Communications and Technology Community




[April 12, 2005]

The Secret of Natural Speech Technology: Keep it Simple.

Peter F. Theis, ConServIT

An observation frequently expressed about a call handled by natural speech technology (NST) is “It was so easy. So simple. What’s the big deal?”

Simplicity is the Big Deal. An NST-handled call is so remarkable because it is so unremarkable. Amazing!

An NST-handled call is no more noteworthy than a routine live call. No one extols the virtues of an operator handling a routine call routinely. It doesn’t happen! Instead, a routine call becomes remarkable only when the caller has a negative experience.

The simplicity and ease with which callers are served using NST are a big deal, requiring a leap in technology and common sense in scripting. For those into PCs, it is as big a deal as going from a DOS-based system to Microsoft’s Windows, with its icon-based user interface.

Simplicity. That is THE secret of ConServIT’s NST. But simplicity doesn’t mean that an NST program can’t be incredibly complex and lengthy in its call progress, branching, and programming. Nor does it mean that thought-provoking, open-ended questions can’t be asked. What it does mean is that, to the caller, the call progression is simple, obvious, and intuitive, like falling off a log. A successful NST program has to be caller-centric, first and foremost.

But that simplicity can often be difficult and complex to attain. Anyone can create a complex program that will boggle a rocket scientist’s mind. But to have a call that is simple for a universal audience is a significant accomplishment.

This apparent simplicity for the caller gives ConServIT’s NST a yield (complete records as a percentage of calls answered) as high as or higher than any other alternative, along with greater accuracy and lower cost. Calls are much briefer as well.

How Does NST Make a Call Seem So Simple?

Explaining in these few pages why something that is inherently complex appears so simple and elegant is difficult and perhaps impossible. With that caveat, hopefully, my best efforts will at least provide some insight.

1. Program Objectives

A particularly difficult aspect of an NST program is for the sponsoring organization to decide its objectives and adhere to them. An NST program is objective oriented, and typically objectives are interdependent. They can be changed from time to time. However, if the objectives become moving targets due to frequent changes, simplicity and the broader objectives become victims.

2. Script Layout and Call Progress

Call progress, the sequence of prompts, for an NST handled call is distinctly different from the call progress using any other automated technology. NST call progress generally follows what the caller would anticipate, not what the sponsor wants or how the technology can be most efficiently implemented. An NST program script is not interchangeable with scripts using other automated technologies. It is sufficiently different that porting an NST script to an ASR/IVR system won’t work.

3. Phrases, Wording, and Ambiguity

An NST script is caller centric, whereas, for alternative technologies, it is client and, more particularly, technology centric. The scripts for those systems are much more constrained, having to work around the requirements and limitations of their embedded word recognition technology. NST uses a fundamentally different technological approach.

Because an NST program is designed to be successful with the totality of the callers, the words used in the script must be universally understood. The expression “What is your destination?” will not work because many people don’t know what “destination” means. Rather, the question is worded “Where are you going?” Third-grade English is best.

In any automated script, there can be no tolerance for scripting ambiguity. If the caller responds, “What did you mean?” the call could be lost. Ambiguity includes the perceptions of the caller, outside the specific words used in the script, based on the caller’s experiences and prejudices. One of the reasons that a perfected NST script can be the model for a live script is that all ambiguities and glitches have been recognized and resolved.

4. The NST Technological Difference

In the real world, there is no absolute certainty about the speech characteristics of any caller, the specific utterances that the caller will use, or the quality of the telephone connection. NST is inherently fault tolerant, unlike other technologies that require digital precision in recognizing a particular word or phrase. Like a live agent, NST has the flexibility to adjust for these uncertainties. Other voice automation systems, on the other hand, are inherently less flexible and forgiving.

The alternative systems compensate for their inflexibility by telling the caller how to respond, and by repeating back the answer as understood by the automated technology. Often their scripted prompts become painfully convoluted to compensate for the mismatch between the caller’s thought process and the logic process of the machine. The effort to match the two processes is one of the reasons an ASR/IVR program can be so expensive to implement.

Comparative Illustrations

I have selected a couple of familiar examples of prompts to highlight significant differences between the NST approach and the approach of alternative voice recognition technologies (IVR and ASR).

a. Frequently, embedded in the greeting for voice mail systems is “After you have left your message, please hang up or push 1 for further options.” Do callers really need to be instructed to hang up after leaving a message? This prompt is very patronizing.

“Push 1 for further options” is also badly misplaced. These options concern a message after it has been left, such as adding to a message, erasing a message, hearing it again, etc. Before a caller has even started a message, the options have little relevance and the tag line is a distraction. It is after the message that the options should be offered. So why aren’t these options expressed after a message has been completed?

The “Push 1” option is used by only a very small percentage of the callers (reportedly less than 5%). Yet the additional five seconds of this tag line, seemingly a lifetime to the typical impatient caller, have to be listened to by every caller on every call. It is an unnecessary, unappreciated, and annoying imposition.

With NST, callers would only receive the prompt after messages had been left, making the call flow easier and simpler for everyone.
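The cost of that tag line can be put in rough numbers. The short Python sketch below uses only the two figures quoted above (a five-second tag line and an option used on at most 5% of calls) to estimate how much listening time all callers together spend for each single use of the option. The function name and parameters are illustrative, not part of any ConServIT or voice mail system, and the sketch assumes that every caller hears the full tag line.

```python
def listening_seconds_per_use(tag_line_seconds: float, usage_percent: float) -> float:
    """Seconds of listening time spent across all callers for each one
    caller who actually uses the option.

    Assumes (for illustration only) that every caller hears the whole tag line.
    """
    if not 0 < usage_percent <= 100:
        raise ValueError("usage_percent must be greater than 0 and at most 100")
    return tag_line_seconds * 100 / usage_percent


# Figures quoted in the article: five seconds, used on (at most) 5% of calls.
cost = listening_seconds_per_use(5, 5)
print(f"{cost:.0f} seconds of listening per use of the option")
```

Because the reported usage is below 5%, the result is a lower bound: each use of the option costs callers as a group at least 100 seconds of listening.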

b. The unnecessarily complex prompt “If you want such and such, push or say 1” requires callers to make two and possibly three independent decisions in response. One decision is whether to select the particular issue represented by the “1”. That, in itself, is generally a difficult decision (the caller might be considering “Is that the best choice or should I wait to hear the next option?”). The second decision is whether to respond using touch tone or word recognition. A third decision might well be “Why are they giving me this choice? Is there something wrong with one of these options that could lead me into automation hell?”

The prompt could be “If you want such and such, say one”. If the caller should push “1”, even though not specifically suggested as an option, the touch tone signal could still be recognized. Or the prompt could just say, “Are you calling about such and such?”
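The idea of prompting for one kind of response while quietly accepting the other can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how NST or any particular product works: it assumes some upstream component has already turned the caller's speech into text and the caller's key presses into digit characters, and the names `SPOKEN_ONE` and `selected_option_one` are hypothetical.

```python
from typing import Optional

# Hypothetical set of transcribed forms treated as the caller saying "one".
SPOKEN_ONE = {"one", "1"}


def selected_option_one(spoken: Optional[str] = None,
                        keypad: Optional[str] = None) -> bool:
    """Return True if the caller chose option 1, by voice or by touch tone.

    The prompt only asks the caller to say "one", but a keypad "1" is
    accepted as well, so the caller never has to choose how to respond.
    """
    if keypad is not None and keypad.strip() == "1":
        return True
    if spoken is not None and spoken.strip().lower() in SPOKEN_ONE:
        return True
    return False


print(selected_option_one(spoken="One"))  # spoken response
print(selected_option_one(keypad="1"))    # touch-tone response, not prompted for
print(selected_option_one(spoken="two"))  # some other answer
```

The design choice is that the extra input path lives in the handling logic rather than in the wording of the prompt, so the caller hears one instruction and makes one decision.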

Telling callers to respond to confusing, complex, compound questions is definitionally the antithesis of simplicity. With NST, multiple decisions are seldom required from a single prompt, and when they are, it is only when the prompt avoids any ambiguity. The decision must be easy for the caller. An example of an NST multiple decision prompt with a single easy decision could be “What is your relationship to the person you are calling about?” The first decision in this example would be whether the caller is calling for himself/herself, and the second is, if not, what is the relationship.

NST - Packaged with a Bright Red Ribbon

The call progress and scripting are inextricably linked to the call handling software and technology of the equipment employed. The linkage may be more interrelated than even a computer program is with its operating system - a Mac OS program is only for the Mac and Windows XP is only for the PC.

About ConServIT

ConServIT is packaged to offer a complete service: one-stop shopping. It has the experience in designing and scripting programs to match its clients’ objectives, using technology about which it is the unchallenged expert. Call me at 1-800-994-4400 or email me if you have questions or would like to discuss this further.

Peter F. Theis
a service of Conversational Voice Technologies Corporation



