TMCnet - World's Largest Communications and Technology Community




[March 15, 2005]

Setting Up a Natural Speech Application

By Peter F. Theis, President, Conversational Voice Technologies Corporation

Setting up a natural speech technology (NST) application requires skills in human engineering, a distinctive computer technology, and a specialized field of metrics. Above all, it is an art that depends on hands-on experience.

An NST program, once successfully developed, is good enough to serve as the model script for agents at a live facility. For those willing to think outside the box, the performance metrics of an NST system using that script are consistent enough, and close enough to those of a live call center, that they can even serve as a performance standard for the live center.


The natural speech technology script is derived from what a call center agent says when handling similar calls. Because natural speech scripts are not variants of scripts for other automated speech technologies, they can neither be used by IVR and word recognition systems nor imported from them. Migrating scripts between other automated speech alternatives and either NST or a live center simply is not feasible, although many have tried.

Natural speech scripts emulate how the best agent would handle a call after having handled fifty other calls for the same client on the same day. We know that, as the day goes by, an agent at a live call center deviates from what appears on the prompt screen and instead uses expressions he or she is more comfortable saying.

This is not simply the agent's personal preference, as some would suggest. Rather, the literal meaning of the words in a written or on-screen script may not be what is actually communicated to the other party.

For example, we all know what is intended by the words “Your call is important to us.” When people hear that message on the phone, however, what is communicated is the negative “Your call means little to us,” which conveys insincerity. The communication is the opposite of the literal meaning of the words (which is why call center agents do not say it).

Each prompt, whether from a machine or a live agent, “tells” callers something. “Communication,” in contrast, is what the prompt actually conveys to the caller. The two expressions “tell” and “communicate” are not synonymous: what is communicated subsumes the actual words spoken. The distinction is often very subtle. The agent’s spontaneous revisions of the script reflect a subconscious effort to correct the inadequacies of the scripted words.

ConServIT’s natural speech technology (NST) focuses only on what is communicated, whereas other voice technologies focus principally on what is told to callers. That difference is pervasive and overriding in the implementation of an NST script.

Failing to recognize this significant difference between NST and other automated voice systems, the client will often dictate all or part of the script, changing the wording on the mistaken assumption that callers share the client’s preferences or understanding of what they are being told.

For example, ConServIT might propose, as part of a draft script:

And the ZIP Code?

The client might demand that the prompt be changed as shown below, believing the revision would be more polite, friendlier and clearer for the caller.

May I have the ZIP code please?

Or worse,

Please give me (or Please state) your ZIP code.

Because the client’s variants communicate something negative to the caller, the likelihood of hang-ups, errors and caller frustration increases, and the effect can be cumulative over several prompts. Treating what is communicated as a concept distinct from what is told is a critical element of a successful NST program.
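One way to see why the effect is cumulative is a simplifying model that is ours, not the author's: if each of several prompts independently retains a fraction of callers, the end-to-end yield is the product of those fractions. The sketch below uses hypothetical per-prompt rates for an eight-prompt script; it is an illustration under that independence assumption, not a description of how ConServIT measures calls.

```python
from math import prod
from typing import Sequence


def end_to_end_yield(pass_rates: Sequence[float]) -> float:
    """Overall yield, assuming each prompt independently keeps a share of callers."""
    for p in pass_rates:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each pass rate must be between 0 and 1")
    return prod(pass_rates)


# Hypothetical rates: each prompt keeps 99% of callers vs. 97% of callers.
well_worded = [0.99] * 8
slightly_off = [0.97] * 8

print(round(end_to_end_yield(well_worded), 3))   # 0.923
print(round(end_to_end_yield(slightly_off), 3))  # 0.784
```

Under this toy model, a two-point loss per prompt grows to roughly a fourteen-point loss over eight prompts.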

When a client gives us no choice but to accept its script revisions, we lose the ability to maximize the caller service level, our primary objective, and to minimize expense, our secondary goal. The focus shifts from a high-level, measurable objective to a nondescript, subjective standard that undermines both objectives. The implementation becomes client-centric rather than caller-centric.

The seriousness of this dichotomy is aggravated by the high “yield” expected from an NST program (the “yield” is the percentage of received calls that are successfully completed). When the yield is already very high, it is difficult to raise it further and easy to make it plummet. A good NST program at ConServIT’s call center achieves a yield above 90% (higher than a live center). A script change that seems like a token edit to the client might raise the yield a couple of percentage points for the reason the client gives (such as clarity) but lower it by a much larger amount for another reason (such as sounding patronizing), for a significant net loss. There is almost always a trade-off.
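The trade-off can be sketched as arithmetic on failure counts. In this minimal example the failure reasons and all counts are hypothetical, chosen only to mirror the scenario in the text: a revision halves one kind of failure while multiplying another, and the yield (completed calls divided by received calls) drops overall.

```python
from typing import Mapping


def call_yield(received: int, failures_by_reason: Mapping[str, int]) -> float:
    """Yield = share of received calls that were successfully completed."""
    if received <= 0:
        raise ValueError("received must be positive")
    failed = sum(failures_by_reason.values())
    if failed > received:
        raise ValueError("more failures than calls received")
    return (received - failed) / received


# Hypothetical counts for 1,000 calls before and after a client-requested
# script change; the reasons and numbers are illustrative, not measured data.
before = {"unclear prompt": 40, "patronizing tone": 10}
after = {"unclear prompt": 20, "patronizing tone": 60}

y_before = call_yield(1000, before)
y_after = call_yield(1000, after)
print(f"before: {y_before:.1%}  after: {y_after:.1%}  "
      f"net change: {(y_after - y_before) * 100:+.1f} points")
```

Here the change "works" for clarity (20 fewer failures) yet costs 50 more for tone, so the yield falls from 95.0% to 92.0%.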

The Voice Recording

ConServIT’s voice programs generally use a generic voice. At live call centers, the caller could be served by a next-door neighbor, a child’s college roommate, or someone in India about whom the caller knows nothing. A ConServIT voice could belong to the telephone receptionist in the office next door; ConServIT selects whichever voice produces the best results. Those are the people the real world expects to answer its calls.

“Persona” voices, the “professional” voices used for the client’s other IVR applications, and voices promoted by the client’s advertising agency are not suitable for NST applications, where results are paramount. The objectives of the NST voice and the IVR voice diverge: the former is caller-centric and results-oriented, the latter client-centric and image-oriented.

Metrics, Metrics and Metrics

After a new program has been implemented, ConServIT measures how callers respond, and changes and corrections are made to the program based on those test metrics. Testing is an arduous, time-consuming, high-level chore. All the confidence and the best technology are worthless if the whiz-bang does not work with real world, bona fide callers: the target audience. For NST, that audience is the real world caller, not the client’s staff.

This simple principle is far from obvious. It is the opposite of the generic IVR/word recognition approach, which generally relies on small test panels to suggest improvements. The problem with that approach is that everyone involved (the panelists, the program designers, the technicians and the client staff) falls outside the statistical population of people who will actually use the system. The two sets are mutually exclusive!

A real world caller to a call center has little or no preparation and no preconditioning, and only wants a service provided or a problem resolved. That caller is not calling to find out how progressive the company is or to hear about expanded services (the caller’s ears are turned off), and is not calling to please anyone or to be on the next panel. This real world is what ConServIT’s natural speech technology is all about, and it is what ConServIT deals with daily in its call center.

Perhaps the characteristic that most distinguishes ConServIT’s natural speech technology from generic voice systems (and, incidentally, from live call centers as well) is that NST programs are designed to meet measurable objectives. To accomplish these objectives, NST metrics focus on the unsuccessful calls, not the successful ones, so that program adjustments are rational and measurably improve the yield.

Details about callers who successfully completed their calls are largely irrelevant for NST. What NST needs to know about are the calls that were NOT successfully completed.
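Focusing on unsuccessful calls can be as simple as ignoring completed calls and tallying where the others ended. This is a minimal sketch; the `CallRecord` layout, its `last_prompt` field and the prompt names are hypothetical, since the article does not describe how ConServIT logs calls.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class CallRecord:
    # Hypothetical record layout; a real system would log its own fields.
    call_id: str
    completed: bool
    last_prompt: Optional[str]  # prompt the caller was on when the call ended


def failure_points(calls: Iterable[CallRecord]) -> Counter:
    """Count unsuccessful calls by the prompt at which they ended.

    Completed calls are skipped entirely; they say little about
    what needs to change in the script.
    """
    return Counter(c.last_prompt or "unknown" for c in calls if not c.completed)


calls = [
    CallRecord("c1", True, None),
    CallRecord("c2", False, "zip_code"),
    CallRecord("c3", False, "zip_code"),
    CallRecord("c4", False, "greeting"),
    CallRecord("c5", True, None),
]
for prompt, n in failure_points(calls).most_common():
    print(prompt, n)
```

The prompt with the most abandoned calls is the natural first candidate for a script adjustment.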

Obtaining those metrics is a science in itself. The metrics cannot be obtained through panel research. Nor can they be developed by calling back a sample of callers whose calls were incomplete, or by transferring to a live agent a caller who has just hung up for some unknown reason.

When putting an NST program online, ConServIT asks its clients to identify each of their “test” calls to our systems so that those calls can be distinguished in our statistics from real world calls. A few unidentified test calls are not a problem when they are mixed in with thousands of real world calls, but they are a significant statistical hurdle when only a couple dozen calls are coming from the real world, as often happens when a program is just beginning to roll out and initial program changes are being made.
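A small worked example shows why a handful of test calls matters at low volume. In this sketch the `is_test` flag stands in for whatever mechanism a client uses to identify its test calls (a hypothetical name), and the call counts are illustrative: 24 real calls early in a rollout plus 6 test calls in which the testers hung up partway through.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Call:
    completed: bool
    is_test: bool = False  # hypothetical flag set when the client identifies a test call


def real_world_yield(calls: Iterable[Call]) -> float:
    """Yield computed over real-world calls only, excluding identified test calls."""
    real = [c for c in calls if not c.is_test]
    if not real:
        raise ValueError("no real-world calls to measure")
    return sum(c.completed for c in real) / len(real)


# Illustrative early rollout: 24 real calls, 21 of them completed.
real_calls = [Call(True) for _ in range(21)] + [Call(False) for _ in range(3)]
tests_identified = [Call(False, is_test=True) for _ in range(6)]
tests_unidentified = [Call(False) for _ in range(6)]

print(f"{real_world_yield(real_calls + tests_identified):.1%}")    # tests excluded
print(f"{real_world_yield(real_calls + tests_unidentified):.1%}")  # tests counted as real
```

With the test calls identified, the measured yield is 87.5%; left unidentified, the same six calls drag it to 70.0%. With thousands of real calls, the same six would barely register.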

One client, just before a program rolled out, made several unidentified test calls (many of which we recognized and isolated) and then even accused us of trying to manipulate its results by insisting that it identify its “test” calls. In fact, the client was unintentionally manipulating our statistics by not identifying its test calls.

Preparation of the Call Information for the Client

To reduce our clients’ expense and investment, ConServIT formats all the caller information in virtually any layout and field order the client requires, and transmits it as ASCII text ready to be merged into the client’s database upon receipt. It is that easy and inexpensive for the client.
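As an illustration of delivering records "in virtually any format and field order," here is a minimal sketch that writes caller records as delimited ASCII text in a client-specified column order. The delimited layout, the field names and the sample record are assumptions for illustration; the article does not say which formats ConServIT actually produces.

```python
import csv
import io
from typing import Iterable, Mapping, Sequence


def export_ascii(records: Iterable[Mapping[str, str]],
                 field_order: Sequence[str],
                 delimiter: str = ",") -> bytes:
    """Write caller records as delimited ASCII text in the client's field order.

    Fields not listed in field_order are dropped; missing fields are left empty.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(field_order),
                            delimiter=delimiter, extrasaction="ignore",
                            lineterminator="\r\n")
    writer.writeheader()
    for rec in records:
        writer.writerow(rec)
    # Raises UnicodeEncodeError if any value is not plain ASCII.
    return buf.getvalue().encode("ascii")


# Hypothetical record; "call_id" is internal and not requested by the client.
records = [{"name": "Pat Doe", "zip": "06611", "phone": "2035550100", "call_id": "c2"}]
print(export_ascii(records, ["zip", "name", "phone"]).decode("ascii"))
```

Passing a different `field_order` or `delimiter` (for example `"|"`) changes the layout without touching the records themselves.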

I appreciate your comments and thoughts, and encourage you to call or email me at or 1-800-994-4400.




