TMCnet - World's Largest Communications and Technology Community




Publisher's Outlook
March/April 2001

Rich Tehrani

VoiceXML: The Hosted Response To IVR



I am absolutely convinced that communications ASPs (application service providers) are going to be the biggest thing ever to happen to telephony. ASPs change the entire way companies develop and deploy applications. I have been waxing poetic about this group of ASPs for almost a year, and now more than ever I am convinced we're going to see incredible growth in this area of communications.

VoiceXML Growth
One area to pay particular attention to is the VoiceXML market, which I would describe as open, programmable interactive voice response (IVR) with built-in speech recognition. This area is going to witness explosive growth for a number of reasons. First of all, corporations that use proprietary IVR systems are now able to leverage the great number of VoiceXML/XML developers available worldwide. The Internet has spawned these XML developers in record numbers. VoiceXML speech recognition allows anyone to develop reusable grammars that can be leveraged by multiple applications. This ability to reuse code written by others makes this language even more powerful.

The End User Pull
OK, so we now know that VoiceXML is an open language allowing anyone with a Web browser to develop and share code. Why will anyone need new voice apps in a Web-enabled world? We all know that the Web has spoiled us, giving anyone immediate and unlimited access to volumes of information on just about any topic. As we become busier and busier, however, our need to retrieve information persists even when we don't have access to a GUI-based computer system. VoiceXML applications such as voice portals give users access to a great wealth of information from any telephone. At last, there is a way to make much of the data found on the Internet available to telephone users. The need for these systems becomes even greater as cell phone minutes become less expensive and cell phones become more portable and ubiquitous. Finally, I have to say that there are many times I find it easier to access simple information like the day's weather by phone rather than by PC.

The Corporate Push
Corporations also have a major incentive to add VoiceXML applications to their arsenals in order to provide another vital channel of service to their potential and current customers. Web content can easily be repurposed for the phone through effectively written VoiceXML programs. In many cases, this allows an entirely new audience access to content that was once only available online.

Customer Service Initiatives
It was predicted that the Internet would quickly kill telephone sales, but in reality the Web has dramatically increased the demand for telephone sales and support. In the January issue of CUSTOMER INTER@CTION Solutions, a sister publication to Communications ASP, I coined Tehrani's Law of Customer Service, which addresses this issue and states, "In the Internet era, it takes 100 times more money to attract a new customer than it does to keep an old one." But how do you provide excellent service when it is tough to find good people at a reasonable salary level? Effectively written VoiceXML programs allow corporations to reduce the number of people staffing their tech support, inside sales, and customer service departments, making these departments more productive and more efficient. Not only does this boost the bottom line, it simultaneously improves service and support levels.

Saving Time
In my opinion, the best part of speech recognition-enabled VoiceXML apps is that customers are able to interrupt the prompts generated by the automated attendant. For example, if you call Amtrak to reserve a train, and the prompt takes 60 seconds to complete, you may interrupt at any time and say, "Give me two round-trip tickets from New York to Atlanta." This of course saves time and increases efficiency.

I recently visited with representatives of Tellme, a CASP (communications ASP) that allows you to develop and run VoiceXML apps on their server for free. Once written, finished applications can be paid for by consumers via per-minute fees. Their development environment is Web based, and they ran me through a few demos such as the standard "Hello World" demonstration that took only a few lines of code to implement.

I decided to go for it and write my own VoiceXML program. I've had a great deal of prior coding experience centered on database languages, Basic, and C, but none of my development ever involved the Web. I wanted to see how easy it would be for a relative beginner like myself to get started with VoiceXML.

At first I looked at the varied and well-documented sample programs that Tellme provides. There are enough samples that you can easily adapt them to build a very powerful application. I wondered how quickly I could adapt the logic of a few sample programs to write a truly useful application for TMC, the parent company of this publication. My intention was to present you with a useful sample program that is easy to follow and powerful in function. For those of you who are beginners, I hope this application helps you get your feet wet in the voice development market, and for those experienced programmers out there, any suggestions are always appreciated.

The application I developed works as follows. At our upcoming trade show Communications Solutions EXPO, we have various conference tracks devoted to constituents like service providers, enterprise conferees, and government users. Within each track we cover the absolute hottest topics in telecom. For example, we have a unique Communications ASP track for every category of attendee so that we may address your individual needs perfectly (see our Communications Solutions EXPO Spring 2001 preview for more information). I decided to write an app that lets you say your type of company, at which point you will be told the name of the specific session that makes the most sense for you.

The full program can be seen in Figure 1. Whether or not you are a programmer, the goal of this article is to show you how easy it is to develop powerful and useful VoiceXML applications.

With this background, let's delve right into the program in Figure 1 and see what makes it tick. Letter A points to a variable named initial_greeting that I set equal to one. Letter D references an if/else statement that plays the long greeting only when initial_greeting is equal to one. Once the long greeting has played, letter E shows that initial_greeting is set to zero, so from then on the statement always plays the shorter greeting shown at letter F.

Letter B defines the grammar of this program, allowing it to ascertain what company type the caller is asking about. If the caller says "enterprise," "corporate," or "corporation," the value "one" is returned. The DTMF entries assign the keypad digits one through seven to the company types, so each type can be selected by voice or through the phone keypad. The returned value is used in sections I and J. The next line gets a bit more intricate. When two words are put in parentheses, they must be said together, in order, to be recognized. If one of the words is preceded by a question mark, that word is optional: the phrase is recognized with or without it, as long as the other word or words in the parentheses are said. Please look closely at the third line in this section. If a caller says the phrase "call center," the word "call" is recognized as its own entry and the word "center" is recognized by the expression (?contact center), in which "contact" is optional. Because "center" alone satisfies that expression, a caller who says "customer center" or "marketing center" may still be recognized as contact center.
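To make the parentheses and question-mark rules concrete, here is a minimal Python sketch. Python is used purely for illustration and is not part of the Tellme environment; the function name expand is hypothetical. The sketch assumes the strict reading of the notation, in which parentheses mean "these words in this order" and a leading question mark marks a word as optional. A real speech recognizer does far more than compare strings, so treat this only as a picture of which phrases the grammar lists.

```python
from itertools import product


def expand(sequence):
    """Return every phrase listed by a parenthesized sequence such as
    "(?contact center)": words appear in order, and a word with a
    leading "?" may be left out."""
    words = sequence.strip("()").split()
    # Each position offers one choice (a required word) or two choices
    # (the word itself, or nothing when the word is optional).
    choices = []
    for word in words:
        if word.startswith("?"):
            choices.append([word[1:], ""])
        else:
            choices.append([word])
    phrases = set()
    for combo in product(*choices):
        phrases.add(" ".join(w for w in combo if w))
    return phrases


# The two parenthesized entries from the "two" line of Figure 1.
accepted = expand("(contact ?center)") | expand("(?contact center)")
print(sorted(accepted))
```

Under this strict reading the two entries together list "contact," "center," and "contact center." The phrase "call center" is not among them as a unit; it is covered because "call" and "center" each appear on the same line of the grammar.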

I think it is important to point out the block-like nature of this language exemplified by letter C, where <prompt> begins the block, and letter G, where </prompt> ends the block. Continuing right along, the next block named <nomatch> is executed if a word or phrase is not understood. You can see in this block that users are told that they were not understood, at which point they are played the shorter introductory prompt. Reprompt is responsible for directing the program back up to the prompt portion of the program.

Letter H refers us to the area of the program that is executed when an utterance is recognized, and I refers to a result of "one," in which case letter J actually uses text-to-speech technology to utter the phrase "The most suitable enterprise course is titled Kicking Your Business Communications Into the Network Cloud" located in the <audio> block.
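The call flow walked through above can also be pictured as ordinary program logic. The following Python sketch is an illustration only and is not how the Tellme platform executes VoiceXML. The names run_dialog, LONG_GREETING, SHORT_GREETING, and COURSES are hypothetical, the long greeting is abbreviated, and only two of the seven course announcements from Figure 1 are modeled.

```python
# Illustrative stand-ins for the prompts and two of the course
# announcements in Figure 1; the long greeting is abbreviated here.
LONG_GREETING = "Please say your company type so I may determine the targeted course for you."
SHORT_GREETING = "Please say your company type or help"
COURSES = {
    "one": "The most suitable enterprise course is titled "
           "Kicking Your Business Communications Into the Network Cloud!",
    "five": "The most suitable service provider course is titled "
            "Adding Value to Traditional Offerings with Hosted Communications Services",
}


def run_dialog(results):
    """Return what the caller hears for a sequence of recognizer results.

    Each result is an option name modeled in COURSES, or None when the
    utterance was not understood (the <nomatch> case).
    """
    heard = []
    initial_greeting = 1                  # letter A
    for result in results:
        if initial_greeting == 1:         # letter D: long greeting, played once
            heard.append(LONG_GREETING)
            initial_greeting = 0          # letter E
        else:                             # letter F: short greeting
            heard.append(SHORT_GREETING)
        if result is None:                # <nomatch> block
            heard.append("Sorry, I didn't understand")
        else:                             # letters H to J: <filled> and <result>
            heard.append(COURSES[result])
        # Every branch in Figure 1 ends with <reprompt />, so the loop
        # goes back to the prompt for the next utterance.
    return heard


for line in run_dialog([None, "one"]):
    print(line)
```

Running the sketch with one misunderstood utterance followed by "enterprise" plays the long greeting, the apology, the short greeting, and then the enterprise course title, which is the order of events the figure describes.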

I invite you to try this application for yourself by dialing 877-933-8355, as it will give you an incredible taste of just how powerful a simple VoiceXML application can be. Using an environment like Tellme's, or a competitor's like Voxeo's, allows anyone with Web access to develop and deliver sophisticated VoiceXML applications with no hardware investment at all. It should be noted that just about any Web content can be accessed and read using VoiceXML, allowing the development of an infinitely large number of useful applications.

This language is so simple to learn and the library of reusable code objects is growing so quickly that anyone can start a business based on these simple Web-based development environments. Here is a great application that I wish someone would write so I could access it as I drive home from work. Allow me to select a restaurant and then allow me to select the type of food I am in the mood for, and read me the menu choices available that contain chicken, fish, or whatever else I request. When I am ready to choose, allow me to order my food, at which point my order should be faxed or e-mailed to the restaurant and delivered to my house soon after I arrive, or made available for pickup. I think busy people would pay a small monthly fee for this service, and I am sure restaurants would be interested in increasing their business and thus participating in this program (maybe even paying a fee per order). And of course the national pizza delivery companies should trip over themselves for a chance to sponsor each call-in.

The opportunity is phenomenal for a reseller to use these tools in order to sell finished applications to corporate customers or even service providers. Best of all, these tools allow anyone with access to a phone and a browser to develop applications that can generate revenue. Now anyone can start a communications business with absolutely no equipment investment. If you have some powerful applications you've developed in VoiceXML and want to share your ideas with our readers, please send them to me.


Figure 1: Sample VoiceXML Code.
Please note that "communications A Es Pea" is used throughout this code as the phonetic spelling of "communications ASP".


<?xml version="1.0" ?>
<!--
Communications Solutions Expo, May 23-25, 2001 Course Example.
Targeted selection of Communications ASP courses.
-->
<vxml>
  <meta name="scoping" content="new" />
  <!-- [A] -->
  <var name="initial_greeting" expr="1" />
  <form id="getcourse">
    <field name="course_number">
      <!-- [B] -->
      <grammar>
<![CDATA[
[dtmf-0 zero 0 help] {<option "help">}
[dtmf-1 one enterprise corporate corporation] {<option "one">}
[dtmf-2 two call (contact ?center) (?contact center)] {<option "two">}
[dtmf-3 three developer (?systems integrator) consultant] {<option "three">}
[dtmf-4 four (?federal government) federal municipal authority fedcom] {<option "four">}
[dtmf-5 five service provider phone telco cable wireless] {<option "five">}
[dtmf-6 six (?e commerce) (e sales) (e service) (e business)] {<option "six">}
[dtmf-7 seven (?market research) analysis statistics charts graphs] {<option "seven">}
]]>
      </grammar>
      <!-- [C] -->
      <prompt>
        <if cond="initial_greeting == 1">
          <!-- [D] -->
          <audio>Please say your company type so I may determine the targeted communications A ES Pea course for you at Communications Solutions Expo, May 23 through 25 2001 at the Washington Convention Center. If you are interested in market research or e commerce, please specify which. You do not need to wait for prompts to finish before speaking. Please say help or press zero for more information</audio>
          <!-- [E] -->
          <assign name="initial_greeting" expr="0" />
        <else />
          <!-- [F] -->
          <audio>Please say your company type or help</audio>
        </if>
      <!-- [G] -->
      </prompt>
      <nomatch>
        <audio>Sorry, I didn't understand</audio>
        <reprompt />
      </nomatch>
      <noinput>
        <audio>Sorry, I didn't hear you</audio>
        <reprompt />
      </noinput>
      <help>
        <audio>Please say the type of company you are to determine the targeted communications A ES Pea course for you. Examples include enterprise, contact center, developer, integrator, government, or service provider. If you are interested in market research or e commerce, please specify which. You may also say one through seven to hear each course name.</audio>
        <reprompt />
      </help>
      <default>
        <reprompt />
      </default>
      <!-- [H] -->
      <filled>
        <!-- [I] -->
        <result name="one">
          <!-- [J] -->
          <audio>The most suitable enterprise course is titled Kicking Your Business Communications Into the Network Cloud!</audio>
          <reprompt />
        </result>
        <result name="two">
          <audio>The most suitable contact center course is titled The Contact Center as Customer: Communications A ES Peas Answer the Call!</audio>
          <reprompt />
        </result>
        <result name="three">
          <audio>The most suitable developer course is titled Building a Communications A ES Pea From Scratch</audio>
          <reprompt />
        </result>
        <result name="four">
          <audio>The most suitable government course is titled The Absolute Service Provider: Government as Communications A ES Pea</audio>
          <reprompt />
        </result>
        <result name="five">
          <audio>The most suitable service provider course is titled Adding Value to Traditional Offerings with Hosted Communications Services</audio>
          <reprompt />
        </result>
        <result name="six">
          <audio>The most suitable e sales e service course is titled The Hosted Commerce Trend: From Sale to Delivery</audio>
          <reprompt />
        </result>
        <result name="seven">
          <audio>The most suitable Comm, Trends course is titled The Market Opportunity for Communications A ES Peas</audio>
          <reprompt />
        </result>
      </filled>
    </field>
  </form>
</vxml>

XML And VoiceXML Introduced

In order to give you some background on XML and VoiceXML, I decided to publish relevant portions of Tellme's documentation.

What is XML?
XML is the standard format for defining structured documents and data on the Web. XML enables programmers to define an arbitrary vocabulary, formally known as a schema, using a standard, well-defined, easily-parsed syntax. One XML schema might describe customer information, another might describe a mathematical equation, and yet another might describe a recipe for chocolate chip cookies.

What makes up an XML document?
An XML document is comprised of one or more named elements organized into a nested hierarchy. An element consists of an opening tag, some data, and a closing tag. A tag consists of a name preceded by a less-than symbol (<) and followed by a greater-than (>) symbol. For any given element, the name of the opening tag must match that of the closing tag. A closing tag is identical to an opening tag except that the less-than symbol is immediately followed by a forward-slash (/).

<welcome>Welcome to Tellme University</welcome>
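As a quick way to see these rules in action, the short Python sketch below feeds the element above to a standard XML parser. Python and its xml.etree.ElementTree module are my choice for illustration and are not part of the Tellme documentation; the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET

# Parse the example element and look at its two parts.
element = ET.fromstring("<welcome>Welcome to Tellme University</welcome>")
print(element.tag)   # the name shared by the opening and closing tags
print(element.text)  # the data between the tags


def is_well_formed(document):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# The closing tag must carry the same name as the opening tag.
print(is_well_formed("<welcome>Welcome</welcome>"))
print(is_well_formed("<welcome>Welcome</goodbye>"))
```

The parser accepts the first document and rejects the second, because the name in the closing tag does not match the name in the opening tag.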

If an element does not contain any data, the opening and closing tags can be combined. Observe the location of the forward slash just prior to the greater-than (>) symbol.
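An element of this kind appears throughout Figure 1 as <reprompt />. The following Python sketch, again using the standard xml.etree.ElementTree module purely as an illustration, shows that the combined form and the separate opening and closing tags describe the same empty element.

```python
import xml.etree.ElementTree as ET

# The combined form and the two-tag form of an element with no data.
combined = ET.fromstring("<reprompt />")
separate = ET.fromstring("<reprompt></reprompt>")

# Both yield an element with the same name and no data.
print(combined.tag, combined.text)
print(separate.tag, separate.text)

# When written back out, the empty element uses the combined form.
print(ET.tostring(combined, encoding="unicode"))
```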


How does VoiceXML differ from XML?
VoiceXML is a derivative of XML that describes audio prompts as well as call flow.


Technology Marketing Corporation

35 Nutmeg Drive Suite 340, Trumbull, Connecticut 06611 USA
Ph: 800-243-6002, 203-852-6800
Fx: 203-866-3326



© 2018 Technology Marketing Corporation. All rights reserved | Privacy Policy