TMCnet - World's Largest Communications and Technology Community
 

October 1998


Nuance Conversational Transactions Suite

Nuance Communications
1380 Willow Road
Menlo Park, CA 94025
Ph: 650-847-0000
Fx: 650-847-7979
Web site: www.nuance.com

Price: Contact vendor.


RATINGS (0-5)
Installation: 5
Documentation: 5
Features: 4.5
GUI: N/A
Overall: A


While they were intended as conveniences, early speech recognition applications occasioned more than a little frustration. Indeed, disgruntled users found these applications nearly as balky as ordinary, non-speech-enabled auto-attendants. The reason? An auto-attendant, speech-enabled or not, puts all the convenience on the computer side of any human-computer interaction.

Inputting information through a menu-driven process creates a sensation much like the one we feel when we sit behind the wheel of an idling car, while we wait for a break in traffic. We fancy the traffic is inspired by some perverse agency bent on vexing us, and we resign ourselves to fitting in however we can, as stressful as that may be.

Fortunately, the old, "fit in" model of speech recognition is giving way to a new approach: natural language recognition. This new approach lets the caller "own the road," as it were. The caller speaks naturally, and the speech recognition application parses and channels the input on behalf of the computer. The computer no longer requires the caller to parse his or her utterances according to some infernal script or menu.

Now, if the new speech recognition lets callers own the road, the question remains: Who is going to build the road? Developers who take advantage of advanced speech recognition engines. One such engine, the Nuance Speech Recognition System, is discussed in this review. So, buckle your seatbelts, and let's get started.

INSTALLATION
The Nuance system, which ships on a single CD, installs on selected platforms, including Windows NT, Solaris, AIX, and SCO OpenServer 5. We installed the product on a Pentium 200 MMX computer with 64 MB of RAM running Windows NT.

The installation involved little more than inserting the CD in the computer's CD-ROM drive. Once we had the CD in place, it ran automatically, so we did not have to execute a setup file. From this point, right until the very end of the installation, the entire operation was self-propelled. The system never once asked for a change in directories or for help in finding any specific Windows files. This ease and rapidity compelled us to give the Nuance system the highest allowable score for installation.

DOCUMENTATION
We received two documents with the Nuance system. One document, a thin booklet, was entitled Getting Started. The other document, a 400-page tome, constituted the Developer's Manual. The Getting Started booklet provided information on the platforms supported by Nuance, as well as the audio setups for those platforms. It also reviewed some issues specific to the Solaris operating system. The remainder of the booklet described the commands necessary to run applications on the system. The Developer's Manual covered many aspects of developing applications, from compiling recognition packages to run-time support to special topics.

We didn't stop at reviewing the literature supplied by Nuance. We also explored the Nuance home page. We noticed that the section of the Nuance site devoted to Nuance6 was quite thorough. In this section, topics such as descriptions, features, supported platforms, the developer's toolkit, and developer training were covered. In addition, new features of the Nuance system, such as the Speech Verifier, were illustrated.

Perhaps the most impressive portion of the site was the Nuance demo section, which includes four demos that any visitor can access. Visitors can use these demos to test the accuracy and usability of the system. We encourage anyone to go to the Nuance Web site and try these very useful demos.

FEATURES
Core Client/Server Software

  • Speech recognition, which eliminates the need for complicated and frustrating touch-tone call attendants.
  • Natural language capabilities, which allow the system to recognize speech patterns rather than just single words. (Thus, the system can pick out key words. If the system fails to recognize a particular word, it does not reject the entire phrase.)
  • Telephony control, which allows the Speech Server to control a variety of telephony processes.
  • An easy-to-use, powerful API, which facilitates the creation of speech recognition applications.
  • SQL query integration, which gives developers access to a wide variety of database functions.
  • Barge-in capabilities, which allow callers to use the system without having to insert unnatural pauses.
  • Confidence scores, which indicate the reliability of the matches the system makes between the input it receives and the words in its vocabulary. (Confidence scores are configurable to create systems geared towards speed or accuracy.)

Nuance Verifier

  • Speech recognition as a means of instituting security.
  • Simultaneous recognition and authentication of speech, which allows for very fast, real-time verification of speakers.

Developer's Toolkit

  • Grammar specification.
  • Natural language specification.
  • SpokenSQL, which can be used to generate a database query from speech.
  • Xwavedit, which is a means of recording prompts. (Allows the developer to edit prompts for maximum recognition accuracy and efficiency.)

OPERATIONAL TESTING
Although developing our own application was beyond the scope of our review, we acquainted ourselves with the Nuance system by running a sample application and by reviewing a few of the demonstration programs we found on Nuance's home page.

Sample Application
This application, working from a specified vocabulary, prompts the user for input, compares the input to the words in the vocabulary, and returns the word that it deems the closest match, along with a confidence score. We experimented quite a bit - so much, in fact, that we started to feel at one with our microphone-equipped headset. In the course of our work, we took careful note of the confidence scores, which displayed such variation that we satisfied ourselves that the system was acting properly.
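The behavior described above (compare the input to a vocabulary, return the closest match and a confidence score) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Nuance's API: a real engine scores acoustic input, whereas this sketch compares text strings with the standard library's difflib, and the function names and the 0.6 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def best_match(heard, vocabulary):
    """Return (word, score) for the vocabulary entry most similar to `heard`.

    The score is a text-similarity ratio between 0.0 and 1.0. It stands in
    for the acoustic confidence score a real recognizer would report.
    """
    scored = [
        (word, SequenceMatcher(None, heard.lower(), word.lower()).ratio())
        for word in vocabulary
    ]
    return max(scored, key=lambda pair: pair[1])


def recognize(heard, vocabulary, threshold=0.6):
    """Accept the best match only if its score reaches the threshold.

    Raising the (hypothetical) threshold favors accuracy; lowering it
    favors always returning an answer.
    """
    word, score = best_match(heard, vocabulary)
    return (word if score >= threshold else None), score


if __name__ == "__main__":
    vocabulary = ["dallas", "dulles", "denver"]
    print(recognize("dulles", vocabulary))
    print(recognize("xyzzy", vocabulary))
```

An application would typically re-prompt the caller when the result is rejected rather than act on a low-scoring match.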

Demonstration Programs
Two of the demonstrations, Travel Plan and Better Bank, gave us a chance to evaluate Nuance's natural language capabilities. Also, Travel Plan let us check out Nuance's barge-in feature. Another feature, speech verification, was displayed to advantage in the Nuance Verifier demo. And, finally, the Stock Quotes demo let us work with a system that boasted an enormous vocabulary.

Travel Plan: This demo, which was an interactive, real-time travel planner, showed off how a Nuance-based application could recognize alternative names for airports. (We need hardly add that the ability to cope with alternative names is one of the manifestations of natural language recognition.)

We started by referring to an airport first as JFK, then as Kennedy. The system responded appropriately, recognizing that both names corresponded to the same airport. So, we decided to try something a little trickier. We decided to refer to Dulles, for Dulles International Airport. We supposed the system might mistake Dulles for Dallas, which sounds much the same. We supposed wrong, however. The system did in fact recognize Dulles, and we had to admit we were impressed.
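One plausible way for an application to treat JFK and Kennedy as the same airport is an alias table that maps every accepted spoken name to a single canonical code. The sketch below assumes this approach. We do not know how the Travel Plan demo is actually built, and the alias lists and function names are illustrative only.

```python
# Illustrative alias table: canonical airport code -> accepted spoken names.
# The choice of DFW for "Dallas" is an assumption made for this example.
AIRPORT_ALIASES = {
    "JFK": ["jfk", "kennedy", "john f kennedy"],
    "IAD": ["dulles", "dulles international", "washington dulles"],
    "DFW": ["dallas", "dallas fort worth"],
}


def build_lookup(aliases):
    """Invert the alias table so each spoken name points at its code."""
    lookup = {}
    for code, names in aliases.items():
        for name in names:
            lookup[name] = code
    return lookup


LOOKUP = build_lookup(AIRPORT_ALIASES)


def resolve_airport(phrase):
    """Return the airport code for a recognized phrase, or None if unknown."""
    normalized = " ".join(phrase.lower().split())
    return LOOKUP.get(normalized)


if __name__ == "__main__":
    for spoken in ["JFK", "Kennedy", "Dulles", "Dallas", "Narita"]:
        print(spoken, "->", resolve_airport(spoken))
```

Note that the table only resolves names once the recognizer has produced the right words; telling Dulles from Dallas in the audio is the engine's job.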

Before we moved on to the next demo, we made sure to check out the system's barge-in capability, that is, the system's ability to accept user input even if the user supplies it before the prompt has finished playing. This capability has its advantages. It lets callers familiar with the menu move through it more quickly. However, it can increase the potential for errors. Hence, good barge-in functionality accommodates impatient callers without sacrificing accuracy.

Since we're naturally impatient, we had no difficulty testing the barge-in functionality. For example, after we had heard only two of three flight options, we knew which option we wanted. So, we barged in, not caring to hear the third option. The system responded exactly as it should have. It stopped reading the flight information, and it asked us if we would like to obtain pricing information on that particular flight.

Better Bank: In this demo, Nuance's Natural Grammar and Natural Language capabilities are displayed. These capabilities go beyond the recognition of individual words. Instead, the idea is to recognize alternative word combinations, sparing the user the challenge of speaking information in any particular predefined format.

The few tests we tried here gave us favorable responses. When asked how much we wanted to transfer from checking into savings, we responded, "Fifty-seven dollars and thirty-two cents." Later, in response to the same question, we said, "Fifty-seven thirty-two." In both cases, the demo transferred the correct amount.
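To show what accepting both phrasings involves, here is a minimal Python sketch that normalizes either utterance to the same amount. It is our own illustration, not the demo's grammar: it handles only dollar and cent values from 0 to 99, assumes the first number is dollars and the second is cents, and all names in it are hypothetical.

```python
from decimal import Decimal

UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
FILLER = {"dollar", "dollars", "and", "cent", "cents"}


def parse_numbers(tokens):
    """Greedily group words into numbers 0-99: fifty seven thirty two -> [57, 32]."""
    numbers = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in TENS:
            value = TENS[token]
            # A tens word may be followed by a units word from one to nine.
            if i + 1 < len(tokens) and 1 <= UNITS.get(tokens[i + 1], 0) <= 9:
                value += UNITS[tokens[i + 1]]
                i += 1
        elif token in UNITS:
            value = UNITS[token]
        else:
            raise ValueError(f"unrecognized word: {token}")
        numbers.append(value)
        i += 1
    return numbers


def parse_amount(utterance):
    """Turn a spoken amount into a Decimal; first number is dollars, second is cents."""
    cleaned = utterance.lower().replace("-", " ").replace(",", " ").replace(".", " ")
    numbers = parse_numbers([t for t in cleaned.split() if t not in FILLER])
    if len(numbers) == 1:
        dollars, cents = numbers[0], 0
    elif len(numbers) == 2:
        dollars, cents = numbers
    else:
        raise ValueError("expected one or two numbers")
    return Decimal(dollars) + Decimal(cents) / 100


if __name__ == "__main__":
    print(parse_amount("Fifty-seven dollars and thirty-two cents."))
    print(parse_amount("Fifty-seven thirty-two."))
```

A production grammar would also need to handle hundreds, thousands, and cents-only amounts, all of which this sketch ignores.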

We did have one problem with the demo, however. When we indicated we wanted to make a payment on our American Express bill, the system asked us to indicate the amount. We said, "Pay in full." However, the system told us it didn't understand. Then, we tried other, equivalent phrases, to no avail.

Finally, we tried a question: "How much do I owe?" The system then asked us if it was correct that we wanted to pay twelve dollars. Well, that wasn't the balance. Perhaps "twelve" was the closest match the system could produce. If so, it would appear the demo's programming simply didn't anticipate the sort of input we provided. We don't suppose, however, that our problem had anything to do with the underlying speech recognition engine.

Nuance Verifier: We called into the demo and enrolled by giving our seven-digit phone number. The system then asked one of our engineers to repeat the phrase "My voice is my password" three times until it was satisfied that it could recognize his speech patterns.

Then, to proceed with the demo, we had a second engineer attempt to log into the first engineer's account. When the system prompted the second engineer to say, "My voice is my password," he complied, but the system refused to let him access the account. Thus, we confirmed the new speech verification feature actually worked.
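The enroll-then-verify flow we followed can be pictured with a toy model: average several enrollment samples into a stored voiceprint, then accept a later attempt only if it is similar enough to that voiceprint. The Python sketch below assumes each utterance has already been reduced to a short list of numbers. The feature values, the cosine-similarity measure, and the 0.95 threshold are all our own assumptions and say nothing about how the Nuance Verifier works internally.

```python
import math


def mean_vector(samples):
    """Average several equal-length feature vectors into one voiceprint."""
    count = len(samples)
    return [sum(column) / count for column in zip(*samples)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify(voiceprint, attempt, threshold=0.95):
    """Accept the attempt only if it is close enough to the enrolled voiceprint."""
    return cosine_similarity(voiceprint, attempt) >= threshold


if __name__ == "__main__":
    # Three enrollment repetitions of the pass phrase (made-up feature values).
    enrollment = [[1.0, 0.2, 0.1], [0.9, 0.3, 0.1], [1.1, 0.1, 0.1]]
    voiceprint = mean_vector(enrollment)
    print(verify(voiceprint, [1.0, 0.25, 0.1]))  # same speaker, slight variation
    print(verify(voiceprint, [0.1, 1.0, 0.8]))   # a different speaker
```

The threshold trades convenience against security: a stricter value rejects more impostors but also turns away more legitimate callers.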

Stock Quotes: Finally, we came to what is probably the best-known application of the Nuance system, the Stock Quotes demo. In 1996, Nuance developed a system for Charles Schwab & Co. This system, which lets the company offer stock quotes to its clients, needs an enormous vocabulary, for there are thousands of stocks clients might ask about. In fact, there are over thirteen thousand stocks, mutual funds, and market indicators.

To make matters even more complicated, many of these stocks may be referenced in multiple ways, all of which have to be recognized by the system. That such a large system should work so well is impressive, for as more words are added to any system's vocabulary, the confidence levels for the matches delivered by the system tend to decline. Thus, in such a system, it is often a good idea to read the caller's input back to the caller, so the caller knows whether the information the system provides really is pertinent to the original input. Also, it might help to read back some ancillary information. For example, in the case of a company name, it might help to read back the company's city and state, the better to distinguish any given company from sound-alike companies.
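The read-back idea suggested above can be sketched as a small prompt builder: with one candidate, confirm the name; with several sound-alike candidates, add city and state so the caller can tell them apart. This Python sketch is our own illustration. The Listing type, the prompt wording, and the fictional companies are hypothetical and are not taken from the Schwab system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    """A hypothetical record for one quotable company."""
    symbol: str
    name: str
    city: str
    state: str


def confirmation_prompt(candidates):
    """Build the prompt the system would read back before quoting a price."""
    if not candidates:
        return "Sorry, I did not find a match. Please say the company name again."
    if len(candidates) == 1:
        return f"Getting a quote for {candidates[0].name}. Is that correct?"
    # Several sound-alike matches: add city and state to distinguish them.
    options = ", or ".join(
        f"{c.name} of {c.city}, {c.state}" for c in candidates
    )
    return f"Did you mean {options}?"


if __name__ == "__main__":
    # Fictional companies, used only to exercise the prompt builder.
    matches = [
        Listing("XMPA", "Example Data", "Austin", "Texas"),
        Listing("XMPB", "Exemple Data", "Boston", "Massachusetts"),
    ]
    print(confirmation_prompt(matches))
    print(confirmation_prompt(matches[:1]))
```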

ROOM FOR IMPROVEMENT
It appears Nuance believes (and rightly so) that its role is to provide the basis for voice-based systems, and that creating working speech recognition applications is up to developers. Thus, Nuance concentrates on refining its speech recognition engine. And, while Nuance doesn't neglect to give developers some guidance (with the Developer's Toolkit, for example), it hasn't gone so far as to release a full application generator. We would like to see a development tool of this sort specifically aimed at creating a Nuance-compliant application. Such a tool would be a convenience to developers, and it would (more than incidentally) promote Nuance's interests.

Nuance could look after its own interests in yet another way, and yet again extend a convenience to other parties, by eliminating whatever problem caused our difficulties with the Better Bank demo. (We suppose the problem is a limited vocabulary.) Of course, this problem doesn't raise any issues with the speech recognition engine itself. We just feel the demo should do justice to the engine. So, you might consider our suggestion a backhanded compliment.

CONCLUSION
The continued evolution of speech recognition seems assured. This evolution is driven, of course, by the need for more natural human-computer interfaces. With the right interfaces, people will no longer need to adapt to the computer's way of working. Instead, it will be the computer that does the adapting; interfaces will acquire whatever attributes are needed to maximize user convenience. One new attribute that is already facilitating human-computer interactions is, as we've seen, natural language recognition. This attribute will soon enhance many applications, thanks to tools such as those provided by Nuance and other companies.

 







Technology Marketing Corporation

800 Connecticut Ave, 1st Floor East, Norwalk, CT 06854 USA
Ph: 800-243-6002, 203-852-6800
Fx: 203-866-3326

General comments: tmc@tmcnet.com.
Comments about this site: webmaster@tmcnet.com.


© 2013 Technology Marketing Corporation. All rights reserved.