
TMC Labs
April 2000

 

SpeechWorks 6.0

SpeechWorks International
695 Atlantic Avenue
Boston, MA 02111
P: 888-SAY-DEMO
Web: www.speechworks.com 

Price: $500-$1,500 per port, depending on size


RATINGS (0-5)
Installation: 5
Documentation: 5+
Features: 4.75
GUI: 5
Testing: 4.75
Overall: A


SpeechWorks 6.0 is a tool for enabling speech recognition (SR) in communications software. Developers create speech recognition functions using "DialogModules," which they port to their main C, C++, ActiveX, or Java application. For developers who prefer a graphical environment, SpeechWorks partners with app-gen makers like InterVoice-Brite, Aspect, and Artisoft. New features for version 6 include more modules, better tuning tools, foreign language support, caller personalization, and scalability.

INSTALLATION
We tested a small-scale version with a two-port Dialogic board. The computer we used was a relatively new (and recently FDISKed) machine running Windows NT Server 4.0. Installing the SpeechWorks software was a no-brainer; everything worked the first time. (Generally, typical systems using Dialogic Host D/x series or Antares series boards have off-the-shelf hardware requirements. However, for real installations where the application must run non-stop, you'll want to be particular about using an industrial PC with plenty of extra RAM and hard disk space, multiple boards and redundant components, and perhaps even running on UNIX instead of Windows NT.) Beyond installing the board(s), the actual SpeechWorks 6 software configuration will vary depending on the application you are building and your stage of the development process. Those issues are addressed in the Features and Operational Testing sections.

DOCUMENTATION
The good news is that SpeechWorks 6 includes several books that, as far as CTI manuals go, are fine works of literature. The bad news is that they're only supplied in electronic formats. The specifics: there are nine .PDF files, two Word documents, and a .TXT file. The .PDF topics include command-line tuning tools, tuning tools/event logs, custom contexts, installation, platform integration, an introduction to speech applications, a service creation guide, reference notes, and more release notes. The two Word documents are redundant copies of the installation guide and release notes, and the text file is the release notes section for the tuning tools.

Additionally, there is online help for the tuning tools feature, and there are several links at the SpeechWorks Web site to white papers and other resources. However, of all these documents, our favorites are the introduction to SR guide and the service creation guide, for unlike most vendors' manuals that simply explain how to use the product, these guides are excellent academic discussions of the concepts of speech recognition. Some parts of these manuals are obviously biased toward SpeechWorks' own products, but even if you're developing with products from Nuance, L&H, etc., we would still consider the SpeechWorks guides to be required reading. A note to SpeechWorks CEO Stuart Patterson: give your technical writers a raise!

FEATURES
Before programmers start working on their SR applications, they can configure the development parameters using SpeechWorks' configuration tool. The tool has options for the integration type (Dialogic host, Dialogic Antares, or custom), DLL locations (for prompts, recording, and call control), data capture switches (tuning tools logging and SMARTRecognizer logging), and data storage (location and megabytes of space allocated). The highlight of this menu is the data capture switches. "Capture Tuning Tools Data" logs recognition success or failure information for every step of a call; the data is viewed and analyzed with the actual tuning tools program. "Capture SMARTRecognizer Data" is a tool that improves recognition accuracy.

The tuning tools feature is, in a word, awesome. Entire calls can be logged, from system answer to hang-up, and the caller and system actions are represented visually in a separate GUI from the other SpeechWorks features. There are seven items to examine for each call, including the DialogModules, the call summary, the call flow, the entire call actions transcript, the start times, the call durations, and the recognition results. A great feature in the show calls option is that you can replay a recording of the caller's input, in the context of where it happened during a call (Figure 2). This way, you can compare the input to the replies that your application expected to hear, and you can better analyze what went wrong, why it went wrong, and how to make the application better.

The other six options are just as useful as show calls. The main screen for the DialogModules summary shows a bar graph of the success or failure of each module used in your application. Also shown is the number of times the module is called up, and a color-coded means of analyzing the results of each occurrence. Selecting any of the modules individually shows even more detail and color-coded bar charts, including specific information on the confidence SpeechWorks' recognition engine had in each heard reply. The call summary screen shows a synopsis of every module and caller reply, including the call duration, which system port was used, etc.
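To make the DialogModules summary more concrete, here is a minimal Python sketch of the kind of per-module tally such a screen implies. It is not SpeechWorks code and does not read the product's real log format: the record layout, the module names ("YesNo", "Date"), and the confidence values are all assumptions invented for illustration.

```python
from collections import defaultdict

# Hypothetical log records: (module name, recognition succeeded, confidence 0..1).
# The module names and numbers are made up; they are not from a SpeechWorks log.
EVENTS = [
    ("YesNo", True, 0.92),
    ("YesNo", True, 0.81),
    ("Date", False, 0.35),
    ("Date", True, 0.74),
    ("YesNo", False, 0.40),
]


def summarize(events):
    """Return per-module call count, success rate, and mean confidence."""
    grouped = defaultdict(list)
    for module, succeeded, confidence in events:
        grouped[module].append((succeeded, confidence))

    summary = {}
    for module, rows in grouped.items():
        calls = len(rows)
        successes = sum(1 for succeeded, _ in rows if succeeded)
        summary[module] = {
            "calls": calls,
            "success_rate": successes / calls,
            "mean_confidence": sum(conf for _, conf in rows) / calls,
        }
    return summary


if __name__ == "__main__":
    for module, stats in summarize(EVENTS).items():
        print(module, stats)
```

A module with a low success rate or low mean confidence is the natural place to start tuning prompts or vocabulary.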

Call flow shows a flow chart of a call's possible routes, which can be viewed with any chart cell as the starting point. Call start times shows another bar graph indicating what times calls arrived in hour-long blocks of a 24-hour period. The call durations option shows a similar chart indicating the average length of time calls lasted. "Recognition results" provides more information on the success of your DialogModules.
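The start-time and duration charts boil down to simple bucketing. The Python sketch below shows one way to compute them, assuming a hypothetical call log of (start time, duration in seconds) pairs. The sample data is invented, and grouping average durations by start hour is our reading of the chart, not something the product documents.

```python
from collections import Counter, defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical call log: (start time, duration in seconds). Values are made up.
CALLS = [
    (datetime(2000, 4, 3, 9, 5), 120),
    (datetime(2000, 4, 3, 9, 40), 90),
    (datetime(2000, 4, 3, 14, 15), 200),
    (datetime(2000, 4, 4, 9, 59), 60),
]


def calls_per_hour(calls):
    """Count calls in each hour-long block of a 24-hour day (index = hour)."""
    counts = Counter(start.hour for start, _ in calls)
    return [counts.get(hour, 0) for hour in range(24)]


def average_duration_by_hour(calls):
    """Average call duration in seconds, keyed by the hour the call started."""
    by_hour = defaultdict(list)
    for start, duration in calls:
        by_hour[start.hour].append(duration)
    return {hour: mean(durations) for hour, durations in sorted(by_hour.items())}


if __name__ == "__main__":
    # Print a crude text bar chart for the hours that received calls.
    for hour, count in enumerate(calls_per_hour(CALLS)):
        if count:
            print(f"{hour:02d}:00 {'#' * count}")
    print(average_duration_by_hour(CALLS))
```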

Back to the SMARTRecognizer feature: "SMART" is an acronym for Self-Modifying Automatic Recognition Tuning Engine. This engine uses another program called LEARN. Basically, it gambles its accuracy on what you'll say; for example, imagine there is an application for using the telephone to listen to statistics for, and order tickets for, local sports teams.

If you call into such a system in Los Angeles, the system would know the players' statistics and ticket ordering information for the Angels and Dodgers (baseball), Clippers and Lakers (basketball), Kings (hockey), and Galaxy (soccer). But, if you asked for the free-throw percentage of Kobe Bryant and then you asked for ticket information, the SMARTRecognizer would assume that you're interested in basketball tickets, so of the six teams' names, it would particularly listen for the Clippers or Lakers. Conversely, if you called and inquired about baseball tickets and then asked for player statistics, it would listen more carefully for (weight the responses in favor of) the 50 players on the Angels and Dodgers.
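The Los Angeles example can be sketched as a simple reweighting step. The Python below is only an illustration of the general idea of context weighting. It is not how SMARTRecognizer or LEARN is implemented, which the vendor material we saw does not spell out. The raw scores, the boost factor, and the function names are all assumptions.

```python
from typing import Dict

# Team-to-sport table taken from the Los Angeles example in the text.
TEAM_SPORT = {
    "Angels": "baseball",
    "Dodgers": "baseball",
    "Clippers": "basketball",
    "Lakers": "basketball",
    "Kings": "hockey",
    "Galaxy": "soccer",
}


def reweight(scores: Dict[str, float], context_sport: str, boost: float = 2.0) -> Dict[str, float]:
    """Boost teams that match the sport implied by the dialog so far.

    Scores of matching teams are multiplied by `boost` (an assumed factor),
    then all scores are renormalized so they sum to 1.
    """
    boosted = {
        team: score * (boost if TEAM_SPORT[team] == context_sport else 1.0)
        for team, score in scores.items()
    }
    total = sum(boosted.values())
    return {team: score / total for team, score in boosted.items()}


def best_guess(scores: Dict[str, float]) -> str:
    """Return the team with the highest score."""
    return max(scores, key=scores.get)


# Made-up acoustic scores in which "Kings" narrowly beats "Lakers".
RAW = {"Kings": 0.40, "Lakers": 0.35, "Clippers": 0.10,
       "Dodgers": 0.05, "Angels": 0.05, "Galaxy": 0.05}

if __name__ == "__main__":
    print(best_guess(RAW))                          # Kings
    print(best_guess(reweight(RAW, "basketball")))  # Lakers
```

With the caller having just asked about Kobe Bryant, the basketball context tips an ambiguous utterance toward "Lakers", which is the gamble the review describes.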

SpeechWorks is the kind of product for which we could not possibly list every feature. However, some of the remaining highlights of this version include barge-in capabilities, multilingual support, improved handling of vocabulary and grammar errors (based on phoneme models), "Hot Insert" (hot-swappable, in a sense) application tuning, alphanumeric character pronunciations, and improved echo cancellation, recognition, and rejection algorithms.

OPERATIONAL TESTING
Two weeks of testing is hardly enough time to make a full-fledged sample application, but in reality, a few months would be sufficient, especially when SpeechWorks is used in conjunction with its application generator partners. The most important factor in developing a good SR-enabled application is the foresight to know what responses your system will get and when it will get them. Every time a caller gives input that your application wasn't expecting to hear, the recognition accuracy lessens.

For building the best possible application, we found that SpeechWorks' tuning tools are the best of their class, largely because their results are presented in a more easily analyzed fashion than just black-and-white numbers. Of course, for the ultimate testing of your SR application, you would take the additional step of stress-testing the program while receiving many calls during high levels of network congestion.

We also found that even for a novice SR programmer, it is easy to develop for the DialogModules. Items like the vocabulary editor are very straightforward, and the documentation of each module is incredibly detailed. SpeechWorks also provides sample uses of each module, which can be duplicated or used as templates for your customized solution.

Although it might seem basic, anyone who's been around speech recognition knows that some of the simplest recognition tasks are actually the most difficult. Among these tasks are dates, currency, human names, place names, times, and common words and phrases like "today," "tomorrow," "next week," "I don't understand," "yeah," and so on. A complete list of such utterances cannot exist, but throughout our testing, we were very impressed with how well the sample SpeechWorks applications handled these things.

Using SpeechWorks' presidential election demonstration, we made a simple application of our own, and we tried to anticipate the responses. Our candidates were Al Gore, Bill Bradley, John McCain, George "Dub-ya", and, just to keep it interesting, Daffy Duck. We allowed for margins of error; for example, people could say "Bush" for George Dub-ya, or they could accidentally say Donald (instead of Daffy) Duck.
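Allowing for those margins of error amounts to mapping several acceptable phrases onto one answer. The Python sketch below illustrates that idea on already-recognized text. It does not use the SpeechWorks vocabulary editor or any DialogModule API, and apart from "Bush" and "Donald Duck", which come from our test, the alias entries and function names are assumptions.

```python
from typing import Optional

# Canonical candidate -> phrases accepted for that candidate.
# "bush" and "donald duck" mirror the margins of error described above;
# the remaining entries are just the full names, lowercased.
ALIASES = {
    "Al Gore": ["al gore"],
    "Bill Bradley": ["bill bradley"],
    "John McCain": ["john mccain"],
    "George Dub-ya": ["george dub-ya", "bush"],
    "Daffy Duck": ["daffy duck", "donald duck"],
}

# Flatten into a phrase -> candidate lookup table.
LOOKUP = {
    phrase: candidate
    for candidate, phrases in ALIASES.items()
    for phrase in phrases
}


def resolve(utterance: str) -> Optional[str]:
    """Map recognized text to a candidate, or None if it was not anticipated."""
    normalized = " ".join(utterance.lower().split())
    return LOOKUP.get(normalized)


if __name__ == "__main__":
    for heard in ["Bush", "Donald Duck", "Mickey Mouse"]:
        print(heard, "->", resolve(heard))
```

An utterance that resolves to None is exactly the unanticipated input that hurts accuracy, so in practice it would trigger a reprompt or be logged for later tuning.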

More complicated applications better illustrate SpeechWorks' power. So, while we do not normally mention a vendor's customers in a product review, in this case we feel that SpeechWorks' own demonstrations (and live applications) are items that potential customers should try out. (For more information on these demonstrations, see www.speechworks.com/demos/index.cfm.)

ROOM FOR IMPROVEMENT
Most of the things we can think of that would improve SpeechWorks 6.0 are already being planned for versions 6.5 and 7.0. Some of these features are voice authentication and security, "plug and play" DialogModules, more and better DialogModule customization, improved natural language abilities for the text-to-speech engine, and improved options for caller personalization (which the current version only begins to address).

CONCLUSION
As implied by the very short improvements section above, we feel that based on the currently available technology, SpeechWorks 6.0 is nearly perfect. There can always be more scalability, better recognition, and more competitive pricing, and we anticipate that speech recognition as an industry (not just this vendor) will face new challenges as packet-switched voice becomes a more mainstream way of communicating: what happens if the software can't recognize what you said because half of your packets arrived late, damaged, or did not arrive at all?

Overall, we are very impressed with this product, based on comparing it to its competitors and to the current market needs. We consider touchtone-based systems to be nearly obsolete; we say "nearly" because there will always be situations that warrant touchtone on the basis of security, background noise, or failed recognitions. The number of ways that you can SR-enable a communications product is endless, and we feel that anyone who invests in SpeechWorks 6.0 will be quite pleased.






