TMCnet - World's Largest Communications and Technology Community

April 1999


SpeechWorks Host 1.1

Sold by: Dialogic Corp.
1515 Route 10
Parsippany, NJ 07054
P (800) 755-4444
F (973) 631-9631
E [email protected]
W www.dialogic.com

Developed by: SpeechWorks International
695 Atlantic Avenue
Boston, MA 02111
P (617) 428-4444
F (617) 428-1122
E [email protected]
W www.speechworks.com

Price: 2 ports, $520 / 4 ports, $1,040


RATINGS (0-5)
Installation: 5
Documentation: 4.5
Features: 4.75
Overall: A

SpeechWorks Host 1.1 is a phonetic toolkit for adding natural (continuous) speech recognition to CTI applications that use Dialogic boards. This small-scale solution is an entry-level version of SpeechWorks' (formerly ALTech) 4.0 platform. Dialogic licenses and resells this host-based version of the product, which is upgradeable to the full SpeechWorks package. By itself, the Host package requires only a 2- or 4-port board and a Pentium-class CPU; no special DSP hardware is needed. Its best feature is its Dialogue Modules: prepackaged, highly customizable building blocks for the most common speech recognition tasks. Users can assemble the modules in three ways: as code in any C or C++ environment, as building blocks in Dialogic-specific visual application generator environments, or as ActiveX modules in more mainstream CTI application generators. Host also features a vocabulary of up to 250 words and speaker independence, which means that no speaker training is required. At about $260 per port ($520 for two ports, $1,040 for four), it is extremely inexpensive. Host 1.1 is surprisingly powerful for an entry-level product, rivaling some of the large-scale products from SpeechWorks' competitors.

Installation
A Dialogic or Dialogic-compatible board must be installed and configured before the Host software installation can begin. We tested the software on two computers with D21H and D41H boards, but D41EPCI and D41ESC boards also work. Besides the board, requirements include a Pentium running at 133 MHz for two ports, 166 MHz for four ports or 200 MHz for eight ports; 64 MB of RAM (highly suggested); 40 MB of hard disk space; Windows NT 4.0; the Windows small fonts setting; Dialogic System Software and SDK DNA 3.0 for NT; Microsoft Visual C++ 4.2 or later; and a SpeechWorks Host enablement disk.
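
For readers who want to check a target machine against the figures above, here is a minimal sketch of a pre-install check. It is our own illustration, not a tool shipped with Host. The function names are hypothetical, rounding an in-between port count up to the next listed tier is an assumption, and treating the suggested 64 MB of RAM as a hard minimum is also an assumption.

```python
# Minimum Pentium clock speed (MHz) per port count, as listed in the review.
MIN_CPU_MHZ = {2: 133, 4: 166, 8: 200}

# Assumption: the "highly suggested" 64 MB of RAM is treated as a minimum.
MIN_RAM_MB = 64
MIN_DISK_MB = 40


def min_cpu_mhz(ports: int) -> int:
    """Return the clock speed for the smallest listed tier covering `ports`."""
    if ports < 1:
        raise ValueError("port count must be positive")
    for tier in sorted(MIN_CPU_MHZ):
        if ports <= tier:
            return MIN_CPU_MHZ[tier]
    raise ValueError(f"{ports} ports exceeds the 8-port maximum")


def meets_requirements(ports: int, cpu_mhz: int, ram_mb: int, free_disk_mb: int) -> bool:
    """Check CPU, RAM and disk against the review's stated requirements."""
    return (
        cpu_mhz >= min_cpu_mhz(ports)
        and ram_mb >= MIN_RAM_MB
        and free_disk_mb >= MIN_DISK_MB
    )
```

Under these assumptions, a 166 MHz Pentium with 64 MB of RAM passes for four ports but fails for eight.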

Both the ASR Development Package and the ASR Enablement Package use standard Windows InstallShield installers. Install the Development Package first, reboot your system, install the Enablement Package and reboot again, just to be safe. Without the enablement software, the Host suite will work for only 30 days for evaluation purposes. The software supports a maximum of two boards, for a total of eight ports.

We had no problems with the installation process. The boards and software installed smoothly on both PCs, and we see no reason why any developer, even a beginner, would have any problems either. We wish that all CTI hardware and software installed so seamlessly.

Documentation
Like the product's installation, the documentation for this package was strictly top-notch. A small release notes manual covered the installation and new features. Missing from the printed documentation was detailed information on the board installation and troubleshooting, but the Configuration Manager application, with features like auto-detect and the ability to control service start-up modes, provided most of the knowledge we needed for the board installation. The Configuration Manager came with its own online help file. The DNA CD comes with its own set of online documentation. There isn't much online documentation for the Host software itself, but there is extensive printed documentation. A software reference, a Creation Guide and an Integration Guide are included, all accompanied by numerous screen shots, technical and beginner's explanations, glossaries, sample applications, etc.

Overall, we were very impressed with the quality of these manuals. Even the SpeechWorks corporate Web site is handy, containing one of the best technology background sheets we've seen. The various documents here can lead even the CTI neophyte to the land of sophisticated speech-recognition development, without much pain or suffering. Most of the documentation was written by Dialogic, not by SpeechWorks, and it's a testament to how significant a value-add improved manuals can be.

Features
As we explained above, SpeechWorks Host makes extensive use of prepackaged, highly customizable building blocks called Dialogue Modules. Five essential modules are included: YesNo, ContinuousDigits, AlphaNumeric, VoiceMenu and ItemList.

YesNo does exactly what its name implies: it queries callers for a yes or no answer to a question, determines the response, takes the appropriate action if it understands the answer and repeats the query in a clearer context if it does not. After several attempts using different methods of eliciting the answer, the module can be programmed to send callers to live operators. This module, while complicated in its flow-chart form, is actually the simplest module and is nearly 100 percent accurate. (Its complicated flowchart reminds us all to thank SpeechWorks for inventing the modules, which can be vastly more complicated than YesNo fans could dream of.)
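
To make that flow concrete, here is a rough sketch of the ask, re-ask and escalate pattern described above. This is not the SpeechWorks API: the function names, the word lists and the `listen` callback (which stands in for the recognizer) are all hypothetical, chosen only to illustrate the logic.

```python
from typing import Callable, List, Optional

# Hypothetical word lists; a real module would use recognizer grammars.
YES_WORDS = {"yes", "yeah", "yep", "correct"}
NO_WORDS = {"no", "nope", "incorrect"}


def interpret(utterance: str) -> Optional[bool]:
    """Map a recognized utterance to True, False, or None if not understood."""
    word = utterance.strip().lower()
    if word in YES_WORDS:
        return True
    if word in NO_WORDS:
        return False
    return None


def yes_no_dialogue(prompts: List[str], listen: Callable[[str], str]) -> str:
    """Ask each prompt in turn; return 'yes', 'no', or 'operator'.

    `prompts` holds progressively clearer wordings of the same question.
    `listen` plays a prompt and returns the recognized text (assumed helper).
    """
    for prompt in prompts:
        answer = interpret(listen(prompt))
        if answer is not None:
            return "yes" if answer else "no"
    # Every attempt failed, so hand the caller to a live operator.
    return "operator"
```

A scripted `listen` that returns canned replies is enough to exercise the flow without any telephony hardware.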

ContinuousDigits collects numbers. Minimum and maximum numbers and number of digits can be specified, along with a list of valid digit strings.
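
As an illustration of that kind of constraint checking, the sketch below validates a recognized digit string against length bounds and an optional list of valid strings. The function and its parameters are hypothetical, not part of the ContinuousDigits module, and we assume here that the bounds apply to the number of digits.

```python
from typing import Iterable, Optional


def validate_digits(
    heard: str,
    min_len: int,
    max_len: int,
    valid: Optional[Iterable[str]] = None,
) -> bool:
    """Return True if `heard` satisfies the length bounds and the valid list."""
    digits = heard.replace(" ", "")  # recognizers often return "4 1 5 5"
    if not (digits.isascii() and digits.isdigit()):
        return False  # empty string or non-digit characters
    if not (min_len <= len(digits) <= max_len):
        return False
    if valid is not None and digits not in set(valid):
        return False
    return True
```

For example, a four-digit PIN prompt would pass `min_len=4, max_len=4`, while an account lookup could also pass the list of known account numbers as `valid`.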

AlphaNumeric works just like ContinuousDigits, except that it also collects characters, symbols, etc. For example, instead of being able to just say numbers, system users could find themselves spelling out an e-mail address like [email protected].

VoiceMenu returns a caller's selection from a list that it first reads to the user.

ItemList is like VoiceMenu, but more complicated. It offers more choices, more options for clarifying ambiguous statements, etc. (Users who later upgrade to the full SpeechWorks 4.0, or who buy optional modules, get access to blocks like TelephoneNumber, Currency, Date, Spelling, ZipCode, NaturalNumbers and CustomContext.)

After users master the Dialogue Module theory, they need to learn the vocabulary editor and tuning tools. We caution you: it would be useful to brush up on your undergraduate linguistics curriculum, from the phonetic alphabet to Ferdinand de Saussure. The vocabulary editor includes a graphical display of a phonetic keyboard, which allows developers to enter any number of pronunciations for commonly mispronounced words, like surnames, technical terms, foreign phrases, etc. Fortunately, the user-customizable dictionary already contains almost 200,000 words written in their phonetic form, and there is a phoneme chart, all easily accessible. Even better, the Creation manual includes an entire chapter and an appendix/tutorial for phonetic word and phrase development. It's easy once you get used to it, but we're not joking about brushing up on your linguistics education before beginning a mid- to large-scale speech recognition implementation. Finding a software development package so directly linked to academia is rare -- it's not a surprise that the SpeechWorks engine was sponsored by ARPA and made by MIT students.
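
The idea of a dictionary that holds several pronunciations per word can be sketched in a few lines. This is only an illustration of the data structure, not the format Host uses: the `Lexicon` class is hypothetical, and the phoneme strings are illustrative rather than taken from the SpeechWorks phoneme chart.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class Lexicon:
    """Maps each word to a list of alternative phoneme sequences."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[Tuple[str, ...]]] = defaultdict(list)

    def add(self, word: str, phonemes: str) -> None:
        """Add one pronunciation, given as space-separated phoneme symbols."""
        pron = tuple(phonemes.split())
        key = word.lower()
        if pron not in self._entries[key]:  # skip exact duplicates
            self._entries[key].append(pron)

    def pronunciations(self, word: str) -> List[Tuple[str, ...]]:
        """Return every stored pronunciation for `word` (empty if unknown)."""
        return list(self._entries.get(word.lower(), []))


lexicon = Lexicon()
lexicon.add("McGuire", "m ih c g w ih er")  # illustrative spelling
lexicon.add("McGuire", "m ax g w ay er")    # hypothetical alternate
```

Storing alternates side by side is what lets a recognizer accept a surname however the caller happens to say it.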

Besides the modules and the editor, another valuable SpeechWorks Host feature is the tuning tool. Used for logging and monitoring call activities, SLEE files (service logic execution environments), etc., the tuning tool is a valuable way to follow, check and confirm or diagnose literally every step of a speech recognition application's performance. The tool was still in beta testing when we received it, but its graphical, numerical and contextual reporting makes the entire Host package complete. Most often, developers will use this tool for checking recognition confidence results.

Other features of SpeechWorks Host 1.1 include:

  • Barge-in ability,
  • N-best output/scoring,
  • Customizable APIs with sample applications,
  • Visual C++ 4.2 support,
  • 250-word vocabulary per application.
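
N-best output means the recognizer returns several scored hypotheses rather than a single answer, and the application decides what to do with them. The sketch below shows one plausible policy; it is our own example, not Host's scoring API. The function name, the 0-to-1 confidence scale and both threshold values are assumptions.

```python
from typing import List, Optional, Tuple


def pick_hypothesis(
    n_best: List[Tuple[str, float]],
    accept: float = 0.8,
    margin: float = 0.1,
) -> Optional[str]:
    """Return the top hypothesis, or None if the caller should be re-prompted.

    `n_best` is a list of (text, confidence) pairs. The result is rejected
    when the best score is below `accept`, or when the runner-up is within
    `margin` of the best score (too close to call).
    """
    if not n_best:
        return None
    ranked = sorted(n_best, key=lambda hyp: hyp[1], reverse=True)
    best_text, best_score = ranked[0]
    if best_score < accept:
        return None
    if len(ranked) > 1 and best_score - ranked[1][1] < margin:
        return None
    return best_text
```

A `None` result is the cue to fall back on a clarifying prompt, which is the kind of behavior the Dialogue Modules package up.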

Operational Testing
Making speech recognition technology work involves a lot more than recognizing common user feedback words and activating software functions based on them. Even the yes/no function can be complicated, and the complexity increases rapidly as additional choices are added to prompt menus. Despite that, SpeechWorks Host is easy to use, even for a novice programmer; NT integration experience and an understanding of linguistics seem much more important here than programming skill.

A standard sample application is an election demonstration. It's an included application that polls callers for their voting choice -- Bill Clinton, Bob Dole or Ross Perot. The application sounds simple, but as users follow the directions for building this test, the value of the Dialogue modules becomes clear. One option in the test application is adding choices to the poll. We added Mark McGuire to the list. In phonetics, the spelling might be "M aa r k M ih c G w ih er." Next, we added alternate possible answers, like Big Mac, Mac Attack and Big Red. Each name presents its own set of challenges.
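
The bookkeeping behind alternate answers can be sketched as a simple alias table that maps each recognized phrase to a single candidate before the vote is counted. This is a hypothetical illustration of the logic only; the table and function names are ours and the included sample application may be organized quite differently.

```python
from collections import Counter
from typing import Iterable, Optional

# Each recognized phrase (lowercase) maps to one canonical poll choice.
ALIASES = {
    "bill clinton": "Bill Clinton",
    "bob dole": "Bob Dole",
    "ross perot": "Ross Perot",
    "mark mcguire": "Mark McGuire",
    "big mac": "Mark McGuire",
    "mac attack": "Mark McGuire",
    "big red": "Mark McGuire",
}


def canonical_choice(recognized: str) -> Optional[str]:
    """Normalize case and spacing, then look up the canonical candidate."""
    key = " ".join(recognized.lower().split())
    return ALIASES.get(key)


def tally(responses: Iterable[str]) -> Counter:
    """Count votes per candidate, ignoring responses that match no alias."""
    votes: Counter = Counter()
    for response in responses:
        choice = canonical_choice(response)
        if choice is not None:
            votes[choice] += 1
    return votes
```

Keeping nicknames in one table means a new alternate answer is a one-line change rather than a new branch in the call flow.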

If getting McGuire's name to make sense in phonetics is frustrating, hire someone to develop speech recognition for you because it only gets more complicated. Real-world applications like phone books, bankers' and stockbrokers' utilities and train schedules have far more choices than ours, and real-world answers rarely score the high confidence ranking that Mark McGuire's or Bill Clinton's name will. Still, as confusing and challenging as developing item lists and their associated flow charts can be, the process is much easier with the ready-made modules. The modules can do things like apply changes automatically and globally, so even the small and tedious tasks of speech recognition application development are covered with Host, not just the recognition itself. Most of all, we like that the time spent making our small test application was used mostly on the logic itself and the possible user responses rather than on using the software. Good software lets you focus on the goal rather than on getting there, and SpeechWorks Host does this well.

Room For Improvement
Most of the features that we would like to see improved are addressed with the full version, which is SpeechWorks 4.0. (The full version of SpeechWorks powers General Magic's Portico virtual receptionist tool.) That version includes more words per application, more scalability, more Dialogue modules, etc. We have only minor criticisms of the Host package. For example, the vocabulary editor window is difficult to resize, and the various GUIs are not as intuitive as they could be. We would like to see a tool that tries to auto-phoneticize words that it doesn't recognize -- it would improve recognition within the vocabulary editor. We would also like to see the package bundled with a visual application generator suite, or at the least, a powerful flow-chart editor. This is the one missing piece of the speech recognition puzzle: most app-gen software is very good for developing IVRs and similar software, and some app-gens are adding IP telephony options, but not many have successfully included speech recognition. Tighter integration is the key. Finally, we'd like to see the optional inclusion of a "Flow-charting and Phoneticizing Speech Recognition for Dummies for SpeechWorks Host 1.1" textbook. There is evidence in the included manuals that such an attempt has been made, but currently it's too fragmented and makes too many assumptions about developers' knowledge of linguistics to work well.

Conclusion
We recommend this product to anyone who is willing to learn it. Learning the software is easy; learning its methods is another story. It's very inexpensive, it's very powerful, it uses mainstream hardware and it's well documented for a CTI product. We're issuing the Editor's Choice award to SpeechWorks for making a great software package, but that's qualified with a challenge to SpeechWorks and to its competitors: there is still plenty of room to get better. Options for improved integration and more efficient development platforms do exist, and we predict that they'll be as readily exploited as mainstream CTI app-gens before the year 2000 (which SpeechWorks Host is prepared to handle). For small- to mid-scale speech recognition needs, whether in the call center or another environment, this is one of the better purchases a manager and developer can make together.







Technology Marketing Corporation

2 Trap Falls Road Suite 106, Shelton, CT 06484 USA
Ph: +1-203-852-6800, 800-243-6002


© 2023 Technology Marketing Corporation. All rights reserved | Privacy Policy