TMCnet - World's Largest Communications and Technology Community

May 1999


Natural Language Speech Assistant 3.0
Unisys Corp.
2476 Swedesford Road
Paoli, PA 19301
Ph: 800-874-8647; Fx: 610-695-5636
E-mail: [email protected]
Web site: www.unisys.com

Price: Grammar Generator, $3,225; Run-Time Environment, $129 per port.


RATINGS (0-5)
Installation: 4.5
Documentation: 5
Features: 4.5
GUI: 4
Overall: A-

There are many specialty players in the application generator (app-gen) market. Certain app-gens are specifically designed for Internet telephony, IVRs, Java, etc., and others are designed for certain industries, like banking or ticket ordering. However, until now, few app-gens existed with the ability to produce programming code for speech recognition. Unisys' Natural Language Speech Assistant (NLSA) 3.0 breaks into that field, with a development platform that will be tough for latecomers to beat.

Development is based on a three-step process of application design, response recognition and engine interpretation. The software is widely useful: it integrates with recognition engines from Watson, Nuance, Lucent, Philips, Lernout & Hauspie or plain Microsoft SAPI; integration with additional mainstream engines is expected soon. It is competitively priced, well documented and easy to learn. It is powerful and it does what it claims to do, and so it is our latest Editor's Choice award winner.

Installation
Installing NLSA 3.0 involves a lot more than running a setup wizard. Start with a multimedia-equipped Pentium system running at 200 MHz or faster, with 128 MB RAM and Windows NT Workstation 4.0 with Service Pack 3. (Although we suggest using this software on a Windows NT system, other options include Windows 95, using a telephone rather than a multimedia TTS interface; Solaris 2.5.1 or newer; or SCO UNIX Open Server 5.) Attach a Unisys pass-through sentinel to the PC's parallel port. Also required (for live environments, not for offline testing) is a TAPI-compliant voice board -- Dialogic hardware is preferable. We used the Proline 2V model.

Once the board is running, which is only a matter of tweaking jumpers, resolving IRQ conflicts and setting the usual options, users need to install a logic generator application. Fortunately, Unisys partners with app-gen makers like Parity, Artisoft, Periphonics, Mediasoft Telecom, Pronexus, etc., who bundle the Unisys technology with their own logic-developers. We installed Parity's VOS 6. Finally, we installed the actual NLSA software, including options for the DDA (Dialogue Design Assistant), SAI (Speech Assistant Interpreter) and SAT (Speech Assistant Toolkit). After we copied enablement files to the proper directories and performed the necessary reboot(s), we logged in and ran a quick test of the telephony board. Everything worked, and the entire process took about two hours, despite a sound card/Windows NT conflict that had nothing to do with Unisys' product.

Documentation
Usually we are quite harsh on software documentation. Much of it has a catch: documents are poorly translated from other languages, they have too many spelling mistakes and poor grammar, they lack enough screen shots, they take the reader's conceptual knowledge for granted, and sometimes the cumulative weight of the documents exceeds that of the reader.

The NLSA documentation is the exception. Documentation for the NLSA is, in a word, awesome. While there are sufficient screen shots, explanations, examples, etc., and while the level of detail was appropriate and the language level seemed right for both novice and veteran users, the real value is the textbook element and slide-style writing. At first, we were not optimistic about a manual made up entirely of PowerPoint slides, but here they are used to perfection. There is an extensively detailed and insightful lesson on IVR fundamentals, reader feedback, speech recognition tips and even end-user psychology. Clearly, these manuals were written by someone who understands technology education, as they are complete with worthwhile acronym glossaries and a "for further reading" appendix.

Additional documents, all of the same high quality, are the online help menus and even a "getting started" summary from the Unisys Web site. The help menus discuss application development, installation and more. The Web file is a "capabilities overview" that reads like an interesting white paper. Combined, the multiple documents make up the best software documentation we've seen that does not include an actual book of some sort. The only criticism we have here is that we would like to see a bit more discussion about choosing and installing a speech recognition engine and a TAPI voice board -- currently, the limited treatment of these topics (especially in the printed documents) refers users to their OEM manuals. Finally, another great quality of the printed manual is a section about design methodology, where the discussion focuses on the pros and cons of various development techniques and application planning.

Features
As mentioned above, NLSA 3.0 is divided into three main components. These include the design assistant, the speech interpreter and the translator. Another powerful feature is the Natural Language API, which is a very useful tool for porting NLSA-generated code to app-gen based compilers. There is also a "Sound Tuner" for making voice prompts, a "Wizard of Oz" (WOZ) feature for step-through application simulation, and several available sample applications, ranging from fast food ordering to a mortgage assistant.

The Dialogue Design Assistant, the WOZ method and the Speech Application Toolkit (SAT) allow developers to test-drive speech applications before they are compiled (or even before they are finished) and to build the application in logical, manageable steps. When those steps are complete, the Speech Assistant Interpreter (SAI) translates the application into programming code and BNF grammar for the speech recognition engines.

The NLSA GUI uses procedures defined by more of the Unisys buzzwords. "Compartments" are complete question/answer/response/action steps for each portion of an application. Options within compartments include prompts, appropriate responses, variables, tokens and DTMF values, actions, replies and continuation prompts. Variables can be strings, numbers, currency, times or dates, which are compliant from January 1, 1900 to December 31, 2100.

There are also "repertoires," which are complete groupings of prompts and possible answers. An example of a repertoire would be, "I heard you say six cacti. Is this correct?" Within every repertoire are "snippets" -- from the cacti repertoire, the snippets would be "I heard you say"; the number variable, in this case six; "cacti"; and "Is this correct?" Within the actual prompt files, there are three types: initial, verbose and expert; users can sometimes choose a type based on their prior knowledge of the system. During the application development process, there are four sub-categories of each prompt type. These include prompts that don't have associated text yet, prompts that use text-to-speech, prompts that use text-to-speech with variables and prompts that use prerecorded .WAV files.
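To make the repertoire and snippet idea concrete, here is a minimal Python sketch of how a prompt might be assembled from fixed snippets and one variable. This is not NLSA code or its API: the `Snippet` type and the `render_repertoire` function are hypothetical names invented for illustration, and only the snippet texts come from the cacti example above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Snippet:
    """One piece of a prompt: fixed text, or the name of a variable to fill in."""
    text: str = ""
    variable: str = ""  # hypothetical field: non-empty means "substitute this variable"


def render_repertoire(snippets: List[Snippet], values: Dict[str, str]) -> str:
    """Join the snippets into one spoken prompt, substituting variable values."""
    parts = []
    for s in snippets:
        parts.append(values[s.variable] if s.variable else s.text)
    return " ".join(parts)


# The cacti example, split into fixed snippets plus a number variable.
cacti_order = [
    Snippet(text="I heard you say"),
    Snippet(variable="quantity"),
    Snippet(text="cacti."),
    Snippet(text="Is this correct?"),
]

prompt = render_repertoire(cacti_order, {"quantity": "six"})
print(prompt)
```

Keeping the fixed text and the variable separate is what would let the same prerecorded snippets be reused with any quantity.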

"Tokens" are the action directives that make the recognition engines/logic flow charts work. Actions include getting an end-user response, changing the previous answer, goodbye, help, operator, repeat, start over, caller hangup, silence and three levels of misrecognition notification. Meanwhile, the actual WOZ simulator has some drawbacks, such as a built-in half-second delay and half-duplex functionality, but it's a valuable debugging tool because you don't have to wait for the application to be finished before you can test pieces of it. The simulator offers dynamic feedback options like a DTMF pad, silence/mute, an auto-answer on/off toggle, variables toggle and a log.

Other features of NLSA 3.0 include the ability to switch recognition engines in mid-project with a one-click recompile feature, an option for running the NLAPI in embedded mode or in distributed mode for load balancing, automatic TTS generation of voice prompts and the "neutral" SAPI grammars for developers who have not yet selected a recognition engine.

Operational Testing
The step-by-step documentation/tutorial makes it easy to build your first application with minimal compartments and options. We created a basic survey IVR -- vote for your favorite cartoon tiger. Options were Tigger, Tony and Hobbes. After the question is asked and the user replies with either the tiger's name or an alternative (e.g., "Pooh's friend," "the cereal one," "Calvin's friend"), the system is designed to have multiple levels of verification; e.g., "I'm not sure if you said [Tigger] or [Tony]. [Please repeat] your answer." In this case, the variables are the two tiger names and the please repeat action. The snippets are "I'm not sure if you said," "or" and "your answer." That entire process, from the original question (e.g., "Please say the name of your favorite cartoon tiger -- the choices are…") to the caller's reply to the verification to the resulting action, is considered one repertoire. Once you get used to the logic, the process makes sense and even seems intuitive.
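The logic of our tiger survey can be sketched in a few lines of Python. This is only an illustration of the idea, not the code NLSA generates: the `TIGERS` table, `match_reply` and `verification_prompt` are hypothetical names, and simple substring matching stands in for a real recognition engine.

```python
from typing import List, Optional

# Hypothetical phrase table: each tiger with the alternative replies we accepted.
TIGERS = {
    "Tigger": ["tigger", "pooh's friend"],
    "Tony": ["tony", "the cereal one"],
    "Hobbes": ["hobbes", "calvin's friend"],
}


def match_reply(reply: str) -> Optional[str]:
    """Return the tiger whose name or alternative phrase appears in the reply."""
    lowered = reply.lower()
    for tiger, phrases in TIGERS.items():
        if any(phrase in lowered for phrase in phrases):
            return tiger
    return None  # nothing matched: the application would reprompt


def verification_prompt(candidates: List[str]) -> str:
    """Build the 'not sure' prompt from fixed snippets and two variable names."""
    first, second = candidates
    return f"I'm not sure if you said {first} or {second}. Please repeat your answer."


print(match_reply("the cereal one"))
print(verification_prompt(["Tigger", "Tony"]))
```

A real engine would return confidence scores for each candidate; the verification prompt would be used when the top two scores are too close to call.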

When the application is complete, developers run it through the simulation tool, which requires only a decent sound card, not an actual TAPI board. The simulator is controlled using a series of control/key macros and function keys. We especially like that the simulator works as well with individual compartments as it does with whole applications. Most good app-gens include some kind of software-based simulator that eliminates the need for a board, but eliminating the need for a start-to-finish coding process before testing is a great innovation.

Overall, learning the NLSA's development methodology was simple, but we think that this is because of great documentation, not because of the software. We have several issues with the GUI. The windows can't be resized. Within the DDA, the prompt, reply and action windows are color-coded, but they all appear white on displays with lower resolutions. We also found several instances where the menus do not follow standard Windows conventions. One example would be the menu for token actions: if you click and select an action on the list, you can use the keyboard's arrow keys to scroll for viewing the rest of the list, but the selected item does not actually change unless you use the mouse. This would not be so bad, except that the documentation (and even the product's promotional literature) boasts of the software's ability to work well with only a keyboard. We found that this is less than the whole truth, and in fact, we don't know of any serious generation platforms that could possibly work as well without a mouse.

We also have a concern about the year 2000 compliance: because date support only extends to the year 2100, what happens if, in 2001, an end user calls an NLSA-developed stockbroker's system and buys a 100-year bond? Does it come due in 1901, or is this dependent on the back-end database? What happens if the database has better Y2K compliance, but the year 2101 never arrives because of a front-end limitation? There is year 2000 compliance, and then there is year 2000 compliance done right. If software can be made to count four date digits, then it should be able to count to the year 9999 as easily as the year 1999.
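The bond scenario is easy to check with a short Python sketch. The range limits are the ones stated for NLSA date variables (January 1, 1900 to December 31, 2100); the function names and the purchase date are hypothetical, and the code says nothing about how NLSA itself behaves at the boundary.

```python
from datetime import date

# Supported range for date variables, as stated in this review.
MIN_DATE = date(1900, 1, 1)
MAX_DATE = date(2100, 12, 31)


def in_supported_range(d: date) -> bool:
    """True if the date falls inside the stated 1900-2100 window."""
    return MIN_DATE <= d <= MAX_DATE


def maturity_date(purchase: date, years: int) -> date:
    """Same month and day, `years` later (this sketch ignores the Feb 29 edge case)."""
    return purchase.replace(year=purchase.year + years)


bought = date(2001, 6, 1)         # hypothetical purchase date
due = maturity_date(bought, 100)  # a 100-year bond
print(due, in_supported_range(due))
```

A four-digit year field has no trouble representing 2101, so the limit is a design choice rather than a storage constraint.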

Room For Improvement
Most of our criticism is with the GUI, which at first seems decent, but shows its shortcomings with regular usage -- the non-resizable windows would not be so bad if they weren't so large, taking up nearly three-quarters of a 17-inch monitor at a standard resolution. We also would like to see more information in the manuals about selecting and implementing a speech recognition engine. The writers have done a yeoman's job of explaining design concepts, but all of their eggs are in one basket. Finally, we'd like to see a high-quality sound recorder included. The sound mapper feature is a start, and many developers would assume that the default Windows recorder is enough to make importable .WAV files, but that applet has a 60-second limitation.

Conclusion
Rather than compete with speech-recognition companies, engineers from the Unisys Corporation instead make other companies' SR applications easier to develop. Imagine if alien poets could write in an iconic Martian language and compile their work for any Earthly grammar: this suite is nearly as powerful. NLSA 3.0 is the speech recognition industry's Babel fish. Given its competitive price, superb documentation and reasonable learning curve, we highly recommend this product. With some improvements to the interfaces and an extended feature set, it can be even better. We welcome reader feedback from anyone who uses NLSA 3.0.







Technology Marketing Corporation

2 Trap Falls Road Suite 106, Shelton, CT 06484 USA
Ph: +1-203-852-6800, 800-243-6002


© 2023 Technology Marketing Corporation. All rights reserved | Privacy Policy