There are many specialty players in the application generator (app-gen) market. Certain
app-gens are specifically designed for Internet telephony, IVRs, Java, etc., and others
are designed for certain industries, like banking or ticket ordering. However, until now,
few app-gens existed with the ability to produce programming code for speech recognition.
Unisys' Natural Language Speech Assistant (NLSA) 3.0 breaks into that field, with a
development platform that will be tough for latecomers to beat.
Development is based on a three-step process of application design, response
recognition and engine interpretation. The software is widely useful: it integrates with
recognition engines from Watson, Nuance, Lucent, Philips, Lernout & Hauspie or plain
Microsoft SAPI; integration with additional mainstream engines is expected soon. It is
competitively priced, well documented and easy to learn. It is powerful and it does what
it claims to do, and so it is our latest Editor's Choice award winner.
Installation
Installing NLSA 3.0 involves a lot more than running a setup wizard. Start with a
multimedia-equipped Pentium system running at least 200 MHz with 128 MB RAM and Windows NT
Workstation 4.0 with Service Pack 3. (We suggest running this software on Windows NT, but
other options include Windows 95, using a telephone rather than a multimedia TTS
interface; Solaris 2.5.1 or newer; or SCO UNIX Open Server 5.) Attach a Unisys
pass-through sentinel to the PC's parallel port. Also required (for live environments, not
for offline testing) is a TAPI-compliant voice board -- Dialogic hardware is preferable.
We used the Proline 2V model.
Once the board is running, which is only a matter of tweaking jumpers, resolving IRQ
conflicts and setting the usual options, users need to install a logic generator
application. Fortunately,
Unisys partners with app-gen makers like Parity, Artisoft, Periphonics, Mediasoft Telecom,
Pronexus, etc., who bundle the Unisys technology with their own logic-developers. We
installed Parity's VOS 6. Finally, we installed the actual NLSA software, including
options for the DDA (Dialogue Design Assistant), SAI (Speech Assistant Interpreter) and
SAT (Speech Assistant Toolkit). After we copied enablement files to the proper directories
and performed the necessary reboot(s), we logged in and ran a quick test of the telephony
board. Everything worked, and the entire process took about two hours, despite a sound
card/Windows NT conflict that had nothing to do with Unisys' product.
Documentation
Usually we are quite harsh on software documentation. Much of it has a catch: documents
are poorly translated from other languages, they have too many spelling mistakes and poor
grammar, they lack enough screen shots, they take the reader's conceptual knowledge for
granted, and sometimes the cumulative weight of the documents exceeds that of the reader.
The NLSA documentation is the exception: it is, in a word, awesome. While there are
sufficient screen shots, explanations, examples, etc., and while
the level of detail was appropriate and the language level seemed right for both novice
and veteran users, the real value is the textbook element and slide-style writing. At
first, we were not optimistic about a manual made up entirely of PowerPoint slides, but
here they are used to perfection. There is an extensively detailed and insightful lesson
on IVR fundamentals, reader feedback, speech recognition tips and even end-user
psychology. Clearly, these manuals were written by someone who understands technology
education, as they are complete with worthwhile acronym glossaries and a "for further
reading" appendix.
Additional documents, all of the same high quality, are the online help menus and even
a "getting started" summary from the Unisys Web site. The help menus discuss
application development, installation and more. The Web file is a "capabilities
overview" that reads like an interesting white paper. Combined, the multiple
documents make up the best software documentation we've seen that does not include an
actual book of some sort. The only criticism we have here is that we would like to see a
bit more discussion about choosing and installing a speech recognition engine and a TAPI
voice board -- currently, the limited treatment of these topics (especially in the printed
documents) refers users to their OEM manuals. Finally, another great quality of the
printed manual is a section about design methodology, where the discussion focuses on the
pros and cons of various development techniques and application planning.
Features
As mentioned above, NLSA 3.0 is divided into three main components. These include the
design assistant, the speech interpreter and the translator. Another powerful feature is
the Natural Language API, which is a very useful tool for porting NLSA-generated code to
app-gen based compilers. There is also a "Sound Tuner" for making voice prompts,
a "Wizard of Oz" (WOZ) feature for step-through application simulation, and
several available sample applications, ranging from fast food ordering to a mortgage
assistant.
Dialogue Design Assistant, the WOZ method and the Speech Application Toolkit (SAT)
allow developers to test drive speech applications before they are compiled (or even if
they're not finished yet) and to build the application in logical, manageable steps. When
those steps are complete, the Speech Assistant Interpreter (SAI) translates the
application into programming code and BNF grammar for the speech recognition engines.
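
To make the idea of a generated grammar concrete, here is a minimal Python sketch that turns a table of expected responses into BNF-style rules. The function name to_bnf, the rule names and the output notation are our own assumptions for illustration; this review does not show the actual format NLSA emits for each engine.

```python
def to_bnf(rule_name, responses):
    """Build BNF-style rules from a mapping of token -> accepted phrases.

    Hypothetical helper for illustration; not the NLSA output format.
    """
    lines = []
    # Top-level rule: the answer is any one of the tokens.
    alternatives = " | ".join(f"<{token}>" for token in responses)
    lines.append(f"<{rule_name}> ::= {alternatives}")
    # One rule per token, listing every phrase that maps to it.
    for token, phrases in responses.items():
        quoted = " | ".join(f'"{phrase}"' for phrase in phrases)
        lines.append(f"<{token}> ::= {quoted}")
    return "\n".join(lines)


grammar = to_bnf("answer", {
    "yes": ["yes", "that is correct"],
    "no": ["no", "that is wrong"],
})
print(grammar)
```

The point of the sketch is only that the designer lists phrases once and a tool writes the grammar text, which is what makes retargeting a different engine a recompile rather than a rewrite.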
The NLSA GUI is organized around a few more pieces of Unisys terminology.
"Compartments" are complete question/answer/response/action steps for each
portion of an application. Options within compartments include prompts, appropriate
responses, variables, tokens and DTMF values, actions, replies and continuation prompts.
Variables can be strings, numbers, currency, times or dates; dates are supported from
January 1, 1900 to December 31, 2100.
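
One way to picture a compartment is as a small record that holds a prompt plus the spoken and DTMF replies it accepts. The Python sketch below is our own illustration, not the DDA's internal data model: the class name, field names and the resolve method are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Compartment:
    """One question/answer/response/action step (field names are hypothetical)."""
    name: str
    prompt: str
    responses: dict = field(default_factory=dict)  # spoken phrase -> token
    dtmf: dict = field(default_factory=dict)       # keypad digit -> token
    variables: dict = field(default_factory=dict)  # variable name -> value

    def resolve(self, caller_input):
        """Return the token for a spoken phrase or keypad digit, or None."""
        key = caller_input.strip().lower()
        return self.responses.get(key) or self.dtmf.get(key)


confirm = Compartment(
    name="confirm_order",
    prompt="You ordered six cacti. Is this correct?",
    responses={"yes": "YES", "no": "NO"},
    dtmf={"1": "YES", "2": "NO"},
)
print(confirm.resolve("Yes"))  # spoken reply
print(confirm.resolve("2"))    # keypad reply
```

Mapping speech and keypad input to the same tokens means the rest of the call flow does not need to know which one the caller used.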
There are also "repertoires," which are complete groupings of prompts and
possible answers. An example of a repertoire would be, "You ordered six cacti. Is
this correct?" Within every repertoire are "snippets" -- from the cacti
repertoire, the snippets would be "I heard you say"; the number variable, in
this case six; "cacti"; and "Is this correct?" Within the actual
prompt files, there are three types: initial, verbose and expert; users sometimes can
choose a type based on their prior knowledge of the system. During the application
development process, there are four sub-categories of each prompt type. These include
prompts that don't have associated text yet, prompts that use text-to-speech, prompts that
use text-to-speech with variables and prompts that use prerecorded .WAV files.
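
The snippet idea can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the "{name}" slot convention, the build_prompt function and the number-word table are hypothetical and are not taken from the NLSA tools.

```python
# Number words for the spoken form of small quantities (illustrative only).
NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"]


def build_prompt(snippets, variables):
    """Join fixed snippets and variable slots into one prompt string.

    A snippet written as "{name}" is a slot filled from `variables`;
    anything else is fixed text. This convention is our own assumption.
    """
    parts = []
    for snippet in snippets:
        if snippet.startswith("{") and snippet.endswith("}"):
            value = variables[snippet[1:-1]]
            # Speak small integers as words, as a TTS prompt would.
            if isinstance(value, int) and 0 <= value < len(NUMBER_WORDS):
                value = NUMBER_WORDS[value]
            parts.append(str(value))
        else:
            parts.append(snippet)
    return " ".join(parts)


snippets = ["I heard you say", "{quantity}", "cacti.", "Is this correct?"]
print(build_prompt(snippets, {"quantity": 6}))
```

Because the fixed snippets never change, they can be prerecorded as .WAV files while only the variable slot falls back to text-to-speech.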
"Tokens" are the action directives that make the recognition engines/logic
flow charts work. Actions include getting an end-user response, changing the previous
answer, goodbye, help, operator, repeat, start over, caller hangup, silence and three
levels of misrecognition notification. Meanwhile, the actual WOZ simulator has some
drawbacks, such as a built-in half-second delay and half-duplex functionality, but it's a
valuable debugging tool because you don't have to wait for the application to be finished
before you can test pieces of it. The simulator offers dynamic feedback options like a
DTMF pad, silence/mute, an auto-answer on/off toggle, variables toggle and a log.
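
The three levels of misrecognition notification are easiest to understand as an escalation. The Python sketch below shows one plausible scheme; the token names, the prompt wording and the decision to hand off to an operator at the third level are all our assumptions, since the review only records that three levels exist.

```python
from enum import Enum


class Token(Enum):
    """A few of the token actions listed above (identifier names are our own)."""
    HELP = "help"
    OPERATOR = "operator"
    REPEAT = "repeat"
    START_OVER = "start over"
    GOODBYE = "goodbye"


# Hypothetical wording for the three misrecognition levels.
MISRECOGNITION_PROMPTS = [
    "Sorry, I did not catch that.",
    "I still did not understand. Please say your answer again.",
    "I am having trouble understanding you. Transferring to an operator.",
]


def misrecognition_reply(failures):
    """Return (prompt, next_token) for the nth consecutive failure (1-based)."""
    if failures < 1:
        raise ValueError("failures must be at least 1")
    last_level = len(MISRECOGNITION_PROMPTS)
    # Failures beyond the third stay at the final level.
    level = min(failures, last_level)
    prompt = MISRECOGNITION_PROMPTS[level - 1]
    # Only the final level gives up and hands the caller to an operator.
    if level == last_level:
        return prompt, Token.OPERATOR
    return prompt, Token.REPEAT


for attempt in (1, 2, 3):
    print(misrecognition_reply(attempt))
```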
Other features of NLSA 3.0 include the ability to switch recognition engines in
mid-project with a one-click recompile feature, an option for running the NLAPI in
embedded mode or in distributed mode for load balancing, automatic TTS generation of voice
prompts and the "neutral" SAPI grammars for developers who have not yet selected
a recognition engine.
Operational Testing
The step-by-step documentation/tutorial makes it easy to build your first application with
minimal compartments and options. We created a basic survey IVR -- vote for your favorite
cartoon tiger. Options were Tigger, Tony and Hobbes. After the question is asked and the
user replies with either the tiger's name or an alternative (e.g., "Pooh's friend,"
"the cereal one," "Calvin's friend"), the system is designed to have multiple levels
of verification; e.g., "I'm not sure if you said
[Tigger] or [Tony]. [Please repeat] your answer." In this case, the variables are the
two tiger names and the please repeat action. The snippets are "I'm not sure if you
said," "or" and "your answer." That entire process, from the
original question (e.g., "Please say the name of your favorite cartoon tiger -- the
choices are Tigger, Tony and Hobbes") to the caller's reply to the verification to the
resulting
action, is considered one repertoire. Once you get used to the logic, the process makes
sense and even seems intuitive.
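
For readers who think in code, the logic of our tiger survey can be approximated in Python as follows. This is a rough sketch, not what NLSA generates: substring matching stands in for a real recognition engine, and the function names and the "You voted for" reply are hypothetical.

```python
# Accepted phrases for each choice, taken from the survey described above.
SYNONYMS = {
    "Tigger": ["tigger", "pooh's friend"],
    "Tony": ["tony", "the cereal one"],
    "Hobbes": ["hobbes", "calvin's friend"],
}


def recognize(utterance):
    """Return every tiger whose accepted phrases appear in the utterance."""
    text = utterance.lower()
    return [name for name, phrases in SYNONYMS.items()
            if any(phrase in text for phrase in phrases)]


def respond(utterance):
    """Pick a reply: confirm one match, verify two, otherwise ask again."""
    matches = recognize(utterance)
    if len(matches) == 1:
        return f"You voted for {matches[0]}."
    if len(matches) == 2:
        # Snippets "I'm not sure if you said", "or" and "your answer",
        # with the two names and the repeat action as variables.
        return (f"I'm not sure if you said {matches[0]} or {matches[1]}. "
                "Please repeat your answer.")
    return "Please say the name of your favorite cartoon tiger."


print(respond("Calvin's friend"))
print(respond("Tigger or Tony"))
```

A real engine would return confidence scores rather than substring hits, but the branching on one, two or no plausible matches is the part the verification levels are built around.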
When the application is complete, developers run it through the simulation tool, which
requires only a decent sound card, not an actual TAPI board. The simulator is controlled
using a series of control/key macros and function keys. We especially like that the
simulator works as well with individual compartments as it does with whole applications.
Most good app-gens include some kind of software-based simulator that eliminates the need
for a board, but eliminating the need for a start-to-finish coding process before testing
is a great innovation.
Overall, learning the NLSA's development methodology was simple, but we think that this
is because of great documentation, not because of the software. We have several issues
with the GUI. The windows can't be resized. Within the DDA, the prompt, reply and action
windows are color-coded, but they all appear white on displays with lower resolutions. We
also found several instances where the menus do not follow standard Windows conventions.
One example would be the menu for token actions: if you click and select an action on the
list, you can use the keyboard's arrow keys to scroll for viewing the rest of the list, but
the selected item does not actually change unless you use the mouse. This would not be so
bad, except that the documentation (and even the product's promotional literature) boasts
of the software's ability to work well with only a keyboard. We found that this is less
than the whole truth, and in fact, we don't know of any serious generation platforms that
could possibly work as well without a mouse.
We also have a concern about year 2000 compliance: because date support only goes to the
year 2100, what happens if, in 2001, an end user calls an NLSA-developed stockbroker's
system and buys a 100-year bond? Does it come due in 1901, or is this dependent on the back-end
database? What happens if the database has better Y2K compliance, but the year 2101 never
arrives because of a front-end limitation? There is year 2000 compliance, and then there is
year 2000 compliance done right. If software can be made to count four date digits, then
it should be able to count to the year 9999 as easily as the year 1999.
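
The bond scenario is simple arithmetic, and a short Python sketch shows where it falls outside the supported range. The two range constants come from the date limits reported earlier in this review; the function names are our own, and we make no claim about how NLSA actually behaves when it meets such a date.

```python
from datetime import date

# Supported range for NLSA date variables, as reported in this review.
MIN_DATE = date(1900, 1, 1)
MAX_DATE = date(2100, 12, 31)


def maturity_date(purchase, term_years):
    """Return the purchase date moved forward by term_years."""
    # Feb 29 has no counterpart in most years; fall back to Feb 28.
    try:
        return purchase.replace(year=purchase.year + term_years)
    except ValueError:
        return purchase.replace(year=purchase.year + term_years, day=28)


def fits_front_end(value):
    """True if the date falls inside the supported front-end range."""
    return MIN_DATE <= value <= MAX_DATE


due = maturity_date(date(2001, 3, 15), 100)
print(due, fits_front_end(due))
```

A bond bought in 2001 matures in 2101, one year past the limit, so the front end would need to reject the date, wrap it or pass it through untouched; which of those happens is exactly what we could not determine.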
Room For Improvement
Most of our criticism is with the GUI, which at first seems decent, but shows its
shortcomings with regular usage -- the non-resizable windows would not be so bad if they
weren't so large, taking up nearly three-quarters of a 17-inch monitor at a standard
resolution. We also would like to see more information in the manuals about selecting and
implementing a speech recognition engine. The writers have done a yeoman's job of
explaining design concepts, but all of their eggs are in one basket. Finally, we'd like to
see a high-quality sound recorder included. The sound mapper feature is a start, but many
developers will assume that the default Windows recorder is enough to make importable
.WAV files, and that applet has a 60-second limitation.
Conclusion
Rather than compete with speech-recognition companies, engineers from the Unisys
Corporation instead make other companies' SR applications easier to develop. Imagine if
alien poets could write in an iconic Martian language and compile their work for any
Earthly grammar: this suite is nearly as powerful. NLSA 3.0 is the speech recognition
industry's Babel fish. Given its competitive price, superb documentation and
reasonable learning curve, we highly recommend this product. With some improvements to the
interfaces and an extended feature set, it can be even better. We welcome reader feedback
from anyone who uses NLSA 3.0.