What is the W3C, and what does it offer in the way of speech frameworks and standards?
The W3C is a standards consortium that creates interoperable Web standards. Within the W3C, two groups work actively on speech frameworks and standards:
The first, the Voice Browser working group, completed the VoiceXML 2.0 and 2.1 standards over the past few years, along with many other standards that allow easy, interoperable access to speech technologies: SRGS and SISR for speech grammars and their semantic interpretation, SSML for speech synthesis, PLS for both recognition and synthesis, and CCXML for call control. These standards comprise the Speech Interface Framework, created in 2000 by James A. Larson, co-chair of the VBWG. Today the Framework is not only nearly complete but also widely adopted across the voice platform, IVR and speech engine markets.
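To give a flavor of these languages, here is a minimal SSML prompt; the wording of the prompt is invented for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal SSML 1.0 document: the synthesizer reads the text,
     pauses briefly, and emphasizes one word. The prompt text is
     a hypothetical example. -->
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  Welcome to the flight service.
  <break time="300ms"/>
  Please say your <emphasis>destination</emphasis> city.
</speak>
```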
The second is the Multimodal Interaction working group, whose goal is to define standards for the creation of multimodal interfaces. This group is led by Deborah Dahl and is currently working on producing new standards.
The consortium recently released new standards and architectural changes. Can you talk a little about them and the benefits they provide?
The most recent W3C Recommendations (the final stage of a W3C specification) are:
- PLS 1.0, the Pronunciation Lexicon Specification, makes it possible to improve the pronunciation of words through phonetic alphabets or by transliteration. It is a very useful, standards-based tool for improving both speech synthesis and speech recognition performance, and it complements and completes the Speech Interface Framework mentioned earlier (see the lexicon sketch after this list).
- EMMA 1.0, Extensible MultiModal Annotation, is a rich markup language that represents multimodal inputs, whether via voice, gesture or pen/stylus. It can convey complex results, including N-best alternatives for speech recognition and word lattices (graphs of word hypotheses). EMMA will prove especially valuable to mobile device application developers by simplifying the creation of multimodal applications that combine input types such as speech, touch screens and stylus (an example follows this list).
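To make PLS concrete, here is a minimal lexicon sketch; the word, its IPA transcription and the language are invented for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- A PLS 1.0 lexicon that overrides the default pronunciation
     of one word using the IPA alphabet. The lexeme shown is a
     hypothetical example. -->
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>Giuseppe</grapheme>
    <phoneme>dʒuˈzɛppe</phoneme>
  </lexeme>
</lexicon>
```

A synthesizer or recognizer that supports PLS can reference such a lexicon, for example through the `<lexicon>` element of SSML or SRGS, so that proper names are pronounced and recognized correctly.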
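And here is a sketch of an EMMA 1.0 result carrying two N-best recognition hypotheses; the utterance, confidence values and application namespace are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- An EMMA 1.0 document: emma:one-of groups competing
     interpretations of a single spoken input (an N-best list).
     The travel-domain payload is invented for illustration. -->
<emma:emma version="1.0"
           xmlns:emma="http://www.w3.org/2003/04/emma"
           xmlns="http://example.com/travel">
  <emma:one-of id="nbest" emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation id="hyp1" emma:confidence="0.82"
                         emma:tokens="flights to boston">
      <destination>Boston</destination>
    </emma:interpretation>
    <emma:interpretation id="hyp2" emma:confidence="0.41"
                         emma:tokens="flights to austin">
      <destination>Austin</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
```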
How are these technologies similar to the way the Web works?
There are similarities because, just as any Web browser, whether proprietary or open source, can browse the entire Web over the HTTP protocol, the basis for building speech applications is the same: a voice platform contains a VoiceXML interpreter that is accessed over HTTP, and a Web application generates VoiceXML instead of HTML.
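As a sketch of that model, the Web application might return a page like the following instead of HTML; the URLs, grammar file and field name are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- A minimal VoiceXML 2.0 page: the interpreter plays a prompt,
     recognizes a city against an SRGS grammar, and submits the
     result back over HTTP, much like an HTML form post. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="booking">
    <field name="city">
      <prompt>Which city are you flying to?</prompt>
      <grammar src="cities.grxml" type="application/srgs+xml"/>
      <filled>
        <!-- Hypothetical handler URL; it would return the next
             VoiceXML page, just as a Web app returns the next
             HTML page. -->
        <submit next="http://example.com/book" namelist="city"/>
      </filled>
    </field>
  </form>
</vxml>
```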
What role do VoiceXML and Voice Browsing play in improving standards?
The role of VoiceXML has been of paramount importance because it proposed a Web-based model for describing voice and DTMF applications. This idea immediately triggered an enormous transformation: voice platforms adopted these languages as their primary interface, moving the industry from a legacy world of proprietary application development and proprietary use of speech technologies to standard VoiceXML platforms and a standardized way to develop voice applications.
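To illustrate the voice-and-DTMF point, here is a minimal SRGS grammar in DTMF mode; the two-option menu is hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- An SRGS 1.0 grammar in DTMF mode: the caller presses 1 or 2.
     Switching mode="dtmf" to mode="voice" (with spoken items)
     yields a speech grammar in the same standard language. -->
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         mode="dtmf" root="menu">
  <rule id="menu" scope="public">
    <one-of>
      <item>1</item>
      <item>2</item>
    </one-of>
  </rule>
</grammar>
```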
I also believe that the standards proposed by the Voice Browser working group have helped increase the adoption of speech applications across the industry.