Which Is Better, Speech or Key Input, for IVR? Maybe Both
July 29, 2011
By David Sims, TMCnet Contributing Editor
A recent white paper from Interactive Digital, titled “Increasing the use of Speech in IVR Applications,” does a good job of addressing that topic.
As the paper notes, “telephone self-service in general is one of the few technologies that is strongly disliked by the user community. This is largely because the bulk of the implementations have been done so poorly.”
Indeed, it takes only a few experiences with badly done IVR (“Press 17 for...”) to turn somebody off the whole concept, so that they simply press “0” for the operator as soon as the menu starts.
Telephone self-service remains wildly popular among customer service providers, however, primarily for cost savings that are hard to match anywhere else. As the paper notes, the projected CAGR for this market from 2010 to 2015 is 7.3 percent, with the outsourced sector leading the charge: “Virtually every enterprise has installed some form of telephone self-service.”
A well-designed voice application takes into account that a customer may be calling from a noisy environment, or may simply not speak clearly enough to be recognized. A good Voice User Interface (VUI) designs around this, determining the best input modality for a given dialogue interaction point.
Certain tasks are also best suited to speech as the means of input to self-service voice applications; the paper notes that “this usually includes easily recognizable yes/no options, menu selections and other short and unambiguous speech interactions.”
If you need customers to enter lengthy account or ID numbers, a PIN or an easily misrecognized phrase, you are better off giving them key-in options, as the sketch below illustrates.
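To make that division of labor concrete, here is a minimal sketch, in Python, of choosing a default modality per task. It simply encodes the paper's rule of thumb (speech for short, unambiguous interactions; key-in for long digit strings); the task names and function are hypothetical illustrations, not part of the paper or any IVR platform.

```python
# Hypothetical sketch: pick a default input modality per IVR task,
# following the white paper's guidance. Speech suits short, unambiguous
# interactions; DTMF key-in suits long digit strings like account
# numbers and PINs.

from enum import Enum

class Modality(Enum):
    SPEECH = "speech"
    DTMF = "dtmf"

# Hypothetical task catalogue; a real application would define its own.
SPEECH_FRIENDLY_TASKS = {"yes_no_confirmation", "menu_selection"}
KEY_IN_TASKS = {"account_number", "id_number", "pin_entry"}

def preferred_modality(task: str) -> Modality:
    """Return the default input modality for a dialogue interaction point."""
    if task in SPEECH_FRIENDLY_TASKS:
        return Modality.SPEECH
    if task in KEY_IN_TASKS:
        return Modality.DTMF
    # When in doubt, fall back to key-in, which holds up better against
    # noisy backgrounds and unclear speech.
    return Modality.DTMF

if __name__ == "__main__":
    for task in ("yes_no_confirmation", "pin_entry"):
        print(task, "->", preferred_modality(task).value)
```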
The white paper also notes that “specific interaction points in the voice application dialogue may generally lend themselves better to one form of input modality over another,” such as speech or key input. An elderly caller keying in a prescription number from a medicine bottle, or a credit card number, is facing a tough task and needs to speak to someone.
“Or it could be a 20 year old, tech savvy iPhone user looking to see if funds have cleared their bank account yet,” the paper adds, explaining that, yes, “cool as the iPhone is, if he keeps taking it away from his ear to press the last four digits of his social, his PIN or whatever,” there’s a good chance he’ll miss a command or prompt.
This happens, too, we can add from experience. “Frustrating” is the nice word for it.
The paper describes Best Modality Signaling (BMS) in a caller-adaptive environment as a way of giving the application more information on whether speech or DTMF is the optimal form of input for a given caller, task and environment.
“As part of the Adaptive Audio software available from Interactive Digital, the BMS feature continuously signals the voice application regarding which form of input modality (speech or touch-tone) is optimal at each interaction point in the call script. It does this in real time during the call, based on analyzing which input mode has historically been more efficient.”
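The paper does not spell out the mechanics, but the idea of signaling the historically more efficient mode at each interaction point can be sketched in a few lines of Python. This is an illustration of the concept only, not Interactive Digital's Adaptive Audio implementation; every name below is hypothetical.

```python
# Hypothetical sketch of a BMS-style signaler: for each interaction
# point, track how often each input mode (speech vs. touch-tone) has
# succeeded, and recommend the historically more efficient mode.

from collections import defaultdict

class ModalitySignaler:
    def __init__(self):
        # _stats[interaction_point][mode] = [successes, attempts]
        self._stats = defaultdict(
            lambda: {"speech": [0, 0], "dtmf": [0, 0]}
        )

    def record(self, point: str, mode: str, success: bool) -> None:
        """Update the history after each caller input attempt."""
        counts = self._stats[point][mode]
        counts[1] += 1
        if success:
            counts[0] += 1

    def optimal_mode(self, point: str) -> str:
        """Signal which mode has historically been more efficient here."""
        def rate(mode: str) -> float:
            successes, attempts = self._stats[point][mode]
            # With no history yet, treat both modes as equally likely.
            return successes / attempts if attempts else 0.5
        return "speech" if rate("speech") >= rate("dtmf") else "dtmf"

if __name__ == "__main__":
    bms = ModalitySignaler()
    bms.record("enter_pin", "speech", success=False)
    bms.record("enter_pin", "dtmf", success=True)
    print(bms.optimal_mode("enter_pin"))  # -> dtmf
```

A production system would presumably weigh more than raw success counts (caller environment, time per attempt, retry depth), but the real-time feedback loop the quote describes reduces to this shape: record outcomes per interaction point, then signal the better-performing mode on the next pass.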
David Sims is a contributing editor for TMCnet. To read more of David’s articles, please visit his columnist page. He also blogs for TMCnet here.
Edited by Juliana Kenny