For bots and humans alike, successful conversations hinge on the ability to listen well. It’s nearly impossible for bad listeners, whether they’re humans or machines, to provide an excellent customer experience. So while it’s tempting to emphasize the speed with which a machine can reply to a query, the smarter design choice is often to slow things down so that a user can finish a thought before the machine starts formulating a response.
One of the key signals that helps bots adapt to the way humans converse is a pause, called an inter-speech timeout. This mirrors how humans judge whether someone has completed a thought, and it’s one of the simplest options to consider when working on end-of-speech detection.
There’s a tendency with conversational AI to show off a bot’s fast response time, essentially parading algorithms that respond to a sentence quickly. While this makes for a flashy demo, if a real-life user says “yes” and then pauses before completing their thought, the bot is left needing to backtrack and start from an earlier point in the conversation. This scenario creates a puzzle for conversational designers and frustrates users, so it should be avoided whenever possible. It’s better to be accurate than fast, and you can still give users feedback so they are not waiting in silence.
It’s hard enough for humans to quickly determine that someone has completed a thought; it can be even more challenging for machines, especially over the phone. When they fail at this, machines often cut off a user or completely misunderstand their request.
Good bot experiences require bots that are good at listening, but a bot providing a faster response doesn’t necessarily mean it’s providing a better experience.
The goal isn’t to mimic human behavior; it’s to be useful. An interaction with a machine that’s mildly stilted because it’s being careful to let users finish their thoughts is vastly preferable to one that’s trying to match the flow of natural conversation or dazzle with speed.
Inter-speech timeout is a design consideration, not a latency consideration. Good design requires context.
Inter-speech timeouts are critical to the success of conversational AI on voice channels. By definition, inter-speech timeouts are the pauses between words or phrases in speech. The length of these pauses allows your chosen speech-to-text (STT) engine or voice engine to determine when a customer is done speaking (that is, when they have completed a thought) and to process results.
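To make the definition concrete, here is a minimal Python sketch of timeout-based end-of-speech detection. It assumes an upstream voice-activity detector has already labeled each fixed-length audio frame as speech or silence. The function name `end_of_speech_index`, its parameters, and the frame data are hypothetical illustrations, not the API of any particular STT engine.

```python
from typing import Iterable, Optional


def end_of_speech_index(
    frames: Iterable[bool],
    frame_ms: int,
    timeout_ms: int,
) -> Optional[int]:
    """Return the index of the frame where the inter-speech timeout fires.

    frames: per-frame voice-activity flags (True = speech detected).
    Returns None if the timeout never fires.
    """
    silence_ms = 0
    heard_speech = False
    for i, is_speech in enumerate(frames):
        if is_speech:
            heard_speech = True
            silence_ms = 0  # any speech resets the pause counter
        elif heard_speech:
            # Only count silence that follows speech, not leading silence.
            silence_ms += frame_ms
            if silence_ms >= timeout_ms:
                return i
    return None


# Hypothetical utterance in 100 ms frames: "yes", a 600 ms pause,
# the rest of the thought, then trailing silence.
frames = [True] * 3 + [False] * 6 + [True] * 5 + [False] * 15

print(end_of_speech_index(frames, frame_ms=100, timeout_ms=500))   # 7
print(end_of_speech_index(frames, frame_ms=100, timeout_ms=1000))  # 23
```

With the 500 ms timeout, the sketch declares end of speech during the pause after “yes” and cuts the user off. With the 1,000 ms timeout, it waits through the pause and fires only after the full utterance, at the cost of a slower response.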
Voice platforms and STT vendors rarely allow granular control of these timeouts for each user response. We believe this level of fine-tuning is critical for the success of any voice-based solution. To create a great voice experience, you need granular control of the inter-speech timeout by phrase so that you can design for the context of the question. This includes a combination of STT options and direct voice platform control.
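One way to picture timeout control by phrase is a lookup keyed by the question the bot has just asked. The Python sketch below is only an illustration: the prompt names, the millisecond values, and the `timeout_for` helper are all assumptions made up for this example, and how the chosen value is passed to an STT engine or voice platform depends entirely on the vendor.

```python
# Hypothetical fallback used when a prompt has no tuned value.
DEFAULT_TIMEOUT_MS = 1000

# Hypothetical per-prompt inter-speech timeouts, in milliseconds.
# The values are placeholders, not recommendations; real values
# should come from your own data.
PROMPT_TIMEOUTS_MS = {
    "confirm_yes_no": 1200,  # users may say "yes" and then keep talking
    "account_number": 2000,  # users pause between groups of digits
    "open_question": 1500,   # free-form answers include thinking pauses
}


def timeout_for(prompt_id: str) -> int:
    """Return the inter-speech timeout to use for the given prompt."""
    return PROMPT_TIMEOUTS_MS.get(prompt_id, DEFAULT_TIMEOUT_MS)


print(timeout_for("account_number"))  # 2000
print(timeout_for("goodbye"))         # 1000 (falls back to the default)
```

Keeping the values in one table, separate from the dialog logic, makes it easier to retune a single question without touching the rest of the conversation design.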
There are several key considerations to take into account when optimizing inter-speech timeouts. Here are four.
Inter-speech timeouts are critical to the success of speech-to-text for conversational AI. The timeout should be optimized for the task at hand and the acoustic environment. It’s a difficult balance to strike, but one that is necessary to provide users with the best possible experience. Mastering this skill is how you get the most out of your speech-to-text application.
Trying to predict when a person has completed a thought is a party trick with little upside. Instead, optimize each question and context for the highest containment rates, and treat the data around containment (in particular, your rate of understanding) as your north star.
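As a rough illustration of using containment data to guide tuning, the Python sketch below groups call outcomes by question and timeout setting, computes a containment rate for each pair, and picks the best-performing timeout per question. The record layout, the function names, and the sample data are invented for this example; it assumes you can log, for each call, which prompt was asked, which timeout was in effect, and whether the call was contained.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical log record: (prompt_id, timeout_ms, contained)
Record = Tuple[str, int, bool]


def containment_rates(records: Iterable[Record]) -> Dict[Tuple[str, int], float]:
    """Containment rate for each (prompt_id, timeout_ms) pair."""
    totals = defaultdict(lambda: [0, 0])  # [contained count, total count]
    for prompt_id, timeout_ms, contained in records:
        bucket = totals[(prompt_id, timeout_ms)]
        bucket[0] += int(contained)
        bucket[1] += 1
    return {key: hits / n for key, (hits, n) in totals.items()}


def best_timeout_per_prompt(rates: Dict[Tuple[str, int], float]) -> Dict[str, int]:
    """For each prompt, the timeout with the highest containment rate."""
    best: Dict[str, Tuple[int, float]] = {}
    for (prompt_id, timeout_ms), rate in rates.items():
        if prompt_id not in best or rate > best[prompt_id][1]:
            best[prompt_id] = (timeout_ms, rate)
    return {prompt_id: timeout for prompt_id, (timeout, _) in best.items()}


# Invented sample data, for illustration only.
records = [
    ("confirm_yes_no", 500, True),
    ("confirm_yes_no", 500, False),
    ("confirm_yes_no", 1200, True),
    ("confirm_yes_no", 1200, True),
]
rates = containment_rates(records)
print(rates[("confirm_yes_no", 500)])   # 0.5
print(best_timeout_per_prompt(rates))   # {'confirm_yes_no': 1200}
```

In practice you would want enough calls per setting for the comparison to be meaningful before changing a timeout based on it.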