October 13, 2022
Robb Wilson’s Thoughts on AI Voice Cloning
Recently, I was asked about voice replication technology and whether there were potential applications in human resource management scenarios, like replicating a CEO’s voice for internal communications. The timing of the question was interesting because the answer ties directly to something I’ve been thinking a lot about: the skeuomorphic phase of conversational design that we’re stuck in at the moment.
In visual design, early versions of Mac OS employed skeuomorphism well, shading the buttons of its calculator app to resemble a physical calculator. This was done to help users understand what they were looking at. As adoption widened, the need for those kinds of visual cues dissipated and, having served their purpose, visual skeuomorphs rode off into a well-earned sunset.
In conversational design, however, skeuomorphism is being applied in a backward fashion. Artificially replicating a human voice—whether it’s a specific human or a more generalized version—misses the main point of conversational AI in productivity settings. The real power of a conversational interface is that it can give users direct access to automations operating within a technology ecosystem of functionalities that are orchestrated to create better-than-human experiences. Trying to make people believe that they’re speaking with something vaguely human sets expectations that machines can’t meet. It makes more sense to get people accustomed to the fact that they are speaking with a machine—one that is very good at doing machine things like flawlessly executing complex processes and analyzing data.
From a business perspective, replicating a CEO’s voice creates far more liability than opportunity. The question led to another realization: the many deceitful ways this kind of technology could be used to mislead employees and customers are so potentially damaging that businesses will need to explicitly state that they will never use it. It’s already common practice for businesses to tell customers that they will never call asking for sensitive information; voice cloning extends that obligation further. Businesses will need to create private communication systems that rely on more substantial forms of authentication in an era when individuals’ voices can be recreated and manipulated at will.
In HR settings, I’ve learned directly through projects with Deloitte that conversational AI is most helpful to employees when it meets them where they are. During an onboarding process, receiving a call with a welcome message from an artificial version of the CEO would likely be alienating and odd. By contrast, if a new employee is greeted by an intelligent bot that gets forms completed and training material reviewed when it’s convenient for the new hire, that represents time saved and, more importantly, a rewarding experience for the user. This bot wouldn’t need to seem overtly human in order to do its job well—that might even be a liability. If the new employee thinks they’ve been interacting with a human only to find out later that it was a machine, they could feel violated and become wary of future interactions within the organization. The better approach is to make it clear to the user that they’re interacting with a machine and then impress them with the machine’s ability to perform tasks in ways that surpass what humans alone are capable of.