Guarding Against the Risk of "Fake Audio"

When Adobe demonstrated its Project VoCo at the MAX event in November, the media focused mostly on the downside risks, though the tool potentially has many legitimate uses.

All you have to do is feed this experimental audio workstation tool a sample recording of someone speaking. You can then rewrite the text and the audio will be corrected automatically, even creating new words in the correct voice.

For media production companies this kind of tool will be immensely useful, making it possible to correct voiceovers and dialog without having to bring the talent back into the studio. Actors might be less enthusiastic and will have to consider this possibility in their contracts.

But much of the media coverage focused on dark applications of this technology. With this tool you can literally put words in someone's mouth, editing a speech so that someone appears to say something they didn't. Sooner or later someone will actually do this in the wild, but I suspect that the biggest danger is just that politicians will learn to dismiss secret recordings as "fake audio". We will never know if an audio recording is genuine or not. Or will we?

VoCo is not yet available in any commercial tool, and there is no news about when it might be, but in the meantime I have been wondering if there is any way that people can protect themselves. To a certain extent the answer is yes.

Politicians will be able to guard against this type of audio manipulation to some degree by providing a video recording of each speech that includes an uncut, single-camera view of their lips. Any manipulated version of the audio will not match this reference recording, and it will be much harder to change the video to match a faked audio track. It could be done with CGI, but this would be expensive and take a long time.

Journalists can protect themselves from spurious audio recordings simply by being very suspicious about the provenance of recordings. A file that comes directly from a known source is more likely to be genuine than something found on YouTube or SoundCloud.

Everyone else should be wary of any audio recording that does not come from a trustworthy source, especially when there is no matching video. I would also caution people to be more suspicious of phone calls that sound like they come from someone famous. Next time someone calls claiming to be Vladimir Putin, don't believe it just because the voice is right.

But will this technology be a problem anyway? My first reaction last year was to fear the worst, but I said the same at the beginning of the 2000s when I first tried Syntrillium's Cool Edit Pro digital audio workstation software, a product that later became Adobe Audition. I used to demonstrate this by taking JFK's "Ask not what your country can do for you..." speech and, with a few mouse movements, moving the "not" to reverse the sense. You can do the same yourself using the free open-source audio tool Audacity.

At the time I suspected that this kind of manipulation would become commonplace, but it never did, even though any user of any digital audio workstation tool could do it. Perhaps it didn't catch on because video dominates news cycles today, and perhaps Adobe's VoCo will have a limited effect for the same reason. But in any case I suggest that everyone be aware of the risks that this technology poses. Be suspicious of any audio that comes to you through indirect sources, and remember that with this kind of technology anyone can fake a voice realistically.

Lectures, Workshops, Coaching & Writing

For lectures, workshops, one-to-one coaching and writing about communications you can contact Andrew Hennigan at or 0046 73 089 44 75

