It was reported earlier this year that HSBC UK’s voice ID technology has prevented £249 million worth of fraud in the last year. The bank says that since its launch in 2016 the technology has prevented £981 million of customers’ money from falling into the hands of fraudsters, with the rate of attempted fraud down 50% year-on-year as of May 2021.
It is use cases such as this that are driving growth in voice biometrics solutions. Indeed, research forecasts that the voice biometrics market will grow from $1.1 billion in 2020 to $3.9 billion in 2026, driven by demand from the banking, financial services and insurance (BFSI) industry, as well as the need to reduce authentication costs.
But are banks and public services right to be reliant on voice? Are voice biometrics the future of verification?
Andersen Cheng, CEO of Nomidio, says that a good fake voice (even just a good impersonator) can be enough to fool a human, but voice biometric software is much better at identifying differences that the human ear either doesn’t discern or chooses to ignore. This means that voice biometric ID can help prevent fraud when identity is checked against the voice.
“Even so-called deep fakes create a poor copy of someone’s voice when analysed at the digital level; they make quite convincing cameos, especially when combined with video, but again these are poor imitations at a digital level,” he says.
“Regardless of a potential vulnerability to recording a copy of my voice, they are vastly more secure than passwords and PINs, and are also much more difficult to copy and distribute for nefarious re-use than a password,” he adds. “You can’t brute-force attack a voice in the same way you can a password.”
Cheng also notes that voice requires presence: there’s a discernible difference between a live human voice and a recording of it.
“One is produced by my vocal chords and the volume of my throat and mouth, the other by the electronic vibration of a speaker membrane. So I need to be present to speak into my phone to create a genuine voice recording – which means you can tell if someone tries to use a recording or a clone of my voice to speak into a phone. In the same way it is possible to distinguish between a photograph taken of my face and a photograph taken of an existing photograph of my face.”
However, like any single security measure, voice biometrics on their own can be vulnerable. Nomidio advocates using multiple factors to prove identity. A combination of, for example, voice, face and a PIN code is highly secure: any single factor may be possible to fake, but faking all three at the same time is virtually impossible.
Daniel Kornitzer, chief business development officer at online payments company Paysafe, agrees.
“Like all biometric technologies, voice ID has pros and cons. On the positive side, it delivers a seamless user experience and it combines over 100 unique characteristics, both physiological (such as shape of the speaker’s larynx or nasal cavity and signature of the vocal chords) as well as behavioural (learned) attributes. However, voice-based systems need to be made impervious to the fraudsters’ potential use of cloned voice samples and deepfakes,” he says.
“The best way to make voice ID safer and more effective is to use multimodal authentication: imagine for example if voice ID were combined with a secret pass-phrase and used from your mobile phone, it would then combine biometry with knowledge and possession, delivering a virtually foolproof system. Voice ID is not perfect but can be a valuable component in a broader biometry-based ecosystem.”
Security vs. usability
Veritone is a voice synthesising specialist that recently launched its Marvel.ai product, which can capture a person’s voice after that person reads a short script and then apply that saved voice to any other text. Ryan Steelberg, co-founder and president of Veritone, says that, having tried most of today’s text-to-speech technologies, he considers the likelihood of an AI engine learning your voice well enough to pass a biometric security test to be extremely low.
“As the tech becomes better so too will the biometrics security. As we establish best practices with partners like Open Voice Network (OVN), the inaudible fingerprint will be the standard to protect all voices, and security systems will be able to detect these as synthetic.”
Ultimately, all security measures are a balance between security and usability, and voice has ease of use in its corner.
As Cheng notes: “The humble door key is vulnerable to picking and copying, but scores highly as the single most used security device because it is extremely easy to own, to carry with you and to deploy, i.e. put in the lock and turn. Likewise with voice biometrics there are some vulnerabilities (albeit quite minor and very difficult to attack) but the ease of use is very high – you always have your voice with you and you don’t forget how to speak.”