Secret messages for Alexa and Co.

Hidden audio commands can manipulate voice assistants

Voice assistants like Alexa are practical but also easy to manipulate, as researchers now demonstrate. © Petmal / iStock

Inaudible manipulation: The speech recognition of Alexa, Cortana, Siri and Co. can be a gateway for subtle manipulation, as German IT researchers have discovered. Via channels that are inaudible to us - hidden in a radio song, for example - secret commands can be sent to the assistants. These hidden messages can then make the system open a door or order goods online - without us noticing.

Whether Alexa, Siri, Cortana or others: assistant systems with speech recognition enjoy increasing popularity. These learning systems - often based on neural networks - respond to our voice commands, help us control our devices, deliver information from the web or even order products online for us. The digital assistants are usually activated by their spoken name.

Hidden orders

"That's nice, as long as it is you who activates the assistant's functions, " explain Thorsten Holz from the Ruhr-Universität Bochum and his team. "But it becomes a problem if an attacker can do that as well. It gets even worse when you do not even hear that such an attack is taking place. "

This becomes possible when audio commands that are barely audible to us are embedded in seemingly harmless sounds - a radio song, a promotional jingle or the like. Similar attacks, known in technical jargon as adversarial examples, were described several years ago for image recognition software. Spying on users by means of acoustic cookies is also already possible.

We hear something harmless, but the voice assistant gets an inaudible command. © Holz et al./ Ruhr-Universität Bochum

Manipulation successful

Whether and how such a manipulation actually works is what the Bochum researchers tested in an experiment. They hid audio commands in portions of MP3 files that humans cannot hear. To us, the added components sound at most like random noise that is hardly noticeable in the overall signal. "When the ear is busy processing a loud tone of a certain frequency, we cannot hear any other, softer sound at that frequency for a few milliseconds," explains Holz's colleague Dorothea Kolossa.

In the test, the researchers hid arbitrary commands in different types of audio signals, such as speech, birdsong or music. The manipulated audio files were then fed into the Kaldi speech recognition system - software that is also used in Alexa and other digital assistants. The result: the speech recognition system understood the hidden commands and executed them.
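To make the hiding principle more tangible, here is a minimal Python sketch. It is only a crude stand-in for the researchers' approach: instead of an MP3-style psychoacoustic model and an optimization against Kaldi, it simply clamps a quiet "payload" signal so that in every time-frequency bin it stays a fixed margin below a louder carrier and is therefore roughly masked. The 20 dB margin, the function name embed_below_carrier and the toy signals are illustrative assumptions, not details from the study.

# Simplified illustration of psychoacoustic hiding, not the researchers'
# actual method: clamp a "payload" so it stays a fixed margin below the
# carrier in every time-frequency bin. Margin and signals are assumptions.
import numpy as np
from scipy.signal import stft, istft

def embed_below_carrier(carrier, payload, fs=16000, margin_db=20.0):
    """Scale the payload per STFT bin so it sits margin_db below the carrier."""
    _, _, C = stft(carrier, fs=fs, nperseg=512)
    _, _, P = stft(payload, fs=fs, nperseg=512)

    # Allowed payload magnitude: carrier magnitude reduced by the margin.
    allowed = np.abs(C) * 10.0 ** (-margin_db / 20.0)

    # Clamp the payload magnitude bin by bin, keeping its phase.
    masked_mag = np.minimum(np.abs(P), allowed)
    P_masked = masked_mag * np.exp(1j * np.angle(P))

    _, payload_masked = istft(P_masked, fs=fs, nperseg=512)
    n = min(len(carrier), len(payload_masked))
    return carrier[:n] + payload_masked[:n]

# Toy usage: a broadband "song" (noise stand-in) masking a faint 1 kHz tone
# standing in for the hidden command.
fs = 16000
t = np.arange(fs) / fs
carrier = 0.5 * np.random.randn(fs)
payload = 0.5 * np.sin(2 * np.pi * 1000 * t)
mixed = embed_below_carrier(carrier, payload, fs)

In this toy version the payload simply disappears wherever the carrier is quiet; the real attack instead shapes the perturbation so that the recognizer still transcribes the full command.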

Make assistant systems safer

"As one of many examples of where such an attack could be exploited, one can imagine a language assistant who can execute online orders, " says Holz. "We could manipulate an audio file, such as a song played on the radio, to have the order to purchase a specific product." If the assistant had access to components of a smarthome, it could An attacker thus also give the command, for example, to open a door lock or a Venetian blind.

With their experiments, the researchers want to identify potential risks and help make voice assistants more robust against such attacks. Against the audio manipulation presented here, it is conceivable that the systems could calculate which parts of an audio signal are inaudible to humans and remove them. "However, there are other ways to hide the secret commands in the files than the MP3 principle," Kolossa explains. And those would in turn require other protective measures.
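A minimal sketch of that defence idea, under very simple assumptions: components that lie far below the loudest component of the same analysis frame are treated as inaudible and stripped before the audio reaches the recognizer. The 40 dB floor and the helper name strip_inaudible are hypothetical choices for illustration; a real perceptual model would be considerably more elaborate.

# Sketch of the proposed defence: drop time-frequency components far below
# the loudest component of the same frame, treating them as inaudible,
# before the audio reaches the recognizer. The 40 dB floor is illustrative.
import numpy as np
from scipy.signal import stft, istft

def strip_inaudible(audio, fs=16000, floor_db=40.0):
    _, _, X = stft(audio, fs=fs, nperseg=512)
    mag = np.abs(X)

    # Per-frame threshold: everything more than floor_db below the frame's
    # peak magnitude is assumed to be inaudible and removed.
    frame_peak = mag.max(axis=0, keepdims=True) + 1e-12
    keep = mag >= frame_peak * 10.0 ** (-floor_db / 20.0)

    _, cleaned = istft(X * keep, fs=fs, nperseg=512)
    return cleaned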

Danger still low, but present

The danger posed by such audio attacks is still rather low, as the researchers emphasize, because most voice assistants are currently not used in security-relevant areas but merely serve convenience. That could change in the future, however. "As the systems become more sophisticated and popular, work on the protection mechanisms has to continue," says Holz.

In addition, the speech recognition system in their tests received the manipulated audio files via cable rather than acoustically over the air. In future studies, the researchers want to show that the attack also works when the signal is played over a loudspeaker and travels through the air to the voice assistant. "Due to the background noise, the attack will not be quite as efficient anymore," suspects Holz's colleague Lea Schönherr. "But we assume it will still work."

Examples of the manipulated audio files and other explanations can be found on the researchers' website.

(Ruhr-University Bochum, 25.09.2018 - NPO)