MASKER: Adaptive Mobile Security Enhancement against Automatic Speech Recognition in Eavesdropping

Authors:

Xiang Chen

DAC '19: Proceedings of the 56th Annual Design Automation Conference 2019

Article No.: 163, Pages 1 - 6

https://doi.org/10.1145/3316781.3317861

Published: 02 June 2019

Abstract

Driven by recent advances in artificial intelligence, Automatic Speech Recognition (ASR) technology has achieved large performance improvements and much wider application. Unfortunately, ASR is also heavily leveraged in speech eavesdropping, where it is used to transcribe large volumes of intercepted speech into text, causing considerable information leakage. In this work, we propose MASKER, a mobile security enhancement solution that protects mobile speech data from ASR-based eavesdropping. By exploiting a vulnerability common to ASR models, MASKER injects adversarial noise that is imperceptible to humans into real-time speech on the mobile device (e.g., phone calls and voice messages). Even if the speech data is exposed to eavesdropping during transmission, the adversarial noise effectively perturbs the ASR process, producing a high Word Error Rate (WER). MASKER is further optimized for the user's perceived audio quality and enhanced to adapt to environmental noise. It is also computationally efficient enough for mobile system integration. Experiments show that MASKER achieves an average WER of 84.55% for ASR perturbation, a 32% noise reduction for user perception quality, and 16× faster processing compared to the state-of-the-art method.
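The abstract reports its main result as a Word Error Rate. For readers unfamiliar with the metric, the following is a minimal sketch of the conventional definition, WER = (substitutions + deletions + insertions) / number of reference words, computed with a word-level edit distance. The abstract does not describe the paper's exact scoring procedure, so the function name `word_error_rate`, the whitespace tokenization, and the example sentences are illustrative assumptions, not the authors' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: word-level edit distance divided by reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up transcripts, purely for illustration.
    clean = "please call me back after the meeting"
    perturbed = "peace fall knee back after meeting"
    print(f"WER = {word_error_rate(clean, perturbed):.2%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference. A higher WER on the eavesdropper's transcript means less of the original speech content was recovered.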



Published In

DAC '19: Proceedings of the 56th Annual Design Automation Conference 2019

June 2019

1378 pages

ISBN:9781450367257

DOI:10.1145/3316781

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGDA: ACM Special Interest Group on Design Automation
IEEE-CEDA

In-Cooperation

SIGBED: ACM Special Interest Group on Embedded Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

DAC '19

Sponsor:

SIGDA

DAC '19: The 56th Annual Design Automation Conference 2019

June 2 - 6, 2019

Las Vegas, NV, USA

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%
