DOI: 10.1145/3316781.3317861
research-article

MASKER: Adaptive Mobile Security Enhancement against Automatic Speech Recognition in Eavesdropping

Published: 02 June 2019

Abstract

Benefiting from recent advances in artificial intelligence, Automatic Speech Recognition (ASR) technology has achieved enormous performance improvements and much wider application. Unfortunately, ASR is also heavily leveraged in speech eavesdropping, where it is used to transcribe large volumes of intercepted speech into text, causing considerable information leakage. In this work, we propose MASKER, a mobile security enhancement solution that protects mobile speech data from ASR in eavesdropping. By exploiting a vulnerability that is ubiquitous across ASR models, MASKER injects human-imperceptible adversarial noise into real-time speech on the mobile device (e.g., phone calls and voice messages). Even if the speech data is exposed to eavesdropping during transmission, the adversarial noise effectively perturbs the ASR process and produces a high Word Error Rate (WER). MASKER is further optimized for the user's perceived audio quality and adapts to environmental noise. Moreover, MASKER is computationally efficient enough for integration into mobile systems. Experiments show that MASKER achieves an average WER of 84.55% for ASR perturbation, a 32% noise reduction for user perception quality, and 16× faster processing speed compared to the state-of-the-art method.
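The abstract reports its security result as a Word Error Rate. The abstract does not say how WER was computed, so the following is only a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the ASR output, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance / number of reference words.

    Illustrative helper (not from the paper); tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    clean = "please call me at noon"
    perturbed_output = "police tall me noon"
    print(f"WER = {word_error_rate(clean, perturbed_output):.2%}")
```

Under this definition a higher WER means the eavesdropper's transcript is less usable, and because insertions count as errors the value can exceed 100%.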

Cited By

  • (2024) Measuring and analysis of speech-to-text accuracy of some automatic speech recognition services in dynamic environment conditions. INTERNATIONAL CONFERENCE ON ENVIRONMENTAL, MINING, AND SUSTAINABLE DEVELOPMENT 2022. DOI: 10.1063/5.0196448 (030001). Online publication date: 2024
  • (2024) Secure speech-recognition data transfer in the internet of things using a power system and a tried-and-true key generation technique. Cluster Computing 27:10 (14669-14684). DOI: 10.1007/s10586-024-04649-3. Online publication date: 29-Jul-2024

Published In

DAC '19: Proceedings of the 56th Annual Design Automation Conference 2019
June 2019
1378 pages
ISBN:9781450367257
DOI:10.1145/3316781
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Adversarial Example
  2. Automatic Speech Recognition

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

DAC '19
Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

Upcoming Conference

DAC '25
62nd ACM/IEEE Design Automation Conference
June 22 - 26, 2025
San Francisco, CA, USA
