Li et al., 2023 - Google Patents

Learning normality is enough: a software-based mitigation against inaudible voice attacks

Document ID: 5986533926879498456
Author: Li X; Ji X; Yan C; Li C; Li Y; Zhang Z; Xu W
Publication year: 2023
Publication venue: 32nd USENIX Security Symposium (USENIX Security 23)

Snippet

Inaudible voice attacks silently inject malicious voice commands into voice assistants to manipulate voice-controlled devices such as smart speakers. To alleviate such threats for both existing and future devices, this paper proposes NormDetect, a software-based …

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/018—Audio watermarking, i.e. embedding inaudible data in the audio signal
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice

Similar Documents

Publication	Publication Date	Title
Abdullah et al.	2021	SoK: The faults in our ASRs: An overview of attacks against automatic speech recognition and speaker identification systems
Chen et al.	2021	Who is real Bob? Adversarial attacks on speaker recognition systems
Ahmed et al.	2020	Void: A fast and light voice liveness detection system
Koffas et al.	2022	Can you hear it? backdoor attacks via ultrasonic triggers
Yan et al.	2022	A survey on voice assistant security: Attacks and countermeasures
Li et al.	2019	Adversarial music: Real world audio adversary against wake-word detection system
Zhang et al.	2017	Hearing your voice is not enough: An articulatory gesture based liveness detection for voice authentication
KR102386155B1 (en)	2022-04-12	How to protect your voice assistant from being controlled by machine learning-based silent commands
Wang et al.	2019	Secure your voice: An oral airflow-based continuous liveness detection for voice assistants
Roy et al.	2016	Listening through a vibration motor
US20200243067A1 (en)	2020-07-30	Environment classifier for detection of laser-based audio injection attacks
Jiang et al.	2022	Securing liveness detection for voice authentication via pop noises
Zong et al.	2023	TrojanModel: A practical trojan attack against automatic speech recognition systems
CN116868265A (en)	2023-10-10	Systems and methods for data enhancement and speech processing in dynamic acoustic environments
Liu et al.	2021	Defending against microphone-based attacks with personalized noise
Guo et al.	2023	PhantomSound: Black-box, query-efficient audio adversarial attack via split-second phoneme injection
He et al.	2024	Fast and lightweight voice replay attack detection via time-frequency spectrum difference
Salvi et al.	2025	Poliphone: A dataset for smartphone model identification from audio recordings
Mathur et al.	2018	On robustness of cloud speech APIs: An early characterization
Shahid et al.	2023	"Is this my president speaking?" Tamper-proofing Speech in Live Recordings
Cao et al.	2022	LiveProbe: Exploring continuous voice liveness detection via phonemic energy response patterns
WO2025031170A1 (en)	2025-02-13	Voiceprint recognition system evaluation method and apparatus, storage medium, and electronic device
US20240086759A1 (en)	2024-03-14	System and Method for Watermarking Training Data for Machine Learning Models
Nagakrishnan et al.	2022	Generic speech based person authentication system with genuine and spoofed utterances: different feature sets and models