Tak et al., 2022 - Google Patents

Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation

Document ID: 4241556577822851334
Author: Tak H; Todisco M; Wang X; Jung J; Yamagishi J; Evans N
Publication year: 2022
Publication venue: arXiv preprint arXiv:2202.12233

Snippet

The performance of spoofing countermeasure systems depends fundamentally upon the use of sufficiently representative training data. With this usually being limited, current solutions typically lack generalisation to attacks encountered in the wild. Strategies to improve …

Classifications

- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/62—Methods or arrangements for recognition using electronic means
          - G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
            - G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/18—Speech classification or search using natural language modelling
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/62—Methods or arrangements for recognition using electronic means
          - G06K9/6267—Classification techniques
            - G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/02—Computer systems based on biological models using neural network models
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
          - G06K9/46—Extraction of features or characteristics of the image
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L17/00—Speaker identification or verification
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N99/00—Subject matter not provided for in other groups of this subclass

Similar Documents

Publication	Publication Date	Title
Tak et al.	2022	Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation
US20220004870A1 (en)	2022-01-06	Speech recognition method and apparatus, and neural network training method and apparatus
Tak et al.	2021	Graph attention networks for anti-spoofing
Li et al.	2020	Speaker-invariant affective representation learning via adversarial training
Stöter et al.	2018	CountNet: Estimating the number of concurrent speakers using supervised learning
Kons et al.	2013	Audio event classification using deep neural networks
EP2695160B1 (en)	2020-01-08	Speech syllable/vowel/phone boundary detection using auditory attention cues
Langari et al.	2020	Efficient speech emotion recognition using modified feature extraction
Perero-Codosero et al.	2022	X-vector anonymization using autoencoders and adversarial training for preserving speech privacy
Sun et al.	2017	Ensemble softmax regression model for speech emotion recognition
US20190318723A1 (en)	2019-10-17	Method for training voice data set, computer device, and computer-readable storage medium
Xia et al.	2013	Using denoising autoencoder for emotion recognition
Khan et al.	2022	Voice spoofing countermeasures: Taxonomy, state-of-the-art, experimental analysis of generalizability, open challenges, and the way forward
Wang et al.	2024	Can large-scale vocoded spoofed data improve speech spoofing countermeasure with a self-supervised front end?
Firooz et al.	2017	Improvement of automatic speech recognition systems via nonlinear dynamical features evaluated from the recurrence plot of speech signals
US20230267950A1 (en)	2023-08-24	Audio signal generation model and training method using generative adversarial network
Xue et al.	2023	Cross-modal information fusion for voice spoofing detection
Wen et al.	2023	Robust audio anti-spoofing with fusion-reconstruction learning on multi-order spectrograms
Soboleva et al.	2021	Replacing human audio with synthetic audio for on-device unspoken punctuation prediction
Shivakumar et al.	2014	Simplified and supervised i-vector modeling for speaker age regression
Cai et al.	2023	The DKU-DukeECE system for the manipulation region location task of ADD 2023
Vaca-Castano et al.	2010	Using syllabic mel cepstrum features and k-nearest neighbors to identify anurans and birds species
Cornell et al.	2023	Implicit acoustic echo cancellation for keyword spotting and device-directed speech detection
CN113345410A (en)	2021-09-03	Training method of general speech and target speech synthesis model and related device
Nair et al.	2021	Classification of Pitch and Gender of Speakers for Forensic Speaker Recognition from Disguised Voices Using Novel Features Learned by Deep Convolutional Neural Networks