Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation
Tak et al., 2022
- Document ID
- 4241556577822851334
- Author
- Tak H
- Todisco M
- Wang X
- Jung J
- Yamagishi J
- Evans N
- Publication year
- 2022
- Publication venue
- arXiv preprint arXiv:2202.12233
Snippet
The performance of spoofing countermeasure systems depends fundamentally upon the use of sufficiently representative training data. With this usually being limited, current solutions typically lack generalisation to attacks encountered in the wild. Strategies to improve …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Tak et al. | Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation | |
US20220004870A1 (en) | Speech recognition method and apparatus, and neural network training method and apparatus | |
Tak et al. | Graph attention networks for anti-spoofing | |
Li et al. | Speaker-invariant affective representation learning via adversarial training | |
Stöter et al. | CountNet: Estimating the number of concurrent speakers using supervised learning | |
Kons et al. | Audio event classification using deep neural networks. | |
EP2695160B1 (en) | Speech syllable/vowel/phone boundary detection using auditory attention cues | |
Langari et al. | Efficient speech emotion recognition using modified feature extraction | |
Perero-Codosero et al. | X-vector anonymization using autoencoders and adversarial training for preserving speech privacy | |
Sun et al. | Ensemble softmax regression model for speech emotion recognition | |
US20190318723A1 (en) | Method for training voice data set, computer device, and computer-readable storage medium | |
Xia et al. | Using denoising autoencoder for emotion recognition. | |
Khan et al. | Voice spoofing countermeasures: Taxonomy, state-of-the-art, experimental analysis of generalizability, open challenges, and the way forward | |
Wang et al. | Can large-scale vocoded spoofed data improve speech spoofing countermeasure with a self-supervised front end? | |
Firooz et al. | Improvement of automatic speech recognition systems via nonlinear dynamical features evaluated from the recurrence plot of speech signals | |
US20230267950A1 (en) | Audio signal generation model and training method using generative adversarial network | |
Xue et al. | Cross-modal information fusion for voice spoofing detection | |
Wen et al. | Robust audio anti-spoofing with fusion-reconstruction learning on multi-order spectrograms | |
Soboleva et al. | Replacing human audio with synthetic audio for on-device unspoken punctuation prediction | |
Shivakumar et al. | Simplified and supervised i-vector modeling for speaker age regression | |
Cai et al. | The DKU-DukeECE system for the manipulation region location task of ADD 2023 | |
Vaca-Castano et al. | Using syllabic mel cepstrum features and k-nearest neighbors to identify anurans and birds species | |
Cornell et al. | Implicit acoustic echo cancellation for keyword spotting and device-directed speech detection | |
CN113345410A (en) | Training method of general speech and target speech synthesis model and related device | |
Nair et al. | Classification of Pitch and Gender of Speakers for Forensic Speaker Recognition from Disguised Voices Using Novel Features Learned by Deep Convolutional Neural Networks. |