Tak et al., 2022 - Google Patents

Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation

Document ID
4241556577822851334
Author
Tak H
Todisco M
Wang X
Jung J
Yamagishi J
Evans N
Publication year
2022
Publication venue
arXiv preprint arXiv:2202.12233

Snippet

The performance of spoofing countermeasure systems depends fundamentally upon the use of sufficiently representative training data. With this usually being limited, current solutions typically lack generalisation to attacks encountered in the wild. Strategies to improve …

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6217 Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30781 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F17/30784 Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6267 Classification techniques
    • G06K9/6268 Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computer systems based on biological models
    • G06N3/02 Computer systems based on biological models using neural network models
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/36 Image preprocessing, i.e. processing the image information without deciding about the identity of the image
    • G06K9/46 Extraction of features or characteristics of the image
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00 Subject matter not provided for in other groups of this subclass

Similar Documents

Publication    Title
Tak et al. Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation
US20220004870A1 (en) Speech recognition method and apparatus, and neural network training method and apparatus
Tak et al. Graph attention networks for anti-spoofing
Li et al. Speaker-invariant affective representation learning via adversarial training
Stöter et al. CountNet: Estimating the number of concurrent speakers using supervised learning
Kons et al. Audio event classification using deep neural networks.
EP2695160B1 (en) Speech syllable/vowel/phone boundary detection using auditory attention cues
Langari et al. Efficient speech emotion recognition using modified feature extraction
Perero-Codosero et al. X-vector anonymization using autoencoders and adversarial training for preserving speech privacy
Sun et al. Ensemble softmax regression model for speech emotion recognition
US20190318723A1 (en) Method for training voice data set, computer device, and computer-readable storage medium
Xia et al. Using denoising autoencoder for emotion recognition.
Khan et al. Voice spoofing countermeasures: Taxonomy, state-of-the-art, experimental analysis of generalizability, open challenges, and the way forward
Wang et al. Can large-scale vocoded spoofed data improve speech spoofing countermeasure with a self-supervised front end?
Firooz et al. Improvement of automatic speech recognition systems via nonlinear dynamical features evaluated from the recurrence plot of speech signals
US20230267950A1 (en) Audio signal generation model and training method using generative adversarial network
Xue et al. Cross-modal information fusion for voice spoofing detection
Wen et al. Robust audio anti-spoofing with fusion-reconstruction learning on multi-order spectrograms
Soboleva et al. Replacing human audio with synthetic audio for on-device unspoken punctuation prediction
Shivakumar et al. Simplified and supervised i-vector modeling for speaker age regression
Cai et al. The dku-dukeece system for the manipulation region location task of add 2023
Vaca-Castano et al. Using syllabic mel cepstrum features and k-nearest neighbors to identify anurans and birds species
Cornell et al. Implicit acoustic echo cancellation for keyword spotting and device-directed speech detection
CN113345410A (en) Training method of general speech and target speech synthesis model and related device
Nair et al. Classification of Pitch and Gender of Speakers for Forensic Speaker Recognition from Disguised Voices Using Novel Features Learned by Deep Convolutional Neural Networks.