Chen et al., 2015 - Google Patents

Phone-centric local variability vector for text-constrained speaker verification.

Document ID: 2614221084964270421
Author: Chen L; Lee K; Ma B; Guo W; Li H; Dai L
Publication year: 2015
Publication venue: INTERSPEECH

Snippet

This paper investigates the use of frame alignment given by a deep neural network (DNN) for the text-constrained speaker verification task, where the lexical content of the test utterances is limited to a finite vocabulary. The DNN makes use of information carried by the …

Continue reading at www.isca-archive.org (PDF) (other versions)

Classifications

- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/18—Speech classification or search using natural language modelling
            - G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
              - G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
          - G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
            - G10L15/142—Hidden Markov Models [HMMs]
              - G10L15/144—Training of HMMs
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
          - G10L15/065—Adaptation
            - G10L15/07—Adaptation to the speaker
          - G10L15/063—Training
        - G10L15/28—Constructional details of speech recognition systems
      - G10L17/00—Speaker identification or verification
        - G10L17/04—Training, enrolment or model building
        - G10L17/06—Decision making techniques; Pattern matching strategies
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/62—Methods or arrangements for recognition using electronic means
          - G06K9/6267—Classification techniques
            - G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
          - G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
            - G06K9/6232—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
              - G06K9/6247—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/02—Computer systems based on biological models using neural network models
          - G06N3/08—Learning methods
            - G06N3/082—Learning methods modifying the architecture, e.g. adding or deleting nodes or connections, pruning
      - G06N99/00—Subject matter not provided for in other groups of this subclass

Similar Documents

Publication	Publication Date	Title
Raj et al.	2019	Probing the information encoded in x-vectors
An et al.	2019	Deep CNNs with self-attention for speaker identification
Lei et al.	2014	A novel scheme for speaker recognition using a phonetically-aware deep neural network
Jaitly et al.	2012	Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition.
Ghahabi et al.	2017	Deep learning backend for single and multisession i-vector speaker recognition
Senior et al.	2014	Improving DNN speaker independence with i-vector inputs
Ghosh et al.	2016	Representation learning for speech emotion recognition.
Miao et al.	2014	Towards speaker adaptive training of deep neural network acoustic models.
Ghahabi et al.	2014	Deep belief networks for i-vector based speaker recognition
Wang et al.	2017	What does the speaker embedding encode?
Shum et al.	2012	On the use of spectral and iterative methods for speaker diarization
Novoselov et al.	2018	Triplet Loss Based Cosine Similarity Metric Learning for Text-independent Speaker Recognition.
Garcia-Romero et al.	2016	Stacked Long-Term TDNN for Spoken Language Recognition.
Tomashenko et al.	2014	Speaker adaptation of context dependent deep neural networks based on MAP-adaptation and GMM-derived feature processing.
Tian et al.	2015	Investigation of bottleneck features and multilingual deep neural networks for speaker verification.
Ferrer et al.	2014	Spoken language recognition based on senone posteriors.
Chen et al.	2015	Phone-centric local variability vector for text-constrained speaker verification.
Tomashenko et al.	2016	On the Use of Gaussian Mixture Model Framework to Improve Speaker Adaptation of Deep Neural Network Acoustic Models.
Miguel et al.	2018	Tied hidden factors in neural networks for end-to-end speaker recognition
Gosztolya et al.	2015	Building context-dependent DNN acoustic models using Kullback-Leibler divergence-based state tying
Chen et al.	2016	Content-aware local variability vector for speaker verification with short utterance
Safari et al.	2016	From features to speaker vectors by means of restricted Boltzmann machine adaptation
McCree et al.	2018	Language Recognition for Telephone and Video Speech: The JHU HLTCOE Submission for NIST LRE17.
Jorrín et al.	2017	DNN bottleneck features for speaker clustering
Joy et al.	2017	DNNs for unsupervised extraction of pseudo speaker-normalized features without explicit adaptation data