Chen et al., 2015 - Google Patents
Phone-centric local variability vector for text-constrained speaker verification.Chen et al., 2015
View PDF- Document ID
- 2614221084964270421
- Author
- Chen L
- Lee K
- Ma B
- Guo W
- Li H
- Dai L
- Publication year
- Publication venue
- INTERSPEECH
External Links
Snippet
This paper investigates the use of frame alignment given by a deep neural network (DNN) for text-constrained speaker verification task, where the lexical contents of the test utterances are limited to a finite set of vocabulary. The DNN makes use of information carried by the …
- 230000000875 corresponding 0 abstract description 6
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06K9/6232—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
- G06K9/6247—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/06—Decision making techniques; Pattern matching strategies
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding or deleting nodes or connections, pruning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Raj et al. | Probing the information encoded in x-vectors | |
An et al. | Deep CNNs with self-attention for speaker identification | |
Lei et al. | A novel scheme for speaker recognition using a phonetically-aware deep neural network | |
Jaitly et al. | Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition. | |
Ghahabi et al. | Deep learning backend for single and multisession i-vector speaker recognition | |
Senior et al. | Improving DNN speaker independence with i-vector inputs | |
Ghosh et al. | Representation learning for speech emotion recognition. | |
Miao et al. | Towards speaker adaptive training of deep neural network acoustic models. | |
Ghahabi et al. | Deep belief networks for i-vector based speaker recognition | |
Wang et al. | What does the speaker embedding encode? | |
Shum et al. | On the use of spectral and iterative methods for speaker diarization | |
Novoselov et al. | Triplet Loss Based Cosine Similarity Metric Learning for Text-independent Speaker Recognition. | |
Garcia-Romero et al. | Stacked Long-Term TDNN for Spoken Language Recognition. | |
Tomashenko et al. | Speaker adaptation of context dependent deep neural networks based on MAP-adaptation and GMM-derived feature processing. | |
Tian et al. | Investigation of bottleneck features and multilingual deep neural networks for speaker verification. | |
Ferrer et al. | Spoken language recognition based on senone posteriors. | |
Chen et al. | Phone-centric local variability vector for text-constrained speaker verification. | |
Tomashenko et al. | On the Use of Gaussian Mixture Model Framework to Improve Speaker Adaptation of Deep Neural Network Acoustic Models. | |
Miguel et al. | Tied hidden factors in neural networks for end-to-end speaker recognition | |
Gosztolya et al. | Building context-dependent DNN acoustic models using Kullback-Leibler divergence-based state tying | |
Chen et al. | Content-aware local variability vector for speaker verification with short utterance | |
Safari et al. | From features to speaker vectors by means of restricted boltzmann machine adaptation | |
Mccree et al. | Language Recognition for Telephone and Video Speech: The JHU HLTCOE Submission for NIST LRE17. | |
Jorrın et al. | Dnn bottleneck features for speaker clustering | |
Joy et al. | DNNs for unsupervised extraction of pseudo speaker-normalized features without explicit adaptation data |