Xie et al., 2017 - Google Patents

RNN-LDA Clustering for Feature Based DNN Adaptation.

Xie et al., 2017

Document ID: 16357101372156606636
Author: Xie X; Liu X; Lee T; Wang L
Publication year: 2017
Publication venue: INTERSPEECH

External Links

Cited by

Snippet

Abstract Model based deep neural network (DNN) adaptation approaches often require multi-pass decoding in test time. Input feature based DNN adaptation, for example, based on latent Dirichlet allocation (LDA) clustering, provide a more efficient alternative. In …

Continue reading at www.isca-archive.org (PDF) (other versions)

230000004301 light adaptation 0 title abstract description 50

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/06—Decision making techniques; Pattern matching strategies
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6296—Graphical models, e.g. Bayesian networks
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the analysis technique using neural networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models

Similar Documents

Publication	Publication Date	Title
Senior et al.	2014	Improving DNN speaker independence with i-vector inputs
Gupta et al.	2014	I-vector-based speaker adaptation of deep neural networks for french broadcast audio transcription
Yu et al.	2011	Improved bottleneck features using pretrained deep neural networks.
Perero-Codosero et al.	2022	X-vector anonymization using autoencoders and adversarial training for preserving speech privacy
Wang et al.	2015	Acoustic segment modeling with spectral clustering methods
WO2012075641A1 (en)	2012-06-14	Device and method for pass-phrase modeling for speaker verification, and verification system
Deena et al.	2018	Recurrent neural network language model adaptation for multi-genre broadcast speech recognition and alignment
WO2012075640A1 (en)	2012-06-14	Modeling device and method for speaker recognition, and speaker recognition system
Guo et al.	2018	Deep neural network based i-vector mapping for speaker verification using short utterances
Huang et al.	2016	Speaker adaptation of RNN-BLSTM for speech recognition based on speaker code
Mirsamadi et al.	2015	A study on deep neural network acoustic model adaptation for robust far-field speech recognition.
Jung et al.	2017	Linear-scale filterbank for deep neural network-based voice activity detection
Samarakoon et al.	2015	Learning factorized feature transforms for speaker normalization
Ejbali et al.	2010	Wavelet network for recognition system of Arabic word
Sun et al.	2020	Subspace Gaussian mixture based language modeling for large vocabulary continuous speech recognition
Koolagudi et al.	2012	Speaker recognition in the case of emotional environment using transformation of speech features
Xie et al.	2017	RNN-LDA Clustering for Feature Based DNN Adaptation.
Hernández-Sierra et al.	2012	Speaker recognition using a binary representation and specificities models
Hasan et al.	2020	Investigation of the effect of MFCC variation on the convolutional neural network-based speech classification
Lung	2010	Improved wavelet feature extraction using kernel analysis for text independent speaker recognition
Han et al.	2020	Information Preservation Pooling for Speaker Embedding.
Zhipeng et al.	2019	Voiceprint recognition based on BP Neural Network and CNN
Higuchi et al.	2019	Speaker Adversarial Training of DPGMM-Based Feature Extractor for Zero-Resource Languages.
Cai et al.	2018	Deep speaker embeddings with convolutional neural network on supervector for text-independent speaker recognition
Ben-Amor et al.	2024	Extraction of interpretable and shared speaker-specific speech attributes through binary auto-encoder