Kumar et al., 2020 - Google Patents
Unsupervised neural mask estimator for generalized eigen-value beamforming based ASR
- Document ID: 7590679785163027426
- Authors: Kumar R, Sreeram A, Purushothaman A, Ganapathy S
- Publication year: 2020
- Publication venue: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Snippet
The state-of-the-art methods for acoustic beamforming in multi-channel ASR are based on a neural mask estimator that predicts the presence of speech and noise. These models are trained using a paired corpus of clean and noisy recordings (teacher model). In this paper …
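The snippet outlines the pipeline the paper builds on: a neural network estimates time-frequency masks for speech and noise, the masks weight per-frequency spatial covariance matrices, and the generalized eigen-value (GEV, or max-SNR) beamformer uses the principal generalized eigenvector of that matrix pair as its weights. The NumPy/SciPy sketch below illustrates only this generic mask-based GEV step, not the paper's unsupervised training; the function name, array layout, and regularization constants are assumptions made for the example.

```python
import numpy as np
from scipy.linalg import eigh

def gev_beamform(stft, speech_mask, noise_mask):
    """Mask-driven GEV (max-SNR) beamforming, one weight vector per frequency bin.

    stft:        complex array, shape (channels, frames, bins)
    speech_mask: real array in [0, 1], shape (frames, bins)
    noise_mask:  real array in [0, 1], shape (frames, bins)
    Returns the enhanced single-channel STFT, shape (frames, bins).
    """
    n_ch, n_frames, n_bins = stft.shape
    enhanced = np.zeros((n_frames, n_bins), dtype=np.complex128)

    for f in range(n_bins):
        Y = stft[:, :, f]                                   # (channels, frames)

        # Mask-weighted spatial covariance matrices of speech and noise
        phi_s = (speech_mask[:, f] * Y) @ Y.conj().T / (speech_mask[:, f].sum() + 1e-10)
        phi_n = (noise_mask[:, f] * Y) @ Y.conj().T / (noise_mask[:, f].sum() + 1e-10)
        phi_n += 1e-10 * np.eye(n_ch)                       # keep phi_n positive definite

        # Beamforming weights: principal generalized eigenvector of (phi_s, phi_n),
        # i.e. the w maximizing the output SNR  w^H phi_s w / w^H phi_n w.
        _, vecs = eigh(phi_s, phi_n)                        # eigenvalues in ascending order
        w = vecs[:, -1]

        enhanced[:, f] = w.conj() @ Y                       # apply w^H to every frame
    return enhanced
```

In practice a post-filter (e.g. blind analytic normalization) is usually applied after GEV beamforming to reduce speech distortion; it is omitted here for brevity.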
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
          - G10L21/0208—Noise filtering
            - G10L21/0216—Noise filtering characterised by the method used for estimating noise
              - G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
              - G10L2021/02166—Microphone arrays; Beamforming
      - G10L15/00—Speech recognition
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
          - G10L15/063—Training
          - G10L15/065—Adaptation
            - G10L15/07—Adaptation to the speaker
        - G10L15/08—Speech classification or search
          - G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
          - G10L15/18—Speech classification or search using natural language modelling
        - G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
        - G10L15/24—Speech recognition using non-acoustical features
          - G10L15/25—Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/78—Detection of presence or absence of voice signals
      - G10L17/00—Speaker identification or verification
        - G10L17/04—Training, enrolment or model building
      - G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
Similar Documents
| Publication | Title |
|---|---|
| JP5738020B2 (en) | Speech recognition apparatus and speech recognition method |
| Seltzer et al. | Likelihood-maximizing beamforming for robust hands-free speech recognition |
| US8538751B2 (en) | Speech recognition system and speech recognizing method |
| Kumatani et al. | Microphone array processing for distant speech recognition: Towards real-world deployment |
| Woodruff et al. | Binaural detection, localization, and segregation in reverberant environments based on joint pitch and azimuth cues |
| Liu et al. | Neural network based time-frequency masking and steering vector estimation for two-channel MVDR beamforming |
| Xiao et al. | The NTU-ADSC systems for reverberation challenge 2014 |
| Nakatani et al. | Maximum likelihood convolutional beamformer for simultaneous denoising and dereverberation |
| Nakatani et al. | Dominance based integration of spatial and spectral features for speech enhancement |
| Kumatani et al. | Beamforming with a maximum negentropy criterion |
| Li et al. | The PCG-AIID system for L3DAS22 challenge: MIMO and MISO convolutional recurrent network for multi channel speech enhancement and speech recognition |
| Delcroix et al. | Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial, spectral and temporal modeling of sounds |
| Habets et al. | Dereverberation |
| Kumar et al. | Unsupervised neural mask estimator for generalized eigen-value beamforming based ASR |
| Kumatani et al. | Adaptive beamforming with a minimum mutual information criterion |
| Vu et al. | Speech enhancement using beamforming and non negative matrix factorization for robust speech recognition in the CHiME-3 challenge |
| Shi et al. | Phase-based dual-microphone speech enhancement using a prior speech model |
| Purushothaman et al. | 3-D acoustic modeling for far-field multi-channel speech recognition |
| Kumatani et al. | Maximum kurtosis beamforming with a subspace filter for distant speech recognition |
| Han et al. | Robust GSC-based speech enhancement for human machine interface |
| Ito et al. | Data-driven and physical model-based designs of probabilistic spatial dictionary for online meeting diarization and adaptive beamforming |
| Mandel et al. | Multichannel Spatial Clustering for Robust Far-Field Automatic Speech Recognition in Mismatched Conditions |
| Tu et al. | A Space-and-Speaker-Aware Iterative Mask Estimation Approach to Multi-Channel Speech Recognition in the CHiME-6 Challenge |
| Ji et al. | An end-to-end far-field keyword spotting system with neural beamforming |
| Ni et al. | Improved MVDR Beamforming Using LSTM Speech Models to Clean Spatial Clustering Masks |