Wang et al., 2018 - Google Patents

Two-stage enhancement of noisy and reverberant microphone array speech for automatic speech recognition systems trained with only clean speech

Document ID: 16251994478918878122
Author: Wang Q; Wang S; Ge F; Han C; Lee J; Guo L; Lee C
Publication year: 2018
Publication venue: 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP)

Snippet

We propose a two-stage approach to enhancement of far-field microphone array speech collected in reverberant conditions corrupted by interfering speakers and noises. We intend to produce top-quality enhanced speech to be used by a black-box automatic speech …

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using predictive techniques
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10K—SOUND-PRODUCING DEVICES; ACOUSTICS NOT OTHERWISE PROVIDED FOR
- G10K11/00—Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
- G10K11/18—Methods or devices for transmitting, conducting, or directing sound
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2430/00—Signal processing covered by H04R, not provided for in its groups

Similar Documents

Publication	Publication Date	Title
CN112017681B (en)	2022-05-13	Method and system for enhancing directional voice
Kinoshita et al.	2016	A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research
Schwartz et al.	2014	Multi-microphone speech dereverberation and noise reduction using relative early transfer functions
CN105869651B (en)	2019-05-31	Dual-channel beamforming speech enhancement method based on noise mixing coherence
US8880396B1 (en)	2014-11-04	Spectrum reconstruction for automatic speech recognition
Mertins et al.	2009	Room impulse response shortening/reshaping with infinity- and $p$-norm optimization
CN114694670B (en)	2025-09-12	A microphone array speech enhancement system and method based on multi-task network
CN110660406A (en)	2020-01-07	Real-time voice noise reduction method of double-microphone mobile phone in close-range conversation scene
Chen et al.	2017	Improving Mask Learning Based Speech Enhancement System with Restoration Layers and Residual Connection
Hussain et al.	2019	Ensemble hierarchical extreme learning machine for speech dereverberation
Nesta et al.	2013	A flexible spatial blind source extraction framework for robust speech recognition in noisy environments
Sarabia et al.	2023	Spatial LibriSpeech: An augmented dataset for spatial audio learning
Meng et al.	2024	Deep Kronecker product beamforming for large-scale microphone arrays
Yang et al.	2025	Design and optimization of superdirective beamforming and post-filtering for speech enhancement
Kovalyov et al.	2023	DFSNet: A steerable neural beamformer invariant to microphone array configuration for real-time, low-latency speech enhancement
Mohammadamini et al.	2021	Compensate multiple distortions for speaker recognition systems
Jing et al.	2025	End-to-end DOA-guided speech extraction in noisy multi-talker scenarios
Li et al.	2020	Speech separation based on reliable binaural cues with two-stage neural network in noisy-reverberant environments
Xue et al.	2018	A study on improving acoustic model for robust and far-field speech recognition
Himawan et al.	2008	Dealing with uncertainty in microphone placement in a microphone array speech recognition system
JP2014143570A (en)	2014-08-07	Sound pick-up device and reproducer
Youssef et al.	2010	From monaural to binaural speaker recognition for humanoid robots
Zhou et al.	2017	Combined beamforming and deep neural networks for multichannel speech enhancement
Chen et al.	2021	Early reflections based speech enhancement