Two-stage enhancement of noisy and reverberant microphone array speech for automatic speech recognition systems trained with only clean speech
Wang et al., 2018
- Document ID
- 16251994478918878122
- Author
- Wang Q
- Wang S
- Ge F
- Han C
- Lee J
- Guo L
- Lee C
- Publication year
- 2018
- Publication venue
- 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP)
Snippet
We propose a two-stage approach to enhancement of far-field microphone array speech collected in reverberant conditions corrupted by interfering speakers and noises. We intend to produce top-quality enhanced speech to be used by a black-box automatic speech …
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using predictive techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10K—SOUND-PRODUCING DEVICES; ACOUSTICS NOT OTHERWISE PROVIDED FOR
- G10K11/00—Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
- G10K11/18—Methods or devices for transmitting, conducting, or directing sound
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2430/00—Signal processing covered by H04R, not provided for in its groups
Similar Documents
| Publication | Title |
|---|---|
| CN112017681B (en) | Method and system for enhancing directional voice |
| Kinoshita et al. | A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research |
| Schwartz et al. | Multi-microphone speech dereverberation and noise reduction using relative early transfer functions |
| CN105869651B (en) | Binary channels Wave beam forming sound enhancement method based on noise mixing coherence |
| US8880396B1 (en) | Spectrum reconstruction for automatic speech recognition |
| Mertins et al. | Room impulse response shortening/reshaping with infinity- and $p$-norm optimization |
| CN114694670B (en) | A microphone array speech enhancement system and method based on multi-task network |
| CN110660406A (en) | Real-time voice noise reduction method of double-microphone mobile phone in close-range conversation scene |
| Chen et al. | Improving Mask Learning Based Speech Enhancement System with Restoration Layers and Residual Connection |
| Hussain et al. | Ensemble hierarchical extreme learning machine for speech dereverberation |
| Nesta et al. | A flexible spatial blind source extraction framework for robust speech recognition in noisy environments |
| Sarabia et al. | Spatial Librispeech: An augmented dataset for spatial audio learning |
| Wang et al. | Two-stage enhancement of noisy and reverberant microphone array speech for automatic speech recognition systems trained with only clean speech |
| Meng et al. | Deep Kronecker product beamforming for large-scale microphone arrays |
| Yang et al. | Design and optimization of superdirective beamforming and post-filtering for speech enhancement |
| Kovalyov et al. | Dfsnet: A steerable neural beamformer invariant to microphone array configuration for real-time, low-latency speech enhancement |
| Mohammadamini et al. | Compensate multiple distortions for speaker recognition systems |
| Jing et al. | End-to-end doa-guided speech extraction in noisy multi-talker scenarios |
| Li et al. | Speech separation based on reliable binaural cues with two-stage neural network in noisy-reverberant environments |
| Xue et al. | A study on improving acoustic model for robust and far-field speech recognition |
| Himawan et al. | Dealing with uncertainty in microphone placement in a microphone array speech recognition system |
| JP2014143570A (en) | Sound pick-up device and reproducer |
| Youssef et al. | From monaural to binaural speaker recognition for humanoid robots |
| Zhou et al. | Combined beamforming and deep neural networks for multichannel speech enhancement |
| Chen et al. | Early reflections based speech enhancement |