A regression approach to speech source localization exploiting deep neural network
Huang et al., 2018
- Document ID
- 5562338033814302631
- Author
- Huang Z
- Xu J
- Pan J
- Publication year
- 2018
- Publication venue
- 2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM)
Snippet
This paper presents a data-driven framework for speech source localization (SSL) using a deep neural network (DNN), which directly constructs the nonlinear regression transform between the extracted feature and the direction of arrival (DOA) of an indoor speech source …
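The snippet does not say which features or network the authors use, so the following is only a minimal sketch of the general idea of regressing a DOA from array features. Everything in it is an assumption made for illustration: the two-microphone far-field geometry, the hypothetical `phase_features` function (sine and cosine of the inter-microphone phase difference at three frequencies), the single hidden layer, and the noise-free synthetic data.

```python
import numpy as np

SOUND_SPEED = 343.0  # m/s, assumed speed of sound in air


def phase_features(doa_deg, mic_dist=0.1, freqs=(500.0, 1000.0, 1500.0)):
    """Hypothetical feature: sin/cos of the phase difference between two
    microphones for a far-field source (not the paper's feature)."""
    theta = np.deg2rad(np.atleast_1d(np.asarray(doa_deg, dtype=float)))
    tau = mic_dist * np.cos(theta) / SOUND_SPEED           # delay in seconds
    phase = 2.0 * np.pi * np.outer(tau, np.asarray(freqs))  # shape (N, F)
    return np.concatenate([np.sin(phase), np.cos(phase)], axis=1)


def train_regressor(x, doa_deg, hidden=16, lr=0.02, steps=4000, seed=0):
    """Fit a one-hidden-layer tanh network with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    # Scale the target angle from degrees to roughly [-1, 1].
    t = (np.asarray(doa_deg, dtype=float)[:, None] - 90.0) / 90.0
    w1 = rng.normal(0.0, 0.5, (x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(steps):
        h = np.tanh(x @ w1 + b1)              # hidden activations
        err = h @ w2 + b2 - t                 # prediction error
        losses.append(0.5 * float(np.mean(err ** 2)))
        g_y = err / len(x)                    # gradient of the loss w.r.t. output
        g_h = (g_y @ w2.T) * (1.0 - h ** 2)   # backpropagate through tanh
        w2 -= lr * (h.T @ g_y)
        b2 -= lr * g_y.sum(axis=0)
        w1 -= lr * (x.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    return (w1, b1, w2, b2), losses


def predict_doa(params, x):
    """Map features to a DOA estimate in degrees."""
    w1, b1, w2, b2 = params
    return (np.tanh(x @ w1 + b1) @ w2 + b2)[:, 0] * 90.0 + 90.0


# Train on a 1-degree grid and evaluate on angles not seen in training.
train_doa = np.arange(20.0, 161.0, 1.0)
params, losses = train_regressor(phase_features(train_doa), train_doa)
held_out = np.arange(20.5, 160.0, 5.0)
pred = predict_doa(params, phase_features(held_out))
print("mean absolute error (deg):", np.mean(np.abs(pred - held_out)))
```

The network outputs a continuous angle rather than a score per direction bin, which is what distinguishes a regression formulation from a classification one. A realistic system would train on features extracted from reverberant, noisy recordings rather than on this idealized model.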
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S3/00—Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received
- G01S3/80—Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received using ultrasonic, sonic or infrasonic waves
- G01S3/802—Systems for determining direction or deviation from predetermined direction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
Similar Documents
| Publication | Title |
|---|---|
| CN109830245B (en) | A method and system for multi-speaker speech separation based on beamforming |
| Adavanne et al. | Direction of arrival estimation for multiple sound sources using convolutional recurrent neural network |
| Li et al. | Online direction of arrival estimation based on deep learning |
| Chakrabarty et al. | Broadband DOA estimation using convolutional neural networks trained with noise signals |
| Takeda et al. | Discriminative multiple sound source localization based on deep neural networks using independent location model |
| He et al. | Deep neural networks for multiple speaker detection and localization |
| Xiao et al. | A learning-based approach to direction of arrival estimation in noisy and reverberant environments |
| Perotin et al. | Regression versus classification for neural network based audio source localization |
| CN111239680B (en) | Direction-of-arrival estimation method based on differential array |
| Sivasankaran et al. | Keyword-based speaker localization: Localizing a target speaker in a multi-speaker environment |
| Koldovský et al. | Semi-blind noise extraction using partially known position of the target source |
| Huang et al. | A regression approach to speech source localization exploiting deep neural network |
| Bai et al. | Time difference of arrival (TDOA)-based acoustic source localization and signal extraction for intelligent audio classification |
| Dwivedi et al. | Long-term temporal audio source localization using SH-CRNN |
| Wang et al. | Pseudo-determined blind source separation for ad-hoc microphone networks |
| Zheng et al. | Spectral mask estimation using deep neural networks for inter-sensor data ratio model based robust DOA estimation |
| Zhang et al. | Sound event localization and classification using WASN in outdoor environment |
| Wang et al. | U-net based direct-path dominance test for robust direction-of-arrival estimation |
| Mane et al. | Localization of steady sound source and direction detection of moving sound source using CNN |
| Chen et al. | Overlapped speech detection based on spectral and spatial feature fusion |
| Xue et al. | Noise robust direction of arrival estimation for speech source with weighted bispectrum spatial correlation matrix |
| Fu et al. | Locate and beamform: Two-dimensional locating all-neural beamformer for multi-channel speech separation |
| Beit-On et al. | Binaural direction-of-arrival estimation in reverberant environments using the direct-path dominance test |
| Hammer et al. | FCN approach for dynamically locating multiple speakers |
| Pak et al. | LOCATA challenge: A deep neural networks-based regression approach for direction-of-arrival estimation |