Huang et al., 2018 - Google Patents

A regression approach to speech source localization exploiting deep neural network

Document ID: 5562338033814302631
Author: Huang Z; Xu J; Pan J
Publication year: 2018
Publication venue: 2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM)

Snippet

This paper presents a data-driven framework for speech source localization (SSL) using a deep neural network (DNN), which directly constructs the nonlinear regression mapping between the extracted features and the direction of arrival (DOA) of an indoor speech source …
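
To illustrate the idea of regressing a DOA from an extracted feature, here is a minimal sketch. It is not the authors' method: the two-microphone far-field delay feature, the microphone spacing `mic_dist`, the single hidden layer, and every function name below are assumptions chosen only to keep the example small and self-contained.

```python
import numpy as np

def make_data(n, rng, mic_dist=0.1, c=343.0):
    """Toy data (assumed setup): one far-field source, two microphones."""
    doa_deg = rng.uniform(0.0, 180.0, size=n)
    # The inter-microphone delay is proportional to cos(DOA).
    tdoa = mic_dist * np.cos(np.deg2rad(doa_deg)) / c
    x = (tdoa / (mic_dist / c))[:, None]           # normalized to about [-1, 1]
    x = x + 0.01 * rng.standard_normal(x.shape)    # small measurement noise
    y = (doa_deg / 180.0)[:, None]                 # target scaled to [0, 1]
    return x, y

def predict(params, x):
    """One tanh hidden layer followed by a linear (regression) output."""
    w1, b1, w2, b2 = params
    return np.tanh(x @ w1 + b1) @ w2 + b2

def train(x, y, hidden=16, steps=300, lr=0.05, seed=0):
    """Full-batch gradient descent on the mean squared error."""
    rng = np.random.default_rng(seed)
    w1 = rng.standard_normal((x.shape[1], hidden)) * 0.5
    b1 = np.zeros(hidden)
    w2 = rng.standard_normal((hidden, 1)) * 0.5
    b2 = np.zeros(1)
    losses = []
    for _ in range(steps):
        h = np.tanh(x @ w1 + b1)
        err = (h @ w2 + b2) - y
        losses.append(float(np.mean(err ** 2)))
        g_pred = 2.0 * err / len(x)                # dLoss/dPrediction
        g_z = (g_pred @ w2.T) * (1.0 - h ** 2)     # backprop through tanh
        w2 -= lr * (h.T @ g_pred)
        b2 -= lr * g_pred.sum(axis=0)
        w1 -= lr * (x.T @ g_z)
        b1 -= lr * g_z.sum(axis=0)
    return (w1, b1, w2, b2), losses

rng = np.random.default_rng(42)
x, y = make_data(256, rng)
params, losses = train(x, y)
doa_estimate_deg = predict(params, x) * 180.0      # back to degrees
```

The network outputs a continuous angle rather than a class label, which is what distinguishes a regression formulation from classifying the DOA into discrete angular bins.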

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S3/00—Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received
- G01S3/80—Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received using ultrasonic, sonic or infrasonic waves
- G01S3/802—Systems for determining direction or deviation from predetermined direction
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones

Similar Documents

Publication	Publication Date	Title
CN109830245B (en)	2021-03-12	A method and system for multi-speaker speech separation based on beamforming
Adavanne et al.	2018	Direction of arrival estimation for multiple sound sources using convolutional recurrent neural network
Li et al.	2018	Online direction of arrival estimation based on deep learning
Chakrabarty et al.	2017	Broadband DOA estimation using convolutional neural networks trained with noise signals
Takeda et al.	2016	Discriminative multiple sound source localization based on deep neural networks using independent location model
He et al.	2018	Deep neural networks for multiple speaker detection and localization
Xiao et al.	2015	A learning-based approach to direction of arrival estimation in noisy and reverberant environments
Perotin et al.	2019	Regression versus classification for neural network based audio source localization
CN111239680B (en)	2022-09-16	Direction-of-arrival estimation method based on differential array
Sivasankaran et al.	2018	Keyword-based speaker localization: Localizing a target speaker in a multi-speaker environment
Koldovský et al.	2013	Semi-blind noise extraction using partially known position of the target source
Huang et al.	2018	A regression approach to speech source localization exploiting deep neural network
Bai et al.	2018	Time difference of arrival (TDOA)-based acoustic source localization and signal extraction for intelligent audio classification
Dwivedi et al.	2023	Long-term temporal audio source localization using SH-CRNN
Wang et al.	2018	Pseudo-determined blind source separation for ad-hoc microphone networks
Zheng et al.	2015	Spectral mask estimation using deep neural networks for inter-sensor data ratio model based robust DOA estimation
Zhang et al.	2024	Sound event localization and classification using WASN in Outdoor Environment
Wang et al.	2020	U-net based direct-path dominance test for robust direction-of-arrival estimation
Mane et al.	2019	Localization of steady sound source and direction detection of moving sound source using CNN
Chen et al.	2021	Overlapped Speech Detection Based on Spectral and Spatial Feature Fusion
Xue et al.	2015	Noise robust direction of arrival estimation for speech source with weighted bispectrum spatial correlation matrix
Fu et al.	2023	Locate and beamform: Two-dimensional locating all-neural beamformer for multi-channel speech separation
Beit-On et al.	2019	Binaural direction-of-arrival estimation in reverberant environments using the direct-path dominance test
Hammer et al.	2020	FCN approach for dynamically locating multiple speakers
Pak et al.	2018	LOCATA challenge: A deep neural networks-based regression approach for direction-of-arrival estimation