Giacobello, 2018 - Google Patents

An online expectation-maximization algorithm for tracking acoustic sources in multi-microphone devices during music playback

Document ID: 3536953281454333079
Author: Giacobello D
Publication year: 2018
Publication venue: 2018 26th European Signal Processing Conference (EUSIPCO)

Snippet

In this paper, we propose an expectation-maximization algorithm to perform online tracking of moving sources around multi-microphone devices. We are particularly targeting the application scenario of distant-talking control of a music playback device. The goal is to …

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
- G10L21/0272—Voice signal separating
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- H04R25/00—Deaf-aid sets providing an auditory perception; Electric tinnitus maskers providing an auditory perception
- H04R25/40—Arrangements for obtaining a desired directivity characteristic
- H04R25/407—Circuits for combining signals of a plurality of transducers

Similar Documents

Publication	Publication Date	Title
CN111370014B (en)	2024-05-28	System and method for multi-stream object-speech detection and channel fusion
Yamamoto et al.	2005	Enhanced robot speech recognition based on microphone array source separation and missing feature theory
CN110610718B (en)	2021-10-08	Method and device for extracting expected sound source voice signal
Wisdom et al.	2016	Deep unfolding for multichannel source separation
Roman et al.	2006	Binaural segregation in multisource reverberant environments
US12175965B2 (en)	2024-12-24	Method and apparatus for normalizing features extracted from audio data for signal recognition or modification
Valin	2016	Auditory system for a mobile robot
Haeb-Umbach et al.	2025	Microphone array signal processing and deep learning for speech enhancement: Combining model-based and data-driven approaches to parameter estimation and filtering
EP3847645B1 (en)	2022-04-13	Determining a room response of a desired source in a reverberant environment
Pfeifenberger et al.	2015	Multi-channel speech processing architectures for noise robust speech recognition: 3rd CHiME challenge results
Pasha et al.	2017	Blind speaker counting in highly reverberant environments by clustering coherence features
Gburrek et al.	2023	Spatial diarization for meeting transcription with ad-hoc acoustic sensor networks
Kounades-Bastian et al.	2017	Exploiting the intermittency of speech for joint separation and diarization
Giacobello	2018	An online expectation-maximization algorithm for tracking acoustic sources in multi-microphone devices during music playback
Kim et al.	2018	Sound source separation using phase difference and reliable mask selection
Spille et al.	2013	Using binaural processing for automatic speech recognition in multi-talker scenes
Kundegorski et al.	2014	Two-microphone dereverberation for automatic speech recognition of Polish
Pfeifenberger et al.	2017	Eigenvector-Based Speech Mask Estimation Using Logistic Regression
Yang et al.	2022	A stacked self-attention network for two-dimensional direction-of-arrival estimation in hands-free speech communication
Marti et al.	2012	Automatic speech recognition in cocktail-party situations: A specific training for separated speech
Hammer et al.	2020	FCN approach for dynamically locating multiple speakers
Milano et al.	2024	Sector-based interference cancellation for robust keyword spotting applications using an informed MPDR beamformer
Himawan	2010	Speech recognition using ad-hoc microphone arrays
Potamitis et al.	2004	Speech activity detection and enhancement of a moving speaker based on the wideband generalized likelihood ratio and microphone arrays