An online expectation-maximization algorithm for tracking acoustic sources in multi-microphone devices during music playback
Giacobello, 2018
- Document ID
- 3536953281454333079
- Author
- Giacobello D
- Publication year
- 2018
- Publication venue
- 2018 26th European Signal Processing Conference (EUSIPCO)
Snippet
In this paper, we propose an expectation-maximization algorithm to perform online tracking of moving sources around multi-microphone devices. We are particularly targeting the application scenario of distant-talking control of a music playback device. The goal is to …
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R25/00—Deaf-aid sets providing an auditory perception; Electric tinnitus maskers providing an auditory perception
- H04R25/40—Arrangements for obtaining a desired directivity characteristic
- H04R25/407—Circuits for combining signals of a plurality of transducers
Similar Documents
| Publication | Title |
|---|---|
| CN111370014B (en) | System and method for multi-stream object-speech detection and channel fusion |
| Yamamoto et al. | Enhanced robot speech recognition based on microphone array source separation and missing feature theory |
| CN110610718B (en) | Method and device for extracting expected sound source voice signal |
| Wisdom et al. | Deep unfolding for multichannel source separation |
| Roman et al. | Binaural segregation in multisource reverberant environments |
| US12175965B2 (en) | Method and apparatus for normalizing features extracted from audio data for signal recognition or modification |
| Valin | Auditory system for a mobile robot |
| Haeb-Umbach et al. | Microphone array signal processing and deep learning for speech enhancement: Combining model-based and data-driven approaches to parameter estimation and filtering |
| EP3847645B1 (en) | Determining a room response of a desired source in a reverberant environment |
| Pfeifenberger et al. | Multi-channel speech processing architectures for noise robust speech recognition: 3rd CHiME challenge results |
| Pasha et al. | Blind speaker counting in highly reverberant environments by clustering coherence features |
| Gburrek et al. | Spatial diarization for meeting transcription with ad-hoc acoustic sensor networks |
| Kounades-Bastian et al. | Exploiting the intermittency of speech for joint separation and diarization |
| Giacobello | An online expectation-maximization algorithm for tracking acoustic sources in multi-microphone devices during music playback |
| Kim et al. | Sound source separation using phase difference and reliable mask selection |
| Spille et al. | Using binaural processing for automatic speech recognition in multi-talker scenes |
| Kundegorski et al. | Two-microphone dereverberation for automatic speech recognition of Polish |
| Pfeifenberger et al. | Eigenvector-Based Speech Mask Estimation Using Logistic Regression |
| Yang et al. | A stacked self-attention network for two-dimensional direction-of-arrival estimation in hands-free speech communication |
| Marti et al. | Automatic speech recognition in cocktail-party situations: A specific training for separated speech |
| Hammer et al. | FCN approach for dynamically locating multiple speakers |
| Milano et al. | Sector-based interference cancellation for robust keyword spotting applications using an informed MPDR beamformer |
| Himawan | Speech recognition using ad-hoc microphone arrays |
| Potamitis et al. | Speech activity detection and enhancement of a moving speaker based on the wideband generalized likelihood ratio and microphone arrays |