
Giacobello, 2018 - Google Patents

An online expectation-maximization algorithm for tracking acoustic sources in multi-microphone devices during music playback


Document ID: 3536953281454333079
Author: Giacobello D
Publication year: 2018
Publication venue: 2018 26th European Signal Processing Conference (EUSIPCO)

Snippet

In this paper, we propose an expectation-maximization algorithm to perform online tracking of moving sources around multi-microphone devices. We are particularly targeting the application scenario of distant-talking control of a music playback device. The goal is to …
Continue reading at giacobello.github.io (PDF)
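The snippet describes an online expectation-maximization algorithm for tracking moving acoustic sources. As an illustrative sketch only (not the paper's actual formulation), the following toy example shows the general shape of such a recursion: a single-Gaussian source model with a uniform noise floor, an E-step computing frame responsibilities, and an M-step whose sufficient statistics decay with a forgetting factor so the estimate can follow a moving source. All names, the model, and the parameter values are assumptions for illustration.

```python
import numpy as np

def online_em_doa(observations, forgetting=0.95):
    """Toy online EM: track the direction (DOA, degrees) of a moving source
    from noisy per-frame angle observations.

    Illustrative sketch only: a fixed-variance Gaussian source component plus
    a uniform clutter component, with exponentially forgotten sufficient
    statistics. This is NOT the algorithm from the paper.
    """
    mu = float(observations[0])   # initial DOA estimate (degrees)
    var = 100.0                   # fixed source variance (deg^2), an assumption
    weight_sum = 1.0              # decayed sum of responsibilities
    track = []
    for x in observations[1:]:
        # E-step: responsibility of the source component vs. uniform clutter
        lik = np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        clutter = 1.0 / 360.0     # uniform density over the full circle
        r = lik / (lik + clutter)
        # M-step with forgetting: old frames are gradually discounted
        weight_sum = forgetting * weight_sum + r
        mu += r * (x - mu) / weight_sum
        track.append(mu)
    return np.array(track)
```

With a forgetting factor of 0.95 the effective averaging window is roughly 20 frames, which trades smoothing against tracking lag; the paper's actual method operates on multi-microphone observations during music playback rather than on scalar angle measurements.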

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161: Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166: Microphone arrays; Beamforming
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272: Voice signal separating
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/08: Speech classification or search
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/78: Detection of presence or absence of voice signals
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification
    • G10L17/04: Training, enrolment or model building
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00: Circuits for transducers, loudspeakers or microphones
    • H04R3/005: Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R25/00: Deaf-aid sets providing an auditory perception; Electric tinnitus maskers providing an auditory perception
    • H04R25/40: Arrangements for obtaining a desired directivity characteristic
    • H04R25/407: Circuits for combining signals of a plurality of transducers

Similar Documents

CN111370014B (en) System and method for multi-stream object-speech detection and channel fusion
Yamamoto et al. Enhanced robot speech recognition based on microphone array source separation and missing feature theory
CN110610718B (en) Method and device for extracting expected sound source voice signal
Wisdom et al. Deep unfolding for multichannel source separation
Roman et al. Binaural segregation in multisource reverberant environments
US12175965B2 (en) Method and apparatus for normalizing features extracted from audio data for signal recognition or modification
Valin Auditory system for a mobile robot
Haeb-Umbach et al. Microphone array signal processing and deep learning for speech enhancement: Combining model-based and data-driven approaches to parameter estimation and filtering
EP3847645B1 (en) Determining a room response of a desired source in a reverberant environment
Pfeifenberger et al. Multi-channel speech processing architectures for noise robust speech recognition: 3rd CHiME challenge results
Pasha et al. Blind speaker counting in highly reverberant environments by clustering coherence features
Gburrek et al. Spatial diarization for meeting transcription with ad-hoc acoustic sensor networks
Kounades-Bastian et al. Exploiting the intermittency of speech for joint separation and diarization
Giacobello An online expectation-maximization algorithm for tracking acoustic sources in multi-microphone devices during music playback
Kim et al. Sound source separation using phase difference and reliable mask selection
Spille et al. Using binaural processing for automatic speech recognition in multi-talker scenes
Kundegorski et al. Two-microphone dereverberation for automatic speech recognition of Polish
Pfeifenberger et al. Eigenvector-Based Speech Mask Estimation Using Logistic Regression.
Yang et al. A stacked self-attention network for two-dimensional direction-of-arrival estimation in hands-free speech communication
Marti et al. Automatic speech recognition in cocktail-party situations: A specific training for separated speech
Hammer et al. FCN approach for dynamically locating multiple speakers
Milano et al. Sector-based interference cancellation for robust keyword spotting applications using an informed MPDR beamformer
Himawan Speech recognition using ad-hoc microphone arrays
Potamitis et al. Speech activity detection and enhancement of a moving speaker based on the wideband generalized likelihood ratio and microphone arrays