Katoh et al., 2005 - Google Patents
State estimation of meetings by information fusion using Bayesian network
- Document ID
- 16510137732448087550
- Author
- Katoh M
- Yamamoto K
- Ogata J
- Yoshimura T
- Asano F
- Asoh H
- Kitawaki N
- Publication year
- 2005
- Publication venue
- INTERSPEECH
Snippet
In this paper, a method is proposed for structuring the multimedia recording of a small-sized meeting based on various information sources such as sound source localization, multiple-talk detection, and the detection of non-speech sound events. The information from these …
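The fusion idea described in the snippet — combining several acoustic cues into a posterior over meeting states — can be illustrated with a naive-Bayes simplification, where cues are treated as conditionally independent given the state. All states, cue names, and probability values below are illustrative assumptions, not the actual model or parameters from Katoh et al. (2005):

```python
# Minimal sketch of Bayesian-style information fusion for meeting state
# estimation. States, cues, and probabilities are illustrative assumptions,
# not values from the paper; the cues mirror those named in the snippet.

# Hidden meeting states to be estimated.
STATES = ["presentation", "discussion", "break"]

# Prior over states (assumed uniform).
prior = {s: 1.0 / len(STATES) for s in STATES}

# Assumed conditional probabilities P(cue observed | state); cues are
# treated as conditionally independent given the state (naive Bayes).
likelihoods = {
    "single_talker":   {"presentation": 0.8, "discussion": 0.3, "break": 0.1},
    "multiple_talk":   {"presentation": 0.1, "discussion": 0.6, "break": 0.2},
    "nonspeech_sound": {"presentation": 0.1, "discussion": 0.1, "break": 0.7},
}

def fuse(observed_cues):
    """Return the normalized posterior P(state | cues) via Bayes' rule."""
    post = dict(prior)
    for cue in observed_cues:
        for s in STATES:
            post[s] *= likelihoods[cue][s]
    z = sum(post.values())  # normalizing constant
    return {s: p / z for s, p in post.items()}

posterior = fuse(["single_talker"])
print(max(posterior, key=posterior.get))  # most probable meeting state
```

A full Bayesian network would additionally model dependencies between cues and temporal dynamics across segments; this sketch only shows the per-segment evidence combination step.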
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
Similar Documents
| Publication | Title |
|---|---|
| Zmolikova et al. | Neural target speech extraction: An overview |
| Temko et al. | Acoustic event detection in meeting-room environments |
| US10878824B2 (en) | Speech-to-text generation using video-speech matching from a primary speaker |
| Chen et al. | The first multimodal information based speech processing (MISP) challenge: Data, tasks, baselines and results |
| Eronen et al. | Audio-based context recognition |
| Anguera et al. | Speaker diarization: A review of recent research |
| Renals et al. | Recognition and understanding of meetings: the AMI and AMIDA projects |
| Heittola et al. | Sound event detection in multisource environments using source separation |
| Temko et al. | Acoustic event detection and classification |
| Pardo et al. | Speaker diarization for multiple-distant-microphone meetings using several sources of information |
| Chaudhuri et al. | AVA-Speech: A densely labeled dataset of speech activity in movies |
| Ji et al. | Speaker-aware target speaker enhancement by jointly learning with speaker embedding extraction |
| Yella et al. | Overlapping speech detection using long-term conversational features for speaker diarization in meeting room conversations |
| Temko et al. | Acoustic event detection and classification in smart-room environments: Evaluation of CHIL project systems |
| Tao et al. | Bimodal recurrent neural network for audiovisual voice activity detection |
| Lathoud et al. | Location based speaker segmentation |
| Kang et al. | Multimodal speaker diarization of real-world meetings using d-vectors with spatial features |
| Friedland et al. | The ICSI RT-09 speaker diarization system |
| Cho et al. | Enhanced voice activity detection using acoustic event detection and classification |
| McCowan et al. | Towards computer understanding of human interactions |
| Wyatt et al. | A privacy-sensitive approach to modeling multi-person conversations |
| Papayiannis et al. | Detecting media sound presence in acoustic scenes |
| Mitrofanov et al. | Accurate speaker counting, diarization and separation for advanced recognition of multichannel multispeaker conversations |
| Brueckmann et al. | Adaptive noise reduction and voice activity detection for improved verbal human-robot interaction using binaural data |
| Martinez-Gonzalez et al. | Spatial features selection for unsupervised speaker segmentation and clustering |