Katoh et al., 2005 - Google Patents
State estimation of meetings by information fusion using Bayesian network
- Document ID
- 16510137732448087550
- Author
- Katoh M
- Yamamoto K
- Ogata J
- Yoshimura T
- Asano F
- Asoh H
- Kitawaki N
- Publication year
- 2005
- Publication venue
- INTERSPEECH
Snippet
In this paper, a method is proposed for structuring the multimedia recording of a small-sized meeting based on various information sources such as sound source localization, multiple-talk detection, and the detection of non-speech sound events. The information from these …
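The fusion idea described in the snippet — combining several acoustic cues into a posterior over meeting states — can be illustrated with a naive-Bayes simplification, where cues are treated as conditionally independent given the state. All states, cue names, and probability values below are illustrative assumptions, not the actual model or parameters from Katoh et al. (2005):

```python
# Minimal sketch of Bayesian-style information fusion for meeting state
# estimation. States, cues, and probabilities are illustrative assumptions,
# not values from the paper; the cues mirror those named in the snippet.

# Hidden meeting states to be estimated.
STATES = ["presentation", "discussion", "break"]

# Prior over states (assumed uniform).
prior = {s: 1.0 / len(STATES) for s in STATES}

# Assumed conditional probabilities P(cue observed | state); cues are
# treated as conditionally independent given the state (naive Bayes).
likelihoods = {
    "single_talker":   {"presentation": 0.8, "discussion": 0.3, "break": 0.1},
    "multiple_talk":   {"presentation": 0.1, "discussion": 0.6, "break": 0.2},
    "nonspeech_sound": {"presentation": 0.1, "discussion": 0.1, "break": 0.7},
}

def fuse(observed_cues):
    """Return the normalized posterior P(state | cues) via Bayes' rule."""
    post = dict(prior)
    for cue in observed_cues:
        for s in STATES:
            post[s] *= likelihoods[cue][s]
    z = sum(post.values())  # normalizing constant
    return {s: p / z for s, p in post.items()}

posterior = fuse(["single_talker"])
print(max(posterior, key=posterior.get))  # most probable meeting state
```

A full Bayesian network would additionally model dependencies between cues and temporal dynamics across segments; this sketch only shows the per-segment evidence combination step.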
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
Similar Documents
| Publication | Title |
|---|---|
| Zmolikova et al. | Neural target speech extraction: An overview |
| Temko et al. | Acoustic event detection in meeting-room environments |
| US10878824B2 (en) | Speech-to-text generation using video-speech matching from a primary speaker |
| Chen et al. | The first multimodal information based speech processing (MISP) challenge: Data, tasks, baselines and results |
| Eronen et al. | Audio-based context recognition |
| Anguera et al. | Speaker diarization: A review of recent research |
| Renals et al. | Recognition and understanding of meetings: the AMI and AMIDA projects |
| Heittola et al. | Sound event detection in multisource environments using source separation |
| Temko et al. | Acoustic event detection and classification |
| Pardo et al. | Speaker diarization for multiple-distant-microphone meetings using several sources of information |
| Chaudhuri et al. | AVA-Speech: A densely labeled dataset of speech activity in movies |
| Ji et al. | Speaker-aware target speaker enhancement by jointly learning with speaker embedding extraction |
| Yella et al. | Overlapping speech detection using long-term conversational features for speaker diarization in meeting room conversations |
| Temko et al. | Acoustic event detection and classification in smart-room environments: Evaluation of CHIL project systems |
| Tao et al. | Bimodal recurrent neural network for audiovisual voice activity detection |
| Lathoud et al. | Location based speaker segmentation |
| Kang et al. | Multimodal speaker diarization of real-world meetings using d-vectors with spatial features |
| Friedland et al. | The ICSI RT-09 speaker diarization system |
| Cho et al. | Enhanced voice activity detection using acoustic event detection and classification |
| McCowan et al. | Towards computer understanding of human interactions |
| Wyatt et al. | A privacy-sensitive approach to modeling multi-person conversations |
| Papayiannis et al. | Detecting media sound presence in acoustic scenes |
| Mitrofanov et al. | Accurate speaker counting, diarization and separation for advanced recognition of multichannel multispeaker conversations |
| Brueckmann et al. | Adaptive noise reduction and voice activity detection for improved verbal human-robot interaction using binaural data |
| Martinez-Gonzalez et al. | Spatial features selection for unsupervised speaker segmentation and clustering |