
Katoh et al., 2005 - Google Patents

State estimation of meetings by information fusion using Bayesian network.


Document ID
16510137732448087550
Author
Katoh M
Yamamoto K
Ogata J
Yoshimura T
Asano F
Asoh H
Kitawaki N
Publication year
2005
Publication venue
INTERSPEECH


Snippet

In this paper, a method is proposed for structuring the multimedia recording of a small meeting on the basis of several kinds of information, such as sound source localization, multiple-talk detection, and the detection of non-speech sound events. The information from these …
Full text (PDF): www.isca-archive.org
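The snippet is cut off before it says how the cues are combined, so the following is only a minimal sketch of the general idea named in the title: fusing discrete cue outputs with a Bayesian network to estimate a meeting state. It assumes the simplest network structure, a naive-Bayes layout with one hidden state node and each cue as a conditionally independent child. The state names, cue names, and every probability are hypothetical illustrations, not the structure or values used by Katoh et al.

```python
from typing import Dict

# HYPOTHETICAL meeting states; the snippet does not say which states the paper uses.
STATES = ("presentation", "discussion", "break")

# HYPOTHETICAL prior P(state).
PRIOR: Dict[str, float] = {"presentation": 0.3, "discussion": 0.5, "break": 0.2}

# HYPOTHETICAL P(cue is True | state) for three binary cues, loosely matching the
# information sources listed in the snippet:
#   "single_source": sound source localization finds one stable dominant talker direction
#   "multi_talk":    the multiple-talk (overlapping speech) detector fires
#   "non_speech":    a non-speech sound event is detected
LIKELIHOOD: Dict[str, Dict[str, float]] = {
    "presentation": {"single_source": 0.9, "multi_talk": 0.1, "non_speech": 0.2},
    "discussion":   {"single_source": 0.3, "multi_talk": 0.7, "non_speech": 0.3},
    "break":        {"single_source": 0.2, "multi_talk": 0.4, "non_speech": 0.6},
}


def posterior(observations: Dict[str, bool]) -> Dict[str, float]:
    """Return P(state | observations) for a naive-Bayes-structured network.

    Cues are assumed conditionally independent given the state, so the joint
    factorizes as P(state) * prod_i P(cue_i | state). A cue absent from
    `observations` sums to 1 over its values, so skipping it marginalizes it out.
    """
    unnormalized: Dict[str, float] = {}
    for state in STATES:
        p = PRIOR[state]
        for cue, value in observations.items():
            p_true = LIKELIHOOD[state][cue]
            p *= p_true if value else 1.0 - p_true
        unnormalized[state] = p
    total = sum(unnormalized.values())
    # Normalize so the posterior sums to 1 across states.
    return {state: p / total for state, p in unnormalized.items()}


if __name__ == "__main__":
    # One stable talker, no overlap, no non-speech event.
    post = posterior({"single_source": True, "multi_talk": False, "non_speech": False})
    print(max(post, key=post.get), post)
```

With these made-up numbers, a single stable talker with no overlap and no non-speech event favors "presentation", while overlapping speech without a single dominant source favors "discussion". A real system would learn the network structure and conditional probability tables from labeled meeting data rather than fixing them by hand.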

Classifications

    • G PHYSICS
      • G10 MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
          • G10L15/00 Speech recognition
            • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
            • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
            • G10L15/08 Speech classification or search
              • G10L2015/088 Word spotting
              • G10L15/18 Speech classification or search using natural language modelling
            • G10L15/26 Speech to text systems
          • G10L17/00 Speaker identification or verification
            • G10L17/04 Training, enrolment or model building
            • G10L17/26 Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
          • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
            • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
              • G10L21/0208 Noise filtering
                • G10L21/0216 Noise filtering characterised by the method used for estimating noise
                  • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
          • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
            • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
              • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
                • G10L25/66 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
            • G10L25/78 Detection of presence or absence of voice signals

Similar Documents

Zmolikova et al. Neural target speech extraction: An overview
Temko et al. Acoustic event detection in meeting-room environments
US10878824B2 (en) Speech-to-text generation using video-speech matching from a primary speaker
Chen et al. The first multimodal information based speech processing (MISP) challenge: Data, tasks, baselines and results
Eronen et al. Audio-based context recognition
Anguera et al. Speaker diarization: A review of recent research
Renals et al. Recognition and understanding of meetings: The AMI and AMIDA projects
Heittola et al. Sound event detection in multisource environments using source separation
Temko et al. Acoustic event detection and classification
Pardo et al. Speaker diarization for multiple-distant-microphone meetings using several sources of information
Chaudhuri et al. AVA-Speech: A densely labeled dataset of speech activity in movies
Ji et al. Speaker-aware target speaker enhancement by jointly learning with speaker embedding extraction
Yella et al. Overlapping speech detection using long-term conversational features for speaker diarization in meeting room conversations
Temko et al. Acoustic event detection and classification in smart-room environments: Evaluation of CHIL project systems
Tao et al. Bimodal Recurrent Neural Network for Audiovisual Voice Activity Detection
Lathoud et al. Location based speaker segmentation
Kang et al. Multimodal speaker diarization of real-world meetings using d-vectors with spatial features
Friedland et al. The ICSI RT-09 speaker diarization system
Cho et al. Enhanced voice activity detection using acoustic event detection and classification
McCowan et al. Towards computer understanding of human interactions
Wyatt et al. A Privacy-Sensitive Approach to Modeling Multi-Person Conversations
Papayiannis et al. Detecting Media Sound Presence in Acoustic Scenes
Mitrofanov et al. Accurate speaker counting, diarization and separation for advanced recognition of multichannel multispeaker conversations
Brueckmann et al. Adaptive noise reduction and voice activity detection for improved verbal human-robot interaction using binaural data
Martinez-Gonzalez et al. Spatial features selection for unsupervised speaker segmentation and clustering