
Tsai et al., 2015 - Google Patents

A study of multimodal addressee detection in human-human-computer interaction

Document ID
12593320410755520157
Authors
Tsai T
Stolcke A
Slaney M
Publication year
2015
Publication venue
IEEE Transactions on Multimedia

Snippet

The goal of addressee detection is to answer the question, "Are you talking to me?" When a dialogue system interacts with multiple users, it is crucial to detect when a user is speaking to the system as opposed to another person. We study this problem in a multimodal …
Continue reading at www.slaney.org (PDF) (other versions)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/1822 Parsing for meaning understanding
    • G10L 2015/088 Word spotting
    • G10L 17/00 Speaker identification or verification
    • G10L 17/26 Recognition of special voice characteristics, e.g. for use in lie detectors; recognition of animal voices
    • G10L 15/26 Speech to text systems
    • G10L 15/06 Creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/065 Adaptation
    • G10L 15/07 Adaptation to the speaker
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/28 Constructional details of speech recognition systems
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00 specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00 specially adapted for particular use for comparison or discrimination
    • G10L 25/66 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 99/00 Subject matter not provided for in other groups of this subclass
    • G06Q DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management

Similar Documents

Tsai et al. A study of multimodal addressee detection in human-human-computer interaction
Lotfian et al. Building naturalistic emotionally balanced speech corpus by retrieving emotional speech from existing podcast recordings
Shaqra et al. Recognizing emotion from speech based on age and gender using hierarchical models
de Pinto et al. Emotions understanding model from spoken language using deep neural networks and mel-frequency cepstral coefficients
Latif et al. Self supervised adversarial domain adaptation for cross-corpus and cross-language speech emotion recognition
Hung et al. Estimating dominance in multi-party meetings using speaker diarization
Mariooryad et al. Building a naturalistic emotional speech corpus by retrieving expressive behaviors from existing speech corpora.
JP2017016566A (en) Information processing device, information processing method and program
Johansson et al. Opportunities and obligations to take turns in collaborative multi-party human-robot interaction
Zhang et al. Multimodal Deception Detection Using Automatically Extracted Acoustic, Visual, and Lexical Features.
Ishii et al. Multimodal fusion using respiration and gaze for predicting next speaker in multi-party meetings
Tsai et al. Multimodal addressee detection in multiparty dialogue systems
Lahiri et al. Interpersonal synchrony across vocal and lexical modalities in interactions involving children with autism spectrum disorder
Fu et al. Improving meeting inclusiveness using speech interruption analysis
Noh et al. Emotion-aware speaker identification with transfer learning
Saraswathi et al. Voice based emotion detection using deep neural networks
Ishii et al. Trimodal prediction of speaking and listening willingness to help improve turn-changing modeling
Hayashi et al. A ranking model for evaluation of conversation partners based on rapport levels
Jiang et al. Target speech diarization with multimodal prompts
Nakano et al. Implementation and evaluation of a multimodal addressee identification mechanism for multiparty conversation systems
Zhang et al. Aware: intuitive device activation using prosody for natural voice interactions
Robi et al. Active speaker detection using audio, visual and depth modalities: A survey
Araki et al. Collection of multimodal dialog data and analysis of the result of annotation of users’ interest level
Kawahara Smart posterboard: Multi-modal sensing and analysis of poster conversations
Xu et al. Affective audio annotation of public speeches with convolutional clustering neural network