Tsai et al., 2015 - Google Patents
A study of multimodal addressee detection in human-human-computer interaction
- Document ID
- 12593320410755520157
- Authors
- Tsai T
- Stolcke A
- Slaney M
- Publication year
- 2015
- Publication venue
- IEEE Transactions on Multimedia
Snippet
The goal of addressee detection is to answer the question, “Are you talking to me?” When a dialogue system interacts with multiple users, it is crucial to detect when a user is speaking to the system as opposed to another person. We study this problem in a multimodal …
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/18—Speech classification or search using natural language modelling
            - G10L15/1822—Parsing for meaning understanding
          - G10L2015/088—Word spotting
        - G10L15/26—Speech to text systems
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
          - G10L15/065—Adaptation
            - G10L15/07—Adaptation to the speaker
        - G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
        - G10L15/28—Constructional details of speech recognition systems
      - G10L17/00—Speaker identification or verification
        - G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
          - G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
            - G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N99/00—Subject matter not provided for in other groups of this subclass
    - G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
      - G06Q10/00—Administration; Management
Similar Documents
Publication | Title
---|---
Tsai et al. | A study of multimodal addressee detection in human-human-computer interaction
Lotfian et al. | Building naturalistic emotionally balanced speech corpus by retrieving emotional speech from existing podcast recordings
Shaqra et al. | Recognizing emotion from speech based on age and gender using hierarchical models
de Pinto et al. | Emotions understanding model from spoken language using deep neural networks and mel-frequency cepstral coefficients
Latif et al. | Self supervised adversarial domain adaptation for cross-corpus and cross-language speech emotion recognition
Hung et al. | Estimating dominance in multi-party meetings using speaker diarization
Mariooryad et al. | Building a naturalistic emotional speech corpus by retrieving expressive behaviors from existing speech corpora
JP2017016566A (en) | Information processing device, information processing method and program
Johansson et al. | Opportunities and obligations to take turns in collaborative multi-party human-robot interaction
Zhang et al. | Multimodal deception detection using automatically extracted acoustic, visual, and lexical features
Ishii et al. | Multimodal fusion using respiration and gaze for predicting next speaker in multi-party meetings
Tsai et al. | Multimodal addressee detection in multiparty dialogue systems
Lahiri et al. | Interpersonal synchrony across vocal and lexical modalities in interactions involving children with autism spectrum disorder
Fu et al. | Improving meeting inclusiveness using speech interruption analysis
Noh et al. | Emotion-aware speaker identification with transfer learning
Saraswathi et al. | Voice based emotion detection using deep neural networks
Ishii et al. | Trimodal prediction of speaking and listening willingness to help improve turn-changing modeling
Hayashi et al. | A ranking model for evaluation of conversation partners based on rapport levels
Jiang et al. | Target speech diarization with multimodal prompts
Nakano et al. | Implementation and evaluation of a multimodal addressee identification mechanism for multiparty conversation systems
Zhang et al. | Aware: intuitive device activation using prosody for natural voice interactions
Robi et al. | Active speaker detection using audio, visual and depth modalities: A survey
Araki et al. | Collection of multimodal dialog data and analysis of the result of annotation of users' interest level
Kawahara | Smart posterboard: Multi-modal sensing and analysis of poster conversations
Xu et al. | Affective audio annotation of public speeches with convolutional clustering neural network