Tsai et al., 2015 - Google Patents
A study of multimodal addressee detection in human-human-computer interaction
- Document ID
- 12593320410755520157
- Authors
- Tsai T
- Stolcke A
- Slaney M
- Publication year
- 2015
- Publication venue
- IEEE Transactions on Multimedia
Snippet
The goal of addressee detection is to answer the question, “Are you talking to me?” When a dialogue system interacts with multiple users, it is crucial to detect when a user is speaking to the system as opposed to another person. We study this problem in a multimodal …
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/18—Speech classification or search using natural language modelling
            - G10L15/1822—Parsing for meaning understanding
          - G10L2015/088—Word spotting
        - G10L15/26—Speech to text systems
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
          - G10L15/065—Adaptation
            - G10L15/07—Adaptation to the speaker
        - G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
        - G10L15/28—Constructional details of speech recognition systems
      - G10L17/00—Speaker identification or verification
        - G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
          - G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
            - G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N99/00—Subject matter not provided for in other groups of this subclass
    - G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
      - G06Q10/00—Administration; Management
Similar Documents
Publication | Title
---|---
Tsai et al. | A study of multimodal addressee detection in human-human-computer interaction
Lotfian et al. | Building naturalistic emotionally balanced speech corpus by retrieving emotional speech from existing podcast recordings
Shaqra et al. | Recognizing emotion from speech based on age and gender using hierarchical models
de Pinto et al. | Emotions understanding model from spoken language using deep neural networks and mel-frequency cepstral coefficients
Latif et al. | Self supervised adversarial domain adaptation for cross-corpus and cross-language speech emotion recognition
Hung et al. | Estimating dominance in multi-party meetings using speaker diarization
Mariooryad et al. | Building a naturalistic emotional speech corpus by retrieving expressive behaviors from existing speech corpora
JP2017016566A (en) | Information processing device, information processing method and program
Johansson et al. | Opportunities and obligations to take turns in collaborative multi-party human-robot interaction
Zhang et al. | Multimodal deception detection using automatically extracted acoustic, visual, and lexical features
Ishii et al. | Multimodal fusion using respiration and gaze for predicting next speaker in multi-party meetings
Tsai et al. | Multimodal addressee detection in multiparty dialogue systems
Lahiri et al. | Interpersonal synchrony across vocal and lexical modalities in interactions involving children with autism spectrum disorder
Fu et al. | Improving meeting inclusiveness using speech interruption analysis
Noh et al. | Emotion-aware speaker identification with transfer learning
Saraswathi et al. | Voice based emotion detection using deep neural networks
Ishii et al. | Trimodal prediction of speaking and listening willingness to help improve turn-changing modeling
Hayashi et al. | A ranking model for evaluation of conversation partners based on rapport levels
Jiang et al. | Target speech diarization with multimodal prompts
Nakano et al. | Implementation and evaluation of a multimodal addressee identification mechanism for multiparty conversation systems
Zhang et al. | Aware: intuitive device activation using prosody for natural voice interactions
Robi et al. | Active speaker detection using audio, visual and depth modalities: A survey
Araki et al. | Collection of multimodal dialog data and analysis of the result of annotation of users' interest level
Kawahara | Smart posterboard: Multi-modal sensing and analysis of poster conversations
Xu et al. | Affective audio annotation of public speeches with convolutional clustering neural network