Boosting-based multimodal speaker detection for distributed meetings

Document ID: 12005420207792059065
Author: Zhang C; Yin P; Rui Y; Cutler R; Viola P
Publication year: 2006
Publication venue: 2006 IEEE Workshop on Multimedia Signal Processing

Snippet

Speaker detection is a very important task in distributed meeting applications. This paper discusses a number of challenges we met while designing a speaker detector for the Microsoft RoundTable distributed meeting device, and proposes a boosting-based …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G06K9/00288—Classification, e.g. identification
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment; Cameras comprising an electronic image sensor, e.g. digital cameras, video cameras, TV cameras, video cameras, camcorders, webcams, camera modules for embedding in other devices, e.g. mobile phones, computers or vehicles
- H04N5/225—Television cameras; Cameras comprising an electronic image sensor, e.g. digital cameras, video cameras, video cameras, camcorders, webcams, camera modules for embedding in other devices, e.g. mobile phones, computers or vehicles
- H04N5/232—Devices for controlling television cameras, e.g. remote control; Control of cameras comprising an electronic image sensor, e.g. digital cameras, video cameras, TV cameras, video cameras, camcorders, webcams, camera modules for embedding in, e.g. mobile phones, computers or vehicles
- H04N5/23219—Control of camera operation based on recognized human faces, facial parts, facial expressions or other parts of the human body
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G06K9/00268—Feature extraction; Face representation
- G06K9/00281—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00624—Recognising scenes, i.e. recognition of a whole field of perception; recognising scene-specific objects
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00362—Recognising human body or animal bodies, e.g. vehicle occupant, pedestrian; Recognising body parts, e.g. hand
- G06K9/00369—Recognition of whole body, e.g. static pedestrian or occupant recognition
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion

Similar Documents

Publication	Publication Date	Title
CN112088402B (en)	2024-07-26	Federated neural network for speaker recognition
Zotkin et al.	2002	Joint audio-visual tracking using particle filters
Busso et al.	2005	Smart room: Participant and speaker localization and identification
Qian et al.	2019	Multi-speaker tracking from an audio–visual sensing device
US6894714B2 (en)	2005-05-17	Method and apparatus for predicting events in video conferencing and other applications
US7433495B2 (en)	2008-10-07	Automatic detection and tracking of multiple individuals using multiple cues
Zhang et al.	2008	Boosting-based multimodal speaker detection for distributed meeting videos
US20210281739A1 (en)	2021-09-09	Information processing device and method, and program
JP4669150B2 (en)	2011-04-13	Main subject estimation apparatus and main subject estimation method
Zhang et al.	2006	Boosting-based multimodal speaker detection for distributed meetings
CN114513622A (en)	2022-05-17	Speaker detection method, speaker detection apparatus, storage medium, and program product
US11689380B2 (en)	2023-06-27	Method and device for viewing conference
Zhang et al.	2006	Robust multi-view multi-camera face detection inside smart rooms using spatio-temporal dynamic programming
Pingali et al.	1999	Audio-visual tracking for natural interactivity
Gatica-Perez et al.	2003	A mixed-state i-particle filter for multi-camera speaker tracking
Schauerte et al.	2009	Multi-modal and multi-camera attention in smart environments
Korchagin et al.	2011	Just-in-time multimodal association and fusion from home entertainment
Wang et al.	2001	Real-time automated video and audio capture with multiple cameras and microphones
Yan et al.	2013	Computational audiovisual scene analysis in online adaptation of audio-motor maps
Soldatos et al.	2005	Perceptual interfaces and distributed agents supporting ubiquitous computing services
CN110730378A (en)	2020-01-24	Information processing method and system
Canton-Ferrer et al.	2008	Multimodal real-time focus of attention estimation in smartrooms
Vryzas et al.	2016	Investigating Multimodal Audiovisual Event Detection and Localization
Zhang	1994	Other Applications
Potamianos et al.	2006	Audio-visual ASR from multiple views inside smart rooms