Boosting-based multimodal speaker detection for distributed meetings
Zhang et al., 2006
- Document ID
- 12005420207792059065
- Author
- Zhang C
- Yin P
- Rui Y
- Cutler R
- Viola P
- Publication year
- 2006
- Publication venue
- 2006 IEEE Workshop on Multimedia Signal Processing
Snippet
Speaker detection is a very important task in distributed meeting applications. This paper discusses a number of challenges we met while designing a speaker detector for the Microsoft RoundTable distributed meeting device, and proposes a boosting-based …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G06K9/00288—Classification, e.g. identification
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment; Cameras comprising an electronic image sensor, e.g. digital cameras, video cameras, TV cameras, camcorders, webcams, camera modules for embedding in other devices, e.g. mobile phones, computers or vehicles
- H04N5/225—Television cameras; Cameras comprising an electronic image sensor, e.g. digital cameras, video cameras, camcorders, webcams, camera modules for embedding in other devices, e.g. mobile phones, computers or vehicles
- H04N5/232—Devices for controlling television cameras, e.g. remote control; Control of cameras comprising an electronic image sensor, e.g. digital cameras, video cameras, TV cameras, camcorders, webcams, camera modules for embedding in other devices, e.g. mobile phones, computers or vehicles
- H04N5/23219—Control of camera operation based on recognized human faces, facial parts, facial expressions or other parts of the human body
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G06K9/00268—Feature extraction; Face representation
- G06K9/00281—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00624—Recognising scenes, i.e. recognition of a whole field of perception; recognising scene-specific objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00362—Recognising human body or animal bodies, e.g. vehicle occupant, pedestrian; Recognising body parts, e.g. hand
- G06K9/00369—Recognition of whole body, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112088402B (en) | | Federated neural network for speaker recognition |
Zotkin et al. | | Joint audio-visual tracking using particle filters |
Busso et al. | | Smart room: Participant and speaker localization and identification |
Qian et al. | | Multi-speaker tracking from an audio–visual sensing device |
US6894714B2 (en) | | Method and apparatus for predicting events in video conferencing and other applications |
US7433495B2 (en) | | Automatic detection and tracking of multiple individuals using multiple cues |
Zhang et al. | | Boosting-based multimodal speaker detection for distributed meeting videos |
US20210281739A1 (en) | | Information processing device and method, and program |
JP4669150B2 (en) | | Main subject estimation apparatus and main subject estimation method |
Zhang et al. | | Boosting-based multimodal speaker detection for distributed meetings |
CN114513622A (en) | | Speaker detection method, speaker detection apparatus, storage medium, and program product |
US11689380B2 (en) | | Method and device for viewing conference |
Zhang et al. | | Robust multi-view multi-camera face detection inside smart rooms using spatio-temporal dynamic programming |
Pingali et al. | | Audio-visual tracking for natural interactivity |
Gatica-Perez et al. | | A mixed-state i-particle filter for multi-camera speaker tracking |
Schauerte et al. | | Multi-modal and multi-camera attention in smart environments |
Korchagin et al. | | Just-in-time multimodal association and fusion from home entertainment |
Wang et al. | | Real-time automated video and audio capture with multiple cameras and microphones |
Yan et al. | | Computational audiovisual scene analysis in online adaptation of audio-motor maps |
Soldatos et al. | | Perceptual interfaces and distributed agents supporting ubiquitous computing services |
CN110730378A (en) | | Information processing method and system |
Canton-Ferrer et al. | | Multimodal real-time focus of attention estimation in smartrooms |
Vryzas et al. | | Investigating Multimodal Audiovisual Event Detection and Localization |
Zhang | | Other Applications |
Potamianos et al. | | Audio-visual ASR from multiple views inside smart rooms |