Zhang et al., 2006

Boosting-based multimodal speaker detection for distributed meetings

Document ID
12005420207792059065
Author
Zhang C
Yin P
Rui Y
Cutler R
Viola P
Publication year
2006
Publication venue
2006 IEEE Workshop on Multimedia Signal Processing

Snippet

Speaker detection is a very important task in distributed meeting applications. This paper discusses a number of challenges we met while designing a speaker detector for the Microsoft RoundTable distributed meeting device, and proposes a boosting-based …

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00221 Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
    • G06K9/00288 Classification, e.g. identification
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment; Cameras comprising an electronic image sensor, e.g. digital cameras, video cameras, TV cameras, video cameras, camcorders, webcams, camera modules for embedding in other devices, e.g. mobile phones, computers or vehicles
    • H04N5/225 Television cameras; Cameras comprising an electronic image sensor, e.g. digital cameras, video cameras, video cameras, camcorders, webcams, camera modules for embedding in other devices, e.g. mobile phones, computers or vehicles
    • H04N5/232 Devices for controlling television cameras, e.g. remote control; Control of cameras comprising an electronic image sensor, e.g. digital cameras, video cameras, TV cameras, video cameras, camcorders, webcams, camera modules for embedding in, e.g. mobile phones, computers or vehicles
    • H04N5/23219 Control of camera operation based on recognized human faces, facial parts, facial expressions or other parts of the human body
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00221 Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
    • G06K9/00268 Feature extraction; Face representation
    • G06K9/00281 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00624 Recognising scenes, i.e. recognition of a whole field of perception; recognising scene-specific objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00362 Recognising human body or animal bodies, e.g. vehicle occupant, pedestrian; Recognising body parts, e.g. hand
    • G06K9/00369 Recognition of whole body, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/36 Image preprocessing, i.e. processing the image information without deciding about the identity of the image
    • G06K9/46 Extraction of features or characteristics of the image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6217 Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion

Similar Documents

Publication Title
CN112088402B (en) Federated neural network for speaker recognition
Zotkin et al. Joint audio-visual tracking using particle filters
Busso et al. Smart room: Participant and speaker localization and identification
Qian et al. Multi-speaker tracking from an audio–visual sensing device
US6894714B2 (en) Method and apparatus for predicting events in video conferencing and other applications
US7433495B2 (en) Automatic detection and tracking of multiple individuals using multiple cues
Zhang et al. Boosting-based multimodal speaker detection for distributed meeting videos
US20210281739A1 (en) Information processing device and method, and program
JP4669150B2 (en) Main subject estimation apparatus and main subject estimation method
Zhang et al. Boosting-based multimodal speaker detection for distributed meetings
CN114513622A (en) Speaker detection method, speaker detection apparatus, storage medium, and program product
US11689380B2 (en) Method and device for viewing conference
Zhang et al. Robust multi-view multi-camera face detection inside smart rooms using spatio-temporal dynamic programming
Pingali et al. Audio-visual tracking for natural interactivity
Gatica-Perez et al. A mixed-state i-particle filter for multi-camera speaker tracking
Schauerte et al. Multi-modal and multi-camera attention in smart environments
Korchagin et al. Just-in-time multimodal association and fusion from home entertainment
Wang et al. Real-time automated video and audio capture with multiple cameras and microphones
Yan et al. Computational audiovisual scene analysis in online adaptation of audio-motor maps
Soldatos et al. Perceptual interfaces and distributed agents supporting ubiquitous computing services
CN110730378A (en) Information processing method and system
Canton-Ferrer et al. Multimodal real-time focus of attention estimation in smartrooms
Vryzas et al. Investigating Multimodal Audiovisual Event Detection and Localization
Zhang Other Applications
Potamianos et al. Audio-visual ASR from multiple views inside smart rooms