Fusion of Speech, Faces and Text for Person Identification in TV Broadcast

Hervé Bredin,
Johann Poignant,
Makarand Tapaswi,
Guillaume Fortier,
Viet Bac Le,
Thibault Napoleon,
Hua Gao,
Claude Barras,
Sophie Rosset,
Laurent Besacier,
Jakob Verbeek,
Georges Quénot,
Frédéric Jurie &
Hazim Kemal Ekenel

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7585)

Included in the following conference series:

European Conference on Computer Vision

Abstract

The Repere challenge is a project that aims to evaluate systems for supervised and unsupervised multimodal recognition of people in TV broadcast. In this paper, we describe, evaluate and discuss the QCompere consortium's submissions to the dry run of the 2012 Repere evaluation campaign. Speaker identification (and face recognition) can be greatly improved when combined with name detection through video optical character recognition. Moreover, we show that unsupervised multimodal person recognition systems can achieve performance nearly as good as supervised monomodal ones (with several hundred identity models).

Author information

Authors and Affiliations

CNRS-LIMSI UPR 3251, Univ Paris-Sud, BP 133, F-91403, Orsay, France
Hervé Bredin, Claude Barras & Sophie Rosset
UJF-Grenoble 1 / UPMF-Grenoble 2 / Grenoble INP / CNRS-LIG UMR 5217, F-38041, Grenoble, France
Johann Poignant, Laurent Besacier & Georges Quénot
Karlsruher Institut für Technologie, Karlsruhe, Germany
Makarand Tapaswi, Hua Gao & Hazim Kemal Ekenel
INRIA Rhône-Alpes, 655 Avenue de l'Europe, F-38330, Montbonnot, France
Guillaume Fortier & Jakob Verbeek
Vocapia Research, Parc Orsay Université, 3 rue Jean Rostand, F-91400, Orsay, France
Viet Bac Le
Université de Caen / GREYC UMR 6072, F-14050, Caen Cedex, France
Thibault Napoleon & Frédéric Jurie

Editor information

Editors and Affiliations

Dipartimento di Ingegneria Elettrica, Gestionale e Meccanica (DIEGM), Università degli Studi di Udine, Via delle Scienze, 208, 33100, Udine, Italy
Andrea Fusiello
IIT Istituto Italiano di Tecnologia, Via Morego 30, 16163, Genoa, Italy
Vittorio Murino
Dipartimento di Ingegneria dell’Informazione, Università degli Studi di Modena e Reggio Emilia, Strada Vignolese, 905, 41125, Modena, Italy
Rita Cucchiara

Cite this paper

Bredin, H. et al. (2012). Fusion of Speech, Faces and Text for Person Identification in TV Broadcast. In: Fusiello, A., Murino, V., Cucchiara, R. (eds) Computer Vision – ECCV 2012. Workshops and Demonstrations. ECCV 2012. Lecture Notes in Computer Science, vol 7585. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33885-4_39

DOI: https://doi.org/10.1007/978-3-642-33885-4_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33884-7
Online ISBN: 978-3-642-33885-4
eBook Packages: Computer Science, Computer Science (R0)
