Abstract
The Repere challenge aims to evaluate systems for supervised and unsupervised multimodal recognition of people in TV broadcasts. In this paper, we describe, evaluate, and discuss the QCompere consortium's submissions to the dry run of the 2012 Repere evaluation campaign. Speaker identification (and face recognition) can be greatly improved when combined with name detection through video optical character recognition. Moreover, we show that unsupervised multimodal person recognition systems can achieve performance nearly as good as supervised monomodal ones (with several hundred identity models).
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bredin, H. et al. (2012). Fusion of Speech, Faces and Text for Person Identification in TV Broadcast. In: Fusiello, A., Murino, V., Cucchiara, R. (eds) Computer Vision – ECCV 2012. Workshops and Demonstrations. ECCV 2012. Lecture Notes in Computer Science, vol 7585. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33885-4_39
DOI: https://doi.org/10.1007/978-3-642-33885-4_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33884-7
Online ISBN: 978-3-642-33885-4
eBook Packages: Computer Science, Computer Science (R0)