Abstract
Content-based analysis to find where violence appears in multimedia content has several applications, from parental control and child protection to surveillance. This paper presents the design and annotation of the Violent Scene Detection (VSD) dataset, a corpus targeting the detection of physical violence in Hollywood movies. We discuss definitions of physical violence and provide a simple, objective definition that was used to annotate a set of 18 movies, resulting in the largest freely available dataset for this task. We discuss borderline cases and compare our annotations with annotations based on a subjective definition, which requires multiple annotators. We provide a detailed analysis of the corpus, in particular of the relationship between violence and a set of key audio and visual concepts that were also annotated. The VSD dataset results from two years of benchmarking in the framework of the MediaEval initiative. We provide results from the 2011 and 2012 benchmarks as a validation of the dataset and as a state-of-the-art baseline. The VSD dataset is freely available at http://www.technicolor.com/en/innovation/research-innovation/scientific-data-sharing/violent-scenes-dataset.






Acknowledgments
This work was partially supported by the Quaero Program. We would also like to acknowledge the MediaEval Multimedia Benchmark for providing the framework to evaluate the task of violent scene detection.
Cite this article
Demarty, CH., Penet, C., Soleymani, M. et al. VSD, a public dataset for the detection of violent scenes in movies: design, annotation, analysis and evaluation. Multimed Tools Appl 74, 7379–7404 (2015). https://doi.org/10.1007/s11042-014-1984-4
DOI: https://doi.org/10.1007/s11042-014-1984-4