Training Data Extraction and Object Detection in Surveillance Scenario
Figure 1. Structure of the training procedure.
Figure 2. Results of automatic foreground–background separation.
Figure 3. Division into positive (P) and negative (N1–N4) examples.
Figure 4. Neural network structure, n ∈ {1, 2, 4, 7}.
Figure 5. Example training samples of hat, logo, helmet, and shirt.
Figure 6. Frame with marked hat, logo, helmet, and shirt samples.
Figure 7. Detection results filtered by minimum distance (25 frames) between hits.
Figure 8. Impact of different numbers of cascades (k): (a) training/detection time; (b) hat detection EER.
Figure 9. Hat in '00012' detection results with respect to the number of requested cascade stages.
Figure 10. Hat in '00012' detection results for different training data collection methods.
Figure 11. Foreground/background segmentation results using background subtraction only (left) and background subtraction + GrabCut (right) for background cut-off threshold c_thr = 600.
Figure 12. Comparison of detector accuracy with GrabCut turned on or off for different background cut-off thresholds (pattern hat).
Figure 13. Hat in '00012' detection results for different levels of geometric synthesis.
Figure 14. Hat in '00012' detection results for different contrast and sharpness synthesis levels.
Figure 15. Comparison of detector performance using 24 × 24 and 32 × 32 HOG features for different training set sizes and the hat pattern.
Figure 16. Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves of hat, logo, helmet, and shirt detections in the '00012' sequence.
Figure 17. Three tested t-shirt logo patterns: (a) pattern P1, (b) pattern P2, (c) pattern P3.
Figure 18. (a) ROC curve for pattern P1. (b) Accumulated ROC curve for 5-element sequence analysis. (c) ROC curve for the cereal_1 object in the desk_3 sequence.
Abstract
1. Introduction
- introduction of a new foreground/background segmentation procedure,
- incorporation and evaluation of a CNN classifier in the detector framework,
- additional experiments covering new and previous features,
- extended state-of-the-art analysis.
2. Methods
2.1. Detector Overview
2.2. Collection of Positive Training Samples
2.2.1. Object Tracking and Foreground–Background Separation Using Motion Information
2.2.2. GrabCut Algorithm
2.2.3. Object Tracking and Foreground–Background Separation Using Hybrid Motion Information and GrabCut
1. The object is tracked and its foreground mask is obtained using the methods from Section 2.2.1.
2. For each frame, the current foreground–background segmentation results are used to initialize a GrabCut trimap, specifically:
- the foreground region is used to initialize the GrabCut foreground, with an exception for the areas for which no background model could be reliably established (areas that belong to each collected tracking ROI),
- the GrabCut background is initialized outside the tracked ROI border; to provide enough pixels for background estimation, the tracking area is scaled uniformly by 50%,
- the remaining area of the ROI becomes the undetermined (probable foreground/background) region.
2.2.4. Image Stabilization in a Short Sequence
2.3. Collection of Negative Training Samples
2.4. Positive Samples Generalization and Synthesis
2.4.1. Geometric Generalization
2.4.2. Intensity and Contrast Synthesis
2.4.3. Application of Blur
2.4.4. Merging with the Background
2.5. Detector Training
2.6. Detector Training Using HOG+SVM
2.7. Detector Training Using Tuned VGG16 Network
2.8. Detection and Post-Processing
3. Experiments
3.1. Experimental Setup
- ROC curve: the Receiver Operating Characteristic curve relates the True Positive Rate (TPR) to the False Positive Rate (FPR). The TPR is defined as the number of correctly detected examples (TP) divided by the total number of positive occurrences (P); the FPR is defined as the number of incorrectly detected examples (FP) divided by the total number of negative samples (N). The ROC curve gives comparable results even for imbalanced datasets.
- PR curve: the Precision-Recall curve relates Precision (the fraction of true positives among all positive detections) to Recall (another name for the True Positive Rate). The PR curve is useful for estimating how many good hits can be expected among the detections ranked best by the detector.
- AUC: Area Under Curve is a single measure summarizing the ROC curve, computed by integrating TPR over FPR.
- AVGPR: Average Precision-Recall is a single measure summarizing the PR curve, computed by integrating Precision over Recall.
- EER: Equal Error Rate denotes the line on the ROC plot where TPR + FPR = 1, or the point on the ROC curve meeting this condition.
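These measures can be computed directly from ranked detection scores. A minimal NumPy sketch following the definitions above (function and variable names are ours, not from the paper; it assumes at least one positive and one negative sample):

```python
import numpy as np

def roc_points(scores, labels):
    """ROC points swept over all score thresholds.

    scores -- detector confidence per sample (higher = more positive)
    labels -- 1 for a true occurrence, 0 for a negative sample
    """
    order = np.argsort(-scores)          # rank detections best-first
    ranked = labels[order]
    tp = np.cumsum(ranked)               # TP count after accepting top-k
    fp = np.cumsum(1 - ranked)           # FP count after accepting top-k
    tpr = np.concatenate(([0.0], tp / tp[-1]))   # TPR = TP / P
    fpr = np.concatenate(([0.0], fp / fp[-1]))   # FPR = FP / N
    return fpr, tpr

def roc_auc(fpr, tpr):
    """AUC: integrate TPR over FPR (trapezoid rule)."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

def equal_error_rate(fpr, tpr):
    """EER: FPR at the ROC point closest to the TPR + FPR = 1 line."""
    i = np.argmin(np.abs(tpr + fpr - 1.0))
    return float(fpr[i])

def pr_points(scores, labels):
    """Precision-Recall points swept over all score thresholds."""
    order = np.argsort(-scores)
    ranked = labels[order]
    tp = np.cumsum(ranked)
    precision = tp / np.arange(1, len(ranked) + 1)  # TP among accepted
    recall = tp / tp[-1]                            # same as TPR
    return recall, precision

def average_precision(recall, precision):
    """AVGPR: integrate Precision over Recall (step-wise sum)."""
    dr = np.diff(np.concatenate(([0.0], recall)))
    return float(np.sum(dr * precision))
```

A detector that ranks every positive above every negative yields AUC = 1, AVGPR = 1, and EER = 0 under these definitions.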
3.2. Preliminary Experiments
3.2.1. Two-Layer Detector
3.2.2. Collection of Training Samples
3.2.3. Application of the GrabCut Algorithm
3.2.4. Synthetic Generalization of Training Data
3.2.5. Selection of Training Data Size
3.2.6. VGG16 as 2nd-Stage Classifier
3.2.7. Application of Faster-RCNN to Preprocessed Data
3.2.8. Detection of Various Patterns
3.3. Large-Scale Experiments
4. Conclusions
Author Contributions
Funding
Conflicts of Interest
Abbreviations
AUC | Area Under Curve (ROC) |
AVGPR | Average Precision-Recall |
CC | Cascade Classifier |
CNN | Convolutional Neural Network |
CSK | Circulant Structure of Kernels |
EER | Equal Error Rate |
FPR | False Positive Rate |
GPU | Graphic Processing Unit |
HD | High Definition |
HOG | Histogram of Oriented Gradients |
PR | Precision-Recall (curve) |
P-REACT | Petty cRiminality diminution through sEarch and Analysis in multi-source video Capturing and archiving plaTform |
RBF | Radial Basis Function |
RGB-D | Red Green Blue-Depth |
ROC | Receiver Operating Characteristic |
ROI | Region of Interest |
RPCA | Robust Principal Component Analysis |
SIFT | Scale Invariant Feature Transform |
SURF | Speeded-Up Robust Features |
SVM | Support Vector Machine |
TPR | True Positive Rate |
References
- Arraiza, J.; Aginako, N.; Kioumourtzis, G.; Leventakis, G.; Stavropoulos, G.; Tzovaras, D.; Zotos, N.; Sideris, A.; Charalambous, E.; Koutras, N. Fighting Volume Crime: An Intelligent, Scalable, and Low Cost Approach. In Proceedings of the 9th Summer Safety & Reliability Seminars, SSARS 2015, Gdansk/Sopot, Poland, 21–27 June 2015.
- Blunsden, S.; Fisher, R. The BEHAVE video dataset: Ground truthed video for multi-person behavior classification. Ann. BMVA 2010, 2010, 1–11.
- Awad, G.; Snoek, C.G.M.; Smeaton, A.F.; Quénot, G. TRECVid Semantic Indexing of Video: A 6-Year Retrospective. ITE Trans. Media Technol. Appl. 2016, 4, 187–208.
- Wilkowski, A.; Kasprzak, W.; Stefańczyk, M. Object detection in the police surveillance scenario. In Proceedings of the 2019 Federated Conference on Computer Science and Information Systems, Leipzig, Germany, 1–4 September 2019; Volume 18, pp. 363–372.
- Redmon, J.; Divvala, S.K.; Girshick, R.B.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015.
- Zeng, D.; Zhao, F.; Ge, S.; Shen, W. Fast cascade face detection with pyramid network. Pattern Recognit. Lett. 2019, 119, 180–186.
- Woźniak, M.; Połap, D. Object detection and recognition via clustered features. Neurocomputing 2018, 320, 76–84.
- Yang, L.; Jin, R. Distance metric learning: A comprehensive survey. Mich. State Univ. 2006, 2, 4.
- Sohn, K. Improved deep metric learning with multi-class N-pair loss objective. In Advances in Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2016; pp. 1857–1865.
- Sung, F.; Yang, Y.; Zhang, L.; Xiang, T.; Torr, P.H.; Hospedales, T.M. Learning to compare: Relation network for few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1199–1208.
- Wang, J.; Zhou, F.; Wen, S.; Liu, X.; Lin, Y. Deep metric learning with angular loss. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2593–2601.
- Zhou, F.; Wu, B.; Li, Z. Deep meta-learning: Learning to learn in the concept space. arXiv 2018, arXiv:1802.03596.
- Wang, Y.X.; Girshick, R.; Hebert, M.; Hariharan, B. Low-shot learning from imaginary data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7278–7286.
- Hariharan, B.; Girshick, R. Low-shot visual recognition by shrinking and hallucinating features. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3018–3027.
- Chiatti, A.; Bardaro, G.; Bastianelli, E.; Tiddi, I.; Mitra, P.; Motta, E. Task-agnostic object recognition for mobile robots through few-shot image matching. Electronics 2020, 9, 380.
- Chen, H.; Wang, Y.; Wang, G.; Qiao, Y. LSTD: A low-shot transfer detector for object detection. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018.
- Dong, X.; Zheng, L.; Ma, F.; Yang, Y.; Meng, D. Few-example object detection with model communication. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 1641–1654.
- Shao, Q.; Qi, J.; Ma, J.; Fang, Y.; Wang, W.; Hu, J. Object Detection-Based One-Shot Imitation Learning with an RGB-D Camera. Appl. Sci. 2020, 10, 803.
- Karlinsky, L.; Shtok, J.; Harary, S.; Schwartz, E.; Aides, A.; Feris, R.; Giryes, R.; Bronstein, A.M. RepMet: Representative-based metric learning for classification and few-shot object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 5197–5206.
- Wang, Y.; Yao, Q.; Kwok, J.; Ni, L.M. Generalizing from a Few Examples: A Survey on Few-Shot Learning. arXiv 2019, arXiv:cs.LG/1904.05046.
- Abramson, Y.; Freund, Y. Active Learning for Visual Object Detection; Technical Report; UCSD: San Diego, CA, USA, 2006.
- Abramson, Y.; Freund, Y. SEmi-automatic VIsual LEarning (SEVILLE): Tutorial on active learning for visual object recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 25 July 2005.
- Sivic, J.; Zisserman, A. Video Google: A Text Retrieval Approach to Object Matching in Videos; IEEE Computer Society: Washington, DC, USA, 2003; Volume 2, p. 1470.
- Rother, C.; Kolmogorov, V.; Blake, A. "GrabCut": Interactive Foreground Extraction Using Iterated Graph Cuts; ACM SIGGRAPH 2004 Papers; Association for Computing Machinery: New York, NY, USA, 2004; pp. 309–314.
- Kalal, Z.; Mikolajczyk, K.; Matas, J. Tracking-Learning-Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 1409–1422.
- Andriluka, M.; Roth, S.; Schiele, B. People-tracking-by-detection and people-detection-by-tracking. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 24–26 June 2008; pp. 1–8.
- Feichtenhofer, C.; Pinz, A.; Zisserman, A. Detect to Track and Track to Detect. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3057–3065.
- Kang, K.; Ouyang, W.; Li, H.; Wang, X. Object Detection from Video Tubelets with Convolutional Neural Networks. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26–30 June 2016; pp. 817–825.
- Ren, S.; He, K.; Girshick, R.B.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Henriques, J.F.; Caseiro, R.; Martins, P.; Batista, J. Exploiting the Circulant Structure of Tracking-by-Detection with Kernels. In Proceedings of the 12th European Conference on Computer Vision, Florence, Italy, 7–13 October 2012.
- Danelljan, M.; Khan, F.S.; Felsberg, M.; Van de Weijer, J. Adaptive Color Attributes for Real-Time Visual Tracking. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1090–1097.
- Zivkovic, Z. Improved adaptive Gaussian mixture model for background subtraction. In Proceedings of the 17th International Conference on Pattern Recognition, Cambridge, UK, 23–26 August 2004; Volume 2, pp. 28–31.
- Chen, B.; Shi, L.; Ke, X. A Robust Moving Object Detection in Multi-Scenario Big Data for Video Surveillance. IEEE Trans. Circuits Syst. Video Technol. 2019, 29, 982–995.
- Cao, X.; Yang, L.; Guo, X. Total Variation Regularized RPCA for Irregularly Moving Object Detection Under Dynamic Background. IEEE Trans. Cybern. 2016, 46, 1014–1027.
- Itseez. Open Source Computer Vision Library. 2015. Available online: https://github.com/itseez/opencv (accessed on 7 May 2020).
- Posłuszny, T.; Putz, B. An Improved Extraction Process of Moving Objects' Silhouettes in Video Sequences. In Advanced Mechatronics Solutions; Jabłoński, R., Brezina, T., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 57–65.
- Bay, H.; Tuytelaars, T.; Van Gool, L. SURF: Speeded Up Robust Features. In Proceedings of the Computer Vision–ECCV 2006: 9th European Conference on Computer Vision, Graz, Austria, 7–13 May 2006.
- Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
- Rousseeuw, P.J.; Leroy, A.M. Robust Regression and Outlier Detection; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2005; pp. 197–215.
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), San Diego, CA, USA, 20–26 June 2005; Volume 1, pp. 886–893.
- Hu, Q.; Paisitkriangkrai, S.; Shen, C.; van den Hengel, A.; Porikli, F. Fast Detection of Multiple Objects in Traffic Scenes With a Common Detection Framework. IEEE Trans. Intell. Transp. Syst. 2016, 17, 1002–1014.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, USA, 20–25 June 2009.
- Tan, C.; Sun, F.; Kong, T.; Zhang, W.; Yang, C.; Liu, C. A Survey on Deep Transfer Learning. arXiv 2018, arXiv:cs.LG/1808.01974.
- Wang, Q.; Zhang, L.; Bertinetto, L.; Hu, W.; Torr, P.H. Fast online object tracking and segmentation: A unifying approach. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 1328–1338.
- Perazzi, F.; Pont-Tuset, J.; McWilliams, B.; Van Gool, L.; Gross, M.; Sorkine-Hornung, A. A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation. In Proceedings of the Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26–30 June 2016.
- Bisong, E. Google Colaboratory. In Building Machine Learning and Deep Learning Models on Google Cloud Platform; Apress: Berkeley, CA, USA, 2019; pp. 59–64.
- Lai, K.; Bo, L.; Ren, X.; Fox, D. A large-scale hierarchical multi-view RGB-D object dataset. In Proceedings of the 2011 IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, 9–13 May 2011; pp. 1817–1824.
| | BS + GC | BS | SiamMask |
|---|---|---|---|
| AUC avg. | | | |
| AUC st.dev. | 0.02 | 0.04 | 0.03 |
| AVGPR avg. | | | |
| AVGPR st.dev. | 0.02 | 0.05 | 0.04 |
| | HOG24(300) | HOG24(600) | HOG24(1200) | HOG24(2400) |
|---|---|---|---|---|
| AUC avg. | | | | |
| AUC st.dev. | 0.04 | 0.01 | 0.01 | 0.01 |
| AVGPR avg. | | | | |
| AVGPR st.dev. | 0.05 | 0.00 | 0.01 | 0.01 |
| training (s) | 33 | 49 | 92 | 170 |
| detection (ms/frame) | 58 | 63 | 65 | 63 |
| | HOG32(300) | HOG32(600) | HOG32(1200) | HOG32(2400) |
|---|---|---|---|---|
| AUC avg. | | | | |
| AUC st.dev. | 0.05 | 0.04 | 0.02 | 0.01 |
| AVGPR avg. | | | | |
| AVGPR st.dev. | 0.10 | 0.04 | 0.02 | 0.00 |
| training (s) | 44 | 68 | 171 | 353 |
| detection (ms/frame) | 70 | 77 | 107 | 100 |
| | VGG16(32) | VGG16(64) | VGG16(128) | VGG16(224) | HOG(2400) |
|---|---|---|---|---|---|
| AUC avg. | | | | | |
| AUC st.dev. | 0.06 | 0.01 | 0.01 | 0.05 | 0.01 |
| AVGPR avg. | | | | | |
| AVGPR st.dev. | 0.08 | 0.03 | 0.03 | 0.06 | 0.00 |
| training (s) | 60 | 74 | 140 | 223 | 353 |
| detection (ms/frame) | 118 | 225 | 466 | 1457 | 100 |
| Sequence | GT | Hit | Skip | Miss | Fragmentation |
|---|---|---|---|---|---|
| 146 | 1 | 3 | 0 | 0 | 3 |
| 154 | 3 | 3 | 0 | 2 | 1 |
| 162 (A) | 1 | 1 | 0 | 0 | 1 |
| 162 (B) | 8 | 14 | 0 | 3 | 1.75 |
| 163 | 2 | 2 | 0 | 0 | 1 |
| 164 | 3 | 5 | 0 | 0 | 1.67 |
| 168 | 1 | 1 | 0 | 3 | 1 |
| total | 19 | 29 | 0 | 8 | 1.53 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Wilkowski, A.; Stefańczyk, M.; Kasprzak, W. Training Data Extraction and Object Detection in Surveillance Scenario. Sensors 2020, 20, 2689. https://doi.org/10.3390/s20092689