Siamese Tracking from Single Point Initialization
Figure 1. Block diagram of the main steps.
Figure 2. Results of the bounding box proposal method and the object contour proposal method: (a) bounding box proposal; (b) object contour proposal.
Figure 3. An example of false tracking caused by mixing of target and background: the green box indicates the ground truth and the purple box indicates the tracking result.
Figure 4. Architecture of the proposed contour detection network.
Figure 5. Results of different methods on an image taken by DARPA.
Figure 6. Results of our contour detection method: (a) original image; (b) contour.
Figure 7. Sketch of our template extraction module: (a) original image; (b) contour; (c) mask; (d) target.
Figure 8. Schematic diagram of our feature extraction network's structure: black boxes denote convolutional layers and red boxes denote max pooling layers.
Figure 9. Main framework of our fully convolutional Siamese network.
Figure 10. Results of the top 10 trackers on the OTB100 vehicle videos: (a) distance precision based on one-pass evaluation (OPE); (b) success rate based on OPE.
Figure 11. Results on images from the datasets taken by DARPA: (a–e) five groups comparing the tracking results of SiamFC and our method. In each group, the left images are results from SiamFC and the right images are results from our method. For each method, the bottom image is the tracking result in the original frame, the top-left image is a partially enlarged detail of the tracking result in the current frame, and the top-right image is the corresponding score map denoting similarity. Green boxes denote the ground truth and red boxes denote the tracking results.
Figure 12. Statistical results of SiamFC and our method on the DARPA VIVID datasets: (a–e) centering errors in five different video sequences; (f) average centering error.
Abstract
1. Introduction
2. Related Work
2.1. Algorithm Design
2.2. Bounding Box Proposal vs. Object Contour Proposal
2.3. Siamese Network
3. Proposed Method
3.1. Contour Detection Network Architecture
3.2. Template Extraction Module
3.3. Fully Convolutional Siamese Network Architecture
3.4. Details of Training End-To-End
4. Experiments and Results
4.1. Results on OTB100
4.2. Results on DARPA VIVID Datasets
4.2.1. Qualitative Evaluation
4.2.2. Quantitative Evaluation
5. Conclusions
Author Contributions
Conflicts of Interest
References
Layer | C1_2 | P1 | C2_2 | P2 | C3_3 | P3 | C4_3 | P4 | C5_3 |
---|---|---|---|---|---|---|---|---|---|
RF size | 5 | 6 | 14 | 16 | 40 | 44 | 92 | 100 | 196 |
Stride | 1 | 2 | 2 | 4 | 4 | 8 | 8 | 16 | 16 |

Layer | C1 | P1 | C2 | P2 | C3 | C4 | C5 |
---|---|---|---|---|---|---|---|
RF size | 11 | 15 | 31 | 39 | 71 | 103 | 135 |
Stride | 4 | 8 | 8 | 16 | 16 | 16 | 16 |
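The RF (receptive field) sizes and cumulative strides in the tables above follow the standard recurrence for stacked convolution and pooling layers: each layer with kernel size k widens the receptive field by (k − 1) times the cumulative stride so far, and strides compound multiplicatively. As a minimal sketch, the helper below reproduces the second table; the per-layer kernel sizes and strides (an 11×11 stride-4 convolution, 2×2 stride-2 pools, and 3×3 stride-1 convolutions) are our inference from the tabulated values, not taken from the paper.

```python
def receptive_fields(layers):
    """layers: iterable of (name, kernel_size, stride) in forward order.
    Returns {name: (rf_size, cumulative_stride)} after each layer."""
    rf, jump = 1, 1  # RF of a single input pixel; jump = output spacing in input coords
    out = {}
    for name, k, s in layers:
        rf += (k - 1) * jump   # a k-tap window widens the RF by (k - 1) jumps
        jump *= s              # strides compound multiplicatively
        out[name] = (rf, jump)
    return out

# Feature extraction network of the second table; kernels/strides inferred from it.
feat_net = [
    ("C1", 11, 4), ("P1", 2, 2), ("C2", 3, 1), ("P2", 2, 2),
    ("C3", 3, 1), ("C4", 3, 1), ("C5", 3, 1),
]

for name, (rf, stride) in receptive_fields(feat_net).items():
    print(f"{name}: RF={rf}, stride={stride}")
```

Running the same recurrence over a VGG-style chain (3×3 stride-1 convolutions, 2×2 stride-2 pools) likewise reproduces the first table, e.g. RF 196 and stride 16 at C5_3.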
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Xu, Z.; Luo, H.; Hui, B.; Chang, Z. Siamese Tracking from Single Point Initialization. Sensors 2019, 19, 514. https://doi.org/10.3390/s19030514