An Integrated Detection and Multi-Object Tracking Pipeline for Satellite Video Analysis of Maritime and Aerial Objects
Figure 1. The architecture of R-FCN. k ∗ k represents the size of RoIs on the position-sensitive score map. C represents the number of categories. The conv symbol represents the convolution operation. Softmax is a multi-class classifier.
Figure 2. Schematic of a tracking method based on a correlation filter.
Figure 3. The proposed pipeline.
Figure 4. The flowchart of the proposed pipeline.
Figure 5. (a,c) The detection results of R-FCN for planes and ships. (b,d) The in-class numbering results according to the detection results.
Figure 6. The basic structure of MT-KCF. The green boxes indicate the stationary targets, while the blue boxes indicate the moving targets.
Figure 7. An example of multi-target tracking, where the azure lines denote the trajectories of the moving targets.
Figure 8. The basic structure of DAT. The green boxes indicate the stationary targets and the blue boxes indicate the moving targets. The red boxes indicate the R-FCN-based detection results.
Figure 9. The basic structure of NTR. The blue boxes indicate the moving targets. The red boxes indicate the R-FCN-based detection results.
Figure 10. The six data sets. (a) The first frame of Video 1. (b) The first frame of Video 2. (c) The first frame of Video 3. (d) The first frame of Video 4. (e) The first frame of Video 5. (f–h) The first, tenth, and twenty-second frames of Video 6.
Figure 11. Illustration of a training frame for R-FCN.
Figure 12. Visualisation of the SC_ST tracking results as captured via TDNet(DAT) on Video 1 and Video 2.
Figure 13. Precision plots of the moving target as captured by TDNet(DAT) and the other seven compared methods on (a) Video 1, (b) Video 2, and (c) Video 3.
Figure 14. Visualisation of the TDNet-based SC_MT tracking results of Video 3 (a) and Video 4 (b).
Figure 15. Visualisation of the tracking results of the TDNet(DAT)-based MC_MT tracking method on Video 5, as well as a visualisation of the tracking results of the TDNet(DAT)-based NTR tracking method on Video 6. (a) Video 5. (b,c) Video 6.
Abstract
1. Introduction
2. Preliminaries
2.1. R-FCN-Based Object Detection Approach
2.2. Kernel Correlation Filter Tracking
2.2.1. Correlation Filter-Based Methods
2.2.2. KCF
3. The Proposed TDNet
3.1. Object Detection
3.1.1. Data Set Composition and Sample Selection
3.1.2. Parameter Setting and Optimisation
3.2. Multi-Target Tracking (MT-KCF)
3.3. Detecting-Assisted Tracking (DAT)
3.4. New Target Recognition (NTR)
4. Experimental Results
4.1. Data Sets
4.2. Evaluation Metrics
4.3. Performance Comparison of the Single-Moving-Target Tracking Experiments
4.4. Performance Comparison of the Multi-Moving-Target Tracking Experiments
4.4.1. SC_MT Tracking
4.4.2. MC_MT Tracking
4.4.3. NTR Tracking
5. Concluding Remarks
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Yuan, Y.; He, G.; Jiang, W.; Wang, G. Application of earth observation system of video satellite. Remote Sens. Land Resour. 2018, 30, 1–8.
2. Du, Y.; Song, Y.; Yang, B.; Zhao, Y. StrongSORT: Make DeepSORT Great Again. IEEE Trans. Multimed. 2022, 25, 8725–8737.
3. He, Q.; Sun, X.; Yan, Z.; Li, B.; Fu, K. Multi-Object Tracking in Satellite Videos with Graph-Based Multitask Modeling. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5619513.
4. Zhang, J.; Zhang, X.; Huang, Z.; Cheng, X.; Feng, J.; Jiao, L. Bidirectional Multiple Object Tracking Based on Trajectory Criteria in Satellite Videos. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5603714.
5. Xiao, J.; Cheng, H.; Sawhney, H.S.; Han, F. Vehicle detection and tracking in wide field-of-view aerial video. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 679–684.
6. Prokaj, J.; Medioni, G. Persistent Tracking for Wide Area Aerial Surveillance. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014.
7. LaLonde, R.; Zhang, D.; Shah, M. ClusterNet: Detecting Small Objects in Large Scenes by Exploiting Spatio-Temporal Information. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 23 June 2018; pp. 4003–4012.
8. Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object Detection via Region-based Fully Convolutional Networks. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016.
9. Han, J.; Zhang, D.; Cheng, G.; Guo, L.; Ren, J. Object Detection in Optical Remote Sensing Images Based on Weakly Supervised Learning and High-Level Feature Learning. IEEE Trans. Geosci. Remote Sens. 2015, 53, 3325–3337.
10. Zhou, X.; Koltun, V.; Krähenbühl, P. Tracking Objects as Points. In Proceedings of Computer Vision, ECCV 2020, 16th European Conference, Glasgow, UK, 23–28 August 2020, Proceedings, Part IV; Vedaldi, A., Bischof, H., Brox, T., Frahm, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2020; Volume 12349, pp. 474–490.
11. Lukezic, A.; Vojír, T.; Cehovin, L.; Matas, J.; Kristan, M. Discriminative Correlation Filter with Channel and Spatial Reliability. arXiv 2016, arXiv:1611.08461.
12. Danelljan, M.; Bhat, G.; Khan, F.S.; Felsberg, M. ECO: Efficient Convolution Operators for Tracking. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 6931–6939.
13. Bolme, D.S.; Beveridge, J.R.; Draper, B.A.; Lui, Y.M. Visual object tracking using adaptive correlation filters. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 2544–2550.
14. Babenko, B.; Yang, M.H.; Belongie, S. Robust Object Tracking with Online Multiple Instance Learning. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 1619–1632.
15. Rui, C.; Martins, P.; Batista, J. Exploiting the Circulant Structure of Tracking-by-Detection with Kernels. In Proceedings of ECCV 2012, 12th European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; pp. 702–715.
16. Henriques, J.F.; Rui, C.; Martins, P.; Batista, J. High-Speed Tracking with Kernelized Correlation Filters. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 37, 583–596.
17. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507.
18. Jiao, L.; Liu, F. Wishart Deep Stacking Network for Fast POLSAR Image Classification. IEEE Trans. Image Process. Publ. IEEE Signal Process. Soc. 2016, 25, 3273–3286.
19. Danelljan, M.; Robinson, A.; Khan, F.S.; Felsberg, M. Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking. In Proceedings of ECCV 2016, 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; pp. 472–488.
20. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
21. Bay, H.; Ess, A.; Tuytelaars, T.; Van Gool, L. Speeded-Up Robust Features. Comput. Vis. Image Underst. 2008, 110, 404–417.
22. Lindeberg, T. Scale invariant feature transform. Scholarpedia 2012, 7, 2012–2021.
23. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
24. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
25. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the NIPS, Montreal, QC, Canada, 7–12 December 2015; pp. 91–99.
26. Choi, W. Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. In Proceedings of the ICCV, Santiago, Chile, 7–13 December 2015; pp. 3029–3037.
27. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Li, F.F. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009.
28. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. arXiv 2016, arXiv:1612.08242.
29. Cheng, G.; Zhou, P.; Han, J. Learning Rotation-Invariant Convolutional Neural Networks for Object Detection in VHR Optical Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 7405–7415.
30. Bernardin, K.; Stiefelhagen, R. Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics. EURASIP J. Image Video Process. 2008, 246309.
31. Ristani, E.; Solera, F.; Zou, R.; Cucchiara, R.; Tomasi, C. Performance Measures and a Data Set for Multi-target, Multi-camera Tracking. In Proceedings of the ECCV, Amsterdam, The Netherlands, 11–14 October 2016; pp. 17–35.
32. Li, Y.; Huang, C.; Nevatia, R. Learning to associate: HybridBoosted multi-target tracker for crowded scene. In Proceedings of the CVPR, Miami, FL, USA, 20–25 June 2009; pp. 2953–2960.
33. Mueller, M.; Smith, N.; Ghanem, B. Context-Aware Correlation Filter Tracking. In Proceedings of the CVPR, Honolulu, HI, USA, 26 July 2017; pp. 1387–1395.
34. Xiang, Y.; Alahi, A.; Savarese, S. Learning to Track: Online Multi-object Tracking by Decision Making. In Proceedings of the ICCV, Santiago, Chile, 7–13 December 2015; pp. 4705–4713.
The Six Data Sets

| | Video 1 | Video 2 | Video 3 | Video 4 | Video 5 | Video 6 |
|---|---|---|---|---|---|---|
| Category | plane | ship | ship | plane | ship and plane | plane |
| Moving targets | 1 | 1 | 3 | 3 | 3 | 2 |
| Stationary targets | 10 | 11 | 16 | 3 | 0 | 0 |
| Total | 11 | 12 | 19 | 6 | 3 | 2 |

| Video | Metric | CASAMF [33] | CAMOSSE [33] | KCF [16] | CASTAPLE [33] | C-COT [19] | ECO-HC [12] | ECO [12] | TDNet(NO DAT) | TDNet(DAT) |
|---|---|---|---|---|---|---|---|---|---|---|
| Video 1 | Mean P | 74.71% | 77.01% | 89.82% | 92.29% | 92.76% | 93.08% | 93.81% | 89.82% | 94.65% |
| Video 1 | FPS | 6.08 | 89.99 | 93.59 | 14.32 | 0.20 | 15.89 | 1.06 | 93.59 | 82.84 |
| Video 1 | mAP(od) | - | - | - | - | - | - | - | 100% | 100% |
| Video 2 | Mean P | 92.06% | 93.44% | 93.89% | 93.94% | 93.97% | 94.16% | 94.69% | 93.89% | 97.25% |
| Video 2 | FPS | 6.02 | 90.49 | 90.60 | 14.85 | 0.15 | 16.89 | 1.05 | 90.60 | 79.34 |
| Video 2 | mAP(od) | - | - | - | - | - | - | - | 90.91% | 90.91% |
| Video 3 | Mean P | 87.91% | 92.51% | 92.75% | 93.78% | 93.80% | 93.85% | 95.39% | 92.75% | 97.69% |
| Video 3 | FPS | 8.64 | 86.43 | 125 | 15.00 | 0.19 | 18.02 | 1.05 | 125 | 97.59 |
| Video 3 | mAP(od) | - | - | - | - | - | - | - | 94.74% | 94.74% |

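The precision plots in Figure 13 and the Mean P row above are built on a per-frame precision measure. As a minimal sketch, assuming the common visual-tracking definition (the fraction of frames whose predicted centre lies within a pixel threshold of the ground-truth centre), the computation could look like the following. How Mean P aggregates the curve is not stated in this excerpt, so averaging over thresholds 1 to 20 pixels is only one plausible reading. The function names, the threshold range, and the sample coordinates are all hypothetical.

```python
import math


def center_errors(pred, truth):
    """Euclidean distance between predicted and ground-truth centres, per frame."""
    return [math.hypot(px - tx, py - ty) for (px, py), (tx, ty) in zip(pred, truth)]


def precision_at(errors, threshold):
    """Fraction of frames whose centre error is within `threshold` pixels."""
    return sum(e <= threshold for e in errors) / len(errors)


def precision_curve(errors, max_threshold=20):
    """Precision at thresholds 1..max_threshold, one point per pixel (assumed range)."""
    return [precision_at(errors, t) for t in range(1, max_threshold + 1)]


# Hypothetical centres for four frames (not data from the paper).
pred = [(10.0, 10.0), (12.0, 11.0), (15.0, 15.0), (40.0, 40.0)]
truth = [(10.0, 10.0), (12.0, 14.0), (18.0, 19.0), (20.0, 20.0)]

errors = center_errors(pred, truth)
curve = precision_curve(errors)
mean_p = sum(curve) / len(curve)  # assumed aggregation: mean over thresholds
```

Plotting `curve` against the threshold gives one line of a precision plot. A tracker that drifts, as in the last hypothetical frame, lowers the whole curve rather than a single point.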
State Statistics

| | Video 1 | Video 2 | Video 3 | Video 4 | Video 5 | Video 6 |
|---|---|---|---|---|---|---|
| Category | plane | ship | ship | plane | plane and ship | plane |
| TDNet(DAT) (moving targets) | 1 | 1 | 3 | 3 | 3 | 2 |
| TDNet(DAT) (stationary targets) | 10 | 9 | 15 | 3 | 0 | 0 |

SC_MT Tracking on Video 4 and Video 3

| Methods | MOTA | MOTP | MT | ML | FP | FN | IDS | Hz |
|---|---|---|---|---|---|---|---|---|
| MDP [34] | 80.21% | 87.31% | 90.37% | 3.21% | 0 | 421 | 3 | 8.52 |
| CenterTrack [10] | 82.10% | 87.62% | 91.25% | 2.43% | 0 | 411 | 2 | 21.80 |
| TDNet(NO DAT) | 84.62% | 89.41% | 91.56% | 1.06% | 0 | 398 | 0 | 11.36 |
| TDNet(DAT) | 85.31% | 91.38% | 93.41% | 0.83% | 0 | 347 | 0 | 10.21 |

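The MOTA, FP, FN, and IDS columns above are related. As a minimal sketch, assuming the table follows the CLEAR MOT definitions (Bernardin and Stiefelhagen, listed in the references), MOTA combines the three error counts against the total number of ground-truth objects summed over all frames. That total is not given in this excerpt, so the counts below are hypothetical and are not meant to reproduce the table. MOTP is taken here as the mean bounding-box overlap of matched pairs, which fits the percentage values reported, though the excerpt does not state the definition used. The function names are hypothetical.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """CLEAR MOT accuracy: 1 - (FN + FP + IDS) / GT, with GT summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def motp_overlap(matched_ious):
    """Mean IoU over matched (track, ground-truth) pairs; assumes overlap-based MOTP."""
    if not matched_ious:
        raise ValueError("at least one matched pair is required")
    return sum(matched_ious) / len(matched_ious)


# Hypothetical counts and overlaps (not taken from the paper).
example_mota = mota(false_negatives=120, false_positives=20, id_switches=10, num_gt=1000)
example_motp = motp_overlap([0.9, 0.8, 1.0])
```

Because the error counts are summed before dividing, MOTA can fall below zero when a tracker makes more errors than there are ground-truth objects. With FP and IDS at or near zero, as in every row above, differences in MOTA are driven almost entirely by FN.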
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Su, Z.; Wan, G.; Zhang, W.; Guo, N.; Wu, Y.; Liu, J.; Cong, D.; Jia, Y.; Wei, Z. An Integrated Detection and Multi-Object Tracking Pipeline for Satellite Video Analysis of Maritime and Aerial Objects. Remote Sens. 2024, 16, 724. https://doi.org/10.3390/rs16040724