Personnel Monitoring in Shipboard Surveillance Using Improved Multi-Object Detection and Tracking Algorithm
List of Figures

Figure 1. Structure of the tracking algorithm.
Figure 2. Improved YOLOv8 network architecture.
Figure 3. BRA structure.
Figure 4. Network structure diagram of BiFormer and C2f.
Figure 5. Comparison diagram of the Bottleneck structure.
Figure 6. Comparison diagram of the C2f structure.
Figure 7. Part OSNet backbone network schematic.
Figure 8. OSNet foundation building blocks schematic.
Figure 9. Operation flow of Part OSNet.
Figure 10. Example of the autonomous dataset.
Figure 11. Results of Grad-CAM heat-map visualization.
Figure 12. Performance comparison of object-detection algorithms.
Figure 13. Object-detection comparison experiments (a green box indicates a missed target; a white circle marks redundant background).
Figure 14. Performance comparison of tracking algorithms on the Bohai Sea Ro-Ro Ship Dataset.
Figure 15. Performance comparison of tracking algorithms on MOT17.
Figure 16. Results of multi-object-tracking algorithms (a green dashed line indicates a missed target, a circle indicates an ID error or switch, a blue dashed line indicates an incorrectly tracked target, and a yellow box indicates a misdetected target).
Abstract
1. Introduction
1. In response to the high density of people on board, personnel occlusion, and the prevalence of small targets in locations such as decks, this study integrates YOLOv8 with a transformer by introducing the BiFormer module. BiFormer builds on the vision-transformer architecture and achieves flexible, sparse attention allocation through a bi-level routing design. This enhances the detection of small and occluded targets while keeping the parameter count relatively low.
2. To make the model lightweight, the C2f block of YOLOv8 is improved by introducing the RepGhost module. By combining lightweight design principles with re-parameterization techniques, the model maintains accuracy while improving inference speed.
3. To tackle the complexity of personnel movement trajectories and the challenge of occlusion, the Part OSNet network is introduced to extract richer appearance information. Part OSNet retains both full-scale feature extraction and a lightweight structure while using four distinct pooling branches to extract part-level features. The resulting framework combines motion predictions from Kalman filtering, detections from the improved YOLOv8 model, and detailed appearance features from Part OSNet into a complete multi-object-tracking system.
4. This study independently constructs a Bohai Sea Ro-Ro Ship Dataset for multi-object detection and tracking. The dataset is taken from shipboard surveillance video of a Bohai Sea ro-ro ship and covers scenes such as the cabin, deck, and cockpit. Extensive comparison experiments show that the improved YOLOv8 significantly improves speed and accuracy while reducing the parameter count. For multi-object tracking, the method is evaluated on both the Bohai Sea Ro-Ro Ship Dataset and the MOT17 dataset, where it significantly improves HOTA, MOTA, MOTP, and IDF1 and reduces ID switches.
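The bi-level routing attention (BRA) that BiFormer contributes can be sketched compactly: tokens are grouped into regions, a coarse region-to-region affinity selects the top-k most relevant regions for each query region, and fine token-to-token attention is computed only within that routed subset. The sketch below follows the paper's description; the parameter names (`n_regions`, `topk`) and the identity q/k/v projections are our simplifications, not the official BiFormer code.

```python
# Minimal sketch of bi-level routing attention (BRA): coarse region routing
# followed by fine-grained attention restricted to the routed regions.
import torch

def bra(x, n_regions=4, topk=2):
    """x: (B, H, W, C) feature map; H and W divisible by n_regions."""
    B, H, W, C = x.shape
    rh, rw = H // n_regions, W // n_regions
    # 1. Partition the map into n_regions^2 regions of rh*rw tokens each.
    regions = x.view(B, n_regions, rh, n_regions, rw, C)
    regions = regions.permute(0, 1, 3, 2, 4, 5).reshape(B, n_regions**2, rh*rw, C)
    q = k = v = regions                        # identity projections for brevity
    # 2. Region-level affinity from mean-pooled queries and keys.
    qr, kr = q.mean(2), k.mean(2)              # (B, R, C)
    affinity = qr @ kr.transpose(-1, -2)       # (B, R, R)
    idx = affinity.topk(topk, dim=-1).indices  # top-k routed regions per region
    # 3. Gather key/value tokens of the routed regions only (sparse step).
    bidx = torch.arange(B)[:, None, None]
    kg = k[bidx, idx].flatten(2, 3)            # (B, R, topk*rh*rw, C)
    vg = v[bidx, idx].flatten(2, 3)
    # 4. Fine token-to-token attention inside the routed token set.
    attn = (q @ kg.transpose(-1, -2)) / C**0.5
    out = attn.softmax(-1) @ vg                # (B, R, rh*rw, C)
    # 5. Un-partition back to the spatial layout.
    out = out.view(B, n_regions, n_regions, rh, rw, C)
    return out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)
```

Because each query region attends to only `topk` of the `n_regions**2` regions, the attention cost shrinks accordingly, which is what makes BRA attractive for small-target detection at shallow, high-resolution layers.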
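The re-parameterization principle behind RepGhost can be illustrated with a toy "cheap" branch: at training time it is a 3×3 depthwise convolution plus a parallel identity shortcut, and at inference the identity is folded into the convolution kernel so only one operator remains. This is a simplified sketch of the fusion rule (BatchNorm omitted), not the official RepGhost module.

```python
# Toy structural re-parameterization: fuse (dwconv + identity) into one dwconv.
import torch
import torch.nn as nn

class CheapBranch(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)

    def forward(self, x):            # training-time: two parallel branches
        return self.dw(x) + x

    def fuse(self):
        """Fold the identity into the kernel: y = (W + I) * x + b."""
        fused = nn.Conv2d(self.dw.in_channels, self.dw.out_channels, 3,
                          padding=1, groups=self.dw.groups)
        w = self.dw.weight.data.clone()   # (C, 1, 3, 3) depthwise kernels
        w[:, 0, 1, 1] += 1.0              # identity = 1 at the kernel centre
        fused.weight.data.copy_(w)
        fused.bias.data.copy_(self.dw.bias.data)
        return fused
```

The fused module produces numerically identical outputs to the training-time branch pair, so the extra branch improves training without any inference-time cost; this is the sense in which RepGhost keeps accuracy while improving speed.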
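The part-level feature idea in Part OSNet can likewise be sketched: alongside a global pooling branch, the backbone feature map is pooled over horizontal stripes so that each stripe yields a part-level appearance embedding. The stripe count and function names below are illustrative stand-ins (the paper's design uses four distinct pooling branches), not the actual Part OSNet implementation.

```python
# Sketch of part-level appearance embeddings: global pooling concatenated
# with horizontal-stripe pooling, L2-normalized for cosine matching.
import torch
import torch.nn.functional as F

def part_features(feat, n_parts=4):
    """feat: (B, C, H, W) backbone output -> (B, (n_parts + 1) * C) embedding."""
    B, C, H, W = feat.shape
    global_f = F.adaptive_avg_pool2d(feat, 1).flatten(1)      # (B, C)
    # Horizontal stripes: pool to n_parts rows and a single column.
    parts = F.adaptive_avg_pool2d(feat, (n_parts, 1))         # (B, C, P, 1)
    parts = parts.squeeze(-1).transpose(1, 2).reshape(B, -1)  # (B, P*C)
    emb = torch.cat([global_f, parts], dim=1)
    return F.normalize(emb, dim=1)  # unit norm so dot product = cosine similarity
```

In a tracker, such embeddings would be compared by cosine similarity during data association, so a partially occluded person can still match on the visible stripes even when the global feature is corrupted.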
2. Related Works
3. Our Approach
3.1. BR-YOLO
3.1.1. BiFormer
3.1.2. RepGhost-C2f
3.2. OSNet Based on Part-Level Feature
4. Experiment and Analysis
4.1. Experimental Platform and Parameter Settings
4.2. Evaluation Metrics
4.2.1. Object-Detection Evaluation Metrics
4.2.2. Multi-Object-Tracking Evaluation Metrics
4.3. Dataset Construction
4.4. Object-Detection Experiments
4.4.1. Comparison of Different Attention Mechanisms
4.4.2. Quantitative Comparison Experiments
4.4.3. Ablation Experiments of Object Detection
4.4.4. Qualitative Comparison Experiments
4.5. Tracking Algorithms Experiments
4.5.1. Quantitative Comparison Experiments
4.5.2. Ablation Experiments of Tracking
4.5.3. Qualitative Comparison Experiments
5. Conclusions
1. A multi-object detection and tracking dataset is constructed for shipboard surveillance scenarios. Derived from surveillance videos taken aboard a Bohai Sea ro-ro ship, it serves as training and testing data for the models and provides essential data support for this research.
2. This study examines the limitations of existing methods in shipboard surveillance settings and introduces targeted improvements. For object detection, BR-YOLO is proposed as the detector for multi-object tracking. To address the high population density and severe occlusion in shipboard surveillance video, BR-YOLO places the BiFormer module in its shallow layers to strengthen the detection of small and densely packed objects. The RepGhost-C2f module is introduced to reduce model parameters while maintaining accuracy and inference speed, making the model easy to deploy on ships. To mitigate the effects of occlusion and the complexity of personnel movement trajectories on board, the Part OSNet network is presented; it employs four distinct pooling branches to extract features at multiple scales, integrating a more comprehensive range of appearance information.
3. In comparative experiments on the Bohai Sea Ro-Ro Ship Dataset, the proposed approach outperforms existing mainstream methods. For object detection, compared with YOLOv8, the precision, mAP0.5, and mAP0.5:0.95 of BR-YOLO increase by 0.7, 1.1, and 1.2 percentage points, respectively, while the parameter count decreases by 12%. For tracking, compared with the original algorithm, the proposed method improves HOTA, MOTA, MOTP, and IDF1 by 10.66, 9.65, 6.42, and 11.4 percentage points, respectively, and reduces ID switches by 13.3%. The method also shows significant improvement on the MOT17 dataset.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Configuration | Version |
---|---|
Operating system | Windows 10 |
CPU | AMD EPYC 7542 32-Core Processor |
GPU | NVIDIA GeForce RTX 4090 |
RAM | 128 GB |
IDE | PyCharm 2021.3.3 |
Language | Python 3.8.16 |
Framework | PyTorch 1.8.1 |
Toolkit | CUDA 11.1 + cuDNN 8.8.1 |
Image Size | Epoch | Batch Size | Initial Learning Rate |
---|---|---|---|
640 × 640 | 500 | 16 | 0.002 |
Measure | Better | Perfect |
---|---|---|
HOTA | higher | 100% |
MOTA | higher | 100% |
MOTP | higher | 100% |
IDF1 | higher | 100% |
IDS | lower | 0 |
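The CLEAR-MOT metrics summarized above can be computed from matching counts. Below is a minimal sketch of the standard MOTA and IDF1 definitions from aggregate counts; a real evaluation (as used for the tables in this paper) performs per-frame matching first, e.g. with a toolkit such as TrackEval or py-motmetrics.

```python
# CLEAR-MOT metric sketches from aggregate counts.

def mota(fn, fp, idsw, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT; perfect = 1, can go negative."""
    return 1.0 - (fn + fp + idsw) / num_gt

def idf1(idtp, idfp, idfn):
    """IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN); perfect = 1."""
    return 2 * idtp / (2 * idtp + idfp + idfn)
```

For example, 10 misses, 5 false positives, and 1 identity switch over 100 ground-truth boxes give MOTA = 0.84; this is why IDs ("lower is better") enters MOTA as a penalty term while the other listed metrics reward higher values.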
No. | Location | Resolution | FPS |
---|---|---|---|
01 | Forward stairway of deck 2 | ||
02 | Forward stairway of deck 5 | ||
03 | Forward stairway of deck 9 | ||
04 | Port side of deck 10 | 1920 × 1080 | 25 |
05 | Starboard side of deck 9 | ||
06 | Starboard side of the bridge | ||
07 | Port side of the bridge | ||
08 | Bridge |
Attention Mechanism | P | R | mAP0.5 | mAP0.5:0.95 | Param |
---|---|---|---|---|---|
None | 0.827 | 0.702 | 0.792 | 0.500 | 3.01 M |
CBAM | 0.835 | 0.693 | 0.790 | 0.495 | 5.77 M |
ECA | 0.832 | 0.709 | 0.796 | 0.510 | 5.77 M |
SE | 0.829 | 0.698 | 0.791 | 0.497 | 4.26 M |
GC | 0.828 | 0.696 | 0.790 | 0.493 | 3.95 M |
BiFormer | 0.846 | 0.702 | 0.805 | 0.512 | 3.08 M |
Detector | P | R | mAP0.5 | mAP0.5:0.95 | Param | Time |
---|---|---|---|---|---|---|
YOLOv3 | 0.762 | 0.604 | 0.697 | 0.382 | 65.2 M | 9.0 |
YOLOv5s | 0.812 | 0.635 | 0.734 | 0.449 | 7.4 M | 6.4 |
YOLOv7 | 0.838 | 0.720 | 0.799 | 0.512 | 36.72 M | 7.7 |
YOLOv8n | 0.827 | 0.701 | 0.792 | 0.500 | 3.01 M | 3.8 |
BR-YOLO | 0.834 | 0.701 | 0.803 | 0.512 | 2.65 M | 3.8 |
Detector | Index | BiFormer | RepGhost | P | R | mAP0.5 | mAP0.5:0.95 | Param | Time |
---|---|---|---|---|---|---|---|---|---|
YOLOv8n | 1 | - | - | 0.827 | 0.702 | 0.792 | 0.500 | 3.01 M | 3.8 |
YOLOv8n | 2 | ✓ | - | 0.839 | 0.702 | 0.805 | 0.512 | 3.08 M | 4.6 |
YOLOv8n | 3 | - | ✓ | 0.832 | 0.694 | 0.790 | 0.500 | 2.30 M | 3.0 |
YOLOv8n | 4 | ✓ | ✓ | 0.834 | 0.701 | 0.803 | 0.512 | 2.65 M | 3.8 |
Detector–Tracker | HOTA | MOTA | MOTP | IDF1 | IDs |
---|---|---|---|---|---|
YOLOv8-BoT-SORT | 31.608 | 41.704 | 74.859 | 43.330 | 44 |
YOLOv8-OC-SORT | 33.168 | 43.391 | 74.708 | 46.049 | 17 |
YOLOv8-StrongSORT | 33.564 | 44.162 | 74.869 | 46.569 | 19 |
YOLOv8-ByteTrack | 31.134 | 42.658 | 74.828 | 43.203 | 19 |
YOLOv8-Deep OC-SORT | 33.585 | 43.993 | 74.872 | 46.881 | 15 |
Ours | 44.243 | 53.642 | 81.295 | 58.283 | 13 |
Detector–Tracker | HOTA | MOTA | MOTP | IDF1 | IDs |
---|---|---|---|---|---|
YOLOv8-BoT-SORT | 63.449 | 57.371 | 80.004 | 73.732 | 91 |
YOLOv8-OC-SORT | 63.452 | 57.248 | 79.993 | 73.716 | 74 |
YOLOv8-StrongSORT | 63.586 | 57.264 | 80.261 | 73.266 | 75 |
YOLOv8-ByteTrack | 58.381 | 48.771 | 81.233 | 65.753 | 68 |
YOLOv8-Deep OC-SORT | 63.740 | 57.494 | 80.878 | 74.063 | 69 |
Ours | 65.385 | 58.248 | 82.229 | 76.990 | 56 |
Tracker | Index | BR-YOLO | Part OSNet | HOTA | MOTA | MOTP | IDF1 | IDs |
---|---|---|---|---|---|---|---|---|
Deep OC-SORT | 1 | - | - | 33.5847 | 43.9931 | 74.6716 | 46.8818 | 15 |
Deep OC-SORT | 2 | ✓ | - | 38.0400 | 45.9014 | 80.4762 | 47.6858 | 19 |
Deep OC-SORT | 3 | - | ✓ | 42.8900 | 52.0822 | 75.3015 | 56.0410 | 11 |
Deep OC-SORT | 4 | ✓ | ✓ | 44.2430 | 53.6423 | 81.2956 | 58.2834 | 13 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Li, Y.; Zhang, B.; Liu, Y.; Wang, H.; Zhang, S. Personnel Monitoring in Shipboard Surveillance Using Improved Multi-Object Detection and Tracking Algorithm. Sensors 2024, 24, 5756. https://doi.org/10.3390/s24175756