An Improved YOLOv7 Model Based on Visual Attention Fusion: Application to the Recognition of Bouncing Locks in Substation Power Cabinets
Figure 1. Structure of YOLOv7-tiny model.
Figure 2. The architecture of MobileViT.
Figure 3. Structure of CBAM.
Figure 4. Structure of FF-FPN.
Figure 5. Overall structure of GF-YOLOv7.
Figure 6. The opening and closing states of four kinds of substation power cabinet spring locks: (a) THA0; (b) THA1; (c) THB0; (d) THB1; (e) THC0; (f) THC1; (g) THD0; (h) THD1.
Figure 7. Label fabrication.
Figure 8. Comparison of test results: (a) change curve of the loss function; (b) change curve of mAP@0.5:0.95.
Figure 9. Comparison of single bounce lock detection with a simple background and a large target.
Figure 10. Comparison of mixed bounce lock detection with a complex background.
Featured Application
Abstract
1. Introduction
- The MobileViT module is used to improve the feature extraction ability of the YOLOv7 backbone network.
- A lightweight feature fusion network is designed based on the channel attention module and the spatial attention module.
- An improved GF-YOLOv7 network model is proposed for recognizing the state of substation power cabinet locks. Experimental results show that recognition accuracy is improved while the model size is reduced.
2. Related Works
2.1. Principle of YOLOv7 Algorithm
2.2. Self-Attention Mechanism
- (1) The local spatial information of the input tensor is learned by a standard convolution;
- (2) The output feature map of step (1) is projected into a high-dimensional space by a point-wise convolution that adjusts its channels; the resulting tensor is then unfolded, and a global feature transformation is learned by the Transformer;
- (3) The Transformer output is folded back into a feature map and passed through a point-wise convolution;
- (4) The result of step (1) is concatenated with the output of step (3), and the local and global features are fused by a convolution (a minimal sketch of this block follows the list).
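The following is a minimal PyTorch sketch of the four steps above, following the MobileViT block of Mehta and Rastegari [arXiv:2110.02178]. The patch size, embedding dimension, Transformer depth, and head count are illustrative assumptions, not the settings used in the GF-YOLOv7 backbone.

```python
import torch
import torch.nn as nn


class MobileViTBlock(nn.Module):
    """Sketch of a MobileViT block; all hyperparameters are assumptions."""

    def __init__(self, channels, dim=96, patch_size=2, depth=2, heads=4):
        super().__init__()
        self.ph = self.pw = patch_size
        # Step (1): local spatial information via a 3x3 convolution.
        self.local_conv = nn.Conv2d(channels, channels, 3, padding=1)
        # Step (2): point-wise projection to d channels, then a Transformer.
        self.proj_in = nn.Conv2d(channels, dim, 1)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)
        # Step (3): point-wise projection back to the input channel count.
        self.proj_out = nn.Conv2d(dim, channels, 1)
        # Step (4): fuse the concatenated local and global branches.
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, x):
        local = self.local_conv(x)                       # step (1)
        y = self.proj_in(local)                          # step (2)
        b, d, h, w = y.shape                             # h, w must be divisible by patch_size
        # Unfold into P = ph*pw sequences of N = (h/ph)*(w/pw) tokens each.
        y = y.reshape(b, d, h // self.ph, self.ph, w // self.pw, self.pw)
        y = y.permute(0, 3, 5, 2, 4, 1).reshape(b * self.ph * self.pw, -1, d)
        y = self.transformer(y)                          # global feature transformation
        # Fold the sequences back into a feature map.
        y = y.reshape(b, self.ph, self.pw, h // self.ph, w // self.pw, d)
        y = y.permute(0, 5, 3, 1, 4, 2).reshape(b, d, h, w)
        y = self.proj_out(y)                             # step (3)
        return self.fuse(torch.cat([local, y], dim=1))   # step (4)


# Example: a 64-channel, 40x40 feature map keeps its shape through the block.
out = MobileViTBlock(64)(torch.rand(1, 64, 40, 40))      # -> (1, 64, 40, 40)
```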
2.3. Channel Attention Mechanism and Spatial Attention Mechanism
3. Proposed Methods
3.1. Improvement of Feature Extraction Network
3.2. Improvement of Feature Fusion Network
- (1) The P5 feature layer extracted from the backbone network undergoes one convolutional downsampling to obtain the feature layer P6, which carries higher-level semantic information.
- (2) The feature layer with higher-level semantic information guides the information fusion of the next feature layer with the help of the attention mechanism: the spatial attention weights of the P6, P5, and P4 feature layers are first obtained through one SA module; after upsampling, the weighted layers are concatenated with the P5, P4, and P3 layers; the concatenated feature layers then pass through one CA module to perceive the channel attention weights, and the feature channels are fused by convolution to obtain the preliminarily fused feature maps P′5, P′4, and P′3 (a minimal sketch of this stage follows the list).
- (3) The low-level feature maps are integrated into the higher levels: the lower-level maps P′4 and P′3 are downsampled and concatenated with the higher-level P′5 and P′4, the channel attention weights are obtained through a CA module, and the feature channels are fused by convolution to obtain the further fused feature layers P″5, P″4, and P″3.
- (4) According to the needs of the detection task, steps (2) and (3) are repeated to obtain the final fused feature layers C5, C4, and C3.
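Below is a minimal PyTorch sketch of one top-down fusion stage from step (2). The SA and CA modules follow the CBAM formulation of Woo et al.; the channel widths, reduction ratio, and nearest-neighbour upsampling are illustrative assumptions rather than the exact FF-FPN configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialAttention(nn.Module):
    """CBAM-style spatial attention (SA)."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        # Pool along the channel axis, then infer a 2D attention map.
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))


class ChannelAttention(nn.Module):
    """CBAM-style channel attention (CA)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1))

    def forward(self, x):
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1))
        return x * torch.sigmoid(avg + mx)


class TopDownFusion(nn.Module):
    """One top-down stage of the FF-FPN as described in step (2):
    SA on the higher-level map, upsample, concatenate with the lower-level
    map, CA on the result, then a 1x1 convolution to fuse the channels."""
    def __init__(self, high_ch, low_ch, out_ch):
        super().__init__()
        self.sa = SpatialAttention()
        self.ca = ChannelAttention(high_ch + low_ch)
        self.fuse = nn.Conv2d(high_ch + low_ch, out_ch, 1)

    def forward(self, high, low):
        high = F.interpolate(self.sa(high), size=low.shape[-2:], mode="nearest")
        return self.fuse(self.ca(torch.cat([high, low], dim=1)))


# Example (assumed channel widths): fuse P6 into P5 to obtain the preliminary P'5.
p6, p5 = torch.rand(1, 256, 10, 10), torch.rand(1, 128, 20, 20)
p5_fused = TopDownFusion(256, 128, 128)(p6, p5)  # -> (1, 128, 20, 20)
```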
3.3. GF-YOLOv7 Network Model
4. Experimental Results
4.1. Implementation Details
4.2. Evaluation Indicators
4.3. Test Results and Analysis of Power Cabinet Spring Lock
5. Conclusions and Future Works
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Zhu, M.; Qin, Q.; Huang, C.; Zhang, W.; Liang, Z.; Chen, J. A Detection Method of Unsafe Behavior in Substation Based on Deep Learning. In Proceedings of the 3rd International Conference on Information Technologies and Electrical Engineering, Changde, China, 3–5 December 2020; pp. 499–502.
- Gong, Q.; Li, J.; Luo, Y.; Gu, Q. State Detection Method of Secondary Equipment in Smart Substation Based on Deep Belief Network and Trend Prediction. In Proceedings of the 2019 IEEE Sustainable Power and Energy Conference (iSPEC), Beijing, China, 21–23 November 2019; pp. 2369–2373.
- Fu, C.-Z.; Si, W.-R.; Huang, H.; Chen, L.; Gao, Q.-J.; Shi, C.-B.; Wang, C. Research on a Detection and Recognition Algorithm for High-Voltage Switch Cabinet Based on Deep Learning with an Improved YOLOv2 Network. In Proceedings of the 2018 11th International Conference on Intelligent Computation Technology and Automation (ICICTA), Changsha, China, 22–23 September 2018; pp. 346–350.
- Song, W.; Liu, X.; Zhao, J.; Wang, M.; Liu, Y. Research on the Intelligent Identification Method of the Substation Equipment Faults Based on Deep Learning. In Proceedings of the 2020 IEEE International Conference on Power, Intelligent Computing and Systems (ICPICS), Shenyang, China, 28–30 July 2020; pp. 888–891.
- Yilin, J.; Jian, S. Substation Equipment Fault Identification Based on Infrared Image Analysis. In Proceedings of the Journal of Physics: Conference Series, Moscow, Russia, 20–21 October 2020; p. 012004.
- Li, Y.; Xu, Y.; Xu, M.; Wang, S.; Xie, Z.; Li, Z.; Jiang, X. Automatic infrared image recognition method for substation equipment based on a deep self-attention network and multi-factor similarity calculation. Glob. Energy Interconnect. 2022, 5, 397–408.
- Zheng, H.; Sun, Y.; Liu, X.; Djike, C.L.T.; Li, J.; Liu, Y.; Ma, J.; Xu, K.; Zhang, C. Infrared Image Detection of Substation Insulators Using an Improved Fusion Single Shot Multibox Detector. IEEE Trans. Power Deliv. 2020, 36, 3351–3359.
- Ciric, R.; Milkov, M. Application of Thermal Imaging in Assesment of Equipment in Power Plants. Monit. Expert. Saf. Eng. 2014, 4, 1–8.
- Wang, L.; Kou, Q.; Zeng, Q.; Ji, Z.; Zhou, L.; Zhou, S. Substation switching device identification method based on deep learning. In Proceedings of the 2022 4th International Conference on Data-Driven Optimization of Complex Systems (DOCS), Chengdu, China, 28–30 October 2022; pp. 1–6.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–788.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 213–229.
- Zhu, X.; Lyu, S.; Wang, X.; Zhao, Q. TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-Captured Scenarios. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 2778–2788.
- Yue, X.; Wang, Q.; He, L.; Li, Y.; Tang, D. Research on Tiny Target Detection Technology of Fabric Defects Based on Improved YOLO. Appl. Sci. 2022, 12, 6823.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Adv. Neural Inf. Process. Syst. 2015, 28, 1–9.
- Zhu, Y.; Zhao, C.; Wang, J.; Zhao, X.; Wu, Y.; Lu, H. CoupleNet: Coupling Global Structure with Local Parts for Object Detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4126–4134.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving Into High Quality Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162.
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
- Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
- Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696.
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021, arXiv:2107.08430.
- Wang, C.-Y.; Yeh, I.-H.; Liao, H.-Y.M. You Only Learn One Representation: Unified Network for Multiple Tasks. arXiv 2021, arXiv:2105.04206.
- Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 764–773.
- Mehta, S.; Rastegari, M. MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer. arXiv 2021, arXiv:2110.02178.
- Zhang, X.; Zeng, H.; Guo, S.; Zhang, L. Efficient Long-Range Attention Network for Image Super-resolution. In Proceedings of the Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, 23–27 October 2022; pp. 649–667.
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
| Input | Network Unit | Channel Number | Stride |
|---|---|---|---|
| 1² × 3 | CBL | 32 | 2 |
| 1/2² × 32 | CBL | 64 | 2 |
| 1/4² × 64 | E-ELAN | 64 | 1 |
| 1/4² × 64 | Maxpool | – | 2 |
| 1/8² × 64 | E-ELAN | 64 | 1 |
| 1/8² × 64 | MVBlock | 64 | 1 |
| 1/8² × 64 | Maxpool | – | 2 |
| 1/16² × 64 | E-ELAN | 128 | 1 |
| 1/16² × 128 | MVBlock | 128 | 1 |
| 1/32² × 128 | Maxpool | – | 2 |
| 1/32² × 128 | E-ELAN | 128 | 1 |
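As a reading aid, the layer ordering in the table above can be expressed as a PyTorch sketch. The `cbl`, `e_elan`, and `mv_block` constructors are hypothetical placeholders for the CBL, E-ELAN, and MobileViT (MVBlock) modules, which are not defined here; only the channel widths and strides from the table are encoded.

```python
import torch.nn as nn


def build_backbone(cbl, e_elan, mv_block):
    """Assemble the improved backbone in the order listed in the table.
    cbl / e_elan / mv_block are hypothetical constructors taking
    (in_channels, out_channels, ...) and are assumed to be defined elsewhere."""
    return nn.Sequential(
        cbl(3, 32, stride=2),       # 1^2 x 3       -> 1/2^2 x 32
        cbl(32, 64, stride=2),      # 1/2^2 x 32    -> 1/4^2 x 64
        e_elan(64, 64),             # 1/4^2 x 64
        nn.MaxPool2d(2, 2),         # 1/4^2         -> 1/8^2
        e_elan(64, 64),             # 1/8^2 x 64
        mv_block(64, 64),           # 1/8^2 x 64
        nn.MaxPool2d(2, 2),         # 1/8^2         -> 1/16^2
        e_elan(64, 128),            # 1/16^2 x 64   -> 128 channels
        mv_block(128, 128),         # 1/16^2 x 128
        nn.MaxPool2d(2, 2),         #               -> 1/32^2
        e_elan(128, 128),           # 1/32^2 x 128
    )
```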
| Models | Backbone Network | Neck | Layers | Parameters | Model Size (MB) |
|---|---|---|---|---|---|
| YOLOv7-tiny | E-ELAN | FPN+PAN | 200 | 6,025,525 | 12.3 |
| FF-YOLOv7 | E-ELAN | FF-FPN | 184 | 4,166,174 | 8.5 |
| GAE-YOLOv7 | GAE-ELAN | FPN+PAN | 338 | 9,296,693 | 18.9 |
| GF-YOLOv7 | GAE-ELAN | FF-FPN | 322 | 4,929,915 | 9.1 |
| Models | Precision | Recall | mAP@0.5 | mAP@[0.5:0.95] |
|---|---|---|---|---|
| YOLOv7-tiny | 0.9916 | 0.9948 | 0.9931 | 0.7099 |
| FF-YOLOv7 | 0.9917 | 0.9961 | 0.9939 | 0.7081 |
| GAE-YOLOv7 | 0.8318 | 0.9817 | 0.8741 | 0.6119 |
| GF-YOLOv7 | 0.9914 | 0.9960 | 0.9947 | 0.7191 |
| Models | Misidentifications | Accuracy (%) | FPS | Inference Time (ms) |
|---|---|---|---|---|
| YOLOv7-tiny | 63 | 93.0 | 33.9 | 22.9 |
| FF-YOLOv7 | 32 | 97.6 | 32.4 | 23.9 |
| GAE-YOLOv7 | 143 | 81.0 | 33.8 | 22.9 |
| GF-YOLOv7 | 19 | 97.8 | 32.1 | 24.1 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).