DS-SIAUG: A Self-Training Approach Using a Disrupted Student Model for Enhanced Side-Scan Sonar Image Augmentation
Figure 1. The overall training process of DS-SIAUG.
Figure 2. Schematic diagram of the Disrupted Student model structure.
Figure 3. Forward and reverse diffusion processes of DDPM.
Figure 4. Dataset example.
Figure 5. Generated image results. (a) A generated 128 × 128 image of a shipwreck. (b) A generated 128 × 128 image of an aircraft wreckage. (c) A generated 128 × 128 image of an underwater reef. Due to the limited data on aircraft wreckages and underwater reefs, higher-resolution images were not generated. (d) A generated 256 × 256 image of a shipwreck. (e) A generated 512 × 512 image of a shipwreck.
Figure 6. On the left is the generated image at 256 × 256 resolution, and on the right is the generated image at 512 × 512 resolution.
Abstract
1. Introduction
2. Materials and Methods
2.1. Overall Structure of DS-SIAUG
- (1) Data preparation. Taking the side-scan sonar shipwreck target as an example, a dataset of real side-scan sonar shipwreck images is selected (Real SSS Image, 1000 images), of which 200 form the test dataset (Test Dataset) and 800 the training dataset (Training Dataset). To expand the training dataset, conventional image augmentation methods, such as rotation and mirroring, are first applied, increasing the training set to about 7000 images, called "The First Augmented Dataset".
- (2) Initial training phase. The First Augmented Dataset is used to train the DDPM (Denoising Diffusion Probabilistic Model) and YOLOv5 network models, yielding the initial diffusion model (DDPM 1) and detection model (YOLO Detection Model 1).
- (3) Image generation, filtering, and augmentation. The trained DDPM 1 generates a dataset of side-scan sonar images (Augmented Images, about 10,000 images), which is passed to YOLO Detection Model 1 for detection. A confidence threshold is set (different thresholds filter out different numbers of side-scan sonar images with shipwreck features); with the threshold of 0.5 used in this paper, about 2000 images are retained (YOLO filter images). These filtered images are then expanded with conventional augmentation methods, such as rotation and mirroring, to about 7000 images, called "Diffusion Generated Dataset 1".
- (4) Iterative training. Iterative training includes Disrupted Student training and diffusion model training. Disrupted Student training adds student disruptions to "Diffusion Generated Dataset 1" and merges it with "The First Augmented Dataset" to train the detection model (YOLO Detection Model 2); diffusion model training merges "Diffusion Generated Dataset 1" with "The First Augmented Dataset" to form "The Second Augmented Dataset", which is then used to train the diffusion model (DDPM 2).
- (5) Regeneration and filtering. DDPM 2 regenerates the side-scan sonar image dataset (Augmented Images, about 10,000 images), which is filtered by YOLO Detection Model 2; conventional augmentation of the filtered dataset yields "Diffusion Generated Dataset 2".
- (6) Repeat steps (4) and (5), iteratively generating new datasets to continuously enhance the model's detection capability.
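The generate, filter, and retrain loop above can be sketched as follows. All function bodies here are hypothetical stand-ins (the real pipeline uses DDPM sampling, YOLO inference, and model retraining), chosen only to make the control flow and the 0.5 confidence threshold concrete.

```python
import random

random.seed(0)

CONF_THRESHOLD = 0.5  # the filtering threshold used in this paper

def generate_images(n):
    """Stand-in for DDPM sampling: each 'image' is just an identifier."""
    return [f"img_{i}" for i in range(n)]

def detector_confidence(img):
    """Stand-in for YOLO inference returning a shipwreck confidence score."""
    return random.random()

def filter_images(images, threshold=CONF_THRESHOLD):
    """Steps (3)/(5): keep only images the current detector trusts."""
    return [im for im in images if detector_confidence(im) >= threshold]

def conventional_augment(images, factor=3):
    """Stand-in for rotation/mirroring augmentation."""
    return [f"{im}_aug{k}" for im in images for k in range(factor)]

def iterate(rounds=2, n_generated=100):
    # "The First Augmented Dataset" stand-in
    dataset = conventional_augment(generate_images(10))
    for r in range(rounds):
        candidates = generate_images(n_generated)      # sample from DDPM r
        kept = filter_images(candidates)               # YOLO Detection Model r
        dataset = dataset + conventional_augment(kept) # "Diffusion Generated Dataset r"
        # (retrain the diffusion model and the Disrupted Student on `dataset` here)
    return dataset
```

The key design point is that the detector and the generator bootstrap each other: each round's filtered generations enlarge the training pool for both models in the next round.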
2.2. Disrupted Student Training Model
2.3. DDPM Model Structure
Algorithm 1 Training
1: repeat
2: $\mathbf{x}_0 \sim q(\mathbf{x}_0)$
3: $t \sim \mathrm{Uniform}(\{1, \dots, T\})$
4: $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$
5: Take gradient descent step on $\nabla_\theta \left\| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta\!\left(\sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon},\, t\right) \right\|^2$
6: until converged
Algorithm 2 Sampling
1: $\mathbf{x}_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$
2: for $t = T, \dots, 1$ do
3: $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ if $t > 1$, else $\mathbf{z} = \mathbf{0}$
4: $\mathbf{x}_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( \mathbf{x}_t - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\, \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t) \right) + \sigma_t \mathbf{z}$
5: end for
6: return $\mathbf{x}_0$
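The forward process used in Algorithm 1 has a closed form, $\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}$, which training samples directly instead of iterating the chain. A minimal numpy sketch; the linear $\beta$ schedule is an illustrative assumption, not necessarily the schedule used in this paper:

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)      # \bar{\alpha}_t = prod_{s<=t} alpha_s

def q_sample(x0, t, eps):
    """Sample x_t ~ q(x_t | x_0) in closed form (used in Algorithm 1, line 5)."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

x0 = rng.standard_normal((128, 128))  # a toy 128 x 128 "image"
eps = rng.standard_normal(x0.shape)
x_late = q_sample(x0, T - 1, eps)     # nearly pure noise by t = T
```

Because $\bar{\alpha}_T$ is tiny, the sample at the final step is almost entirely the injected Gaussian noise, which is what lets Algorithm 2 start from $\mathbf{x}_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$.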
2.4. YOLO Detection Model Structure
- (1) Input and Feature Extraction: YOLOv5 uses a slicing operation (the Focus module) to divide the input image into four feature maps with the same number of channels, which are concatenated along the channel dimension to reduce the number of parameters. Subsequent processing along the scale and channel pathways further reduces the computational burden. A Spatial Pyramid Pooling (SPP) module at the end of this stage generates multi-scale feature maps in preparation for the subsequent "neck" section. Together, these input and feature extraction components improve computational efficiency and reduce model parameters, yielding better detection speed and accuracy.
- (2) Feature Fusion Layer: In the feature fusion stage, YOLOv5 uses the Path Aggregation Network (PANet) to merge feature maps of different scales efficiently. This strengthens detection of targets of various sizes and integrates shallow image features with deep semantic features, significantly improving detection performance.
- (3) Detection Head: Composed of three convolutional modules, each processing output feature layers at one of three different scales. Each scale outputs predictions for five different categories, their location coordinates, and a channel representing confidence. This reflects YOLOv5's flexibility and accuracy in object detection.
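The pyramid-pooling idea mentioned in (1) can be illustrated with a small numpy sketch that max-pools one feature map over several grid sizes and concatenates the results into a fixed-length vector. Note this is the generic SPP concept; YOLOv5's actual SPP/SPPF layers pool with fixed kernel sizes and preserve spatial resolution, so treat this as a conceptual sketch rather than the repository code:

```python
import numpy as np

def spatial_pyramid_pool(feature, levels=(1, 2, 4)):
    """Max-pool `feature` (H, W) over a g x g grid for each g in `levels`
    and concatenate, yielding a fixed-length descriptor regardless of H, W."""
    h, w = feature.shape
    pooled = []
    for g in levels:
        # integer bin edges that cover the whole map
        hs = np.linspace(0, h, g + 1).astype(int)
        ws = np.linspace(0, w, g + 1).astype(int)
        for i in range(g):
            for j in range(g):
                pooled.append(feature[hs[i]:hs[i + 1], ws[j]:ws[j + 1]].max())
    return np.array(pooled)

# a toy 8 x 8 feature map with values 0..63 laid out row-major
vec = spatial_pyramid_pool(np.arange(64.0).reshape(8, 8))
```

With levels (1, 2, 4) the output always has 1 + 4 + 16 = 21 entries, regardless of input size, which is what makes pyramid pooling useful ahead of fixed-size downstream layers.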
3. Experimental Validation
3.1. Adding Disruptions to the Student Model
3.1.1. Physical Background Noise
3.1.2. Geometric Deformation Disruptions
3.2. Evaluation Metrics
3.3. DDPM Augmentation Effect Analysis
3.4. Interference Experiments and Iterative Training Effects
3.4.1. The Impact of Different Disruptions on the Student Model
3.4.2. Iterative Training Effects (Compared with No Noise Filtering)
4. Discussion
4.1. Introduction of the Noise Student in a Cyclic Adversarial Structure
4.2. Manual Selection Comparison and Similar Methods Comparison
4.3. Reasons for Improved Model Performance with the Disrupted Student Model
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Buscombe, D. Shallow water benthic imaging and substrate characterization using recreational-grade side scan-sonar. Environ. Model. Softw. 2017, 89, 1–18. [Google Scholar] [CrossRef]
- Flowers, H.J.; Hightower, J.E. A novel approach to surveying sturgeon using side-scan sonar and occupancy modeling. Mar. Coast. Fish. 2013, 5, 211–223. [Google Scholar] [CrossRef]
- Johnson, S.G.; Deaett, M.A. The application of automated recognition techniques to side-scan sonar imagery. IEEE J. Ocean. Eng. J. Devoted Appl. Electr. Electron. Eng. Ocean. Environ. 1994, 19, 138–144. [Google Scholar] [CrossRef]
- Burguera, A.; Bonin-Font, F. On-line multi-class segmentation of side-scan sonar imagery using an autonomous underwater vehicle. J. Mar. Sci. Eng. 2020, 8, 557. [Google Scholar] [CrossRef]
- Chen, E.; Guo, J. Real time map generation using Side-scan sonar scanlines for unmanned underwater vehicles. Ocean Eng. 2014, 91, 252–262. [Google Scholar] [CrossRef]
- Langner, F.; Knauer, C.; Jans, W.; Ebert, A. Side scan sonar image resolution and automatic object detection, classification and identification. In Proceedings of the OCEANS 2009-EUROPE, Bremen, Germany, 11–14 May 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 1–8. [Google Scholar]
- Huang, C.; Zhao, J.; Yu, Y.; Zhang, H. Comprehensive sample augmentation by fully considering SSS imaging mechanism and environment for shipwreck detection under zero real samples. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5906814. [Google Scholar] [CrossRef]
- Zhu, P.; Isaacs, J.; Fu, B.; Ferrari, S. Deep learning feature extraction for target recognition and classification in underwater sonar images. In Proceedings of the 2017 IEEE 56th Annual Conference on Decision and Control (CDC), Melbourne, VIC, Australia, 12–15 December 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 2724–2731. [Google Scholar]
- Neupane, D.; Seok, J. A review on deep learning-based approaches for automatic sonar target recognition. Electronics 2020, 9, 1972. [Google Scholar] [CrossRef]
- Topple, J.M.; Fawcett, J.A. MiNet: Efficient deep learning automatic target recognition for small autonomous vehicles. IEEE Geosci. Remote Sens. Lett. 2020, 18, 1014–1018. [Google Scholar] [CrossRef]
- Huo, G.; Yang, S.X.; Li, Q.; Zhou, Y. A robust and fast method for sidescan sonar image segmentation using nonlocal despeckling and active contour model. IEEE Trans. Cybern. 2016, 47, 855–872. [Google Scholar] [CrossRef]
- Feldens, P.; Darr, A.; Feldens, A.; Tauber, F. Detection of boulders in side scan sonar mosaics by a neural network. Geosciences 2019, 9, 159. [Google Scholar] [CrossRef]
- Tang, Y.; Jin, S.; Bian, G.; Zhang, Y. Shipwreck target recognition in side-scan sonar images by improved YOLOv3 model based on transfer learning. IEEE Access 2020, 8, 173450–173460. [Google Scholar]
- Tang, Y.; Li, H.; Zhang, W.; Bian, S.; Zhai, G.; Liu, M.; Zhang, X. Lightweight DETR-YOLO method for detecting shipwreck target in side-scan sonar. Syst. Eng. Electron. 2022, 44, 2427. [Google Scholar]
- Nguyen, H.T.; Lee, E.H.; Lee, S. Study on the classification performance of underwater sonar image classification based on convolutional neural networks for detecting a submerged human body. Sensors 2019, 20, 94. [Google Scholar] [CrossRef]
- Li, C.; Ye, X.; Cao, D.; Hou, J.; Yang, H. Zero shot objects classification method of side scan sonar image based on synthesis of pseudo samples. Appl. Acoust. 2021, 173, 107691. [Google Scholar] [CrossRef]
- Nayak, N.; Nara, M.; Gambin, T.; Wood, Z.; Clark, C.M. Machine learning techniques for AUV side-scan sonar data feature extraction as applied to intelligent search for underwater archaeological sites. In Proceedings of the Field and Service Robotics: Results of the 12th International Conference, Tokyo, Japan, 29–31 August 2019; Springer: Singapore, 2021; pp. 219–233. [Google Scholar]
- Lee, S.; Park, B.; Kim, A. Deep learning from shallow dives: Sonar image generation and training for underwater object detection. arXiv 2018, arXiv:1810.07990. [Google Scholar]
- Huo, G.; Wu, Z.; Li, J. Underwater object classification in sidescan sonar images using deep transfer learning and semisynthetic training data. IEEE Access 2020, 8, 47407–47418. [Google Scholar] [CrossRef]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27. [Google Scholar]
- Van den Oord, A.; Kalchbrenner, N.; Espeholt, L.; Graves, A.; Kavukcuoglu, K. Conditional image generation with pixelcnn decoders. Adv. Neural Inf. Process. Syst. 2016, 29. [Google Scholar]
- Rezende, D.; Mohamed, S. Variational inference with normalizing flows. In Proceedings of the International Conference on Machine Learning, Lille, France, 7–9 July 2015; PMLR: London, UK, 2015; pp. 1530–1538. [Google Scholar]
- Kingma, D.P.; Welling, M. Auto-encoding variational bayes. arXiv 2013, arXiv:1312.6114. [Google Scholar]
- Bore, N.; Folkesson, J. Modeling and simulation of sidescan using conditional generative adversarial network. IEEE J. Ocean. Eng. 2020, 46, 195–205. [Google Scholar] [CrossRef]
- Jiang, Y.; Ku, B.; Kim, W.; Ko, H. Side-scan sonar image synthesis based on generative adversarial network for images in multiple frequencies. IEEE Geosci. Remote Sens. Lett. 2020, 18, 1505–1509. [Google Scholar] [CrossRef]
- Brock, A.; Donahue, J.; Simonyan, K. Large scale GAN training for high fidelity natural image synthesis. arXiv 2018, arXiv:1809.11096. [Google Scholar]
- Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2019; pp. 4401–4410. [Google Scholar]
- Sohl-Dickstein, J.; Weiss, E.; Maheswaranathan, N.; Ganguli, S. Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the International Conference on Machine Learning, Lille, France, 7–9 July 2015; PMLR: London, UK, 2015; pp. 2256–2265. [Google Scholar]
- Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851. [Google Scholar]
- Yang, Z.; Zhao, J.; Zhang, H.; Yu, Y.; Huang, C. A Side-Scan Sonar Image Synthesis Method Based on a Diffusion Model. J. Mar. Sci. Eng. 2023, 11, 1103. [Google Scholar] [CrossRef]
- Song, J.; Meng, C.; Ermon, S. Denoising diffusion implicit models. arXiv 2020, arXiv:2010.02502. [Google Scholar]
- Li, J.; Li, D.; Xiong, C.; Hoi, S. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In Proceedings of the International Conference on Machine Learning, Baltimore, MY, USA, 17–23 July 2022; PMLR: London, UK, 2022; pp. 12888–12900. [Google Scholar]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–788. [Google Scholar]
- Scudder, H. Probability of error of some adaptive pattern-recognition machines. IEEE Trans. Inf. Theory 1965, 11, 363–371. [Google Scholar] [CrossRef]
- Yarowsky, D. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, Cambridge, MA, USA, 26–30 June 1995; pp. 189–196. [Google Scholar]
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531. [Google Scholar]
Noise Type | Noise Characteristics | Noise Source | Addition Intensity | Proportion of Total Images |
---|---|---|---|---|
Gaussian Noise | Probability density function follows Gaussian distribution | Thermal noise from electronic devices, sensor reading errors | Random variation between 0 and 0.5 | Randomly selected for 25% of images |
Stripe Noise | Appears as horizontal or vertical stripes in the image | Influenced by sensor mechanical movement and temperature changes | Vertical gradient ratio randomly varies between 0 and 0.1 | Randomly selected for 25% of images |
Speckle Noise | Multiplicative noise, formed by the speckle pattern from multiple small beams converging | Caused by wave coherence leading to variations in wave strength, resulting in granular noise in the image | Noise size randomly chosen between 0 and 0.1 | Randomly selected for 25% of images |
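The three disruptions in the table can be sketched with numpy on an image normalized to [0, 1]. The intensity ranges follow the table; the exact stripe pattern and clipping behaviour are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(img, sigma):
    """Additive Gaussian noise; sigma is drawn from [0, 0.5] per the table."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def add_stripe_noise(img, ratio):
    """Vertical-gradient stripe noise; ratio is drawn from [0, 0.1]."""
    h, w = img.shape
    gradient = np.linspace(0.0, ratio, h)[:, None]  # grows down the image
    stripes = gradient * (np.arange(w) % 2)         # every other column
    return np.clip(img + stripes, 0.0, 1.0)

def add_speckle_noise(img, scale):
    """Multiplicative speckle noise; scale is drawn from [0, 0.1]."""
    return np.clip(img * (1.0 + rng.normal(0.0, scale, img.shape)), 0.0, 1.0)

img = rng.random((64, 64))
noisy = add_speckle_noise(add_stripe_noise(add_gaussian_noise(img, 0.3), 0.05), 0.05)
```

In the pipeline each noise type is applied independently to a randomly chosen 25% of images rather than stacked as in this composed example.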
Disruption Type | Method of Addition | Proportion of Total Images |
---|---|---|
Rotation | Random rotation at 0, 90, 180, 270 degrees | 100% |
Mirroring | Randomly selected | 50% |
Deformation | Random selection of scaling ratios at 0.5, 0.75, 1.25, 1.5 | 66.6% |
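The geometric disruptions can likewise be sketched with numpy: rotation by multiples of 90 degrees, horizontal mirroring, and rescaling by the table's factors. The nearest-neighbour index-repetition rescaling here is an illustrative assumption; any standard interpolation would serve:

```python
import numpy as np

rng = np.random.default_rng(0)

def rotate(img):
    """Rotate by a random multiple of 90 degrees (0/90/180/270)."""
    return np.rot90(img, k=rng.integers(0, 4))

def mirror(img):
    """Horizontal mirroring, applied to a random 50% of images."""
    return np.fliplr(img) if rng.random() < 0.5 else img

def rescale(img, factor):
    """Nearest-neighbour rescale by one of the table's factors (0.5 .. 1.5)."""
    h, w = img.shape
    rows = (np.arange(int(h * factor)) / factor).astype(int)
    cols = (np.arange(int(w * factor)) / factor).astype(int)
    return img[np.ix_(rows, cols)]

img = rng.random((64, 64))
out = rescale(mirror(rotate(img)), 1.5)
```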
Group | Training Set Category | Batch Size | Image Size
---|---|---|---
T1 | Shipwreck, plane, stone | 512 | 128 × 128
T2 | Shipwreck | 80 | 256 × 256
T3 | Shipwreck | 16 | 512 × 512
Group | Training Set Category | FID | LPIPS | PSNR | MMD | SSIM
---|---|---|---|---|---|---
T1 | Shipwreck (128 × 128) | 123.7 | 0.6860 | 4.7845 | 0.00204 | 0.0872
T1 | Plane (128 × 128) | 112.3 | 0.8129 | 3.8046 | 0.0049 | 0.08077
T1 | Stone (128 × 128) | 102.9 | 0.7789 | 3.5698 | 0.0156 | 0.2114
T2 | Shipwreck (256 × 256) | 150.6 | 0.6631 | 4.7634 | 0.0045 | 0.0727
T3 | Shipwreck (512 × 512) | 148.2 | 0.6622 | 5.0441 | 0.0039 | 0.05479
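Of the metrics in the table, PSNR is the simplest to reproduce; a minimal numpy sketch for images scaled to [0, 1] (FID, LPIPS, and MMD require learned feature extractors or kernel choices and are omitted here):

```python
import numpy as np

def psnr(a, b, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images in [0, max_val]."""
    mse = np.mean((a - b) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

rng = np.random.default_rng(0)
a = rng.random((64, 64))
b = np.clip(a + rng.normal(0.0, 0.1, a.shape), 0.0, 1.0)
```

Low PSNR between generated and real sets, as in the table, simply reflects that generated samples are not pixel-wise copies of any real image; it is used alongside distributional metrics such as FID rather than on its own.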
Group | Noise Disruption | Geometric Deformation Disruption |
---|---|---|
C1 | - | - |
C2 | √ | - |
C3 | - | √ |
C4 | √ | √ |
Group | Precision | Recall | mAP
---|---|---|---
C1 | 0.854 | 0.807 | 0.869
C2 | 0.859 | 0.832 | 0.892
C3 | 0.878 | 0.866 | 0.904
C4 | 0.883 | 0.863 | 0.913
Group | Precision | Recall | mAP | Group | Precision | Recall | mAP
---|---|---|---|---|---|---|---
T1 | 0.854 | 0.807 | 0.869 | G1 | 0.883 | 0.863 | 0.913
T2 | 0.873 | 0.862 | 0.901 | G2 | 0.891 | 0.906 | 0.931
T3 | 0.892 | 0.856 | 0.885 | G3 | 0.904 | 0.906 | 0.938
T4 | 0.889 | 0.827 | 0.885 | G4 | 0.896 | 0.900 | 0.939
Group | Precision | Recall | mAP | Group | Precision | Recall | mAP
---|---|---|---|---|---|---|---
B1 | 0.878 | 0.865 | 0.904 | G1 | 0.883 | 0.863 | 0.913
B2 | 0.892 | 0.916 | 0.931 | G2 | 0.891 | 0.906 | 0.931
B3 | 0.906 | 0.911 | 0.941 | G3 | 0.904 | 0.906 | 0.938
B4 | 0.900 | 0.876 | 0.939 | G4 | 0.896 | 0.900 | 0.939
Network | Precision | Recall | mAP
---|---|---|---
DDPM | 0.851 | 0.847 | 0.899
DDIM + DPM-Solver | 0.862 | 0.853 | 0.913
DS-SIAUG | 0.904 | 0.906 | 0.938
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Peng, C.; Jin, S.; Bian, G.; Cui, Y. DS-SIAUG: A Self-Training Approach Using a Disrupted Student Model for Enhanced Side-Scan Sonar Image Augmentation. Sensors 2024, 24, 5060. https://doi.org/10.3390/s24155060