Article

A Lightweight TA-YOLOv8 Method for the Spot Weld Surface Anomaly Detection of Body in White

School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 201100, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2025, 15(6), 2931; https://doi.org/10.3390/app15062931
Submission received: 13 January 2025 / Revised: 5 March 2025 / Accepted: 6 March 2025 / Published: 8 March 2025
(This article belongs to the Special Issue Motion Control for Robots and Automation)

Abstract

The deep learning architecture YOLO (You Only Look Once) has demonstrated superior visual detection performance in various computer vision tasks and has been widely applied in the field of automatic surface defect detection. In this paper, we propose a lightweight YOLOv8-based method for the quality inspection of car body welding spots. We developed a TA-YOLOv8 network structure with an improved Task-Aligned (TA) detection head, designed to handle the small sample size, imbalanced positive and negative samples, and high noise that characterize Body-in-White welding spot data. By learning with fewer parameters, the model achieves more efficient and accurate classification. Additionally, our algorithm framework can perform anomaly segmentation and classification on our open-world raw datasets obtained from actual production environments. The experimental results show that the lightweight module improves the processing speed by an average of 2.8%, with increases of 1.35% in mAP@50-95 and 0.1226 in the recall rate.

1. Introduction

In automotive body welding production lines, various visual defects can occur at spot welds, such as edge welds, overlap, cold solder joints and burrs. The formation of welding defects is closely related to sheet surface conditions, assembly precision, and welding parameter settings. For example, contaminants or coatings on the sheet surface can alter electrical conductivity and melting behavior, leading to burn-through or insufficient fusion. Poor assembly precision (e.g., excessive gaps or misalignment) affects nugget formation and joint quality. Improper welding parameters (e.g., an excessive current or insufficient welding time) may result in defects such as over-melting or incomplete fusion.
The images of spot welds are visually rich and can partially reflect the internal quality of the molten core. These defects are critical because they directly impact the mechanical strength and structural integrity of the welding spots. For instance, incomplete fusion and overlap can lead to weak weld nuggets, reducing load-bearing capacity and fatigue life under cyclic stresses. Burrs and edge welding may introduce stress concentrations, increasing the likelihood of crack initiation. Studies [1,2,3] have shown that defective welds often fail mechanical strength tests such as tensile shear and peel tests. Moreover, industry standards such as NQST and AHSS/UHSS welding guidelines set strict criteria to ensure that welds meet the required strength and durability for automotive safety.
Welding spot images provide rich and intuitive information, allowing for a quality assessment by analyzing the weld spot’s shape, brightness distribution, and surface texture. To ensure a comprehensive evaluation, we referred to the automotive industry standard NQST (Nugget Quality Strength Level) and the welding guidelines for AHSS/UHSS steels established by automobile manufacturers.
Currently, the detection of defects on small samples in production lines still relies on manual visual inspection, resulting in low efficiency and delayed feedback. Machine vision-based methods for spot weld quality detection simulate manual visual inspection and offer advantages such as high efficiency, low cost and a high degree of automation. However, due to the complex working conditions in production lines and strong lighting disturbances, the requirements for the accuracy and robustness of visual detection algorithms are quite high. Visual recognition methods can be divided into traditional image processing methods and deep learning methods.
Conventional image processing methods often rely on manually created features to describe spot weld quality, including shape, color and texture. Specialized image processing techniques are used to extract these features for detection. Since spot welds are circular, the Hough circle transform [4] is commonly used to extract the spot weld contours. For example, an incremental Hough circle transform combined with multi-contour clustering analysis has been used to locate spot welds with incomplete circular contours [5]. To quickly locate spot weld regions, Liang et al. [6] proposed an angle-assisted circle detection algorithm based on a randomized Hough transform to improve computational efficiency and robustness. Considering the elliptical geometry of spot welds, some researchers [7,8,9] calculated the area of the molten core by adjusting the major and minor axes of the spot weld footprints to estimate the spot weld quality. There are some alternative non-destructive testing methods for welding spot detection, such as ultrasonic testing, X-ray testing and thermography. Active thermography is a classical non-destructive testing (NDT) technique widely applied in spot weld inspection due to its ability to detect both surface and subsurface defects. These methods [10,11] involve applying an external heat source, such as optical, inductive, or ultrasonic excitation, to the weld region, followed by capturing the resulting thermal response using an infrared camera. Defects such as cracks, pores, and incomplete fusion cause localized temperature variations, which can be analyzed to infer defect location, size, and depth. However, its accuracy can be influenced by material thermal conductivity, surface emissivity, and excitation parameters.
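As a concrete illustration of the classical pipeline described above, the sketch below locates candidate spot welds with OpenCV's Hough circle transform. The file name and all parameter values are illustrative assumptions rather than settings taken from the cited works.

```python
import cv2
import numpy as np

# Minimal sketch of classical spot-weld localization via the Hough circle transform.
# "weld_sample.png" and every parameter value below are illustrative only.
img = cv2.imread("weld_sample.png", cv2.IMREAD_GRAYSCALE)
if img is None:
    raise FileNotFoundError("weld_sample.png not found")
img = cv2.medianBlur(img, 5)  # suppress surface texture noise before detection

circles = cv2.HoughCircles(
    img,
    cv2.HOUGH_GRADIENT,
    dp=1.2,        # inverse ratio of accumulator resolution
    minDist=40,    # minimum distance between detected spot centers (pixels)
    param1=120,    # upper Canny edge threshold
    param2=35,     # accumulator threshold: lower values yield more (possibly false) circles
    minRadius=8,
    maxRadius=60,
)

if circles is not None:
    for x, y, r in np.round(circles[0]).astype(int):
        print(f"candidate spot weld at ({x}, {y}), radius {r} px")
```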
To overcome these limitations, many researchers have introduced classifiers to recognize spot weld quality in addition to manually developed features. Valentin et al. [12] described spot weld quality based on surface texture features using a gray-level co-occurrence matrix and developed a support vector machine (SVM)-based classifier to achieve the intelligent detection of spot weld quality. Javaheri et al. [13] used unsupervised clustering to extract RGB color features from five positions of the spot weld surface representing the spot weld quality and built a 15-dimensional color feature and mechanical property model (Back Propagation Neural Network, BPNN) to predict spot weld quality. Alghannam et al. [14] developed an experimental platform to detect weld defects and extracted surface morphology features using image segmentation and mathematical morphology methods and then applied fuzzy logic rules based on a knowledge base to predict spot weld quality. Younes et al. [15] further improved the image processing algorithm to detect non-circular spot welds. They used least squares ellipse fitting to extract new geometric features of the spot welds and constructed a fuzzy SVM model for detecting weld defects and evaluating weld quality in automobile bodies.
Benefitting from the excellent capabilities of Convolutional Neural Networks (CNNs) for representation and learning, deep learning (DL)-based visual recognition has shown outstanding performance in terms of precision and speed and has gradually become a standard technique. Supervised methods are commonly used in the visual inspection of weld quality, primarily utilizing classical network structures such as AlexNet [16], ResNet [17], MobileNet [18], Faster R-CNN [19] and YOLO [20] series models. For example, Kim et al. [21] developed a model with two folding layers and one fully connected layer that is able to accurately predict the tensile shear strength and nugget diameter of cold-rolled and galvanized steel welds. Liu et al. [22] proposed an improved Faster RCNN model to detect welding spots online. Zhong Wang et al. [23] proposed a network model with a bottleneck structure of feature compression and expansion layers, which achieves high precision with fewer parameters.
Since the images captured on site often contain multiple welds, the models must support both weld quality detection and localization. Detection networks that locate and simultaneously categorize defective welds typically distinguish between one-stage and two-stage methods, with speed and accuracy being the primary considerations. Single-stage methods such as YOLO are favored in industrial applications because of their fast inference speed and low operational costs. Liu et al. [18] developed an efficient, lightweight weld detection network based on YOLO that is capable of localizing and evaluating welds under complex conditions. Similarly, Adarsh used Tiny-YOLOv3 [24] to identify regions of interest, refining the detection accuracy by retaining only the weld contours and using a modified Hough transform for circle detection.
In summary, CNN-based object recognition algorithms use annotated images to predict object positions and classes, mainly distinguishing between one-stage and two-stage methods. The two-stage approach, exemplified by Faster R-CNN, includes a region proposal network and a classifier, while one-stage methods such as YOLO omit the proposal network and provide faster, more compact models suitable for weld detection. The main contribution of this study lies in optimizing the structure of the conventional object detection algorithm and proposing a specific model for the detection of welds with small samples and objects. By incorporating Task-Aligned detection to improve the cross-class feature alignment, the effectiveness of the approach is verified using an on-site image dataset.

2. Architecture Basis

Over the past decade, the YOLO (You Only Look Once) family of algorithms has remained at the forefront of object detection research, with the release of YOLOv8 in 2023 representing a new milestone. Safaldin et al. [25] proposed an improved YOLOv8 to detect moving objects, and several other studies [26,27,28] have modified YOLOv8 to improve its performance for their specific tasks. Real-time object detection is crucial for various applications, including autonomous vehicles, robotics and video surveillance. Among the numerous object detection methods, the YOLO framework stands out for its impressive speed and accuracy, and since its introduction, the YOLO series has been updated several times to address limitations and improve overall performance.
As illustrated in Figure 1, YOLOv8 integrates the design concepts of the YOLO series into the backbone and neck area and converts the C3 structure of YOLOv5 [29] into the C2f structure. This change improves the gradient flow and adjusts the number of channels for different model scales. The classic backbone, Darknet53, uses residual modules with skip connections to mitigate the problem of gradient disappearance in deep networks. Each residual module consists of DeepConv2D, Batch Normalization, and Leaky ReLU (DBL), skip connections and element-wise additions. In addition, a key Res-n component in Darknet53 includes zero-padding to maintain consistent input–output sizes, followed by a DBL layer and multiple residual modules that generate three feature maps at different scales to support multi-scale detection.
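For readers unfamiliar with the DBL terminology, the following PyTorch sketch shows one possible reading of the basic Darknet53 building blocks described above (convolution, batch normalization, Leaky ReLU, plus a residual skip connection). The channel split and hyperparameters are our own assumptions for illustration.

```python
import torch
import torch.nn as nn

class DBL(nn.Module):
    """Conv2d + Batch Normalization + Leaky ReLU, the basic block described above.
    Layer hyperparameters here are illustrative assumptions, not the paper's values."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class ResidualUnit(nn.Module):
    """A residual module: 1x1 bottleneck followed by a 3x3 DBL, added back to the input."""
    def __init__(self, c):
        super().__init__()
        self.block = nn.Sequential(DBL(c, c // 2, k=1), DBL(c // 2, c, k=3))

    def forward(self, x):
        return x + self.block(x)  # the skip connection mitigates vanishing gradients

x = torch.randn(1, 64, 80, 80)
print(ResidualUnit(64)(x).shape)  # torch.Size([1, 64, 80, 80])
```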
The neck section of YOLOv8 incorporates concepts from the Feature Pyramid Network (FPN) and enriches them with Path Aggregation Network (PAN) extensions to improve multi-scale feature fusion and increase detection and localization accuracy. The head section adopts a decoupled head structure that splits the classification and detection tasks for optimized processing.

3. Proposed Model and Algorithms

In the backbone module, the original Darknet-53 model in YOLOv8 contains an intermediate convolutional layer in which the number of filters increases from 512 to 1024 before dropping back to 512 in the next layer. Removing this intermediate layer eliminates 1024 convolutional filters, reducing the model’s convolution operations by 1024 and the number of 3 × 3 convolutional parameters by 9 × 1024. This change significantly reduces the model’s parameters, resulting in a lighter model with improved performance. The backbone module is shown in Figure 2.
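As a rough back-of-the-envelope check of the reduction described above, the snippet below counts the 3 × 3 convolution weights of a 512 → 1024 → 512 stack against a single 512 → 512 layer. The exact layer layout is defined by Figure 2, so these numbers are only indicative.

```python
def conv_params(c_in, c_out, k=3):
    """Weight count of a k x k convolution layer without bias terms."""
    return c_in * c_out * k * k

# Original stack inferred from the text: 512 -> 1024 -> 512 with 3 x 3 kernels.
# Lightened stack: a single 512 -> 512 layer.
original = conv_params(512, 1024) + conv_params(1024, 512)
lightened = conv_params(512, 512)
print(f"approximate 3 x 3 weights removed: {original - lightened:,}")
```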
For the neck module, the structure of YOLOv8 adopts the PANet (Path Aggregation Network) by organizing the neck section with an FPN (Feature Pyramid Network) structure. Based on this concept, we propose a Multiple Cross-Layer FPN (MC-FPN) network that combines both down-sampling and up-sampling paths with multiple cross-layer connections, enabling efficient feature fusion, as shown in Figure 3.
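The exact connection pattern of MC-FPN is defined by Figure 3; purely as an illustration of the idea, the sketch below fuses one pyramid level with an upsampled higher level and a downsampled lower level. Channel widths and the fusion operator are assumptions, not the implemented design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLayerFuse(nn.Module):
    """One illustrative cross-layer fusion step in the spirit of MC-FPN: a higher-level
    map is upsampled, a lower-level map is downsampled, and both are concatenated with
    the current level before a 1x1 convolution."""
    def __init__(self, c_low, c_cur, c_high, c_out):
        super().__init__()
        self.down = nn.Conv2d(c_low, c_low, 3, stride=2, padding=1)  # match current resolution
        self.fuse = nn.Conv2d(c_low + c_cur + c_high, c_out, 1)

    def forward(self, p_low, p_cur, p_high):
        p_high_up = F.interpolate(p_high, size=p_cur.shape[-2:], mode="nearest")
        p_low_dn = self.down(p_low)
        return self.fuse(torch.cat([p_low_dn, p_cur, p_high_up], dim=1))

# Example with P3 (high-res), P4 (current) and P5 (low-res) feature maps.
p3 = torch.randn(1, 128, 80, 80)
p4 = torch.randn(1, 256, 40, 40)
p5 = torch.randn(1, 512, 20, 20)
print(CrossLayerFuse(128, 256, 512, 256)(p3, p4, p5).shape)  # torch.Size([1, 256, 40, 40])
```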
The head module in YOLOv8 first branches into two CBS convolution modules and then passes through a Conv2D layer. The classification loss and the bounding box loss are calculated separately. In this section, a decoupled head structure is used to separate classification and detection tasks. By applying Distributional Focal Loss (DFL), the number of channels of the regression head is set to 4 × reg_max, where reg_max is normally set to 16. The bounding box head uses a Bbox loss for object identification, while the classification head applies a Binary Cross-Entropy (BCE) loss function instead of VEL. This substitution is better suited for imbalanced data, where negative samples heavily outweigh positive samples.
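The decoupled layout can be summarized with the hedged PyTorch sketch below: one branch emits 4 × reg_max channels for the DFL box distribution and the other emits per-class logits consumed by a BCE loss. Intermediate channel widths are illustrative, not the paper's values.

```python
import torch
import torch.nn as nn

REG_MAX = 16  # as stated above, the DFL regression head uses 4 * reg_max output channels

class DecoupledHead(nn.Module):
    """Sketch of a decoupled head: one branch regresses the DFL box distribution,
    the other outputs per-class logits for a BCE loss."""
    def __init__(self, c_in, num_classes, c_mid=256):
        super().__init__()
        self.reg_branch = nn.Sequential(
            nn.Conv2d(c_in, c_mid, 3, padding=1), nn.SiLU(),
            nn.Conv2d(c_mid, 4 * REG_MAX, 1),     # distribution over box side offsets
        )
        self.cls_branch = nn.Sequential(
            nn.Conv2d(c_in, c_mid, 3, padding=1), nn.SiLU(),
            nn.Conv2d(c_mid, num_classes, 1),     # logits for nn.BCEWithLogitsLoss
        )

    def forward(self, x):
        return self.reg_branch(x), self.cls_branch(x)

box_out, cls_out = DecoupledHead(c_in=256, num_classes=6)(torch.randn(1, 256, 40, 40))
print(box_out.shape, cls_out.shape)  # (1, 64, 40, 40) and (1, 6, 40, 40)
```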
In this loss function, the indicator function $A_{ij}^{t}$ represents whether the j-th anchor of the i-th grid is responsible for predicting a specific target; if it is, $A_{ij}^{t} = 1$; otherwise, $A_{ij}^{t} = 0$, and $A_{ij}^{f}$ marks the complementary, non-responsible anchors. The confidence score reflects the likelihood of an object being present within the network unit. Specifically, $c_i$ represents the confidence of the ground truth box, and $\hat{c}_i$ is the confidence of the predicted box. $p_i(c)$ and $\hat{p}_i(c)$ denote the class probabilities of the ground truth and predicted boxes, respectively.
$$\mathrm{BCELoss}(c_i, p_i) = -\sum_{i=0}^{S \times S} \sum_{j=0}^{B} A_{ij}^{t} \left[ c_i \log \hat{c}_i + (1 - c_i) \log (1 - \hat{c}_i) \right] - \sum_{i=0}^{S \times S} \sum_{j=0}^{B} A_{ij}^{f} \left[ c_i \log \hat{c}_i + (1 - c_i) \log (1 - \hat{c}_i) \right] - \sum_{i=0}^{S \times S} \sum_{j=0}^{B} A_{ij}^{t} \sum_{c \in \mathrm{classes}} \left[ p_i(c) \log \hat{p}_i(c) + (1 - p_i(c)) \log (1 - \hat{p}_i(c)) \right] \tag{1}$$
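A literal, unoptimized implementation of the confidence and classification terms of Equation (1) might look like the following sketch; the tensor shapes and indicator masks are assumed to be prepared by the anchor-assignment step, which is not shown here.

```python
import torch

def bce_confidence_class_loss(a_obj, a_noobj, c_gt, c_pred, p_gt, p_pred, eps=1e-7):
    """Sketch of the confidence and classification terms in Equation (1).
    All tensors are assumed to be flattened over the S*S grid cells and B anchors:
    a_obj/a_noobj are 0/1 indicator masks of shape [N], c_* are confidences of shape
    [N], and p_* are per-class probabilities of shape [N, C]."""
    def bce(target, pred):
        pred = pred.clamp(eps, 1 - eps)
        return -(target * torch.log(pred) + (1 - target) * torch.log(1 - pred))

    conf_obj = (a_obj * bce(c_gt, c_pred)).sum()            # anchors responsible for a target
    conf_noobj = (a_noobj * bce(c_gt, c_pred)).sum()        # remaining (background) anchors
    cls = (a_obj.unsqueeze(-1) * bce(p_gt, p_pred)).sum()   # class term, responsible anchors only
    return conf_obj + conf_noobj + cls
```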
In contrast to the conventional single-level recognition head, which uses two parallel branches for classification and localization, a Task-Aligned head (T-head) is used in this study to improve the interaction between the two tasks. This collaborative structure allows for more accurate matching predictions. The T-head computes the features of the task interaction and performs the predictions via the Task-Aligned Predictor [30] (TAP). The efficient detection TA-head structure is shown in Figure 4.
To enhance the interaction between classification and localization, a feature extractor is proposed to learn extensive task-interactive features from multiple convolutional layers. This design facilitates task interaction and provides multi-level features with an effective receptive field across multiple scales for both tasks. Here, X_FPN represents FPN features, and H, W, and c denote the height, width, and channel count, respectively, while i and j indicate the channel numbers for localization and classification tasks. The feature extractor computes task-interactive features using N consecutive transformation layers with activation functions:
$$X_{k}^{\mathrm{inter}} = \begin{cases} \delta\left(\mathrm{conv}_{k}\left(X^{\mathrm{FPN}}\right)\right), & k = 1 \\ \delta\left(\mathrm{conv}_{k}\left(X_{k-1}^{\mathrm{inter}}\right)\right), & k > 1 \end{cases}, \quad k \in \{1, 2, \ldots, N\} \tag{2}$$
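Equation (2) corresponds to a simple stack of shared convolutional layers whose intermediate outputs are all retained; a minimal PyTorch sketch, with ReLU standing in for the activation δ and an assumed layer count, is given below.

```python
import torch
import torch.nn as nn

class TaskInteractiveExtractor(nn.Module):
    """N stacked conv + activation layers realizing Equation (2); ReLU stands in for
    the activation delta, and the layer count is an assumed value."""
    def __init__(self, channels, n_layers=6):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
            for _ in range(n_layers)
        ])

    def forward(self, x_fpn):
        feats, x = [], x_fpn
        for layer in self.layers:
            x = layer(x)      # X_k^inter = delta(conv_k(X_{k-1}^inter)), with X_0 = X_FPN
            feats.append(x)   # every intermediate level is kept for both sub-tasks
        return feats

levels = TaskInteractiveExtractor(channels=256)(torch.randn(1, 256, 40, 40))
print(len(levels), levels[0].shape)  # 6 levels of task-interactive features
```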
We conduct object classification and localization on the computed task-interactive features, allowing both tasks to effectively perceive each other's state. However, due to the single-branch design, task-interactive features inevitably introduce a certain degree of feature conflict between the two distinct tasks. Since object classification and localization have different objectives, they focus on different types of features (e.g., varying levels or receptive fields). Therefore, a layer attention mechanism dynamically computes these task-specific features to encourage task decoupling.
We further adopt Task-Aligned Learning (TAL) to guide our TA-head in making Task-Aligned predictions. From the perspective of task alignment, it dynamically selects high-quality anchors based on designated metrics while simultaneously addressing anchor assignment and weighting. In the Task-Aligned sample assignment stage, to work well with Non-Maximum Suppression [31] (NMS), anchor assignment for training instances must meet the following criteria:
  • Well-aligned anchors should jointly predict high classification scores with precise localization.
  • Misaligned anchors should receive lower classification scores and be suppressed.
Considering that classification scores and the Complete Intersection over Union [32] (CIoU) between the predicted and ground truth boxes indicate prediction quality for both tasks, this study uses a higher-order combination of classification scores and the CIoU to measure task alignment. The following metric is designed to calculate anchor alignment per instance:
$$t = v^{\alpha} \times w^{\beta} \tag{3}$$
where v and w represent the classification score and CIoU value, respectively, while α and β control the influence of the two tasks. Also, t plays a vital role in the joint optimization of both tasks towards task alignment objectives, encouraging the network to dynamically focus on high-quality anchors.
Training sample assignment is crucial to training an object detector effectively. To enhance task alignment, it is essential to focus on Task-Aligned anchors, using a simple assignment rule to select training samples: for each instance, the top m anchors with the highest t values are selected as positive samples, with the remaining anchors designated as negative samples. Training is completed by calculating new loss functions specifically crafted for classification and localization tasks.
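A compact sketch of this assignment rule, for a single ground-truth instance, is given below; the values of α, β and m are illustrative defaults rather than the settings used in our experiments.

```python
import torch

def assign_task_aligned(cls_scores, cious, alpha=1.0, beta=6.0, top_m=13):
    """Sketch of the Task-Aligned assignment for one ground-truth instance:
    t = v**alpha * w**beta (Equation (3)), then the top-m anchors by t become
    positive samples."""
    t = cls_scores.pow(alpha) * cious.pow(beta)
    pos_idx = torch.topk(t, k=min(top_m, t.numel())).indices
    is_positive = torch.zeros_like(t, dtype=torch.bool)
    is_positive[pos_idx] = True          # all remaining anchors are treated as negatives
    return t, is_positive

t, pos = assign_task_aligned(torch.rand(100), torch.rand(100))
print(int(pos.sum()), "anchors selected as positive samples")
```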

4. Experiment

4.1. Environmental Settings

In order to verify the efficacy of the proposed approach, an experimental platform was set up employing Ubuntu 22.04 as the operating system and PyTorch 2.0 as the deep learning framework. YOLOv8 was employed as the baseline network model. The specific configuration of the experimental environment is elaborated in Table 1.

4.2. Dataset

Our dataset consists of 1035 images taken in a real welding production environment, each containing a different number and size of spot welds. Figure 5a shows some samples we collected.
The model is trained using a supervised learning paradigm that requires labeled data. Welding experts annotated the positions and quality of the spot welds in each image based on resistance spot weld signals and chisel test records. The labeled dataset was split into training, validation and test sets in a 10:1:1 ratio, with the input images resized to 1024 × 1024, as shown in Figure 5b and Table 2. Prior to training, the dataset was processed using K-means clustering [33] to determine 9 anchor boxes corresponding to the dimensions of the weld spots. Training was performed for 500 epochs using the Adam optimizer with an initial learning rate of 1e-3, with both momentum terms set to 0.9. A cosine learning rate scheduler with warm-up was used to optimize the network parameters. The batch size was set to 8, and Mosaic data augmentation was applied. In our study, the choice of hyperparameters (e.g., learning rate, number of epochs, batch size, momentum) was based on established principles in deep learning and object detection tasks, as well as empirical evaluations during preliminary experiments. The dataset was expanded by random cropping, rotation and rearrangement, in particular increasing the number of small spot weld samples. The model parameters that performed best on the validation set were saved during training.
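For reference, the optimizer and cosine warm-up schedule described above can be set up as in the following sketch; the warm-up length and the placeholder model are assumptions, since only the optimizer type, initial learning rate, momentum terms, and epoch count are stated.

```python
import math
import torch.nn as nn
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR

EPOCHS, WARMUP_EPOCHS = 500, 5          # warm-up length is an assumption
model = nn.Conv2d(3, 16, 3)             # placeholder standing in for the TA-YOLOv8 network

# Adam with an initial learning rate of 1e-3 and both momentum terms set to 0.9.
optimizer = Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.9))

def cosine_with_warmup(epoch):
    if epoch < WARMUP_EPOCHS:                                   # linear warm-up
        return (epoch + 1) / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))           # cosine decay toward zero

scheduler = LambdaLR(optimizer, lr_lambda=cosine_with_warmup)

for epoch in range(EPOCHS):
    # ... one training epoch over the Mosaic-augmented weld spot dataset ...
    scheduler.step()
```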

4.3. Results and Comparison

Our proposed TA-YOLOv8 model (hereinafter referred to as Weld Spot Defect Detection Model, WSDDM) was trained, validated and tested on a spot weld image dataset alongside several well-known object detection algorithms such as SSD [34], Faster R-CNN [35] and RetinaNet [36]. Figure 6 shows the performance of each trained model on the test set. When the IoU threshold is set to 0.5, the WSDDM shows a significant improvement in accuracy and speed over YOLOv8.
This improvement is due to the improved feature fusion structure, backbone network adjustments, and refined loss functions of the model. While Faster R-CNN achieves the highest detection accuracy, it incurs significant time costs. The lightweight DarkNet53 network used in the WSDDM significantly increases the detection speed without sacrificing accuracy. As can be seen in Figure 6, the WSDDM consistently identifies high- and low-quality spot welds and achieves an mAP of over 90%.
The testing results of the WSDDM are shown in Figure 7. A comparison of YOLOv8 and the WSDDM with images from the test set is also shown in Figure 7. The three columns show the results of YOLOv8, the WSDDM and manual annotations, with green boxes indicating acceptable spot welds and red boxes indicating defective spot welds. The numbers in each box indicate the level of confidence in the presence of spot welds. Despite different lighting conditions and irregular spot weld shapes, both DL models effectively detect spot welds, albeit with slight differences in the confidence level, and they can accurately distinguish similar round spot welds. For tasks involving the detection of dense, small spot welds, the WSDDM outperforms YOLOv8 in terms of detection performance and robustness across different spot weld sizes, with fewer missed or false detections. The model’s dual, cross-scale feature fusion strategy increases sensitivity to detailed features, which is helpful in detecting small spot welds.
Using the WSDDM method described in the previous chapter, each weld spot in the image was located and then cropped and saved, as shown in Figure 8. It provides an organized dataset of spot welds captured under real production conditions, categorized by various defect types as well as normal spots. Meanwhile, each defect type and normal class are represented by multiple images, which helps in creating a balanced dataset. For training object detection models, especially in tasks like small object detection, a balanced dataset with representative defect variations is crucial to achieve high accuracy and reduce model bias. Each image is annotated with labels indicating both class type and an ID number, which may refer to specific samples. This detailed annotation can aid in supervised learning, as the model can be trained to detect both the location and type of defect.
(1)
Multi-Class Detection: The dataset structure supports multi-class defect detection, which is ideal for models like YOLO and Faster R-CNN that are capable of handling multiple classes.
(2)
Class Imbalance Consideration: If any defect type is underrepresented, it may lead to class imbalance, affecting the model’s ability to generalize across all defect types. Techniques like data augmentation, synthetic image generation, or class-weighted loss functions could mitigate this.
(3)
Small Object Detection: Given the small size of the welds, this dataset will test the model's ability to detect small objects, an area where Feature Pyramid Networks (FPNs) and attention mechanisms often prove beneficial.
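To make the crop-and-save step described earlier in this section concrete, the sketch below cuts each detected weld-spot box out of the source image and writes it to disk; the box format, file names, and output directory are assumptions, and the detections themselves are expected to come from the WSDDM upstream.

```python
import cv2
from pathlib import Path

def crop_detections(image_path, boxes, out_dir="weld_crops"):
    """Cut each detected weld-spot box out of the source image and save it as its own
    file. `boxes` is assumed to be an iterable of (x1, y1, x2, y2) pixel coordinates
    produced upstream by the detection model."""
    Path(out_dir).mkdir(exist_ok=True)
    img = cv2.imread(str(image_path))
    if img is None:
        raise FileNotFoundError(image_path)
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        crop = img[int(y1):int(y2), int(x1):int(x2)]
        cv2.imwrite(f"{out_dir}/{Path(image_path).stem}_spot{i:03d}.png", crop)

# Example call with hypothetical detections:
# crop_detections("station3_frame0042.png", [(120, 96, 168, 140), (310, 220, 356, 262)])
```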

4.4. Data Augmentation and Labeling

Before training the defect weld spot generation model, the cropped weld images were resized to 64 × 64 pixels and normalized to the range [−1,1]. The dataset includes normal welds and six types of weld defects, with a highly imbalanced class distribution. After simple affine transformations were applied to augment the data, the class distribution for the training, validation and testing sets is shown in Table 2. The defect weld spot generation model was trained using the Adam optimizer, with a learning rate of 0.002, and momentum parameters β1 and β2 set to 0.5 and 0.9, respectively. The batch size was 128, and the latent vector dimension was set to 128. For comparison, the ACGAN [37], cWGAN-GP [38], and BAGAN [39] models were trained on the same dataset, with hyperparameters largely kept consistent. Currently, there is a lack of standardized and accurate methods for evaluating generative model performance [40]. Therefore, this study uses both qualitative and quantitative methods to evaluate the performance of GAN models.
The qualitative evaluation assesses the performance of various GAN models by comparing the quality of generated weld spot image samples. This approach mainly relies on a visual assessment, focusing on the quality of selected images while potentially overlooking issues like overfitting and diversity. Figure 8 presents random samples generated by ACGAN, cWGAN-GP, BAGAN, and the proposed defect weld spot generation model, encompassing defect types such as “edge welding”, “overlap”, “cold solder joint”, “burr”, “distortion”, and “normal”.
Due to the high representation of normal weld samples in the dataset, each GAN model was able to generate high-quality images of normal welds. However, ACGAN and cWGAN-GP struggled with minority class defects, often producing images belonging to unintended classes, like “overlap” or “burn-through”. This limitation stems from the discriminator output structure and loss functions of these models, which are less effective for learning minority classes, making it challenging for the generator to synthesize accurate samples for these classes.
Both BAGAN and the defect weld spot generation model, however, effectively generated minority defect welds. Yet, the latent vectors produced by the AE model used to initialize BAGAN did not adequately differentiate between weld classes, complicating its training process. When weld images from different classes appeared visually similar, BAGAN struggled to generate images matching the intended class, such as “distortion” and “normal”. For classes with very few samples, like "burn-through," BAGAN also failed to produce correct images. Additionally, BAGAN required an extensive fine-tuning of its network structure and hyperparameters. By contrast, the proposed defect weld spot generation model accurately generated images of specific defect types, demonstrating that it can effectively achieve data augmentation for defect classes, even when defect samples are limited.
Figure 9 illustrates the labeling process used in a deep learning dataset, where each image sample of weld spots is annotated with a classification label to differentiate between normal and defective spots. The left section displays various weld spot images grouped by visual characteristics, such as shape and condition, while the right section lists file names with corresponding binary labels, where '0' denotes a normal weld spot, and '1' represents a defective one.
The generated synthetic defect samples were used in the training of YOLOv8 as part of data augmentation, which can be considered a data preprocessing step. To evaluate the effectiveness of this approach, we conducted additional experiments. Without the augmented data, the model's training performance was suboptimal and offered little value as a baseline for comparison. Thus, in addressing the challenges of small-sample object detection and severe class imbalance between positive and negative samples, this approach is essential.

4.5. Evaluation and Analysis

To objectively assess the performance of models used for detecting welding spot defects, several evaluation metrics are employed. These include GFLOPS (Giga Floating-point Operations Per Second), which quantifies the model’s execution efficiency by measuring the number of floating-point operations performed per second. Model parameters are also considered to assess the model’s size and complexity. Additionally, FPS (Frames Per Second) is used to gauge detection speed by recording the number of frames processed per second. Model accuracy is assessed using mean Average Precision (mAP), calculated as shown in Equation (4). The F1-score, representing a weighted average of precision and recall, is used to evaluate the model’s overall performance and stability, with its calculation given in Equation (5).
$$\mathrm{mAP} = \frac{\sum P_{A}}{N} \tag{4}$$
$$F1\text{-}S = \frac{2 \times P \times R}{P + R} \tag{5}$$
In Equation (4), N denotes the total number of categories. PA represents the area beneath the precision–recall curve, where recall values are plotted on the x-axis and precision values on the y-axis. The metric mAP@0.5 denotes the mean Average Precision at a CIoU threshold of 0.5. In Equation (5), precision measures the model’s ability to distinguish between positive and negative samples, with higher precision indicating an improved distinction of negative samples. S denotes the score, P the precision, and R the recall rate. Recall assesses the model’s effectiveness in identifying positive samples, where higher recall signifies a more accurate detection of positive instances. The F1-S provides a balance between precision and recall, where a higher F1-S reflects a model with stronger overall performance and robustness.
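The metric definitions in Equations (4) and (5) reduce to a few lines of code; the sketch below assumes the per-class AP values (areas under the precision–recall curve) have already been computed, and the example counts are purely hypothetical.

```python
import numpy as np

def precision_recall_f1(tp, fp, fn):
    """Precision, recall and the F1-score of Equation (5) from raw detection counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def mean_ap(ap_per_class):
    """mAP as in Equation (4): the mean of per-class areas under the PR curve.
    The per-class AP values are assumed to be computed elsewhere (e.g., at CIoU 0.5)."""
    return float(np.mean(ap_per_class))

# Hypothetical counts and AP values, purely for illustration.
print(precision_recall_f1(tp=90, fp=10, fn=15))
print(mean_ap([0.92, 0.88, 0.85]))
```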
The impact of integrating different attention modules on the model’s detection accuracy is presented in Table 3. These experiments were conducted on the YOLOv8 model, in which the neck structure was reconfigured as shown in Figure 3. From the results in Table 3, it is evident that adding different attention modules slightly increases the computational load and parameter count. The model’s mAP@0.5 value demonstrates some variability, where the addition of the channel attention module alone produces a moderate improvement in mAP@0.5; however, in specific instances, a reduction in accuracy is also observed.
When combining both channel and spatial attention within the CBAM module in YOLOv8, the network achieves further accuracy gains over the configuration with channel attention alone, with an increase of up to 0.9%. This suggests that spatial attention further enhances detection accuracy. Notably, the spatial attention mechanism in MC-FPN yields an impressive accuracy of 90.13%, representing a substantial improvement rate of 1.55%, while also providing faster detection speeds than the configuration with the CBAM module.
Based on the preceding data, we reconstruct the neck structure of YOLOv8 into MC-FPN, resulting in a 32% reduction in the model’s parameter count and a 12.5% decrease in computational complexity. Additionally, model accuracy, measured by mAP@0.5, improved by 1.9%. Replacing SPPF with SimSPPF had minimal effects on model accuracy, parameter count, and computational complexity, although it increased detection speed by 2 frames per second (FPS). The introduction of the Task-Aligned head module, while slightly increasing the parameter count and computational complexity, contributed an additional 1.3 percentage points in mAP@0.5 accuracy. Combining MC-FPN with the SPPF module improved model accuracy while reducing both computational complexity and parameter count. Overall, the enhanced YOLOv8 version, integrating MC-FPN, SimSPPF, and TA-Head modules, outperformed the baseline YOLOv8 model in terms of detection accuracy, computational complexity, and parameter count. On the same dataset, the modified YOLOv8 achieved a 3.3% increase in mAP@0.5, a 25% reduction in parameter count, and an over 10% decrease in computational complexity.
Figure 10 shows the validation sample results for model testing. This figure includes the detection results of weld spot samples and the corresponding heat map visualization. In the heat maps, red areas represent regions identified by the model as defective, while blue areas indicate normal regions. This color contrast provides an intuitive way to observe the areas the model focuses on, especially in response to defects. This indicates the model’s high accuracy in detecting even minor defect locations. Additionally, the heat maps from different weld spot images demonstrate strong generalization ability, making the model suitable for various weld surface conditions. The middle five heat maps represent different feature extraction stages. This visualization demonstrates that, under a multi-cross layer network feature extraction and fusion approach, as the extracted feature information progresses to higher levels, the confidence increases gradually from low to high. The model successfully identifies and concentrates on weld spot defects, such as the edges of the welds and areas where cracks or depressions might be present. The model effectively captures the location of welding defects through highlighted (green) regions.
In practical production, some welding spots may be overlooked during manual inspection. To assess the model’s generalization capability, specific weld point samples were intentionally left unannotated before training. The trained model is subsequently used to predict these samples, and the results demonstrate that it effectively identifies the missed welding spots, as shown in Figure 11.

4.6. Experimental Pipeline and Detection Systems

Based on the methods and related technologies discussed in previous sections, we finally developed a Body-in-White welding spot quality detection system, integrating sensing technology, deep learning detection techniques, and communication technology. The system’s architecture, shown in Figure 12, includes multiple modules: a data acquisition module, a vision inspection module, and a data management module. The system has been deployed on an actual automotive body welding production line, verifying the practicality and effectiveness of each module.
Data Acquisition Module: This system was validated on a production line in the welding workshop of an automotive company. The line includes four welding stations, each equipped with two FANUC robots, with welding guns powered by servo motors. Equipment on the automated line is integrated into the field PLC, ensuring high-reliability welding control.
Vision Inspection Module: This module utilizes pretrained model weights to evaluate welding spot quality. By implementing the vision inspection algorithms proposed in this study and integrating real-time welding parameter monitoring, this setup ensures adaptability under data constraints, leveraging deep learning visual models to enhance the accuracy and efficiency of weld quality inspection.
Data Management Module: Comprising a data visualization module, storage and analysis module, and alert module, the data management component serves as the user interface for interacting with the system. It allows users to query relevant information, including the dynamic visualization of data processing, the real-time display of weld spot data, historical data queries, and quality alert notifications. The system provides feedback to the user on the quality of each weld spot on the body, facilitating the further analysis and optimization of the welding process. Additionally, users can review weld quality based on alerts and optimize model learning through continuous quality feedback.

5. Conclusions and Future Works

In conclusion, this paper introduces a novel TA-YOLOv8-based model, termed the Weld Spot Defect Detection Model (WSDDM), for the detection and classification of welding spot defects in car body production. Our proposed model effectively addresses challenges posed by small sample sizes, class imbalance, and noisy data, often encountered in Body-in-White welding spot datasets. The TA-YOLOv8 network employs a Task-Aligned (TA) detection head and a cross-scale feature fusion strategy, which significantly enhances sensitivity to fine-grained features, facilitating the detection of small, dense welding spots with high accuracy. The experimental results demonstrate that the WSDDM achieves a detection speed improvement of 2.8% over YOLOv8 and an mAP@50-95 increase of 1.35%, along with a 0.1226 rise in the recall rate. Furthermore, the WSDDM's lightweight DarkNet53 backbone enables efficient, accurate real-time processing suitable for industrial applications. A comparative analysis shows that while Faster R-CNN achieved the highest detection accuracy, it came at a substantial time cost, whereas the WSDDM maintained high detection precision at a fraction of the computational expense. The model's robustness across varying weld spot shapes and lighting conditions underscores its generalization capability, making it a viable and reliable tool for automated welding spot quality inspection in real-world production environments.
To address the issue of the loss of feature information for spot welding during feed forward convolution computations, this paper improves the YOLOv8 model from the perspective of the backbone network, aiming to enhance the detection accuracy and speed of the small-target welding spot. A defective welding spot generation model is established based on autoencoder initialization and the theory of balanced generative adversarial networks, facilitating accurate data augmentation in imbalanced datasets. Finally, the effectiveness of the proposed method is validated using an image dataset collected from the field. The main conclusions are as follows:
(1)
The proposed small-target solder joint detection model outperforms several typical object detection algorithms in terms of mean Average Precision (mAP) and detection speed. The cross-layer feature fusion structure effectively integrates and refines multi-scale features, utilizing robust low-level features to improve the detection accuracy of small-scale targets. The lightweight network DarkNet53 enhances inference speed while maintaining accuracy.
(2)
Qualitative and quantitative comparative analysis with typical GAN models indicates that the proposed defect welding spot generation model can produce high-quality and diverse defect images in highly imbalanced datasets.
(3)
The ablation experiment results demonstrate that the Task-Aligned method, data augmentation, and feature pre-training all contribute to the deep learning model’s ability to learn discriminative features from solder joint images. For the common weld spot quality categories on the Body-in-White production line, the performance of the proposed TA-YOLOv8 model generally surpasses that of other foundational visual models.
In deep visual models, false positives (FPs) and false negatives (FNs) are common issues when evaluating performance. To analyze and explain the model's failure in such cases, a systematic experimental analysis is often required. The following are steps and methods for analyzing and explaining the reasons behind false positives and false negatives, including Error Classification Analysis, Model Output Analysis and case studies. By following these steps, we are able to conduct an in-depth analysis of false positives and false negatives in the model, understand their root causes, and improve the model's performance. Through dataset analysis, visualization, model architecture optimization, and other methods, the model's weaknesses can be identified, providing a foundation for further improvements. In summary, a thorough analysis of the model's failure cases helps to quantify its limitations and guide subsequent optimization efforts.
Future work to further improve the application of the YOLOv8 algorithm in the field of solder joint defect detection could include the following directions:
(1)
Enhanced Multi-Scale Feature Fusion: This involves the further optimization of cross-layer feature fusion mechanisms, utilizing higher-resolution low-level features to improve the accuracy of small defect detection. Improvements in the selection and fusion strategy of high-level features could also enhance the model's robustness in complex backgrounds.
(2)
Efficient Lightweight Network Architectures: Research into more efficient lightweight network architectures, such as hybrid convolutions or dynamic convolutions, could reduce inference time and computational cost while maintaining detection accuracy, making the model more suitable for resource-constrained, real-time detection tasks.
(3)
Handling Imbalanced and Small Sample Datasets: Industrial applications often face imbalanced datasets. Future work could focus on some other generative networks and transfer learning to augment defect data. By generating diverse defect images, these techniques could alleviate small-sample issues and improve model generalization.
These improvements would help further enhance the applicability and accuracy of YOLOv8 in welding spot defect detection, making it better suited to meet the demands of industrial production environments.

Author Contributions

Conceptualization, W.L. and J.H.; methodology, W.L., M.J. and S.Z. (Shuo Zhang); data collection, W.L., J.Q., and J.H.; writing—original draft preparation, S.Z. (Shuo Zhang) and W.L.; writing—review and editing, J.Q.; supervision, M.J.; validation, W.L. and S.Z. (Siyu Zhu); funding acquisition, J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by National Natural Science Foundation of China (52475270, 52035007, U23B20102), and Xie Youbai Design Scientific Research Foundation (XYB-DS-202401).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Kah, P.; Rajan, R.; Martikainen, J.; Suoranta, R. Investigation of Weld Defects in Friction-Stir Welding and Fusion Welding of Aluminium Alloys. J. Mater. Sci. Mater. Eng. 2015, 10, 26. [Google Scholar] [CrossRef]
  2. Den Uijl, N.; Moolevliet, T.; Mennes, A.; Van Der Ellen, A.A.; Smith, S.; Van Der Veldt, T.; Okada, T.; Nishibata, H.; Uchihara, M.; Fukui, K. Performance of Resistance Spot-Welded Joints in Advanced High-Strength Steel in Static and Dynamic Tensile Tests. Weld. World 2012, 56, 51–63. [Google Scholar] [CrossRef]
  3. Chao, Y.J. Ultimate Strength and Failure Mechanism of Resistance Spot Weld Subjected to Tensile, Shear, or Combined Tensile/Shear Loads. J. Eng. Mater. Technol. 2003, 125, 125–132. [Google Scholar] [CrossRef]
  4. Duda, R.O.; Hart, P.E. Use of the Hough Transformation to Detect Lines and Curves in Pictures. Commun. ACM 1972, 15, 11–15. [Google Scholar] [CrossRef]
  5. Oualid, D.; Khadidja, M.; Kahina, A. Incremental Circle Hough Transform: An Improved Method for Circle Detection. Optik 2017, 133, 17–31. [Google Scholar] [CrossRef]
  6. Liang, Q.K.; Long, J.Y.; Nan, Y.; Coppola, G.; Zou, K.L.; Zhang, D.; Sun, W. Angle Aided Circle Detection Based on Randomized Hough Transform and Its Application in Welding Spots Detection. Math. Biosci. Eng. MBE 2019, 16, 1244–1257. [Google Scholar] [CrossRef]
  7. Ronghua, C.; Yoshiaki, O. Numerical Analysis of Freezing Controlled Penetration Behavior of the Molten Core Debris in an Instrument Tube with MPS. Ann. Nucl. Energy 2014, 71, 322–332. [Google Scholar] [CrossRef]
  8. Cho, W.I.; Woizeschke, P. Analysis of Molten Pool Dynamics in Laser Welding with Beam Oscillation and Filler Wire Feeding. Int. J. Heat Mass Transf. 2021, 164, 120623. [Google Scholar] [CrossRef]
  9. Ruisz, J.; Biber, J.; Loipetsberger, M. Quality Evaluation in Resistance Spot Welding by Analysing the Weld Fingerprint on Metal Bands by Computer Vision. Int. J. Adv. Manuf. Technol. 2007, 33, 952–960. [Google Scholar] [CrossRef]
  10. Myrach, P.; Jonietz, F.; Meinel, D.; Suwala, H.; Ziegler, M. Calibration of Thermographic Spot Weld Testing with X-Ray Computed Tomography. Quant. InfraRed Thermogr. J. 2017, 14, 122–131. [Google Scholar] [CrossRef]
  11. Dell’Avvocato, G.; Palumbo, D. Thermographic Procedure for the Assessment of Resistance Projection Welds (RPW): Investigating Parameters and Mechanical Performances. J. Adv. Join. Process. 2024, 9, 100177. [Google Scholar] [CrossRef]
  12. Valentin, P.; Kounalakis, T.; Nalpantidis, L. Weld Classification Using Gray Level Co-Occurrence Matrix and Local Binary Patterns. In Proceedings of the 2018 IEEE International Conference on Imaging Systems and Techniques (IST), Krakow, Poland, 16–18 October 2018; pp. 1–6. [Google Scholar]
  13. Javaheri, E.; Kumala, V.; Javaheri, A.; Rawassizadeh, R.; Lubritz, J.; Graf, B.; Rethmeier, M. Quantifying Mechanical Properties of Automotive Steels with Deep Learning Based Computer Vision Algorithms. Metals 2020, 10, 163. [Google Scholar] [CrossRef]
  14. Alghannam, E.; Lu, H.; Ma, M.; Cheng, Q.; Gonzalez, A.A.; Zang, Y.; Li, S. A Novel Method of Using Vision System and Fuzzy Logic for Quality Estimation of Resistance Spot Welding. Symmetry 2019, 11, 990. [Google Scholar] [CrossRef]
  15. Younes, D.; Alghannam, E.; Tan, Y.; Lu, H. Enhancement in Quality Estimation of Resistance Spot Welding Using Vision System and Fuzzy Support Vector Machine. Symmetry 2020, 12, 1380. [Google Scholar] [CrossRef]
  16. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and <0.5MB Model Size. arXiv 2016, arXiv:1602.07360. [Google Scholar]
  17. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385. [Google Scholar]
  18. Li, Y.; Huang, H.; Xie, Q.; Yao, L.; Chen, Q. Research on a Surface Defect Detection Algorithm Based on MobileNet-SSD. Appl. Sci. 2018, 8, 1678. [Google Scholar] [CrossRef]
  19. Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 13–16 December 2015; pp. 1440–1448. [Google Scholar]
  20. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  21. Kim, S.; Hwang, I.; Kim, D.Y.; Kim, Y.M.; Kang, M.; Yu, J. Weld-Quality Prediction Algorithm Based on Multiple Models Using Process Signals in Resistance Spot Welding. Metals 2021, 11, 1459. [Google Scholar] [CrossRef]
  22. Liu, W.; Hu, J.; Qi, J. Resistance Spot Welding Defect Detection Based on Visual Inspection: Improved Faster R-CNN Model. Machines 2025, 13, 33. [Google Scholar] [CrossRef]
  23. Wang, Z.; Li, T. A Lightweight CNN Model Based on GhostNet. Comput. Intell. Neurosci. 2022, 2022, 8396550. [Google Scholar] [CrossRef]
  24. Adarsh, P.; Rathi, P.; Kumar, M. YOLO V3-Tiny: Object Detection and Recognition Using One Stage Improved Model. In Proceedings of the 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 6–7 March 2020; pp. 687–694. [Google Scholar]
  25. Safaldin, M.; Zaghden, N.; Mejdoub, M. An Improved YOLOv8 to Detect Moving Objects. IEEE Access 2024, 12, 59782–59806. [Google Scholar] [CrossRef]
  26. Wang, X.; Gao, H.; Jia, Z.; Li, Z. BL-YOLOv8: An Improved Road Defect Detection Model Based on YOLOv8. Sensors 2023, 23, 8361. [Google Scholar] [CrossRef] [PubMed]
  27. Wang, J.; Zhao, H. Improved YOLOv8 Algorithm for Water Surface Object Detection. Sensors 2024, 24, 5059. [Google Scholar] [CrossRef] [PubMed]
  28. Wu, T.; Dong, Y. YOLO-SE: Improved YOLOv8 for Remote Sensing Object Detection and Recognition. Appl. Sci. 2023, 13, 12977. [Google Scholar] [CrossRef]
  29. Zhu, X.; Lyu, S.; Wang, X.; Zhao, Q. TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-Captured Scenarios. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021. [Google Scholar]
  30. Feng, C.; Zhong, Y.; Gao, Y.; Scott, M.R.; Huang, W. TOOD: Task-Aligned One-Stage Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021. [Google Scholar]
  31. Neubeck, A.; Van Gool, L. Efficient Non-Maximum Suppression. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR'06), Hong Kong, China, 20–24 August 2006. [Google Scholar]
  32. Wang, X.; Song, J. ICIoU: Improved Loss Based on Complete Intersection Over Union for Bounding Box Regression. IEEE Access 2021, 9, 105686–105695. [Google Scholar] [CrossRef]
  33. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  34. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Springer International Publishing: Berlin/Heidelberg, Germany, 2016. [Google Scholar]
  35. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef]
  36. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 42, 318–327. [Google Scholar] [CrossRef] [PubMed]
  37. Mirza, M.; Osindero, S. Conditional Generative Adversarial Nets. arXiv 2014, arXiv:1411.1784. [Google Scholar]
  38. Zheng, M.; Li, T.; Zhu, R.; Tang, Y.; Tang, M.; Lin, L.; Ma, Z. Conditional Wasserstein Generative Adversarial Network-Gradient Penalty-Based Approach to Alleviating Imbalanced Data Classification. Inf. Sci. 2020, 512, 1009–1023. [Google Scholar] [CrossRef]
  39. Xu, Q.; Huang, G.; Yuan, Y.; Guo, C.; Sun, Y.; Wu, F.; Weinberger, K. An Empirical Study on Evaluation Metrics of Generative Adversarial Networks. arXiv 2018, arXiv:1806.07755. [Google Scholar]
  40. Borji, A. Pros and Cons of GAN Evaluation Measures. arXiv 2018, arXiv:1806.07755. [Google Scholar] [CrossRef]
  41. Wang, Y.; Wang, H.; Xin, Z. Efficient Detection Model of Steel Strip Surface Defects Based on YOLO-V7. IEEE Access 2022, 10, 133936–133944. [Google Scholar] [CrossRef]
  42. Kaiming, H.; Xiangyu, Z.; Shaoqing, R.; Jian, S. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar]
  43. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications. arXiv 2022, arXiv:2209.02976. [Google Scholar]
  44. Zhu, L.; Wang, X.; Ke, Z.; Zhang, W.; Lau, R. BiFormer: Vision Transformer with Bi-Level Routing Attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023. [Google Scholar]
Figure 1. Architecture of YOLOv8 model. The different color parts of the input batches represent different image data. The different color parts of the architecture represent different function modules.
Figure 2. Improved backbone network of our architecture. On the left is a schematic diagram of the backbone network process for detecting spot weld images, while the right side shows the corresponding structure layer parameters and related information. The different color parts are the same as Figure 1.
Figure 3. Proposed Multiple Cross-Layer FPN (MC-FPN) network. The different color dotted lines represent multiple cross layers, with P2-P5 being simplified representations of the intermediate connection layers.
Figure 4. Task-Aligned head structure: to learn extensive task-interactive features from multiple convolutional layers.
Figure 5. Welding spots sample images and annotated data in Body-in-White production lines. (a) shows the samples we collected in the production lines, while (b) shows the pretraining dataset and the labels (yellow squares).
Figure 6. Performance comparison with typical object detection algorithms on test set.
Figure 7. Some results of WSDDM and comparison between small welding spot detection models. We use green to represent the detected weld spots are normal, and red to represent the detected weld spots have defects or abnormalities.
Figure 8. The weld spot dataset obtained from image segmentation using the WSDDM.
Figure 9. Data augmentation and labeling.
Figure 10. Visualization and validation sample results for model testing. The model effectively captures the location of welding defects through highlighted (green) regions.
Figure 11. Validation sample results for model generalization ability.
Figure 12. Experimental pipeline and integrated detection systems.
Table 1. Configuration and learning environment.

Basic Parameter and Hyperparameters | Value
Operating system | Ubuntu 22.04
Deep learning framework | PyTorch
Programming language | Python 3.11
CPU | Intel(R) Xeon(R)
GPU | RTX 3090 Ti
RAM | 100 GB
Data size | 1024 × 1024
Batch size | 64
Epoch | 500
Momentum | 0.845
Learning rate | 0.002
Table 2. Dataset of welding spots.

Class | Normal | Edge Weld | Overlap | Cold Weld | Burr | Distortion
Training dataset | 2250 | 1170 | 2400 | 510 | 780 | 1650
Validation dataset | 225 | 117 | 240 | 51 | 78 | 165
Testing dataset | 234 | 153 | 256 | 45 | 59 | 152
Table 3. Comparison of the performance with different modules.

Models | mAP@0.5 | Parameters (M) | GFLOPs | FPS
YOLOv5 [29] | 79.65 | 13.33 | 24.6 | 135
YOLOv5 + SPPF | 81.21 | 13.35 | 24.6 | 130
YOLOv7 [41] | 84.87 | 12.56 | 28.2 | 105
YOLOv8 + SPP [42] | 83.42 | 12.77 | 27.3 | 162
YOLOv8 + SPPF | 83.50 | 12.87 | 27.3 | 104
YOLOv8 + SimSPPF [43] | 83.18 | 12.74 | 27.2 | 121
YOLOv8 + BiFPN | 88.90 | 9.65 | 28.8 | 118
YOLOv8 + SE | 89.32 | 7.40 | 30.1 | 97
YOLOv8 + CBAM | 90.55 | 7.39 | 25.6 | 90
YOLOv8 + EMA | 89.30 | 8.28 | 25.2 | 68
YOLOv8 + BiFormer [44] | 91.3 | 8.13 | 25.7 | 59
YOLOv8 + LSK attention | 91.1 | 10.15 | 26.0 | 73
WSDDM | 90.13 | 11.39 | 25.8 | 98
