Disclosure of Invention
In order to solve the above problems, the invention provides an anti-noise weld feature recognition method and system based on a lightweight neural network. Built on a lightweight network design, they have stronger anti-interference capability while maintaining a high inference frequency, so as to meet the real-time requirements of weld seam tracking systems in the industrial field.
In some embodiments, the following technical scheme is adopted:
an anti-noise weld feature recognition method based on a lightweight neural network, comprising:
acquiring welding image data in a welding process;
inputting the obtained welding image data into a trained weld feature recognition model, and outputting weld type and position information;
The weld feature recognition model comprises a backbone network, a neck layer and a head layer. The backbone network outputs feature information of different scales and transmits it to the neck layer; the neck layer fuses the feature information of different scales, and the fused feature maps are input into the head layer. The head layer comprises three branches, namely a heat map branch, an offset branch and a classification branch: the heat map branch outputs a heat map tensor giving the probability that each region contains a feature point, the offset branch outputs an offset tensor giving the positional deviation of the feature point in each dimension, and the classification branch outputs a classification tensor giving the corresponding score of each weld type; finally, the position of the weld and the weld type are predicted.
In other embodiments, the following technical solutions are adopted:
an anti-noise weld feature recognition system based on a lightweight neural network, comprising:
the data acquisition module is used for acquiring welding image data in the welding process;
The weld joint prediction module is used for inputting the acquired welding image data into a trained weld joint feature recognition model and outputting the type and position information of the weld joint;
The weld feature recognition model comprises a backbone network, a neck layer and a head layer. The backbone network outputs feature information of different scales and transmits it to the neck layer; the neck layer fuses the feature information of different scales, and the fused feature maps are input into the head layer. The head layer comprises three branches, namely a heat map branch, an offset branch and a classification branch: the heat map branch outputs a heat map tensor giving the probability that each region contains a feature point, the offset branch outputs an offset tensor giving the positional deviation of the feature point in each dimension, and the classification branch outputs a classification tensor giving the corresponding score of each weld type; finally, the position of the weld and the weld type are predicted.
In other embodiments, the following technical solutions are adopted:
The terminal equipment comprises a processor and a memory, wherein the processor is used for executing instructions and the memory is used for storing a plurality of instructions, the instructions being adapted to be loaded by the processor to execute the above anti-noise weld feature recognition method based on a lightweight neural network.
In other embodiments, the following technical solutions are adopted:
A computer readable storage medium having stored therein a plurality of instructions adapted to be loaded by a processor of a terminal device and to perform the above-described anti-noise weld feature recognition method based on a lightweight neural network.
Compared with the prior art, the invention has the beneficial effects that:
(1) The invention constructs a MobileNetV3-based lightweight neural network structure, which precisely classifies and locates weld feature points using single-line structured-light vision while reducing the parameters and computation of the model, so that the model can be deployed on computing platforms without GPU acceleration, such as an embedded industrial computer (EIC), while maintaining a high inference frequency to meet the real-time requirements of weld seam tracking systems in the industrial field.
(2) The neural network structure uses DP (depthwise separable convolution) blocks to replace the standard convolution layers in the neck layer and the head layer. The DP block decomposes the standard convolution operation into a depthwise convolution and a pointwise convolution, which markedly reduces the number of parameters and the computation of the model; accordingly, the memory occupied by the model at run time is reduced, training and inference are generally faster, and the model adapts more easily to different hardware platforms, so that better lightweight performance is obtained, the real-time requirement is met, and the deployment cost is reduced.
(3) The neural network structure of the invention adds a cascade channel attention module (CCAM) at the neck layer. The CCAM is adopted to filter noise, and its cascaded group structure reduces the computation with almost no loss of accuracy.
(4) The head layer outputs a heat map tensor, an offset tensor and a classification tensor. The model first identifies the two partitions with the highest prediction scores in the heat map tensor, and these data are then combined with the corresponding entries of the offset tensor to determine the final weld positioning prediction result. The heat map tensor is used to represent the position of the weld in the image, while the offset tensor provides finer positional adjustment; combining the two pieces of information localizes the weld more accurately.
Additional features and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Detailed Description
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Example 1
In one or more embodiments, an anti-noise weld feature recognition method based on a lightweight neural network is disclosed, comprising the following steps:
(1) Acquiring welding image data in a welding process;
(2) Inputting the obtained welding image data into a trained weld feature recognition model, and outputting weld type and position information;
The weld feature recognition model comprises a backbone network, a neck layer and a head layer. The backbone network outputs feature information of different scales and transmits it to the neck layer; the neck layer fuses the feature information of different scales, and the fused feature maps are input into the head layer. The head layer comprises three branches, namely a heat map branch, an offset branch and a classification branch: the heat map branch outputs a heat map tensor, in which each feature value represents the probability that the corresponding region contains a feature point; the offset branch outputs an offset tensor giving the positional deviation of the feature point in each dimension; and the classification branch outputs a classification tensor giving the corresponding score of each weld type. Finally, the position of the weld and the weld type (such as a Y-shaped weld, a lap weld, a butt weld, and the like) are predicted.
Specifically, the backbone network extracts feature information from the image through stacked neural network layers and transmits feature information of different scales and different extraction depths to the neck layer. In order to improve the processing speed of the network and allow it to run better on non-GPU devices, this embodiment uses the lightweight backbone architecture MobileNetV3 as the backbone network, effectively reducing the number of network parameters. MobileNetV3 outputs activation features of different levels, which are input to the neck layer for subsequent image processing.
In the neck layer, a feature pyramid network (FPN) structure is applied to fuse feature information of different scales. Specifically, feature maps with different resolutions and different semantic levels are generated at each stage of the backbone network; for example, the shallow layers, dominated by the morphological features of the image, are more sensitive to position information, while the deep layers, dominated by semantic features, are more sensitive to information such as the weld category and the recognition of noise. Therefore, this embodiment draws three branches from the backbone network. As shown in fig. 1, the FPN merges the backbone layers whose downsampling strides relative to the input image are {8, 16, 32}, integrating deep abstract semantic information with coarse-grained surface-level feature information to obtain better small-scale feature detection capability.
With reference to fig. 1, the specific structure of the neck layer of this embodiment includes:
The characteristic information of the deepest scale output by the backbone network sequentially passes through the cascade channel attention module, the third convolution layer and the self-adaptive pooling layer and then is output to the classification branch of the head layer;
after the characteristic information of the second scale output by the backbone network passes through the second convolution layer, the characteristic information is combined with the output of the up-sampled third convolution layer and then output to the first DP module;
and after passing through the first convolution layer, the characteristic information of the first scale output by the backbone network is combined with the up-sampled output of the first DP module and, through the second DP module, is transmitted to the heat map branch and the offset branch of the head layer, respectively. A minimal code sketch of this wiring is given below.
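The neck wiring just listed can be summarized in the following PyTorch-style sketch. It is illustrative only: the backbone channel counts, the intermediate width, the simple convolutions standing in for the DP modules (a fuller DP sketch appears later), the `nn.Identity` placeholder for the CCAM, and the omission of the final upsampling to the 128×128 head resolution are all assumptions rather than details fixed by this embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeckSketch(nn.Module):
    """Illustrative wiring of the neck layer; channel counts are assumed."""
    def __init__(self, c1=40, c2=112, c3=512, mid=64):
        super().__init__()
        self.ccam = nn.Identity()                 # placeholder for the CCAM sketched later
        self.conv3 = nn.Conv2d(c3, mid, 1)        # third convolution layer (deepest scale)
        self.conv2 = nn.Conv2d(c2, mid, 1)        # second convolution layer
        self.conv1 = nn.Conv2d(c1, mid, 1)        # first convolution layer
        self.dp1 = nn.Conv2d(2 * mid, mid, 3, padding=1)   # stands in for the first DP module
        self.dp2 = nn.Conv2d(2 * mid, mid, 3, padding=1)   # stands in for the second DP module
        self.pool = nn.AdaptiveAvgPool2d(1)       # adaptive pooling before the classification branch

    def forward(self, f1, f2, f3):
        # f1, f2, f3: backbone features at 8x, 16x and 32x downsampling
        t3 = self.conv3(self.ccam(f3))
        cls_in = self.pool(t3)                                        # -> classification branch
        t2 = self.dp1(torch.cat([self.conv2(f2),
                                 F.interpolate(t3, scale_factor=2)], dim=1))
        t1 = self.dp2(torch.cat([self.conv1(f1),
                                 F.interpolate(t2, scale_factor=2)], dim=1))
        return t1, cls_in                                             # t1 -> heat map / offset branches

# quick shape check with assumed feature sizes for a 512x512 input
f1 = torch.randn(1, 40, 64, 64)
f2 = torch.randn(1, 112, 32, 32)
f3 = torch.randn(1, 512, 16, 16)
t1, cls_in = NeckSketch()(f1, f2, f3)
```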
In this embodiment, the DP block is designed to replace part of the standard 3×3 convolution blocks in the neck layer to achieve better lightweight performance. The DP block consists of a depthwise convolution layer (DWC), a first BN (normalization) layer, an activation function layer, a pointwise convolution layer (PWC) and a second BN layer connected in sequence. Spatial feature information is acquired by the depthwise convolution layer, and on top of the spatial features it extracts, the pointwise convolution layer combines them across channels, realizing the fusion of spatial feature information. Following each convolution, the BN layer normalizes each feature channel of each mini-batch to a mean close to 0 and a variance close to 1, thereby reducing internal covariate shift. After the DWC+BN processing, a ReLU6 activation function, which is friendly to low-dimensional feature information, is added; it introduces nonlinearity into the neural network so that the network can learn and perform more complex tasks, allows the network to model more complex function mappings, increases the expressive capability and flexibility of the model, and at the same time alleviates the vanishing-gradient problem, making training more stable and accelerating convergence.
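A minimal PyTorch sketch of the DP block as just described (DWC → BN → ReLU6 → PWC → BN); the kernel size and channel arguments are illustrative assumptions.

```python
import torch.nn as nn

class DPBlock(nn.Module):
    """Depthwise separable convolution block: DWC + BN + ReLU6, then PWC + BN."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.block = nn.Sequential(
            # depthwise convolution: one filter per input channel (groups=in_ch)
            nn.Conv2d(in_ch, in_ch, kernel_size, padding=kernel_size // 2,
                      groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch),           # first BN layer
            nn.ReLU6(inplace=True),          # activation friendly to low-dimensional features
            # pointwise (1x1) convolution: fuses information across channels
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),          # second BN layer
        )

    def forward(self, x):
        return self.block(x)
```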
In this embodiment, the spatial feature information is obtained through the deep convolution layer, where the spatial feature information mainly refers to the geometric shape and structure of the weld in the image, including the width, length, shape (straight line, curve, branch) and the like of the weld, and some feature information such as the boundary between the weld and surrounding materials. Channel information, i.e. the association between different feature maps, can be obtained by a point-wise convolution layer.
By introducing the DP block, the standard convolution operation is decomposed into a depthwise convolution and a pointwise convolution, which significantly reduces the number of parameters and the computation of the model and in turn reduces the memory occupied by the model at run time, thereby lowering the performance requirements on mobile and embedded deployment devices.
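As a reference for this claim, the standard parameter counts for depthwise separable convolution (a well-known result, not stated explicitly above) with a $D_K \times D_K$ kernel, $M$ input channels and $N$ output channels are:

```latex
\text{standard convolution: } D_K^2 M N, \qquad
\text{DP block: } \underbrace{D_K^2 M}_{\text{depthwise}} + \underbrace{M N}_{\text{pointwise}}, \qquad
\text{ratio: } \frac{D_K^2 M + M N}{D_K^2 M N} = \frac{1}{N} + \frac{1}{D_K^2}.
```

For a 3×3 kernel and a large number of output channels, this corresponds to roughly an 8- to 9-fold reduction in parameters and multiply-accumulate operations.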
In addition, the architecture of the neck layer is improved in this embodiment. Compared with the original method shown in fig. 3 (a), the improved top-down path and lateral connections adopt concatenation instead of addition, as shown in fig. 3 (b): the features of each input branch are directly combined through a Concat operation. Adjusting the channels helps combine the feature map output by the PWC with the upsampling result of the coarser-resolution feature map that contains semantically stronger information. Finally, the neck layer generates a feature map of size 128×64, which is then input into the head layer.
In addition, in the neck layer, the network connects an autonomously designed cascade channel attention module (CCAM) to the 32× downsampled output in order to enhance the noise immunity of the model.
The deepest backbone output received by the neck layer has shape $H_{in}\times W_{in}\times C_{in}$, specifically 16×16×512. It should be noted that a significant portion of these features is associated with noise; they would be merged in the neck layer and ultimately affect the output of the head layer. To mitigate the effect of noise, the CCAM is employed to filter it. As shown in fig. 4, the CCAM divides the deepest-scale feature information output by the backbone network into three blocks, each of which is channel-weighted by a channel attention module, and splices the channel-weighted outputs of the three blocks to obtain the output of the cascade channel attention module; the output of the first block after passing through its channel attention module is combined with the second block, and the output of the second block after passing through its channel attention module is combined with the third block.
The CCAM uses a channel attention (CA) module as its core feature selector: a global average pooling layer squeezes the input feature map into a vector, which is sent to a first fully connected (FC) layer for feature dimension reduction and then to a second FC layer that restores the channel dimension; the resulting vector is mapped into the range 0 to 1 by a softmax function, and the weighting of the input feature map channels is realized by a channel multiplication operator. The higher the weight of a channel, the more important the features in that channel. This embodiment applies a softmax function to the FC-generated vector instead of applying a sigmoid function over the entire feature map, and this replacement reduces the amount of computation with little loss of accuracy. The refined feature map $F_{CA}$ of the CA module is calculated as:
$$F_{CA} = z \otimes \mathrm{softmax}\big(W_2\,(W_1\,\mathrm{GAP}(z))\big) \qquad (1)$$

where $z$ represents the input feature, $\mathrm{GAP}(\cdot)$ denotes global average pooling, $W_1 \in \mathbb{R}^{\frac{C}{gr}\times\frac{C}{g}}$ and $W_2 \in \mathbb{R}^{\frac{C}{g}\times\frac{C}{gr}}$ represent the parameters of the two FC stages, $\otimes$ denotes channel-wise multiplication, r represents the reduction rate of 16 in the neural network, C represents the number of input channels of the CCAM, and g represents the number of groups of the input tensor.
In the CCAM, the output of the first CA block and the second input block are combined by element-wise addition to enhance information fusion between channels. For the CCAM, the computation generated by the FC layers is approximately $2C^2/(gr)$; an FC stage that does not rely on the cascaded group structure but computes channel attention over all $C$ channels directly requires approximately $2C^2/r$ operations. With the cascaded group structure, the CA modules therefore compute only about $1/g$ of the ungrouped amount.
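A minimal PyTorch sketch of the CA module and the cascaded grouping described above. It is a sketch under stated assumptions: the channel count is assumed divisible by the number of groups (the handling of the uneven 512/3 split in this embodiment is not detailed above), and the example channel count is chosen only for that reason.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention, gated by softmax on the FC output instead of sigmoid."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                    # squeeze each map to one value
        self.fc1 = nn.Linear(channels, max(channels // reduction, 1))
        self.fc2 = nn.Linear(max(channels // reduction, 1), channels)

    def forward(self, x):
        b, c, _, _ = x.shape
        z = self.pool(x).view(b, c)
        w = torch.softmax(self.fc2(self.fc1(z)), dim=1)        # per-channel weights in (0, 1)
        return x * w.view(b, c, 1, 1)                          # channel-wise reweighting

class CCAM(nn.Module):
    """Cascade channel attention: split into g blocks, attend, cascade by addition, concatenate."""
    def __init__(self, channels, groups=3, reduction=16):
        super().__init__()
        assert channels % groups == 0, "sketch assumes an even channel split"
        self.groups = groups
        self.ca = nn.ModuleList(ChannelAttention(channels // groups, reduction)
                                for _ in range(groups))

    def forward(self, x):
        blocks = torch.chunk(x, self.groups, dim=1)
        outs, carry = [], None
        for block, ca in zip(blocks, self.ca):
            inp = block if carry is None else block + carry    # fuse the previous CA output
            carry = ca(inp)
            outs.append(carry)
        return torch.cat(outs, dim=1)                          # restore the original channel count

# illustrative usage; 96 channels chosen only so that the split by 3 is even
y = CCAM(channels=96)(torch.randn(1, 96, 16, 16))
```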
In the head layer, the positions of the feature points and the weld type are predicted. Since the weld type is single for each weld image and the key point types are the same, this embodiment designs a heat map branch, an offset branch and an independent classification branch, as shown in fig. 1. The heat map branch outputs a heat map tensor, a 128×128×1 feature map in which each value represents the probability that the corresponding region contains a feature point. The offset branch outputs an offset tensor, a 128×128×2 displacement map in which each dimension gives the deviation of the detailed feature point position relative to the upper-left corner of its region, so as to eliminate the quantization error of heat map prediction alone. The classification branch outputs a classification tensor; this branch is redesigned and led out from the deepest layer of the backbone network, and a 1×3 classification feature vector is finally generated through convolution, pooling and other operations, giving the corresponding score of each of the three weld categories, from which the weld type is judged.
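A minimal sketch of the three head branches. The input channel counts and the exact convolution stacks are assumptions; only the output shapes (128×128×1 heat map, 128×128×2 offsets, 1×3 class scores) follow the description above.

```python
import torch
import torch.nn as nn

class HeadSketch(nn.Module):
    """Heat map, offset and classification branches of the head layer."""
    def __init__(self, feat_ch=64, deep_ch=512, num_classes=3):
        super().__init__()
        self.heat = nn.Sequential(
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU6(inplace=True),
            nn.Conv2d(feat_ch, 1, 1), nn.Sigmoid())            # per-region probability map
        self.offset = nn.Sequential(
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU6(inplace=True),
            nn.Conv2d(feat_ch, 2, 1))                          # (dx, dy) for each region
        self.cls = nn.Sequential(
            nn.Conv2d(deep_ch, num_classes, 1),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())             # 1 x num_classes score vector

    def forward(self, fused, deep):
        # fused: neck output feeding heat-map/offset branches; deep: deepest-scale feature
        return self.heat(fused), self.offset(fused), self.cls(deep)

heat, offset, scores = HeadSketch()(torch.randn(1, 64, 128, 128), torch.randn(1, 512, 16, 16))
```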
The heat map tensor of the present embodiment is used to represent the position of the weld in the image, while the offset tensor provides finer positional adjustment; combining the two pieces of information localizes the weld more accurately. The classification tensor provides information about the type of weld; combined with the heat map and offset tensors, the weld type can be identified while the weld is being localized. The use of the heat map tensor and offset tensor also improves the interpretability of the model predictions, helping to understand how the model arrives at them.
By considering multiple partitions (i.e., different regions in the heat map tensor) and selecting the two with the highest scores, the model can better handle discontinuities in the weld or noise in the image, reducing the likelihood of erroneously identifying non-weld regions as welds. If the highest-scoring partition in the heat map is inaccurate because of some interference, the next-highest-scoring partition is considered as an alternative, improving the overall system's tolerance to mispredictions. Because the offset tensor can fine-tune the predictions in the heat map, the model can adapt to welds of different shapes and sizes.
The specific loss function of the weld feature recognition model of this embodiment is as follows:
The head layer generates three parts: a heat map tensor $\hat{M} \in \mathbb{R}^{\frac{W}{R}\times\frac{H}{R}\times 1}$, an offset tensor $\hat{O} \in \mathbb{R}^{\frac{W}{R}\times\frac{H}{R}\times 2}$, and a classification tensor (i.e., a one-hot encoding) $\hat{S} \in \mathbb{R}^{1\times 1\times c}$, where W and H are the input image sizes of the model, c is the total number of weld types, and R is the feature map scaling factor corresponding to each head layer.
Throughout the training process, the loss is determined by comparing the predicted values generated by the head layer with the ground-truth label values; the shape of the label values must match the shape of the predictions. This comparison aims at optimizing the neural network parameters so that the model output achieves optimal score or regression predictions. Therefore, the label generation problem must first be solved.
Label heat map (label heatmap): a label heat map is a way of representing the position of an object in an image. Label (category) information is mapped onto each pixel of the image in color-coded form, producing a heat map of the same size as the original image. In this heat map, different colors represent different labels or categories, so the category to which each pixel belongs can be displayed intuitively.
The label for the heat map tensor output in this embodiment is the generated label heat map, in which each value represents the probability that the corresponding pixel belongs to a certain class. The coordinate of the k-th feature point $p_k = (x_k, y_k)$ is mapped to its corresponding partition in the heat map tensor label as $\tilde{p}_k = \lfloor p_k / R \rfloor$; the label position of the k-th feature point is thus related to the size of the model input image (the input size of the algorithm is 512×512), with R = 4.
On the label heat map, positive sample points are rendered with a Gaussian circle

$$M_{xy} = \exp\!\left(-\frac{(x-\tilde{p}_x)^2 + (y-\tilde{p}_y)^2}{2\sigma_p^2}\right),$$

a form that accelerates the convergence of the model, where $\sigma_p$ is a factor that varies with the size of the object and $\tilde{p} = (\tilde{p}_x, \tilde{p}_y)$, the center of the Gaussian circle, is the ideal positive position of the feature point mapped onto the heat map.
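A sketch of how such a label heat map with Gaussian-rounded positive samples could be generated. The fixed σ is an assumption; the embodiment only states that the spread varies with object size.

```python
import numpy as np

def gaussian_label_heatmap(points, heat_w=128, heat_h=128, r=4, sigma=2.0):
    """Render feature points (given in input-image pixels) as Gaussian peaks on the label heat map.

    points : iterable of (x, y) coordinates in the 512x512 input image
    r      : downsampling factor between input image and heat map (R = 4)
    sigma  : spread of the Gaussian circle (size-adaptive in the embodiment, fixed here)
    """
    heat = np.zeros((heat_h, heat_w), dtype=np.float32)
    ys, xs = np.mgrid[0:heat_h, 0:heat_w]
    for x, y in points:
        cx, cy = int(x // r), int(y // r)        # partition that receives the positive label
        g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
        heat = np.maximum(heat, g)               # keep the strongest response per cell
    return heat

label = gaussian_label_heatmap([(256.7, 301.2), (140.0, 288.5)])
```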
In view of the rounding effect inherent in this division by R, relying solely on the heat map to recover the positional information of feature points may reduce accuracy. To solve this problem, an offset is introduced. The offset label value of each feature point is expressed as:
$$o_k = \left(\frac{x_k}{R} - \left\lfloor\frac{x_k}{R}\right\rfloor,\ \frac{y_k}{R} - \left\lfloor\frac{y_k}{R}\right\rfloor\right) \qquad (2)$$

where $o_k$ is a two-dimensional coordinate representing the relative position of the feature point within its corresponding partition of the label heat map. The offset is therefore represented by a two-channel tensor. Furthermore, the offset tensor entry remains valid only in the partition corresponding to the positive position in the heat map tensor.
Let $\hat{M}_{ij}$ denote the prediction score of the model at position (i, j) of the heat map, i.e., the probability or confidence that a feature point is present at (i, j). A variant of the focal loss is used to optimize the parameters that generate the heat map:

$$L_{heat} = -\frac{1}{N_0}\sum_{i=1}^{H}\sum_{j=1}^{W}
\begin{cases}
\left(1-\hat{M}_{ij}\right)^{\alpha}\log\hat{M}_{ij}, & M_{ij}=1,\\[4pt]
\left(1-M_{ij}\right)^{\beta}\hat{M}_{ij}^{\alpha}\log\left(1-\hat{M}_{ij}\right), & \text{otherwise},
\end{cases} \qquad (3)$$

where $M_{ij}$ is the true label at position (i, j); its value, used to train the model, represents the probability that the center point of an object is present at that location and helps the model learn to predict object positions. $N_0$ is the number of feature points in the structured-light stripe image, H and W represent the spatial size of the heat map, α is used to fine-tune the weights of challenging and easily located points, and β is the weight that controls the non-central values within the Gaussian circle. Notably, as the distance between a label position and the center of the Gaussian circle increases, the penalty weight assigned to the prediction score of the corresponding partition also increases.
Offset prediction is by nature a regression task. To achieve accurate offset prediction, the following smooth L1 loss function is utilized:
$$L_{off} = \frac{1}{N_0}\sum_{k=1}^{N_0}\mathrm{smooth}_{L1}\!\left(o_k - \hat{o}_k\right) \qquad (4)$$

where $N_0$ is the number of feature points in the structured-light stripe image, $o_k$ is the offset label value of each feature point, and $\hat{o}_k$ is the corresponding offset prediction. The smooth L1 loss combines the advantages of the L1 and L2 loss functions: it prevents gradient explosion in the initial stage of training and yields a gentler gradient during back-propagation toward the end of training, which greatly helps the convergence of the model.
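For completeness, the standard smooth L1 function referred to here (assuming the common unit threshold) is:

```latex
\mathrm{smooth}_{L1}(x) =
\begin{cases}
0.5\,x^{2}, & |x| < 1,\\[2pt]
|x| - 0.5, & \text{otherwise.}
\end{cases}
```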
Cross-entropy loss is used for the weld classification task. Let $\hat{s}_i$ denote the score of class i predicted by the model of the present example for a structured-light stripe image, and let the Boolean variable $y_i$ denote the label value of class i in the one-hot encoding. The classification loss is expressed by the following formula:

$$L_{cls} = -\sum_{i=1}^{c} y_i \log\hat{s}_i \qquad (5)$$
where c represents the total number of weld types. Finally, the total loss $L_{total}$ is the overall training objective, obtained as follows:

$$L_{total} = L_{heat} + \lambda_{off} L_{off} + \lambda_{cls} L_{cls} \qquad (6)$$

where $L_{heat}$, $L_{off}$ and $L_{cls}$ are the heat map loss, the offset loss and the classification loss, respectively, and $\lambda_{off}$ and $\lambda_{cls}$ are the constant weights assigned to the offset loss and the classification loss, both configured as 1.
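The three loss terms of Eqs. (3)-(6) can be sketched as follows. The α and β defaults, the use of class indices instead of one-hot vectors, and the assumption that the offsets have already been gathered at the $N_0$ positive positions are choices of this sketch, not values fixed above.

```python
import torch
import torch.nn.functional as F

def heatmap_focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-6):
    """Focal-loss variant of Eq. (3); pred and target are heat maps in [0, 1]."""
    pos = target.eq(1.0).float()
    neg = 1.0 - pos
    n0 = pos.sum().clamp(min=1.0)                              # number of feature points
    pos_term = ((1 - pred) ** alpha) * torch.log(pred + eps) * pos
    neg_term = ((1 - target) ** beta) * (pred ** alpha) * torch.log(1 - pred + eps) * neg
    return -(pos_term + neg_term).sum() / n0

def total_loss(heat_pred, heat_gt, off_pred, off_gt, cls_pred, cls_gt,
               w_off=1.0, w_cls=1.0):
    """Eq. (6): heat-map loss + weighted offset loss + weighted classification loss (weights = 1)."""
    l_heat = heatmap_focal_loss(heat_pred, heat_gt)
    # off_pred / off_gt: offsets gathered at the N0 positive positions, shape (N0, 2) -> Eq. (4)
    l_off = F.smooth_l1_loss(off_pred, off_gt)
    # cls_gt given as class indices, equivalent to the one-hot form of Eq. (5)
    l_cls = F.cross_entropy(cls_pred, cls_gt)
    return l_heat + w_off * l_off + w_cls * l_cls
```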
There is a correlation between the point prediction and the laser stripe classification. Specifically, if the classification result is predicted as "lap seam", the model identifies the two partitions in the heat map tensor with the highest prediction scores. These are then combined with the corresponding entries of the offset tensor to determine the final positioning prediction result. In this case, no suppression by a prediction score threshold needs to be considered. The heat map tensor is resized to match the shape of the original image so that it completely covers it; the colors in the heat map represent the prediction scores assigned to potential feature points. Fig. 5 shows a feature point recognition effect diagram for a broken laser stripe: although the two ends are very close to each other, the present method accurately recognizes both ends as potential feature points.
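A sketch of this decoding step: take the two highest-scoring partitions of the heat map and refine each with the corresponding offset entry. The tensor layouts are assumptions.

```python
import torch

def decode_top2(heat, offset, r=4):
    """heat: (H, W) scores; offset: (2, H, W) sub-cell offsets; returns two (x, y) points in image pixels."""
    h, w = heat.shape
    scores, idx = torch.topk(heat.flatten(), k=2)              # two highest-scoring partitions
    ys = torch.div(idx, w, rounding_mode="floor")
    xs = idx % w
    points = []
    for x, y in zip(xs, ys):
        dx, dy = offset[0, y, x], offset[1, y, x]              # fine adjustment within the partition
        points.append(((x.float() + dx) * r, (y.float() + dy) * r))
    return points, scores

points, scores = decode_top2(torch.rand(128, 128), torch.randn(2, 128, 128))
```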
The following verifies the method of this embodiment:
the data sets for training and testing the models are from the weld positioning and tracking sensors mentioned earlier. The model was trained on a computer equipped with an Nvidia 2080Ti GPU and an Intel Xeon (R) E5-2683 CPU.
The backbone layer is initialized with pre-trained MobileNetV3 weights to speed up training and achieve fast convergence. The FC layers are implemented using standard 1×1 convolutions, and the convolution layer parameters are initialized with the He initialization method. As for the BN layer parameters, the weights are initialized to 1 and the biases to 0. Adam is used as the optimizer, with the initial learning rate set to 5×10⁻⁴.
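These initialization and optimizer settings might look as follows; the small Sequential model is only a stand-in for the layers that are actually initialized this way.

```python
import torch
import torch.nn as nn

def init_weights(module):
    """He initialization for convolutions; BN weights set to 1 and biases to 0, as described above."""
    if isinstance(module, nn.Conv2d):
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.BatchNorm2d):
        nn.init.ones_(module.weight)
        nn.init.zeros_(module.bias)

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16))  # placeholder layers
model.apply(init_weights)
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)                  # initial learning rate
```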
For the welding task, the weld feature recognition model of this embodiment (Ours) predicts only one class per image, associated with all of its points. DETR can learn the correlation between the welding image category and the feature point category in the dataset, thereby preventing points in a single image from being erroneously classified into different categories, whereas YOLOv5n infers a category for each individual point.
In order to compare the performance of the models, the category of the point with the highest predicted classification confidence in each image is taken as the category of the whole image. To simplify the comparison and account for differences in sample numbers, the three models were evaluated on weld classification using the following three criteria:
$$P_w = \sum_{i=1}^{N} w_i P_i \qquad (7)$$

$$R_w = \sum_{i=1}^{N} w_i R_i \qquad (8)$$

$$F1_w = \sum_{i=1}^{N} w_i F1_i \qquad (9)$$

where $P_w$, $R_w$ and $F1_w$ represent the weighted average precision, recall and F1-score, respectively; $P_i$, $R_i$ and $F1_i$ are the class-i precision, recall and F1-score, respectively; and $w_i$ represents the weighting factor of the i-th class (1/N in this embodiment, N being the number of classes).
True positives (TP) denote the number of correctly predicted samples, false negatives (FN) the number of labeled boxes without a corresponding correct prediction box, and false positives (FP) the number of prediction boxes that do not meet the correct-prediction criteria; all of them can be calculated from the confusion matrix. The subscript i denotes class i; for example, $TP_i$ indicates the number of samples of class i that are predicted correctly.
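Given a confusion matrix, the weighted metrics of Eqs. (7)-(9) can be computed as in this sketch (equal class weights of 1/N, per the embodiment; the example matrix is made up).

```python
import numpy as np

def weighted_metrics(confusion):
    """confusion[i, j] = samples of true class i predicted as class j."""
    n = confusion.shape[0]
    tp = np.diag(confusion).astype(float)
    fp = confusion.sum(axis=0) - tp                    # predicted as i but actually another class
    fn = confusion.sum(axis=1) - tp                    # actually i but predicted as another class
    precision = tp / np.maximum(tp + fp, 1e-12)
    recall = tp / np.maximum(tp + fn, 1e-12)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    w = np.full(n, 1.0 / n)                            # weighting factor of each class
    return (w * precision).sum(), (w * recall).sum(), (w * f1).sum()

p_w, r_w, f1_w = weighted_metrics(np.array([[50, 2, 1], [3, 47, 2], [0, 1, 49]]))
```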
The index values of the three models on the adjusted test set are shown in Table 1. Figs. 6 (a)-6 (c) show the confusion matrices obtained by the different models on the adjusted test set, corresponding to the prediction results of the present embodiment model, YOLOv5n and DETR, respectively.
It should be noted that, among the three key indexes for evaluating the classification performance, the numerical value of the model of this embodiment is highest, the weighted average accuracy is 0.9674, the weighted average recall is 0.9666, and the weighted average F1 score is 0.9668.
TABLE 1 index values of three models
To evaluate the feature point positioning performance of the model of this embodiment at different noise levels, its noise immunity is assessed. The performance indices include the mean absolute error (MAE), the root mean square error (RMSE), the standard deviation $\sigma_f$, and the mean Euclidean distance $\rho_{mean}$ between the predicted coordinates and the label coordinates after projection into the camera coordinate system.
The positioning error of the feature points is calculated as follows according to the X-axis direction and the Y-axis direction of the image:
$$f_x = \left|x^{gt} - \hat{x}\right|, \qquad f_y = \left|y^{gt} - \hat{y}\right| \qquad (10)$$

where $(x^{gt}, y^{gt})$ and $(\hat{x}, \hat{y})$ represent the two-dimensional label coordinates and the two-dimensional predicted coordinates of a feature point, respectively.
MAE is used to evaluate the static accuracy of a position fix and its derivation is as follows:
$$MAE = \frac{1}{N}\sum_{k=1}^{N} f_k \qquad (11)$$

where N represents the total number of feature points in the specified test set, consistent with the definitions in the following equations. The root mean square error highlights the effect of predicted outliers and is derived as follows:
$$RMSE = \sqrt{\frac{1}{N}\sum_{k=1}^{N} f_k^{2}} \qquad (12)$$
The standard deviation $\sigma_f$ is used to assess the stability/robustness of the model and is derived as follows:
$$\sigma_f = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(f_k - \bar{f}\right)^{2}} \qquad (13)$$

where $\bar{f}$ is the average value of $f$ over all feature points.
After the two-dimensional positioning results are projected into three-dimensional space, the metric $\rho_{mean}$ is adopted to evaluate the error level of the three models. Its calculation formula is as follows:
$$\rho_{mean} = \frac{1}{N}\sum_{k=1}^{N}\left\|P_k^{gt} - \hat{P}_k\right\|_2 \qquad (14)$$

where $P_k^{gt}$ and $\hat{P}_k$ represent the three-dimensional label coordinates and three-dimensional predicted coordinates of the feature points, respectively; using the calibrated line-structured-light plane equation and the camera intrinsic matrix, they are computed from the two-dimensional label and predicted coordinates with respect to the camera coordinate system. This embodiment assumes a priori that the calibration results of the line-structured-light plane equation and the camera intrinsic matrix are ideal, so any error can be attributed only to the feature point positions predicted by the positioning model in the image coordinate system.
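A sketch of the positioning metrics of Eqs. (10)-(14); the back-projection onto the calibrated light plane is assumed to have been done already, so three-dimensional coordinates are passed in directly.

```python
import numpy as np

def positioning_metrics(pred_xy, label_xy, pred_xyz=None, label_xyz=None):
    """pred_xy/label_xy: (N, 2) pixel coordinates; pred_xyz/label_xyz: optional (N, 3) camera-frame coordinates."""
    f = np.abs(pred_xy - label_xy)                     # per-axis absolute errors, Eq. (10)
    mae = f.mean(axis=0)                               # Eq. (11), per axis
    rmse = np.sqrt((f ** 2).mean(axis=0))              # Eq. (12), per axis
    sigma_f = f.std(axis=0)                            # Eq. (13), per axis
    rho_mean = None
    if pred_xyz is not None and label_xyz is not None:
        rho_mean = np.linalg.norm(pred_xyz - label_xyz, axis=1).mean()   # Eq. (14)
    return mae, rmse, sigma_f, rho_mean

mae, rmse, sigma_f, rho_mean = positioning_metrics(np.random.rand(10, 2) * 512,
                                                   np.random.rand(10, 2) * 512)
```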
Figs. 7, 8 and 9 show the absolute error curves of the present embodiment model, YOLOv5n and DETR (Detection Transformer, a Transformer-based object detection model), respectively, representing the test results of the three models. The green curves correspond to the absolute errors measured on the low-noise test set, the blue curves to the absolute errors measured on the high-noise test set, and the red horizontal lines represent the mean absolute error (MAE).
Here, (a) and (b) of fig. 7 illustrate the feature point regression absolute errors in the X and Y directions of the present embodiment model evaluated on the low-noise test set, and (c) and (d) of fig. 7 show the corresponding errors on the high-noise test set. Figs. 8 (a) and (b) illustrate the feature point regression absolute errors in the X and Y directions of the YOLOv5n model on the low-noise test set, and figs. 8 (c) and (d) those on the high-noise test set. Figs. 9 (a) and (b) illustrate the feature point regression absolute errors in the X and Y directions of the DETR model on the low-noise test set, and figs. 9 (c) and (d) those on the high-noise test set. In each case, the peak value and peak position of the absolute error and the mean absolute error (MAE) are annotated.
All indices related to the feature point positioning errors have been calculated and are listed in Tables 2 and 3. The mean absolute error (MAE), root mean square error (RMSE) and standard deviation $\sigma_f$ are evaluated along the X and Y directions of the image, and their averages over the two directions are reported as average MAE, average RMSE and average $\sigma_f$. Tables 2 and 3 also list the results of the present example model without CCAM.
Table 2 statistical features of three models for low noise welded structure light stripe feature point positioning performance
In Table 2, Ours represents the present embodiment model, and Ours† represents the present embodiment model without CCAM. MAE, RMSE and $\sigma_f$ are all given in pixels, and $\rho_{mean}$ in millimeters.
It can be seen that the MAE and RMSE of the three models are substantially at the same level in the low-noise case. Notably, in terms of average $\sigma_f$, the present embodiment model with and without CCAM stands out compared with the other two models, reaching 1.943 pixels and 1.922 pixels, respectively. In terms of $\rho_{mean}$, the model of this example performs best, reaching 0.197 mm. Under low-noise test conditions, there is no significant difference between the variants with and without CCAM.
Table 3 experimental results of locating performance evaluation of weld feature points in a high noise test set
In Table 3, Ours represents the present embodiment model, and Ours† represents the present embodiment model without CCAM. MAE, RMSE and $\sigma_f$ are all given in pixels, and $\rho_{mean}$ in millimeters.
It can be seen that in a noisy environment, the average MAE, RMSE and $\rho_{mean}$ of the model of the present example are 1.736 pixels, 2.407 pixels and 0.205 millimeters, respectively. These indices are all better than those of YOLOv5n and DETR. Furthermore, compared with the present example model without CCAM, the CCAM-integrated model exhibits higher performance on most of the positioning indicators, in sharp contrast to the measurements observed on the low-noise test set. The average $\sigma_f$ value of the model of this example is 2.217 pixels, which indicates excellent stability in accurately locating weld feature points in high-noise weld images.
The lightweight effect of the model of this embodiment is verified as follows.
The lightweight nature of the model of this embodiment is evaluated using several metrics, including the total number of model parameters (Params), floating-point operations (FLOPs), mean latency, and frames per second (FPS). Params indirectly measures the computational complexity and memory utilization, while FLOPs represents the computational cost of the model. A lower mean latency (or higher FPS) means a shorter inference time, which is a key factor in achieving real-time performance for a seam tracking system employing the model; this is particularly important for complex weld tracking applications, welding speeds, and welding process control. The mean latency of the three models was determined by performing 100 tests with a single image, and the FPS value was calculated by dividing 1000 milliseconds by the mean latency. Table 4 provides an overview of the lightweight indices for the three models, namely the model of this example, YOLOv5n and DETR, and fig. 10 shows the predictions obtained by the different models on the test set.
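A sketch of this latency/FPS measurement procedure (100 single-image forward passes, FPS = 1000 / mean latency in milliseconds); the model passed in is only a placeholder.

```python
import time
import torch
import torch.nn as nn

@torch.no_grad()
def benchmark(model, runs=100, input_size=(1, 3, 512, 512)):
    """Average single-image latency over `runs` forward passes, plus FPS = 1000 / latency_ms."""
    model.eval()
    x = torch.randn(*input_size)
    model(x)                                            # warm-up pass
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    latency_ms = (time.perf_counter() - start) / runs * 1000.0
    return latency_ms, 1000.0 / latency_ms

latency, fps = benchmark(nn.Conv2d(3, 8, 3, padding=1))  # placeholder model for illustration
```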
Table 4 Lightweight index comparison of the three models on an Intel Core i7-8650U CPU
It can be seen that the model of this example is superior to the other models in all indices: its Params is 1.3M, corresponding to 72% of YOLOv5n and 3.5% of DETR, and its FLOPs is 0.87 GFLOPs, corresponding to 21% of YOLOv5n and 1.5% of DETR. With a mean latency of 29.32 milliseconds and an FPS of 34.11, the present embodiment model meets the stringent requirements of real-time weld seam tracking.
Example 2
In one or more embodiments, an anti-noise weld feature recognition system based on a lightweight neural network is disclosed, comprising:
the data acquisition module is used for acquiring welding image data in the welding process;
The weld joint prediction module is used for inputting the acquired welding image data into a trained weld joint feature recognition model and outputting the type and position information of the weld joint;
the weld feature recognition model comprises a backbone network, a neck layer and a head layer. The backbone network outputs feature information of different scales and transmits it to the neck layer; the neck layer fuses the feature information of different scales, and the fused feature maps are input into the head layer. The head layer comprises three branches, namely a heat map branch, an offset branch and a classification branch: the heat map branch outputs a heat map tensor, in which each feature value represents the probability that the corresponding region contains a feature point; the offset branch outputs an offset tensor giving the positional deviation of the feature point in each dimension; and the classification branch outputs a classification tensor giving the corresponding score of each weld type. Finally, the position of the weld and the weld type are predicted.
The specific implementation manner of each module is the same as that in the first embodiment, and will not be described in detail.
Example 3
In one or more embodiments, a terminal device is disclosed, which includes a server comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the program, the processor implements the anti-noise weld feature recognition method based on a lightweight neural network of embodiment one.
Example 4
In one or more embodiments, a computer-readable storage medium is disclosed, in which are stored a plurality of instructions adapted to be loaded by a processor of a terminal device and to perform the lightweight neural network-based anti-noise weld feature identification method described in embodiment one.
While the foregoing description of the embodiments of the present invention has been presented in conjunction with the drawings, it should be understood that it is not intended to limit the scope of the invention, but rather, it is intended to cover all modifications or variations within the scope of the invention as defined by the claims of the present invention.