
CN118674985B - Noise-resistant weld feature recognition method and system based on lightweight neural network - Google Patents


Info

Publication number
CN118674985B
Authority
CN
China
Prior art keywords
layer
weld
feature
output
branch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410804228.XA
Other languages
Chinese (zh)
Other versions
CN118674985A (en)
Inventor
杜付鑫
苏富康
董宗峰
张开旭
田连发
何为凯
陈超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Shandong Jiaotong University
MH Robot and Automation Co Ltd
Original Assignee
Shandong University
Shandong Jiaotong University
MH Robot and Automation Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University, Shandong Jiaotong University, and MH Robot and Automation Co Ltd
Priority claimed from CN202410804228.XA
Publication of CN118674985A
Application granted
Publication of CN118674985B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0004 Industrial image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30108 Industrial image inspection
    • G06T 2207/30152 Solder
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract


The present invention relates to the field of weld recognition technology and specifically discloses a noise-resistant weld feature recognition method and system based on a lightweight neural network. The method includes: acquiring welding image data during the welding process; and inputting the acquired welding image data into a trained weld feature recognition model, which outputs the type and position information of the weld. The weld feature recognition model includes a backbone network, a neck layer, and a head layer. The backbone network outputs feature information at different scales and transmits it to the neck layer; the neck layer fuses the multi-scale feature information, and the fused feature map is input into the head layer. The head layer includes three branches, namely a heat map branch, an offset branch, and a classification branch, and finally predicts the position and type of the weld. The present invention accurately classifies and locates weld feature points using single-line structured-light vision while reducing the parameters and computation of the model, meeting the real-time requirements of welding tracking systems in the industrial field.

Description

Noise-resistant weld joint feature recognition method and system based on lightweight neural network
Technical Field
The invention relates to the technical field of weld feature recognition, and in particular to an anti-noise weld feature recognition method and system based on a lightweight neural network.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
With the wide application of robots in industrial manufacturing, teach-and-playback welding has been developed. Traditional robot welding relies on a worker to manually set the position trajectory of the welding gun; it is inefficient and is susceptible to time-varying errors of the workpiece such as thermal deformation. In order to track the weld in real time and enable automatic adjustment of the welding robot, researchers have designed advanced sensors to acquire the spatial position and category information of the weld. With the rapid development of machine vision, structured-light vision has been widely used in intelligent robots owing to its high accuracy and the rich welding-process feature information it provides.
The single-line structured-light projection stripes of different welds vary widely in appearance. This variation is valuable for obtaining the position of weld feature points and also facilitates classification of the stripes, which is critical for effectively adjusting welding process parameters. However, noise common to welding processes, including spatter, smoke, and arc-reflected light, can be confused with the stripes captured by industrial cameras. Morphology-based methods have been proposed to extract weld features, but they share two common problems. First, feature point (or region) extraction and weld classification typically require two separate stages. Second, although these models have a certain noise immunity, their tracking accuracy drops when noise remains continuously high.
With the development of deep learning in computer vision, object detection and semantic segmentation have improved markedly, driven by better hardware on computing equipment and by Convolutional Neural Network (CNN) models. These techniques are often transferred to seam feature extraction with line structured light to build efficient, noise-resistant feature extraction models. For example, a YOLO-WELD model built on the YOLO framework has been used to detect weld feature points, and methods combining a conditional generative adversarial network (CGAN) with an improved CNN model have been used to extract multi-layer multi-pass weld feature points.
However, these neural-network-based methods demand substantial computing resources, which in turn requires higher computational performance from the central control device integrated into the welding robot system. Industrial computers typically have no independent GPU computing unit and therefore lack GPU acceleration support, so such methods fail to meet the real-time requirements of welding.
Disclosure of Invention
In order to solve the above problems, the invention provides an anti-noise weld feature recognition method and system based on a lightweight neural network. The design has strong anti-interference capability while maintaining a high inference rate, so as to meet the real-time requirements of welding tracking systems in the industrial field.
In some embodiments, the following technical scheme is adopted:
an anti-noise weld feature recognition method based on a lightweight neural network, comprising:
acquiring welding image data in a welding process;
inputting the obtained welding image data into a trained welding line characteristic recognition model, and outputting welding line type and position information;
The weld feature recognition model comprises a backbone network, a neck layer and a head layer. The backbone network outputs feature information of different scales and transmits it to the neck layer; the neck layer fuses the multi-scale feature information, and the fused feature maps are input into the head layer. The head layer comprises three branches, namely a heat map branch, an offset branch and a classification branch: the heat map branch outputs a heat map tensor giving the probability that each region contains a feature point; the offset branch outputs an offset tensor giving the position deviation of the feature point in each dimension; the classification branch outputs a classification tensor giving the score of each weld type. The position of the weld and the weld type are finally predicted.
In other embodiments, the following technical solutions are adopted:
an anti-noise weld feature recognition system based on a lightweight neural network, comprising:
the data acquisition module is used for acquiring welding image data in the welding process;
The weld joint prediction module is used for inputting the acquired welding image data into a trained weld joint feature recognition model and outputting the type and position information of the weld joint;
The weld feature recognition model comprises a backbone network, a neck layer and a head layer. The backbone network outputs feature information of different scales and transmits it to the neck layer; the neck layer fuses the multi-scale feature information, and the fused feature maps are input into the head layer. The head layer comprises three branches, namely a heat map branch, an offset branch and a classification branch: the heat map branch outputs a heat map tensor giving the probability that each region contains a feature point; the offset branch outputs an offset tensor giving the position deviation of the feature point in each dimension; the classification branch outputs a classification tensor giving the score of each weld type. The position of the weld and the weld type are finally predicted.
In other embodiments, the following technical solutions are adopted:
A terminal device comprises a processor and a memory. The processor is used to execute instructions, and the memory is used to store a plurality of instructions, the instructions being adapted to be loaded by the processor to perform the above anti-noise weld feature recognition method based on a lightweight neural network.
In other embodiments, the following technical solutions are adopted:
A computer readable storage medium having stored therein a plurality of instructions adapted to be loaded by a processor of a terminal device and to perform the above-described anti-noise weld feature recognition method based on a lightweight neural network.
Compared with the prior art, the invention has the beneficial effects that:
(1) The invention constructs a lightweight neural network based on MobileNetV3 that accurately classifies and locates weld feature points using single-line structured-light vision while reducing the parameters and computation of the model. The model can therefore be deployed on computing platforms without GPU acceleration, such as an Embedded Industrial Computer (EIC), while maintaining a high inference rate to meet the real-time requirements of welding tracking systems in the industrial field.
(2) The neural network structure uses DP blocks (depthwise separable convolution) to replace the standard convolution layers in the neck layer and the head layer. A DP block decomposes the standard convolution into a depthwise convolution and a pointwise convolution, which significantly reduces the number of parameters and the computation of the model. The memory occupied at run time is correspondingly reduced, training and inference are generally faster, and the model adapts more easily to different hardware platforms, achieving better lightweight performance, meeting real-time requirements and reducing deployment cost.
(3) The neural network adds a Cascade Channel Attention Module (CCAM) to the neck layer. The CCAM filters noise, and its cascaded grouping structure reduces computation with almost no loss of accuracy.
(4) The head layer outputs a heat map tensor, an offset tensor and a classification tensor. The model first identifies the two partitions with the highest prediction scores in the heat map tensor, then combines them with the corresponding entries of the offset tensor to determine the final weld localisation result. The heat map tensor represents the position of the weld in the image, while the offset tensor provides finer position adjustment; combining the two locates the weld more accurately.
Additional features and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
FIG. 1 is a schematic diagram of a weld feature recognition model in an embodiment of the invention;
FIG. 2 is a schematic diagram of a DP block structure in an embodiment of the invention;
FIG. 3 (a) is an original schematic diagram of a feature pyramid network structure according to an embodiment of the present invention;
FIG. 3 (b) is a schematic diagram of an improved structure of a feature pyramid network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the cascade channel attention module according to an embodiment of the present invention;
FIG. 5 is a diagram showing the recognition effect of feature points after optimizing the pooling filter according to the embodiment of the present invention;
FIG. 6 (a) is a confusion matrix obtained by the model of the embodiment of the invention on the adjusted test set;
FIG. 6 (b) is a confusion matrix obtained by the YOLOv5n model on the adjusted test set;
FIG. 6 (c) is a confusion matrix obtained by the DETR model on the adjusted test set;
FIG. 7 is an absolute value error curve of the weld feature recognition model in the X and Y directions evaluated on the low noise and high noise test sets, respectively, wherein (a) is the regression absolute value error of the feature point in the X direction of the present embodiment model evaluated on the low noise test set, (b) is the regression absolute value error of the feature point in the Y direction of the present embodiment model evaluated on the low noise test set, (c) is the regression absolute value error of the feature point in the X direction of the present embodiment model tested on the high noise test set, (d) is the regression absolute value error of the feature point in the Y direction of the present embodiment model tested on the high noise test set;
FIG. 8 is an absolute value error curve of the YOLOv5n model in the X and Y directions evaluated on the low-noise and high-noise test sets, respectively, wherein (a) is the feature point regression absolute error of the YOLOv5n model in the X direction evaluated on the low-noise test set, (b) is the feature point regression absolute error of the YOLOv5n model in the Y direction evaluated on the low-noise test set, (c) is the feature point regression absolute error of the YOLOv5n model in the X direction tested on the high-noise test set, and (d) is the feature point regression absolute error of the YOLOv5n model in the Y direction tested on the high-noise test set;
FIG. 9 is an absolute value error curve of the X and Y directions of the DETR model evaluated on the low noise and high noise test sets, respectively, wherein (a) is the regression absolute value error of the feature point of the DETR model evaluated on the low noise test set in the X direction, (b) is the regression absolute value error of the feature point of the DETR model evaluated on the low noise test set in the Y direction, (c) is the regression absolute value error of the feature point of the DETR model tested on the high noise test set in the X direction, (d) is the regression absolute value error of the feature point of the DETR model tested on the high noise test set in the Y direction;
FIG. 10 is a partial prediction obtained on a test set for three different models.
Detailed Description
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Example 1
In one or more embodiments, an anti-noise weld feature recognition method based on a lightweight neural network is disclosed, comprising the following steps:
(1) Acquiring welding image data in a welding process;
(2) Inputting the obtained welding image data into a trained welding line characteristic recognition model, and outputting welding line type and position information;
The weld feature recognition model comprises a backbone network, a neck layer and a head layer. The backbone network outputs feature information of different scales and transmits it to the neck layer; the neck layer fuses the multi-scale feature information, and the fused feature maps are input into the head layer. The head layer comprises three branches, namely a heat map branch, an offset branch and a classification branch: the heat map branch outputs a heat map tensor in which each value represents the probability that the corresponding region contains a feature point; the offset branch outputs an offset tensor giving the position deviation of the feature point in each dimension; the classification branch outputs a classification tensor giving the score of each weld type. The model finally predicts the position of the weld feature points and the weld type (e.g., Y-type weld, lap weld, butt weld).
Specifically, the backbone network extracts feature information from the image through stacked neural network layers and passes features of different scales and extraction depths to the neck layer. To improve processing speed and allow the network to run well on non-GPU devices, this embodiment uses the lightweight MobileNetV3 architecture as the backbone, which effectively reduces the number of network parameters. MobileNetV3 outputs activation features at different levels, and these are fed to the neck layer for subsequent image processing.
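As an illustration of how multi-scale features can be tapped from such a backbone, the following is a minimal PyTorch sketch (not the patent's code). It assumes torchvision's MobileNetV3-Small; the split indices for the 8x/16x/32x strides are assumptions for this sketch, since the patent only states that features at downsampling strides {8, 16, 32} are used.

```python
# Sketch: extract three scales from a MobileNetV3 backbone (assumed torchvision >= 0.13).
import torch
from torch import nn
from torchvision.models import mobilenet_v3_small

class MobileNetV3Backbone(nn.Module):
    def __init__(self):
        super().__init__()
        feats = mobilenet_v3_small(weights="DEFAULT").features
        # Assumed split points for strides 8 / 16 / 32 (indices are illustrative).
        self.stage8 = feats[:4]
        self.stage16 = feats[4:9]
        self.stage32 = feats[9:]

    def forward(self, x):
        c3 = self.stage8(x)    # stride 8  -> first lateral branch
        c4 = self.stage16(c3)  # stride 16 -> second lateral branch
        c5 = self.stage32(c4)  # stride 32 -> deepest scale, fed to the CCAM
        return c3, c4, c5

if __name__ == "__main__":
    c3, c4, c5 = MobileNetV3Backbone()(torch.randn(1, 3, 512, 512))
    print(c3.shape, c4.shape, c5.shape)
```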
In the neck layer, a Feature Pyramid Network (FPN) structure is applied to fuse feature information of different scales. Each stage of the backbone network generates feature maps with different resolutions and semantic levels: shallow layers mainly capture morphological features of the image and are more sensitive to position information, whereas deep layers mainly capture semantic features and are more sensitive to information such as the weld category and the recognition of noise. This embodiment therefore draws three branches from the backbone network. As shown in Fig. 1, the FPN merges the backbone layers whose downsampling strides relative to the input image are {8, 16, 32}, integrating deep abstract semantic information with coarse-grained shallow features to obtain better small-scale feature detection capability.
With reference to fig. 1, the specific structure of the neck layer of this embodiment includes:
The characteristic information of the deepest scale output by the backbone network sequentially passes through the cascade channel attention module, the third convolution layer and the self-adaptive pooling layer and then is output to the classification branch of the head layer;
after the characteristic information of the second scale output by the backbone network passes through the second convolution layer, the characteristic information is combined with the output of the up-sampled third convolution layer and then output to the first DP module;
and after passing through the first convolution layer, the characteristic information of the first scale output by the backbone network is combined with the output of the up-sampled first DP module, and is respectively transmitted to a heat map branch and an offset branch of the head layer through the second DP module.
In this embodiment, the DP block is designed to replace part of the standard 3×3 convolution blocks in the neck layer to achieve better lightweight performance. The DP block consists of a depthwise convolution layer (DWC), a first BN (batch normalization) layer, an activation function layer, a pointwise convolution layer (PWC) and a second BN layer connected in sequence. Spatial feature information is acquired by the depthwise convolution layer, and on top of the spatial features it extracts, the pointwise convolution layer combines them, realising their fusion. After each convolution, the BN layer normalises each feature channel of each mini-batch to a mean close to 0 and a variance close to 1, reducing internal covariate shift. After the DWC+BN processing, a ReLU6 activation function, which is friendly to low-dimensional feature information, is added; it introduces nonlinearity into the network so that it can learn more complex tasks and function mappings, increases the expressive power and flexibility of the model, alleviates the vanishing-gradient problem, stabilises training and accelerates convergence.
In this embodiment, the spatial feature information is obtained through the deep convolution layer, where the spatial feature information mainly refers to the geometric shape and structure of the weld in the image, including the width, length, shape (straight line, curve, branch) and the like of the weld, and some feature information such as the boundary between the weld and surrounding materials. Channel information, i.e. the association between different feature maps, can be obtained by a point-wise convolution layer.
By introducing the DP block, the standard convolution operation is decomposed into a depthwise convolution and a pointwise convolution, which significantly reduces the number of parameters and the computation of the model, in turn reducing the memory occupied at run time and thus lowering the performance requirements on mobile and embedded deployment devices.
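A minimal PyTorch sketch of the DP block as described above (DWC, BN, ReLU6, PWC, BN); the channel counts are placeholders rather than values from the patent. For a 3×3 kernel, a standard convolution needs about 9·Cin·Cout weights, while this block needs about 9·Cin + Cin·Cout, which is where the parameter saving comes from.

```python
import torch
from torch import nn

class DPBlock(nn.Module):
    """Depthwise-separable (DP) block: DWC -> BN -> ReLU6 -> PWC -> BN."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.dwc = nn.Conv2d(in_ch, in_ch, kernel_size, padding=kernel_size // 2,
                             groups=in_ch, bias=False)      # per-channel spatial filtering
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.act = nn.ReLU6(inplace=True)                   # low-dimension-friendly activation
        self.pwc = nn.Conv2d(in_ch, out_ch, 1, bias=False)  # 1x1 conv mixes channel information
        self.bn2 = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return self.bn2(self.pwc(self.act(self.bn1(self.dwc(x)))))
```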
The architecture of the neck layer is also improved. Compared with the original structure shown in Fig. 3(a), the improved top-down path and lateral connections in this embodiment use concatenation instead of addition, as shown in Fig. 3(b): the features of each input branch are directly combined by a Concat operation. Adjusting the channels in this way helps combine the feature map output by the PWC with the upsampled, coarser-resolution feature map that carries stronger semantic information. Finally, the neck layer generates a feature map of size 128×128×64, which is then input into the head layer.
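The following is a hedged PyTorch sketch of this concatenation-based wiring. It reuses the DPBlock from the sketch above; the channel sizes, the nearest-neighbour upsampling, and the final upsampling to the 128×128 resolution used by the heads are assumptions, since the patent specifies only the branch wiring and the output size. The nn.Identity stand-in marked below would be replaced by the CCAM sketched after the CCAM description further on.

```python
import torch
from torch import nn
import torch.nn.functional as F

class Neck(nn.Module):
    """Concat-based top-down fusion of Fig. 3(b); channel sizes are illustrative."""
    def __init__(self, c3, c4, c5, mid=64):
        super().__init__()
        self.ccam  = nn.Identity()               # stand-in for the CCAM sketched below
        self.conv3 = nn.Conv2d(c5, mid, 3, padding=1)
        self.conv2 = nn.Conv2d(c4, mid, 3, padding=1)
        self.conv1 = nn.Conv2d(c3, mid, 3, padding=1)
        self.dp1   = DPBlock(2 * mid, mid)       # from the DP block sketch above
        self.dp2   = DPBlock(2 * mid, mid)
        self.pool  = nn.AdaptiveAvgPool2d(1)     # adaptive pooling before the classification branch

    def forward(self, f1, f2, f3):               # strides 8, 16, 32
        p3 = self.conv3(self.ccam(f3))
        cls_feat = self.pool(p3)                 # to the classification branch
        p2 = self.dp1(torch.cat([self.conv2(f2),
                                 F.interpolate(p3, scale_factor=2, mode="nearest")], dim=1))
        p1 = self.dp2(torch.cat([self.conv1(f1),
                                 F.interpolate(p2, scale_factor=2, mode="nearest")], dim=1))
        # Assumed final upsampling from stride 8 to the stride-4 (128x128) map used by the heads.
        p1 = F.interpolate(p1, scale_factor=2, mode="nearest")
        return p1, cls_feat                      # p1 feeds the heat map and offset branches
```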
In addition, in the neck layer, the network attaches a custom-designed Cascade Channel Attention Module (CCAM) to the 32× downsampled output in order to enhance the noise immunity of the model.
The deepest backbone output received by the neck layer has shape H_in×W_in×C_in, specifically 16×16×512. A significant portion of these features is associated with noise; these features would be merged in the neck layer and ultimately affect the output of the head layer. To mitigate the effect of noise, the CCAM is employed to filter it. As shown in Fig. 4, the CCAM divides the deepest-scale feature information output by the backbone network into three blocks, each of which is channel-weighted by a channel attention module, and the channel-weighted outputs of the three blocks are concatenated to obtain the output of the CCAM. The output of the first block after its channel attention module is combined with the second block, and the output of the second block after its channel attention module is combined with the third block.
The CCAM uses a Channel Attention (CA) module as its core feature selector. A global average pooling layer squeezes the input feature map into a vector, which is sent to a first fully-connected (FC) layer for feature dimension reduction and then to a second FC layer; the resulting vector is mapped into the range 0 to 1 by a softmax function, and the input feature map is weighted channel-wise through a channel multiplication operator. The higher the weight of a channel, the more important the features in that channel. This embodiment applies a softmax function to the FC-generated vector instead of a sigmoid function to the entire feature map; this replacement reduces computation with little loss of accuracy. The refined feature map F_CA of the CA module is computed as:
F_CA = softmax(W_2 · δ(W_1 · GAP(z))) ⊗ z        (1)
where z represents the input feature of one group, GAP(·) denotes global average pooling, δ(·) the activation between the two FC stages, ⊗ channel-wise multiplication, W_1 and W_2 the parameters of the two FC stages, r the reduction ratio (16 in this network), C the number of input channels of the CCAM, and g the number of groups into which the input tensor is divided, so that each group contains C/g channels.
In the CCAM, the output of the first CA block is combined with the second block by element-wise addition to enhance information fusion among channels. With the cascaded grouping structure, the computation generated by the FC layers is approximately 2C²/(g·r); if channel attention were applied directly to the whole tensor without grouping, the FC computation would be approximately 2C²/r. The cascaded grouping therefore reduces the FC computation to roughly 1/g of the ungrouped case.
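A hedged PyTorch sketch of the CCAM as described above. It assumes the channel count divides evenly by the number of groups (the 512-channel input would need a small channel adjustment first), implements the FC stages as 1×1 convolutions, and assumes a ReLU between the two FC stages and a reduce-then-restore layout in the CA module, which the patent text does not spell out.

```python
import torch
from torch import nn

class ChannelAttention(nn.Module):
    """CA: GAP -> FC (reduce by r) -> ReLU -> FC -> softmax -> channel reweighting."""
    def __init__(self, ch, r=16):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.fc1 = nn.Conv2d(ch, ch // r, 1)      # FC stages as 1x1 convolutions
        self.fc2 = nn.Conv2d(ch // r, ch, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        w = self.fc2(self.act(self.fc1(self.gap(x))))   # (N, ch, 1, 1)
        w = torch.softmax(w, dim=1)                      # softmax over channels, not sigmoid
        return x * w

class CCAM(nn.Module):
    """Cascade Channel Attention Module: split channels into g groups, cascade CA across groups."""
    def __init__(self, ch, groups=3, r=16):
        super().__init__()
        assert ch % groups == 0, "sketch assumes the channel count divides evenly"
        self.groups = groups
        self.cas = nn.ModuleList([ChannelAttention(ch // groups, r) for _ in range(groups)])

    def forward(self, x):
        chunks = torch.chunk(x, self.groups, dim=1)
        outs, prev = [], None
        for i, ca in enumerate(self.cas):
            inp = chunks[i] if prev is None else chunks[i] + prev  # add previous CA output
            prev = ca(inp)
            outs.append(prev)
        return torch.cat(outs, dim=1)              # concatenate the weighted groups
```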
In the head layer, the positions of the feature points and the weld type are predicted. For each weld image the weld type is single and the key-point types are the same, so this embodiment designs a heat map branch, an offset branch and an independent classification branch, as shown in Fig. 1. The heat map branch outputs a heat map tensor, a 128×128×1 feature map in which each value represents the probability that the corresponding region contains a feature point. The offset branch outputs an offset tensor, a 128×128×2 displacement map in which each dimension gives the deviation of the precise feature point position relative to the upper-left corner of the region, eliminating the quantisation error of the heat map prediction alone. The classification branch outputs a classification tensor; it is redesigned to be drawn from the deepest layer of the backbone network and, through convolution, pooling and other operations, finally generates a 1×3 classification feature vector giving the scores of the three weld categories, from which the weld type is judged.
The heat map tensor of the present embodiment is generally used to represent the position of the weld in the image, and the offset tensor may provide finer positional adjustment, and in combination with both information, may more accurately position the weld. The classification tensor may provide information about the type of weld, in combination with the heat map and offset tensor, which may identify the type of weld while the weld is being localized, the use of the heat map tensor and offset tensor improves the interpretability of the model predictions, helping to understand how the model makes predictions.
By considering multiple partitions (i.e., different regions in the heat map tensor) and selecting the first two with the highest score, the model can better handle discontinuities in the weld or noise in the image, reducing the likelihood of erroneously identifying non-weld regions as welds. If the highest scoring partition in the heat map is inaccurate due to some interference, the next highest scoring partition is considered as an alternative, thereby improving the overall system's fault tolerance to mispredictions. Because the offset tensor can fine tune the predictions in the heat map, the model is allowed to adapt to welds of different shapes and sizes.
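The following is a small PyTorch sketch of the three head branches just described. It reuses the DPBlock from the sketch above; the intermediate channel counts and the sigmoid on the heat map are assumptions, and only the output shapes (128×128×1, 128×128×2, 1×3) follow the text.

```python
import torch
from torch import nn

class Head(nn.Module):
    def __init__(self, feat_ch=64, cls_ch=64, num_classes=3):
        super().__init__()
        self.heatmap = nn.Sequential(DPBlock(feat_ch, feat_ch),
                                     nn.Conv2d(feat_ch, 1, 1), nn.Sigmoid())   # per-region probability
        self.offset  = nn.Sequential(DPBlock(feat_ch, feat_ch),
                                     nn.Conv2d(feat_ch, 2, 1))                 # sub-region deviation
        self.cls     = nn.Sequential(nn.Flatten(), nn.Linear(cls_ch, num_classes))

    def forward(self, p1, cls_feat):
        # p1: (N, feat_ch, 128, 128) fused map; cls_feat: (N, cls_ch, 1, 1) pooled deepest feature.
        return self.heatmap(p1), self.offset(p1), self.cls(cls_feat)
```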
The specific loss function of the weld feature recognition model of this embodiment is as follows:
The head layer generates three outputs: a heat map tensor M̂ ∈ R^{(W/R)×(H/R)×1}, an offset tensor Ô ∈ R^{(W/R)×(H/R)×2}, and a classification tensor (a one-hot encoding) Ĉ ∈ R^{1×c}, where W and H are the input image sizes of the model, c is the total number of weld types, and R is the feature map scaling factor of the corresponding head branch.
Throughout training, the loss is determined by comparing the predicted values generated by the head layer with the ground-truth label values; the shape of the label values must match the shape of the predictions. This comparison aims to optimise the neural network parameters so that the model outputs the best scores or regression predictions. The label generation problem must therefore be solved first.
Label heat map: a label heat map represents the position of an object in an image by mapping the label (category) information onto each pixel in colour-coded form, producing a heat map of the same size as the original image. Different colours represent different labels or categories, so the category of each pixel can be shown intuitively.
The heat map tensor output in this embodiment corresponds to such a label heat map, where each value represents the probability that the corresponding pixel belongs to a certain class. The coordinates of the k-th feature point p_k = (x_k, y_k), after mapping to the corresponding partition of the heat map label, become p̃_k = ⌊p_k / R⌋. The label value of the k-th feature point coordinate is related to the size of the model input image (the input size of the algorithm is 512×512), and R = 4.
On the label heat map, positive sample points are spread with a Gaussian kernel, M_{xy} = exp(−((x − x̃_k)² + (y − ỹ_k)²) / (2σ_p²)), a form that accelerates the convergence of the model, where σ_p is a factor that varies with the size of the object and (x̃_k, ỹ_k) is the centre of the Gaussian circle, i.e., the ideal positive position of the feature point mapped onto the heat map.
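A short NumPy sketch of how such label heat maps and offset labels could be generated, following the description above and Eq. (2) below. The fixed sigma is an assumption for the sketch; the patent states that the Gaussian factor varies with object size.

```python
import numpy as np

def make_labels(points, heat_size=128, stride=4, sigma=2.0):
    """points: list of (x, y) feature-point coordinates in the 512x512 input image."""
    heat = np.zeros((heat_size, heat_size), dtype=np.float32)
    offset = np.zeros((2, heat_size, heat_size), dtype=np.float32)
    ys, xs = np.mgrid[0:heat_size, 0:heat_size]
    for x, y in points:
        cx, cy = int(x // stride), int(y // stride)            # positive partition, R = 4
        g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
        heat = np.maximum(heat, g)                             # centre cell gets exactly 1
        offset[0, cy, cx] = x / stride - cx                    # Eq. (2): sub-partition offset
        offset[1, cy, cx] = y / stride - cy
    return heat, offset
```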
In view of the rounding effect inherent in the downsampling, relying solely on the heat map to recover the position of the feature points would reduce accuracy. To solve this problem, an offset is introduced. The offset label value of each feature point is expressed as:
O_k = (x_k/R − ⌊x_k/R⌋, y_k/R − ⌊y_k/R⌋)        (2)
where O_k is a two-dimensional coordinate representing the relative position of the k-th feature point inside its corresponding partition of the label heat map. The offset is therefore represented by a two-channel tensor. Furthermore, an offset tensor entry is only valid in the bin corresponding to a positive position in the heat map tensor.
Let M̂_ij be the prediction score of the model at position (i, j) of the heat map, i.e., the probability or confidence that the model assigns to a feature point centre being located at (i, j). A variant of the focal loss is used to optimise the parameters that generate the heat map:
L_hm = −(1/N_0) Σ_{i=1}^{H} Σ_{j=1}^{W} { (1 − M̂_ij)^α ln(M̂_ij),  if M_ij = 1;  (1 − M_ij)^β (M̂_ij)^α ln(1 − M̂_ij),  otherwise }        (3)
where M_ij is the true label at position (i, j), representing the probability that a feature point centre appears at that position; training against M_ij helps the model learn to predict object locations. N_0 is the number of feature points in the structured-light fringe image, H and W denote the spatial size of the heat map, α fine-tunes the weights of challenging and easily located points, and β controls the weight of the non-central values within the Gaussian circle. Notably, as the distance between a label position and the centre of the Gaussian circle increases, the penalty weight assigned to the prediction score of the corresponding partition also increases.
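A PyTorch sketch of the focal-loss variant reconstructed in Eq. (3). The values alpha = 2 and beta = 4 are common defaults and are assumptions here, not values given in the patent.

```python
import torch

def heatmap_focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-6):
    """pred, gt: (N, 1, H, W); gt is the Gaussian label heat map with 1 at positive centres."""
    pred = pred.clamp(eps, 1 - eps)
    pos = gt.eq(1).float()
    neg = 1.0 - pos
    pos_loss = ((1 - pred) ** alpha) * torch.log(pred) * pos
    neg_loss = ((1 - gt) ** beta) * (pred ** alpha) * torch.log(1 - pred) * neg
    num_pos = pos.sum().clamp(min=1.0)           # N_0: number of feature points
    return -(pos_loss.sum() + neg_loss.sum()) / num_pos
```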
Offset prediction is by nature a regression task. To achieve accurate offset prediction, the following smooth L1 loss function is used:
L_off = (1/N_0) Σ_{k=1}^{N_0} SmoothL1(Ô_k − O_k),  where SmoothL1(x) = 0.5x² if |x| < 1, and |x| − 0.5 otherwise        (4)
where N_0 is the number of feature points in the structured-light stripe image, O_k is the offset label value of each feature point, and Ô_k is the offset prediction. The smooth L1 loss combines the advantages of the L1 and L2 losses: it prevents gradient explosion at the initial stage of training and yields gentler gradients during back-propagation towards the end of training, which greatly improves the convergence of the model.
Cross-entropy loss is used for the weld classification task. Let ŷ_i denote the score of class i predicted by the model for the structured-light stripe image, and let the Boolean variable y_i denote the label value of class i in the one-hot encoding. The classification loss is expressed by the following formula:
L_cls = −Σ_{i=1}^{c} y_i ln(ŷ_i)        (5)
where c represents the total number of weld types. Finally, the total loss L is the overall training objective, derived as follows:
L = L_hm + λ_off L_off + λ_cls L_cls        (6)
where L_hm, L_off and L_cls are the heat map loss, the offset loss and the classification loss, and λ_off and λ_cls are the constant weights assigned to the offset loss and the classification loss, both configured as 1.
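A sketch of how the total loss of Eq. (6) could be assembled in PyTorch, reusing the heatmap_focal_loss sketch above and standard smooth L1 and cross-entropy losses; the masking convention for the offset term is an assumption consistent with the statement that offsets are only valid at positive heat map positions.

```python
import torch
import torch.nn.functional as F

def total_loss(pred_heat, gt_heat, pred_off, gt_off, off_mask, pred_cls, gt_cls,
               w_off=1.0, w_cls=1.0):
    """off_mask: (N, 1, H, W), 1 at partitions that contain a feature point."""
    l_hm = heatmap_focal_loss(pred_heat, gt_heat)                  # Eq. (3), sketched above
    n_pos = off_mask.sum().clamp(min=1.0)                          # N_0
    l_off = (F.smooth_l1_loss(pred_off, gt_off, reduction="none")  # Eq. (4): offsets count
             * off_mask).sum() / n_pos                             # only at positive bins
    l_cls = F.cross_entropy(pred_cls, gt_cls)                      # Eq. (5): weld-type loss
    return l_hm + w_off * l_off + w_cls * l_cls                    # Eq. (6), weights = 1
```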
There is a correlation between the point prediction and the laser stripe classification. Specifically, if the classification result is predicted as "lap seam", the model identifies the first two partitions in the heat map tensor that have the highest predictive score. These data are then combined with the corresponding segments on the offset tensor to determine the final positioning prediction result. In this case, the suppression of the predictive score threshold need not be considered. However, the heat map tensor is resized to match the uniform shape of the original image, completely covering the original image. The colors in the heat map represent the prediction scores assigned to the potential feature points. Fig. 5 shows a feature point recognition effect diagram, which shows a broken laser stripe, and the implementation method can accurately recognize two ends as potential feature points despite the fact that the two ends are very close.
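A small sketch of the decoding step described in the preceding paragraph: take the two highest-scoring heat map partitions and refine each with the corresponding offset entry. The batch-of-one assumption and the stride value R = 4 follow the description; everything else is illustrative.

```python
import torch

def decode_top2(heatmap, offset, stride=4):
    """heatmap: (1, 1, H, W); offset: (1, 2, H, W). Returns two refined (x, y, score) tuples."""
    _, _, H, W = heatmap.shape
    scores, idx = heatmap.view(-1).topk(2)       # the two best partitions
    ys, xs = idx // W, idx % W
    points = []
    for x, y, s in zip(xs, ys, scores):
        dx = offset[0, 0, y, x]                  # sub-partition refinement
        dy = offset[0, 1, y, x]
        points.append(((x + dx).item() * stride, (y + dy).item() * stride, s.item()))
    return points
```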
The following verifies the method of this embodiment:
the data sets for training and testing the models are from the weld positioning and tracking sensors mentioned earlier. The model was trained on a computer equipped with an Nvidia 2080Ti GPU and an Intel Xeon (R) E5-2683 CPU.
The backbone is initialised with pre-trained MobileNetV3 weights to speed up training and achieve fast convergence. The FC layers are implemented with standard 1×1 convolutions, and the convolution layer parameters are initialised with the He initialisation method. For the BN layer parameters, the weight is initialised to 1 and the bias to 0. Adam is used as the optimiser, with an initial learning rate of 5×10⁻⁴.
For a welding task, the weld feature recognition model of this embodiment (Ours) predicts a single class per image, shared by all points. DETR can learn the correlation between the welding image category and the feature point category in the dataset, preventing points in a single image from being classified into different categories, whereas YOLOv5n infers a category for each point independently.
To compare the performance of the models, the category of the point with the highest classification confidence in each image is taken as the category of the whole image. To simplify the comparison and account for differences in sample numbers, the three models are evaluated on weld classification using the following three criteria:
P_w = Σ_{i=1}^{N} w_i P_i        (7)
R_w = Σ_{i=1}^{N} w_i R_i        (8)
F1_w = Σ_{i=1}^{N} w_i F1_i        (9)
where P_w, R_w and F1_w denote the weighted average precision, recall and F1-score, respectively; P_i = TP_i/(TP_i + FP_i), R_i = TP_i/(TP_i + FN_i) and F1_i = 2P_iR_i/(P_i + R_i) are the precision, recall and F1-score of class i; and w_i is the weighting factor of the i-th class (1/N in this embodiment, with N the number of classes).
True positives (TP) denote the number of correctly predicted samples, false negatives (FN) the number of labelled boxes without a corresponding correct prediction box, and false positives (FP) the number of prediction boxes that do not meet the correct-prediction criteria; all can be calculated from the confusion matrix. The subscript i denotes class i; for example, TP_i is the number of samples correctly predicted as class i.
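A short NumPy sketch computing the weighted metrics of Eqs. (7)-(9) from a confusion matrix whose rows are true classes and whose columns are predicted classes, with equal class weights w_i = 1/N as in the text.

```python
import numpy as np

def weighted_prf(conf):
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp            # predicted as class i but wrong
    fn = conf.sum(axis=1) - tp            # class i samples that were missed
    p = tp / np.maximum(tp + fp, 1e-12)
    r = tp / np.maximum(tp + fn, 1e-12)
    f1 = 2 * p * r / np.maximum(p + r, 1e-12)
    w = np.full(len(tp), 1.0 / len(tp))   # equal class weights, w_i = 1/N
    return (w * p).sum(), (w * r).sum(), (w * f1).sum()
```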
The index values of the three models on the adjusted test set are shown in Table 1. Figs. 6(a)-6(c) show the confusion matrices obtained by the different models on the adjusted test set, corresponding to the prediction results of the model of this embodiment, YOLOv5n and DETR, respectively.
It should be noted that, among the three key indexes for evaluating the classification performance, the numerical value of the model of this embodiment is highest, the weighted average accuracy is 0.9674, the weighted average recall is 0.9666, and the weighted average F1 score is 0.9668.
TABLE 1 index values of three models
To evaluate the feature point localisation performance of the model of this embodiment at different noise levels, its noise immunity is assessed. The performance indices include the mean absolute error (MAE), the root mean square error (RMSE), the standard deviation σ_f, and the average Euclidean distance ρ_mean between the predicted coordinates and the label coordinates after projection into the camera coordinate system.
The positioning error of the feature points is calculated as follows according to the X-axis direction and the Y-axis direction of the image:
e_{x,k} = x̂_k − x_k,   e_{y,k} = ŷ_k − y_k        (10)
where (x_k, y_k) and (x̂_k, ŷ_k) denote the two-dimensional label coordinates and the two-dimensional prediction coordinates of the k-th feature point, respectively.
The MAE is used to evaluate the static accuracy of the localisation and is defined as follows:
MAE = (1/N) Σ_{k=1}^{N} |e_k|        (11)
where N represents the total number of feature points in the specified test set, consistent with the definition in the following equations. The root mean square error highlights the effect of prediction outliers and is defined as follows:
RMSE = sqrt( (1/N) Σ_{k=1}^{N} e_k² )        (12)
The standard deviation σ_f is used to assess the stability and robustness of the model and is defined as follows:
σ_f = sqrt( (1/N) Σ_{k=1}^{N} (e_k − ē)² )        (13)
where ē is the mean value of e_k over all feature points.
After the two-dimensional localisation results are projected into three-dimensional space, the metric ρ_mean is adopted to evaluate the error level of the three models. The calculation formula is as follows:
ρ_mean = (1/N) Σ_{k=1}^{N} ‖P̂_k − P_k‖₂        (14)
where P_k and P̂_k denote the three-dimensional label coordinates and the three-dimensional prediction coordinates of the feature points; using the line structured-light plane equation and the camera intrinsic matrix, these coordinates relative to the camera coordinate system can be calculated from (x_k, y_k) and (x̂_k, ŷ_k). This embodiment assumes a priori that the calibration results of the line structured-light plane equation and the camera intrinsic matrix are ideal, so any error is attributed solely to the feature point positions predicted by the localisation model in the image coordinate system.
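A NumPy sketch of the localisation metrics in Eqs. (10)-(14): MAE, RMSE and σ_f computed on errors along one image axis, and ρ_mean computed on projected 3-D coordinates. The absolute-error convention for σ_f is an assumption.

```python
import numpy as np

def pixel_metrics(pred, label):
    """pred, label: (N,) coordinates along one axis (X or Y), in pixels."""
    err = np.abs(pred - label)                           # per-point absolute error, Eq. (10)
    mae = err.mean()                                     # Eq. (11)
    rmse = np.sqrt((err ** 2).mean())                    # Eq. (12)
    sigma_f = np.sqrt(((err - err.mean()) ** 2).mean())  # Eq. (13)
    return mae, rmse, sigma_f

def rho_mean(pred_3d, label_3d):
    """pred_3d, label_3d: (N, 3) camera-frame coordinates, in millimetres. Eq. (14)."""
    return np.linalg.norm(pred_3d - label_3d, axis=1).mean()
```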
Figs. 7, 8 and 9 show the absolute value error curves of the model of this embodiment, YOLOv5n and DETR (Detection Transformer, a Transformer-based object detection model), respectively, representing the test results of the three models. The green curves correspond to the absolute errors measured on the low-noise test set, the blue curves to the absolute errors measured on the high-noise test set, and the red horizontal lines represent the mean absolute error (MAE).
Here, (a) and (b) in Fig. 7 show the feature point regression absolute errors in the X and Y directions of the model of this embodiment evaluated on the low-noise test set, while (c) and (d) show the corresponding errors on the high-noise test set. Fig. 8 (a) and (b) show the feature point regression absolute errors in the X and Y directions of the YOLOv5n model on the low-noise test set, and (c) and (d) show the corresponding errors on the high-noise test set. Fig. 9 (a) and (b) show the feature point regression absolute errors in the X and Y directions of the DETR model on the low-noise test set, and (c) and (d) show the corresponding errors on the high-noise test set. In every plot, the peak value and peak position of the absolute error and the mean absolute error (MAE) are marked.
All indices related to the feature point localisation error have been calculated and are listed in Tables 2 and 3. The mean absolute error (MAE), root mean square error (RMSE) and standard deviation σ_f are evaluated along the X and Y directions of the image, and their averages over the two directions are reported as the average MAE, average RMSE and average σ_f. Tables 2 and 3 also list the results of the model of this embodiment without the CCAM.
Table 2 statistical features of three models for low noise welded structure light stripe feature point positioning performance
In Table 2, Ours denotes the model of this embodiment and Ours† denotes the model of this embodiment without the CCAM. MAE, RMSE and σ_f are given in pixels, and ρ_mean in millimetres.
It can be seen that the MAE and RMSE of the three models are essentially at the same level in the low-noise case. Notably, in terms of average σ_f, the model of this embodiment with and without the CCAM stands out compared with the other two models, reaching 1.943 pixels and 1.922 pixels, respectively. In terms of ρ_mean, the model of this embodiment performs best, reaching 0.197 mm. Under low-noise test conditions there is no significant difference between the versions with and without the CCAM.
Table 3 experimental results of locating performance evaluation of weld feature points in a high noise test set
In Table 3, Ours denotes the model of this embodiment and Ours† denotes the model of this embodiment without the CCAM. MAE, RMSE and σ_f are given in pixels, and ρ_mean in millimetres.
It can be seen that in the noisy environment the average MAE, RMSE and ρ_mean of the model of this embodiment are 1.736 pixels, 2.407 pixels and 0.205 millimetres, respectively, all better than YOLOv5n and DETR. Furthermore, compared with the model of this embodiment without the CCAM, the CCAM-integrated model performs better on most localisation indices, in clear contrast to the results observed on the low-noise test set. The average σ_f of the model of this embodiment is 2.217 pixels, indicating excellent stability in accurately locating weld feature points in high-noise weld images.
The lightweight effect of the model of this embodiment is verified as follows.
The lightweight nature of the model of this embodiment is evaluated with several metrics, including the total number of model parameters (Params), floating-point operations (FLOPs), mean latency and frames per second (FPS). Params indirectly measures the computational complexity and memory utilisation, and FLOPs represents the computational cost of the model. A lower mean latency (or higher FPS) means a shorter inference time, which is a key factor in achieving real-time performance for a seam tracking system employing the model; this is particularly important for complex weld tracking applications, welding speeds and welding process control. The mean latency of the three models was determined by running 100 tests on a single image, and the FPS value was obtained by dividing 1000 milliseconds by the mean latency. Table 4 gives an overview of the lightweight indices of the model of this embodiment, YOLOv5n and DETR, and Fig. 10 shows the predictions obtained by the different models on the test set.
Table 4 Lightweight index comparison of the three models on an Intel Core i7-8650U CPU
It can be seen that the model of this embodiment outperforms the other models on all indices: Params is 1.3 M, corresponding to 72% of YOLOv5n and 3.5% of DETR, and FLOPs is 0.87 GFLOPs, corresponding to 21% of YOLOv5n and 1.5% of DETR. With a mean latency of 29.32 milliseconds and an FPS of 34.11, the model of this embodiment meets the stringent requirements of real-time weld tracking.
Example two
In one or more embodiments, an anti-noise weld feature recognition system based on a lightweight neural network is disclosed, comprising:
the data acquisition module is used for acquiring welding image data in the welding process;
The weld joint prediction module is used for inputting the acquired welding image data into a trained weld joint feature recognition model and outputting the type and position information of the weld joint;
The weld feature recognition model comprises a backbone network, a neck layer and a head layer. The backbone network outputs feature information of different scales and transmits it to the neck layer; the neck layer fuses the multi-scale feature information, and the fused feature maps are input into the head layer. The head layer comprises three branches, namely a heat map branch, an offset branch and a classification branch: the heat map branch outputs a heat map tensor in which each value represents the probability that the corresponding region contains a feature point; the offset branch outputs an offset tensor giving the position deviation of the feature point in each dimension; the classification branch outputs a classification tensor giving the score of each weld type. The model finally predicts the position of the weld and the weld type.
The specific implementation manner of each module is the same as that in the first embodiment, and will not be described in detail.
Example III
In one or more embodiments, a terminal device is disclosed that includes a server, the server including a memory, a processor, and a computer program stored in the memory and executable on the processor; the processor implements the lightweight-neural-network-based anti-noise weld feature recognition method of Embodiment 1 when executing the program.
Example IV
In one or more embodiments, a computer-readable storage medium is disclosed, in which are stored a plurality of instructions adapted to be loaded by a processor of a terminal device and to perform the lightweight neural network-based anti-noise weld feature identification method described in embodiment one.
While the foregoing description of the embodiments of the present invention has been presented in conjunction with the drawings, it should be understood that it is not intended to limit the scope of the invention, but rather, it is intended to cover all modifications or variations within the scope of the invention as defined by the claims of the present invention.

Claims (7)

1.一种基于轻量化神经网络的抗噪声焊缝特征识别方法,其特征在于,包括:1. A noise-resistant weld feature recognition method based on a lightweight neural network, characterized by comprising: 获取焊接过程中的焊接图像数据;Acquire welding image data during welding process; 将获取的焊接图像数据输入至训练好的焊缝特征识别模型,输出焊缝的类型和位置信息;Input the acquired welding image data into the trained weld feature recognition model, and output the type and location information of the weld; 其中,所述焊缝特征识别模型包括主干网络、颈部层和头部层;所述主干网络输出不同尺度的特征信息,并将特征信息传输至颈部层;所述颈部层对不同尺度的特征信息进行融合,融合后的特征图输入头部层;所述头部层包括热图分支、偏移分支和分类分支三个分支,热图分支用于输出热图张量,得到每个区域包含特征点的概率;偏移分支用于输出偏移张量,得到每个维度生成特征点的位置偏差;分类分支用于输出分类张量,得到每种焊缝类型的对应得分;最终预测得到焊缝的位置及焊缝类型;Among them, the weld feature recognition model includes a backbone network, a neck layer and a head layer; the backbone network outputs feature information of different scales and transmits the feature information to the neck layer; the neck layer fuses the feature information of different scales, and the fused feature map is input into the head layer; the head layer includes three branches: a heat map branch, an offset branch and a classification branch. The heat map branch is used to output a heat map tensor to obtain the probability that each area contains a feature point; the offset branch is used to output an offset tensor to obtain the position deviation of the feature point generated in each dimension; the classification branch is used to output a classification tensor to obtain the corresponding score of each weld type; and finally the weld position and weld type are predicted; 所述颈部层采用特征金字塔网络结构,包括:主干网络输出的最深尺度的特征信息依次经过级联通道注意模块、第三卷积层和自适应池化层后,输出至头部层的分类分支;主干网络输出的第二尺度的特征信息经过第二卷积层后,与上采样的第三卷积层的输出进行合并,然后输出至第一DP模块;主干网络输出的第一尺度的特征信息经过第一卷积层后,与上采样的第一DP模块的输出进行合并,经过第二DP模块分别传输至头部层的热图分支和偏移分支;The neck layer adopts a feature pyramid network structure, including: the deepest scale feature information output by the backbone network passes through the cascade channel attention module, the third convolution layer and the adaptive pooling layer in sequence, and is output to the classification branch of the head layer; the second scale feature information output by the backbone network passes through the second convolution layer, is merged with the output of the upsampled third convolution layer, and then is output to the first DP module; the first scale feature information output by the backbone network passes through the first convolution layer, is merged with the output of the upsampled first DP module, and is transmitted to the heat map branch and the offset branch of the head layer respectively through the second DP module; 所述级联通道注意模块将主干网络输出的最深尺度的特征信息划分为三个分块,每一个分块分别经过通道注意模块进行通道加权;将三个分块进行通道加权后的输出进行拼接,得到级联通道注意模块的输出;其中,第一个分块经过通道注意模块后的输出与第二个分块进行合并;第二个分块经过通道注意模块后的输出与第三个分块进行合并;The cascade channel attention module divides the deepest scale feature information output by the backbone network into three blocks, and each block is channel-weighted by the channel attention module; the outputs of the three blocks after channel weighting are spliced to obtain the output of the cascade channel attention module; wherein the output of the first block after passing through the channel attention module is merged with the second block; the output of the second block after passing through the channel attention module is merged with the third block; 所述第一DP模块和第二DP模块的结构相同,包括:依次连接的深度卷积层、第一BN层、激活函数层、逐点卷积层和第二BN层;通过深度卷积层获取空间特征信息,在深度卷积层提取的空间特征信息之上,通过逐点卷积层组合这些空间特征,实现空间特征信息的融合。The first DP module and the second DP module have the same structure, including: a deep convolution layer, a first BN layer, an activation 
2. The noise-resistant weld feature recognition method based on a lightweight neural network according to claim 1, characterized in that the backbone network adopts a lightweight MobileNetv3 structure.

3. The noise-resistant weld feature recognition method based on a lightweight neural network according to claim 1, characterized in that the loss function of the weld feature recognition model is:

L = L_hm + λ_off · L_off + λ_cls · L_cls;

wherein L_hm, L_off and L_cls are the heatmap loss, the offset loss and the classification loss, respectively, and λ_off and λ_cls are the weights assigned to the offset loss and the classification loss, respectively.

4. The noise-resistant weld feature recognition method based on a lightweight neural network according to claim 3, characterized in that the heatmap loss is specifically:

L_hm = −(1/N0) · Σ_{i=1..H} Σ_{j=1..W} [ (1 − Ŷ_ij)^α · log(Ŷ_ij), if M_ij = 1; (1 − M_ij)^β · (Ŷ_ij)^α · log(1 − Ŷ_ij), otherwise ];

wherein Ŷ_ij is the predicted score at position (i, j) in the heatmap output by the weld feature recognition model; M_ij is the ground-truth label at position (i, j), representing the probability that the object center point appears at that position; N0 is the number of feature points in the image; H and W denote the spatial size of the heatmap; and α and β are weights.
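The training loss in claims 3 and 4 can be illustrated with the sketch below. The heatmap term follows the standard center-point focal loss that claim 4's variable definitions describe; the L1 offset loss, the cross-entropy classification loss and the default weight values are assumptions, since the claims do not fix them.

```python
# Sketch only: heatmap focal loss plus weighted offset and classification terms.
import torch
import torch.nn.functional as F


def heatmap_focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-6):
    """pred, target: (B, H, W); target holds the ground-truth labels M_ij."""
    pred = pred.clamp(eps, 1.0 - eps)                  # avoid log(0)
    pos = target.eq(1.0).float()                       # cells where M_ij == 1
    neg = 1.0 - pos
    pos_term = ((1.0 - pred) ** alpha) * torch.log(pred) * pos
    neg_term = ((1.0 - target) ** beta) * (pred ** alpha) * torch.log(1.0 - pred) * neg
    n0 = pos.sum().clamp(min=1.0)                      # number of feature points N0
    return -(pos_term.sum() + neg_term.sum()) / n0


def total_loss(pred_hm, gt_hm, pred_off, gt_off, pred_cls, gt_cls,
               w_off=1.0, w_cls=1.0):
    l_hm = heatmap_focal_loss(pred_hm, gt_hm)
    l_off = F.l1_loss(pred_off, gt_off)                # assumed offset regression loss
    l_cls = F.cross_entropy(pred_cls, gt_cls)          # assumed weld-type classification loss
    return l_hm + w_off * l_off + w_cls * l_cls
```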
5. A noise-resistant weld feature recognition system based on a lightweight neural network, characterized by comprising:

a data acquisition module, configured to acquire welding image data during the welding process;

a weld prediction module, configured to input the acquired welding image data into a trained weld feature recognition model and output the type and position information of the weld;

wherein the weld feature recognition model comprises a backbone network, a neck layer and a head layer; the backbone network outputs feature information at different scales and transmits the feature information to the neck layer; the neck layer fuses the feature information of the different scales, and the fused feature map is input into the head layer; the head layer comprises three branches: a heatmap branch, an offset branch and a classification branch; the heatmap branch is used to output a heatmap tensor giving the probability that each region contains a feature point; the offset branch is used to output an offset tensor giving the positional deviation of the generated feature point in each dimension; the classification branch is used to output a classification tensor giving the score of each weld type; the position of the weld and the weld type are finally predicted;

the neck layer adopts a feature pyramid network structure, in which: the deepest-scale feature information output by the backbone network passes through a cascaded channel attention module, a third convolutional layer and an adaptive pooling layer in sequence, and is output to the classification branch of the head layer; the second-scale feature information output by the backbone network passes through a second convolutional layer, is merged with the upsampled output of the third convolutional layer, and is then output to a first DP module; the first-scale feature information output by the backbone network passes through a first convolutional layer, is merged with the upsampled output of the first DP module, and is transmitted through a second DP module to the heatmap branch and the offset branch of the head layer, respectively;

the cascaded channel attention module divides the deepest-scale feature information output by the backbone network into three chunks, and each chunk is channel-weighted by a channel attention module; the channel-weighted outputs of the three chunks are concatenated to obtain the output of the cascaded channel attention module; wherein the output of the first chunk after its channel attention module is merged with the second chunk, and the output of the second chunk after its channel attention module is merged with the third chunk;

the first DP module and the second DP module have the same structure, each comprising a depthwise convolutional layer, a first BN layer, an activation function layer, a pointwise convolutional layer and a second BN layer connected in sequence; spatial feature information is extracted by the depthwise convolutional layer, and on top of this spatial feature information the pointwise convolutional layer combines the spatial features to achieve fusion of the spatial feature information.

6. A terminal device, comprising a processor and a memory, the processor being configured to execute instructions and the memory being configured to store a plurality of instructions, characterized in that the instructions are adapted to be loaded by the processor to execute the noise-resistant weld feature recognition method based on a lightweight neural network according to any one of claims 1 to 4.
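The neck-layer wiring recited in claims 1 and 5 can be sketched as follows, reusing the DPModule and CascadedChannelAttention classes from the sketch after claim 1. The channel widths, the 1×1 convolutions, element-wise addition as the merge operation and nearest-neighbour 2× upsampling are assumptions rather than details fixed by the claims.

```python
# Sketch only: feature pyramid neck feeding the three head branches.
import torch.nn as nn
import torch.nn.functional as F


class Neck(nn.Module):
    def __init__(self, c1=24, c2=48, c3=96, mid=48):
        super().__init__()
        self.conv1 = nn.Conv2d(c1, mid, 1)          # first (shallowest) scale
        self.conv2 = nn.Conv2d(c2, mid, 1)          # second scale
        self.cca = CascadedChannelAttention(c3)     # deepest scale
        self.conv3 = nn.Conv2d(c3, mid, 1)
        self.pool = nn.AdaptiveAvgPool2d(1)         # feeds the classification branch
        self.dp1 = DPModule(mid, mid)
        self.dp2 = DPModule(mid, mid)

    def forward(self, f1, f2, f3):
        # f1, f2, f3: backbone features from shallowest to deepest scale
        d3 = self.conv3(self.cca(f3))
        cls_feat = self.pool(d3)                     # -> classification branch
        d2 = self.dp1(self.conv2(f2) +
                      F.interpolate(d3, scale_factor=2, mode="nearest"))
        d1 = self.dp2(self.conv1(f1) +
                      F.interpolate(d2, scale_factor=2, mode="nearest"))
        return d1, cls_feat                          # d1 -> heatmap and offset branches
```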
7. A computer-readable storage medium storing a plurality of instructions, characterized in that the instructions are adapted to be loaded by a processor of a terminal device to execute the noise-resistant weld feature recognition method based on a lightweight neural network according to any one of claims 1 to 4.
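As an illustration of how the three head outputs recited in the claims yield the final weld position and weld type at inference time, the following is a minimal decoding sketch. The sigmoid over heatmap logits, the 3×3 max-pooling peak suppression, the output stride of 4, the score threshold and the top-k limit are all assumptions.

```python
# Sketch only: decode heatmap peaks, apply per-cell offsets, pick the weld type.
import torch
import torch.nn.functional as F


def decode_predictions(heatmap, offsets, cls_scores, stride=4, thresh=0.3, topk=10):
    """
    heatmap:    (1, 1, H, W) logits for the probability a cell contains a feature point
    offsets:    (1, 2, H, W) per-cell (dx, dy) positional deviation
    cls_scores: (1, C)       score per weld type
    Returns a list of (x, y) image coordinates and the predicted weld type index.
    """
    hm = heatmap.sigmoid()
    # keep only local maxima so neighbouring cells of a single peak are suppressed
    keep = (F.max_pool2d(hm, 3, stride=1, padding=1) == hm).float()
    hm = hm * keep

    scores, idx = hm.view(1, -1).topk(topk)
    w = heatmap.shape[-1]
    points = []
    for score, flat in zip(scores[0], idx[0]):
        if score < thresh:
            continue
        cy, cx = divmod(int(flat), w)
        dx, dy = offsets[0, :, cy, cx].tolist()        # per-dimension position deviation
        points.append(((cx + dx) * stride, (cy + dy) * stride))

    weld_type = int(cls_scores.argmax(dim=1))          # highest-scoring weld type
    return points, weld_type
```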
CN202410804228.XA 2024-06-21 2024-06-21 Noise-resistant weld feature recognition method and system based on lightweight neural network Active CN118674985B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410804228.XA CN118674985B (en) 2024-06-21 2024-06-21 Noise-resistant weld feature recognition method and system based on lightweight neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410804228.XA CN118674985B (en) 2024-06-21 2024-06-21 Noise-resistant weld feature recognition method and system based on lightweight neural network

Publications (2)

Publication Number Publication Date
CN118674985A CN118674985A (en) 2024-09-20
CN118674985B true CN118674985B (en) 2025-01-28

Family

ID=92720854

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410804228.XA Active CN118674985B (en) 2024-06-21 2024-06-21 Noise-resistant weld feature recognition method and system based on lightweight neural network

Country Status (1)

Country Link
CN (1) CN118674985B (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112651973B (en) * 2020-12-14 2022-10-28 南京理工大学 Semantic segmentation method based on cascade of feature pyramid attention and mixed attention
CN113674247B (en) * 2021-08-23 2023-09-01 河北工业大学 A Convolutional Neural Network Based X-ray Weld Defect Detection Method
CN116342531B (en) * 2023-03-27 2024-01-19 中国十七冶集团有限公司 Device and method for detecting quality of welding seam of high-altitude steel structure of lightweight large-scale building
CN116416576A (en) * 2023-04-04 2023-07-11 天津职业技术师范大学(中国职业培训指导教师进修中心) Smoke/flame double-light visual detection method based on V3-YOLOX
CN116894941A (en) * 2023-06-19 2023-10-17 华南理工大学 A lightweight image segmentation neural network construction method, real-time robust weld seam tracking detection method and system based on lightweight image segmentation neural network
CN117934942A (en) * 2024-01-23 2024-04-26 湖南视比特机器人有限公司 Weld joint recognition model, training method and weld joint tracking method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Liangyuan Deng et al., "A weld seam feature real-time extraction method of three typical welds based on target detection", Measurement, 2023-01-04, pp. 1-13 *
Ang Gao et al., "YOLO-Weld: A Modified YOLOv5-Based Weld Feature Detection Network for Extreme Weld Noise", Sensors, 2023-06-16, pp. 1-24 *

Also Published As

Publication number Publication date
CN118674985A (en) 2024-09-20

Similar Documents

Publication Publication Date Title
US12105887B1 (en) Gesture recognition systems
CN110084299B (en) Target detection method and device based on multi-head fusion attention
CN113516664A (en) A Visual SLAM Method Based on Semantic Segmentation of Dynamic Points
CN110059558A (en) A kind of orchard barrier real-time detection method based on improvement SSD network
CN107851192B (en) Apparatus and method for detecting face part and face
CN111062263A (en) Method, device, computer device and storage medium for hand pose estimation
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN118050734A (en) Robot positioning method and system based on binocular vision and laser scanning fusion
CN112967388A (en) Training method and device for three-dimensional time sequence image neural network model
CN109697727A (en) Target tracking method, system and storage medium based on correlation filtering and metric learning
KR101268596B1 (en) Foreground extraction apparatus and method using CCB and MT LBP
CN115205806A (en) Method and device for generating target detection model and automatic driving vehicle
CN118674985B (en) Noise-resistant weld feature recognition method and system based on lightweight neural network
CN117523428B (en) Ground target detection method and device based on aircraft platform
CN117576665A (en) A single-camera three-dimensional target detection method and system for autonomous driving
CN118115896A (en) Unmanned aerial vehicle detection method and system based on improvement YOLOv3
CN115116128B (en) Self-constrained optimization human body posture estimation method and system
CN113781500B (en) Method, device, electronic equipment and storage medium for segmenting cabin image instance
CN114842506A (en) Human body posture estimation method and system
KR20230090007A (en) Deep learning based joint 3d object detection and tracking using multi-sensor
CN110895680A (en) Unmanned ship water surface target detection method based on regional suggestion network
Wang et al. Top-Down Meets Bottom-Up for Multi-Person Pose Estimation
CN119169489B (en) A method for identifying insulator defects
Wang et al. Cross-modal fusion-based prior correction for road detection in off-road environments
CN119086574B (en) A method for inspecting tunnels using a mother-and-child robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant