
CN118967690B - A defect detection method based on deep contrastive learning - Google Patents

A defect detection method based on deep contrastive learning

Info

Publication number
CN118967690B
Authority
CN
China
Prior art keywords
workpiece
image
input
positive
defect detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411449342.1A
Other languages
Chinese (zh)
Other versions
CN118967690A
Inventor
孙家栋
王梦楠
王泉
苏秀娥
施珮
翟世臣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi University
Original Assignee
Wuxi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi University
Priority to CN202411449342.1A
Publication of CN118967690A
Application granted
Publication of CN118967690B
Legal status: Active


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/0002 - Inspection of images, e.g. flaw detection
    • G06T 7/0004 - Industrial image inspection
    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01N - INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N 21/00 - Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N 21/84 - Systems specially adapted for particular applications
    • G01N 21/88 - Investigating the presence of flaws or contamination
    • G01N 21/8851 - Scan or image signal processing specially adapted therefor, e.g. for scan signal adjustment, for detecting different kinds of defects, for compensating for structures, markings, edges
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/0455 - Auto-encoder networks; Encoder-decoder networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/0895 - Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/0985 - Hyperparameter optimisation; Meta-learning; Learning-to-learn
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 - Image enhancement or restoration
    • G06T 5/70 - Denoising; Smoothing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/10 - Segmentation; Edge detection
    • G06T 7/13 - Edge detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/761 - Proximity, similarity or dissimilarity measures
    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01N - INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N 21/00 - Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N 21/84 - Systems specially adapted for particular applications
    • G01N 21/88 - Investigating the presence of flaws or contamination
    • G01N 21/8851 - Scan or image signal processing specially adapted therefor, e.g. for scan signal adjustment, for detecting different kinds of defects, for compensating for structures, markings, edges
    • G01N 2021/8883 - Scan or image signal processing specially adapted therefor, involving the calculation of gauges, generating models
    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01N - INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N 21/00 - Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N 21/84 - Systems specially adapted for particular applications
    • G01N 21/88 - Investigating the presence of flaws or contamination
    • G01N 21/8851 - Scan or image signal processing specially adapted therefor, e.g. for scan signal adjustment, for detecting different kinds of defects, for compensating for structures, markings, edges
    • G01N 2021/8887 - Scan or image signal processing specially adapted therefor, based on image processing techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20024 - Filtering details
    • G06T 2207/20028 - Bilateral filtering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20084 - Artificial neural networks [ANN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20112 - Image segmentation details
    • G06T 2207/20132 - Image cropping
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30108 - Industrial image inspection
    • G06T 2207/30164 - Workpiece; Machine component

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Pathology (AREA)
  • Biochemistry (AREA)
  • Databases & Information Systems (AREA)
  • Immunology (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a defect detection method based on deep contrastive learning, relating to the technical field of computer vision. The method comprises: acquiring original image data of an industrial workpiece and performing normalization, denoising, cropping and resizing to form standardized image data; inputting the standardized image data into a Transformer-based deep contrastive learning model and outputting global feature vectors of the workpiece image; inputting optimized positive and negative sample pairs into a meta-learning unit and performing rapid adaptation training of the deep contrastive learning model through a meta-learning mechanism; and applying the trained contrastive learning model to real-time defect detection, judging defects in newly input workpiece images, and outputting the defect detection result of the workpiece. The invention improves the accuracy of defect detection, greatly reduces training time and data requirements, and significantly alleviates the limitations of the prior art in small-sample environments.

Description

Defect detection method based on deep contrastive learning
Technical Field
The invention relates to the technical field of computer vision, and in particular to a defect detection method based on deep contrastive learning.
Background
In the development of industrial automation and intelligent manufacturing, defect detection of industrial workpieces is an important component of product quality assurance. Traditional defect detection methods mainly rely on rule-based image processing or manual inspection. These methods work in simple scenarios, but their limitations become apparent as workpiece complexity grows and the requirements on detection precision and efficiency rise. In small-sample environments in particular, traditional methods achieve low detection precision because their ability to extract workpiece features is limited and fine surface defects are hard to identify accurately. In recent years, deep learning has advanced significantly in image processing; convolutional neural networks (CNNs) and other deep neural networks in particular have shown strong feature extraction capability in target detection tasks. However, these techniques generally rely on large-scale annotated data, generalize poorly to small-sample scenarios, and are prone to overfitting, which degrades detection accuracy and robustness.
Existing deep learning methods face a marked drop in precision when detecting defects of industrial workpieces in a small-sample environment. Because large amounts of industrial image data with defect labels are often difficult to obtain, deep learning models are hard to train effectively on small amounts of data, so the models underperform in practical applications. In addition, existing models adapt poorly to new workpiece types or new defect types and cannot be adjusted quickly for new inspection tasks. Therefore, improving the accuracy of defect detection in small-sample environments is a critical problem to be solved in the industrial field.
Disclosure of Invention
The present invention has been made in view of the above problems in the prior art.
To this end, the invention provides a defect detection method based on deep contrastive learning, which addresses the problem of low detection precision for industrial workpiece defects in small-sample environments.
In order to solve the above technical problems, the invention provides the following technical scheme:
In a first aspect, an embodiment of the present invention provides a defect detection method based on deep contrastive learning, which comprises: obtaining original image data of an industrial workpiece, and performing normalization, denoising, cropping and resizing to form standardized image data;
inputting the standardized image data into a Transformer-based deep contrastive learning model, and outputting a global feature vector of the workpiece image;
inputting the global feature vector into a positive and negative sample pair generating unit for contrastive learning, to generate positive sample pairs and negative sample pairs;
optimizing the feature distribution of the positive and negative sample pairs through the loss function of deep contrastive learning, to obtain optimized positive and negative sample pairs;
inputting the optimized positive and negative sample pairs into a meta-learning unit, and performing rapid adaptation training of the deep contrastive learning model through a meta-learning mechanism;
and applying the trained contrastive learning model to real-time defect detection, judging defects in newly input workpiece images, and outputting the defect detection result of the workpiece.
As a preferred scheme of the defect detection method based on deep contrastive learning of the present invention, obtaining the original image data comprises the following specific steps:
acquiring a high-definition workpiece image with an industrial camera, reading the acquired image file using OpenCV, and loading it into memory for processing;
checking whether the read image is in RGB format, and converting it into RGB format if it is not;
after ensuring that the image is in RGB format, further checking the size of the image, and finally outputting the checked and normalized original image data of the industrial workpiece.
As a preferred scheme of the defect detection method based on deep contrastive learning of the present invention, forming the standardized image data comprises the following specific steps:
normalizing the pixel values of the checked and normalized original image data of the industrial workpiece;
denoising the normalized image through bilateral filtering;
cropping the denoised image to the key part containing the workpiece, while keeping the sizes of the cropped images consistent, to form standardized image data.
As a preferred scheme of the defect detection method based on deep contrastive learning of the present invention, outputting the global feature vector of the workpiece image comprises the following specific steps:
converting the standardized image data into the patch form required as input by the Transformer-based deep contrastive learning model;
dividing the image into non-overlapping patches, each patch having a size P×P;
flattening each patch into a one-dimensional vector, and mapping each patch to a fixed dimension D through linear projection;
letting N be the number of patches, the patch count is:

N = \frac{H \times W}{P^2}

where H represents the height of the image and W represents the width;
after linear projection, the combination of all patches is expressed as the input feature matrix X ∈ R^{N×D};
generating three matrices, a query matrix Q, a key matrix K and a value matrix V, through linear transformations of the input feature matrix X;
calculating the attention weight matrix A from the mutual dependencies among the patches:

A = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)

where d_k represents the dimension of the query and key vectors, and K^T represents the transpose of the key matrix K;
calculating the output feature Z = AV based on the attention weights A and the value matrix V;
splicing and linearly transforming the outputs of multiple independent attention heads to generate the global feature vector:

F_{gl} = \frac{1}{H_a} \sum_{h=1}^{H_a} \sum_{i=1}^{N} A_h(X_i)\, W_h X_i

where F_gl is the output global feature vector, H_a denotes the number of attention heads in the multi-head self-attention mechanism (distinct from the image height H), 1/H_a is the normalization factor, h is the index of the current attention head, i is the index of the patch into which the input image is divided, A_h(X_i) is the attention weight of the i-th patch extracted from the attention matrix of the h-th head, W_h is the linear transformation matrix of the h-th attention head, and X_i is the feature vector of the i-th input patch.
As a preferred scheme of the defect detection method based on deep contrastive learning of the present invention, generating the positive sample pairs and negative sample pairs comprises:
normalizing the global feature vector F_gl generated by the multi-head self-attention mechanism to obtain the normalized global feature vector \hat{F}_{gl};
for the original image I of each workpiece, generating multiple images with different viewing angles through data enhancement;
based on each enhanced image, extracting its global feature vector \hat{F}_{gl} through the Transformer model;
pairing the global feature vectors of different views of the same workpiece in pairs to generate positive sample pairs;
randomly selecting images I of different workpieces, and calculating the global feature vector \hat{F}_{gl} for each image;
pairing global feature vectors from different workpieces to generate negative sample pairs.
As a preferred scheme of the defect detection method based on deep contrastive learning of the present invention, obtaining the optimized positive and negative sample pairs comprises the following specific steps:
for each positive and negative sample pair, calculating the cosine similarity, and applying the InfoNCE loss function to optimize the neural network model;
using an Adam optimizer to update the parameters of the neural network model based on the InfoNCE loss;
calculating the loss L, and back-propagating its gradient \nabla_{\theta} L to update the neural network model parameters until the neural network model reaches a preset number of training rounds;
mapping the input samples into the feature space of the trained neural network model, further optimizing the positive and negative sample pairs.
As a preferred scheme of the defect detection method based on deep contrastive learning of the present invention, performing rapid adaptation training of the deep contrastive learning model through the meta-learning mechanism comprises:
dividing the optimized positive and negative sample pairs into multiple meta-tasks T_i, each task containing the positive and negative sample pairs of a workpiece;
on the support set of each task, adjusting the temperature parameter using an adaptive temperature adjustment mechanism;
after the parameters are adjusted on the support set, dynamically assigning a weight w_i to each task:

w_i = \left(1 + \exp\left(-\gamma\,(L_i(Q_i, \theta') - L_{avg})\right)\right)^{-1}

where γ is a hyper-parameter controlling the sensitivity of the weight to task difficulty, L_avg is the average loss over the query sets of all tasks, θ' is the model parameter after one gradient update on the support set of meta-task T_i, and L_i(Q_i, θ') is the loss of meta-task T_i on its query set Q_i;
combining the preliminary update on the support set with the loss on the query set, performing a global parameter update weighted by the dynamic weight w_i of each task;
gradually optimizing the deep contrastive learning model through multiple rounds of task iteration.
As a preferred scheme of the defect detection method based on deep contrastive learning of the present invention, outputting the defect detection result of the workpiece comprises the following specific steps:
converting the input workpiece image into a grayscale image, and performing standardization;
extracting the workpiece contour from the standardized grayscale image using the Canny edge detection algorithm, to generate a region of interest;
inputting the extracted region of interest of the workpiece into a Transformer encoder, and extracting the feature embedding vector of the input workpiece through the encoder;
retrieving the pre-optimized positive sample feature embedding vector Z_pos and negative sample feature embedding vector Z_neg from the sample library;
calculating the cosine similarity sim_pos between the input workpiece feature Z_input and the positive sample feature embedding vector Z_pos;
calculating the cosine similarity sim_neg between the input workpiece feature Z_input and the negative sample feature embedding vector Z_neg;
judging whether the input workpiece has defects using a preset threshold τ, based on the calculated similarities;
if sim_pos < τ and sim_neg > τ, the workpiece is judged to be defective; otherwise, it is judged to be non-defective.
In a second aspect, an embodiment of the present invention provides a computer device, comprising a memory and a processor, where the memory stores a computer program which, when executed by the processor, implements any step of the defect detection method based on deep contrastive learning according to the first aspect of the present invention.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium having a computer program stored thereon, where the computer program, when executed by a processor, implements any step of the defect detection method based on deep contrastive learning according to the first aspect of the present invention.
The beneficial effects of the method are as follows. First, the accuracy of industrial workpiece defect detection in small-sample environments is significantly improved by introducing a deep contrastive learning mechanism. Second, generating positive and negative sample pairs through contrastive learning enhances the model's ability to distinguish the features of different workpieces and reduces the dependence on large-scale annotated data. Meanwhile, by combining the strong global feature extraction capability of the Transformer model with a meta-learning mechanism, the method can quickly optimize model parameters with only a small number of samples, giving it better adaptability when facing new types of workpieces or defects. This design not only improves the accuracy of defect detection but also greatly reduces training time and data requirements, markedly alleviating the limitations of the prior art in small-sample environments.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of the defect detection method based on deep contrastive learning in embodiment 1.
Fig. 2 is a flowchart of acquiring the original image data of an industrial workpiece in embodiment 1.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
Embodiment 1, referring to fig. 1 and 2, provides a defect detection method based on deep contrastive learning, which comprises the following steps:
S1, obtaining original image data of an industrial workpiece, and performing normalization, denoising, cropping and resizing to form standardized image data.
Further, a high-definition workpiece image is acquired with an industrial camera, and the acquired image file is read using OpenCV and loaded into memory for processing.
It should be noted that when acquiring the high-definition workpiece image, proper exposure, white balance and resolution (e.g., 1920×1080 pixels) should be ensured, and the workpiece image should be stored in JPEG or PNG format.
Preferably, OpenCV is a library widely used for computer vision, capable of conveniently reading, processing and displaying images.
Whether the read image is in RGB format (i.e., has 3 color channels) is checked; if the image is not in RGB format (e.g., a grayscale image), it is converted into RGB format to ensure a uniform color space for subsequent processing. The specific steps are as follows:
after the image is read using OpenCV, it is represented as a multi-dimensional array with shape (height, width, channels), where channels is the number of channels of the image: 3 for an RGB-format image and 1 for a grayscale image.
By checking the shape of the image: if the number of channels is 3, the image is in RGB format; if the number of channels is 1, the image is grayscale and needs to be converted into RGB format.
If the image is grayscale (i.e., the channel number is 1), it can be converted to RGB format using OpenCV's cv2.cvtColor() function.
After ensuring that the image is in RGB format, the dimensions of the image, such as its width, height and channel number, are further inspected to ensure they meet expectations, and finally the checked and normalized original image data of the industrial workpiece is output.
It should be noted that checking the size of the image consists of reading the size information of the image, checking the width, height and number of channels as required, and processing image sizes that do not meet the requirements.
The size information of the read image is generally represented in memory as a three-dimensional array (height × width × channels), where height is the height of the image in pixels, width is the width in pixels, and channels is the number of channels (3 for an RGB image, 1 for a grayscale image).
Checking the width, height and channel number as required means verifying whether the resolution of the image meets expectations, for example whether it matches the size required by the model (e.g., 224×224), and whether the channel number is 3 (RGB).
Processing an unsatisfactory image size means that, if the size of the image does not meet the predetermined requirement, resizing (e.g., using cv2.resize()) may be applied in a subsequent step.
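As a concrete illustration of the loading and checking steps above, the following Python sketch uses OpenCV; the expected resolution, the resize fallback and the function name load_and_check are illustrative assumptions rather than part of the claimed method.

```python
import cv2

def load_and_check(path, expected_hw=(1080, 1920)):
    """Read a workpiece image, force 3-channel RGB, and verify its size.

    Minimal sketch; expected_hw and the resize fallback are assumptions.
    """
    img = cv2.imread(path, cv2.IMREAD_UNCHANGED)
    if img is None:
        raise FileNotFoundError(path)
    if img.ndim == 2:                      # grayscale: single channel
        img = cv2.cvtColor(img, cv2.COLOR_GRAY2RGB)
    elif img.shape[2] == 4:                # e.g. PNG with alpha channel
        img = cv2.cvtColor(img, cv2.COLOR_BGRA2RGB)
    else:                                  # OpenCV reads color images as BGR
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    if img.shape[:2] != expected_hw:       # (height, width) check
        img = cv2.resize(img, expected_hw[::-1])  # cv2.resize takes (w, h)
    return img
```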
Further, the pixel values of the checked and normalized original image data of the industrial workpiece are normalized to a fixed range;
Denoising processing is carried out through bilateral filtering based on the normalized image;
the denoised image is cropped to the key part containing the workpiece, reducing noise and interference and enhancing the model's attention to workpiece features.
It should be noted that the "key part containing the workpiece" refers to the region of the image that actually contains the workpiece, rather than irrelevant background or noise. Specifically, it is the subject of the workpiece: the portion of the image relevant to the workpiece, typically the main region that needs to be analyzed, detected or classified. These areas contain the structural features of the workpiece or the parts where defects may exist. A workpiece image may contain a large amount of unnecessary background (e.g., the environment at the time of industrial camera shooting, equipment in the background); cropping removes the irrelevant background so that only the workpiece is retained, reducing interference with the model. The cropped image thus focuses on the key feature parts of the workpiece, which are the contents the model needs to attend to, such as the edges of the workpiece, surface details, and areas where defects may be present.
For example, the workpiece is typically centered in the image, so a central region of the image can be cropped based on prior knowledge (e.g., preserving the central 75% of the image); if the position of the workpiece is uncertain, a target detection algorithm (e.g., YOLO or Faster R-CNN) can be used to automatically locate and crop the key part of the workpiece.
The cropped images are resized to a consistent, fixed size (e.g., 224×224), which ensures that the images input to the model have a uniform size and avoids degrading model performance due to inconsistent sizes;
the normalized, denoised, cropped and resized images form the standardized image data.
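The whole preprocessing pipeline can be sketched as below; the bilateral-filter parameters, the 75% centre-crop and the 224×224 output size are the illustrative values mentioned above, not fixed requirements of the method.

```python
import cv2
import numpy as np

def preprocess(img_rgb, crop_frac=0.75, out_size=(224, 224)):
    """Normalize, denoise (bilateral filter), centre-crop, and resize."""
    # Normalize pixel values to [0, 1].
    img = img_rgb.astype(np.float32) / 255.0
    # Bilateral filtering smooths noise while preserving workpiece edges;
    # sigmaColor is given in the normalized [0, 1] intensity scale.
    img = cv2.bilateralFilter(img, d=9, sigmaColor=0.1, sigmaSpace=15)
    # Keep the central region assumed to contain the workpiece.
    h, w = img.shape[:2]
    ch, cw = int(h * crop_frac), int(w * crop_frac)
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    img = img[y0:y0 + ch, x0:x0 + cw]
    # Resize every crop to a common model input size.
    return cv2.resize(img, out_size)
```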
S2, inputting the standardized image data into the Transformer-based deep contrastive learning model, and outputting the global feature vector of the workpiece image.
Further, the standardized image data is converted into the patch form required as input by the Transformer-based deep contrastive learning model;
the image is divided into non-overlapping patches, each of size P×P;
it should be noted that a patch is a small region or block extracted from an image; "patches" is its plural, denoting all the image blocks obtained by dividing the entire image.
Each patch is flattened into a one-dimensional vector and mapped to a fixed dimension D through linear projection, to match the input requirements of the Transformer;
letting N be the number of patches, the patch count is:

N = \frac{H \times W}{P^2}

where H represents the height of the image and W represents the width;
after linear projection, the combination of all patches is expressed as the input feature matrix X ∈ R^{N×D};
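A minimal PyTorch sketch of this patch-embedding step follows; implementing the flatten-plus-linear-projection as a strided convolution is a common equivalent trick, and the patch size P = 16 and dimension D = 128 are illustrative values.

```python
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into non-overlapping P×P patches, project to dim D.

    Yields N = H*W / P^2 patch vectors per image.
    """
    def __init__(self, patch=16, in_ch=3, dim=128):
        super().__init__()
        # kernel = stride = P is equivalent to flatten + linear projection.
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)

    def forward(self, x):                     # x: (B, 3, H, W)
        z = self.proj(x)                      # (B, D, H/P, W/P)
        return z.flatten(2).transpose(1, 2)   # (B, N, D) = input matrix X
```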
The core goal of the self-attention mechanism is to capture global dependencies by computing the interrelationships ("attention") between elements in the input sequence. In an image processing task, the input image is divided into patches, each patch is flattened into a one-dimensional vector, and the self-attention mechanism is then introduced as follows:
three matrices, the query matrix Q, the key matrix K and the value matrix V, are generated through linear transformations of the input feature matrix X;
the query matrix is Q = XW_Q;
the key matrix is K = XW_K;
the value matrix is V = XW_V;
where W_Q maps the input data to the Query vectors, W_K maps the input data to the Key vectors, and W_V maps the input data to the Value vectors.
It should be noted that in the self-attention mechanism, a Query vector is used to compute similarity against the Key vectors of the other elements, to determine the relationship between the current input element (such as a word or an image patch) and the other elements. A Key vector is one representation of each input element, and its dot product with a Query vector measures similarity: Key is similar to an "index" or "key" in a database, helping the model find the input elements most relevant to the Query. A Value vector represents the content of each input element; the Values are finally weighted and summed according to the Query-Key similarities to generate the final output. The Value vectors contain the actual input information and are used together with Query and Key to compute the final attention-weighted result.
Here D is the fixed dimension of the input patches and d_k is the mapped dimension (typically lower).
It should also be noted that W_Q, W_K, W_V are the learnable parameters of the model; "learnable weight matrices" means that the entries of W_Q, W_K, W_V are not fixed constants but are continually adjusted during model training. During training, these weight matrices are updated via forward and backward propagation of the input data, using optimization algorithms such as gradient descent, so that the model adapts better to the characteristics of the input data.
The attention weight matrix A is calculated from the mutual dependencies among the patches:

A = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)

where d_k represents the dimension of the query and key vectors, and K^T represents the transpose of the key matrix K.
It should be noted that dividing by \sqrt{d_k} prevents the dot products from becoming too large, especially when the dimension d_k is large; without this scaling, large dot-product values would make the softmax gradients very small and hurt the training of the model.
The output feature Z = AV is calculated based on the attention weights A and the value matrix V;
the outputs of multiple independent attention heads are spliced and linearly transformed to generate the global feature vector:

F_{gl} = \frac{1}{H_a} \sum_{h=1}^{H_a} \sum_{i=1}^{N} A_h(X_i)\, W_h X_i

where F_gl is the output global feature vector, H_a is the number of attention heads in the multi-head self-attention mechanism, 1/H_a is the normalization factor used to average the outputs of the heads, h is the index of the current attention head whose output W_h X_i participates in generating the global feature, i is the index of a patch of the input image, A_h(X_i) is the attention weight of the i-th patch extracted from the attention matrix of the h-th head, W_h is the linear transformation matrix of the h-th attention head, and X_i is the feature vector of the i-th input patch.
In the Transformer architecture, the multi-head self-attention mechanism computes multiple attention heads in parallel, each head independently extracting different features; dividing by H_a normalizes the multi-head outputs, ensuring that the final global feature is the average of the individual heads' outputs.
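The following PyTorch sketch assembles the Q/K/V projections, the scaled-softmax attention weights and the head-pooled global feature described above; the head count and the mean-pooling over patches are illustrative choices consistent with the F_gl formula, not a definitive implementation.

```python
import torch.nn as nn
import torch.nn.functional as F

class GlobalFeature(nn.Module):
    """Multi-head self-attention over patch embeddings, pooled into one
    global feature vector per image (a sketch of the F_gl computation)."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        assert dim % heads == 0
        self.h, self.dk = heads, dim // heads
        self.Wq = nn.Linear(dim, dim, bias=False)   # W_Q
        self.Wk = nn.Linear(dim, dim, bias=False)   # W_K
        self.Wv = nn.Linear(dim, dim, bias=False)   # W_V
        self.Wo = nn.Linear(dim, dim, bias=False)   # per-head linear transform

    def forward(self, X):                           # X: (B, N, D)
        B, N, D = X.shape
        split = lambda t: t.view(B, N, self.h, self.dk).transpose(1, 2)
        Q, K, V = split(self.Wq(X)), split(self.Wk(X)), split(self.Wv(X))
        # A = softmax(Q K^T / sqrt(d_k)), shape (B, heads, N, N)
        A = F.softmax(Q @ K.transpose(-2, -1) / self.dk ** 0.5, dim=-1)
        Z = A @ V                                   # per-head outputs Z = A V
        Z = Z.transpose(1, 2).reshape(B, N, D)      # splice the heads together
        Z = self.Wo(Z)                              # linear transformation
        return Z.mean(dim=1)                        # average patches -> (B, D)
```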
S3, inputting the global feature vector into the positive and negative sample pair generating unit for contrastive learning, to generate positive sample pairs and negative sample pairs.
Further, the global feature vector F_gl generated by the multi-head self-attention mechanism is normalized to obtain the normalized global feature vector:

\hat{F}_{gl} = \frac{F_{gl}}{\lVert F_{gl} \rVert}

where ||F_gl|| is the norm (modulus) of the global feature vector;
for the original image I of each workpiece, multiple images with different viewing angles are generated through data enhancement (e.g., rotation, cropping, flipping, color jittering);
based on each enhanced image, its global feature vector \hat{F}_{gl} is extracted through the Transformer model, and the global feature vectors of different views of the same workpiece are paired in pairs to generate positive sample pairs.
For example, if workpiece B is data-enhanced to generate three images at different viewing angles with features \hat{F}_B^{(1)}, \hat{F}_B^{(2)}, \hat{F}_B^{(3)}, the generated positive sample pairs are (\hat{F}_B^{(1)}, \hat{F}_B^{(2)}), (\hat{F}_B^{(1)}, \hat{F}_B^{(3)}) and (\hat{F}_B^{(2)}, \hat{F}_B^{(3)}).
Images I of different workpieces are selected at random, and the global feature vector \hat{F}_{gl} is calculated for each image;
global feature vectors from different workpieces are paired to generate negative sample pairs.
For example, if the global feature vectors of workpiece B and workpiece C are \hat{F}_B and \hat{F}_C, the generated negative sample pair is (\hat{F}_B, \hat{F}_C).
It should be noted that in contrastive learning, the two elements of a positive pair share the same identity (they come from the same workpiece) even though their specific image appearances differ, whereas a negative pair consists of images from different workpieces. To ensure the validity of the negative sample pairs, the images from different workpieces should have significant feature differences, so that the model can learn to distinguish between different workpieces.
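A sketch of the pair construction under the stated augmentations follows; the exact augmentation parameters and the helper name make_views are assumptions for illustration.

```python
import torch
import torchvision.transforms as T

# Two random augmentations of the same workpiece image form a positive pair;
# features of different workpieces in the batch serve as negatives. The
# augmentations mirror the ones named in the text (rotation, crop, flip,
# color jitter); the parameter values are illustrative.
augment = T.Compose([
    T.RandomResizedCrop(224, scale=(0.7, 1.0)),
    T.RandomHorizontalFlip(),
    T.RandomRotation(15),
    T.ColorJitter(0.4, 0.4, 0.4, 0.1),
])

def make_views(batch):                 # batch: (B, 3, H, W) workpiece images
    v1 = torch.stack([augment(img) for img in batch])
    v2 = torch.stack([augment(img) for img in batch])
    # (v1[i], v2[i]) is a positive pair; (v1[i], v2[j]), i != j, is negative.
    return v1, v2
```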
S4, optimizing the feature distribution of the positive and negative sample pairs through the loss function of deep contrastive learning, to obtain the optimized positive and negative sample pairs.
Further, for each positive and negative sample pair, the cosine similarity is calculated and the InfoNCE loss function is applied to optimize the neural network model.
It should be noted that cosine similarity measures the angle between two feature vectors in a high-dimensional space: the closer to 1, the more similar; the closer to 0 or negative, the less similar.
The formula for cosine similarity is:

\mathrm{sim}(F_i, F_j) = \frac{F_i \cdot F_j}{\lVert F_i \rVert \, \lVert F_j \rVert}

where F_i and F_j are the feature vectors of a positive or negative sample pair, F_i · F_j is their inner product, and ||F_i|| and ||F_j|| are the norms (L2 norms) of the feature vectors.
It should also be noted that in contrastive learning, the InfoNCE loss is one of the most common loss functions: it optimizes the model's feature representation by maximizing the similarity of positive sample pairs while minimizing the similarity of negative sample pairs.
The loss is:

L = -\log \frac{\exp\left(\mathrm{sim}(F_i, F_p)/\tau\right)}{\sum_{j} \exp\left(\mathrm{sim}(F_i, F_j)/\tau\right)}

where F_i is the feature vector of the query sample, F_p is the feature vector of its positive sample, F_j ranges over the feature vectors of the candidate samples (including all negative examples of the query sample), and τ is a temperature parameter that scales the similarities and adjusts the model's sensitivity to similarity differences (a smaller τ makes the model more sensitive to differences between samples).
An Adam optimizer is used to update the parameters of the neural network model based on the InfoNCE loss.
It should be noted that the Adam optimizer is an adaptive-learning-rate optimization algorithm that automatically adjusts the learning rate of each parameter, while SGD (stochastic gradient descent) is a commonly used optimization algorithm that updates model parameters directly according to the gradient of the loss function.
The optimizer update formula is (taking SGD as an example):

\theta \leftarrow \theta - \eta \, \nabla_{\theta} L

where θ represents the parameters of the model, η is the learning rate, and \nabla_{\theta} L is the gradient of the loss function L with respect to the model parameters θ.
The loss L is calculated and its gradient \nabla_{\theta} L is back-propagated to update the neural network model parameters until the neural network model reaches the preset number of training rounds.
The gradient \nabla_{\theta} L computed through back-propagation guides the update direction of the model parameters, so that the model gradually optimizes the feature distribution of the positive and negative sample pairs.
The input samples are mapped into the feature space of the trained neural network model, further optimizing the positive and negative sample pairs so that the feature representations of positive pairs move closer together and those of negative pairs move further apart.
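The InfoNCE optimization step can be sketched as below, treating each row pair (z1[i], z2[i]) as a positive pair and all other rows in the batch as negatives; the commented training-loop names (model, make_views, the learning rate) are assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.1):
    """InfoNCE over a batch: z1[i] and z2[i] are features of two views of
    workpiece i; every other row acts as a negative. A minimal sketch."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau               # cosine similarity / temperature
    labels = torch.arange(z1.size(0), device=z1.device)
    # Cross-entropy on the diagonal = -log softmax of the positive entry.
    return F.cross_entropy(logits, labels)

# One Adam optimization step (assumed surrounding names):
# opt = torch.optim.Adam(model.parameters(), lr=1e-4)
# v1, v2 = make_views(batch)
# loss = info_nce(model(v1), model(v2))
# opt.zero_grad(); loss.backward(); opt.step()
```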
S5, inputting the optimized positive and negative sample pairs into the meta-learning unit, and performing rapid adaptation training of the deep contrastive learning model through the meta-learning mechanism.
Further, the optimized positive and negative sample pairs are divided into multiple meta-tasks T_i, each task containing the positive and negative sample pairs of a workpiece;
on the support set of each task, the temperature parameter is adjusted using an adaptive temperature adjustment mechanism;
after the parameters are adjusted on the support set, a weight w_i is dynamically assigned to each task:

w_i = \left(1 + \exp\left(-\gamma\,(L_i(Q_i, \theta') - L_{avg})\right)\right)^{-1}

where γ is a hyper-parameter controlling the sensitivity of the weight to task difficulty, L_avg is the average loss over the query sets of all tasks, θ' is the model parameter after one gradient update on the support set of meta-task T_i, and L_i(Q_i, θ') is the loss of meta-task T_i on its query set Q_i;
combining the preliminary update on the support set with the loss on the query set, a global parameter update is performed, weighted by the dynamic weight w_i of each task;
through multiple rounds of task iteration, the deep contrastive learning model is gradually optimized, giving it stronger adaptability across different tasks.
It should be noted that the expression for the dynamically assigned weight w_i is derived as follows.
First, in the meta-learning process, the difficulty of each meta-task T_i must be evaluated. The model's performance on task T_i is measured by the query-set loss L_i(Q_i, θ'), i.e., the loss computed by the model on the query set Q_i using the parameters θ' obtained after the support-set update.
If L_i(Q_i, θ') is large, the model performs poorly on the task, so the task is likely difficult.
If L_i(Q_i, θ') is small, the model performs well on the task, so the task is relatively easy.
Meanwhile, to make the difficulties of different tasks comparable, the average query-set loss L_avg over all tasks is introduced as a reference standard.
The difference between the single-task loss L_i(Q_i, θ') and the average loss L_avg of all tasks quantifies the difficulty of each task; the task loss difference ΔL_i is:

\Delta L_i = L_i(Q_i, \theta') - L_{avg}

When L_i(Q_i, θ') > L_avg, ΔL_i > 0 and task T_i is more difficult than the average task.
When L_i(Q_i, θ') < L_avg, ΔL_i < 0 and task T_i is easier than the average task.
Preferably, given this requirement, the sigmoid function is an ideal choice: it is a common S-shaped function with output values in (0, 1) that smoothly maps its input to a value between 0 and 1.
The loss difference ΔL_i is converted into the weight w_i through the sigmoid function.
The general form of the sigmoid function is:

\sigma(x) = \frac{1}{1 + e^{-x}}

Substituting the scaled loss difference γΔL_i into the sigmoid function gives:

w_i = \sigma(\gamma \, \Delta L_i) = \frac{1}{1 + \exp(-\gamma \, \Delta L_i)}

where γ is a hyper-parameter controlling the sensitivity of the weight to task difficulty: a larger γ increases the weight's sensitivity to the loss difference, while a smaller γ makes the weight change more gently.
Expanding ΔL_i finally yields the dynamic weight w_i of each task:

w_i = \left(1 + \exp\left(-\gamma\,(L_i(Q_i, \theta') - L_{avg})\right)\right)^{-1}

When L_i(Q_i, θ') is far greater than L_avg, i.e., the task is harder, the exponential term exp(-γΔL_i) approaches 0 and w_i approaches 1, indicating that the task's importance is higher;
when L_i(Q_i, θ') is far smaller than L_avg, i.e., the task is easier, the exponential term exp(-γΔL_i) is large and w_i approaches 0, indicating that the task's importance is lower.
It should further be noted that, through the sigmoid function, the difference between a task's query-set loss and the global average loss is mapped to a weight w_i between 0 and 1; this weight dynamically reflects the difficulty of each task, so that harder tasks receive more attention when the model updates the global parameters.
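The dynamic task-weighting rule reduces to a sigmoid of the centred query losses, as the following sketch shows; the meta-training loop names in the comment are assumptions.

```python
import torch

def task_weights(query_losses, gamma=1.0):
    """Map each meta-task's query-set loss to a weight in (0, 1) with
    w_i = sigmoid(gamma * (L_i - L_avg)): harder tasks (loss above the
    average) get weights near 1. gamma is the sensitivity hyper-parameter."""
    L = torch.stack(query_losses)            # one query loss per meta-task
    return torch.sigmoid(gamma * (L - L.mean()))

# Global update weighted by task difficulty (assumed surrounding names):
# w = task_weights([L_i.detach() for L_i in losses])   # weights not trained
# meta_loss = (w * torch.stack(losses)).sum() / w.sum()
# meta_loss.backward(); meta_opt.step()
```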
S6, applying the trained contrastive learning model to real-time defect detection, judging defects in the newly input workpiece image, and outputting the defect detection result of the workpiece.
Further, the input workpiece image is converted into a grayscale image and standardized, so that it matches the distribution of the training data;
the workpiece contour is extracted from the standardized grayscale image using the Canny edge detection algorithm to generate a region of interest, ensuring that the model's input is the valid region of the workpiece and removing irrelevant background information;
the extracted region of interest of the workpiece is input into the Transformer encoder, which extracts the feature embedding vector Z_input of the input workpiece, a high-dimensional feature representation of the workpiece image used for comparison with the reference samples;
the pre-optimized positive sample (non-defective workpiece) feature embedding vector Z_pos and negative sample (defective workpiece) feature embedding vector Z_neg are retrieved from the sample library;
the cosine similarity sim_pos between the input workpiece feature Z_input and the positive sample feature embedding vector Z_pos is calculated as:

\mathrm{sim}_{pos} = \frac{Z_{input} \cdot Z_{pos}}{\lVert Z_{input} \rVert \, \lVert Z_{pos} \rVert}

the cosine similarity sim_neg between the input workpiece feature Z_input and the negative sample feature embedding vector Z_neg is calculated as:

\mathrm{sim}_{neg} = \frac{Z_{input} \cdot Z_{neg}}{\lVert Z_{input} \rVert \, \lVert Z_{neg} \rVert}

based on the calculated similarities, whether the input workpiece has defects is judged using the preset threshold τ;
if sim_pos < τ and sim_neg > τ, the workpiece is judged to be defective;
otherwise, it is judged to contain no defect.
It should be noted that the threshold τ is generally determined according to the specific application scenario and data distribution, by the following methods:
in some fixed scenarios, a suitable threshold τ can be selected manually by observing the similarity distributions of the training and test data. Typically, the distribution of similarities is visualized (e.g., as a histogram), the difference between the similarities of non-defective and defective samples is observed, and a threshold that maximizes the separation between positive and negative samples is selected;
second, parameter tuning can be performed with a validation set: samples whose defect status is known are input into the model, and the detection results under different thresholds τ are observed.
Metrics such as accuracy, recall and F1 score under the different thresholds can then be computed, and the threshold τ that optimizes these metrics selected.
Accuracy: the ratio of the number of correctly classified samples to the total number of samples.
Recall: the proportion of defective workpieces that are correctly detected as defective.
F1 score: the harmonic mean of precision and recall, which jointly considers the model's precision and its ability to recall defects.
It should also be noted that several factors are typically considered when selecting the threshold τ:
detection accuracy: ensure that the threshold maximally separates positive and negative samples, reducing false positives and false negatives;
environmental change: if the workpiece types are diverse or the detection environment varies a lot, a dynamic threshold adjustment mechanism is recommended;
data distribution: select a threshold that separates the positive and negative samples according to the similarity distribution characteristics of the workpiece images.
Threshold selection in an example scenario:
suppose that in one specific defect detection task, the following similarity distributions are observed through tuning on the validation set:
the positive-sample similarity sim_pos mostly lies between 0.8 and 1.0;
the negative-sample similarity sim_neg mostly lies between 0.0 and 0.4.
From this distribution, a threshold τ = 0.7 can be selected. Then:
if sim_pos < 0.7, the similarity between the input workpiece and the non-defective samples is low, and the workpiece may be defective;
if sim_neg > 0.7, the similarity between the input workpiece and the defective samples is high, and a defect may exist.
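The decision rule of step S6 can be sketched as follows, with τ = 0.7 taken from the example above; the precomputed reference embeddings are assumed to be available from the sample library.

```python
import torch.nn.functional as F

def judge(z_input, z_pos, z_neg, tau=0.7):
    """Threshold rule from the text: low similarity to the defect-free
    reference AND high similarity to the defect reference => defective."""
    sim_pos = F.cosine_similarity(z_input, z_pos, dim=0).item()
    sim_neg = F.cosine_similarity(z_input, z_neg, dim=0).item()
    return "defective" if (sim_pos < tau and sim_neg > tau) else "non-defective"
```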
This embodiment also provides a computer device applicable to the above defect detection method based on deep contrastive learning, comprising a memory and a processor, where the memory stores computer-executable instructions and the processor executes the computer-executable instructions to implement the defect detection method based on deep contrastive learning.
The computer device may be a terminal comprising a processor, a memory, a communication interface, a display screen and input means connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
This embodiment also provides a storage medium having a computer program stored thereon which, when executed by a processor, implements the defect detection method based on deep contrastive learning proposed in the above embodiment. The storage medium may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
In conclusion, the invention significantly improves the precision of industrial workpiece defect detection in small-sample environments by introducing a deep contrastive learning mechanism. Second, positive and negative sample pairs generated through contrastive learning enhance the model's ability to distinguish the features of different workpieces and reduce the dependence on large-scale annotated data. Meanwhile, by combining the strong global feature extraction capability of the Transformer model with a meta-learning mechanism, model parameters can be optimized quickly with a small number of samples, giving the method better adaptability when facing new types of workpieces or defects. This design not only improves the accuracy of defect detection but also greatly reduces training time and data requirements, markedly alleviating the limitations of the prior art in small-sample environments.
Example 2, referring to Table 1: a second example of the present invention provides experimental simulation data of the defect detection method based on deep contrastive learning, to further verify the technical scheme of the invention.
The experimental objects are workpieces on an industrial production line, and the selected workpiece types include common mechanical components. In the experiment, 1000 workpiece images were randomly acquired from the production line, with high-definition image acquisition performed by an industrial camera at a resolution of 1024×1024 pixels. The prior art adopts a traditional convolutional neural network (CNN) based defect detection model, while the present method adopts a Transformer-based deep contrastive learning model.
Further, in the preprocessing stage of the experiment, the acquired workpiece images are first normalized to ensure that every pixel value lies in the range [0, 1]. The images are then denoised with a bilateral filtering algorithm, which filters out high-frequency noise while retaining the boundary details of the workpiece. After denoising, unnecessary parts of the image are cropped away so that the key region of the workpiece is retained, and the images are uniformly resized to 256×256 pixels to form the standardized image data.
After preprocessing, the standardized images are input into the Transformer model. The model first partitions each image into patches of size 16×16 and maps each patch to a fixed 128-dimensional vector by linear projection. Global feature vectors of the workpiece image are generated through the multi-head self-attention mechanism. The global feature vectors are used to generate positive and negative sample pairs: positive pairs are generated from different views of the same workpiece, and negative pairs from images of different workpieces.
Next, the model is trained by deep contrastive learning: the feature distributions of the positive and negative sample pairs are optimized using the InfoNCE loss function, and model parameters are updated with the Adam optimizer. After 50 rounds of training, the loss value of the model stabilizes, indicating that the feature extraction capability of the model has reached its optimum.
Finally, the trained model is applied to the real-time defect detection task; 100 new workpiece images are randomly selected for detection, and the accuracy, recall rate and detection speed of defect detection are evaluated.
The details are shown in table 1 below:
Table 1 Comparison of experimental data

Parameter | Inventive method | CNN model | Change (%)
Detection accuracy (%) | 98.5 | 92.3 | +6.7
Recall (%) | 97.8 | 89.5 | +9.3
Detection speed (ms/image) | 15 | 35 | -57.1
False alarm rate (%) | 1.2 | 5.7 | -79.0
Missed detection rate (%) | 2.5 | 7.8 | -68.0
From the above experimental data, it is obvious that the defect detection method based on depth contrast learning of the present invention is superior to the conventional CNN defect detection method in terms of a plurality of key indexes. Firstly, in terms of detection accuracy, the method reaches 98.5%, and the traditional CNN model is only 92.3%, so that the method is improved by 6.7%. This shows that the method can extract global features more accurately when processing the workpiece image, and reduces the sensitivity of the model to noise and non-key features through contrast learning, thereby improving the detection accuracy.
The recall rate is further analyzed, and the recall rate of the method is 97.8%, which is improved by 9.3% compared with 89.5% of the CNN model. The improvement of the recall rate shows that the method can more efficiently capture the defect characteristics of the workpiece in the defect detection task and reduce the omission phenomenon. The multi-head self-attention mechanism of the transducer can better capture global information in the image, and particularly, the identification capability of large-area or complex morphological defects is remarkably improved.
In detection speed, the method averages 15 ms per image against 35 ms for the conventional CNN model, a 57.1% reduction. Although the Transformer model has higher computational complexity, reasonable model optimization combined with the deep contrastive learning method markedly improves detection efficiency; in industrial real-time inspection scenarios in particular, the method better satisfies the requirement for rapid detection.
The false alarm rate and miss rate likewise demonstrate the advantages of the method. The experimental data show a false alarm rate of only 1.2%, versus 5.7% for the CNN model, a 79.0% reduction; the method thus substantially reduces misjudgments of normal workpieces and avoids the excessive overhauls or unnecessary production stoppages caused by false alarms. As for the miss rate, the method achieves 2.5%, a 68.0% reduction from the CNN model's 7.8%, showing that it detects workpiece defects more reliably and prevents defective workpieces from flowing into subsequent stages of the production line, thereby improving the reliability of quality control.
In conclusion, the comparative analysis of the experimental data demonstrates that the defect detection method based on deep contrastive learning has significant advantages in detection accuracy, recall, detection speed, false alarm rate and miss rate. By combining global features extracted by the Transformer model with contrastive learning, workpiece defects are identified more effectively, false alarms and misses are reduced, and high detection efficiency is achieved in real-time inspection.
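For completeness, the real-time detection stage, including the region-of-interest extraction and cosine-similarity threshold rule detailed in claim 1 below, can be sketched as follows. The Canny thresholds and τ = 0.8 are assumed values, and encoder, z_pos and z_neg denote the trained feature extractor and pre-optimized reference embeddings from the sample library.

```python
import cv2
import numpy as np
import torch
import torch.nn.functional as F

@torch.no_grad()
def detect_defect(img_path, encoder, z_pos, z_neg, tau=0.8):
    """Returns True if the workpiece is judged defective: dissimilar to the
    positive reference (sim_pos < tau) and similar to the negative reference
    (sim_neg > tau)."""
    gray = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(gray, 100, 200)                # workpiece contour
    ys, xs = np.nonzero(edges)
    if xs.size:                                      # bounding box of the contour
        gray = gray[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    roi = cv2.resize(gray, (256, 256)).astype(np.float32) / 255.0
    x = torch.from_numpy(roi).repeat(3, 1, 1).unsqueeze(0)  # (1, 3, 256, 256)
    z = encoder(x)                                   # input feature embedding
    sim_pos = F.cosine_similarity(z, z_pos).item()
    sim_neg = F.cosine_similarity(z, z_neg).item()
    return sim_pos < tau and sim_neg > tau
```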
It should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present invention may be modified or substituted without departing from the spirit and scope of the technical solution of the present invention, which is intended to be covered in the scope of the claims of the present invention.

Claims (5)

1. A defect detection method based on deep contrastive learning, characterized by comprising the following steps:
obtaining original image data of an industrial workpiece, and performing normalization, denoising, cropping and resizing to form standardized image data;
inputting the standardized image data into a Transformer-based deep contrastive learning model, and outputting a global feature vector of the workpiece image;
inputting the global feature vector into a positive and negative sample pair generating unit for contrastive learning, to generate positive sample pairs and negative sample pairs;
optimizing the feature distribution of the positive and negative sample pairs through a deep contrastive learning loss function, to obtain optimized positive and negative sample pairs;
inputting the optimized positive and negative sample pairs into a meta-learning unit, and performing rapid adaptation training of the deep contrastive learning model through a meta-learning mechanism;
applying the trained contrastive learning model to real-time defect detection, judging defects in a newly input workpiece image, and outputting a defect detection result of the workpiece;
wherein outputting the global feature vector of the workpiece image specifically comprises:
converting the standardized image data into the patch form taken as input by the Transformer-based deep contrastive learning model;
dividing the image into non-overlapping patches, each patch having a size of P×P;
flattening each patch into a one-dimensional vector, and mapping each patch to a fixed dimension D through linear projection;
letting N be the number of patches, N is expressed as:
N = (H×W)/P²;
wherein H represents the height of the image and W represents the width;
after linear projection, the combination of all patches is expressed as an input feature matrix X ∈ R^(N×D);
generating a query matrix Q, a key matrix K and a value matrix V through linear transformations of the input feature matrix X;
computing the attention weight matrix A from the pairwise dependencies among the patches, with the expression:
A = softmax(Q·K^T / √D_k);
wherein D_k represents the dimension of the query and key vectors, and K^T represents the transpose of the key matrix K;
computing the output feature Z = A·V based on the attention weights A and the value matrix V;
splicing the outputs of a plurality of independent attention heads and applying a linear transformation to generate the global feature vector, with the expression:
F_gl = (1/(H·N)) · Σ_{h=1..H} Σ_{i=1..N} A_h(X_i) · W_h · X_i;
wherein F_gl is the output global feature vector, H represents the number of attention heads in the multi-head self-attention mechanism, 1/(H·N) represents the normalization factor, h represents the index of the current attention head, i represents the index of a patch of the divided input image, A_h(X_i) represents the attention weight for the i-th patch extracted from the attention matrix of the h-th head, W_h represents the linear transformation matrix of the h-th attention head, and X_i represents the feature vector of the i-th input patch;
wherein generating the positive sample pairs and the negative sample pairs specifically comprises:
normalizing the global feature vector F_gl generated by the multi-head self-attention mechanism to obtain a normalized global feature vector;
generating, for the original image I of each workpiece, a plurality of images with different viewing angles through data enhancement;
extracting, for each enhanced image, its global feature vector through the Transformer model;
pairing the global feature vectors of different viewing angles of the same workpiece pairwise to generate positive sample pairs;
randomly selecting images I of different workpieces, and calculating the global feature vector of each image;
pairing global feature vectors from different workpieces to generate negative sample pairs;
wherein obtaining the optimized positive and negative sample pairs specifically comprises:
for each positive and negative sample pair, calculating the cosine similarity, and applying the InfoNCE loss function to optimize the neural network model;
using an Adam optimizer to update the parameters of the neural network model based on the InfoNCE loss;
calculating the loss L and back-propagating it to update the neural network model parameters, until the neural network model reaches a preset number of training rounds;
mapping input samples into the feature space of the trained neural network model to further optimize the positive and negative sample pairs;
wherein performing rapid adaptation training of the deep contrastive learning model through the meta-learning mechanism specifically comprises:
dividing the optimized positive and negative sample pairs into a plurality of meta-tasks T_i, each task comprising the positive and negative sample pairs of one workpiece;
on the support set of each task, adjusting the temperature parameter using an adaptive temperature adjustment mechanism;
after the parameters are adjusted on the support set, dynamically assigning a weight w_i to each task through the dynamic task weight:
w_i = (1 + exp(-γ·(L_i(Q_i, θ') - L_avg)))^(-1);
wherein γ represents a sensitivity hyper-parameter adjusting the effect of task difficulty on the weight, L_avg represents the average loss over the query sets of all tasks, θ' represents the model parameters after one gradient update on the support set of meta-task T_i, and L_i(Q_i, θ') represents the loss of meta-task T_i on its query set Q_i;
combining the preliminary update on the support set with the query-set loss, and performing a global parameter update weighted by the dynamic weight w_i of each task;
progressively optimizing the deep contrastive learning model through multiple rounds of task iteration;
wherein outputting the defect detection result of the workpiece specifically comprises:
converting the input workpiece image into a grayscale image, and performing standardization;
extracting the workpiece contour from the standardized grayscale image using the Canny edge detection algorithm, to generate a region of interest;
inputting the extracted region of interest of the workpiece into a Transformer encoder, and extracting a feature embedding vector of the input workpiece through the encoder;
retrieving a pre-optimized positive sample feature embedding vector Z_pos and a negative sample feature embedding vector Z_neg from a sample library;
calculating the cosine similarity sim_pos between the input workpiece feature Z_input and the positive sample feature embedding vector Z_pos;
calculating the cosine similarity sim_neg between the input workpiece feature Z_input and the negative sample feature embedding vector Z_neg;
judging whether the input workpiece is defective according to the calculated similarities and a preset threshold τ;
if sim_pos < τ and sim_neg > τ, judging the workpiece to be defective; otherwise, judging it to be non-defective.
2. The defect detection method based on deep contrastive learning according to claim 1, wherein obtaining the original image data of the industrial workpiece comprises the following steps:
acquiring a high-definition workpiece image with an industrial camera, reading the acquired image file using OpenCV, and loading it into memory for processing;
checking whether the read image is in RGB format, and converting it to RGB format if it is not;
after ensuring the image is in RGB format, further checking the size of the image, and finally outputting the checked and normalized original image data of the industrial workpiece.
3. The defect detection method based on deep contrastive learning according to claim 2, wherein forming the standardized image data comprises the following steps:
normalizing the pixel values of the checked and normalized original image data of the industrial workpiece;
denoising the normalized image through bilateral filtering;
cropping the denoised image to the key region containing the workpiece while keeping the sizes of the cropped images consistent, to form the standardized image data.
4. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the defect detection method based on deep contrastive learning according to any one of claims 1 to 3.
5. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the defect detection method based on deep contrastive learning according to any one of claims 1 to 3.
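The dynamic task weight in claim 1, w_i = (1 + exp(-γ·(L_i(Q_i, θ') - L_avg)))^(-1), is a sigmoid of the gap between a task's query-set loss and the average loss, so harder tasks (above-average loss) receive weights above 0.5. A minimal numeric sketch follows, with γ = 2.0 and loss values chosen purely for illustration.

```python
import math

def task_weight(loss_i, loss_avg, gamma=2.0):
    """w_i = 1 / (1 + exp(-gamma * (L_i - L_avg))): sigmoid task weighting."""
    return 1.0 / (1.0 + math.exp(-gamma * (loss_i - loss_avg)))

losses = [0.9, 0.5, 0.1]            # assumed query-set losses for 3 meta-tasks
avg = sum(losses) / len(losses)     # L_avg = 0.5
print([round(task_weight(l, avg), 2) for l in losses])
# [0.69, 0.5, 0.31] -> harder tasks contribute more to the global update
```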