CN115984662B

CN115984662B - Multi-mode data pre-training and identifying method, device, equipment and medium

Info

Publication number: CN115984662B
Application number: CN202310272537.2A
Authority: CN
Inventors: 罗亮; 林珠; 李海威; 马志平; 冯轶华
Original assignee: Guangdong Science & Technology Infrastructure Center
Current assignee: Guangdong Science & Technology Infrastructure Center
Priority date: 2023-03-21
Filing date: 2023-03-21
Publication date: 2023-08-04
Anticipated expiration: 2043-03-21
Also published as: CN115984662A

Abstract

The invention discloses a multi-mode data pre-training and identifying method, device, equipment and medium, which are used for constructing a defect scene rule database by carrying out multi-source heterogeneous data fusion on defect basic data acquired by acquisition; extracting defect type information, feature information and scene information from the defect scene rule database, performing data association, and extracting scene factors of the defect scene rule database; constructing a self-coding network structure model carrying defect scene information, merging the scene factors into the self-coding network structure model, inputting feature vectors obtained by coding sample data of various defects, carrying out data and rule matching training, and generating a modal identification model; and carrying out defect recognition on the sample to be detected according to the modal recognition model. The product defect detection accuracy and the robustness of the model can be improved.

Description

Multi-mode data pre-training and identifying method, device, equipment and medium

Technical Field

The present invention relates to the field of image recognition, and in particular, to a method, apparatus, device, and medium for training and recognizing multimodal data.

Background

With the rapid development of the precision manufacturing industry, the loss caused by the surface defects of high-precision instruments is up to the level of one hundred billion yuan every year, and the high-precision defect detection requirements of industrial products are increasingly strong. Particularly, the industrial production environment has highly complex conditions such as noise, shielding, vibration, dim light and the like, so that the defect detection has to have the requirements of intelligence, high precision, long time and high efficiency.

Although the defect accuracy is improved to a certain extent by applying the deep learning algorithm at the present stage, the defect sample is small and unbalanced in the existing high-precision defect detection process, and meanwhile, the defect sample is easily influenced by environments such as shielding, oxidization and vibration, and the problems of low product defect detection accuracy and weak robustness of a model exist.

Disclosure of Invention

In order to solve the technical problems, the invention provides a multi-mode data pre-training and identifying method, device, equipment and medium, which improve the accuracy of product defect detection and the robustness of a model.

The embodiment of the invention provides a multi-mode data pre-training and identifying method, which comprises the following steps:

carrying out multi-source heterogeneous data fusion on the acquired defect basic data to construct a defect scene rule database;

Extracting defect type information, feature information and scene information from the defect scene rule database, performing data association, and extracting scene factors of the defect scene rule database;

constructing a self-coding network structure model carrying defect scene information, merging the scene factors into the self-coding network structure model, inputting feature vectors obtained by coding sample data of various defects, carrying out data and rule matching training, and generating a modal identification model;

and carrying out defect recognition on the sample to be detected according to the modal recognition model.

Further, the multi-source heterogeneous data fusion is performed on the acquired defect basic data to construct a defect scene rule database, which specifically comprises the following steps:

performing multi-source heterogeneous data fusion on defect basic data consisting of historical experience data, common rule data and defect standard data to form a defect scene rule database of association of defect scenes, defect types, positions and scales;

the defect scene rule database comprises: a surface defect dataset, a defect rule dataset, a inspection system dataset, and a process scene dataset.

As an improvement of the above-described scheme, the surface defect data set d1= [ surface defect ID, defect geometric feature, spatial distribution data, defect statistics data, defect spectrum data ];

the defect rule data set d2= [ defect rule ID, detected object type, defect classification statistics, damage-causing mechanism data, defect cause rule, defect grade ];

the detection system data set d3= [ detection system ID, equipment type, production line design data, technical choice ];

the process scene data set d4= [ process scene data ID, detected object type, scene factor, production procedure ];

the defect geometry includes: dotted line-plane defects, boundaries, bones, shape, position, size, stretching, and translation;

the spatial distribution data includes: entropy, contrast, consistency, and correlation;

the defect statistical data comprise a gray level co-occurrence matrix, an autocorrelation coefficient, mathematical morphology, histogram statistical characteristics, fractal values and a defect spectrum subset;

the histogram statistical features include range, mean, geometric mean, harmonic mean, standard deviation, variance, and median.

The fractal values include a stretched, translated fractal dimension and a porosity;

The defect spectrum subset includes texture spectrum, stain spectrum, and saw tooth spectrum;

the defect classification statistical data specifically refers to a fault mode of automatic defect division;

the defect level includes the detection object type;

the detection object types comprise semiconductors, circuit boards, wafers, fabrics, metal surfaces and wood;

the scene factors comprise job scale and equipment type selection;

the production process comprises blank making, grinding, rolling, shearing, bundling and finished product production.

Preferably, the extracting defect type information, feature information and scene information from the defect scene rule database, performing data association, and extracting scene factors of the defect scene rule database specifically includes:

extracting defect type information from the surface defect dataset, extracting feature information from the surface defect dataset and the defect rule dataset, extracting scene information from the detection system dataset and the process scene dataset;

for the defect Z, a layered matrix Z multiplied by T multiplied by R is constructed according to the extracted defect type information, characteristic information and scene information;

for defect-feature association information, a first extraction factor a is adopted _ij Mapping and extracting from the matrix Z multiplied by T to obtain the previous defect scene factorAccording to all the extracted forefront defect scene factors +.>Form the forepart scene factor->；

For the feature-scene association information, a second extraction factor b is adopted _ij Mapping and extracting from the matrix T X R to obtain the late defect scene factorAccording to all the extracted postamble defect scene factors +.>Form postamble scene factor->；

Determining the scene factors according to the extracted front scene factors and back scene factors;

wherein,,，T，n is the number of defect categories, j is the feature vector dimension, Z _i ^j For the values of elements in the defect matrix, T _i ^j For the element values in the characteristic information matrix, R _i ^j I=1, 2, … n for the element values in the scene information matrix;，，If you are->=0，If you are->；；，，If you are->=0，Time then；。

Preferably, the constructing a self-coding network structure model carrying defect scene information, integrating the scene factors into the self-coding network structure model, inputting feature vectors obtained by coding sample data of various defects, and performing matching training of data and rules to generate a modal identification model, which specifically comprises:

applying the previous scene factors in the scene factors to the encoder of the self-coding network structure model to extract effective features;

Applying the latter scene factors in the scene factors to a decoder of the self-coding network structure model to generate rules;

inputting a feature vector W coded by sample data of various defects, introducing scene factors in the structure of a basic operation block by referring to the idea of a residual error network, so that the scene factors are hidden in a hierarchical structure in the stacking of the self-coding network structure model, and decoding and outputting to obtain scene rule output [ type, feature and scene ];

and outputting the scene rules through semi-supervised stacked self-encoders, adding a classifier in a decoding stage to realize a classification function, optimizing the self-encoding network structure model classifier through matching training of data and rules, and generating the modal identification model.

As a preferred scheme, the objective function of the self-coding network structure model is specifically:

；

the loss function of the self-coding network structure model is specifically as follows:

；

wherein V (G, D) is the whole defined objective function, N is the number of labels to which the source belongs,representing probability P of original tag in output data (x) of defect sample x after self-encoding network,/I>The probability P of the original tag in the output data z (x) after the self-coding network carrying the defect knowledge sample x is shown; d (X) is a conditional probability calculation function, G (z) is the probability of outputting information y under the condition of the category model G (z) in the applied classification category data; / >Representing whether or not there is +.>Class defects; a. b, w, h, c is the composition variable of each grid in defect detection, a, b are the points at the lower left corner of the grid, w, h are the width and height of the grid, c is the grid confidence, and->Representing the coordinate loss of the defect boundary box represented by the mean square error calculated through the position information;Representing the size loss of the defect boundary box by calculating absolute mean square error of the size information;By determining whether or not to belong toThe defect type calculates a confidence loss.

Preferably, the scene rule output further continuously generates and updates a defect scene rule by hidden layer training stacked from the encoder, and supplements the defect scene rule database.

The embodiment of the invention also provides a multi-mode data pre-training and identifying device, which comprises:

the database construction module is used for carrying out multi-source heterogeneous data fusion on the acquired defect basic data to construct a defect scene rule database;

the scene factor extraction module is used for extracting defect type information, feature information and scene information from the defect scene rule database, carrying out data association and extracting scene factors of the defect scene rule database;

The model generation module is used for constructing a self-coding network structure model carrying defect scene information, integrating the scene factors into the self-coding network structure model, inputting feature vectors obtained by coding sample data of various defects, and carrying out data and rule matching training to generate a modal identification model;

and the defect recognition module is used for recognizing defects of the sample to be detected according to the modal recognition model.

Preferably, the database construction module is specifically configured to:

Further, the surface defect data set d1= [ surface defect ID, defect geometrical feature, spatial distribution data, defect statistics, defect spectrum data ];

the histogram statistical characteristics comprise a range, a mean value, a geometric mean value, a harmonic mean value, a standard deviation; median value

the defect level includes the detection object type;

The scene factors comprise job scale and equipment type selection;

Preferably, the scene factor extraction module is specifically configured to:

Preferably, the model generation module is specifically configured to:

Preferably, the objective function of the self-coding network structure model is specifically:

；

wherein V (G, D) is the whole defined objective function, N is the number of labels to which the source belongs,representing probability P of original tag in output data (x) of defect sample x after self-encoding network,/I>The probability P of the original tag in the output data z (x) after the self-coding network carrying the defect knowledge sample x is shown; d (X) is a conditional probability calculation function, G (z) is the probability of outputting information y under the condition of the category model G (z) in the applied classification category data;Representing whether or not there is +.>Class defects; a. b, w, h, c is the composition variable of each grid in defect detection, a, b are the points at the lower left corner of the grid, w, h are the width and height of the grid, c is the grid confidence, and->Representing the coordinate loss of the defect boundary box represented by the mean square error calculated through the position information;Representing the size loss of the defect boundary box by calculating absolute mean square error of the size information;By determining whether or not to belong toThe defect type calculates a confidence loss.

Further, the scene rule output also continuously generates and updates the defect scene rule through hidden layer training of the stacked self-encoder, and supplements the defect scene rule database.

The present invention also provides a computer readable storage medium, where the computer readable storage medium includes a stored computer program, where when the computer program runs, the device where the computer readable storage medium is located is controlled to execute the multi-modal data pre-training and identifying method according to any one of the foregoing embodiments.

The invention also provides a terminal device comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the multi-modal data pre-training and recognition method according to any of the above embodiments when executing the computer program.

The invention provides a multi-mode data pre-training and identifying method, device, equipment and medium, which are used for constructing a defect scene rule database by carrying out multi-source heterogeneous data fusion on acquired defect basic data; extracting defect type information, feature information and scene information from the defect scene rule database, performing data association, and extracting scene factors of the defect scene rule database; constructing a self-coding network structure model carrying defect scene information, merging the scene factors into the self-coding network structure model, inputting feature vectors obtained by coding sample data of various defects, carrying out data and rule matching training, and generating a modal identification model; and carrying out defect recognition on the sample to be detected according to the modal recognition model. The product defect detection accuracy and the robustness of the model can be improved.

Drawings

FIG. 1 is a schematic flow chart of a multi-modal data pre-training and recognition method according to an embodiment of the present invention;

FIG. 2 is a flowchart of a multi-modal data pre-training and recognition method according to another embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a multi-modal data pre-training and recognition device according to an embodiment of the present invention;

fig. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The embodiment of the invention provides a multi-mode data pre-training and identifying method, referring to fig. 1, which is a schematic flow chart of the multi-mode data pre-training and identifying method provided by the embodiment of the invention, wherein the steps S1-S4 of the method are as follows:

s1, carrying out multi-source heterogeneous data fusion on acquired defect basic data, and constructing a defect scene rule database;

S2, extracting defect type information, feature information and scene information from the defect scene rule database, performing data association, and extracting scene factors of the defect scene rule database;

s3, constructing a self-coding network structure model carrying defect scene information, integrating the scene factors into the self-coding network structure model, inputting feature vectors obtained by coding sample data of various defects, and performing data and rule matching training to generate a modal identification model;

s4, carrying out defect recognition on the sample to be detected according to the modal recognition model.

In the implementation of the embodiment, defect basic data are collected, wherein the defect basic data are specifically historical defect data of a sample to be detected, multi-source heterogeneous data facing to defect detection are fused, and a basic defect scene rule database containing static defect characterization, dynamic defect evolution, defect classification, defect-scene rules and other information is constructed through multi-source heterogeneous data fusion;

extracting scene factors according to a defect scene rule database, constructing a three-dimensional vector matrix containing defect type information, feature information and scene information, forcing a self-encoder to consider which parts of input data need to be optimally copied and which parts need to be discarded by using the matrix constraint, so that the self-encoder can learn effective features of the data, discard irrelevant features, generate more defect scene rules, perform data association and extract the scene factors of the defect scene rule database;

Researching a scene rule knowledge base construction based on a semi-supervised self-coding network, designing a stacked self-coding network structure carrying defect scene information, introducing scene factors to enable the scene factors to be hidden in a hierarchical structure in the stacking of the self-coding network, inputting feature vectors obtained by coding sample data of various defects, carrying out data and rule matching training, and generating a modal identification model;

and identifying the defects of the sample according to the generated modal identification model.

According to the invention, under the conditions of low defect sampling rate and unbalanced samples, the material characteristics, manufacturing process data and high-resolution defect image sub-pixel characteristics are fused by combining a production process scene, a scene rule knowledge base is constructed by a sample generation method based on the material process data, a high-resolution defect image sub-pixel characteristic coding method and a deep learning classification method, and a self-coding network can well process various mapping relations in small sample defect data, perform characteristic coding and knowledge modeling, and can solve core problems of difficult defect identification and classification, weak robustness, large image capacity to be detected, low calculation efficiency, difficult defect source tracing and the like caused by the use of the deep learning method in the defect detection process under the complex background of shielding, oxidation, vibration and the like.

In yet another embodiment of the present invention, the step S1 specifically includes:

In the implementation of this embodiment, the sources of the defect basic data include historical experience data, common rule data and defect standard data, where the historical experience data is specifically the historical data of the expert on defect judgment;

common industrial product defects are mainly: defects such as lines, scratches, greasy dirt, points, shadows, textures and saw teeth are reflected in an image when the defects are detected, the scene analysis is combined with the characteristics of business activities by combining the common defect image data representation condition, and the detected industrial products belong to links in the business, so that the scene judgment formed by the defect detection is influenced. Finally, through the association of each data set, a defect scene rule database which is associated with the defect type, the position and the scale of the defect scene is formed;

The defect scene rule database comprises a surface defect data set, a defect rule data set, a detection system data set and a process scene data set.

By classifying and correlating complex backgrounds such as shielding, oxidation, vibration and the like in the micron-sized visual image defect detection process, accurate defect identification is realized.

In yet another embodiment provided by the present invention, the surface defect dataset d1= [ surface defect ID, defect geometry, spatial distribution data, defect statistics, defect spectrum data ];

The histogram statistical features include range, mean, geometric mean, harmonic mean, standard deviation, variance, and median

the defect level includes the detection object type;

the scene factors comprise job scale and equipment type selection;

In the implementation of the present embodiment, the surface defect dataset specifically includes defect geometric features (dotted line-plane defects, boundaries, bones, shapes, positions, sizes, stretching, translations), spatial distribution data (entropy, contrast, consistency and correlation), defect statistics (gray level co-occurrence matrix, autocorrelation coefficients, mathematical morphology, histogram statistics (range, mean, geometric mean, harmonic mean, standard deviation, variance and median), and fractal values (stretching, fractal dimension of translations and porosities)), defect spectrum data (texture spectrum, stain spectrum and saw tooth spectrum).

Wherein, entropy is used for reflecting the randomness of the image reflecting pixels, and the larger and the coarser are; contrast refers to the average difference in brightness of the defective scene image; consistency refers to the degree of consistency of the measurement angles in the batch of images; correlation refers to the degree of correlation of an acquired image with a detected scene. In general, these specific data sets are actually detection data sets of image data, and different subsets are formed by classifying from different angles, so as to facilitate image processing and identification.

The defect rule data set includes defect classification statistics (defects automatically classified into corresponding failure modes), damage causing mechanism data, defect cause rules, and defect levels (inspection object types (semiconductor, circuit board, wafer, fabric, metal surface, wood, etc.)). The detection system data set comprises equipment type, production line design data and technical model selection;

the process scene data includes the type of object detected (semiconductor, circuit board, wafer, fabric, metal surface, wood, etc.), scene factors (job scale, equipment type selection), production process (blank making, grinding, rolling, shearing, bundling, finished product, etc.).

The surface defect data set, the defect rule data set, the detection system data set and the process scene data set are respectively expressed as the following data sets in the form of data sets:

surface defect dataset d1= [ surface defect ID, defect geometry, spatial distribution data, defect statistics, defect spectrum data ];

defect geometric feature subset= [ surface defect ID, defect geometric feature ID, dotted line surface defect, boundary, bone, shape, position, size, stretching, translation ];

spatial distribution subset= [ surface defect ID, spatial distribution ID, entropy, contrast, consistency, correlation ];

defect statistical subset= [ surface defect ID, defect statistical ID, gray level co-occurrence matrix, autocorrelation coefficient, mathematical morphology, histogram statistical feature, fractal value ];

the defect statistical subset refers to a data value obtained by calculating defect data from a statistical perspective. Although the defect characteristics are not directly described, the statistical data of the characteristic distribution is mastered, so that the relationship between the defect type and the common characteristics is favorably analyzed. This is intersected in the D2 dataset, i.e. these statistics will eventually be correlated with defect rules, making it easier to form defect scene rules.

Histogram statistical feature subset= [ surface defect ID, defect statistics ID, histogram statistics ID, range, mean, geometric mean, harmonic mean, standard deviation, variance, and median ];

a subset of fractal values= [ surface defect ID, defect statistics ID, fractal value ID, fractal dimension of stretching, translation and porosity sign ];

the split value can show the stretching and deformation degree of the defect, and the whole stretching of the accessory is often caused by improper application of the process level in the manufacturing process of the product, so that the industrial gap defect and the like are caused.

Defect spectrum subset= [ surface defect ID, defect spectrum ID, texture spectrum, stain spectrum, saw tooth spectrum ];

the defect spectrum is really the spectrum characteristic of the defect image, but the spectrum characteristic formed by the texture, the stain and the saw tooth is different, and the data set is the spectrum characteristic of the defect image which is collected by the texture, the stain and the saw tooth in the image defect process.

the device type refers to a missing test device, and the detection object type refers to a detected object, such as PCB board detection, steel detection, chip detection, mobile phone accessory detection, and the like. Different detection objects have different detection scenarios.

the process scene data set d4= [ process scene data ID, detected object type, scene factor, production procedure ].

In yet another embodiment of the present invention, the step S2 specifically includes:

For the feature-scene association information, a second extraction factor b is adopted _ij Mapping and extracting from the matrix T X R to obtain the late defect scene factorAccording to all the extracted postamble defect scene factors +.>Form postamble scene factor- >；

In the implementation of this embodiment, the scene factors are extracted according to the basic knowledge base, and these scene factors together constitute a three-dimensional vector matrix including types, features and scenes, and the matrix constraint is used to force the self-encoder to consider which parts of the input data need to be optimally copied and which parts need to be discarded, so the self-encoder can learn the effective features of the data and discard the irrelevant features, thereby generating more defect scene rules.

And after data cleaning, data association and conversion are carried out on the defect scene rule database, a three-dimensional vector matrix containing type information, characteristic information and scene information is finally formed.

Extracting defect type information from the surface defect dataset D1; extracting feature information from the surface defect data set D1 and the defect rule data set D2; extracting scene information from the detection system data set D3 and the process scene data set D4;

For defect Z, it can be expressed asFor characteristic information, it can be expressed as T +.>For scene information, it can be expressed as +.>Finally, a layered matrix of Z×T×R is formed.

Wherein n is the number of defect categories, j is the feature vector dimension, j is the vector dimension of the pointing quantity dimension, sample or feature; for example, for defect Z, the surface defect data set D1 and the defect rule data set D2 represent feature information, and j represents 1 to 11 assuming that the sum of the fields of the surface defect data set D1 and the defect rule data set D2 is 11;

Z _i ^j for the values of elements in the defect matrix, T _i ^j For the element values in the characteristic information matrix, R _i ^j I=1, 2, … n for the element values in the scene information matrix;

for the defect-feature association information, the mapping information is extracted from Z×T, and the first extraction factor is adopted from the defect to the featureExtracting the front defect scene factor +.>；/>

Wherein,,is a stepwise expression used in the calculation process, < ->If you are->=0，Time then；

Based on the extracted forefront defect scene factorFormed antecedent scene factor->；

For the feature-scene association information, mapping information is extracted from the T X R, and a second extraction factor is adopted from the defect to the feature Extracting the postterm defect scene factor +.>；

Based on the extracted forefront defect scene factorFormed intoFront scene factor->；

Scene factor= [ front scene factor, back scene factor ].

The previous scene factor represents: the information during the association of the defect features can be used for guiding effective feature extraction before an encoder, so that sample noise is reduced;

the postamble scene factor represents: the information when the feature is associated with the scene can be used for guiding the rule generation after the decoder and before the rule generation, and filtering invalid rules.

In yet another embodiment of the present invention, the step S3 specifically includes applying a previous scene factor of the scene factors to an encoder of the self-coding network structure model, and performing efficient feature extraction;

In the implementation of this embodiment, referring to fig. 2, a flowchart of a multi-mode data pre-training and identifying method according to another embodiment of the present invention is shown;

in fig. 2, a scene rule knowledge base construction based on a semi-supervised self-coding network is studied, and a stacked self-coding network structure carrying defect scene information is designed;

applying a leading-term scene factor containing defects and features in the scene factors to an encoder of the self-coding network structure model to extract effective features; applying the background scene factors containing the features and scenes in the scene factors to a decoder of the self-coding network structure model, and carrying out rule generation to enable the scene factors to be hidden in a hierarchical structure in the stacking of the self-coding network, and adding coding structures and classification feature information after the stacking of the self-coding network, so that the constructed model has the functions of modal identification and scene prejudgment;

Firstly, in a stacked self-coding network, an encoder and a decoder are in a symmetrical structure model, and a basic operation block structure of the network is designed in the coding network. Introducing scene factors in the structure of basic operation blocks by referring to the thought of a residual error network during superposition, so that the scene factors are hidden in a hierarchical structure in the stacking of the self-coding network;

inputting a characteristic vector W formed by sample data W1-Wi obtained after data preprocessing of input sample data X1-Xi into a self-coding network structure model, introducing scene factors in the structure of a basic operation block by referring to the thought of a residual network, enabling the scene factors to be hidden in a hierarchical structure in the stacking of the self-coding network structure model, and decoding and outputting to obtain scene rule output [ type, characteristic and scene ];

The mode identification and scene prejudging method based on the semi-supervised self-coding network is characterized in that a basic defect scene knowledge base containing static defect characterization, dynamic defect evolution, defect classification, defect-scene rules and other information is constructed through multi-source heterogeneous data fusion. Then, based on a self-coding network, introducing scene factors into the stacked self-coding network, coding the data samples to obtain feature vectors through learning of the data samples, learning the mapping from the image space of a certain type to the potential space, generating feature models of various types, positions and degrees, and carrying out data and rule matching training; through the construction and application of the defect scene knowledge base, the defect detection model has the scene pre-judging function, can promote the generation cause according to the defect information, and is helpful for the optimization of the production line design and the process of the industrial defect products.

In yet another embodiment of the present invention, the objective function of the self-coding network structure model is specifically:

；

When the embodiment is specifically implemented, the self-coding network structure model carrying the defect scene information designed by the patent is applied to classification and identification, and the designed objective function is as follows:

；

wherein V (G, D) is the whole objective function defined, the objective function is calculated with the maximum contribution angle, the objective function is a conditional probability calculation function for generating an improvement D (X) of the antagonism network formula, the function is divided into three parts, the first part: embodying the objective function calculation of the encoding stage, pursuing that the calculation of the stage and the integral function calculation are as large as possible at the moment so as to obtain the most representative characteristic information; the second part is a decoding stage, the output calculated value of the stage is required to be as small as possible, but the whole formula calculation is as large as possible, so that the decoding difference is smaller; when the third part is object classification and identification, G (z) is the probability of outputting information y under the condition of the class model G (z) in the applied classification class data, and the probability can represent the accuracy degree of classification;the probability P that the defect sample x is the original tag in the output data (x) after the self-coding network is shown,the probability P of the original tag in the output data z (x) after the self-coding network carrying the defect knowledge sample x is shown; / >Measuring for the central point;

the loss function is:

；

wherein a, b, w, h, c is the composition variable of each grid in defect detection, N is the number of labels to which the source belongs, a and b are points at the lower left corner of the grid, w and h are the width and height of the grid, c is the grid confidence,representing the coordinate loss of the defect boundary box represented by the mean square error calculated through the position information;Representing the size loss of the defect boundary box by calculating absolute mean square error of the size information;Indicating whether or not by judging whether or not it is->The defect type calculates a confidence loss.

In yet another embodiment of the present invention, the scene rule output further continuously generates and updates the defect scene rules by stacking hidden layer training from the encoder, and supplements the defect scene rules database.

When the embodiment is implemented, the post-decoder output result can not only realize the classification function by semi-supervised stacking of the self-encoders, but also continuously generate and update defect-scene rule knowledge by hidden layer training of the stacked self-encoders and supplement the defect-scene rule knowledge to the defect scene rule database. Further perfecting the knowledge base of the defect and scene mapping rules.

When the embodiment is implemented, referring to fig. 2, according to the rule generated by the output after the decoder, the scene rule knowledge base is supplemented, namely, the scene factors are extracted through the last postterm factors [ Yi-1], the scene layering matrix is updated for the self-coding network structure model, and the scene layering matrix is also supplemented into the input feature vector according to the vector matrix [ Yi ] of the extracted scene factors;

the stacks are all stacked in the form of a scene factor structure. In the stacked sub-structure, the front scene factors are merged into the first layer of training, and the rear scene factors are merged into the second layer of training; the usage is the same, one is threshold value usage, and the other is weight amplification; the threshold value use is to influence the activation function, on the basis of original full connection, through the matrix check of the front-term/back-term scene factors, the defect characteristics with too small threshold value can be directly abandoned, so that excessive characteristics/scene information can be prevented, and finally, the generation of overfitting can be prevented in application; on the other hand, the effective characteristics are further amplified, so that gradient disappearance phenomenon which is easy to generate in deep learning can be prevented, and the loss of the effective characteristics is prevented. By both aspects, rules formed by training stacked from the coding network are made more suitable for defect scenarios.

In still another embodiment of the present invention, referring to fig. 3, a schematic structural diagram of a multi-mode data pre-training and identifying device according to an embodiment of the present invention is provided, where the device includes:

It should be noted that, the multi-mode data pre-training and identifying device provided by the embodiment of the present invention can execute the multi-mode data pre-training and identifying method described in any embodiment of the foregoing embodiments, and specific functions of the multi-mode data pre-training and identifying device are not described herein.

Referring to fig. 4, a schematic structural diagram of a terminal device according to an embodiment of the present invention is provided. The terminal device of this embodiment includes: a processor, a memory, and a computer program stored in the memory and executable on the processor, such as a multimodal data pretraining and recognition program. The steps in the embodiments of the multi-modal data pre-training and identifying method described above, such as steps S1-S5 shown in fig. 1, are implemented when the processor executes the computer program. Alternatively, the processor may implement the functions of the modules in the above-described device embodiments when executing the computer program.

The computer program may be divided into one or more modules/units, which are stored in the memory and executed by the processor to accomplish the present invention, for example. The one or more modules/units may be a series of computer program instruction segments capable of performing the specified functions, which instruction segments are used for describing the execution of the computer program in the terminal device. For example, the computer program may be divided into modules, and specific functions of each module are not described herein.

The terminal equipment can be computing equipment such as a desktop computer, a notebook computer, a palm computer, a cloud server and the like. The terminal device may include, but is not limited to, a processor, a memory. It will be appreciated by those skilled in the art that the schematic diagram is merely an example of a terminal device and does not constitute a limitation of the terminal device, and may include more or less components than illustrated, or may combine certain components, or different components, e.g., the terminal device may further include an input-output device, a network access device, a bus, etc.

The processor may be a central processing unit (Central Processing Unit, CPU), other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. The general purpose processor may be a microprocessor or the processor may be any conventional processor or the like, which is a control center of the terminal device, and which connects various parts of the entire terminal device using various interfaces and lines.

The memory may be used to store the computer program and/or module, and the processor may implement various functions of the terminal device by running or executing the computer program and/or module stored in the memory and invoking data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data (such as audio data, phonebook, etc.) created according to the use of the handset, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, memory, plug-in hard disk, smart Media Card (SMC), secure Digital (SD) Card, flash Card (Flash Card), at least one disk storage device, flash memory device, or other volatile solid-state storage device.

Wherein the terminal device integrated modules/units may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as stand alone products. Based on such understanding, the present invention may implement all or part of the flow of the method of the above embodiment, or may be implemented by a computer program to instruct related hardware, where the computer program may be stored in a computer readable storage medium, and when the computer program is executed by a processor, the computer program may implement the steps of each of the method embodiments described above. Wherein the computer program comprises computer program code, which may be in the form of code, object code, executable files, or in some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.

It should be noted that the above-described apparatus embodiments are merely illustrative, and the units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the embodiment of the device provided by the invention, the connection relation between the modules represents that the modules have communication connection, and can be specifically implemented as one or more communication buses or signal lines. Those of ordinary skill in the art will understand and implement the present invention without undue burden.

While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of the invention, such changes and modifications are also intended to be within the scope of the invention.

Claims

1. A method for multi-modal data pre-training and recognition, the method comprising:

performing defect recognition on the sample to be detected according to the modal recognition model;

the defect scene rule database comprises: a surface defect dataset, a defect rule dataset, a detection system dataset, and a process scene dataset;

the method for extracting the scene factors of the defect scene rule database specifically comprises the following steps of:

For a certain defect, constructing a layered matrix Z multiplied by T multiplied by R according to the extracted defect type information, the characteristic information and the scene information;

wherein the defect matrixCharacteristic information matrix->Scene information matrixN is the number of defect categories, j is the feature vector dimension, Z _ij For the values of elements in the defect matrix, T _ij For the element values in the characteristic information matrix, R _ij I=1, 2, … n for the element values in the scene information matrix;，，If you are->=0，If you are->；；，，If you are->=0，If you are->；。

2. The multi-modal data pre-training and identifying method as defined in claim 1, wherein the multi-source heterogeneous data fusion is performed on the acquired defect basic data to construct a defect scene rule database, and the method specifically comprises the following steps:

And performing multi-source heterogeneous data fusion on defect basic data consisting of historical experience data, common rule data and defect standard data to form a defect scene rule database of association of defect scenes and defect types, positions and scales.

3. The multi-modal data pre-training and recognition method of claim 2 wherein the surface defect dataset d1= [ surface defect ID, defect geometry, spatial distribution data, defect statistics, defect spectrum data ];

The histogram statistical features include range, mean, geometric mean, harmonic mean, standard deviation, variance and median;

the defect level includes the detection object type;

the scene factors comprise job scale and equipment type selection;

4. The method for pre-training and identifying multi-modal data according to claim 1, wherein the constructing a self-coding network structure model carrying defect scene information, integrating the scene factors into the self-coding network structure model, inputting feature vectors obtained by coding sample data of various defects, performing data and rule matching training, and generating a modal identification model, specifically comprises:

inputting a feature vector W coded by sample data of various defects, introducing a scene factor in the structure of a basic operation block during superposition, so that the scene factor is hidden in a hierarchical structure in the stacking of the self-coding network structure model, and decoding and outputting to obtain scene rule output [ type, feature and scene ];

5. The multi-modal data pre-training and recognition method of claim 1, wherein the objective function of the self-encoding network structure model is specifically:

；

wherein V (G, D) is the whole defined objective function, N is the number of labels to which the source belongs,representing probability of original tag in output data (x) of defect sample x after self-encoding network,/I >Representing the probability of the original tag in the output data z (x) after the self-coding network carries the defect knowledge sample x; d (·) is a conditional probability calculation function, G (z) is the probability of outputting information y under the condition of the category model in the applied category data;representing whether or not there is +.>Class defects; a. b, w, h, c is the composition variable of each grid in defect detection, a, b are the points at the lower left corner of the grid, w, h are the width and height of the grid, c is the grid confidence, and->Representing the coordinate loss of the defect boundary box represented by the mean square error calculated through the position information;Representing the size loss of the defect boundary box by calculating absolute mean square error of the size information;Indicating whether or not by judging whether or not it is->The defect type calculates a confidence loss.

6. The multi-modal data pre-training and recognition method as set forth in claim 4 wherein the scene rule output is further continuously generating and updating defect scene rules by hidden layer training stacked from the encoder and supplementing into the defect scene rule database.

7. A multi-modal data pre-training and recognition apparatus, the apparatus comprising:

the defect recognition module is used for carrying out defect recognition on the sample to be detected according to the modal recognition model;

the scene factor extraction module is specifically configured to:

wherein the defect matrixCharacteristic information matrix->Scene information matrixN is the number of defect categories, j is the feature vector dimension, Z _ij Is a defect matrixElement value, T of _ij For the element values in the characteristic information matrix, R _ij I=1, 2, … n for the element values in the scene information matrix;，，If you are->=0，If you are->；；，，If you are->=0，If you are->；。

8. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored computer program, wherein the computer program, when run, controls a device in which the computer readable storage medium is located to perform the multi-modal data pre-training and recognition method according to any one of claims 1 to 6.

9. A terminal device comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the multimodal data pre-training and recognition method according to any of claims 1 to 6 when the computer program is executed.