
CN116664633B - Printed image registration method based on convolutional cross-attention mechanism - Google Patents

Printed image registration method based on convolutional cross-attention mechanism

Info

Publication number
CN116664633B
CN116664633B (granted; application CN202310624605.7A)
Authority
CN
China
Prior art keywords
feature map
size
attention
cross
convolutional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310624605.7A
Other languages
Chinese (zh)
Other versions
CN116664633A (en)
Inventor
陈亚军
杨茜
余璐
蔺广逢
张二虎
Current Assignee
Xian University of Technology
Original Assignee
Xian University of Technology
Priority date
Filing date
Publication date
Application filed by Xian University of Technology
Priority to CN202310624605.7A
Publication of CN116664633A
Application granted
Publication of CN116664633B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/15 Correlation function computation including computation of convolution operations
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0004 Industrial image inspection
    • G06T 7/0006 Industrial image inspection using a design-rule based approach
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10004 Still image; Photographic image


Abstract


This invention discloses a printed image registration method based on a convolutional cross-attention mechanism. First, a deep learning registration network is constructed, comprising a convolutional cross-attention mechanism and an upsampling-based deep homography estimation registration network. Then, the reference printed image and the printed image to be registered are input into the network to obtain the registered printed image. Finally, the network parameters are optimized by computing a loss function between the registered image and the reference image, yielding a more accurately registered output. The invention completes the printed image registration task, providing a basis for subsequent defect detection of printed matter and improving defect detection efficiency.

Description

Printed matter image registration method based on convolution cross attention mechanism
Technical Field
The invention belongs to the technical field of deep neural networks and image analysis, and particularly relates to a printed matter image registration method based on a convolution cross attention mechanism.
Background
The printing industry is an important industrial pillar of China's national economy. Books, periodicals, magazines, newspapers, gift boxes, business cards, and the like all belong to the category of printed matter, which is closely tied to daily life. However, during production, defects such as ink splatter, offset, color difference, and cutting deviation are unavoidable due to external factors or internal equipment. In printed-matter defect detection, the foremost task is registration: two printed images are aligned geometrically so that the reference printed image and the printed image to be registered coincide in spatial position, and defect detection is then performed. The quality of the registration determines the accuracy of the defect detection, so the problem has significant research value and practical importance.
The continuing development of deep learning provides new ideas for printed image registration. Deep-learning-based registration is not limited to feature extraction; a neural network can also estimate the geometric transformation between images for alignment. An unsupervised deep homography estimation model does not depend on ground-truth labels: the network is trained by optimizing a similarity measure between the registered image and the reference image. Such a method both learns features and estimates the homography, and registers well under large displacement and illumination variation. Registering two printed images effectively in spatial position provides a basis for subsequent defect detection of printed matter and improves defect detection efficiency.
Disclosure of Invention
The invention aims to provide a printed image registration method based on a convolutional cross-attention mechanism that completes the printed image registration task, provides a basis for subsequent defect detection of printed matter, and improves defect detection efficiency.
The technical scheme adopted by the invention is a printed image registration method based on a convolutional cross-attention mechanism, implemented according to the following steps:
Step 1, constructing a deep learning registration network, comprising a convolutional cross-attention mechanism and an upsampling-based deep homography estimation registration network;
Step 2, inputting a patch p_B of the reference printed image and a patch p_A of the printed image to be registered into the deep learning registration network, predicting the four corner offsets H'_4pt of p_A relative to the four corners on the reference printed image p_B, and obtaining the transformation matrix H' by direct linear transformation (DLT);
Step 3, applying a spatial transformation with the matrix H' to the printed image A to be registered, obtaining the registered printed image;
Step 4, optimizing the network parameters by computing a loss function between the registered printed image and the reference printed image, and outputting a more accurately registered printed image.
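The DLT stage of step 2 can be sketched in a few lines of NumPy. This is a minimal illustration, not the patent's implementation: the 128×128 patch size, corner coordinates, and offset values below are hypothetical stand-ins for the predicted H'_4pt.

```python
import numpy as np

def dlt_homography(src_pts, dst_pts):
    """Solve the 3x3 homography H mapping src_pts -> dst_pts from exactly
    four point correspondences via direct linear transformation (DLT)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)   # fix h33 = 1

def warp_point(H, p):
    """Apply homography H to a 2-D point (homogeneous divide)."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

# Hypothetical patch corners and predicted per-corner offsets (8 values).
corners = np.array([[0, 0], [127, 0], [127, 127], [0, 127]], float)
offsets = np.array([[3, -2], [-1, 4], [2, 2], [0, -3]], float)
H = dlt_homography(corners, corners + offsets)
```

Warping the four corners with the recovered H reproduces the corner offsets, which is exactly the consistency the network's 8-value output relies on.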
The present invention is also characterized in that,
The convolutional cross-attention mechanism in step 1 is implemented according to the following steps:
Step 1.1, two tensors X_1, X_2 ∈ R^(H×W×C) are input, where H is the height of the input feature map, W the width, and C the number of channels; both X_1 and X_2 have size 64×64×32;
Step 1.2, to preserve translation equivariance in the image processing, existing relative position encodings are extended to two dimensions: width and height information are embedded into the relative positions of the cross-attention, realizing two-dimensional relative cross-attention. The attention of pixel i = (i_x, i_y) to pixel j = (j_x, j_y) is computed as formula (1):
l_{i,j} = (q_i^T / sqrt(d_k)) (k_j + r^W_{j_x - i_x} + r^H_{j_y - i_y})   (1)
where l_{i,j} denotes the attention of pixel i = (i_x, i_y) to pixel j = (j_x, j_y), q_i^T is the transpose of the query vector of pixel i, d_k is the depth of the keys, k_j is the key vector of pixel j, and r^W_{j_x - i_x} and r^H_{j_y - i_y} are learned embeddings of the relative width j_x - i_x and relative height j_y - i_y;
Step 1.3, the output of the two-dimensional single-head cross-attention is expressed as formula (2):
O_h = Softmax( ((X_1 W_Q)(X_2 W_K)^T + S_rel^H + S_rel^W) / sqrt(d_k) ) (X_2 W_V)   (2)
where O_h is the output of the two-dimensional single-head cross-attention, Softmax(·) denotes normalization, W_Q is the query weight matrix, W_K the key weight matrix, and W_V the value weight matrix, S_rel^H and S_rel^W are the logit matrices of relative positions along height and width, X_1 and X_2 are the tensor forms of feature maps 1 and 2, and d_k is the depth of the keys.
Step 1.4, multi-head attention is formed by concatenating the single-head outputs, as shown in formula (3):
MHA(X) = Concat[O_1, ..., O_{N_h}] W^O   (3)
where MHA(X) is the multi-head attention tensor of shape (H, W, d_v), Concat[·] denotes concatenation, O_1, ..., O_{N_h} are the single-head attention outputs, and W^O is a learned output projection matrix;
Step 1.5, the convolutional feature map and the multi-head cross-attention feature map are concatenated to obtain the convolutional cross-attention, which can be written as formula (4):
AAConv(X) = Concat[Conv(X), MHA(X)]   (4)
where AAConv(X) denotes the convolutional cross-attention, Concat[·] denotes concatenation, Conv(X) denotes the convolution output, and MHA(X) is the multi-head attention tensor of shape (H, W, d_v);
Step 1.6, batch normalization is applied to the convolutional cross-attention output, yielding a feature map X'_1 of size 64×64×32 that fuses the features of X_2, and X_1 is replaced by X'_1.
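Formulas (1) and (2) can be exercised with a toy single-head example. This is a sketch under stated assumptions, not the patent's network: sizes are small stand-ins for the 64×64×32 maps, all weights and embeddings are random rather than learned, and the batch normalization and convolution branch of AAConv are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C = 4, 4, 8          # toy feature-map height, width, channels
d_k = d_v = 8              # key/value depths
N = H * W

X1 = rng.standard_normal((N, C))   # flattened feature map 1 (queries)
X2 = rng.standard_normal((N, C))   # flattened feature map 2 (keys, values)
W_Q, W_K, W_V = (rng.standard_normal((C, d)) for d in (d_k, d_k, d_v))
r_w = rng.standard_normal((2 * W - 1, d_k))  # relative-width embeddings r^W
r_h = rng.standard_normal((2 * H - 1, d_k))  # relative-height embeddings r^H

Q, K, V = X1 @ W_Q, X2 @ W_K, X2 @ W_V
ys, xs = np.divmod(np.arange(N), W)          # 2-D coordinates of each pixel

# Formula (1): l_{i,j} = q_i . (k_j + r^W_{jx-ix} + r^H_{jy-iy}) / sqrt(d_k)
logits = Q @ K.T
for i in range(N):
    for j in range(N):
        logits[i, j] += Q[i] @ r_w[xs[j] - xs[i] + W - 1]
        logits[i, j] += Q[i] @ r_h[ys[j] - ys[i] + H - 1]
logits /= np.sqrt(d_k)

# Formula (2): softmax over j, then weight the values of feature map 2.
attn = np.exp(logits - logits.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)
O_h = attn @ V             # single-head cross-attention output, shape (N, d_v)
```

The key-value query runs from X_1 into X_2, which is how the mechanism fuses the features of the second map into the first.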
In step 1, the upsampling-based deep homography estimation registration network is implemented according to the following steps:
Step a, the feature map tensors X_1 and X_2 obtained in step 1 are concatenated to obtain a single feature map of size 64×64×64;
Step b, a transformation operation is performed on the 64×64×64 feature map;
Step c, a transformation operation is performed on the 32×32×128 feature map;
Step d, the feature map obtained in step c is fed to fully connected layer Linear1, with an input feature vector of size 16×16×256 and an output feature of size 1024;
Step e, the 1024-dimensional feature from step d is fed to fully connected layer Linear2, with input size 1024 and an output feature vector of size 8.
Step b is specifically performed according to the following steps:
Step b1, the 64×64×64 feature map is passed through a 3×3 convolution with output channels 96 and padding 1, giving a feature map X_a of size 64×64×96;
Step b2, the 64×64×64 feature map is passed through 3×3 upsampling with output channels 32 and padding 1, giving a feature map X_b of size 64×64×32;
Step b3, X_a and X_b are concatenated and activated with LeakyReLU (negative slope 0.2), giving a feature map of size 64×64×128;
Step b4, the feature map from step b3 is passed through a 3×3 convolution with output channels 128 and padding 1, activated with LeakyReLU (negative slope 0.2), giving a feature map of size 64×64×128;
Step b5, the feature map from step b4 is max-pooled with kernel 2, finally giving a feature map of size 32×32×128.
Step c is specifically performed according to the following steps:
Step c1, the 32×32×128 feature map is passed through a 3×3 convolution with output channels 192 and padding 1, giving a feature map X_a' of size 32×32×192;
Step c2, the 32×32×128 feature map is passed through 3×3 upsampling with output channels 64 and padding 1, giving a feature map X_b' of size 32×32×64;
Step c3, X_a' and X_b' are concatenated and activated with LeakyReLU (negative slope 0.2), giving a feature map of size 32×32×256;
Step c4, the feature map from step c3 is passed through a 3×3 convolution with output channels 256 and padding 1, activated with LeakyReLU (negative slope 0.2), giving a feature map of size 32×32×256;
Step c5, the feature map from step c4 is max-pooled with kernel 2, finally giving a feature map of size 16×16×256.
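The shape arithmetic of steps a through e can be verified with a small trace. Only the sizes are checked here; the channel counts (96+32, 192+64) and pooling factors are taken from the steps above, while everything else about the layers is abstracted away.

```python
def registration_head_shapes(h=64, w=64, c=64):
    """Trace feature-map sizes through steps a-e: two conv/upsample
    blocks (steps b and c), then Linear1 and Linear2."""
    shapes = [(h, w, c)]                      # step a: concatenated X_1, X_2
    for conv_ch, up_ch in [(96, 32), (192, 64)]:
        c = conv_ch + up_ch                   # concat conv and upsample branches
        h, w = h // 2, w // 2                 # max pooling with kernel 2
        shapes.append((h, w, c))
    shapes.append((1024,))                    # Linear1 output
    shapes.append((8,))                       # Linear2: four corner offsets
    return shapes

trace = registration_head_shapes()
```

The trace confirms that the 16×16×256 map entering Linear1 flattens to 65536 values and the final 8-vector matches the four 2-D corner offsets predicted in step 2.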
The beneficial effect of the method is that the printed image registration method based on the convolutional cross-attention mechanism can effectively register two printed images in spatial position, providing a basis for subsequent defect detection of printed matter and improving defect detection efficiency. The invention proposes a convolutional cross-attention mechanism whose key-value query mechanism effectively fuses the feature information of the two images: it matches the salient-region features of the two images and fuses them into a single feature map, consistent with the general flow of image registration. In the proposed method, a parallel network with the cross-attention mechanism first processes the input reference printed image and the printed image to be registered and fuses the features of the two images; the feature maps are then concatenated and passed through convolution and upsampling, where the designed upsampling scheme reduces the loss of features.
Drawings
FIG. 1 is a flow chart of a method of print image registration based on a convolution cross-attention mechanism of the present invention;
FIG. 2 is a schematic diagram of the overall architecture of the printed image registration network based on the convolutional cross-attention mechanism of the present invention;
FIG. 3 shows the printed image to be registered, the reference printed image, and the registration result in an embodiment of the present invention.
Detailed Description
The invention will be described in detail below with reference to the drawings and the detailed description.
Example 1
Referring to FIG. 1, the printed image registration method based on the convolutional cross-attention mechanism specifically comprises the following steps:
Step 1, constructing a deep learning registration network, comprising a convolutional cross-attention mechanism and an upsampling-based deep homography estimation registration network;
Step 1.1, the tensors X_1, X_2 ∈ R^(H×W×C) of the printed image to be registered and the reference printed image are input into a parallel network, where H is the height of the input feature map, W the width, and C the number of channels; X_1 and X_2 have size 128×128×1.
Referring to FIG. 2 and FIG. 3, step 1.2, in the parallel network, two 3×3 convolutions are applied to X_1 and X_2, with output channels 32, stride 1, padding 1, and LeakyReLU activation (negative slope 0.2), giving feature maps of size 128×128×32; max pooling with kernel 2 then gives feature maps of size 64×64×32.
Step 1.3, the 64×64×32 feature maps are input into the convolutional cross-attention module, with key depth d_k = 10, value depth d_v = 1, and number of heads N_h = 1. The module comprises a convolution module with output channels d_v, kernel size 3, stride 1, and padding 1, and an attention module with input and output channels d_v, kernel size 1, and stride 1. The outputs of the convolution module and the attention module are concatenated to obtain a feature map X'_1 of size 64×64×32 fused with the features of X_2, and X_1 is replaced by X'_1;
Step 1.4, the feature map tensor X_1 obtained in step 1.3 and the feature map tensor X_2 input in step 1.1 are concatenated to obtain a single feature map of size 64×64×64.
Step 1.5, the following operations are performed on the 64×64×64 feature map:
(1) a 3×3 convolution with output channels 96 and padding 1 is applied, giving a feature map X_a of size 64×64×96;
(2) 3×3 upsampling with output channels 32 and padding 1 is applied, giving a feature map X_b of size 64×64×32;
(3) X_a and X_b are concatenated and activated with LeakyReLU (negative slope 0.2), giving a feature map of size 64×64×128;
(4) a 3×3 convolution with output channels 128 and padding 1 is applied to the feature map from (3), activated with LeakyReLU (negative slope 0.2), giving a feature map of size 64×64×128;
(5) the feature map from (4) is max-pooled with kernel 2, finally giving a feature map of size 32×32×128.
Step 1.6, the following operations are performed on the 32×32×128 feature map:
(1) a 3×3 convolution with output channels 192 and padding 1 is applied, giving a feature map X_a' of size 32×32×192;
(2) 3×3 upsampling with output channels 64 and padding 1 is applied, giving a feature map X_b' of size 32×32×64;
(3) X_a' and X_b' are concatenated and activated with LeakyReLU (negative slope 0.2), giving a feature map of size 32×32×256;
(4) a 3×3 convolution with output channels 256 and padding 1 is applied to the feature map from (3), activated with LeakyReLU (negative slope 0.2), giving a feature map of size 32×32×256;
(5) the feature map from (4) is max-pooled with kernel 2, finally giving a feature map of size 16×16×256.
Step 1.7, the feature map obtained in step 1.6 is fed to fully connected layer Linear1, with an input feature vector of size 16×16×256 and an output feature of size 1024.
Step 1.8, the 1024-dimensional feature from step 1.7 is fed to fully connected layer Linear2, with input size 1024 and an output feature vector of size 8.
Step 2, the size-8 feature vector obtained in step 1.8 is the output of the deep learning registration network: the four corner offsets H'_4pt of p_A relative to the four corners on the reference printed image p_B. The transformation matrix H' is obtained by direct linear transformation (DLT);
Step 3, a spatial transformation with the matrix H' is applied to the printed image A to be registered to obtain the registered printed image p'_B;
Step 4, the network parameters are optimized by computing a loss function between the registered printed image and the reference printed image, and a more accurately registered printed image is output, as shown in FIG. 3, where red denotes the true perspective transformation and yellow the perspective transformation estimated by the model; the more closely the two coincide, the higher the registration accuracy.
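The patent does not restate its exact loss function. A common choice for unsupervised homography estimation of this kind is a photometric L1 loss between the registered image and the reference; the sketch below illustrates that assumed choice, with the image contents entirely hypothetical.

```python
import numpy as np

def photometric_l1_loss(registered, reference):
    """Mean absolute pixel difference between the registered printed image
    and the reference printed image (an assumed loss, not one specified
    by the patent)."""
    return float(np.mean(np.abs(registered.astype(float)
                                - reference.astype(float))))

ref = np.full((128, 128), 0.5)          # hypothetical reference patch
identical = photometric_l1_loss(ref, ref)
shifted = photometric_l1_loss(ref + 0.1, ref)
```

The loss vanishes for a perfect registration and grows with the residual misalignment, which is the property step 4 relies on when optimizing the network parameters.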
By improving an unsupervised deep homography estimation model for printed image registration, the proposed method based on the convolutional cross-attention mechanism completes the registration task well and obtains registered images, which is significant for printed-matter defect detection and improves defect detection efficiency.
Example 2
The invention discloses a printed matter image registration method based on a convolution cross attention mechanism, wherein a flow chart is shown in fig. 1, and the method is implemented specifically according to the following steps:
Step1, constructing a deep learning registration network, wherein the deep learning registration network comprises a convolution cross attention mechanism and an up-sampling-based depth homography estimation registration network;
the convolution cross-attention mechanism in the step 1 is implemented specifically according to the following steps:
Step 1.1, inputting tensors X 1,X2∈RH×W×C of two given shapes, wherein H represents the height of an input feature diagram, W represents the width of the input feature diagram, C represents the channel number of the input feature diagram, and the sizes of X 1 and X 2 are 64 multiplied by 32;
Step 1.2, in order to ensure that the image processing contains translational isomorphism attributes, the existing relative position codes are expanded to two dimensions, width information and height information are embedded in the relative positions of the cross attention, so that the two-dimensional relative cross attention is realized, and the attention degree of a pixel i= (i x,iy) to a pixel j= (j x,jy) is calculated as a formula (1):
Where l i,j denotes the degree of attention of pixel i= (i x,iy) to pixel j= (j x,jy), Representing a transpose of the pixel i query vector,Representing the depth of key k, k j is the key vector for pixel j,AndRepresenting a relative width j x-ix and a relative height j y-iy;
Step 1.3, the output of the two-dimensional single-head cross-attention is expressed as formula (2):

O_h = Softmax((Q K^T + S^rel_H + S^rel_W) / √d_k) V    (2)

where O_h denotes the output of the two-dimensional single-head cross-attention, Softmax(·) denotes normalization, Q = X1 W_Q, K = X2 W_K and V = X2 W_V, W_Q denotes the query weight, W_K the key weight and W_V the value weight, S^rel_H and S^rel_W are logit matrices encoding the relative positions in height and width, X1 and X2 denote the tensor forms of feature map 1 and feature map 2, and d_k denotes the depth of the keys.
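As an illustrative sketch only (not the patent's implementation), formulas (1) and (2) for a single head can be written in NumPy with explicit loops over pixel pairs. The projection outputs q, k, v, the embedding tables r_w and r_h, and the tiny 4×4 feature maps are random stand-ins for the learned quantities:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, d_k = 4, 4, 8  # tiny stand-in sizes (the patent uses 64x64 maps)

# Queries come from feature map X1, keys/values from X2 (cross-attention).
q = rng.standard_normal((H, W, d_k))
k = rng.standard_normal((H, W, d_k))
v = rng.standard_normal((H, W, d_k))

# Learned embeddings for every possible relative width/height offset.
r_w = rng.standard_normal((2 * W - 1, d_k))  # indexed by j_x - i_x + W - 1
r_h = rng.standard_normal((2 * H - 1, d_k))  # indexed by j_y - i_y + H - 1

# Formula (1): l_ij = q_i^T (k_j + r^W_{jx-ix} + r^H_{jy-iy}) / sqrt(d_k)
logits = np.empty((H * W, H * W))
for iy in range(H):
    for ix in range(W):
        for jy in range(H):
            for jx in range(W):
                rel = k[jy, jx] + r_w[jx - ix + W - 1] + r_h[jy - iy + H - 1]
                logits[iy * W + ix, jy * W + jx] = q[iy, ix] @ rel / np.sqrt(d_k)

# Formula (2): softmax over all key pixels j, then a weighted sum of values.
weights = np.exp(logits - logits.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
O_h = (weights @ v.reshape(H * W, d_k)).reshape(H, W, d_k)
```

The quadratic loops are written for clarity; a practical implementation would vectorize the relative-logit computation.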
Step 1.4, multi-head attention is formed by concatenating single-head attentions, as in formula (3):

MHA(X) = Concat[O_1, ..., O_{N_h}] W^O    (3)

where MHA(X) denotes the multi-head attention tensor of shape (H, W, d_v), Concat[·] denotes concatenation, O_1, ..., O_{N_h} denote the single-head attention outputs, and W^O denotes the output weight;
Step 1.5, mapping and concatenating the convolution and multi-head cross-attention feature maps yields the convolutional cross-attention, which can be written as formula (4):

AAConv(X) = Concat[Conv(X), MHA(X)]    (4)

where AAConv(X) denotes the convolutional cross-attention, Concat[·] denotes concatenation, Conv(X) denotes convolution, and MHA(X) denotes the multi-head attention tensor of shape (H, W, d_v);
Step 1.6, batch normalization is applied to the convolutional cross-attention to obtain a feature map X′1 that fuses the features of X2; X′1 has size 64×64×32 and replaces X1.
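A minimal PyTorch sketch of formula (4) plus the batch normalization of step 1.6 is shown below. It is an assumption-laden stand-in, not the patented module: torch's standard `nn.MultiheadAttention` (which has no 2-D relative position encoding) replaces the relative cross-attention branch, and the 16+16 channel split between the two branches is invented for illustration:

```python
import torch
import torch.nn as nn

class AAConvSketch(nn.Module):
    """Formula (4): Concat[Conv(X), MHA(X)] followed by batch normalization.
    torch's MultiheadAttention (absolute positions only) stands in for the
    2-D relative cross-attention branch; the 16+16 channel split is assumed."""
    def __init__(self, c_in=32, c_conv=16, c_attn=16, heads=4):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_conv, 3, padding=1)
        self.proj = nn.Linear(c_in, c_attn)  # project inputs to attention width
        self.attn = nn.MultiheadAttention(c_attn, heads, batch_first=True)
        self.bn = nn.BatchNorm2d(c_conv + c_attn)

    def forward(self, x1, x2):
        b, _, h, w = x1.shape
        q = self.proj(x1.flatten(2).transpose(1, 2))   # queries from X1
        kv = self.proj(x2.flatten(2).transpose(1, 2))  # keys/values from X2
        a, _ = self.attn(q, kv, kv)
        a = a.transpose(1, 2).reshape(b, -1, h, w)
        return self.bn(torch.cat([self.conv(x1), a], dim=1))  # fused X1'

x1 = torch.randn(2, 32, 16, 16)  # 16x16 stand-in for the patent's 64x64 maps
x2 = torch.randn(2, 32, 16, 16)
y = AAConvSketch()(x1, x2)       # same size as X1
```

The output has the same shape as X1, matching step 1.6 where the fused X′1 replaces X1.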
The upsampling-based deep homography estimation registration network in step 1 is implemented according to the following steps:
Step a, the feature map tensors X1 and X2 obtained in step 1 are concatenated to obtain a single feature map of size 64×64×64;
Step b, a transformation operation is performed on the feature map of size 64×64×64;
Step c, a transformation operation is performed on the feature map of size 32×32×128;
Step d, the feature map obtained in step c is input to the fully connected layer Linear1, which takes a feature vector of size 16×16×256 and outputs a feature of size 1024;
Step e, the feature of size 1024 obtained in step d is input to the fully connected layer Linear2, which takes a feature vector of size 1024 and outputs a feature vector of size 8.
Step 2, a patch p_B of the reference printed matter image and a patch p_A of the printed matter image to be registered are input to the deep learning registration network, which predicts the offsets H′_4pt of the four corners of p_A relative to the corresponding corners on the reference patch p_B; a direct linear transformation (DLT) then yields the transformation matrix H′;
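The DLT of step 2 solves a small homogeneous linear system: four corner correspondences give eight equations in the nine entries of H′, whose null space is recovered with an SVD. A NumPy sketch, with made-up corner offsets standing in for the network's prediction:

```python
import numpy as np

def dlt_homography(src, dst):
    """Solve for a 3x3 matrix H with dst ~ H @ src (direct linear transform)
    from 4 point pairs, as used to turn corner offsets into the full H'."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # Null space of the 8x9 system via SVD: the last right-singular vector
    # is the solution up to scale.
    _, _, vt = np.linalg.svd(np.asarray(A))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

corners = np.array([[0., 0.], [63., 0.], [63., 63.], [0., 63.]])
offsets = np.array([[2., 1.], [-1., 3.], [0., -2.], [1., 1.]])  # stand-in H'_4pt
H = dlt_homography(corners, corners + offsets)
```

Applying H to each source corner (in homogeneous coordinates) reproduces the offset corners exactly, which is a useful sanity check on the solver.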
Step 3, the transformation matrix H′ is used to spatially transform the printed matter image A to be registered, obtaining the registered printed matter image;
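The spatial transformation of step 3 is a backward warp: each output pixel is mapped through the inverse homography into the source image and sampled. A NumPy sketch with nearest-neighbor sampling (a real pipeline would use differentiable bilinear sampling, e.g. a spatial transformer):

```python
import numpy as np

def warp_homography(img, H):
    """Backward-warp img by homography H: for every output pixel, map
    through H^-1 into the source image and sample (nearest neighbor)."""
    h, w = img.shape[:2]
    Hinv = np.linalg.inv(H)
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = Hinv @ pts
    # Dehomogenize, round to the nearest source pixel, clamp to the image.
    sx = np.rint(src[0] / src[2]).astype(int).clip(0, w - 1)
    sy = np.rint(src[1] / src[2]).astype(int).clip(0, h - 1)
    return img[sy, sx].reshape(img.shape)

img = np.arange(64 * 64, dtype=float).reshape(64, 64)
same = warp_homography(img, np.eye(3))  # identity H leaves the image unchanged
```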
Step 4, the network parameters are optimized by computing a loss function between the registered printed matter image and the reference printed matter image, and a more accurately registered printed matter image is output.
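Step 4 is an unsupervised objective: the loss compares the warped patch to the reference patch and its gradient is backpropagated through the network. The patent does not name the loss; the sketch below assumes a photometric L1 loss, a common choice in unsupervised homography estimation, with random stand-in tensors:

```python
import torch
import torch.nn.functional as F

# Stand-in tensors: in the real pipeline `warped` is the registered patch
# produced by a differentiable warp, so gradients flow back to the network.
warped = torch.rand(1, 1, 64, 64, requires_grad=True)
reference = torch.rand(1, 1, 64, 64)

loss = F.l1_loss(warped, reference)  # assumed photometric L1 objective
loss.backward()                      # gradients reach the warped patch
```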
Example 3
The invention discloses a printed matter image registration method based on a convolutional cross-attention mechanism, whose flow chart is shown in Fig. 1; the method is implemented according to the following steps:
Step 1, constructing a deep learning registration network, where the deep learning registration network comprises a convolutional cross-attention mechanism and an upsampling-based deep homography estimation registration network;
The upsampling-based deep homography estimation registration network in step 1 is implemented according to the following steps:
Step a, the feature map tensors X1 and X2 obtained in step 1 are concatenated to obtain a single feature map of size 64×64×64;
Step b, a transformation operation is performed on the feature map of size 64×64×64;
Step c, a transformation operation is performed on the feature map of size 32×32×128;
Step d, the feature map obtained in step c is input to the fully connected layer Linear1, which takes a feature vector of size 16×16×256 and outputs a feature of size 1024;
Step e, the feature of size 1024 obtained in step d is input to the fully connected layer Linear2, which takes a feature vector of size 1024 and outputs a feature vector of size 8.
Step b is implemented according to the following steps:
Step b1, a 3×3 convolution with 96 output channels and padding 1 is applied to the feature map of size 64×64×64, giving a feature map X_a of size 64×64×96;
Step b2, 3×3 up-sampling with 32 output channels and padding 1 is applied to the feature map of size 64×64×64, giving a feature map X_b of size 64×64×32;
Step b3, the feature maps X_a and X_b are concatenated and activated with a LeakyReLU of negative slope 0.2, giving a feature map of size 64×64×128;
Step b4, a 3×3 convolution with 128 output channels and padding 1 is applied to the feature map obtained in step b3 and activated with a LeakyReLU of negative slope 0.2, giving a feature map of size 64×64×128;
Step b5, max pooling with kernel 2 is applied to the feature map obtained in step b4, finally giving a feature map of size 32×32×128.
Step c is implemented according to the following steps:
Step c1, a 3×3 convolution with 192 output channels and padding 1 is applied to the feature map of size 32×32×128, giving a feature map X_a′ of size 32×32×192;
Step c2, 3×3 up-sampling with 64 output channels and padding 1 is applied to the feature map of size 32×32×128, giving a feature map X_b′ of size 32×32×64;
Step c3, the feature maps X_a′ and X_b′ are concatenated and activated with a LeakyReLU of negative slope 0.2, giving a feature map of size 32×32×256;
Step c4, a 3×3 convolution with 256 output channels and padding 1 is applied to the feature map obtained in step c3 and activated with a LeakyReLU of negative slope 0.2, giving a feature map of size 32×32×256;
Step c5, max pooling with kernel 2 is applied to the feature map obtained in step c4, finally giving a feature map of size 16×16×256.
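Steps b1–b5 and c1–c5 share one pattern, so the whole registration head (steps a–e) can be sketched in PyTorch with a single reusable block. This is a sketch under one stated assumption: the "3×3 up-sampling" operator is read here as a stride-1 transposed convolution, which preserves the spatial size the patent reports:

```python
import torch
import torch.nn as nn

class TransformBlock(nn.Module):
    """Shared pattern of steps b1-b5 / c1-c5: parallel 3x3 convolution and
    3x3 'up-sampling' (assumed: stride-1 transposed convolution, which keeps
    the spatial size), concat + LeakyReLU(0.2), 3x3 conv + LeakyReLU(0.2),
    then kernel-2 max pooling."""
    def __init__(self, c_in, c_conv, c_up, c_out):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_conv, 3, padding=1)
        self.up = nn.ConvTranspose2d(c_in, c_up, 3, padding=1)
        self.fuse = nn.Conv2d(c_conv + c_up, c_out, 3, padding=1)
        self.act = nn.LeakyReLU(0.2)
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        x = self.act(torch.cat([self.conv(x), self.up(x)], dim=1))  # b3 / c3
        return self.pool(self.act(self.fuse(x)))                    # b4-b5 / c4-c5

head = nn.Sequential(
    TransformBlock(64, 96, 32, 128),    # step b: 64x64x64  -> 32x32x128
    TransformBlock(128, 192, 64, 256),  # step c: 32x32x128 -> 16x16x256
    nn.Flatten(),
    nn.Linear(16 * 16 * 256, 1024),     # Linear1
    nn.Linear(1024, 8),                 # Linear2: the eight corner offsets
)

# Step a: concatenate the two 64x64x32 feature maps along channels.
x = torch.cat([torch.randn(1, 32, 64, 64), torch.randn(1, 32, 64, 64)], 1)
offsets = head(x)  # one (x, y) offset per corner -> 8 values
```

The channel counts at each stage (96+32=128, 192+64=256) reproduce the sizes stated in steps b1–c5.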
Step 2, a patch p_B of the reference printed matter image and a patch p_A of the printed matter image to be registered are input to the deep learning registration network, which predicts the offsets H′_4pt of the four corners of p_A relative to the corresponding corners on the reference patch p_B; a direct linear transformation (DLT) then yields the transformation matrix H′;
Step 3, the transformation matrix H′ is used to spatially transform the printed matter image A to be registered, obtaining the registered printed matter image;
Step 4, the network parameters are optimized by computing a loss function between the registered printed matter image and the reference printed matter image, and a more accurately registered printed matter image is output.

Claims (5)

1. A printed matter image registration method based on a convolutional cross-attention mechanism, characterized in that it is implemented according to the following steps:

Step 1, constructing a deep learning registration network;

the convolutional cross-attention mechanism in step 1 is implemented according to the following steps:

Step 1.1, inputting two tensors of given shape, X1, X2 ∈ R^(H×W×C), where H denotes the height of the input feature map, W denotes its width, and C denotes its number of channels; both X1 and X2 have size 64×64×32;

Step 1.2, extending the existing relative position encoding to two dimensions and embedding width and height information in the relative positions of the cross-attention to realize two-dimensional relative cross-attention; the degree of attention of pixel i = (i_x, i_y) to pixel j = (j_x, j_y) is calculated as formula (1):

l_{i,j} = q_i^T (k_j + r^W_{j_x−i_x} + r^H_{j_y−i_y}) / √d_k    (1)

where l_{i,j} denotes the degree of attention of pixel i = (i_x, i_y) to pixel j = (j_x, j_y), q_i^T is the transpose of the query vector of pixel i, d_k is the depth of the keys, k_j is the key vector of pixel j, and r^W_{j_x−i_x} and r^H_{j_y−i_y} are learned embeddings for the relative width j_x−i_x and the relative height j_y−i_y;

Step 1.3, the output of the two-dimensional single-head cross-attention is given by formula (2):

O_h = Softmax((Q K^T + S^rel_H + S^rel_W) / √d_k) V    (2)

where O_h denotes the output of the two-dimensional single-head cross-attention, Softmax(·) denotes normalization, Q = X1 W_Q, K = X2 W_K and V = X2 W_V, W_Q denotes the query weight, W_K the key weight and W_V the value weight, S^rel_H and S^rel_W are logit matrices encoding the relative positions in height and width, X1 and X2 denote the tensor forms of feature map 1 and feature map 2, and d_k denotes the depth of the keys;

Step 1.4, multi-head attention is formed by concatenating single-head attentions, as in formula (3):

MHA(X) = Concat[O_1, ..., O_{N_h}] W^O    (3)

where MHA(X) denotes the multi-head attention tensor of shape (H, W, d_v), Concat[·] denotes concatenation, O_1, ..., O_{N_h} denote the single-head attention outputs, and W^O denotes the output weight;

Step 1.5, the convolution and multi-head cross-attention feature maps are mapped and concatenated to obtain the convolutional cross-attention, which can be written as formula (4):

AAConv(X) = Concat[Conv(X), MHA(X)]    (4)

where AAConv(X) denotes the convolutional cross-attention, Concat[·] denotes concatenation, Conv(X) denotes convolution, and MHA(X) denotes the multi-head attention tensor of shape (H, W, d_v);

Step 1.6, applying batch normalization to the convolutional cross-attention to obtain a feature map X′1 that fuses the features of X2; X′1 has size 64×64×32 and replaces X1;

Step 2, inputting a patch of the reference printed matter image and a patch of the printed matter image to be registered into the deep learning registration network, and obtaining the transformation matrix H′ through direct linear transformation (DLT);

Step 3, spatially transforming the printed matter image A to be registered with the transformation matrix H′ to obtain the registered printed matter image;

Step 4, optimizing the network parameters by computing a loss function between the registered printed matter image and the reference printed matter image, and outputting a more accurately registered printed matter image.

2. The printed matter image registration method based on a convolutional cross-attention mechanism according to claim 1, characterized in that the deep learning registration network in step 1 comprises a convolutional cross-attention mechanism and an upsampling-based deep homography estimation registration network.

3. The printed matter image registration method based on a convolutional cross-attention mechanism according to claim 2, characterized in that the upsampling-based deep homography estimation registration network in step 1 is implemented according to the following steps:

Step a, concatenating the feature map tensors X1 and X2 obtained in step 1 to obtain a single feature map of size 64×64×64;

Step b, performing a transformation operation on the feature map of size 64×64×64;

Step c, performing a transformation operation on the feature map of size 32×32×128;

Step d, inputting the feature map obtained in step c into the fully connected layer Linear1, which takes a feature vector of size 16×16×256 and outputs a feature of size 1024;

Step e, inputting the feature of size 1024 obtained in step d into the fully connected layer Linear2, which takes a feature vector of size 1024 and outputs a feature vector of size 8.

4. The printed matter image registration method based on a convolutional cross-attention mechanism according to claim 3, characterized in that step b is implemented according to the following steps:

Step b1, applying a 3×3 convolution with 96 output channels and padding 1 to the feature map of size 64×64×64 to obtain a feature map X_a of size 64×64×96;

Step b2, applying 3×3 up-sampling with 32 output channels and padding 1 to the feature map of size 64×64×64 to obtain a feature map X_b of size 64×64×32;

Step b3, concatenating the feature maps X_a and X_b and activating with a LeakyReLU of negative slope 0.2 to obtain a feature map of size 64×64×128;

Step b4, applying a 3×3 convolution with 128 output channels and padding 1 to the feature map obtained in step b3 and activating with a LeakyReLU of negative slope 0.2 to obtain a feature map of size 64×64×128;

Step b5, applying max pooling with kernel 2 to the feature map obtained in step b4 to finally obtain a feature map of size 32×32×128.

5. The printed matter image registration method based on a convolutional cross-attention mechanism according to claim 4, characterized in that step c is implemented according to the following steps:

Step c1, applying a 3×3 convolution with 192 output channels and padding 1 to the feature map of size 32×32×128 to obtain a feature map X_a′ of size 32×32×192;

Step c2, applying 3×3 up-sampling with 64 output channels and padding 1 to the feature map of size 32×32×128 to obtain a feature map X_b′ of size 32×32×64;

Step c3, concatenating the feature maps X_a′ and X_b′ and activating with a LeakyReLU of negative slope 0.2 to obtain a feature map of size 32×32×256;

Step c4, applying a 3×3 convolution with 256 output channels and padding 1 to the feature map obtained in step c3 and activating with a LeakyReLU of negative slope 0.2 to obtain a feature map of size 32×32×256;

Step c5, applying max pooling with kernel 2 to the feature map obtained in step c4 to finally obtain a feature map of size 16×16×256.
CN202310624605.7A 2023-05-30 2023-05-30 Printed image registration method based on convolutional cross-attention mechanism Active CN116664633B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310624605.7A CN116664633B (en) 2023-05-30 2023-05-30 Printed image registration method based on convolutional cross-attention mechanism

Publications (2)

Publication Number Publication Date
CN116664633A CN116664633A (en) 2023-08-29
CN116664633B true CN116664633B (en) 2025-11-18

Family

ID=87725501

Country Status (1)

Country Link
CN (1) CN116664633B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113160289A (en) * 2021-03-31 2021-07-23 哈尔滨工业大学(深圳) Industrial printed matter image registration method and device based on deep learning
CN116071410A (en) * 2023-03-14 2023-05-05 南京大学 A method, system, device and medium for point cloud registration based on deep learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11321028B2 (en) * 2019-12-11 2022-05-03 Landa Corporation Ltd. Correcting registration errors in digital printing
CN111709909B (en) * 2020-05-12 2024-02-20 苏州科亿信息科技有限公司 Universal printing defect detection method and model based on deep learning

Also Published As

Publication number Publication date
CN116664633A (en) 2023-08-29

Similar Documents

Publication Publication Date Title
Berg et al. Shape matching and object recognition using low distortion correspondences
CN113989340A (en) Point cloud registration method based on distribution
CN113159043A (en) Feature point matching method and system based on semantic information
CN113239954A (en) Attention mechanism-based image semantic segmentation feature fusion method
CN110321830A (en) A kind of Chinese character string picture OCR recognition methods neural network based
CN107506765B (en) License plate inclination correction method based on neural network
CN101706965A (en) Method for colorizing regional image on basis of Gaussian mixture model
CN114612660A (en) Three-dimensional modeling method based on multi-feature fusion point cloud segmentation
CN111652273B (en) Deep learning-based RGB-D image classification method
CN110443261A (en) A kind of more figure matching process restored based on low-rank tensor
CN111178451A (en) A license plate detection method based on YOLOv3 network
CN113837263A (en) Gesture image classification method based on feature fusion attention module and feature selection
CN112633088B (en) A Power Station Capacity Estimation Method Based on Photovoltaic Module Recognition in Aerial Images
CN113160291A (en) Change detection method based on image registration
CN110648276A (en) Dimensionality reduction method for high-dimensional image data based on manifold map and dictionary learning
CN118279907B (en) Chinese herbal medicine image recognition system based on Transformer and CNN
CN114581903A (en) License plate character recognition method based on convolutional neural network
CN118229638A (en) A method for detecting surface defects of continuous strip bamboo strips based on machine vision
CN118298399A (en) A nighttime vehicle target detection method based on YOLOv8 model optimization
CN109766748B (en) Pedestrian re-recognition method based on projection transformation and dictionary learning
CN109325407B (en) Optical remote sensing video target detection method based on F-SSD network filtering
CN116664633B (en) Printed image registration method based on convolutional cross-attention mechanism
CN113869396A (en) PC screen semantic segmentation method based on efficient attention mechanism
CN117422998A (en) Improved river float identification algorithm based on YOLOv5s
CN116385335A (en) A Target Detection Method for Insulator Defects Based on Improved YOLOv4

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant