Disclosure of Invention
The invention aims to provide a printed matter image registration method based on a convolution cross-attention mechanism, which completes the printed matter image registration task, provides a guarantee for subsequent defect detection of printed matter, and improves defect detection efficiency.
The technical scheme adopted by the invention is that the printed matter image registration method based on a convolution cross attention mechanism is implemented according to the following steps:
The technical scheme adopted by the invention is a printed matter image registration method based on a convolution cross-attention mechanism, implemented according to the following steps:
Step 1, constructing a deep learning registration network, wherein the deep learning registration network comprises a convolution cross-attention mechanism and an up-sampling-based depth homography estimation registration network;
Step 2, inputting a patch pB of the reference printed matter image and a patch pA of the printed matter image to be registered into the deep learning registration network, predicting the offsets H'4pt of the four corners of pA relative to the four corners on the reference printed matter image pB, and obtaining the transformation matrix H' by direct linear transformation (DLT);
Step 3, applying a spatial transformation with the transformation matrix H' to the printed matter image A to be registered, obtaining the registered printed matter image;
Step 4, optimizing the network parameters by computing a loss function between the registered printed matter image and the reference printed matter image, and outputting the registered printed matter image with higher precision.
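Step 2's DLT stage can be sketched as follows. This is a hedged, minimal illustration rather than the patent's implementation: the helper names `solve_linear` and `four_point_to_homography` are hypothetical, and the 8×8 linear system is the textbook direct linear transformation for four point correspondences, solved here with plain Gaussian elimination so the example stays self-contained.

```python
# Hypothetical sketch: recover the 3x3 homography H' from the four-corner
# offsets H'4pt predicted in step 2, via direct linear transformation (DLT).

def solve_linear(A, b):
    """Solve A x = b with partial-pivot Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def four_point_to_homography(corners, offsets):
    """corners: four (x, y) patch corners; offsets: predicted (dx, dy) per corner."""
    A, b = [], []
    for (x, y), (dx, dy) in zip(corners, offsets):
        u, v = x + dx, y + dy          # corner position after the transformation
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = solve_linear(A, b) + [1.0]     # h22 is fixed to 1
    return [h[0:3], h[3:6], h[6:9]]

corners = [(0.0, 0.0), (127.0, 0.0), (127.0, 127.0), (0.0, 127.0)]
H = four_point_to_homography(corners, [(0.0, 0.0)] * 4)  # zero offsets
```

With all four offsets zero, the recovered matrix is (up to floating-point error) the identity, which is a convenient sanity check for any DLT implementation.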
The present invention is also characterized in that,
The convolution cross-attention mechanism in step 1 is specifically implemented according to the following steps:
Step 1.1, inputting two tensors of given shape, X1, X2 ∈ R^(H×W×C), where H denotes the height of the input feature map, W denotes the width of the input feature map, and C denotes the number of channels of the input feature map; the sizes of X1 and X2 are 64×64×32;
Step 1.2, to ensure that the image processing retains translation equivariance, the existing relative position encoding is extended to two dimensions, and width and height information is embedded into the relative positions of the cross attention, realizing two-dimensional relative cross attention; the attention of pixel i = (i_x, i_y) to pixel j = (j_x, j_y) is computed as formula (1):
l_{i,j} = (q_i^T / √d_k) (k_j + r^W_{j_x−i_x} + r^H_{j_y−i_y}) (1)
where l_{i,j} denotes the attention of pixel i = (i_x, i_y) to pixel j = (j_x, j_y), q_i^T denotes the transpose of the query vector of pixel i, d_k denotes the depth of the keys, k_j is the key vector of pixel j, and r^W_{j_x−i_x} and r^H_{j_y−i_y} are learned embeddings of the relative width j_x − i_x and the relative height j_y − i_y;
Step 1.3, the output of the two-dimensional single-head cross attention is expressed as formula (2):
O_h = softmax(((X1 W_Q)(X2 W_K)^T + S^H_rel + S^W_rel) / √d_k) (X2 W_V) (2)
where O_h denotes the output of the two-dimensional single-head cross attention, softmax(·) denotes normalization, W_Q denotes the query weights, W_K denotes the key weights, W_V denotes the value weights, S^H_rel and S^W_rel denote the logit matrices of the relative positions in height and width, X1 denotes the tensor form of feature map 1, X2 denotes the tensor form of feature map 2, and d_k denotes the depth of the keys;
Step 1.4, the multi-head attention is formed by concatenating single-head attentions, as shown in formula (3):
MHA(X) = Concat[O_1, ..., O_Nh] W^O (3)
where MHA(X) denotes the multi-head attention tensor of shape (H, W, d_v), Concat[·] denotes concatenation, O_1, ..., O_Nh denote the single-head attentions, and W^O denotes the output projection weight matrix;
Step 1.5, the convolution feature map and the multi-head cross-attention feature map are concatenated to obtain the convolution cross attention, which can be written as formula (4):
AAConv(X) = Concat[Conv(X), MHA(X)] (4)
where AAConv(X) denotes the convolution cross attention, Concat[·] denotes concatenation, Conv(X) denotes the convolution, and MHA(X) denotes the multi-head attention tensor of shape (H, W, d_v);
Step 1.6, batch normalization is applied to the convolution cross attention, yielding a feature map X'1 of size 64×64×32 that fuses the features of X2, and X1 is replaced by X'1.
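As a hedged illustration of the single-head cross attention of formula (2), the sketch below lets a set of query pixels from X1 attend over key/value pixels from X2. For brevity the projection weights W_Q, W_K, W_V and the relative-position logits S_rel are omitted, so the raw feature vectors serve directly as queries, keys, and values; `cross_attention` is a hypothetical name, not the patent's module.

```python
# Hypothetical sketch of scaled dot-product cross attention: queries come
# from feature map X1, keys and values from X2, so X1 attends to X2.
# Relative-position terms and learned projections are omitted for brevity.
import math

def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def cross_attention(X1, X2, d_k):
    """X1: query vectors (one per pixel); X2: key/value vectors of depth d_k."""
    out = []
    for q in X1:                                   # one output row per query pixel
        logits = [sum(qc * kc for qc, kc in zip(q, k)) / math.sqrt(d_k)
                  for k in X2]
        w = softmax(logits)                        # attention of pixel i over X2
        out.append([sum(wi * v[c] for wi, v in zip(w, X2))
                    for c in range(len(X2[0]))])   # weighted sum of values
    return out

X1 = [[1.0, 0.0], [0.0, 1.0]]                      # two query "pixels", d_k = 2
X2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]          # three key/value "pixels"
O = cross_attention(X1, X2, d_k=2)
```

Each output row is a convex combination of the X2 vectors, so a query is pulled toward the X2 pixels it most resembles; this is the key-value fusion of the two feature maps that the mechanism relies on.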
The up-sampling-based depth homography estimation registration network in step 1 is specifically implemented according to the following steps:
Step a, the feature map tensor X1 and the feature map tensor X2 obtained in step 1 are concatenated to obtain a single feature map of size 64×64×64;
Step b, performing the following transformation operations on the feature map of size 64×64×64;
Step c, performing the following transformation operations on the feature map of size 32×32×128;
Step d, inputting the feature map obtained in step c into the fully connected layer Linear1, whose input is the 16×16×256 feature map flattened into a vector and whose output is a feature of size 1024;
Step e, inputting the feature of size 1024 obtained in step d into the fully connected layer Linear2, whose input is a feature vector of size 1024 and whose output is a feature vector of size 8.
Step b is specifically performed according to the following steps:
Step b1, a 3×3 convolution with output channels 96 and padding 1 is applied to the feature map of size 64×64×64, obtaining a feature map Xa of size 64×64×96;
Step b2, a 3×3 up-sampling with output channels 32 and padding 1 is applied to the feature map of size 64×64×64, obtaining a feature map Xb of size 64×64×32;
Step b3, the feature maps Xa and Xb are concatenated and activated with a LeakyReLU with a negative slope of 0.2, obtaining a feature map of size 64×64×128;
Step b4, a 3×3 convolution with output channels 128 and padding 1 is applied to the feature map obtained in step b3, followed by a LeakyReLU activation with a negative slope of 0.2, obtaining a feature map of size 64×64×128;
Step b5, max pooling with a kernel of 2 is applied to the feature map obtained in step b4, finally obtaining a feature map of size 32×32×128.
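The LeakyReLU activation used in steps b3 and b4 is a one-line function; the sketch below shows its behavior with the negative slope of 0.2 stated in the text (a minimal illustration, not the network code).

```python
# LeakyReLU with negative slope 0.2, as used after the concatenation and
# convolution steps: negative inputs are scaled by 0.2 instead of zeroed,
# so gradients keep flowing through inactive units.

def leaky_relu(x, negative_slope=0.2):
    return x if x >= 0.0 else negative_slope * x

vals = [leaky_relu(v) for v in (-1.0, 0.0, 2.5)]
```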
Step c is specifically performed according to the following steps:
Step c1, a 3×3 convolution with output channels 192 and padding 1 is applied to the feature map of size 32×32×128, obtaining a feature map Xa' of size 32×32×192;
Step c2, a 3×3 up-sampling with output channels 64 and padding 1 is applied to the feature map of size 32×32×128, obtaining a feature map Xb' of size 32×32×64;
Step c3, the feature maps Xa' and Xb' are concatenated and activated with a LeakyReLU with a negative slope of 0.2, obtaining a feature map of size 32×32×256;
Step c4, a 3×3 convolution with output channels 256 and padding 1 is applied to the feature map obtained in step c3, followed by a LeakyReLU activation with a negative slope of 0.2, obtaining a feature map of size 32×32×256;
Step c5, max pooling with a kernel of 2 is applied to the feature map obtained in step c4, finally obtaining a feature map of size 16×16×256.
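The tensor shapes through steps a–e can be traced with a small sketch. This assumes the stated conventions (3×3 convolution with padding 1 and stride 1 preserves spatial size; pooling with kernel 2 halves height and width); the helper names are hypothetical.

```python
# Hypothetical shape walk-through of steps a-e of the up-sampling-based
# depth homography estimation registration network.

def conv3x3(shape, out_ch):          # padding 1, stride 1: spatial size kept
    h, w, _ = shape
    return (h, w, out_ch)

def concat(a, b):                    # channel-wise concatenation
    assert a[:2] == b[:2]
    return (a[0], a[1], a[2] + b[2])

def maxpool2(shape):                 # kernel 2: halve height and width
    h, w, c = shape
    return (h // 2, w // 2, c)

x = concat((64, 64, 32), (64, 64, 32))      # step a: X1 || X2 -> 64x64x64
xa, xb = conv3x3(x, 96), conv3x3(x, 32)     # steps b1, b2 (3x3 branches)
x = maxpool2(conv3x3(concat(xa, xb), 128))  # steps b3-b5 -> 32x32x128
xa, xb = conv3x3(x, 192), conv3x3(x, 64)    # steps c1, c2
x = maxpool2(conv3x3(concat(xa, xb), 256))  # steps c3-c5 -> 16x16x256
flat = x[0] * x[1] * x[2]                   # flattened input to Linear1
```

The walk-through ends at a 16×16×256 map, whose flattened size is the vector fed to Linear1 (output 1024) and then Linear2 (output 8, the four corner offsets).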
The beneficial effects of the invention are that the printed matter image registration method based on a convolution cross-attention mechanism can effectively register two printed matter images in spatial position, thereby providing a guarantee for subsequent defect detection of printed matter and improving defect detection efficiency. The invention provides a convolution cross-attention mechanism whose key-value query mechanism can effectively fuse the feature information of two images, putting the hot-spot region features of the two images into correspondence and fusing them into one feature map, which accords with the general flow of image registration. In the proposed printed matter image registration method, a parallel network incorporating the cross-attention mechanism is first designed to process the input reference printed matter image and the printed matter image to be registered and to fuse the features of the two images; the feature maps of the two images are then concatenated and passed through convolution and up-sampling, where the designed up-sampling scheme reduces the loss of features.
Detailed Description
The invention will be described in detail below with reference to the drawings and the detailed description.
The invention discloses a printed matter image registration method based on a convolution cross attention mechanism, wherein a flow chart is shown in fig. 1, and the method is implemented specifically according to the following steps:
Step 1, constructing a deep learning registration network, wherein the deep learning registration network comprises a convolution cross-attention mechanism and an up-sampling-based depth homography estimation registration network;
The convolution cross-attention mechanism in step 1 is specifically implemented according to the following steps:
Step 1.1, inputting two tensors of given shape, X1, X2 ∈ R^(H×W×C), where H denotes the height of the input feature map, W denotes the width of the input feature map, and C denotes the number of channels of the input feature map; the sizes of X1 and X2 are 64×64×32;
Step 1.2, to ensure that the image processing retains translation equivariance, the existing relative position encoding is extended to two dimensions, and width and height information is embedded into the relative positions of the cross attention, realizing two-dimensional relative cross attention; the attention of pixel i = (i_x, i_y) to pixel j = (j_x, j_y) is computed as formula (1):
l_{i,j} = (q_i^T / √d_k) (k_j + r^W_{j_x−i_x} + r^H_{j_y−i_y}) (1)
where l_{i,j} denotes the attention of pixel i = (i_x, i_y) to pixel j = (j_x, j_y), q_i^T denotes the transpose of the query vector of pixel i, d_k denotes the depth of the keys, k_j is the key vector of pixel j, and r^W_{j_x−i_x} and r^H_{j_y−i_y} are learned embeddings of the relative width j_x − i_x and the relative height j_y − i_y;
Step 1.3, the output of the two-dimensional single-head cross attention is expressed as formula (2):
O_h = softmax(((X1 W_Q)(X2 W_K)^T + S^H_rel + S^W_rel) / √d_k) (X2 W_V) (2)
where O_h denotes the output of the two-dimensional single-head cross attention, softmax(·) denotes normalization, W_Q denotes the query weights, W_K denotes the key weights, W_V denotes the value weights, S^H_rel and S^W_rel denote the logit matrices of the relative positions in height and width, X1 denotes the tensor form of feature map 1, X2 denotes the tensor form of feature map 2, and d_k denotes the depth of the keys;
Step 1.4, the multi-head attention is formed by concatenating single-head attentions, as shown in formula (3):
MHA(X) = Concat[O_1, ..., O_Nh] W^O (3)
where MHA(X) denotes the multi-head attention tensor of shape (H, W, d_v), Concat[·] denotes concatenation, O_1, ..., O_Nh denote the single-head attentions, and W^O denotes the output projection weight matrix;
Step 1.5, the convolution feature map and the multi-head cross-attention feature map are concatenated to obtain the convolution cross attention, which can be written as formula (4):
AAConv(X) = Concat[Conv(X), MHA(X)] (4)
where AAConv(X) denotes the convolution cross attention, Concat[·] denotes concatenation, Conv(X) denotes the convolution, and MHA(X) denotes the multi-head attention tensor of shape (H, W, d_v);
Step 1.6, batch normalization is applied to the convolution cross attention, yielding a feature map X'1 of size 64×64×32 that fuses the features of X2, and X1 is replaced by X'1.
The up-sampling-based depth homography estimation registration network in step 1 is specifically implemented according to the following steps:
Step a, the feature map tensor X1 and the feature map tensor X2 obtained in step 1 are concatenated to obtain a single feature map of size 64×64×64;
Step b, performing the following transformation operations on the feature map of size 64×64×64;
Step c, performing the following transformation operations on the feature map of size 32×32×128;
Step d, inputting the feature map obtained in step c into the fully connected layer Linear1, whose input is the 16×16×256 feature map flattened into a vector and whose output is a feature of size 1024;
Step e, inputting the feature of size 1024 obtained in step d into the fully connected layer Linear2, whose input is a feature vector of size 1024 and whose output is a feature vector of size 8.
Step b is specifically performed according to the following steps:
Step b1, a 3×3 convolution with output channels 96 and padding 1 is applied to the feature map of size 64×64×64, obtaining a feature map Xa of size 64×64×96;
Step b2, a 3×3 up-sampling with output channels 32 and padding 1 is applied to the feature map of size 64×64×64, obtaining a feature map Xb of size 64×64×32;
Step b3, the feature maps Xa and Xb are concatenated and activated with a LeakyReLU with a negative slope of 0.2, obtaining a feature map of size 64×64×128;
Step b4, a 3×3 convolution with output channels 128 and padding 1 is applied to the feature map obtained in step b3, followed by a LeakyReLU activation with a negative slope of 0.2, obtaining a feature map of size 64×64×128;
Step b5, max pooling with a kernel of 2 is applied to the feature map obtained in step b4, finally obtaining a feature map of size 32×32×128.
Step c is specifically performed according to the following steps:
Step c1, a 3×3 convolution with output channels 192 and padding 1 is applied to the feature map of size 32×32×128, obtaining a feature map Xa' of size 32×32×192;
Step c2, a 3×3 up-sampling with output channels 64 and padding 1 is applied to the feature map of size 32×32×128, obtaining a feature map Xb' of size 32×32×64;
Step c3, the feature maps Xa' and Xb' are concatenated and activated with a LeakyReLU with a negative slope of 0.2, obtaining a feature map of size 32×32×256;
Step c4, a 3×3 convolution with output channels 256 and padding 1 is applied to the feature map obtained in step c3, followed by a LeakyReLU activation with a negative slope of 0.2, obtaining a feature map of size 32×32×256;
Step c5, max pooling with a kernel of 2 is applied to the feature map obtained in step c4, finally obtaining a feature map of size 16×16×256.
Step 2, inputting a patch pB of the reference printed matter image and a patch pA of the printed matter image to be registered into the deep learning registration network, predicting the offsets H'4pt of the four corners of pA relative to the four corners on the reference printed matter image pB, and obtaining the transformation matrix H' by direct linear transformation (DLT);
Step 3, applying a spatial transformation with the transformation matrix H' to the printed matter image A to be registered, obtaining the registered printed matter image;
Step 4, optimizing the network parameters by computing a loss function between the registered printed matter image and the reference printed matter image, and outputting the registered printed matter image with higher precision.
Example 1
Referring to fig. 1, the printed matter image registration method based on a convolution cross-attention mechanism specifically comprises the following steps:
step 1, constructing a deep learning registration network, which comprises a convolution cross attention mechanism and an up-sampling-based depth homography estimation registration network;
Step 1.1, inputting the tensors X1, X2 ∈ R^(H×W×C) of the printed matter image to be registered and the reference printed matter image into a parallel network, where H denotes the height of the input feature map, W denotes the width of the input feature map, and C denotes the number of channels of the input feature map; the size of X1 and X2 is 128×128×1.
Referring to fig. 2 and fig. 3, step 1.2, in the parallel network, two 3×3 convolutions are applied to X1 and X2, each with output channels 32, stride 1, padding 1 and a LeakyReLU activation with a negative slope of 0.2, obtaining a feature map of size 128×128×32; max pooling with a kernel of 2 is then applied, finally obtaining a feature map of size 64×64×32.
Step 1.3, the feature map of size 64×64×32 is input into the convolution cross-attention module, with the depth of the keys dk = 10, the depth of the values dv = 1, and the number of heads of the multi-head attention Nh = 1. The module comprises a convolution module and an attention module: the convolution module has output channels dv, kernel size 3, stride 1 and padding 1; the attention module has input and output channels dv, kernel size 1 and stride 1. The outputs of the convolution module and the attention module are concatenated to obtain a feature map X'1 of size 64×64×32 that fuses the features of X2, and X1 is replaced by X'1;
Step 1.4, the feature map tensor X1 obtained in step 1.3 and the feature map tensor X2 input in step 1.1 are concatenated to obtain a single feature map of size 64×64×64.
Step 1.5, the following operations are performed on the feature map of size 64×64×64:
(1) A 3×3 convolution with output channels 96 and padding 1 is applied to the feature map of size 64×64×64, obtaining a feature map Xa of size 64×64×96;
(2) A 3×3 up-sampling with output channels 32 and padding 1 is applied to the feature map of size 64×64×64, obtaining a feature map Xb of size 64×64×32;
(3) The feature maps Xa and Xb are concatenated and activated with a LeakyReLU with a negative slope of 0.2, obtaining a feature map of size 64×64×128;
(4) A 3×3 convolution with output channels 128 and padding 1 is applied to the feature map obtained in (3), followed by a LeakyReLU activation with a negative slope of 0.2, obtaining a feature map of size 64×64×128;
(5) Max pooling with a kernel of 2 is applied to the feature map obtained in (4), finally obtaining a feature map of size 32×32×128.
Step 1.6, the following operations are performed on the feature map of size 32×32×128:
(1) A 3×3 convolution with output channels 192 and padding 1 is applied to the feature map of size 32×32×128, obtaining a feature map Xa' of size 32×32×192;
(2) A 3×3 up-sampling with output channels 64 and padding 1 is applied to the feature map of size 32×32×128, obtaining a feature map Xb' of size 32×32×64;
(3) The feature maps Xa' and Xb' are concatenated and activated with a LeakyReLU with a negative slope of 0.2, obtaining a feature map of size 32×32×256;
(4) A 3×3 convolution with output channels 256 and padding 1 is applied to the feature map obtained in (3), followed by a LeakyReLU activation with a negative slope of 0.2, obtaining a feature map of size 32×32×256;
(5) Max pooling with a kernel of 2 is applied to the feature map obtained in (4), finally obtaining a feature map of size 16×16×256.
Step 1.7, inputting the feature map obtained in step 1.6 into the fully connected layer Linear1, whose input is the 16×16×256 feature map flattened into a vector and whose output is a feature of size 1024.
Step 1.8, inputting the feature of size 1024 obtained in step 1.7 into the fully connected layer Linear2, whose input is a feature vector of size 1024 and whose output is a feature vector of size 8.
Step 2, the feature vector of size 8 obtained in step 1.8 is the output of the deep learning registration network; it predicts the offsets H'4pt of the four corners of pA relative to the four corners on the reference printed matter image pB, and the transformation matrix H' is obtained by direct linear transformation (DLT);
Step 3, applying a spatial transformation with the transformation matrix H' to the printed matter image A to be registered, obtaining the registered printed matter image p'B;
Step 4, optimizing the network parameters by computing a loss function between the registered printed matter image and the reference printed matter image, and outputting the registered printed matter image with higher precision, as shown in fig. 3, where red represents the true perspective transformation and yellow represents the perspective transformation estimated by the model; the more the two coincide, the higher the registration accuracy.
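The text does not specify the loss function of step 4; a common choice in unsupervised homography estimation is a pixel-wise L1 photometric loss between the registered image and the reference image, sketched below under that assumption (`l1_photometric_loss` is a hypothetical name, not the patent's loss).

```python
# Hypothetical sketch of a pixel-wise L1 photometric loss between the
# registered (warped) image and the reference image, following the common
# unsupervised-registration choice; not necessarily the patent's loss.

def l1_photometric_loss(warped, reference):
    """Mean absolute intensity difference over two equally sized images."""
    total, count = 0.0, 0
    for row_w, row_r in zip(warped, reference):
        for pw, pr in zip(row_w, row_r):
            total += abs(pw - pr)
            count += 1
    return total / count

ref    = [[0.0, 1.0], [1.0, 0.0]]
warped = [[0.0, 1.0], [0.5, 0.0]]   # one mis-registered pixel
loss = l1_photometric_loss(warped, ref)
```

The better the predicted homography aligns the two patches, the smaller this loss, so minimizing it drives the corner-offset predictions toward the true transformation.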
The invention provides a printed matter image registration method based on a convolution cross-attention mechanism, which performs image registration on printed matter by improving an unsupervised depth homography estimation model, can better complete the registration task and obtain registered images, is of great significance for defect detection of printed matter, and improves defect detection efficiency.
Example 2
The invention discloses a printed matter image registration method based on a convolution cross attention mechanism, wherein a flow chart is shown in fig. 1, and the method is implemented specifically according to the following steps:
Step 1, constructing a deep learning registration network, wherein the deep learning registration network comprises a convolution cross-attention mechanism and an up-sampling-based depth homography estimation registration network;
The convolution cross-attention mechanism in step 1 is specifically implemented according to the following steps:
Step 1.1, inputting two tensors of given shape, X1, X2 ∈ R^(H×W×C), where H denotes the height of the input feature map, W denotes the width of the input feature map, and C denotes the number of channels of the input feature map; the sizes of X1 and X2 are 64×64×32;
Step 1.2, to ensure that the image processing retains translation equivariance, the existing relative position encoding is extended to two dimensions, and width and height information is embedded into the relative positions of the cross attention, realizing two-dimensional relative cross attention; the attention of pixel i = (i_x, i_y) to pixel j = (j_x, j_y) is computed as formula (1):
l_{i,j} = (q_i^T / √d_k) (k_j + r^W_{j_x−i_x} + r^H_{j_y−i_y}) (1)
where l_{i,j} denotes the attention of pixel i = (i_x, i_y) to pixel j = (j_x, j_y), q_i^T denotes the transpose of the query vector of pixel i, d_k denotes the depth of the keys, k_j is the key vector of pixel j, and r^W_{j_x−i_x} and r^H_{j_y−i_y} are learned embeddings of the relative width j_x − i_x and the relative height j_y − i_y;
Step 1.3, the output of the two-dimensional single-head cross attention is expressed as formula (2):
O_h = softmax(((X1 W_Q)(X2 W_K)^T + S^H_rel + S^W_rel) / √d_k) (X2 W_V) (2)
where O_h denotes the output of the two-dimensional single-head cross attention, softmax(·) denotes normalization, W_Q denotes the query weights, W_K denotes the key weights, W_V denotes the value weights, S^H_rel and S^W_rel denote the logit matrices of the relative positions in height and width, X1 denotes the tensor form of feature map 1, X2 denotes the tensor form of feature map 2, and d_k denotes the depth of the keys;
Step 1.4, the multi-head attention is formed by concatenating single-head attentions, as shown in formula (3):
MHA(X) = Concat[O_1, ..., O_Nh] W^O (3)
where MHA(X) denotes the multi-head attention tensor of shape (H, W, d_v), Concat[·] denotes concatenation, O_1, ..., O_Nh denote the single-head attentions, and W^O denotes the output projection weight matrix;
Step 1.5, the convolution feature map and the multi-head cross-attention feature map are concatenated to obtain the convolution cross attention, which can be written as formula (4):
AAConv(X) = Concat[Conv(X), MHA(X)] (4)
where AAConv(X) denotes the convolution cross attention, Concat[·] denotes concatenation, Conv(X) denotes the convolution, and MHA(X) denotes the multi-head attention tensor of shape (H, W, d_v);
Step 1.6, batch normalization is applied to the convolution cross attention, yielding a feature map X'1 of size 64×64×32 that fuses the features of X2, and X1 is replaced by X'1.
The up-sampling-based depth homography estimation registration network in step 1 is specifically implemented according to the following steps:
Step a, the feature map tensor X1 and the feature map tensor X2 obtained in step 1 are concatenated to obtain a single feature map of size 64×64×64;
Step b, performing the following transformation operations on the feature map of size 64×64×64;
Step c, performing the following transformation operations on the feature map of size 32×32×128;
Step d, inputting the feature map obtained in step c into the fully connected layer Linear1, whose input is the 16×16×256 feature map flattened into a vector and whose output is a feature of size 1024;
Step e, inputting the feature of size 1024 obtained in step d into the fully connected layer Linear2, whose input is a feature vector of size 1024 and whose output is a feature vector of size 8.
Step 2, inputting a patch pB of the reference printed matter image and a patch pA of the printed matter image to be registered into the deep learning registration network, predicting the offsets H'4pt of the four corners of pA relative to the four corners on the reference printed matter image pB, and obtaining the transformation matrix H' by direct linear transformation (DLT);
Step 3, applying a spatial transformation with the transformation matrix H' to the printed matter image A to be registered, obtaining the registered printed matter image;
Step 4, optimizing the network parameters by computing a loss function between the registered printed matter image and the reference printed matter image, and outputting the registered printed matter image with higher precision.
Example 3
The invention discloses a printed matter image registration method based on a convolution cross attention mechanism, wherein a flow chart is shown in fig. 1, and the method is implemented specifically according to the following steps:
Step 1, constructing a deep learning registration network, wherein the deep learning registration network comprises a convolution cross-attention mechanism and an up-sampling-based depth homography estimation registration network;
The up-sampling-based depth homography estimation registration network in step 1 is specifically implemented according to the following steps:
Step a, the feature map tensor X1 and the feature map tensor X2 obtained in step 1 are concatenated to obtain a single feature map of size 64×64×64;
Step b, performing the following transformation operations on the feature map of size 64×64×64;
Step c, performing the following transformation operations on the feature map of size 32×32×128;
Step d, inputting the feature map obtained in step c into the fully connected layer Linear1, whose input is the 16×16×256 feature map flattened into a vector and whose output is a feature of size 1024;
Step e, inputting the feature of size 1024 obtained in step d into the fully connected layer Linear2, whose input is a feature vector of size 1024 and whose output is a feature vector of size 8.
Step b is specifically performed according to the following steps:
Step b1, a 3×3 convolution with output channels 96 and padding 1 is applied to the feature map of size 64×64×64, obtaining a feature map Xa of size 64×64×96;
Step b2, a 3×3 up-sampling with output channels 32 and padding 1 is applied to the feature map of size 64×64×64, obtaining a feature map Xb of size 64×64×32;
Step b3, the feature maps Xa and Xb are concatenated and activated with a LeakyReLU with a negative slope of 0.2, obtaining a feature map of size 64×64×128;
Step b4, a 3×3 convolution with output channels 128 and padding 1 is applied to the feature map obtained in step b3, followed by a LeakyReLU activation with a negative slope of 0.2, obtaining a feature map of size 64×64×128;
Step b5, max pooling with a kernel of 2 is applied to the feature map obtained in step b4, finally obtaining a feature map of size 32×32×128.
Step c is specifically performed according to the following steps:
Step c1, a 3×3 convolution with output channels 192 and padding 1 is applied to the feature map of size 32×32×128, obtaining a feature map Xa' of size 32×32×192;
Step c2, a 3×3 up-sampling with output channels 64 and padding 1 is applied to the feature map of size 32×32×128, obtaining a feature map Xb' of size 32×32×64;
Step c3, the feature maps Xa' and Xb' are concatenated and activated with a LeakyReLU with a negative slope of 0.2, obtaining a feature map of size 32×32×256;
Step c4, a 3×3 convolution with output channels 256 and padding 1 is applied to the feature map obtained in step c3, followed by a LeakyReLU activation with a negative slope of 0.2, obtaining a feature map of size 32×32×256;
Step c5, max pooling with a kernel of 2 is applied to the feature map obtained in step c4, finally obtaining a feature map of size 16×16×256.
Step 2, inputting a patch pB of the reference printed matter image and a patch pA of the printed matter image to be registered into the deep learning registration network, predicting the offsets H'4pt of the four corners of pA relative to the four corners on the reference printed matter image pB, and obtaining the transformation matrix H' by direct linear transformation (DLT);
Step 3, applying a spatial transformation with the transformation matrix H' to the printed matter image A to be registered, obtaining the registered printed matter image;
Step 4, optimizing the network parameters by computing a loss function between the registered printed matter image and the reference printed matter image, and outputting the registered printed matter image with higher precision.