
CN118537227A - Iterative interactive reference type stereo image super-resolution reconstruction method and system - Google Patents


Info

Publication number
CN118537227A
CN118537227A (application CN202410442888.8A)
Authority
CN
China
Prior art keywords
resolution
view
features
convolution
feature
Prior art date
Legal status
Granted
Application number
CN202410442888.8A
Other languages
Chinese (zh)
Other versions
CN118537227B
Inventor
丛润民
盛荣晖
张敬林
张铭津
张伟
Current Assignee
Shandong University
Original Assignee
Shandong University
Priority date
Filing date
Publication date
Application filed by Shandong University
Priority to CN202410442888.8A
Publication of CN118537227A
Application granted
Publication of CN118537227B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T3/4076 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution, using the original low-resolution images to iteratively correct the high-resolution images
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046 Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an iterative interactive reference-based stereo image super-resolution reconstruction method and system. The method comprises: acquiring two images to be reconstructed; and inputting the two images into a trained image reconstruction model to obtain a high-resolution left-view stereo image and a high-resolution right-view stereo image. The trained image reconstruction model uses several information perception blocks to extract intra-view features from the two input images, and models inter-view dependencies with a pixel matching module and a patch matching module. The pixel matching module forms internal crossing and internal iteration; the patch matching module generates a matching dictionary, which is projected into the high-resolution space so that the feature maps of the two views serve as mutual references for high-resolution inter-view information interaction. The model further re-weights the view feature maps through a supervised side-output modulator to achieve patch-level matching.

Description

Iterative interactive reference stereo image super-resolution reconstruction method and system

Technical Field

The present invention relates to the technical field of super-resolution image reconstruction, and in particular to an iterative interactive reference stereo image super-resolution reconstruction method and system.

Background Art

For depth estimation tasks, binocular stereo images remain an important information carrier in practical applications. How to obtain high-resolution binocular stereo images has become an important concern of both academia and industry, and stereo image super-resolution (SSR) has accordingly attracted increasing attention. Most existing SSR methods focus on inter-view correspondences in the low-resolution (LR) space, while ignoring the guiding role that high-quality super-resolved images can play as references.

In general, existing SSR work follows two main lines. 1) Exploring pixel-level correspondence matching in the LR space through disparity attention mechanisms; the drawback is that little attention is paid to high-resolution (HR) correspondences. This can cause a large amount of detail to be lost from the LR images, limiting the utilization of cross-view information; in addition, degradations of the LR images themselves (such as noise and blur) severely affect matching accuracy. 2) Introducing an additional disparity map as a prior, which relies on extra depth estimation and makes the network complex and bloated; moreover, inaccurate prior depth estimates adversely affect correspondence matching.

Summary of the Invention

To address the deficiencies of the prior art, the present invention provides an iterative interactive reference stereo image super-resolution reconstruction method and system.

In one aspect, an iterative interactive reference stereo image super-resolution reconstruction method is provided, comprising:

acquiring two images to be reconstructed, the two images comprising a low-resolution left-view stereo image and a low-resolution right-view stereo image;

inputting the two images to be reconstructed into a trained image reconstruction model to obtain a high-resolution left-view stereo image and a high-resolution right-view stereo image;

wherein the trained image reconstruction model uses several information perception blocks to extract intra-view features from the two images to be reconstructed; the model also models inter-view dependencies with a pixel matching module and a patch matching module; the pixel matching module forms internal crossing and internal iteration; the patch matching module generates a matching dictionary; the matching dictionary is projected into the high-resolution space, and high-resolution inter-view information interaction is performed by using the feature maps of the two views as mutual references; the model further re-weights the view feature maps through a supervised side-output modulator to achieve patch-level matching.

In another aspect, an iterative interactive reference stereo image super-resolution reconstruction system is provided, comprising:

an acquisition module configured to acquire two images to be reconstructed, the two images comprising a low-resolution left-view stereo image and a low-resolution right-view stereo image; and

a reconstruction module configured to input the two images to be reconstructed into a trained image reconstruction model to obtain a high-resolution left-view stereo image and a high-resolution right-view stereo image;

wherein the trained image reconstruction model uses several information perception blocks to extract intra-view features from the two images to be reconstructed; the model also models inter-view dependencies with a pixel matching module and a patch matching module; the pixel matching module forms internal crossing and internal iteration; the patch matching module generates a matching dictionary; the matching dictionary is projected into the high-resolution space, and high-resolution inter-view information interaction is performed by using the feature maps of the two views as mutual references; the model further re-weights the view feature maps through a supervised side-output modulator to achieve patch-level matching.

In yet another aspect, an electronic device is provided, comprising:

a memory for non-transitory storage of computer-readable instructions; and

a processor for executing the computer-readable instructions,

wherein the computer-readable instructions, when executed by the processor, perform the method of the first aspect.

In yet another aspect, a storage medium is provided that non-transitorily stores computer-readable instructions, wherein when the non-transitory computer-readable instructions are executed by a computer, they perform the method of the first aspect.

In yet another aspect, a computer program product is provided, comprising a computer program which, when run on one or more processors, implements the method of the first aspect.

The above technical solution has the following advantages or beneficial effects:

(1) The present invention proposes a Reference-based Iterative Interaction model for Stereo Image Super-Resolution (RIISSR), which uses reference-based iterative pixel and patch matching (termed P2-Matching) to establish cross-view and cross-resolution correspondences for SSR. Specifically, parallel cascaded information perception blocks (IPBs) are first designed to extract hierarchical contextual features of the different views. Pixel matching is embedded between two parallel IPBs to exploit cross-view interaction in the low-resolution space. Then, using the super-resolved stereo image pair as mutual references, iterative patch matching is performed, and the cross-scale patch recurrence property is exploited to learn high-resolution (HR) correspondences and improve SSR performance. In addition, a supervised side-output modulator (SSOM) is introduced to re-weight local intra-view features and generate intermediate super-resolved images, seamlessly bridging the two matching mechanisms. Experimental results on four datasets demonstrate the superior performance of the proposed RIISSR network.

(2) To accomplish the SSR task, the reference-based iterative interaction stereo image super-resolution network (RIISSR) provides a new paradigm for stereo image super-resolution reconstruction that achieves strong cross-view interaction in a mutually iterative reference mode.

(3) The P2-Matching mechanism iteratively performs pixel-level matching and patch-level matching to learn cross-view and cross-resolution correspondences. Meanwhile, a cross-view matching dictionary is innovatively constructed to perform patch matching using the cross-scale patch recurrence property.

(4) The information perception block (IPB) unifies channel attention, large-kernel attention, and feature redistribution to extract information-rich hierarchical intra-view features. In addition, the SSOM, supervised by high-resolution stereo image ground truth, bridges pixel-level matching and patch-level matching, thereby improving super-resolution performance.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which constitute a part of the present invention, are provided for further understanding of the present invention; the exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it.

FIG. 1 is the reference-based iterative interaction stereo image super-resolution network of Embodiment 1;

FIG. 2 is a schematic diagram of the internal structure of the first iterative information perception unit of Embodiment 1;

FIG. 3 is a schematic diagram of the internal structure of the pixel matching module of Embodiment 1;

FIG. 4 is a schematic diagram of the internal structure of the first patch matching module of Embodiment 1;

FIG. 5 is a schematic diagram of the internal structure of the supervised side-output modulator of Embodiment 1;

FIG. 6 is a schematic diagram of the internal structure of the fusion module of Embodiment 1;

FIG. 7 shows visualization results of different image super-resolution methods in Embodiment 1.

DETAILED DESCRIPTION

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present invention. Unless otherwise specified, all technical and scientific terms used herein have the same meanings as commonly understood by those of ordinary skill in the art to which the present invention belongs.

Embodiment 1

This embodiment provides an iterative interactive reference stereo image super-resolution reconstruction method.

The iterative interactive reference stereo image super-resolution reconstruction method comprises:

S101: acquiring two images to be reconstructed, the two images comprising a low-resolution left-view stereo image and a low-resolution right-view stereo image;

S102: inputting the two images to be reconstructed into a trained image reconstruction model to obtain a high-resolution left-view stereo image and a high-resolution right-view stereo image;

wherein the trained image reconstruction model uses several information perception blocks to extract intra-view features from the two images to be reconstructed; the model also models inter-view dependencies with a pixel matching module and a patch matching module; the pixel matching module forms internal crossing and internal iteration; the patch matching module generates a matching dictionary; the matching dictionary is projected into the high-resolution space, and high-resolution inter-view information interaction is performed by using the feature maps of the two views as mutual references; the model further re-weights the view feature maps through a supervised side-output modulator to achieve patch-level matching.

Further, low resolution specifically refers to an image whose resolution is less than or equal to 400 × 450 pixels;

and high resolution specifically refers to an image whose resolution is less than or equal to 1600 × 1800 pixels.

Low resolution and high resolution are relative terms.

A stereo image refers to a pair of images of different viewpoints captured by two cameras at two positions that are parallel to the scene and offset only horizontally.

The left-view stereo image refers to the image of the stereo pair captured by the left-viewpoint camera.

The right-view stereo image refers to the image of the stereo pair captured by the right-viewpoint camera.

Further, the trained image reconstruction model comprises:

two parallel branches: a first branch and a second branch;

the first branch comprising, connected in sequence: a first convolutional layer, M information perception blocks, a first upsampling layer, a first supervised side-output modulator, a first patch matching module, and a first adder;

the second branch comprising, connected in sequence: a second convolutional layer, M information perception blocks, a second upsampling layer, a second patch matching module, a second supervised side-output modulator, and a second adder;

the first convolutional layer receiving the low-resolution left-view stereo image as input;

the second convolutional layer receiving the low-resolution right-view stereo image as input;

the output of the i-th information perception block of the first branch being connected to the input of the corresponding i-th pixel matching module, and the output of the i-th pixel matching module being connected to the input of the (i+1)-th information perception block of the first branch;

the output of the i-th information perception block of the second branch being connected to the input of the corresponding i-th pixel matching module, and the output of the i-th pixel matching module being connected to the input of the (i+1)-th information perception block of the second branch;

the output of the first supervised side-output modulator being connected to the input of the second patch matching module;

the output of the second supervised side-output modulator being connected to the input of the first patch matching module;

the output of the first patch matching module and the output of the second supervised side-output modulator both being connected to the input of the fusion module;

the outputs of the fusion module being connected to the input of the first adder and the input of the second adder, respectively;

the input of the first convolutional layer being connected to the input of the first adder through the first upsampling layer;

the input of the second convolutional layer being connected to the input of the second adder through the second upsampling layer;

the first adder outputting the high-resolution left-view stereo image; and

the second adder outputting the high-resolution right-view stereo image.
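As a minimal sketch of the global residual wiring described above (the upsampling paths feeding the two adders), the branch internals (convolutional layer, IPBs, matching modules) are abstracted into hypothetical placeholder callables; only the adder logic follows the text:

```python
import numpy as np

def upsample(x, scale=4):
    """Nearest-neighbour upsampling as a stand-in for the patent's upsampling layers."""
    return x.repeat(scale, axis=-2).repeat(scale, axis=-1)

def reconstruct(left_lr, right_lr, branch_left, branch_right, scale=4):
    """Each branch (abstracted here as a callable taking both views) predicts a
    high-resolution residual; the first and second adders add it to the upsampled
    input view to produce the super-resolved left and right outputs."""
    res_l = branch_left(left_lr, right_lr)    # placeholder for the first branch
    res_r = branch_right(right_lr, left_lr)   # placeholder for the second branch
    sr_left = upsample(left_lr, scale) + res_l
    sr_right = upsample(right_lr, scale) + res_r
    return sr_left, sr_right
```

With zero residuals the outputs reduce to plain upsampling, which illustrates why the adders implement a global skip connection.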

Further, as shown in FIG. 2, the information perception block comprises:

a perceptual information extractor and a refinement feed-forward network connected in sequence;

the perceptual information extractor comprising, connected in sequence: a first layer-normalization layer, a first pointwise convolutional layer, a first depthwise separable convolutional layer, a first channel-split gating unit, a channel attention layer, a second pointwise convolutional layer, and a third adder; the channel attention layer is in parallel with a large-kernel attention layer; the input of the third adder is also connected to the input of the first layer-normalization layer;

the refinement feed-forward network comprising, connected in sequence: a second layer-normalization layer, a third pointwise convolutional layer, a second depthwise separable convolutional layer, a second channel-split gating unit, a fourth pointwise convolutional layer, and a fourth adder; the input of the fourth adder is also connected to the input of the second layer-normalization layer;

the input of the first layer-normalization layer is the input of the information perception block, and the output of the fourth adder is the output of the information perception block; the input of the second layer-normalization layer is connected to the output of the third adder.

Further, the channel attention layer comprises, connected in sequence: an average pooling layer, a fifth pointwise convolutional layer, and a first multiplier; the input of the average pooling layer is connected to the input of the first multiplier.

Further, the large-kernel attention module comprises, connected in sequence: a 5×5 convolutional layer, a 7×7 convolutional layer, a sixth pointwise convolutional layer, and a second multiplier; the input of the 5×5 convolutional layer is connected to the input of the second multiplier.

Further, the first channel-split gating unit and the second channel-split gating unit have the same internal structure; the first channel-split gating unit comprises:

a channel-split layer that divides each input feature map into several slices, performs element-wise multiplication of adjacent slices, and concatenates the resulting products to obtain the output feature map.
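A minimal NumPy sketch of this channel-split gating, assuming (as in the embodiment's gating formula) that one slice of each adjacent pair passes through a GELU before the element-wise product; the slice count n is illustrative:

```python
import numpy as np

def gelu(v):
    """tanh approximation of the GELU activation."""
    return 0.5 * v * (1 + np.tanh(np.sqrt(2 / np.pi) * (v + 0.044715 * v ** 3)))

def channel_split_gate(x, n=4):
    """Channel-split gating: divide the C channels of x (C, H, W) into n slices,
    gate each slice with its GELU-activated neighbour, and concatenate the n/2
    products. Note the channel count halves, as in common gating designs."""
    slices = np.split(x, n, axis=0)
    outs = [gelu(slices[2 * i]) * slices[2 * i + 1] for i in range(n // 2)]
    return np.concatenate(outs, axis=0)
```

The multiplicative pairing is what lets the unit redistribute information across channel groups without any extra learned weights.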

It should be understood that the information perception block (IPB) is a transformer-style convolutional block composed of a perceptual information extractor and a refinement feed-forward network. These two sub-modules respectively contain two channel-split gates, which redistribute features across channels.

Further, the information perception block is used to:

Assume the input feature of the perceptual information extractor is X with C channels. The perceptual information extractor first applies 1×1 pointwise convolution and 3×3 depthwise separable convolution to the layer-normalized feature to extract local features X̂:

X̂ = Conv_DW(Conv_PW(LN(X)))

where Conv_PW(·) denotes pointwise convolution, i.e. convolution with a 1×1 kernel, Conv_DW(·) denotes depthwise separable convolution, in which each channel is convolved independently, and LN(·) denotes layer normalization.

Then, the first channel-split gating unit (CS) divides X̂ into n slices {X̂_1, X̂_2, …, X̂_n} along the channel dimension.

The channel slices are associated in pairs to form n/2 groups, and a nonlinear gating mechanism integrates the information:

Ŷ_{n/2} = φ(X̂_{n-1}) ⊙ X̂_n

where Ŷ_{n/2} denotes the (n/2)-th output obtained from the (n-1)-th and n-th channel slices X̂_{n-1} and X̂_n, φ is the GELU activation function, and ⊙ is the element-wise product. The channel features of all groups {Ŷ_1, …, Ŷ_{n/2}} are merged and reordered to obtain the redistributed feature Ỹ.

Then, a large-kernel attention layer (LKA) and a channel attention layer (CA) are used to capture long-range dependencies and the global spatial distribution, respectively. The refinement feed-forward network controls the information flow through channel-split gating, allowing each channel to focus on recovering fine details complementary to those of the other channels.

It should be understood that pointwise convolution is ordinary convolution with a 1×1 kernel, while depthwise separable convolution assigns an independent kernel to each channel of the feature, so its parameter count is greatly reduced compared with ordinary convolution.
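The parameter savings can be made concrete with a quick count (bias terms omitted; the 64-channel, 3×3 figures are just an example):

```python
def standard_conv_params(c_in, c_out, k):
    """A standard k x k convolution mixes all channels: c_in * c_out * k * k weights."""
    return c_in * c_out * k * k

def depthwise_separable_params(c, k):
    """Depthwise k x k (one kernel per channel) followed by a 1 x 1 pointwise
    convolution that restores cross-channel mixing."""
    return c * k * k + c * c

# For 64 channels and a 3x3 kernel: 36864 vs 4672 weights.
```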

It should be understood that the channel attention layer computes global statistics of the feature map to strengthen the focus on important features. Meanwhile, the large-kernel attention layer uses larger convolution kernels to capture long-range dependencies within the view, strengthening the attention to local information. Combining the two attention mechanisms enables the perceptual information extractor to effectively model the hierarchical information contained in the input image and accurately restore texture details.
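A reduced NumPy illustration of the two attention mechanisms: channel attention rescales channels from pooled global statistics, while large-kernel attention gates spatially. The 5×5 / 7×7 / pointwise stack that would produce the spatial map is abstracted into a precomputed `attn_map`, and the sigmoid gate is a simplifying assumption:

```python
import numpy as np

def channel_attention(x, w):
    """Channel attention: global average pooling gives per-channel statistics, a
    pointwise projection w (C x C) produces channel weights (sigmoid assumed),
    and the first multiplier rescales the input channels."""
    stats = x.mean(axis=(1, 2))                    # (C,) global statistics
    weights = 1.0 / (1.0 + np.exp(-(w @ stats)))   # 1x1 conv == matmul on (C,)
    return x * weights[:, None, None]

def large_kernel_attention(x, attn_map):
    """Large-kernel attention reduced to its gating essence: a spatial attention
    map multiplies the input element-wise (the second multiplier)."""
    return x * attn_map
```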

It should be understood that, through the information perception blocks and their integrated stacking, the present invention improves model performance while greatly increasing processing speed. The information perception blocks are deployed in parallel to extract intra-view features of the left and right views; thus, the corresponding intra-view features F_L and F_R are obtained through the information perception blocks.

It should be understood that most previous stereo image super-resolution reconstruction methods perform stereo matching by computing pixel-level feature similarity on low-resolution images; however, the resolution limit of the low-resolution spatial domain and image degradation make it difficult for pixel-level matching alone to fully convey complementary information, hindering the recovery of high-resolution fine-grained details. Although some methods improve stereo correspondence by estimating a high-resolution disparity map, the additional disparity estimation network increases computational overhead and substantially increases the number of network parameters. Moreover, inaccurate prior disparity estimates in turn adversely affect left-right view correspondence.

The present invention rethinks stereo image correspondence from the perspective of reference-based reconstruction and proposes the P2-Matching method, which treats the stereo information complement between the left and right views as a mutual-reference pattern and learns cross-view and cross-resolution stereo correspondences through reference-based pixel matching and patch-level matching.

Further, as shown in FIG. 3, the pixel matching module is used to:

Assume F_L^i and F_R^i are the low-resolution left-view and right-view stereo paired features extracted by the i-th information perception block; each is regarded as the reference feature of the other view.

The stereo paired features are first layer-normalized and then fed into pointwise convolutions, with the output of the current view serving as the query and the features of the other view as the key. Thus, for the reconstruction of the left view:

Q_L = Conv_PW(LN(F_L^i)), K_R = Conv_PW(LN(F_R^i)), V_R = Conv_PW(F_R^i)

where Conv_PW(·) denotes pointwise convolution, LN(·) denotes layer normalization, Q_L is the generated left-view query, K_R is the generated right-view key, and V_R is the value produced by pointwise convolution.

For each pixel in the left view, the similarity scores with all candidate pixels in the right view are computed, generating a cross-attention map A_{R→L} that transfers right-view information to the left view:

A_{R→L} = Softmax(Q_L ⊗ (K_R)^T)

where Softmax is the normalized activation function, ⊗ denotes matrix multiplication, and (K_R)^T is the transpose of the right-view key. After multiplication with the right-view value matrix V_R, the left-view feature is updated with the complementary information of the right view:

F̂_L = A_{R→L} ⊗ V_R

Symmetrically, the cross-attention map A_{L→R} is obtained by simply transposing A_{R→L}, which generates the complementary feature that the left view provides to the right view:

F_{L→R} = (A_{R→L})^T ⊗ V_L

where (A_{R→L})^T is the transpose of the cross-attention map, i.e., A_{L→R}, and V_L is the value produced by the point-wise convolution of the left-view features.
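The pixel-level matching above can be sketched in NumPy. This is a minimal illustration only: the point-wise convolutions and layer normalization that produce the query, key, and value are omitted, and the function name and feature shapes are assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pixel_matching(feat_l, feat_r):
    """Bidirectional cross-view attention over flattened pixel features.

    feat_l: (N_l, C) left-view pixel features (stand-ins for Q_L and V_L)
    feat_r: (N_r, C) right-view pixel features (stand-ins for K_R and V_R)
    Returns the complementary features F_{R->L} and F_{L->R}.
    """
    a_r2l = softmax(feat_l @ feat_r.T, axis=-1)  # A_{R->L}: left pixels attend to right pixels
    f_r2l = a_r2l @ feat_r                       # right-view complement for the left view
    a_l2r = a_r2l.T                              # A_{L->R} is the transpose, as in the text
    f_l2r = a_l2r @ feat_l                       # left-view complement for the right view
    return f_r2l, f_l2r
```

Reusing the transpose instead of recomputing a second softmax halves the attention cost, at the price of the rows of A_{L→R} no longer summing to one.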

It should be understood that, to overcome the loss of fine-grained detail in the low-resolution feature space, the present invention designs patch-level matching in addition to pixel-level matching: feature representations in the high-resolution space provide higher precision in high-frequency regions and guide the high-resolution reconstruction. The procedure consists of matching-dictionary construction and super-resolution patch transfer.

Furthermore, as shown in FIG. 4, the internal working processes of the first patch matching module and the second patch matching module are identical. The first patch matching module is configured to:

unfold both the left-view and right-view features into feature patches;

query the similarity score and patch position of the current view from the matching dictionary, and transfer the super-resolution reference features to the current view.

Therefore, the first iteration, from the left view to the right view, is expressed as:

F̂_L^SR = W_p(F_L^SR) (9)

F_R' = F_R^↑ + γ_R · (S_{l→r} ⊙ R_{P_{l→r}}(F̂_L^SR)) (10)

where W_p denotes the projection matrix learned by a sequential 1×1 point-wise convolution and a 3×3 depth-wise convolution, F̂_L^SR is the projected left-view high-resolution feature, γ_R is a trainable channel scaling parameter, R_{P_{l→r}}(·) is the query operation that rearranges F̂_L^SR according to the position index P_{l→r}, S_{l→r} is the similarity score obtained by the query, F_R^↑ is the right-view feature upsampled to high resolution, and F_R' is the optimized right-view feature.

In the second iteration, from the right view to the left view, the guiding relationship is reversed, and the improved right-view feature F_R' is used as the reference. Following formulas (9) and (10), an optimized left-view feature F_L' is obtained that carries more information than the original left-view feature.
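The residual patch transfer described for the two iterations can be sketched as follows; the gather-by-index stand-in for the rearranged query operation, the scalar gamma, and all names are illustrative assumptions.

```python
import numpy as np

def patch_transfer(f_up, ref_hr, pos, score, gamma=1.0):
    """Score-weighted residual transfer of matched reference patches.

    f_up   : (N, C) current-view features upsampled to high resolution (one row per patch)
    ref_hr : (M, C) projected high-resolution reference-view features
    pos    : (N,) indices from the matching dictionary (where each patch matched)
    score  : (N,) similarity scores from the matching dictionary
    gamma  : channel scaling parameter (trainable in the real model, a scalar here)
    """
    gathered = ref_hr[pos]                           # rearrange reference patches by matched position
    return f_up + gamma * score[:, None] * gathered  # residual, confidence-weighted update
```

Swapping the roles of the two views and passing the improved features back in reproduces the second, reversed iteration.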

It should be understood that, by combining the pixel-level and patch-level matching mechanisms, the proposed model achieves sufficient cross-view, cross-resolution interaction and improves the accuracy of complementary information exchange. Inter-view correlation is modeled via the cross-scale patch recurrence property: the disparity between the two views, and their similar patches, tend to recur multiple times across scales. For example, given a pair of low-resolution images, their disparities are consistent in the low-resolution and high-resolution spaces, and similar patches can be found in both. An LR-LLR (further-downsampled low-resolution image) mapping can therefore be constructed to learn cross-scale inter-view correspondence, and similar patches are then aggregated in the high-resolution space. Since most of the computation is performed between the low-resolution and LLR images, this approach is both efficient and effective.

进一步地,所述字典的构建过程包括:Furthermore, the dictionary construction process includes:

For the low-resolution input image I_r^LR, it is first downsampled to a lower resolution I_r^LLR. Then, for each patch p_{i,j}^r in I_r^LLR, a similarity score map S_{i,j}^{r→l} is computed:

S_{i,j}^{r→l} = Norm(P^l) ⊗ Norm(p_{i,j}^r)

where i and j indicate that the patch center lies in the i-th row (i corresponds to the h dimension) and the j-th column, l and r correspond to the left and right views respectively, Norm denotes the normalization operation, and P^l and P^r denote the patch sets of the left and right views.

Then, based on the constructed similarity score map S_{i,j}^{r→l}, similar patches are searched along the epipolar line in the left low-resolution image I_l^LR, and the value and position of each element in the left dictionary are:

D_{i,j}^v = Max(S_{i,j}^{r→l}), D_{i,j}^p = Argmax(S_{i,j}^{r→l})

where D_{i,j}^v and D_{i,j}^p denote the maximum similarity score and its position for the patch p_{i,j}^r, Max(·) takes the maximum value, and Argmax(·) returns the position of the maximum value.

The matching dictionary of the left view is transposed to construct the other dictionary, for the right view, denoted D_r^v and D_r^p.
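The dictionary construction — normalized patch correlation followed by Max and Argmax — can be sketched as below. The epipolar-line restriction is abstracted into the candidate set that is passed in, and the names are illustrative.

```python
import numpy as np

def build_dictionary(patches_query, patches_ref):
    """Build the value/position matching dictionary.

    patches_query : (N, D) flattened LLR patches of one view
    patches_ref   : (M, D) flattened LR patches of the other view along the epipolar line
    Returns (values, positions): the best similarity score and its patch index.
    """
    def norm(p):  # L2-normalize each patch so the dot product is a correlation
        return p / (np.linalg.norm(p, axis=1, keepdims=True) + 1e-8)
    scores = norm(patches_query) @ norm(patches_ref).T  # (N, M) similarity score map
    values = scores.max(axis=1)                         # Max: best score per query patch
    positions = scores.argmax(axis=1)                   # Argmax: where it was found
    return values, positions
```

Because the query patches come from the further-downsampled LLR image, most of the cost stays at low resolution; transposing the resulting dictionary yields the one for the opposite view.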

To provide high-quality super-resolution features for patch matching, the upsampled features obtained after sub-pixel convolution are fed into the supervised side-output modulator (SSOM) to enhance their representation.

Furthermore, as shown in FIG. 5, the internal structures of the first supervised side-output modulator and the second supervised side-output modulator are identical. The first supervised side-output modulator is configured to:

The upsampled left-view feature F_L^↑ is first mapped to the image domain through a point-wise channel convolution.

This mapping is then added to the bicubically interpolated image to restore an initial coarse super-resolution image I_L^c:

I_L^c = Conv_1×1(F_L^↑) + Bicubic^↑(I_L^LR)

where Bicubic^↑(·) denotes bicubic interpolation upsampling, I_L^LR is the left low-resolution image, and Conv_1×1 is a 1×1 convolution. The coarse image is remapped to the feature space through a 3×3 convolution, a residual nonlinear gating operation is applied to the resulting feature, and the generated feature is used as the reference feature in patch matching to facilitate the generation of the right-view features.

where Sigmoid is the activation function, ⊙ is the element-wise (pixel-level) product, and Conv_3×3 is a 3×3 convolution.
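A sketch of this branch of the SSOM is given below. The convolutions are passed in as stand-in callables, and the exact form of the residual nonlinear gate (here x + Sigmoid(x) ⊙ x) is an assumption consistent with the operators named in the text.

```python
import numpy as np

def ssom_branch(feat_up, img_lr_up, conv1x1, conv3x3):
    """Coarse SR image restoration followed by residual sigmoid gating.

    feat_up   : (H, W, C) upsampled view features
    img_lr_up : (H, W) bicubically upsampled low-resolution image
    conv1x1   : callable (H, W, C) -> (H, W), stands for the 1x1 channel convolution
    conv3x3   : callable (H, W) -> (H, W, C), stands for the 3x3 convolution
    """
    coarse = conv1x1(feat_up) + img_lr_up  # initial coarse super-resolution image
    x = conv3x3(coarse)                    # remap the image back to feature space
    gate = 1.0 / (1.0 + np.exp(-x))        # Sigmoid
    return x + gate * x                    # residual nonlinear gating
```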

For the predicted coarse image I_L^c, explicit optimization supervision is provided according to its feature-structure similarity and pixel distance with respect to the high-resolution ground-truth image I_L^HR.

投影到同一个特征空间,并以结构相似性得分(StructuralSimilarity Index Measurement,SSIM)被用作筛选标准:Will and Projected into the same feature space, the structural similarity score (StructuralSimilarity Index Measurement, SSIM) is used as the screening criterion:

随后保留相似度得分最高的前K个通道:Then retain the top K channels with the highest similarity scores:

where TopK denotes the top-K selection over the structural similarity scores between the ground-truth-based high-resolution features. To optimize the first supervised side-output modulator, a point-wise convolution and a residual connection are used to recover a new image. It should be understood that, based on the parallel first and second supervised side-output modulators, the proposed RIISSR network generates optimized super-resolution stereo features that serve as each other's super-resolution references for patch matching.
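The channel screening can be sketched as follows. A per-channel correlation is used as a stand-in for the true SSIM score, so this is illustrative only; names and shapes are assumptions.

```python
import numpy as np

def topk_channels(feat_sr, feat_hr, k, score_fn=None):
    """Keep the K channels of feat_sr most similar to the ground-truth features.

    feat_sr, feat_hr : (C, H, W) projected SR features and ground-truth HR features
    score_fn         : per-channel similarity; defaults to a correlation stand-in for SSIM
    """
    if score_fn is None:
        def score_fn(a, b):
            a, b = a.ravel() - a.mean(), b.ravel() - b.mean()
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    scores = np.array([score_fn(feat_sr[c], feat_hr[c]) for c in range(feat_sr.shape[0])])
    keep = np.sort(np.argsort(scores)[::-1][:k])  # TOP-K selection, original channel order kept
    return feat_sr[keep], keep
```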

进一步地,如图6所示,所述融合模块,包括:Furthermore, as shown in FIG6 , the fusion module includes:

First, the super-resolution feature of the current view is concatenated with its right-view high-resolution reference feature along the channel dimension and fed into a convolution layer, generating learnable parameters β and γ of the same size as the feature;

The right-view reference feature is then layer-normalized, where μ and σ are its mean and standard deviation:

F̂_R = (F_R^SR − μ) / σ

The updated γ_R and β_R are applied as weight parameters to the normalized feature F̂_R, giving F_fuse = γ_R ⊙ F̂_R + β_R. Finally, the fusion module outputs F_fuse.
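The fusion step — learnable gamma and beta generated from the concatenated features, modulating the layer-normalized reference — can be sketched as below; the convolution is a stand-in callable and all names are illustrative.

```python
import numpy as np

def fuse(feat_sr, ref_hr, conv):
    """Spatially adaptive fusion of a view's SR feature with its HR reference.

    feat_sr, ref_hr : (H, W, C) super-resolution feature and high-resolution reference
    conv            : callable (H, W, 2C) -> (H, W, 2C) producing gamma and beta
    """
    params = conv(np.concatenate([feat_sr, ref_hr], axis=-1))
    gamma, beta = np.split(params, 2, axis=-1)
    mu, sigma = ref_hr.mean(), ref_hr.std() + 1e-8
    normalized = (ref_hr - mu) / sigma  # layer normalization of the reference feature
    return gamma * normalized + beta    # re-weight with the learned parameters
```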

进一步地,所述第一上采样模块和第二上采样模块的内部功能是一致的,所述第一上采样模块,用于以双线性插值的方式提高特征分辨率。Furthermore, the internal functions of the first upsampling module and the second upsampling module are consistent, and the first upsampling module is used to improve feature resolution by bilinear interpolation.

进一步地,所述训练后的图像重建模型,训练过程包括:Furthermore, the training process of the trained image reconstruction model includes:

构建训练集,所述训练集为已知重建后图像的左视图和右视图立体图像;Constructing a training set, wherein the training set is a left view and a right view stereo image of a known reconstructed image;

将训练集,输入到图像重建模型中,对模型进行训练,当模型的总损失函数值不再下降时,或者迭代次数超过设定次数时,停止训练,得到训练后的图像重建模型。The training set is input into the image reconstruction model to train the model. When the total loss function value of the model no longer decreases or the number of iterations exceeds the set number, the training is stopped to obtain the trained image reconstruction model.
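The stopping criteria above — total loss no longer decreasing, or the iteration budget exceeded — can be sketched as a minimal loop. The patience threshold is an assumption, since the text does not specify how "no longer decreasing" is measured.

```python
def train(model_step, max_iters=100, patience=5):
    """Run training until the loss stalls or the iteration budget is spent.

    model_step: callable performing one training iteration and returning the total loss.
    Returns the best loss seen and the number of iterations run.
    """
    best, stall = float("inf"), 0
    for it in range(max_iters):
        loss = model_step()
        if loss < best:
            best, stall = loss, 0
        else:
            stall += 1
            if stall >= patience:  # loss has stopped decreasing
                break
    return best, it + 1
```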

进一步地,所述模型的总损失函数Ltotal,表达式为:Furthermore, the total loss function L total of the model is expressed as:

L_total = L_SO + L_MSE + λ_3·L_FC (20)

其中,λ3设为0.01;Among them, λ 3 is set to 0.01;

The supervision applied to the supervised side-output modulators is denoted L_SO and formulated as:

where the former pair denotes the high-resolution stereo image ground truth and the latter pair denotes the super-resolved stereo image pair of the whole network, ‖·‖_2 denotes the ℓ2 norm, λ_1 and λ_2 are set to 0.01 and 0.02 respectively, and N is the number of pixel values.

L_MSE is the MSE loss:

L_MSE = (1/N) · (‖I_L^SR − I_L^HR‖_2² + ‖I_R^SR − I_R^HR‖_2²)

where I_L^SR and I_R^SR denote the left-view and right-view outputs, and I_L^HR and I_R^HR denote the high-resolution stereo ground-truth pair;

The frequency loss L_FC is defined as:

L_FC = (1/N) Σ √( |FFT(I^SR) − FFT(I^HR)|² + ε² )

where FFT(·) is the fast Fourier transform, the loss is accumulated over both views, and the constant ε is set to 1×10⁻⁶ in all experiments.
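Assuming the Charbonnier-style form implied by the stated constant ε, the frequency loss can be sketched as below; the exact formula is an assumption reconstructed from the text.

```python
import numpy as np

def frequency_loss(sr, hr, eps=1e-6):
    """Charbonnier-style distance between the FFT spectra of SR and ground-truth images."""
    diff = np.fft.fft2(sr) - np.fft.fft2(hr)
    return float(np.mean(np.sqrt(np.abs(diff) ** 2 + eps ** 2)))
```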

In fact, the complementary stereo information between the left and right views can be regarded as a mutual-reference pattern. Using the reconstructed SR image as the reference image of the other view helps reconstruct that view better, because high-resolution features provide richer texture details and favor the exploitation of cross-view information. In view of this, the present invention addresses the stereo image super-resolution (SSR) task from a new perspective, modeling it as a reference-based image super-resolution (Ref-SR) task, and proposes Reference-based Iterative Interaction for Stereo Image Super-Resolution (RIISSR).

RIISSR treats the left-view and right-view stereo images as each other's reference images and learns inter-view correspondence in both the left-view and right-view spaces to achieve full interaction. To this end, RIISSR introduces the P²-Matching method, which models inter-view dependence through reference-based pixel-level matching and patch-level matching. Specifically, for pixel-level matching between the two views, the present invention first designs parallel weight-sharing Information Perception Blocks (IPB) to extract the intra-view features of the two views. The IPB unifies channel attention, large-kernel attention, and feature redistribution, achieving a powerful feature representation.

在两个并行的IPB之间,本发明通过参考另一个视图中所有可能的差异来计算当前视图中每个像素的特征相似性,从而学习低分辨率立体对应关系。通过堆叠并行IPB与像素匹配,模型可以形成一种内部交叉、内部迭代机制。Between two parallel IPBs, the present invention calculates the feature similarity of each pixel in the current view by referring to all possible differences in the other view, thereby learning low-resolution stereo correspondence. By stacking parallel IPBs and pixel matching, the model can form an internal crossover, internal iteration mechanism.

至于图块级匹配,本发明利用立体图像的跨尺度图块递归特性(即相似的图像结构信息在不同分辨率的图像中可以较好保持),测量它们的图块相似性,从而生成用于查询的匹配字典。As for the tile-level matching, the present invention utilizes the cross-scale tile recursive property of stereo images (ie, similar image structure information can be well preserved in images of different resolutions) to measure their tile similarities, thereby generating a matching dictionary for query.

后续匹配字典将被进一步投射到高分辨率空间,网络通过利用两个视图的超分辨率结果作为相互的参照,进行高分辨率视图信息交互。超分辨率-低分辨率的信息传递形式打破了传统SSR方法只能在低分辨率空间交互的分辨率限制,极大地提升了有效信息量。The subsequent matching dictionary will be further projected into the high-resolution space, and the network will interact with the high-resolution view information by using the super-resolution results of the two views as mutual references. The super-resolution-low-resolution information transfer form breaks the resolution limitation of the traditional SSR method that can only interact in the low-resolution space, greatly increasing the amount of effective information.

此外,本发明引入了监督侧输出调制器(SSOM),它能对视图的超分辨率局部特征进行重新加权,并为超分辨率图像提供高分辨率立体图像真值监督,以实现图块级匹配。在SSOM的支持下,像素级匹配和图块级匹配得以实现,从而提高了SSR的准确性和立体一致性。In addition, the present invention introduces a supervised side output modulator (SSOM), which can re-weight the super-resolution local features of the view and provide high-resolution stereo image ground truth supervision for the super-resolution image to achieve patch-level matching. With the support of SSOM, pixel-level matching and patch-level matching are achieved, thereby improving the accuracy and stereo consistency of SSR.

The overall framework of the reference-based iterative-interaction stereo image super-resolution network is shown in FIG. 1. It is a dual-stream architecture consisting of three key parts: the information perception block (IPB), which extracts the features of the left and right views; the P²-Matching module, composed of pixel-level matching and patch-level matching, which learns the correspondence between the stereo images; and the supervised side-output modulator (SSOM), which refines the intra-view super-resolution feature representation and restores the high-resolution reference view. Given a low-resolution stereo image pair, the goal of RIISSR is to super-resolve it into a high-quality high-resolution version. In the SSR task, both intra-view and inter-view information play a vital role in super-resolution reconstruction. To extract the spatial features of the left and right views, after a 3×3 convolution maps the image features to 64 channels, M IPBs are stacked in parallel cascades so that a deeper network extracts the low-resolution features of the corresponding views. Owing to the particularity of stereo images, the IPBs of the two views adopt a simple weight-sharing strategy and are strictly symmetric.
As analyzed above, the left and right view features of the low-resolution spatial domain are not accurate enough, so their feature extraction and cross-view interaction require multiple iterative updates to continuously correct the errors caused by resolution and original degradation, and provide more accurate reference features for subsequent cross-resolution tile-level matching.

To this end, the present invention embeds pixel-level matching between any two parallel IPBs, performing cross-view interaction and aggregating the complementary information of each view. By iteratively deploying IPBs with pixel matching, the network forms an internal-crossing, internal-iteration mechanism, improving its feature-exchange capability and generating enhanced low-resolution stereo features.

Afterwards, a sub-pixel convolution (pixel-shuffle) operation is applied to these features for super-resolution, yielding the paired super-resolution features of the two views.
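The rearrangement step of the sub-pixel convolution (pixel-shuffle) can be sketched as follows; the learned convolution that first produces the C·r² channels is omitted.

```python
import numpy as np

def pixel_shuffle(feat, r):
    """Rearrange (C*r*r, H, W) features into a (C, H*r, W*r) higher-resolution map."""
    c2, h, w = feat.shape
    c = c2 // (r * r)
    out = feat.reshape(c, r, r, h, w)   # split channels into an r x r sub-pixel grid
    out = out.transpose(0, 3, 1, 4, 2)  # interleave: (C, H, r, W, r)
    return out.reshape(c, h * r, w * r)
```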

The supervised side-output modulator (SSOM) re-weights the super-resolution features of the left and right views and provides high-resolution stereo ground-truth supervision based on feature-structure similarity and pixel distance, emphasizing more discriminative features and generating intermediate side-output SR images. The subsequent patch-level matching uses the super-resolution results of the two views as mutual references to learn high-resolution stereo correspondence, exploiting the property that similar patches recur across scales to facilitate matching.

To generate the final super-resolution stereo images, the present invention adopts a spatial adaptation module similar to that in MASA as the fusion module of the network (see FIG. 1), which remaps the distribution of the high-resolution reference features of one view onto the other view. As shown in FIG. 2, the RIISSR model is visually compared with different 4× super-resolution methods on the Flickr1024 and Middlebury validation datasets. In the first group of complex outdoor scenes, the proposed RIISSR accurately restores the texture of the windowsill, showing clearer edges and details. The single-image super-resolution algorithm SwinIR and the binocular super-resolution algorithms in the first row are largely unable to recover texture details and appear completely blurred. Although the single-image SwinIR algorithm has better quantitative performance than the binocular algorithm PASSRnet, its reconstructed visual details are inferior. The binocular super-resolution methods in the second row recover some coarse details. In reconstructing the indoor "sword" scene, RIISSR also generates notably fine details at the dents of the displayed basket rim, and the sharpness in the lower-right corner is visually even better than the HR image. Similar to the outdoor scenes, the methods in the first row cannot effectively reconstruct scene details, while those in the second row achieve better visual effects. The results show that, compared with existing state-of-the-art SSR methods, RIISSR achieves higher reconstruction performance and excellent restoration of both indoor and outdoor scenes. FIG. 7 shows the visualization results of different image super-resolution methods in Embodiment 1.

实施例二Embodiment 2

本实施例提供了迭代交互参考式立体图像超分辨率重建系统;This embodiment provides an iterative interactive reference stereo image super-resolution reconstruction system;

迭代交互参考式立体图像超分辨率重建系统,包括:Iterative cross-reference stereo image super-resolution reconstruction system, including:

获取模块,其被配置为:获取待重建的两幅图像;所述待重建的两幅图像,包括:低分辨率左视图立体图像和低分辨率右视图立体图像;An acquisition module is configured to: acquire two images to be reconstructed; the two images to be reconstructed include: a low-resolution left-view stereo image and a low-resolution right-view stereo image;

重建模块,其被配置为:将待重建的两幅图像,输入训练后的图像重建模型中,得到高分辨率左视图立体图像和高分辨率右视图立体图像;A reconstruction module is configured to: input the two images to be reconstructed into the trained image reconstruction model to obtain a high-resolution left-view stereo image and a high-resolution right-view stereo image;

其中,训练后的图像重建模型,采用若干个信息感知模块对待重建的两幅图像提取出视图内的特征;所述模型,还基于像素匹配模块和图块匹配模块来模拟视图间的依赖关系;所述像素匹配模块,形成内部交叉和内部迭代;所述图块匹配模块,生成匹配字典;将匹配字典投射到高分辨率空间,通过利用两个视图的特征图作为相互参照,进行高分辨率视图信息交互;所述模型,还通过监督侧输出调制器,对视图的特征图进行重新加权,实现图块级匹配。Among them, the trained image reconstruction model uses several information perception modules to extract features within the view of the two images to be reconstructed; the model also simulates the dependency between views based on the pixel matching module and the block matching module; the pixel matching module forms internal intersection and internal iteration; the block matching module generates a matching dictionary; the matching dictionary is projected into the high-resolution space, and high-resolution view information interaction is performed by using the feature maps of the two views as mutual references; the model also re-weights the feature maps of the views through the supervisory side output modulator to achieve block-level matching.

此处需要说明的是,上述获取模块和重建模块对应于实施例一中的步骤S101至S102,上述模块与对应的步骤所实现的示例和应用场景相同,但不限于上述实施例一所公开的内容。需要说明的是,上述模块作为系统的一部分可以在诸如一组计算机可执行指令的计算机系统中执行。It should be noted that the acquisition module and the reconstruction module correspond to steps S101 to S102 in Embodiment 1, and the examples and application scenarios implemented by the modules and the corresponding steps are the same, but are not limited to the contents disclosed in Embodiment 1. It should be noted that the modules as part of the system can be executed in a computer system such as a set of computer executable instructions.

上述实施例中对各个实施例的描述各有侧重,某个实施例中没有详述的部分可以参见其他实施例的相关描述。The description of each embodiment in the above embodiments has different emphases. For parts not described in detail in a certain embodiment, reference can be made to the relevant descriptions of other embodiments.

所提出的系统,可以通过其他的方式实现。例如以上所描述的系统实施例仅仅是示意性的,例如上述模块的划分,仅仅为一种逻辑功能划分,实际实现时,可以有另外的划分方式,例如多个模块可以结合或者可以集成到另外一个系统,或一些特征可以忽略,或不执行。The proposed system can be implemented in other ways. For example, the system embodiment described above is only illustrative, and the division of the modules is only a logical function division. In actual implementation, there may be other division methods, such as multiple modules can be combined or integrated into another system, or some features can be ignored or not executed.

实施例三Embodiment 3

本实施例还提供了一种电子设备,包括:一个或多个处理器、一个或多个存储器、以及一个或多个计算机程序;其中,处理器与存储器连接,上述一个或多个计算机程序被存储在存储器中,当电子设备运行时,该处理器执行该存储器存储的一个或多个计算机程序,以使电子设备执行上述实施例一所述的方法。This embodiment also provides an electronic device, including: one or more processors, one or more memories, and one or more computer programs; wherein the processor is connected to the memory, and the one or more computer programs are stored in the memory. When the electronic device is running, the processor executes the one or more computer programs stored in the memory so that the electronic device executes the method described in the above embodiment one.

应理解,本实施例中,处理器可以是中央处理单元CPU,处理器还可以是其他通用处理器、数字信号处理器DSP、专用集成电路ASIC,现成可编程门阵列FPGA或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。It should be understood that in this embodiment, the processor may be a central processing unit CPU, and the processor may also be other general-purpose processors, digital signal processors DSP, application-specific integrated circuits ASIC, off-the-shelf programmable gate arrays FPGA or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor or the processor may also be any conventional processor, etc.

存储器可以包括只读存储器和随机存取存储器,并向处理器提供指令和数据、存储器的一部分还可以包括非易失性随机存储器。例如,存储器还可以存储设备类型的信息。The memory may include a read-only memory and a random access memory, and provide instructions and data to the processor. A portion of the memory may also include a non-volatile random access memory. For example, the memory may also store information about the device type.

在实现过程中,上述方法的各步骤可以通过处理器中的硬件的集成逻辑电路或者软件形式的指令完成。In the implementation process, each step of the above method can be completed by an integrated logic circuit of hardware in a processor or an instruction in the form of software.

实施例一中的方法可以直接体现为硬件处理器执行完成,或者用处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器、闪存、只读存储器、可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器,处理器读取存储器中的信息,结合其硬件完成上述方法的步骤。为避免重复,这里不再详细描述。The method in the first embodiment can be directly embodied as a hardware processor, or a combination of hardware and software modules in the processor. The software module can be located in a mature storage medium in the field such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, or an electrically erasable programmable memory, a register, etc. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware. To avoid repetition, it will not be described in detail here.

本领域普通技术人员可以意识到,结合本实施例描述的各示例的单元及算法步骤,能够以电子硬件或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本发明的范围。Those skilled in the art will appreciate that the units and algorithm steps of each example described in conjunction with this embodiment can be implemented in electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Professional and technical personnel can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of the present invention.

实施例四Embodiment 4

本实施例还提供了一种计算机可读存储介质,用于存储计算机指令,所述计算机指令被处理器执行时,完成实施例一所述的方法。This embodiment further provides a computer-readable storage medium for storing computer instructions. When the computer instructions are executed by a processor, the method described in the first embodiment is completed.

以上所述仅为本发明的优选实施例而已,并不用于限制本发明,对于本领域的技术人员来说,本发明可以有各种更改和变化。凡在本发明的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本发明的保护范围之内。The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.

Claims (10)

1. The iterative interactive reference type stereo image super-resolution reconstruction method is characterized by comprising the following steps of:
acquiring two images to be reconstructed; the two images to be reconstructed comprise: a low resolution left view stereoscopic image and a low resolution right view stereoscopic image;
inputting two images to be reconstructed into a trained image reconstruction model to obtain a high-resolution left view stereoscopic image and a high-resolution right view stereoscopic image;
The method comprises the steps that a trained image reconstruction model adopts a plurality of information perception modules to extract features in views of two images to be reconstructed; the model also simulates the dependency relationship between views based on a pixel matching module and a tile matching module; the pixel matching module forms internal intersection and internal iteration; the block matching module generates a matching dictionary; projecting the matched dictionary into a high-resolution space, and performing high-resolution view information interaction by using feature graphs of two views as cross references; and the model also re-weights the characteristic diagram of the view through a supervision side output modulator to realize block level matching.
2. The iterative cross-reference stereoscopic image super-resolution reconstruction method of claim 1, wherein the information sensing module is configured to:
Assume that the input features of the perception information extractor are X with C channels; the perception information extractor first uses a 1×1 point-by-point convolution and a 3×3 depth-separable convolution to extract local features from the layer-normalized features X, formulated as:
wherein Conv PW (·) represents point-by-point convolution, namely convolution with a 1×1 convolution kernel, Conv DW (·) is depth-separable convolution in which each channel independently carries out the convolution operation, and LN (·) is layer normalization;
then, a first channel-division gating unit divides the features into n slices along the channel dimension;
The characteristics among channels are associated in pairs to form n/2 groups, and a nonlinear gating mechanism is adopted to integrate information:
wherein the (n/2)-th output is obtained from the slices of the (n-1)-th and n-th channels through the gating operation, GELU is the activation function, and ⊙ denotes the element-wise product; the channel features of all groups are merged and reordered to obtain redistributed features; then, a large-kernel attention layer and a channel attention layer are used to capture long-term dependence and global spatial distribution, respectively; and a refined feed-forward network controls the information flow through channel-division gating.
3. The iterative cross-reference stereoscopic image super-resolution reconstruction method of claim 1, wherein the pixel matching module is configured to:
Assume that AndThe low-resolution left view angle and the low-resolution right view angle three-dimensional pairing features extracted by the ith information perception module are respectively regarded as reference features of another view;
Firstly, carrying out layer normalization processing on three-dimensional pairing features, and then respectively sending the three-dimensional pairing features into point-by-point convolution, wherein the output of the current view is used as a query and the features of the other view are used as key values; for the reconstruction of the left view, there is the following representation:
wherein Conv PW (·) represents point-by-point convolution, LN (·) represents layer normalization, Q L represents the generated left-view query, K R represents the generated right-view key, and V R represents the value produced by the point-by-point convolution; for each pixel in the right view, its similarity score with all possible pixels in the left view is calculated, generating a cross-attention map A R→L:
wherein Softmax is the normalized activation function, ⊗ denotes matrix multiplication, and (K R ) T is the transpose of the right-view key; after the score is calculated by matrix multiplication with the right-view value V R , the right-side feature is updated with the supplementary information of the left view;
Symmetrically, for the left-view features, A_R→L need only be transposed into the cross-attention map A_L→R, thereby generating the supplementary features from the left view to the right view:
Wherein (A_R→L)^T is the transpose yielding the cross-attention map A_L→R, and V_L denotes the left-view value generated by point-wise convolution.
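For illustration only (not part of the claims), the bidirectional cross-attention of the pixel matching module can be sketched as follows; layer normalization and the point-wise convolutions are omitted, and Q_L, K_R, V_R, V_L stand in for their outputs:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Flattened per-pixel features along one epipolar row: (N_pixels, C).
rng = np.random.default_rng(0)
N, C = 6, 4
Q_L, K_R, V_R, V_L = (rng.standard_normal((N, C)) for _ in range(4))

A = softmax(Q_L @ K_R.T)       # cross-attention map A_{R->L}
supp_from_right = A @ V_R      # supplementary features fused via right-view values
supp_from_left = A.T @ V_L     # transposed map reused for the opposite direction
print(supp_from_right.shape, supp_from_left.shape)  # (6, 4) (6, 4)
```

Note that, as in the claim, the second direction reuses the transposed attention map rather than recomputing attention.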
4. The iterative interactive reference type stereoscopic image super-resolution reconstruction method of claim 1, wherein the tile matching module is configured to: unfold the left-view features and the right-view features into feature tiles;
Query the similarity scores and tile positions of the current view from the matching dictionary, and transfer the super-resolution (SR) reference features to the current view; thus, the first iteration, from the left view to the right view, is expressed as:
Wherein the projection matrix is learned through a sequential 1×1 point-wise convolution and a 3×3 depth-wise convolution, yielding the projected left-view high-resolution features; γ_R is a trainable channel scaling parameter; a rearrangement query operation is performed on the dictionary, returning the queried similarity scores; the right-view features are upsampled to high resolution and, combined with the transferred reference, give the optimized right-view features;
In the second iteration, from the right view to the left view, the guiding relationship is reversed, using the improved right-view features as the reference; according to formulas (9) and (10), the original left-view features are turned into optimized features with greater information content.
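As an illustrative, simplified sketch (not part of the claims) of transferring matched reference tiles to the current view: the learned projection is omitted, `gamma` stands in for the trainable channel scaling parameter, and `pos`/`score` are the dictionary-queried positions and similarity scores:

```python
import numpy as np

def transfer_tiles(cur_tiles, ref_tiles, pos, score, gamma=0.5):
    """cur_tiles, ref_tiles: (T, D); pos, score: (T,).
    Gathers the best-matching reference tile for each current tile and
    adds it residually, weighted by its similarity score and gamma."""
    matched = ref_tiles[pos]
    return cur_tiles + gamma * score[:, None] * matched

cur = np.zeros((3, 4))
ref = np.arange(12.0).reshape(3, 4)
pos = np.array([2, 0, 1])       # best match of each tile in the reference view
score = np.ones(3)              # queried similarity scores
out = transfer_tiles(cur, ref, pos, score)
print(out[0])  # gamma * ref[2] = [4.  4.5 5.  5.5]
```

The second iteration would simply call the same routine with the roles of the two views swapped, using the just-improved features as the reference.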
5. The iterative interactive reference type stereoscopic image super-resolution reconstruction method of claim 4, wherein the dictionary construction process comprises:
For the low-resolution input images, they are first downsampled to a lower resolution; then, for each tile, a similarity score map is computed:
Where i, j denote that the tile center is located in the i-th row and j-th column, l, r correspond to the left and right views respectively, Norm denotes the normalization operation, and the two sets are the tile sets of the left and right views respectively; then, according to the constructed similarity score map, the corresponding tile in the left low-resolution image is searched along the epipolar line; the values and positions of the elements in the left dictionary are then:
Wherein the dictionary entries record the maximum similarity score and position of each tile, obtained by taking the maximum value and the position of the maximum value, respectively; the matched dictionary of the left view is transposed to construct the corresponding dictionary of the right view.
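The dictionary construction can be illustrated with a toy NumPy sketch (not part of the claims); tiles are assumed pre-extracted on a common epipolar grid, and the initial downsampling step is omitted:

```python
import numpy as np

def build_left_dictionary(tiles_l, tiles_r):
    """tiles_*: (rows, cols, D). For each left tile (i, j), search every
    right tile in the same row i (the epipolar line) and record the
    maximum normalized similarity and its column position."""
    def norm(t):
        return t / (np.linalg.norm(t, axis=-1, keepdims=True) + 1e-8)
    tl, tr = norm(tiles_l), norm(tiles_r)
    sim = np.einsum('ijd,ikd->ijk', tl, tr)      # row-wise similarity maps
    return sim.max(axis=-1), sim.argmax(axis=-1)  # values V, positions P

rng = np.random.default_rng(1)
tiles_left = rng.standard_normal((2, 3, 4))
tiles_right = tiles_left.copy()   # identical views: each tile matches itself
V, P = build_left_dictionary(tiles_left, tiles_right)
print(V.shape, P.shape)  # (2, 3) (2, 3)
```

With identical views every tile matches itself, so V is (numerically) all ones and P is the identity column index per row; transposing the roles of the two inputs gives the right-view dictionary.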
6. The iterative interactive reference type stereoscopic image super-resolution reconstruction method of claim 1, wherein the supervisory side output modulator is configured to:
The upsampled left-view features are first mapped to the image layer through channel point-wise convolution;
Then, the left-view features are added to the bicubically interpolated image to restore an initial coarse super-resolution image:
Wherein bicubic interpolation upsampling is applied to the left low-resolution image and Conv_1×1 is a 1×1 convolution; the coarse image is mapped back to the feature space by a 3×3 convolution, a residual nonlinear gate operation is performed on the obtained features, and the generated features are used as reference features in tile matching to facilitate the generation of the right-view features:
Wherein Sigmoid is the activation function, ⊙ denotes the pixel-wise product, and Conv_3×3 is the 3×3 convolution;
For the predicted image, explicit optimization supervision information is provided according to the feature structure similarity and pixel distance with respect to the high-resolution ground-truth image;
Both are projected into the same feature space, and the structural similarity score is used as the screening criterion:
Then, the top K channels with the highest similarity scores are kept:
Wherein the kept channels result from a TOP-K selection over the structural similarity scores between the ground-truth-based high-resolution features and the predicted features; a new image is then restored using point-wise convolution and a residual connection.
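The TOP-K channel screening can be illustrated with a minimal NumPy sketch (not part of the claims); the per-channel structural similarity scores are assumed precomputed:

```python
import numpy as np

def topk_channels(feat, scores, k):
    """Keep the k channels of feat (C, H, W) with the highest scores (C,),
    preserving the original channel order."""
    idx = np.sort(np.argsort(scores)[::-1][:k])
    return feat[idx]

feat = np.arange(4 * 2 * 2, dtype=float).reshape(4, 2, 2)
scores = np.array([0.1, 0.9, 0.4, 0.8])   # per-channel similarity scores
out = topk_channels(feat, scores, 2)      # keeps channels 1 and 3
print(out.shape)  # (2, 2, 2)
```

The subsequent point-wise convolution and residual connection that restore the new image are omitted here.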
7. An iterative interactive reference type stereo image super-resolution reconstruction system, characterized by comprising:
An acquisition module configured to: acquiring two images to be reconstructed; the two images to be reconstructed comprise: a low resolution left view stereoscopic image and a low resolution right view stereoscopic image;
A reconstruction module configured to: inputting two images to be reconstructed into a trained image reconstruction model to obtain a high-resolution left view stereoscopic image and a high-resolution right view stereoscopic image;
Wherein the trained image reconstruction model adopts a plurality of information perception modules to extract intra-view features of the two images to be reconstructed; the model also models the inter-view dependency based on a pixel matching module and a tile matching module; the pixel matching module performs internal crossing and internal iteration; the tile matching module generates a matching dictionary, projects the matched dictionary into the high-resolution space, and performs high-resolution inter-view information interaction using the feature maps of the two views as cross references; the model further re-weights the view feature maps through a supervisory side output modulator to realize tile-level matching.
8. An electronic device, comprising:
A memory for non-transitory storage of computer readable instructions; and
A processor for executing the computer-readable instructions,
Wherein the computer readable instructions, when executed by the processor, perform the method of any of the preceding claims 1-6.
9. A storage medium, characterized by non-transitorily storing computer-readable instructions, wherein the method of any of claims 1-6 is performed when the non-transitory computer-readable instructions are executed by a computer.
10. A computer program product, comprising a computer program which, when run on one or more processors, implements the method of any of the preceding claims 1-6.
CN202410442888.8A 2024-04-12 2024-04-12 Iterative interactive reference type stereo image super-resolution reconstruction method and system Active CN118537227B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410442888.8A CN118537227B (en) 2024-04-12 2024-04-12 Iterative interactive reference type stereo image super-resolution reconstruction method and system

Publications (2)

Publication Number Publication Date
CN118537227A true CN118537227A (en) 2024-08-23
CN118537227B CN118537227B (en) 2025-07-04

Family

ID=92391839

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119516138A * 2024-11-14 2025-02-25 Changchun University of Science and Technology An efficient multi-view stereo reconstruction method and system for outdoor scenes

Citations (5)

Publication number Priority date Publication date Assignee Title
CN101950419A * 2010-08-26 2011-01-19 Xi'an University of Technology Quick image rectification method in presence of translation and rotation at same time
US20150206307A1 * 2014-01-20 2015-07-23 Nokia Corporation Visual Perception Matching Cost On Binocular Stereo Images
CN110197505A * 2019-05-30 2019-09-03 Xidian University Remote sensing image binocular stereo matching method based on deep network and semantic information
CN113935901A * 2021-10-11 2022-01-14 Central South University Stereoscopic image reconstruction method, system, storage medium and computer program product
CN117177052A * 2023-11-03 2023-12-05 Honor Device Co., Ltd. Image acquisition method, electronic device and computer-readable storage medium

Non-Patent Citations (2)

Title
CHANGHUI JIANG et al.: "Super-resolution CT Image Reconstruction Based on Dictionary Learning and Sparse Representation", Scientific Reports, 11 June 2018 (2018-06-11), pages 1-10 *
LI Xinxin: "Research on Stereo Image Super-Resolution Reconstruction Technology", China Master's Theses Full-text Database, Information Science and Technology, no. 6, 15 June 2021 (2021-06-15), pages 138-290 *

Similar Documents

Publication Publication Date Title
Zhang et al. Lightweight image super-resolution with superpixel token interaction
Zou et al. Df-net: Unsupervised joint learning of depth and flow using cross-task consistency
US12183020B2 (en) Method and apparatus to complement depth image
Sheng et al. Cross-view recurrence-based self-supervised super-resolution of light field
Liu et al. BFMNet: Bilateral feature fusion network with multi-scale context aggregation for real-time semantic segmentation
WO2021018163A1 (en) Neural network search method and apparatus
CN113870335A (en) Monocular depth estimation method based on multi-scale feature fusion
CN114170286B (en) Monocular depth estimation method based on unsupervised deep learning
US20190347847A1 (en) View generation from a single image using fully convolutional neural networks
CN114663440A (en) Fundus image focus segmentation method based on deep learning
Yang et al. CODON: On orchestrating cross-domain attentions for depth super-resolution
CN111626927A (en) Binocular image super-resolution method, system and device adopting parallax constraint
Yang et al. SA-MVSNet: Self-attention-based multi-view stereo network for 3D reconstruction of images with weak texture
CN115631223A (en) Multi-view stereo reconstruction method based on self-adaptive learning and aggregation
CN110363794A (en) Optical Flow Prediction Method Between Continuous Video Frames
Zhang et al. Recurrent interaction network for stereoscopic image super-resolution
CN118485673B (en) ELGANet-based visual superpixel segmentation method and system
CN118537227B (en) Iterative interactive reference type stereo image super-resolution reconstruction method and system
CN116258756A (en) A self-supervised monocular depth estimation method and system
CN118334101A (en) A two-stream monocular depth estimation method based on multi-scale attention
CN118485783A (en) Multi-view 3D reconstruction method and system based on visual center and implicit attention
Sun et al. Image super-resolution using supervised multi-scale feature extraction network
CN117808659A (en) Method, system, apparatus and medium for performing multidimensional convolution operations
CN115209122A (en) Multi-agent-based stereoscopic image visual comfort enhancement method and system
Ye et al. Improved real-time three-dimensional stereo matching with local consistency

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant