CN110136062A

CN110136062A - A Super-resolution Reconstruction Method for Joint Semantic Segmentation

Info

Publication number: CN110136062A
Application number: CN201910389111.9A
Authority: CN
Inventors: 向炟; 陈军; 杨玉红
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2019-05-10
Filing date: 2019-05-10
Publication date: 2019-08-16
Anticipated expiration: 2039-05-10
Also published as: CN110136062B

Abstract

The present invention proposes an image super-resolution reconstruction method combined with semantic segmentation, which uses the intermediate results and final results of low-quality images generated during semantic segmentation to perform super-resolution reconstruction, and performs large-scale super-resolution reconstruction time can have a more realistic effect. Since the high-level semantic information of the image is the inherent information of the image, it contains a large number of category priors at the pixel level, so it can be used as the constraint information in the super-resolution reconstruction process to improve the quality of its reconstruction results. The present invention combines image super-resolution reconstruction, a low-level problem of computer vision, with image semantic segmentation as a high-level problem, and uses various information generated after the image is semantically segmented to constrain and coordinate the process of super-resolution reconstruction. Enhancement solves the problem of lack of authenticity in the reconstruction of low-resolution images under the condition of large zoom factors, and has a higher improvement in subjective quality evaluation.

Description

A Super-resolution Reconstruction Method for Joint Semantic Segmentation

技术领域technical field

本发明涉及图像处理技术领域，具体涉及一种利用语义分割进行图像超分辨率重建的方法。The invention relates to the technical field of image processing, in particular to a method for image super-resolution reconstruction using semantic segmentation.

背景技术Background technique

图像超分辨率重建指使用各种技术手段将低分辨率图像转换为高分辨率图像，恢复出更多的高频信息，使得图像拥有更加清晰的纹理和细节。图像的超分辨率重建方法自首次提出以来已经历半个世纪的发展，众多的图像超分辨率重建方法根据其原理不同可大致分为三类：基于插值的方法、基于重建的方法和基于学习的方法。Image super-resolution reconstruction refers to the use of various technical means to convert low-resolution images into high-resolution images, recover more high-frequency information, and make the image have clearer texture and details. The image super-resolution reconstruction method has been developed for half a century since it was first proposed. Many image super-resolution reconstruction methods can be roughly divided into three categories according to their principles: interpolation-based methods, reconstruction-based methods, and learning-based methods. Methods.

基于插值的方法将超分辨率重建问题与图像插值问题联系在一起，是超分辨率重建中最直接的方法。常见的插值方法有最近邻插值方法、双线性插值方法、双三次插值方法等。其核心思路是将目标图像中的点按照缩放关系寻找源图像中与其相关的点，然后通过源图像中相关点的像素值经过插值计算得到目标点的像素值。基于插值的方法，优点是非常简单直观,且运行速度很快，缺点是适应性相对较差，不易加入图像的先验信息，容易引入额外噪声，使重建出来的图像缺失细节，产生模糊、锯齿等现象。Interpolation-based methods link the super-resolution reconstruction problem with the image interpolation problem and are the most straightforward methods in super-resolution reconstruction. Common interpolation methods include nearest neighbor interpolation method, bilinear interpolation method, bicubic interpolation method, etc. The core idea is to search the points in the target image for their related points in the source image according to the scaling relationship, and then calculate the pixel value of the target point by interpolating the pixel values of the relevant points in the source image. Based on the interpolation method, the advantage is that it is very simple and intuitive, and the running speed is fast. The disadvantage is that the adaptability is relatively poor, it is not easy to add the prior information of the image, and it is easy to introduce additional noise, so that the reconstructed image lacks details and produces blur and jagged. And so on.

基于重建的方法得到了最广泛的关注和研究，此种方法假定高分辨率图像是经过了适当的运动变换、模糊及噪声才得到低分辨率图像，将超分辨率重建问题转化为对一个约束条件下的代价函数的优化问题。其核心思路是从图像的退化模型出发，利用正则化等方法提取低分辨率图像中的关键信息，并结合对未知的超分辨率图像的先验知识来约束超分辨率图像的生成。此类方法在重建时只需要一些局部性的先验假设，在一定程度上缓解了插值方法所产生的模糊或者锯齿效果，但当放大倍数过大时，退化模型不能很好地提供重建所需要的先验知识，导致重建结果缺乏高频信息。The reconstruction-based method has received the most extensive attention and research. This method assumes that the high-resolution image is obtained through appropriate motion transformation, blur and noise to obtain the low-resolution image, and transforms the super-resolution reconstruction problem into a constraint The optimization problem of the cost function under the condition. The core idea is to start from the degradation model of the image, use regularization and other methods to extract the key information in the low-resolution image, and combine the prior knowledge of the unknown super-resolution image to constrain the generation of the super-resolution image. This type of method only needs some local a priori assumptions during reconstruction, which alleviates the blurring or jagged effect produced by the interpolation method to a certain extent, but when the magnification is too large, the degradation model cannot well provide the reconstruction required. prior knowledge, resulting in the lack of high-frequency information in the reconstruction results.

基于学习的方法是近年来超分辨率算法研究的热点方向。其基本思路是通过对一组同时包括高分辨率图像和低分辨率图像的训练集合的训练，学习到一个联合的系统模型，利用学习到的模型对相似的低分辨率图像进行超分辨率重建，达到提高图像分辨率的目的。基于学习的方法充分利用了图像的先验知识，能恢复出低分辨率图像中更多的高频信息，获得比其余两种方法更好的重建结果。而所有基于学习的方法之中，基于深度学习的超分辨率重建方法在近几年中取得了优秀的成绩。Learning-based methods are a hot research direction of super-resolution algorithms in recent years. The basic idea is to learn a joint system model by training a set of training sets that include both high-resolution images and low-resolution images, and use the learned model to perform super-resolution reconstruction on similar low-resolution images. , to improve the image resolution. The learning-based method makes full use of the prior knowledge of the image, can recover more high-frequency information in the low-resolution image, and obtain better reconstruction results than the other two methods. Among all learning-based methods, the super-resolution reconstruction method based on deep learning has achieved excellent results in recent years.

尽管如今的单幅图像超分辨率重建技术依靠深度学习在精度和速度上都取得了不小的突破，但在处理较为复杂的低分辨率图像时，其效果会有所下降。例如：当处理的低分辨率图像中含有许多物体且物体之间有较大部分的重叠和遮挡时，现有的方法不能很好地划分重叠和遮挡物体间的界线，导致其重建结果的纹理细节不足，甚至会将多个重叠的物体重建为一个。Although today's single-image super-resolution reconstruction technology relies on deep learning to achieve great breakthroughs in accuracy and speed, its effect will decline when dealing with more complex low-resolution images. For example: when the processed low-resolution image contains many objects and there is a large part of overlap and occlusion between the objects, the existing methods cannot well divide the boundary between the overlapping and occlusion objects, resulting in the texture of the reconstruction result Insufficient detail, even multiple overlapping objects are reconstructed as one.

发明内容Contents of the invention

为了解决上述中出现的问题，本发明提出一种全新的联合了语义分割的超分辨率重建方法。语义分割是计算机视觉中的基本任务之一，其目的将视觉输入分为不同的语义可解释类别，对一幅图像来说就是将图像中的像素划分为不同类别。基于语义分割将像素分类的特性，联合了语义分割的超分辨率重建方法能较好地处理带有多个重叠和遮挡物体的低分辨率图像。In order to solve the above-mentioned problems, the present invention proposes a new super-resolution reconstruction method combined with semantic segmentation. Semantic segmentation is one of the basic tasks in computer vision. Its purpose is to divide the visual input into different semantically interpretable categories. For an image, it is to divide the pixels in the image into different categories. Based on the characteristics of semantic segmentation to classify pixels, the super-resolution reconstruction method combined with semantic segmentation can better deal with low-resolution images with multiple overlapping and occluding objects.

本发明针对现有技术的不足，提供了一种对低分辨率图像进行超分辨率重建的方法，包含以下步骤：Aiming at the deficiencies in the prior art, the present invention provides a method for super-resolution reconstruction of low-resolution images, comprising the following steps:

步骤1，构建低分辨率的语义分割数据集，所述低分辨率的语义分割数据集包括低分辨率图像和对应的语义布局图；Step 1, constructing a low-resolution semantic segmentation data set, the low-resolution semantic segmentation data set includes a low-resolution image and a corresponding semantic layout;

步骤2，使用低分辨率的语义分割数据集训练语义分割网络，；Step 2, use the low-resolution semantic segmentation dataset to train the semantic segmentation network;

步骤3，构建用于训练超分辨率重建网络的数据集，所述用于训练超分辨率重建网络的数据集包括低分辨率图像的语义布局图、语义特征图和对应的高分辨率图像，其中低分辨率图像的语义布局图、语义特征图通过将低分辨率图像输入到步骤2中训练好的语义分割网络中获得；Step 3, constructing a data set for training a super-resolution reconstruction network, the data set for training a super-resolution reconstruction network includes a semantic layout map of a low-resolution image, a semantic feature map and a corresponding high-resolution image, The semantic layout map and semantic feature map of the low-resolution image are obtained by inputting the low-resolution image into the semantic segmentation network trained in step 2;

步骤4，将语义布局图和语义特征图作为输入，语义布局图对应的高分辨率图像作为真实值，训练超分辨率重建网络，使其能根据输入的语义布局图输出对应的高分辨率重建结果；Step 4, take the semantic layout map and semantic feature map as input, and the high-resolution image corresponding to the semantic layout map as the real value, train the super-resolution reconstruction network so that it can output the corresponding high-resolution reconstruction according to the input semantic layout map result;

步骤5，将一张待重建的低分辨率图片输入到步骤2中得到的语义分割结果，得到其语义布局图和语义特征图，然后将其输入至步骤4中训练得到的超分辨率重建网络，最终得到重建后的高分辨率图像。Step 5, input a low-resolution image to be reconstructed into the semantic segmentation result obtained in step 2, obtain its semantic layout map and semantic feature map, and then input it into the super-resolution reconstruction network trained in step 4 , and finally get the reconstructed high-resolution image.

进一步的，步骤1所述低分辨率的语义分割数据集是将通常的语义分割数据集中的高分辨率图像和语义布局图进行相同缩放因子的下采样，得到的低分辨率图像和语义布局图构成了低分辨率的语义分割数据集。Further, the low-resolution semantic segmentation dataset described in step 1 is the low-resolution image and semantic layout diagram obtained by downsampling the high-resolution image and semantic layout diagram in the usual semantic segmentation dataset with the same scaling factor Constitutes a low-resolution semantic segmentation dataset.

进一步的，步骤2中的语义分割网络为全卷积网络，该全卷积网络是将VGG16中的全连接层改为卷积层后所得，具体网络结构为：卷积层×2+池化层+卷积层×2+池化层+卷积层×3+池化层+卷积层×3+池化层+卷积层×3+池化层+卷积层×2+反卷积层，其中卷积层的卷积核大小为3×3，池化层采用的是最大池化。Further, the semantic segmentation network in step 2 is a fully convolutional network, which is obtained by changing the fully connected layer in VGG16 to a convolutional layer. The specific network structure is: convolutional layer×2+pooling layer + convolutional layer × 2 + pooling layer + convolutional layer × 3 + pooling layer + convolutional layer × 3 + pooling layer + convolutional layer × 3 + pooling layer + convolutional layer × 2 + deconvolution The product layer, where the convolution kernel size of the convolutional layer is 3×3, and the pooling layer uses the maximum pooling.

进一步的，全卷积网络的权值初始化为经过预训练的VGG16中的权值；训练中所优化的损失函数为网络最后一层的像素预测值的偏差之和；训练的具体参数为：训练的批大小为20，采用动量为0.9、衰变率为10^-4的Adam算法进行优化，网络的学习率为10^-4。Further, the weights of the fully convolutional network are initialized to the weights in the pre-trained VGG16; the loss function optimized in the training is the sum of the deviations of the pixel prediction values of the last layer of the network; the specific parameters of the training are: training The batch size of is 20, and the Adam algorithm with the momentum of 0.9 and the decay rate of 10 ^-4 is used for optimization, and the learning rate of the network is 10 ^-4 .

进一步的，步骤4所述超分辨率重建网络是由一系列级联的重建模块所组成的级联重建网络，级联的重建模块以递增的分辨率运行，其中每个重建模块由3个网络层组成：第一层为特征融合层，作用为将输入的语义布局图和语义特征图与前一层的输出结果融合；后面两层为带有3×3卷积核、层正则化和修正线性单元的卷积层，作为是对融合后的特征进行重建。Further, the super-resolution reconstruction network described in step 4 is a cascaded reconstruction network composed of a series of cascaded reconstruction modules, and the cascaded reconstruction modules operate at increasing resolutions, wherein each reconstruction module consists of 3 network Layer composition: the first layer is a feature fusion layer, which is used to fuse the input semantic layout map and semantic feature map with the output result of the previous layer; the next two layers are with 3×3 convolution kernel, layer regularization and correction The convolutional layer of the linear unit is used to reconstruct the fused features.

进一步的，超分辨率重建网络中的重建模块之间的具体运行关系如下，Further, the specific operational relationship between the reconstruction modules in the super-resolution reconstruction network is as follows,

第一个重建模块将下采样到当前分辨率的语义布局图和语义特征图作为输入，输出一个当前分辨率的结果，此结果看作是经过合并和卷积后的特征图，后面的重建模块将前一个模块的结果和下采样后的语义布局图以及语义特征图一起作为输入，输出一个新的结果，经过多次这样的过程，最终的重建模块输出的结果即为超分辨率重建的结果，此过程的数学描述如下：The first reconstruction module takes the semantic layout map and semantic feature map downsampled to the current resolution as input, and outputs a result of the current resolution, which is regarded as the feature map after merging and convolution, and the subsequent reconstruction module The result of the previous module and the downsampled semantic layout map and semantic feature map are used as input to output a new result. After several such processes, the final output of the reconstruction module is the result of super-resolution reconstruction. , the mathematical description of this process is as follows:

其中，O_i表示第i个重建模块的输出，f表示重建模块中的卷积等操作，L表示语义布局图，F表示语义特征图，表示特征融合。Among them, O _i represents the output of the i-th reconstruction module, f represents the convolution and other operations in the reconstruction module, L represents the semantic layout map, F represents the semantic feature map, Indicates feature fusion.

进一步的，步骤4中训练超分辨率重建网络时使用的损失函数为，Further, the loss function used when training the super-resolution reconstruction network in step 4 is,

其中，I为代表真实值的高分辨率图像，f为待训练的级联重建网络，θ为f中的参数集合，L为输入的语义布局图，Φ为经过训练的视觉感知网络，该视觉感知网络为VGG网络，Φ_l表示视觉感知网络中的卷积层，λ_l为控制权重的超参数，在训练过程中其值随着训练进程进行调整。Among them, I is the high-resolution image representing the real value, f is the cascade reconstruction network to be trained, θ is the parameter set in f, L is the input semantic layout map, Φ is the trained visual perception network, the visual The perception network is a VGG network, Φ _l represents the convolutional layer in the visual perception network, and λ _l is a hyperparameter that controls the weight, and its value is adjusted during the training process.

进一步的，训练超分辨率重建网络时，具体设置为：整体的迭代次数为200代；模型的学习率为10^-4，并且每经过100代训练，学习率降为一半；采用动量为0.9、衰变率为10^-4的Adam算法进行优化。Further, when training the super-resolution reconstruction network, the specific settings are: the overall number of iterations is 200 generations; the learning rate of the model is 10 ^-4 , and every 100 generations of training, the learning rate is reduced to half; the momentum is 0.9, The Adam algorithm with a decay rate of 10 ^-4 is optimized.

与现有技术相比，本发明具有以下优点和积极效果：Compared with the prior art, the present invention has the following advantages and positive effects:

由于图像的高层级语义信息作为图像的固有信息，在像素级层面上含有大量的类别先验，因此可以作为超分辨率重建过程中的约束信息提升其重建结果的质量。本发明将图像超分辨率重建这一计算机视觉的低层级问题与作为高层级问题的图像语义分割结合起来，利用图像经过语义分割之后产生的各种信息，对超分辨率重建的过程进行约束和增强，解决了在大缩放因子的条件下低分辨率图像的重建缺乏真实性的问题，在主观质量评价上有较高的提升。Since the high-level semantic information of the image is the inherent information of the image, it contains a large number of category priors at the pixel level, so it can be used as the constraint information in the super-resolution reconstruction process to improve the quality of its reconstruction results. The present invention combines image super-resolution reconstruction, a low-level problem of computer vision, with image semantic segmentation as a high-level problem, and uses various information generated after the image is semantically segmented to constrain and coordinate the process of super-resolution reconstruction. Enhancement solves the problem of lack of authenticity in the reconstruction of low-resolution images under the condition of large zoom factors, and has a higher improvement in subjective quality evaluation.

附图说明Description of drawings

图1为本发明实施例中全卷积网络的网络结构图。FIG. 1 is a network structure diagram of a fully convolutional network in an embodiment of the present invention.

图2为本发明实施例中级联重建网络的模块结构图。FIG. 2 is a block diagram of a cascaded reconstruction network in an embodiment of the present invention.

图3为本发明的整体流程图。Fig. 3 is the overall flow chart of the present invention.

图4为本发明与对比方法的视觉效果对比图，其中(a)为双三次插值(Bicubic)，(b)为SRCNN，(c)为SRDenNet，(d)为SRGAN，(e)为本发明。Fig. 4 is a comparison diagram of visual effects between the present invention and the comparison method, wherein (a) is Bicubic, (b) is SRCNN, (c) is SRDenNet, (d) is SRGAN, (e) is the present invention .

具体实施方式Detailed ways

为了便于本领域普通技术人员理解和实施本发明，下面结合附图及实施例对本发明作进一步的详细描述，应当理解，此处所描述的实施示例仅用于说明和解释本发明，并不用于限定本发明。In order to facilitate those of ordinary skill in the art to understand and implement the present invention, the present invention will be described in further detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the implementation examples described here are only used to illustrate and explain the present invention, and are not intended to limit this invention.

本发明将图像语义分割和图像超分辨率重建两种计算机视觉任务的特点结合了起来，将图像经过语义分割产生的特征作为超分辨率重建的先验信息，提出了一种联合语义分割的图像超分辨率重建方法。此方法阐述的总体流程如图3所示，此方法能够用计算机软件技术实现，实施例以网络的训练为主要内容对本发明的流程进行一个具体的阐述，如下：The present invention combines the characteristics of two computer vision tasks, image semantic segmentation and image super-resolution reconstruction, uses the features generated by image semantic segmentation as prior information for super-resolution reconstruction, and proposes a joint semantic segmentation image Super-resolution reconstruction methods. The overall flow process that this method explains is as shown in Figure 3, and this method can be realized with computer software technology, and embodiment is carried out a specific setting forth to the flow process of the present invention with the training of network as main content, as follows:

步骤1，构建低分辨率的语义分割数据集，所述低分辨率的语义分割数据集包括低分辨率图片和对应的语义布局图。通常的语义分割数据集包含高分辨率的图片和其对应的语义布局图，将高分辨率的图片和对应的语义布局图统一降采样得到低分辨率的语义分割数据集。Step 1, constructing a low-resolution semantic segmentation dataset, the low-resolution semantic segmentation dataset includes low-resolution pictures and corresponding semantic layout maps. The usual semantic segmentation datasets contain high-resolution images and their corresponding semantic layouts, and the high-resolution images and corresponding semantic layouts are uniformly down-sampled to obtain low-resolution semantic segmentation datasets.

具体实施中，使用图像处理软件读入全部的高分辨率的图片和其对应的语义布局图，将其尺寸统一，随后使用双三次插值对全部高分辨率图片进行缩放因子为4的降采样。再使用同样的方法将对应的语义布局图降采样至相同分辨率。如此，便得到了由低分辨率的图像与对应的语义布局图构成的低分辨率的语义分割数据集。In the specific implementation, image processing software is used to read in all high-resolution pictures and their corresponding semantic layout diagrams, unify their sizes, and then use bicubic interpolation to down-sample all high-resolution pictures with a scaling factor of 4. Then use the same method to downsample the corresponding semantic layout map to the same resolution. In this way, a low-resolution semantic segmentation dataset consisting of low-resolution images and corresponding semantic layout maps is obtained.

步骤2，使用低分辨率的语义分割数据集训练语义分割网络。通常的语义分割网络所处理的对象为高分辨率图片，使用步骤1中所得低分辨率的数据集训练语义分割网络使其在输入低分辨率图像时能输出对应的准确的语义布局图；Step 2, use the low-resolution semantic segmentation dataset to train the semantic segmentation network. The object processed by the usual semantic segmentation network is a high-resolution image, and the low-resolution data set obtained in step 1 is used to train the semantic segmentation network so that it can output the corresponding accurate semantic layout when inputting a low-resolution image;

在本实施例中，语义分割网络以全卷积网络(FCN，Fully ConvolutionalNetworks)为例来进行说明。全卷积网络是一种不含全连接层的卷积神经网络，能为图片中的每个像素预测分类从而得到语义分割结果，其网络结构如图1所示。特别地，本实施例中的全卷积网络改进自VGG16分类网络，是将VGG16中的全连接层改为卷积层后所得。在全卷积网络中，记x_ij为网络某层(i，j)位置的数据向量，y_ij为下一网络层的(i，j)位置的数据向量，y_ij由x_ij经以下公式可以得出：In this embodiment, the semantic segmentation network is described by taking a fully convolutional network (FCN, Fully Convolutional Networks) as an example. The fully convolutional network is a convolutional neural network without a fully connected layer, which can predict and classify each pixel in the image to obtain semantic segmentation results. Its network structure is shown in Figure 1. In particular, the fully convolutional network in this embodiment is improved from the VGG16 classification network, which is obtained by changing the fully connected layer in VGG16 to a convolutional layer. In the fully convolutional network, record x _ij as the data vector at the position (i, j) of a certain layer of the network, y _ij is the data vector at the position (i, j) of the next network layer, and y _ij is determined by x _ij through the following formula It can be concluded that:

y_ij＝f_ks({xsi+δi,sj+δj}0≤δi,δj≤k)y _ij =f _ks ({xsi+δi,sj+δj}0≤δi,δj≤k)

其中，k表示卷积核的大小，s表示卷积核的步长或者下采样因子，si，sj表示原网络层(i，j)位置的数据向量经过卷积或池化操作后位置坐标发生了与s有关的变化，δi，δj表示在卷积或者池化过程中产生的空间位移，通常由补零操作引起。f_ks决定了网络层的类型，它可能是用于卷积或者池化的矩阵乘法，也有可能是用于最大池化的空间最大化，或者是激活函数的元素非线性映射。对全卷积网络而言，其中的每一个网络层实现的功能都能用以上公式概括。Among them, k represents the size of the convolution kernel, s represents the step size or downsampling factor of the convolution kernel, si, sj represent the position coordinates of the data vector at the position of the original network layer (i, j) after convolution or pooling operations Changes related to s, δi, δj represent the spatial displacement generated during convolution or pooling, usually caused by zero padding operations. f _ks determines the type of network layer, which may be matrix multiplication for convolution or pooling, or space maximization for maximum pooling, or element-wise non-linear mapping of activation functions. For a fully convolutional network, the functions implemented by each network layer can be summarized by the above formula.

训练全卷积网络的具体实施方案如下：The specific implementation of training the full convolutional network is as follows:

1.构建网络。本实施例中全卷积网络主体由VGG16组成，其网络结构为：卷积层×2+池化层+卷积层×2+池化层+卷积层×3+池化层+卷积层×3+池化层+卷积层×3+池化层+卷积层×2+反卷积层。其中卷积层的卷积核大小为3×3，池化层采用的是最大池化，随着卷积层的深入，数据的尺寸变小而通道变多。1. Build your network. In this embodiment, the main body of the fully convolutional network is composed of VGG16, and its network structure is: convolutional layer×2+pooling layer+convolutional layer×2+pooling layer+convolutional layer×3+pooling layer+convolution Layer×3+pooling layer+convolution layer×3+pooling layer+convolution layer×2+deconvolution layer. The convolution kernel size of the convolutional layer is 3×3, and the pooling layer uses the maximum pooling. As the convolutional layer deepens, the size of the data becomes smaller and the number of channels increases.

2.初始化网络中的权值。与通常情况下随机初始化不同，本实施例中的权值初始化为经过预训练的VGG16中的权值。2. Initialize the weights in the network. Different from random initialization in general, the weights in this embodiment are initialized to the weights in the pre-trained VGG16.

3.训练网络。训练中所优化的损失函数为网络最后一层的像素预测值的偏差之和。本实施例中，训练的具体参数为：训练的批大小为20，采用动量为0.9、衰变率为10^-4的Adam算法进行优化，网络的学习率为10^-4。3. Train the network. The loss function optimized during training is the sum of the deviations of the pixel prediction values of the last layer of the network. In this embodiment, the specific parameters of the training are: the training batch size is 20, the Adam algorithm with the momentum of 0.9 and the decay rate of 10 ⁻⁴ is used for optimization, and the learning rate of the network is 10 ⁻⁴ .

步骤3，构建用于训练超分辨率重建网络的数据集，所述用于训练超分辨率重建网络的数据集包括低分辨率图片的语义布局图和对应的高分辨率图片。将低分辨率图片输入步骤2中所得语义分割网络，得到其语义分割结果—语义布局图。此外，还能得到语义分割的过程中产生的中间结果，语义特征图。语义布局图和相应的特征图以及对应的高分辨率照片将组成一个新的数据集去训练超分辨率重建网络。将图像输入语义分割网络后，网络最终的输出结果即为语义布局图，而语义特征图则需要从语义分割网络的不同网络层中去提取。本实施例中，所选择的语义特征图为全卷积网络中池化层之前的卷积层中的特征。Step 3, constructing a data set for training a super-resolution reconstruction network, the data set for training a super-resolution reconstruction network includes a semantic layout map of a low-resolution picture and a corresponding high-resolution picture. Input the low-resolution image into the semantic segmentation network obtained in step 2, and obtain its semantic segmentation result—semantic layout map. In addition, the intermediate result generated in the process of semantic segmentation, the semantic feature map, can also be obtained. The semantic layout map and the corresponding feature map and the corresponding high-resolution photos will form a new dataset to train the super-resolution reconstruction network. After the image is input into the semantic segmentation network, the final output of the network is the semantic layout map, and the semantic feature map needs to be extracted from different network layers of the semantic segmentation network. In this embodiment, the selected semantic feature map is the feature in the convolutional layer before the pooling layer in the full convolutional network.

步骤4，将语义布局图和语义特征图作为输入，语义布局图对应的高分辨率图像作为真实值，训练一个超分辨率重建网络，使其能根据输入的语义布局图输出对应的高分辨率重建结果；Step 4, take the semantic layout map and semantic feature map as input, and the high-resolution image corresponding to the semantic layout map as the real value, train a super-resolution reconstruction network so that it can output the corresponding high-resolution image according to the input semantic layout map reconstruction results;

本实施例中，选择作为超分辨率重建网络的是由一系列级联的重建模块所组成的级联重建网络，其结构如图2所示。每个重建模块以不同的分辨率运行，第一个模块的分辨率被设置为8×16，后面的模块分辨率依次加倍，经过5个重建模块后，最终的输出分辨率为256×512。第一个重建模块将下采样到当前分辨率的语义布局图和特征图作为输入，输出一个当前分辨率的结果，此结果可看作是经过合并和卷积后的特征图。后面的重建模块会将前一个模块的结果和下采样后的语义布局图以及特征图一起作为输入，输出一个新的结果。经过多次这样的过程，最终的重建模块输出的结果即为超分辨率重建的结果。此过程的数学描述如下：In this embodiment, what is selected as the super-resolution reconstruction network is a cascaded reconstruction network composed of a series of cascaded reconstruction modules, the structure of which is shown in FIG. 2 . Each reconstruction module operates at a different resolution. The resolution of the first module is set to 8×16, and the resolution of subsequent modules is doubled in turn. After 5 reconstruction modules, the final output resolution is 256×512. The first reconstruction module takes the semantic layout map and feature map downsampled to the current resolution as input, and outputs a result at the current resolution, which can be regarded as a merged and convoluted feature map. The subsequent reconstruction module will take the result of the previous module together with the downsampled semantic layout map and feature map as input, and output a new result. After several such processes, the final output of the reconstruction module is the result of super-resolution reconstruction. The mathematical description of this process is as follows:

每个重建模块由3个网络层组成：第一层为特征融合层，作用为将输入的语义布局图和语义特征图与前一层的输出结果融合；后面两层为带有3×3卷积核、层正则化和修正线性单元的卷积层，作为是对融合后的特征进行重建。除了最后的重建模块外，每个重建模块的结构都一样，但每个模块重建的侧重点却不一样，因为输入的特征图含有不同层次的信息。Each reconstruction module consists of three network layers: the first layer is a feature fusion layer, which is used to fuse the input semantic layout map and semantic feature map with the output result of the previous layer; Convolutional layers with kernels, layer regularization, and rectified linear units are used to reconstruct the fused features. Except for the last reconstruction module, the structure of each reconstruction module is the same, but the focus of reconstruction of each module is different, because the input feature map contains different levels of information.

级联重建网络将语义布局图作为框架，利用特征图中包含的各种信息来重建图像的细节，因此在训练时使用的损失函数和一般的超分辨率重建方法也有区别，不同于常规的均方误差损失函数直接将重建结果与原始高清图像逐像素进行对比，级联重建网络使用了被称为感知损失的损失函数，目的是对比重建结果与真实值两者在视觉感知网络中的特征差异，其定义为：The cascade reconstruction network uses the semantic layout map as a framework, and uses various information contained in the feature map to reconstruct the details of the image. Therefore, the loss function used in training is also different from the general super-resolution reconstruction method, which is different from the conventional average The square error loss function directly compares the reconstruction result with the original high-definition image pixel by pixel. The cascaded reconstruction network uses a loss function called perceptual loss. The purpose is to compare the feature difference between the reconstruction result and the real value in the visual perception network. , which is defined as:

其中，I为代表真实值的高分辨率图像，f为待训练的级联重建网络，θ为f中的参数集合，L为输入的语义布局图，λ_l为控制权重的超参数，在训练过程中其值随着训练进程会有所调整，Φ为经过训练的视觉感知网络，Φ_l表示视觉感知网络中的卷积层，视觉感知网络是一个经过大量数据训练好的图像分类网络，具有将输入图像中的物体正确分类的能力，通常使用的是已经公开发布的VGG系列网络，在其官网上可以找到。经过感知损失函数的训练，级联重建网络能重建出更真实的重建结果。Among them, I is the high-resolution image representing the real value, f is the cascaded reconstruction network to be trained, θ is the parameter set in f, L is the input semantic layout map, λ _l is the hyperparameter to control the weight, in the training In the process, its value will be adjusted with the training process. Φ is the trained visual perception network, and Φ _l represents the convolutional layer in the visual perception network. The visual perception network is an image classification network that has been trained with a large amount of data. The ability to correctly classify objects in the input image, usually using the publicly released VGG series network, can be found on its official website. After training with a perceptual loss function, the cascaded reconstruction network can reconstruct more realistic reconstruction results.

训练超分辨率重建网络的具体实施方案如下：The specific implementation scheme of training the super-resolution reconstruction network is as follows:

1.构建网络。本实施例中的超分辨率重建网络由一系列重建模块级联而成，每个级联模块的结构一致。所构建的重建模块由三层网络组成，第一层网络将输入的特征进行融合，后两层网络为卷积层，其卷积核大小为3×3，且带有层正则化和LRELU激活函数。1. Build your network. The super-resolution reconstruction network in this embodiment is formed by cascading a series of reconstruction modules, and each cascade module has the same structure. The constructed reconstruction module consists of a three-layer network. The first layer of the network fuses the input features, and the last two layers of the network are convolutional layers with a convolution kernel size of 3×3 and layer regularization and LRELU activation. function.

2.初始化网络中的权值，随机初始化网络中的权值。2. Initialize the weights in the network and randomly initialize the weights in the network.

3.训练网络。训练中需要优化的函数为感知损失，训练的具体设置为，整体的迭代次数为200代；模型的学习率为10^-4，并且每经过100代训练，学习率降为一半；采用动量为0.9、衰变率为10^-4的Adam算法进行优化。3. Train the network. The function that needs to be optimized in training is perceptual loss. The specific setting of training is that the overall number of iterations is 200 generations; the learning rate of the model is 10 ^-4 , and every 100 generations of training, the learning rate is reduced to half; the momentum is 0.9 , the Adam algorithm with a decay rate of 10 ^-4 is optimized.

步骤5，使用训练所得网络进行超分辨率重建。具体实施方案为：将一张待重建的低分辨率图片输入到步骤2中得到的语义分割网络，得到其语义布局图和语义特征图，然后将其输入至步骤4中训练得到的超分辨率重建网络，最终得到重建后的高分辨率图像。Step 5, use the trained network for super-resolution reconstruction. The specific implementation plan is: input a low-resolution image to be reconstructed into the semantic segmentation network obtained in step 2, obtain its semantic layout map and semantic feature map, and then input it into the super-resolution image obtained in step 4 Reconstruct the network and finally get the reconstructed high-resolution image.

为验证本发明技术效果，使用Cityscapes城市景观数据集进行验证。Cityscapes数据集中有2975张高分辨率图像和对应的精细语义布局图。将2975张图像中的1000张用作训练语义分割网络，其余的1975张图像训练超分辨率重建网络。用作对比的方法有双三次插值(Bicubic)，超分辨率卷积神经网络SRCNN(C.Dong,C.C.Loy,K.He,and X.Tang.Imagesuper-resolution using deep convolutional networks.IEEE Transactions onPattern Analysis and Machine Intelligence,38(2):295–307,2016.)、超分辨率密集神经网络SRDenNet(T.Tong,G.Li,X.Liu,and Q.Gao,“Image super-resolution usingdense skip connections,”in 2017 IEEE International Conference on ComputerVision(ICCV).IEEE,2017,pp.4809–4817.)和超分辨率生成对抗网络SRGAN(C.Ledig,L.Theis,F.Huszar,J.Caballero,A.Cunningham,A.Acosta,A.Aitken,A.Tejani,J.Totz,Z.Wang,et al.Photo-realistic single image super-resolution using a generativeadversarial network.arXiv preprint arXiv:1609.04802,2016.2)。In order to verify the technical effect of the present invention, the Cityscapes urban landscape data set is used for verification. There are 2975 high-resolution images and corresponding fine semantic layout maps in the Cityscapes dataset. 1000 of the 2975 images are used to train the semantic segmentation network, and the remaining 1975 images are used to train the super-resolution reconstruction network. The methods used for comparison are bicubic interpolation (Bicubic), super-resolution convolutional neural network SRCNN (C.Dong, C.C.Loy, K.He, and X.Tang. Image super-resolution using deep convolutional networks.IEEE Transactions onPattern Analysis and Machine Intelligence, 38(2):295–307, 2016.), super-resolution dense neural network SRDenNet (T.Tong, G.Li, X.Liu, and Q.Gao, "Image super-resolution using dense skip connections , "in 2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017, pp.4809–4817.) and super-resolution generative confrontation network SRGAN (C.Ledig, L.Theis, F.Huszar, J.Caballero, A . Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. arXiv preprint arXiv:1609.04802, 2016.2).

表1展示了各方法在缩放因子为4的情况下对应的客观和主观评价指标，包括PSNR(峰值信噪比)和SSIM(结构相似性)以及MOS(平均主观意见得分)。从表1中可以看出，本发明方法在恢复图像的主观质量上，有稳定的提升。Table 1 shows the objective and subjective evaluation indicators corresponding to each method when the scaling factor is 4, including PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity) and MOS (Mean Subjective Opinion Score). It can be seen from Table 1 that the method of the present invention has a stable improvement in the subjective quality of the restored image.

表1各方法客观以及主观评分Table 1 Objective and subjective scoring of each method

视觉效果对比如图4，从对比结果可知，本发明的方法相对于其它方法在重建的细节上更为生动具体，在整体上具有更高的真实感和视觉说服力，在保持了客观评价指标基本不变的水准下，在主观评价指标上有较大幅度提高。The comparison of visual effects is shown in Figure 4. From the comparison results, it can be known that the method of the present invention is more vivid and specific in reconstruction details than other methods, has a higher sense of reality and visual persuasion as a whole, and maintains the objective evaluation index Under the basically unchanged level, the subjective evaluation indicators have been greatly improved.

应当理解的是，本说明书未详细阐述的部分均属于现有技术。It should be understood that the parts not described in detail in this specification belong to the prior art.

应当理解的是，上述针对实施例的描述较为详细，并不能因此而认为是对本发明专利保护范围的限制。本领域的普通技术人员在本发明的启示下，在不脱离本发明权利要求所保护的范围情况下，还可以做出替换或变形，但仍属于本发明的保护范围之内，本发明的请求保护范围应以所附权利要求为准。It should be understood that the above descriptions for the embodiments are relatively detailed, and should not be considered as limiting the protection scope of the present invention. Under the enlightenment of the present invention, those skilled in the art can also make replacements or modifications without departing from the protection scope of the claims of the present invention, but still belong to the protection scope of the present invention. The scope of protection should be determined by the appended claims.

Claims

1. a kind of super resolution ratio reconstruction method of combination semantic segmentation, which comprises the steps of:

Step 1, the semantic segmentation data set of low resolution is constructed, the semantic segmentation data set of the low resolution includes low resolution Rate image and corresponding semantic layout；

Step 2, semantic segmentation network is trained using the semantic segmentation data set of low resolution,；

Step 3, the data set for training Super-resolution reconstruction establishing network is constructed, it is described for training Super-resolution reconstruction establishing network Data set includes the semantic layout, semantic feature figure and corresponding high-definition picture of low-resolution image, wherein low resolution The semantic layout of rate image, semantic feature figure are by being input to trained semantic segmentation in step 2 for low-resolution image It is obtained in network；

Step 4, using semantic layout and semantic feature figure as input, the corresponding high-definition picture of semantic layout is as true Real value, training Super-resolution reconstruction establishing network can export corresponding super-resolution reconstruction knot according to the semantic layout of input Fruit；

Step 5, a low resolution picture to be reconstructed is input to semantic segmentation obtained in step 2 as a result, obtaining its language Then adopted layout and semantic feature figure are input to the Super-resolution reconstruction establishing network that training obtains in step 4, finally obtain High-definition picture after reconstruction.

2. a kind of super resolution ratio reconstruction method of combination semantic segmentation according to claim 1, it is characterised in that: step 1 The semantic segmentation data set of the low resolution is by the high-definition picture and semantic cloth in common semantic segmentation data set Office's figure carries out the down-sampling of the same zoom factor, and obtained low-resolution image and semantic layout constitute the language of low resolution Adopted partitioned data set.

3. a kind of super resolution ratio reconstruction method of combination semantic segmentation according to claim 1, it is characterised in that: step 2 In semantic segmentation network be full convolutional network, which is that the full articulamentum in VGG16 is changed to institute after convolutional layer , specific network structure are as follows: the convolutional layer × pond the 2+ layer+convolutional layer × pond the 2+ layer+convolutional layer × pond 3+ layer+convolutional layer × The pond the 3+ layer+convolutional layer × pond 3+ layer+convolutional layer × 2+ warp lamination, wherein the convolution kernel size of convolutional layer is 3 × 3, pond Change layer using maximum pond.

4. a kind of super resolution ratio reconstruction method of combination semantic segmentation according to claim 3, it is characterised in that: full convolution The weight initialization of network is by the weight in the VGG16 of pre-training；The loss function optimized in training is that network is last One layer of the sum of the deviation of pixel predictors；Trained design parameter are as follows: batch size of training is 20, uses momentum for 0.9, declines Variability is 10^-4Adam algorithm optimize, the learning rate of network is 10^-4。

5. a kind of super resolution ratio reconstruction method of combination semantic segmentation according to claim 1, it is characterised in that: step 4 The Super-resolution reconstruction establishing network is that network, cascade reconstruction mould are rebuild in a series of cascade as composed by cascade reconstruction modules Block is run with incremental resolution ratio, wherein each reconstruction module is made of 3 network layers: first layer is characterized fused layer, effect For the semantic layout of input and semantic feature figure are merged with the output result of preceding layer；Two layers next is with 3 × 3 convolution Core, layer regularization and the convolutional layer for correcting linear unit, as being rebuild to fused feature.

6. a kind of super resolution ratio reconstruction method of combination semantic segmentation according to claim 5, it is characterised in that: super-resolution The carrying out practically relationship that rate is rebuild between the reconstruction module in network is as follows,

First is rebuild module using the semantic layout and semantic feature figure that are down sampled to current resolution as input, output one A current resolution as a result, this result regard as through merging and convolution after characteristic pattern, subsequent reconstruction module will be previous The result of a module and the semantic layout after down-sampling and semantic feature figure export a new knot together as input Fruit, by multiple such process, the result of final reconstruction module output is super-resolution rebuilding as a result, this process Mathematical description is as follows:

Wherein, O_iIndicate the output of i-th of reconstruction module, f indicates the operation such as convolution rebuild in module, and L indicates semantic layout Figure, F indicate that semantic feature figure, ⊕ indicate Fusion Features.

7. a kind of super resolution ratio reconstruction method of combination semantic segmentation according to claim 5, it is characterised in that: step 4 The loss function used when middle trained Super-resolution reconstruction establishing network is,

Wherein, I is the high-definition picture for representing true value, and f is that network is rebuild in cascade to be trained, and θ is the parameter set in f It closes, L is the semantic layout of input, and Φ is trained visual perception network, which is VGG network, Φ_l Indicate the convolutional layer in visual perception network, λ_lFor the hyper parameter for controlling weight, its value is with training process in the training process It is adjusted.

8. a kind of super resolution ratio reconstruction method of combination semantic segmentation according to claim 5, it is characterised in that: training is super When resolution reconstruction network, be specifically configured to: whole the number of iterations was 200 generations；The learning rate of model is 10^-4, and every warp After the training of 100 generations, learning rate is reduced to half；Use momentum for the 0.9, rate of disintegration 10^-4Adam algorithm optimize.