
CN110177282A - An inter-frame prediction method based on SRCNN - Google Patents

An inter-frame prediction method based on SRCNN

Info

Publication number
CN110177282A
CN110177282A (application CN201910388829.6A)
Authority
CN
China
Prior art keywords
image
frame
resolution
super
image sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910388829.6A
Other languages
Chinese (zh)
Other versions
CN110177282B (en
Inventor
颜成钢
黄智坤
李志胜
孙垚棋
张继勇
张勇东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN201910388829.6A priority Critical patent/CN110177282B/en
Publication of CN110177282A publication Critical patent/CN110177282A/en
Application granted granted Critical
Publication of CN110177282B publication Critical patent/CN110177282B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124Quantisation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses an SRCNN-based inter-frame prediction method, characterized by using a super-resolution convolutional neural network (SRCNN) to perform inter-frame prediction on image sequences. After motion estimation and motion compensation are applied to an image sequence, a feature model is trained with the SRCNN. Using the parameters of this model, super-resolution reconstruction is performed on an image while motion estimation and motion compensation are applied to it, yielding an image consistent with the frame following the current one. The invention applies deep learning to inter-frame prediction in video coding, using a convolutional neural network to extract features from, and train on, motion-estimated and motion-compensated image sequences. At the same time, because a super-resolution network is used, image quality is enhanced during reconstruction.

Description

An Inter-Frame Prediction Method Based on SRCNN

Technical Field

The invention belongs to the field of inter-frame prediction in video coding, aims mainly to improve video transmission efficiency, and specifically relates to an SRCNN-based inter-frame prediction method.

Background Art

Super-resolution means converting a low-resolution image into a high-resolution one, generally improving image quality and clarity. The Super-Resolution Convolutional Neural Network (SRCNN) is a convolutional neural network for image super-resolution reconstruction: it extracts features from image patches, maps those features non-linearly, and reconstructs a high-resolution image. Since its introduction, SRCNN has been widely used, and its accuracy and reliability have been well validated.
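The three-stage SRCNN pipeline described above (patch feature extraction, non-linear mapping, reconstruction) can be sketched as a stack of three convolutions. This is a minimal NumPy sketch: the kernel sizes (9, 1, 5) follow the commonly cited SRCNN configuration rather than anything stated in this document, and the naive `conv2d` helper is illustrative, not an optimized implementation.

```python
import numpy as np

def conv2d(x, w, b):
    """'Same'-padded 2-D convolution: x is (H, W, C_in), w is (k, k, C_in, C_out)."""
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    H, W, _ = x.shape
    out = np.empty((H, W, w.shape[3]))
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + k, j:j + k, :]              # (k, k, C_in) window
            out[i, j] = np.tensordot(patch, w, axes=3) + b
    return out

def srcnn_forward(y, params):
    """Three-layer SRCNN mapping on a luma image y of shape (H, W, 1).

    params is a list of three (weights, bias) pairs with kernel sizes 9, 1, 5."""
    h = np.maximum(conv2d(y, *params[0]), 0)   # 9x9 conv: patch feature extraction, ReLU
    h = np.maximum(conv2d(h, *params[1]), 0)   # 1x1 conv: non-linear mapping, ReLU
    return conv2d(h, *params[2])               # 5x5 conv: reconstruction, linear output
```

With delta kernels (a 1 at the center tap, zeros elsewhere) and zero biases, the network reduces to the identity map, which is a convenient sanity check of the plumbing.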

In today's information age, research and statistics show that roughly 75% of the information humans receive from the outside world arrives through the eyes; this information is converted into images by the visual system and transmitted to the brain. As living standards rise, people demand ever-higher image and video quality, and the steadily increasing resolution of images and video poses enormous challenges for transmission: sharper images and video mean larger data volumes and higher required transmission rates. To keep viewing comfortable, movies and other video are generally played at more than 24 frames per second. If every frame were stored individually and played back frame by frame, the demands on both storage capacity and the transmission and display rates of playback devices would be enormous; under such a scheme, transmission-rate limits would make 2K, 4K, and other high-definition video impossible. Video coding technology largely eliminates the redundancy between the frames of an image sequence, greatly compressing the data volume of video; together with modern hardware it has brought ultra-high-definition video into everyday life and largely satisfied viewers' demands.

Inter-frame prediction is the most important part of video coding. It exploits the correlation between video frames, i.e. temporal correlation, to compress images, and is widely used in the compression coding of broadcast television, video conferencing, video telephony, and high-definition television. In image transmission, moving images, television images in particular, are the main object of interest. A moving image is a temporal sequence of consecutive frames spaced one frame period apart, and it exhibits stronger correlation in time than in space. In most television footage the detail change between adjacent frames is very small, i.e. there is strong inter-frame correlation; inter-frame coding that exploits this correlation achieves a much higher compression ratio than intra-frame coding.

In inter-frame predictive coding, the scenes in neighboring frames of a moving image are correlated. A frame can therefore be divided into blocks or macroblocks, and for each block or macroblock one searches for its position in a neighboring frame; the relative spatial offset between the two is what is usually called the motion vector, and the process of obtaining it is called motion estimation. The motion vector and the prediction error obtained after motion matching are sent to the decoder together. Following the position indicated by the motion vector, the decoder locates the corresponding block or macroblock in an already-decoded neighboring reference frame and adds the prediction error to obtain the block's position in the current frame. Motion estimation removes inter-frame redundancy and greatly reduces the number of bits transmitted, making it a key component of any video compression system. This section starts from the general method of motion estimation and focuses on its three key problems: parameterizing the motion field, defining the optimal matching cost function, and finding the optimal match.
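The block-matching motion estimation and compensation described above can be sketched as an exhaustive search minimising the sum of absolute differences (SAD). The block size, search range, and SAD criterion below are illustrative choices; the document does not fix them.

```python
import numpy as np

def motion_estimate(ref, cur, block=8, search=4):
    """Full-search block matching: for each block of `cur`, find the offset
    (dy, dx) into `ref`, within +/- `search` pixels, that minimises the SAD.
    Frames are treated as 2-D float arrays to avoid unsigned wrap-around."""
    ref = np.asarray(ref, dtype=float)
    cur = np.asarray(cur, dtype=float)
    H, W = cur.shape
    mvs = np.zeros((H // block, W // block, 2), dtype=int)
    for by in range(H // block):
        for bx in range(W // block):
            y, x = by * block, bx * block
            target = cur[y:y + block, x:x + block]
            best, best_mv = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    sy, sx = y + dy, x + dx
                    if sy < 0 or sx < 0 or sy + block > H or sx + block > W:
                        continue  # candidate block falls outside the reference frame
                    sad = np.abs(ref[sy:sy + block, sx:sx + block] - target).sum()
                    if sad < best:
                        best, best_mv = sad, (dy, dx)
            mvs[by, bx] = best_mv
    return mvs

def motion_compensate(ref, mvs, block=8):
    """Build the predicted frame by copying each matched reference block."""
    pred = np.zeros_like(ref)
    for by in range(mvs.shape[0]):
        for bx in range(mvs.shape[1]):
            y, x = by * block, bx * block
            dy, dx = mvs[by, bx]
            pred[y:y + block, x:x + block] = ref[y + dy:y + dy + block,
                                                 x + dx:x + dx + block]
    return pred
```

The prediction error (residual) the encoder would transmit alongside the motion vectors is then simply `cur - motion_compensate(ref, mvs)`.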

Summary of the Invention

The purpose of the present invention is to propose an SRCNN-based inter-frame prediction method distinct from the mainstream HEVC video coding approach. The invention performs inter-frame prediction on image sequences with a super-resolution convolutional neural network: after motion estimation and motion compensation are applied to an image sequence, a feature model is trained with the SRCNN. Using the parameters of this model, super-resolution reconstruction can be performed on an image while motion estimation and motion compensation are applied to it, producing an image essentially consistent with the frame following the current one.

The technical solution adopted by the present invention to solve its technical problem comprises the following steps:

Step 1: Collect a large number of video files covering different scenes and compress them under different quantization parameters (QP).

Step 2: Extract image sequences from the videos; the time interval between consecutive frames is set to t, with t < 0.1 s.

Step 3: Set aside part of the image sequences as a validation set. Read the remaining images frame by frame; for every frame except the first of a sequence, take the current frame and the previous frame, compute the residual between the two frames, combine the previous frame with this residual, and apply motion compensation to it to obtain a predicted frame for the previous frame. Save the resulting predicted-frame sequences and split them into a training set and a test set at a ratio of 4:1.

Step 4: Feed in the training and test sets, set suitable hyperparameters, and train a parameter model with the super-resolution convolutional neural network (SRCNN).

Step 5: For every image sequence in the validation set, compute the peak signal-to-noise ratio (PSNR) between frame i and frame i+1, denoted PSNR1. Using the parameters read from the parameter model, process frame i of the acquired sequence to obtain a reconstructed image I; then compute the PSNR between I and frame i of the validation sequence, denoted PSNR2.

Compare the two PSNR values: if PSNR2 ≥ PSNR1, the model is considered valid.

If PSNR2 < PSNR1, the model is considered inadequate. Let ERR = PSNR1 - PSNR2. If ERR < 5, the training hyperparameters are assumed to be at fault: return to step 4, adjust the learning-rate hyperparameter, and retrain the parameter model. If ERR ≥ 5, the dataset partitioning strategy is assumed to be at fault: return to step 3, expand the dataset to cover more scenes, re-split the training and test sets, and train and validate again.

If the two images differ greatly, with a PSNR worse than the minimum preset threshold, adjust the training and test sets.

If the two images differ only slightly, with a PSNR between the optimal and minimum preset thresholds, return to step 4, adjust the parameters of the super-resolution convolutional neural network, and retrain the parameter model.
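The validation branch of step 5 reduces to a small decision function. The return labels below are illustrative, not part of the original text.

```python
def validate_model(psnr1, psnr2, err_threshold=5.0):
    """Decision logic of step 5: compare the PSNR of the true next frame
    (psnr1) with that of the reconstruction (psnr2) and report which part
    of the pipeline to revisit."""
    if psnr2 >= psnr1:
        return "model valid"
    err = psnr1 - psnr2
    if err < err_threshold:
        # hyperparameter problem: back to step 4, adjust the learning rate
        return "retrain: adjust learning rate (step 4)"
    # dataset problem: back to step 3, expand the dataset and re-split
    return "retrain: expand and re-split dataset (step 3)"
```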

The image reconstruction using the parameter model is implemented as follows:

1. Convert the input low-resolution image to the YCbCr color space and take its grayscale (luma) channel as the input i of the reconstruction operation. Downsample i with a stride of k to obtain a lower-dimensional image.

2. Enlarge the low-dimensional image with bicubic interpolation back to the target size, i.e. the size of the input low-resolution image.

3. Read the parameters of the parameter model, including the weights and biases of every network node, and apply a non-linear mapping to the interpolated image through the three-layer convolutional network, yielding the reconstructed image I.

4. Convert image I back to an RGB color image to obtain the reconstructed high-resolution image.
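The four reconstruction steps can be sketched as a small pipeline. The BT.601 luma weights are an assumption (the text only says "YCbCr grayscale"), and `upscale` and `srcnn_map` are placeholders for the bicubic interpolation of step 2 and the trained three-layer network of step 3.

```python
import numpy as np

# ITU-R BT.601 luma weights; an assumption, since the patent does not name
# the RGB -> YCbCr matrix it uses.
BT601 = np.array([0.299, 0.587, 0.114])

def rgb_to_luma(rgb):
    """Step 1a: keep only the Y (grayscale) channel of the YCbCr image."""
    return np.asarray(rgb, dtype=float) @ BT601

def reconstruct(rgb, k, upscale, srcnn_map):
    """Sketch of the four reconstruction steps.

    `upscale(low, shape)` stands in for bicubic interpolation (step 2) and
    `srcnn_map(img)` for the trained three-layer network (step 3); both are
    passed in rather than hard-coded."""
    y = rgb_to_luma(rgb)           # step 1a: luma channel as input i
    low = y[::k, ::k]              # step 1b: downsample with stride k
    up = upscale(low, y.shape)     # step 2: enlarge back to the input size
    return srcnn_map(up)           # step 3: non-linear mapping -> image I
                                   # (step 4, the YCbCr -> RGB return trip,
                                   # needs the Cb/Cr channels and is omitted)
```

A quick way to exercise the plumbing is to pass a nearest-neighbor upscaler and an identity mapping in place of the real bicubic filter and network.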

The beneficial effects of the present invention are as follows:

The innovation of the present invention lies in applying deep learning to inter-frame prediction in video coding, using a convolutional neural network to extract features from, and train on, the motion estimation and motion compensation operations between frames of an image sequence. At the same time, because a super-resolution network is used, image quality is enhanced during reconstruction.

Brief Description of the Drawings

Figure 1 is a schematic diagram of the super-resolution convolutional neural network (SRCNN);

Figure 2 is a flowchart of an implementation of the present invention.

Detailed Description

The present invention focuses on algorithmic innovation in inter-frame prediction for video coding, and the training procedure of the entire model is described in detail. The specific implementation steps are set out below with reference to the drawings, from which the purpose and effect of the invention will become more apparent.

Figure 1 is a schematic diagram of SRCNN. It shows clearly that the network has a simple structure and, through non-linear mapping and image reconstruction, enhances image quality. With this network, the resolution of images can be improved while inter-frame prediction is performed on the image sequence.

Figure 2 is the implementation flowchart of the invention; the specific operations are:

1. Collect a large number of video files in YUV format, covering a variety of scenes.

2. Compress the video files with different quantization parameters; the higher the QP, the stronger the compression. The focus is on compression with QP between 28 and 42.

3. Extract image sequences from the video files, extracting different numbers of frames depending on video duration so that the sampling interval stays consistent. To keep the change between consecutive frames small, the extraction interval must be very short; it is set according to the length of the video.

4. Apply motion estimation and motion compensation to every extracted image: input the current frame and the next frame, compare the two, and perform motion estimation and motion compensation on the current frame.

5. Organize the processed image sequences into training and test sets. The validation set used to verify the model must consist of image sequences that have not undergone motion estimation or motion compensation.

6. Feed in the training and test sets, set suitable parameters, and train the model with the super-resolution convolutional neural network SRCNN.

7. Verify that the trained model is valid by comparing the originally extracted next frame with the image reconstructed from the model parameters. If the two images are almost indistinguishable, the model can be considered valid. If they differ noticeably, adjust according to the situation: if the difference is large, adjust the dataset and retrain the model; if the difference is moderate, improve the imaging quality by adjusting the network parameters and retrain until the model meets the requirements.

When comparing a generated image with the next frame of the original, subjective visual judgment must be combined with objective numerical analysis. Subjectively, if the two frames look almost identical to the naked eye, the model can be regarded as valid. But since consecutive frames differ little to begin with, a mathematical tool is also needed to compare the two images. The peak signal-to-noise ratio (PSNR), an objective standard for evaluating images, can be used to assess the reconstruction; its formula is as follows:

PSNR = 10 · log10( MAX_I^2 / MSE )

where MAX_I is the maximum possible pixel value (255 for 8-bit images) and MSE is the mean squared error between the two images. Compute the PSNR between the original image and its next frame, and between the original image and the reconstructed image. If the two values are close, the model works well and has essentially reconstructed the same picture as the next frame of the original. If the latter PSNR is higher, the program can be considered to improve image quality while performing inter-frame prediction.
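The PSNR evaluation used above can be sketched directly from its definition (with MAX_I = 255 for 8-bit images):

```python
import numpy as np

def psnr(a, b, max_val=255.0):
    """Peak signal-to-noise ratio in dB: PSNR = 10 * log10(MAX^2 / MSE)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    mse = np.mean((a - b) ** 2)   # mean squared error between the two images
    if mse == 0:
        return float("inf")       # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

For the validation of step 5, PSNR1 = `psnr(frame_i, frame_i_plus_1)` and PSNR2 = `psnr(frame_i, reconstruction)` would then be compared directly.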

With the help of PSNR, the accuracy of the model can be verified objectively once more, reducing workload and ensuring that the scheme is implemented effectively.

Claims (3)

1. An SRCNN-based inter-frame prediction method, characterized in that a super-resolution convolutional neural network is used to perform inter-frame prediction on an image sequence; after motion estimation and motion compensation operations are applied to the image sequence, a feature model is trained with the super-resolution convolutional neural network; the parameters of the model are used to perform super-resolution reconstruction on an image while motion estimation and motion compensation are applied to it, yielding an image consistent with the frame following the current image.

2. The SRCNN-based inter-frame prediction method according to claim 1, characterized in that its specific implementation comprises the following steps:

Step 1: collect a large number of video files of different scenes and compress the videos under different quantization parameters;

Step 2: extract image sequences from the videos; the time interval between consecutive frames is set to t, with t < 0.1 s;

Step 3: set aside part of the image sequences as a validation set; read the remaining image sequences frame by frame, and for every frame except the first of a sequence, use the current frame and the previous frame, compute the residual between the two frames, combine the previous frame with this residual, and apply motion compensation to it to obtain a predicted frame for the previous frame; save the computed predicted-frame sequences and split them into a training set and a test set at a ratio of 4:1;

Step 4: input the training and test sets, set the hyperparameters, and train a parameter model with the super-resolution convolutional neural network;

Step 5: compute the peak signal-to-noise ratio (PSNR) between frame i and frame i+1 of every image sequence in the validation set, denoted PSNR1; read the parameters of the parameter model and process frame i of the acquired image sequence to obtain a reconstructed image I; compute the PSNR between I and frame i of the validation sequence, denoted PSNR2;

compare the two PSNR values: if PSNR2 ≥ PSNR1, the model is considered valid;

if PSNR2 < PSNR1, the model is considered inadequate; let ERR = PSNR1 - PSNR2; if ERR < 5, the training hyperparameters are considered to be at fault: return to step 4, adjust the learning-rate hyperparameter, and retrain the parameter model; if ERR ≥ 5, the dataset partitioning strategy is considered to be at fault: return to step 3, expand the dataset to cover more scenes, re-split the training and test sets, and train and validate again;

if the two images differ greatly, with a PSNR worse than the minimum preset threshold, adjust the training and test sets;

if the two images differ only slightly, with a PSNR between the optimal and minimum preset thresholds, return to step 4, adjust the parameters of the super-resolution convolutional neural network, and retrain the parameter model.

3. The SRCNN-based inter-frame prediction method according to claim 2, characterized in that the image reconstruction using the parameter model is implemented as follows:

1. convert the input low-resolution image to the YCbCr color space and take its grayscale (luma) channel as the input image i of the reconstruction operation; downsample i with a stride of k to obtain a lower-dimensional image;

2. enlarge the low-dimensional image with bicubic interpolation back to the target size, i.e. the size of the input low-resolution image;

3. read the parameters of the parameter model, including the weights and biases of every network node, and apply a non-linear mapping to the interpolated image through the three-layer convolutional network to obtain the reconstructed image I;

4. convert image I back to an RGB color image to obtain the reconstructed high-resolution image.
CN201910388829.6A 2019-05-10 2019-05-10 An Inter-frame Prediction Method Based on SRCNN Active CN110177282B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910388829.6A CN110177282B (en) 2019-05-10 2019-05-10 An Inter-frame Prediction Method Based on SRCNN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910388829.6A CN110177282B (en) 2019-05-10 2019-05-10 An Inter-frame Prediction Method Based on SRCNN

Publications (2)

Publication Number Publication Date
CN110177282A true CN110177282A (en) 2019-08-27
CN110177282B CN110177282B (en) 2021-06-04

Family

ID=67690836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910388829.6A Active CN110177282B (en) 2019-05-10 2019-05-10 An Inter-frame Prediction Method Based on SRCNN

Country Status (1)

Country Link
CN (1) CN110177282B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112155511A (en) * 2020-09-30 2021-01-01 广东唯仁医疗科技有限公司 Method for compensating human eye shake in OCT (optical coherence tomography) acquisition process based on deep learning
CN112601095A (en) * 2020-11-19 2021-04-02 北京影谱科技股份有限公司 Method and system for creating fractional interpolation model of video brightness and chrominance
CN113191945A (en) * 2020-12-03 2021-07-30 陕西师范大学 High-energy-efficiency image super-resolution system and method for heterogeneous platform
CN113592719A (en) * 2021-08-14 2021-11-02 北京达佳互联信息技术有限公司 Training method of video super-resolution model, video processing method and corresponding equipment
CN117313818A (en) * 2023-09-28 2023-12-29 四川大学 Method for training lightweight convolutional neural network and terminal equipment

Citations (5)

Publication number Priority date Publication date Assignee Title
CN107133919A (en) * 2017-05-16 2017-09-05 西安电子科技大学 Time dimension video super-resolution method based on deep learning
CN108012157A (en) * 2017-11-27 2018-05-08 上海交通大学 Construction method for the convolutional neural networks of Video coding fractional pixel interpolation
CN108805808A (en) * 2018-04-04 2018-11-13 东南大学 A method of improving video resolution using convolutional neural networks
CN109087243A (en) * 2018-06-29 2018-12-25 中山大学 A kind of video super-resolution generation method generating confrontation network based on depth convolution
US20190139205A1 (en) * 2017-11-09 2019-05-09 Samsung Electronics Co., Ltd. Method and apparatus for video super resolution using convolutional neural network with two-stage motion compensation

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
CN107133919A (en) * 2017-05-16 2017-09-05 西安电子科技大学 Time dimension video super-resolution method based on deep learning
US20190139205A1 (en) * 2017-11-09 2019-05-09 Samsung Electronics Co., Ltd. Method and apparatus for video super resolution using convolutional neural network with two-stage motion compensation
CN108012157A (en) * 2017-11-27 2018-05-08 上海交通大学 Construction method for the convolutional neural networks of Video coding fractional pixel interpolation
CN108805808A (en) * 2018-04-04 2018-11-13 东南大学 A method of improving video resolution using convolutional neural networks
CN109087243A (en) * 2018-06-29 2018-12-25 中山大学 A kind of video super-resolution generation method generating confrontation network based on depth convolution

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112155511A (en) * 2020-09-30 2021-01-01 Guangdong Weiren Medical Technology Co., Ltd. Deep-learning-based method for compensating eye movement during OCT acquisition
CN112601095A (en) * 2020-11-19 2021-04-02 Beijing Moviebook Technology Co., Ltd. Method and system for creating a fractional interpolation model for video luminance and chrominance
CN112601095B (en) * 2020-11-19 2023-01-10 Beijing Moviebook Technology Co., Ltd. Method and system for creating a fractional interpolation model for video luminance and chrominance
CN113191945A (en) * 2020-12-03 2021-07-30 Shaanxi Normal University Energy-efficient image super-resolution system and method for heterogeneous platforms
CN113191945B (en) * 2020-12-03 2023-10-27 Shaanxi Normal University Energy-efficient image super-resolution system and method for heterogeneous platforms
CN113592719A (en) * 2021-08-14 2021-11-02 Beijing Dajia Internet Information Technology Co., Ltd. Training method of video super-resolution model, video processing method and corresponding device
CN113592719B (en) * 2021-08-14 2023-11-28 Beijing Dajia Internet Information Technology Co., Ltd. Training method of video super-resolution model, video processing method and corresponding device
CN117313818A (en) * 2023-09-28 2023-12-29 Sichuan University Method for training a lightweight convolutional neural network and terminal device

Also Published As

Publication number Publication date
CN110177282B (en) 2021-06-04

Similar Documents

Publication Publication Date Title
Hu et al. FVC: A new framework towards deep video compression in feature space
CN110177282A (en) An inter-frame prediction method based on SRCNN
CN101668205B (en) Self-adapting down-sampling stereo video compressed coding method based on residual error macro block
CN100463527C (en) A method for disparity estimation of multi-viewpoint video images
CN107027029B (en) High-performance video coding improvement method based on frame rate conversion
CN101729891B (en) Method for encoding multi-view depth video
CN108924554B (en) A Rate Distortion Optimization Method for Panoramic Video Coding Based on Spherical Weighted Structural Similarity
CN111083477B (en) HEVC Optimization Algorithm Based on Visual Saliency
CN101710993A (en) Block-based self-adaptive super-resolution video processing method and system
CN104602028B (en) A whole-frame-loss error concealment method for B frames in stereoscopic video
CN101600109A (en) H.264 downsizing transcoding method based on texture and motion features
CN101867816A (en) Stereoscopic video asymmetric compression coding method based on human visual characteristics
CN109905717A (en) An H.264/AVC coding optimization method based on spatio-temporal downsampling and reconstruction
CN106412572A (en) Video stream encoding quality evaluation method based on motion characteristics
CN106961610A (en) A new ultra-high-definition video compression framework combining super-resolution reconstruction
CN114222127A (en) Video coding method, video decoding method and device
Xu et al. Consistent visual quality control in video coding
CN102510496B (en) Fast downsizing transcoding method based on region of interest
CN115002482B (en) End-to-end video compression method and system using structural preserving motion estimation
Li et al. Perceptual quality assessment of face video compression: A benchmark and an effective method
Amirpour et al. A real-time video quality metric for HTTP adaptive streaming
Yang et al. Content adaptive spatial–temporal rescaling for video coding optimization
CN115131254A (en) Constant bit rate compressed video quality enhancement method based on two-domain learning
CN110401832B (en) Panoramic video objective quality assessment method based on space-time pipeline modeling
Jenab et al. Content-adaptive resolution control to improve video coding efficiency

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant