
CN107133919A - Time dimension video super-resolution method based on deep learning - Google Patents


Info

Publication number
CN107133919A
Authority
CN
China
Prior art keywords
video image
layer
neural network
image set
original video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710341864.3A
Other languages
Chinese (zh)
Inventor
董伟生
巨丹
石光明
谢雪梅
吴金建
李甫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201710341864.3A priority Critical patent/CN107133919A/en
Publication of CN107133919A publication Critical patent/CN107133919A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/587 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal sub-sampling or interpolation, e.g. decimation or subsequent interpolation of pictures in a video sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract


The invention discloses a time-dimension video super-resolution method based on deep learning, which mainly solves the prior-art problems of poor stability and low accuracy in frame-interpolated reconstruction of video images. Its key idea is to train a neural network to fit the nonlinear mapping between original video images and downsampled video images: 1) obtain an original video image set and a downsampled video image set as training samples for the neural network; 2) construct a neural network model and train its parameters on those samples; 3) feed any given video, as a test sample, into the trained model; the network output is the reconstructed video. The invention reduces the computational complexity of frame-interpolated video reconstruction, improves its stability and accuracy, and can be used for scene interpolation, animation production, and temporal frame interpolation of low-frame-rate video.

Description

Time-dimension video super-resolution method based on deep learning

Technical Field

The invention belongs to the field of image processing and specifically relates to a time-dimension video super-resolution method, which can be used for scene interpolation, animation production, and temporal frame interpolation of low-frame-rate video.

Background Art

A video image contains not only the spatial information of the observed target but also its motion information over time, giving it a "space-time unified" character. Because video keeps together the spatial and temporal information that reflect an object's nature, it greatly enhances the human ability to perceive the objective world, and it has proven to be of great application value in remote sensing, military affairs, agriculture, medicine, biochemistry, and other fields.

Acquiring precise video images with video imaging equipment is costly and is limited by the manufacturing processes of sensors and optical devices. To raise the resolution of imaged video, the video usually has to be compressed at the expense of its temporal resolution, which clearly cannot meet the needs of scientific research and large-scale practical applications. Reconstructing high-resolution video images from compressed video by signal processing has therefore become an important way to obtain video images.

In "Dual Motion Estimation for Frame Rate Up-Conversion", Kang S. J. et al. proposed an algorithm that performs frame-interpolated reconstruction of video images through motion estimation and motion compensation. Frame-interpolated reconstruction is an ill-posed inverse problem; their algorithm exploits the temporal information of the video together with its spatial information. However, because it does not fully exploit the strong structural similarity between adjacent frames, the stability and accuracy of the reconstructed video fall short of the requirements of scientific research and large-scale practical application.

Summary of the Invention

The purpose of the present invention is to address the deficiencies of the prior art described above and to propose a deep-learning-based time-dimension video super-resolution method that improves the stability and accuracy of reconstructed video images and meets the requirements of large-scale practical applications.

The technical scheme of the present invention is realized as follows:

The downsampled video image set and the original video image set serve, respectively, as the input and output training samples of a neural network. Training fits the nonlinear mapping between downsampled video images and original video images, and this mapping then guides the frame-interpolated reconstruction of test samples, so that the network performs temporal frame interpolation of video. The specific steps are as follows:

(1) Convert the color video image set S = {S_1, S_2, ..., S_i, ..., S_N} into a grayscale video image set, i.e., the original video image set X = {X_1, X_2, ..., X_i, ..., X_N}, and directly downsample X with a downsampling matrix F to obtain the downsampled video image set Y = {Y_1, Y_2, ..., Y_i, ..., Y_N}, where X_i ∈ R^{M×L_h} denotes the i-th original video image sample, Y_i ∈ R^{M×L_l} denotes the i-th downsampled video image sample, 1 ≤ i ≤ N, N denotes the number of image samples in the original video image set, M denotes the size of an original video image block, L_h denotes the number of image blocks in each sample of the original video image set, L_l denotes the number of image blocks in each sample of the downsampled video image set, L_h = r × L_l, and r denotes the magnification factor of the original video image set relative to the downsampled video image set;
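For clarity, the quantities defined in step (1) obey the following dimensional relations; this is only a restatement of the definitions above in standard matrix notation (the explicit shape of F is not given in the source and is left unspecified):

    X_i \in \mathbb{R}^{M \times L_h}, \qquad
    Y_i = F X_i \in \mathbb{R}^{M \times L_l}, \qquad
    L_h = r \times L_l, \qquad 1 \le i \le N.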

(2) Construct a neural network model and train its parameters with the downsampled video image set Y and the original video image set X:

(2a) Determine the number of input-layer nodes, output-layer nodes, hidden layers, and hidden-layer nodes of the neural network; randomly initialize the connection weights W^(t) and biases b^(t) of each layer; set the learning rate η; and choose the activation function f(g) = tanh(g), where g denotes the input value of a neural network node, t = 1, 2, ..., n, and n denotes the total number of layers of the neural network;

(2b) Randomly select a downsampled video image Y_i from the downsampled video image set as an input training sample, together with the corresponding original video image X_i as the output training sample, and compute the activation value of each layer of the network with the chosen activation function:

The activation value of layer 1, the input layer, is a^(1) = Y_i;

For layers t' = 2, 3, ..., n, the activation value is a^(t') = f(W^(t'-1) * a^(t'-1) + b^(t'-1)), where, in the second, third, and fourth layers of the network (t' = 2, t' = 3, t' = 4), three three-dimensional filters are designed to replace the traditional two-dimensional filters in order to fully extract the correlation between video frames; f(g) denotes the tanh(g) activation function with g = W^(t'-1) * a^(t'-1) + b^(t'-1); W^(t'-1) and b^(t'-1) denote the weights and bias of layer t'-1, respectively, and a^(t'-1) denotes the activation value of layer t'-1;

(2c) Compute the learning error of each layer of the neural network:

The error of the output layer, i.e., layer n, is δ^(n) = X_i − a^(n);

第t"=n-1,n-2,...,2层的误差为:δ(t")=((W(t”))Tδ(t”+1)).*f'(W(t”-1)*a(t”-1)+b(t”-1)),其中,W(t”)表示第t"层的权值,δ(t"+1)表示第t"+1层的误差,W(t”-1)和b(t”-1)分别表示第t"-1层的权值和偏置,a(t”-1)表示第t"-1层的激活值,f'(g')表示函数f(g')的导数,(g”)T表示转置变换,g'=W(t”-1)*a(t”-1)+b(t”-1),g”=W(t”)The error of the t"=n-1, n-2,..., layer 2 is: δ (t") =((W (t") ) T δ (t"+1) ).*f'( W (t”-1) *a (t”-1) +b (t”-1) ), where W (t”) represents the weight of the t"th layer, δ (t"+1) represents the The error of the t"+1 layer, W (t"-1) and b (t"-1) respectively represent the weight and bias of the t"-1 layer, a (t"-1) represents the t"- The activation value of layer 1, f'(g') represents the derivative of the function f(g'), (g") T represents the transpose transformation, g'=W (t"-1) *a (t"-1) +b (t"-1) , g"=W (t") ;

(2d) Update the weights and biases of each layer of the neural network by error gradient descent:

The weights are updated as W^(t) = W^(t) − η δ^(t+1) (a^(t))^T and the biases as b^(t) = b^(t) − η δ^(t+1), where δ^(t+1) denotes the error of layer t+1 and a^(t) denotes the activation value of layer t;

(2e) Repeat steps (2b)-(2d) until the output-layer error of the neural network reaches the preset accuracy requirement or the number of training iterations reaches the maximum; then end training and save the network structure and parameters to obtain the trained neural network model;

(3) Feed any given video into the trained neural network model; the output of the network is the time-dimension super-resolved video.

Compared with the prior art, the present invention has the following advantages:

1) Because the present invention uses a convolutional neural network for time-dimension video super-resolution reconstruction, it reduces computational complexity relative to the prior art and improves the stability of the reconstruction;

2) Because the three-dimensional filters designed in the present invention fully account for the correlation between adjacent video frames, they improve the accuracy of time-dimension video super-resolution reconstruction.

Brief Description of the Drawings

Fig. 1 is the implementation flowchart of the present invention;

Fig. 2 is the structure diagram of the neural network constructed by the present invention;

Fig. 3 is the original image from the bus video used in the simulation experiments of the present invention;

Fig. 4 shows the reconstruction results for the bus video obtained with the existing Kang's method and Choi's method and with the method of the present invention.

Detailed Description

Embodiments and effects of the present invention are described in further detail below with reference to the accompanying drawings.

Referring to Fig. 1, the deep-learning-based time-dimension video super-resolution method of the present invention is implemented in the following steps:

Step 1: obtain the color video image set S.

(1a) From a given database, select a color video image set S = {S_1, S_2, ..., S_i, ..., S_464814} containing 464814 samples and convert it into a grayscale video image set, i.e., the original video image set X = {X_1, X_2, ..., X_i, ..., X_464814}, where X_i ∈ R^{M×L_h} denotes the i-th original video image sample, 1 ≤ i ≤ 464814, M denotes the size of an original video image block, M = 576, and L_h denotes the number of image blocks in each sample of the original video image set, L_h = 6;

(1b) Directly downsample the original video image set X with the downsampling matrix F to obtain the downsampled video image set Y = FX; this amounts to downsampling each sample of X = {X_1, X_2, ..., X_i, ..., X_464814} to obtain Y = {Y_1, Y_2, ..., Y_i, ..., Y_464814}, where Y_i = FX_i denotes the i-th downsampled video image sample, 1 ≤ i ≤ 464814, M = 576 likewise denotes the size of a downsampled video image block, and L_l = 3 denotes the number of image blocks in each sample of the downsampled set.
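As a concrete illustration of step 1, the NumPy sketch below builds one (X_i, Y_i) training pair. The 24 × 24 block geometry (so that M = 576), the BT.601 grayscale weights, and the choice of F as "keep every r-th frame" are illustrative assumptions; the source only specifies direct downsampling:

    import numpy as np

    def make_training_pair(color_block, r=2):
        # color_block: RGB patch sequence of shape (L_h, 24, 24, 3), L_h = 6 frames;
        # 24 x 24 gives M = 576 pixels per frame (an assumed geometry).
        luma = color_block @ np.array([0.299, 0.587, 0.114])  # color -> grayscale
        L_h = luma.shape[0]
        X = luma.reshape(L_h, -1).T        # original sample X_i, shape (576, 6)
        Y = X[:, ::r]                      # direct temporal downsampling: keep every
        return X, Y                        # r-th frame, giving Y_i of shape (576, 3)

With r = 2 this reproduces the stated sizes L_h = 6 and L_l = 3.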

Step 2: construct the neural network model and train its parameters with the downsampled video image set Y and the original video image set X.

This step is implemented as follows:

(2a) Initialize the neural network parameters;

(2a1) Use the downsampled video images as input training samples and the original video images as output training samples;

(2a2) Determine the number of input-layer nodes from the number of video frames in an input training sample; in this embodiment, since the number of input-layer nodes equals the number L_l of image blocks in each downsampled sample, it is set to 3;

(2a3) Determine the number of output-layer nodes from the number of video frames in an output training sample; in this embodiment, since the number of output-layer nodes equals the number L_h of image blocks in each original sample, it is set to 6;

(2a4) Determine the number of hidden layers and hidden-layer nodes:

Since the number of hidden layers and hidden-layer nodes determines the scale of the neural network, the network should be kept as simple as possible while still solving the problem. In this embodiment, the number of hidden layers is set to 7, and the number of nodes in each is tuned experimentally: 64 in the first hidden layer, 32 in the second, 24 in the third, 12 in the fourth, 32 in the fifth, 32 in the sixth, and 6 in the seventh;

(2a5) Randomly initialize the connection weights W^(t) and biases b^(t) of each layer, t = 1, 2, 3, 4, 5, 6, 7, 8;

(2a6) Set the learning rate η = 0.0005;

(2a7) Choose the activation function f(g) = tanh(g), where g denotes the weighted input sum of a neural network node, including the bias;
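The initialization steps (2a1)-(2a7) can be summarized in a short NumPy sketch. The node counts and learning rate are those given above; the Gaussian weight scale (0.01) and the fully connected (matrix) form are illustrative assumptions, since the patent's layers 2-4 actually use three-dimensional filters:

    import numpy as np

    rng = np.random.default_rng(0)

    layer_sizes = [3, 64, 32, 24, 12, 32, 32, 6, 6]   # input, 7 hidden layers, output
    eta = 0.0005                                      # learning rate from (2a6)

    # Random initialization of W^(t) and b^(t), t = 1..8 (scale 0.01 is an assumption)
    W = [rng.normal(0.0, 0.01, (m, k))
         for k, m in zip(layer_sizes[:-1], layer_sizes[1:])]
    b = [np.zeros((m, 1)) for m in layer_sizes[1:]]

    def f(g):                    # activation (2a7): f(g) = tanh(g)
        return np.tanh(g)

    def f_prime(g):              # its derivative: f'(g) = 1 - tanh(g)^2
        return 1.0 - np.tanh(g) ** 2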

(2b) Randomly select an input training sample Y_i and compute the activation value of each layer of the network with the chosen activation function:

The activation value of layer 1, the input layer, is a^(1) = Y_i;

For layers t' = 2, 3, ..., 9 (the network of this embodiment has n = 9 layers in total), the activation value is a^(t') = f(W^(t'-1) * a^(t'-1) + b^(t'-1)), where, in the second, third, and fourth layers (t' = 2, t' = 3, t' = 4), three three-dimensional filters are designed to replace the traditional two-dimensional filters in order to fully extract the correlation between video frames; f(g) denotes the tanh(g) activation function with g = W^(t'-1) * a^(t'-1) + b^(t'-1); W^(t'-1) and b^(t'-1) denote the weights and bias of layer t'-1, respectively, and a^(t'-1) denotes the activation value of layer t'-1;
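Step (2b) then reduces to the following forward pass; this is the matrix form of the recurrence above, with the 3-D and 2-D convolutions abstracted into the same W·a + b shape for clarity:

    def forward(y, W, b):
        # y: input sample Y_i as a column vector of layer-1 size, e.g. (3, 1).
        a = [y]                              # a^(1) = Y_i (input layer)
        for W_t, b_t in zip(W, b):
            a.append(f(W_t @ a[-1] + b_t))   # a^(t') = f(W^(t'-1) a^(t'-1) + b^(t'-1))
        return a                             # activations of all n layers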

(2c) Input the corresponding output training sample X_i and compute the learning error of each layer of the network:

The error of the output layer, i.e., layer n = 9, is δ^(9) = X_i − a^(9);

For layers t'' = 8, 7, ..., 2, the error is δ^(t'') = ((W^(t''))^T δ^(t''+1)) .* f'(W^(t''-1) * a^(t''-1) + b^(t''-1)), where W^(t'') denotes the weights of layer t'', δ^(t''+1) denotes the error of layer t''+1, W^(t''-1) and b^(t''-1) denote the weights and bias of layer t''-1, a^(t''-1) denotes the activation value of layer t''-1, f'(g') denotes the derivative of the function f(g'), (g'')^T denotes the transpose, g' = W^(t''-1) * a^(t''-1) + b^(t''-1), and g'' = W^(t'');

(2d) Update the weights and biases of each layer of the neural network by error gradient descent:

The weights are updated as W^(t) = W^(t) − η δ^(t+1) (a^(t))^T;

The biases are updated as b^(t) = b^(t) − η δ^(t+1), where δ^(t+1) denotes the error of layer t+1 and a^(t) denotes the activation value of layer t;
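Steps (2c)-(2d) amount to one stochastic gradient step. The sketch below implements the error recursion and update rules exactly as written above; note that δ^(n) is defined as X_i − a^(n), and the sketch keeps the patent's signs verbatim:

    def backward_and_update(a, x, W, b, eta):
        # a: activations from forward(); x: target sample X_i (column vector).
        n = len(a)                                # total number of layers
        delta = [None] * n
        delta[-1] = x - a[-1]                     # output-layer error (2c)
        for t in range(n - 2, 0, -1):             # layers n-1, ..., 2
            g = W[t - 1] @ a[t - 1] + b[t - 1]
            delta[t] = (W[t].T @ delta[t + 1]) * f_prime(g)
        for t in range(n - 1):                    # gradient-descent update (2d)
            W[t] -= eta * delta[t + 1] @ a[t].T
            b[t] -= eta * delta[t + 1]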

(2e) Repeat steps (2b)-(2d) until the network's output-layer error reaches the preset accuracy requirement or the number of training iterations reaches the maximum; then end training and save the network structure and parameters to obtain the trained neural network model. In this embodiment, the maximum number of iterations is 500;
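Putting (2b)-(2e) together gives the training loop; the stopping tolerance below is illustrative, since the source only speaks of a "preset accuracy requirement":

    def train(pairs, W, b, eta, max_iters=500, tol=1e-4):
        # pairs: list of (Y_i, X_i) column-vector training samples.
        for _ in range(max_iters):                     # at most 500 iterations
            y, x = pairs[np.random.randint(len(pairs))]
            a = forward(y, W, b)
            if np.mean((x - a[-1]) ** 2) < tol:        # preset accuracy reached
                break
            backward_and_update(a, x, W, b, eta)
        return W, b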

The neural network constructed in step 2 is shown in Fig. 2. It comprises one input layer, three three-dimensional convolutional layers, three two-dimensional convolutional layers, and one output layer. The input layer has 3 nodes; the seven hidden layers have 64, 32, 24, 12, 32, 32, and 6 nodes, respectively; and the output layer has 6 nodes.

Step 3: use the trained neural network model to perform time-dimension super-resolution reconstruction of video images.

(3a) Take any given video as a test sample and stretch each video image sample Y_i in it into a column vector; each vector has size 1728 × 1;

(3b) Feed these column vectors into the trained neural network model; for each input vector, the network outputs a vector of increased dimension, of size 3456 × 1;

(3c) Reconstruct and combine these vectors: first reshape each output vector into single-frame images, then assemble the single frames into a video, yielding the time-dimension super-resolved video.
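Steps (3a)-(3c) are pure reshaping around the trained network. A minimal sketch follows; the `model` callable stands in for the trained network, and its exact interface is an assumption:

    def super_resolve_block(sample, model):
        # sample: one test block of shape (576, 3), i.e. M x L_l.
        y = sample.reshape(-1, 1, order='F')      # (3a): stack frames -> 1728 x 1
        out = model(y)                            # (3b): network output, 3456 x 1
        return out.reshape(576, 6, order='F')     # (3c): back to 6 frames of 576 pixels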

The effect of the present invention is illustrated by the following simulation experiments:

1. Simulation conditions:

1) The direct downsampling transformation matrix F used in the simulation is obtained with the function imresize;

2) The programming platforms used in the simulation are Matlab R2015a and PyCharm v2016;

3) The neural network structure built in the simulation is shown in Fig. 2;

4) The 14th frame of the bus video sequence used in the simulation is shown in Fig. 3;

5) The video images in the training set come from the Xiph database, 464814 training samples in total;

6) The peak signal-to-noise ratio (PSNR) is used to evaluate the experimental results; PSNR is defined as:

PSNR = (1/M) Σ_{j=1}^{M} 10 log10(MAX_j^2 / MSE_j), where M denotes the number of frames of the reconstructed video, MAX_j denotes the maximum pixel value of the j-th reconstructed frame, and MSE_j denotes the mean squared error between the j-th frame of the reconstructed video and the j-th frame of the original video.
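Under the definition above, the PSNR evaluation can be computed as follows (a straightforward sketch; frame arrays are assumed to be floating point):

    def psnr(reconstructed, original):
        # reconstructed, original: iterables of same-sized frame arrays.
        values = []
        for rec_j, org_j in zip(reconstructed, original):
            mse_j = np.mean((rec_j - org_j) ** 2)     # per-frame mean squared error
            max_j = rec_j.max()                       # peak value of frame j
            values.append(10.0 * np.log10(max_j ** 2 / mse_j))
        return np.mean(values)                        # average over the M frames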

2. Simulation content: the method of the present invention is applied to time-dimension video super-resolution reconstruction of the bus video shown in Fig. 3. The reconstruction results are shown in Fig. 4, where:

Fig. 4(a) shows the 14th frame reconstructed with Kang's method,

Fig. 4(b) shows the 14th frame reconstructed with Choi's method,

Fig. 4(c) shows the 14th frame reconstructed with the method of the present invention.

The reconstruction results in Fig. 4 show that the image reconstructed by the present invention is closer to the real image than those reconstructed by Kang's method and Choi's method.

3. PSNR comparison

The PSNR of video temporal super-resolution reconstruction of the bus video is computed for the existing Kang's method, Choi's method, and the method of the present invention; the results are shown in Table 1.

Table 1. PSNR values of the reconstructed video images (unit: dB)

Table 1 shows that the PSNR of the video reconstructed by the method of the present invention is 2.99 dB higher than that of the existing Kang's method and 2.38 dB higher than that of the existing Choi's method.

Claims (5)

1. A time-dimension video super-resolution method based on deep learning, comprising the following steps:
(1) converting a color video image set S = {S_1, S_2, ..., S_i, ..., S_N} into a grayscale video image set, i.e., an original video image set X = {X_1, X_2, ..., X_i, ..., X_N}, and directly downsampling the original video image set X with a downsampling matrix F to obtain a downsampled video image set Y = {Y_1, Y_2, ..., Y_i, ..., Y_N}, wherein X_i ∈ R^{M×L_h} represents the i-th original video image sample, Y_i ∈ R^{M×L_l} represents the i-th downsampled video image sample, 1 ≤ i ≤ N, N represents the number of image samples in the original video image set, M represents the size of an original video image block, L_h represents the number of image blocks in each sample of the original video image set, L_l represents the number of image blocks in each sample of the downsampled video image set, L_h = r × L_l, and r represents the magnification of the original video image set relative to the downsampled video image set;
(2) constructing a neural network model and training the neural network parameters with the downsampled video image set Y and the original video image set X:
(2a) determining the number of input-layer nodes, output-layer nodes, hidden layers, and hidden-layer nodes of the neural network, randomly initializing the connection weights W^(t) and biases b^(t) of each layer, and, given the learning rate η, choosing the activation function f(g) = tanh(g), wherein g represents the input value of a neural network node, t = 1, 2, ..., n, and n represents the total number of layers of the neural network;
(2b) randomly inputting a downsampled video image Y_i from the downsampled video image set as an input training sample, simultaneously inputting the corresponding original video image X_i as an output training sample, and computing the activation value of each layer of the neural network with the chosen activation function:
the activation value of layer 1, the input layer, is a^(1) = Y_i;
the activation value of layers t' = 2, 3, ..., n is a^(t') = f(W^(t'-1) * a^(t'-1) + b^(t'-1)), wherein, in the second, third, and fourth layers of the network (t' = 2, t' = 3, t' = 4), three three-dimensional filters are designed to replace the conventional two-dimensional filters in order to fully extract the correlation between video frames; f(g) represents the tanh(g) activation function, g = W^(t'-1) * a^(t'-1) + b^(t'-1), W^(t'-1) and b^(t'-1) represent the weights and bias of layer t'-1, respectively, and a^(t'-1) represents the activation value of layer t'-1;
(2c) computing the learning error of each layer of the neural network:
the error of the output layer, i.e., layer n, is δ^(n) = X_i − a^(n);
the error of layers t'' = n−1, n−2, ..., 2 is δ^(t'') = ((W^(t''))^T δ^(t''+1)) .* f'(W^(t''-1) * a^(t''-1) + b^(t''-1)), wherein W^(t'') represents the weights of layer t'', δ^(t''+1) represents the error of layer t''+1, W^(t''-1) and b^(t''-1) represent the weights and bias of layer t''-1, a^(t''-1) represents the activation value of layer t''-1, f'(g') represents the derivative of the function f(g'), (g'')^T represents the transpose, g' = W^(t''-1) * a^(t''-1) + b^(t''-1), and g'' = W^(t'');
(2d) updating the weights and biases of each layer of the neural network by error gradient descent:
updating the weights as W^(t) = W^(t) − η δ^(t+1) (a^(t))^T and the biases as b^(t) = b^(t) − η δ^(t+1), wherein δ^(t+1) represents the error of layer t+1 and a^(t) represents the activation value of layer t;
(2e) repeatedly executing steps (2b)-(2d) until the output-layer error of the neural network reaches a preset accuracy requirement or the number of training iterations reaches the maximum number of iterations, then ending training and saving the network structure and parameters to obtain a trained neural network model;
(3) inputting any given video into the trained neural network model, the output of the neural network being the time-dimension super-resolved video.
2. The method of claim 1, wherein converting the original video image set X into the downsampled video image set Y with the downsampling matrix F in step (1) is performed by multiplying the original video images by the downsampling matrix F: Y = FX, wherein M denotes the size of an original video image block, L_l denotes the number of image blocks in each sample of the downsampled video image set, L_h denotes the number of image blocks in each sample of the original video image set, L_h = r × L_l, and r denotes the magnification of the original video image set relative to the downsampled video image set in the time dimension.
3. The method of claim 1, wherein the number of input-layer nodes of the neural network determined in step (2a) is determined according to the number of video frames of the input training samples, i.e., the number of input-layer nodes equals the number L_l of image blocks in each sample of the downsampled video image set.
4. The method of claim 1, wherein the number of output-layer nodes of the neural network determined in step (2a) is determined according to the number of video frames of the output training samples, i.e., the number of output-layer nodes equals the number L_h of image blocks in each sample of the original video image set.
5. The method of claim 1, wherein the number of hidden-layer nodes of the neural network determined in step (2a) is determined by experimental adjustment.
CN201710341864.3A 2017-05-16 2017-05-16 Time dimension video super-resolution method based on deep learning Pending CN107133919A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710341864.3A CN107133919A (en) 2017-05-16 2017-05-16 Time dimension video super-resolution method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710341864.3A CN107133919A (en) 2017-05-16 2017-05-16 Time dimension video super-resolution method based on deep learning

Publications (1)

Publication Number Publication Date
CN107133919A true CN107133919A (en) 2017-09-05

Family

ID=59731773

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710341864.3A Pending CN107133919A (en) 2017-05-16 2017-05-16 Time dimension video super-resolution method based on deep learning

Country Status (1)

Country Link
CN (1) CN107133919A (en)

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107784628A (en) * 2017-10-18 2018-03-09 南京大学 A kind of super-resolution implementation method based on reconstruction optimization and deep neural network
CN108111860A (en) * 2018-01-11 2018-06-01 安徽优思天成智能科技有限公司 Video sequence lost frames prediction restoration methods based on depth residual error network
CN108122197A (en) * 2017-10-27 2018-06-05 江西高创保安服务技术有限公司 A kind of image super-resolution rebuilding method based on deep learning
CN108322685A (en) * 2018-01-12 2018-07-24 广州华多网络科技有限公司 Video frame interpolation method, storage medium and terminal
CN108600762A (en) * 2018-04-23 2018-09-28 中国科学技术大学 In conjunction with the progressive video frame generating method of motion compensation and neural network algorithm
CN108805808A (en) * 2018-04-04 2018-11-13 东南大学 A method of improving video resolution using convolutional neural networks
CN109191376A (en) * 2018-07-18 2019-01-11 电子科技大学 High-resolution terahertz image reconstruction method based on SRCNN improved model
CN109862299A (en) * 2017-11-30 2019-06-07 北京大学 Resolution processing method and device
CN110166779A (en) * 2019-05-23 2019-08-23 西安电子科技大学 Video-frequency compression method based on super-resolution reconstruction
CN110177282A (en) * 2019-05-10 2019-08-27 杭州电子科技大学 A kind of inter-frame prediction method based on SRCNN
CN110996171A (en) * 2019-12-12 2020-04-10 北京金山云网络技术有限公司 Training data generation method and device for video tasks and server
CN111147893A (en) * 2018-11-02 2020-05-12 华为技术有限公司 A video adaptive method, related device and storage medium
CN111383172A (en) * 2018-12-29 2020-07-07 Tcl集团股份有限公司 Training method and device of neural network model and intelligent terminal
CN111567056A (en) * 2018-01-04 2020-08-21 三星电子株式会社 Video playback device and control method thereof
CN112188236A (en) * 2019-07-01 2021-01-05 北京新唐思创教育科技有限公司 Video interpolation frame model training method, video interpolation frame generation method and related device
WO2021093393A1 (en) * 2019-11-13 2021-05-20 南京邮电大学 Video compressed sensing and reconstruction method and apparatus based on deep neural network
US11140422B2 (en) * 2019-09-25 2021-10-05 Microsoft Technology Licensing, Llc Thin-cloud system for live streaming content
US11270187B2 (en) 2017-11-07 2022-03-08 Samsung Electronics Co., Ltd Method and apparatus for learning low-precision neural network that combines weight quantization and activation quantization
CN114598833A (en) * 2022-03-25 2022-06-07 西安电子科技大学 Video frame insertion method based on spatiotemporal joint attention
CN114979703A (en) * 2021-02-18 2022-08-30 阿里巴巴集团控股有限公司 Method of processing video data and method of processing image data
CN111126220B (en) * 2019-12-16 2023-10-17 北京瞭望神州科技有限公司 Real-time positioning method for video monitoring target

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104217404A (en) * 2014-08-27 2014-12-17 华南农业大学 Video image sharpness processing method in fog and haze day and device thereof
CN106485688A (en) * 2016-09-23 2017-03-08 西安电子科技大学 High spectrum image reconstructing method based on neutral net

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104217404A (en) * 2014-08-27 2014-12-17 华南农业大学 Video image sharpness processing method in fog and haze day and device thereof
CN106485688A (en) * 2016-09-23 2017-03-08 西安电子科技大学 High spectrum image reconstructing method based on neutral net

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wu Jie: "Research on Behavior Recognition Based on Convolutional Neural Networks", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107784628B (en) * 2017-10-18 2021-03-19 南京大学 A Super-Resolution Implementation Method Based on Reconstruction Optimization and Deep Neural Networks
CN107784628A (en) * 2017-10-18 2018-03-09 南京大学 A kind of super-resolution implementation method based on reconstruction optimization and deep neural network
CN108122197A (en) * 2017-10-27 2018-06-05 江西高创保安服务技术有限公司 A kind of image super-resolution rebuilding method based on deep learning
CN108122197B (en) * 2017-10-27 2021-05-04 江西高创保安服务技术有限公司 Image super-resolution reconstruction method based on deep learning
US11270187B2 (en) 2017-11-07 2022-03-08 Samsung Electronics Co., Ltd Method and apparatus for learning low-precision neural network that combines weight quantization and activation quantization
CN109862299B (en) * 2017-11-30 2021-08-27 北京大学 Resolution processing method and device
CN109862299A (en) * 2017-11-30 2019-06-07 北京大学 Resolution processing method and device
US11457273B2 (en) 2018-01-04 2022-09-27 Samsung Electronics Co., Ltd. Video playback device and control method thereof
CN111567056A (en) * 2018-01-04 2020-08-21 三星电子株式会社 Video playback device and control method thereof
CN108111860B (en) * 2018-01-11 2020-04-14 安徽优思天成智能科技有限公司 Lost frame prediction and recovery method of video sequence based on deep residual network
CN108111860A (en) * 2018-01-11 2018-06-01 安徽优思天成智能科技有限公司 Video sequence lost frames prediction restoration methods based on depth residual error network
CN108322685B (en) * 2018-01-12 2020-09-25 广州华多网络科技有限公司 Video frame insertion method, storage medium and terminal
WO2019137248A1 (en) * 2018-01-12 2019-07-18 广州华多网络科技有限公司 Video frame interpolation method, storage medium and terminal
CN108322685A (en) * 2018-01-12 2018-07-24 广州华多网络科技有限公司 Video frame interpolation method, storage medium and terminal
CN108805808A (en) * 2018-04-04 2018-11-13 东南大学 A method of improving video resolution using convolutional neural networks
CN108600762A (en) * 2018-04-23 2018-09-28 中国科学技术大学 In conjunction with the progressive video frame generating method of motion compensation and neural network algorithm
CN109191376B (en) * 2018-07-18 2022-11-25 电子科技大学 High-resolution terahertz image reconstruction method based on improved SRCNN model
CN109191376A (en) * 2018-07-18 2019-01-11 电子科技大学 High-resolution terahertz image reconstruction method based on SRCNN improved model
CN111147893B (en) * 2018-11-02 2021-10-22 华为技术有限公司 A video adaptive method, related device and storage medium
CN111147893A (en) * 2018-11-02 2020-05-12 华为技术有限公司 A video adaptive method, related device and storage medium
US11509860B2 (en) 2018-11-02 2022-11-22 Huawei Technologies Co., Ltd. Video adaptation method, related device, and storage medium
CN111383172A (en) * 2018-12-29 2020-07-07 Tcl集团股份有限公司 Training method and device of neural network model and intelligent terminal
CN110177282A (en) * 2019-05-10 2019-08-27 杭州电子科技大学 A kind of inter-frame prediction method based on SRCNN
CN110177282B (en) * 2019-05-10 2021-06-04 杭州电子科技大学 An Inter-frame Prediction Method Based on SRCNN
CN110166779B (en) * 2019-05-23 2021-06-08 西安电子科技大学 Video compression method based on super-resolution reconstruction
CN110166779A (en) * 2019-05-23 2019-08-23 西安电子科技大学 Video-frequency compression method based on super-resolution reconstruction
CN112188236A (en) * 2019-07-01 2021-01-05 北京新唐思创教育科技有限公司 Video interpolation frame model training method, video interpolation frame generation method and related device
US11140422B2 (en) * 2019-09-25 2021-10-05 Microsoft Technology Licensing, Llc Thin-cloud system for live streaming content
WO2021093393A1 (en) * 2019-11-13 2021-05-20 南京邮电大学 Video compressed sensing and reconstruction method and apparatus based on deep neural network
CN110996171B (en) * 2019-12-12 2021-11-26 北京金山云网络技术有限公司 Training data generation method and device for video tasks and server
CN110996171A (en) * 2019-12-12 2020-04-10 北京金山云网络技术有限公司 Training data generation method and device for video tasks and server
CN111126220B (en) * 2019-12-16 2023-10-17 北京瞭望神州科技有限公司 Real-time positioning method for video monitoring target
CN114979703A (en) * 2021-02-18 2022-08-30 阿里巴巴集团控股有限公司 Method of processing video data and method of processing image data
CN114598833A (en) * 2022-03-25 2022-06-07 西安电子科技大学 Video frame insertion method based on spatiotemporal joint attention

Similar Documents

Publication Publication Date Title
CN107133919A (en) Time dimension video super-resolution method based on deep learning
CN106485688B (en) High spectrum image reconstructing method neural network based
CN110675321B (en) Super-resolution image reconstruction method based on progressive depth residual error network
CN107784628B (en) A Super-Resolution Implementation Method Based on Reconstruction Optimization and Deep Neural Networks
CN111047515A (en) Cavity convolution neural network image super-resolution reconstruction method based on attention mechanism
CN113177882A (en) Single-frame image super-resolution processing method based on diffusion model
CN103020935B (en) The image super-resolution method of the online dictionary learning of a kind of self-adaptation
CN111369466B (en) Image distortion correction enhancement method of convolutional neural network based on deformable convolution
CN108805808A (en) A method of improving video resolution using convolutional neural networks
CN112801877A (en) Super-resolution reconstruction method of video frame
CN114202459B (en) Blind image super-resolution method based on depth priori
CN103093445A (en) Unified feature space image super-resolution reconstruction method based on joint sparse constraint
CN116416375A (en) A 3D reconstruction method and system based on deep learning
CN107644401A (en) Multiplicative noise minimizing technology based on deep neural network
CN113902985B (en) Training method, device and computer equipment for video frame optimization model
CN107341776A (en) Single frames super resolution ratio reconstruction method based on sparse coding and combinatorial mapping
CN113870422A (en) Pyramid Transformer-based point cloud reconstruction method, device, equipment and medium
CN109949217B (en) Video super-resolution reconstruction method based on residual learning and implicit motion compensation
CN114757830B (en) Image super-resolution reconstruction method based on channel-diffusion double-branch network
CN114299185B (en) Magnetic resonance image generation method, device, computer equipment and storage medium
CN114723787A (en) Optical flow calculation method and system
CN117593199A (en) A two-stream remote sensing image fusion method based on Gaussian prior distribution self-attention
CN107103592B (en) A multi-pose face image quality enhancement method based on dual-kernel norm regularization
CN119579410A (en) A super-resolution reconstruction method for tactile glove array signals based on diffusion model
Zeng et al. Face super-resolution via bilayer contextual representation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20170905)