
CN115797178B - Video super-resolution method based on 3D convolution - Google Patents


Info

Publication number
CN115797178B
CN115797178B · CN202211556262.7A
Authority
CN
China
Prior art keywords
video
super
convolution
resolution
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211556262.7A
Other languages
Chinese (zh)
Other versions
CN115797178A (en)
Inventor
魏文应
张伟民
安欣赏
肖铁军
张世雄
龙仕强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Bohua Ultra Hd Innovation Center Co ltd
Original Assignee
Guangdong Bohua Ultra Hd Innovation Center Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Bohua Ultra Hd Innovation Center Co ltd filed Critical Guangdong Bohua Ultra Hd Innovation Center Co ltd
Priority to CN202211556262.7A priority Critical patent/CN115797178B/en
Publication of CN115797178A publication Critical patent/CN115797178A/en
Application granted granted Critical
Publication of CN115797178B publication Critical patent/CN115797178B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a video super-resolution method based on 3D convolution, comprising the following steps: S1, video frame grouping: reading video frames from a video file and grouping them; S2, 3D convolution calculation: constructing a feature extraction algorithm model based on a 3D convolutional neural network; S3, constructing a super-resolution algorithm model based on a generative adversarial network (GAN). The disclosed method solves the problem that existing super-resolution algorithms cannot extract dependency features from video frames over a long time span, so that reference information between video frames is lost.

Description

Video super-resolution method based on 3D convolution
Technical Field
The invention relates to the field of computer vision, in particular to a video super-resolution method based on 3D convolution.
Background
With the development of science and technology, low-resolution electronic displays are gradually being replaced by ultra-high-definition displays such as 2K/4K screens, and consumer demand for ultra-high-definition video sources keeps growing. However, old films have no ultra-high-definition source because of the limitations of early shooting equipment, which seriously affects the viewing experience. Video super-resolution algorithms based on deep neural networks are therefore widely applied to upscaling standard-definition video to ultra-high definition, and have made great progress. In existing methods, the video is decoded and then fed into a neural network. For a 1920×1080 video, a span of 100 frames occupies, after decoding to 64-bit values, about 100 × 1920 × 1080 × 3 × 64 bits ≈ 4.6 GiB of memory. Existing single-level deep neural network models have difficulty processing data at this scale, so the algorithm cannot establish the interdependence of image frames over a long time span, and the reference information of the video frames is lost.
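The memory estimate above can be checked directly: the factor 64 is the bit width of a double-precision value, so dividing by 8 converts to bytes (a quick arithmetic sketch, not part of the patent):

```python
# Decoded size of 100 frames of 1920x1080 RGB video stored as 64-bit values.
frames, width, height, channels = 100, 1920, 1080, 3
bits_per_value = 64                       # float64 per colour value
total_bits = frames * width * height * channels * bits_per_value
gib = total_bits / 8 / 2**30              # bits -> bytes -> GiB
print(f"{gib:.2f} GiB")                   # about 4.63 GiB, the ~4.6 GB in the text
```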
The difficulty of the problem is as follows: in the prior art, either a simple convolution is applied to each video frame, or a generative adversarial network is combined with an optical-flow algorithm to extract inter-frame dependency features, which are then used as reference information for the current frame. In super-resolution applications, however, the data volume of the video frames is so large that it is difficult to establish dependency relationships between image frames over a long time span.
When super-resolving a given frame of a video, the more original information is available as input, the closer the super-resolved picture is to the real scene. Because of how video is typically encoded, the tens or even hundreds of frames before and after a given frame are strongly correlated with it, yet the amount of data after decoding is very large. The significance of solving the problem is as follows: multi-level 3D convolution can process the data in batches, feeding it group by group into several shallow neural networks, and can rapidly reduce the data volume while extracting feature information. The computation cost of a generative adversarial network is determined by its channel count, width and depth; once the input data volume is reduced, the channel count and width shrink while the depth stays unchanged, so the GAN's computation cost drops sharply with the input size. This addresses the problems, intractable for current single-level neural networks, of excessively large input data and a computation load too great to handle.
Disclosure of Invention
The invention provides a video super-resolution method based on 3D convolution, which solves the problem that existing super-resolution algorithms cannot extract dependency features from video frames over a long time span, so that reference information between video frames is lost.
The technical scheme of the invention is as follows:
The video super-resolution method based on 3D convolution of the invention comprises the following steps: S1, video frame grouping: reading video frames from a video file and grouping them; S2, 3D convolution calculation: constructing a feature extraction algorithm model based on a 3D convolutional neural network; S3, constructing a super-resolution algorithm model based on a generative adversarial network (GAN).
Optionally, in the above method, in step S1, a general codec software tool is used to read the video file and decode the video frames into general array matrices, which are stored in memory in sequence; a certain frame is selected, and the m frames adjacent to it before and after are divided sequentially into n groups; the frames of each group are concatenated to serve as the input data of each level of 3D convolution.
Optionally, in the above method, in step S2, a multi-level, multi-input 3D convolutional neural network model is constructed. The video frame before super-resolution is input first, and a set of feature maps is obtained through a convolutional neural network; each level then takes a group of video frames and the feature map of the previous level as input and outputs a set of feature maps; after the multi-level output, the set of feature maps finally output by the 3D convolution algorithm model is obtained.
Optionally, in the above method, in step S3, an up-sampling algorithm model is constructed based on a generative adversarial network (GAN) to implement the super-resolution generation algorithm, which takes as input the set of feature maps finally output by the 3D convolution calculation and outputs the current super-resolved video frame.
Optionally, in the above method, the super-resolution algorithm model comprises: a CNN-based input network for the pre-super-resolution video frame, a 3D-convolution-based feature extraction network, and a GAN-based super-resolution enhancement network.
According to the technical scheme of the invention, the beneficial effects are as follows:
In the video super-resolution method based on 3D convolution of the invention, the dependency relationships between image frames are extracted with 3D convolution according to the correlation between video frames, and multi-level grouped input of image frames is combined with the generative capacity of a generative adversarial network (GAN). This avoids the deficiency of conventional methods, which cannot extract dependency features from video frames over a long time span and therefore lose reference information, and finally realizes a super-resolution function with long-time-span dependency feature extraction.
For a better understanding and explanation of the conception, working principle and inventive effect of the present invention, the present invention is described in detail below by way of specific examples with reference to the accompanying drawings, in which:
drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flow chart of a method of 3D convolution-based video super-resolution of the present invention;
FIG. 2 is a schematic diagram of the overall super-resolution algorithm model involved in the method of the present invention;
fig. 3 is a schematic diagram of a super-resolution algorithm model involved in the method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. These examples are illustrative only and do not limit the invention.
In the video super-resolution method based on 3D convolution of the invention, video frames are grouped as input data, the correlation between video frames is extracted with multi-level 3D convolution, and the generative capacity of a GAN is exploited, realizing video super-resolution with long-span inter-frame dependency feature extraction. Specifically, in a multi-level 3D convolution arrangement, groups of video frames serve as input data and the 3D convolutions successively extract features (i.e., the correlation between video frames), so that the relation between the pre-super-resolution frame and the other video frames is obtained. Based on the principle that adjacent video frames are more closely related, the multi-level grouped input reduces unnecessary feature extraction between frames and thereby the computation cost, and finally allows video frames with a long time span to be input and their dependency features to be extracted.
The working principle of the invention is to extract the correlation of video frames with 3D convolution and establish connections between them, thereby realizing dependency feature extraction over a long time span, and to generate a high-resolution image from the low-resolution feature map using the generative property of the GAN. In the method of the invention, features are extracted first and then used as input data of the GAN. A GAN can be designed to up-sample or to down-sample; since the resolution is to be increased rather than reduced, the GAN here is designed to up-sample, realizing the resolution enhancement function.
As shown in fig. 1, the method for video super-resolution based on 3D convolution of the present invention comprises the following steps:
S1, video frame grouping: video frames are read from the video file and grouped, and the data stream is preprocessed according to the data structure required by the algorithm model.
In this step, a general codec software tool is used to read the video file and decode the video frames into general array matrices, which are stored in memory in sequence; a certain frame is selected, and the m frames adjacent to it before and after are divided sequentially into n groups; the frames of each group are concatenated in a standard way to serve as the input data of each level of 3D convolution.
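As a rough illustration of step S1, the sketch below treats decoded frames as arrays (simulated here with NumPy) and splits the m neighbouring frames into n sequential groups; the function name, shapes and stacking convention are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def group_frames(frames, center, m, n):
    """Split the m frames surrounding frame `center` into n sequential
    groups, each stacked along a new depth axis as 3D-convolution input."""
    assert m % n == 0, "m must divide evenly into n groups"
    start = max(0, center - m // 2)
    window = frames[start:start + m]              # m neighbouring frames
    size = m // n
    return [np.stack(window[i * size:(i + 1) * size]) for i in range(n)]

# Toy stand-in for decoded video: 32 frames of 8x8 RGB as uint8 arrays.
video = [np.zeros((8, 8, 3), dtype=np.uint8) for _ in range(32)]
groups = group_frames(video, center=16, m=16, n=4)
print(len(groups), groups[0].shape)   # 4 groups, each of shape (4, 8, 8, 3)
```

Each group then becomes the input of one 3D-convolution level, as described in step S2.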
S2, 3D convolution calculation: a feature extraction algorithm model is constructed based on the 3D convolutional neural network.
A multi-level, multi-input 3D convolutional neural network model is constructed. The video frame before super-resolution is input first, and a set of feature maps is obtained through a convolutional neural network (CNN); each level then takes a group of video frames and the feature map of the previous level as input and outputs a set of feature maps; after the multi-level output, the set of feature maps finally output by the 3D convolution algorithm model (i.e., the feature extraction algorithm model) is obtained.
As shown in fig. 2: ① a chosen frame of the prepared video, before super-resolution (i.e., the "video frame before super-division" in fig. 2), is input into a general convolutional neural network (CNN), which extracts feature map C. ② The 1st-level 3D convolutional neural network (3D convolution 1) takes feature map C and the 1st group of video frames (video frame group 1) as input data and, after feature extraction, produces feature map 1. ③ The 2nd-level 3D convolutional neural network (3D convolution 2) takes feature map 1 and the 2nd group of video frames (video frame group 2) as input data and produces feature map 2. ④ And so on; the finally output feature map n becomes the input data of the GAN.
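The cascade of steps ①–④ can be sketched in PyTorch as follows; the layer widths, the average-pooling used to collapse the temporal axis, and the additive fusion of the previous level's feature map are assumptions made for illustration, not details fixed by the patent:

```python
import torch
import torch.nn as nn

class Level3D(nn.Module):
    """One cascade level: fuse a group of frames with the previous level's
    feature map via 3D convolution, then collapse the temporal axis."""
    def __init__(self, in_ch=3, feat_ch=16):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, feat_ch, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool3d((1, None, None))  # depth -> 1

    def forward(self, group, prev_feat):
        # group: (B, C, D, H, W); prev_feat: (B, feat_ch, H, W)
        x = self.conv(group)             # (B, feat_ch, D, H, W)
        x = self.pool(x).squeeze(2)      # (B, feat_ch, H, W)
        return x + prev_feat             # inject the previous level's features

cnn = nn.Conv2d(3, 16, kernel_size=3, padding=1)   # produces feature map C
levels = nn.ModuleList(Level3D() for _ in range(3))

frame = torch.randn(1, 3, 64, 64)                  # frame before super-resolution
feat = cnn(frame)                                  # feature map C
for level in levels:
    group = torch.randn(1, 3, 4, 64, 64)           # one video frame group (D=4)
    feat = level(group, feat)                      # feature map 1, 2, ..., n
print(feat.shape)                                  # torch.Size([1, 16, 64, 64])
```

The final `feat` plays the role of feature map n, which is handed to the GAN in step S3.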
FIG. 2 is a schematic diagram of the super-resolution algorithm model of the present invention. The algorithm model consists of three major parts: a CNN-based input network for the pre-super-resolution video frame, a 3D-convolution-based feature extraction network, and a GAN-based super-resolution enhancement network.
S3, GAN generation: a super-resolution algorithm model is constructed based on a generative adversarial network (GAN).
Based on the generative adversarial network (GAN), an up-sampling algorithm model (i.e., the super-resolution algorithm model) is constructed to implement the super-resolution generation algorithm, which takes as input the set of feature maps finally output by the 3D convolution calculation and outputs the current super-resolved video frame.
Fig. 3 is a schematic diagram of the super-resolution algorithm model of the present invention. Feature map n, obtained in step S2, describes the dependency relationship between the current video frame and the other video frames, together with the information carried by those frames. Feature map n is used as the input data of the resolution enhancement network (the GAN); the GAN raises the resolution and generates and outputs a high-resolution video frame, which is the final super-resolved output. This completes the algorithm model of 3D-convolution-based video super-resolution.
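A minimal sketch of the up-sampling generator of step S3 is given below, using sub-pixel (PixelShuffle) upscaling, one common choice for super-resolution generators; the patent does not fix the GAN architecture, so the layer choices here are assumptions, and only the generator (not the discriminator) is shown:

```python
import torch
import torch.nn as nn

class SRGenerator(nn.Module):
    """GAN generator: maps the final feature map n to a higher-resolution
    RGB frame via a sub-pixel convolution (PixelShuffle)."""
    def __init__(self, feat_ch=16, scale=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_ch, 3 * scale ** 2, kernel_size=3, padding=1),
            nn.PixelShuffle(scale),   # rearranges channels into H*scale x W*scale
            nn.Tanh(),                # pixel values in [-1, 1]
        )

    def forward(self, feat):
        return self.net(feat)

gen = SRGenerator(feat_ch=16, scale=2)
feat_n = torch.randn(1, 16, 64, 64)   # feature map n from step S2
sr_frame = gen(feat_n)
print(sr_frame.shape)                 # torch.Size([1, 3, 128, 128])
```

In adversarial training, a separate discriminator would score `sr_frame` against real high-resolution frames and drive the generator toward realistic detail.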
The method exploits the correlation between video frames and the feature extraction ability of 3D convolution to design a new video super-resolution algorithm model that supports dependency feature extraction over video frames with a long time span. In particular, a dependency-feature-extraction model based on 3D convolution realizes a multi-level, multi-input inter-frame dependency extraction algorithm, on top of which the video super-resolution method is built. The method supports inputting multiple groups of video frames in a multi-level fashion and, combined with the generative capacity of the GAN, finally achieves super-resolution with reference information drawn from video frames over a long time span.
The above description is of the best mode of carrying out the inventive concept and principles of operation. The above examples should not be construed as limiting the scope of the claims, but other embodiments and combinations of implementations according to the inventive concept are within the scope of the invention.

Claims (1)

1. A method for video super-resolution based on 3D convolution, comprising the following steps:
S1, video frame grouping: reading video frames from a video file and grouping them;
S2, 3D convolution calculation: constructing a feature extraction algorithm model based on a 3D convolutional neural network;
S3, constructing a super-resolution algorithm model based on a generative adversarial network (GAN);
wherein in step S1, a general codec software tool is used to read the video file and decode the video frames into general array matrices, which are stored in memory in sequence; a certain frame is selected, and the m frames adjacent to it before and after are divided sequentially into n groups; the frames of each group are concatenated to serve as the input data of each level of 3D convolution;
in step S2, a multi-level, multi-input 3D convolutional neural network model is constructed; the video frame before super-resolution is input first, and a set of feature maps is obtained through the convolutional neural network; each level takes a group of video frames and the feature map of the previous level as input and outputs a set of feature maps; after the multi-level output, the set of feature maps finally output by the 3D convolution algorithm model is obtained;
in step S3, an up-sampling algorithm model is constructed based on the generative adversarial network (GAN) to implement the super-resolution generation algorithm, which takes as input the set of feature maps finally output by the 3D convolution calculation and outputs the current super-resolved video frame;
the super-resolution algorithm model comprises: a CNN-based input network for the pre-super-resolution video frame, a 3D-convolution-based feature extraction network, and a GAN-based super-resolution enhancement network.
CN202211556262.7A 2022-12-06 2022-12-06 Video super-resolution method based on 3D convolution Active CN115797178B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211556262.7A CN115797178B (en) 2022-12-06 2022-12-06 Video super-resolution method based on 3D convolution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211556262.7A CN115797178B (en) 2022-12-06 2022-12-06 Video super-resolution method based on 3D convolution

Publications (2)

Publication Number Publication Date
CN115797178A CN115797178A (en) 2023-03-14
CN115797178B true CN115797178B (en) 2024-10-18

Family

ID=85417383

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211556262.7A Active CN115797178B (en) 2022-12-06 2022-12-06 Video super-resolution method based on 3D convolution

Country Status (1)

Country Link
CN (1) CN115797178B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116366861A (en) * 2023-03-30 2023-06-30 广东博华超高清创新中心有限公司 Video super-resolution method based on self-coding

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113409190A (en) * 2021-05-14 2021-09-17 广东工业大学 Video super-resolution method based on multi-frame grouping and feedback network
US11270124B1 (en) * 2020-11-16 2022-03-08 Branded Entertainment Network, Inc. Temporal bottleneck attention architecture for video action recognition

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110827200B (en) * 2019-11-04 2023-04-07 Oppo广东移动通信有限公司 Image super-resolution reconstruction method, image super-resolution reconstruction device and mobile terminal

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11270124B1 (en) * 2020-11-16 2022-03-08 Branded Entertainment Network, Inc. Temporal bottleneck attention architecture for video action recognition
CN113409190A (en) * 2021-05-14 2021-09-17 广东工业大学 Video super-resolution method based on multi-frame grouping and feedback network

Also Published As

Publication number Publication date
CN115797178A (en) 2023-03-14

Similar Documents

Publication Publication Date Title
CN114092330B (en) Light-weight multi-scale infrared image super-resolution reconstruction method
CN108986050A (en) A kind of image and video enhancement method based on multiple-limb convolutional neural networks
CN117173024B (en) Mine image super-resolution reconstruction system and method based on overall attention
CN110349087B (en) RGB-D image high-quality grid generation method based on adaptive convolution
CN111985281B (en) Image generation model generation method and device and image generation method and device
CN113379606B (en) Face super-resolution method based on pre-training generation model
CN117745541A (en) Image super-resolution reconstruction method based on lightweight mixed attention network
CN117952830B (en) A stereo image super-resolution reconstruction method based on iterative interactive guidance
CN118333898B (en) Image defogging method and system based on improved generation countermeasure network
CN115797178B (en) Video super-resolution method based on 3D convolution
CN116309067B (en) Light field image space super-resolution method
CN119048356A (en) Video resolution enhancement system and method based on pre-training video generation model
CN115713462A (en) Super-resolution model training method, image recognition method, device and equipment
CN115272082A (en) Model training, video quality improvement method, device and computer equipment
Gao et al. Multi-branch aware module with channel shuffle pixel-wise attention for lightweight image super-resolution
Lin et al. Deep and adaptive feature extraction attention network for single image super‐resolution
CN118657831A (en) Absolute pose regression method based on cascaded attention module
CN118195899A (en) A lightweight hybrid attention distillation network based image super-resolution model
Pang et al. Video super-resolution using a hierarchical recurrent multireceptive-field integration network
US20230186608A1 (en) Method, device, and computer program product for video processing
CN114663306B (en) Video bit depth enhancement method and device based on pyramid multi-level information fusion
Zhang et al. CVIformer: Cross-View Interactive Transformer for Efficient Stereoscopic Image Super-Resolution
Zhao et al. A GPU-Enabled Framework for Light Field Efficient Compression and Real-Time Rendering
Chen et al. A review of super resolution based on deep learning
Zhou et al. Real-world image super-resolution via spatio-temporal correlation network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant