
CN115797178B - Video super-resolution method based on 3D convolution - Google Patents


Info

Publication number
CN115797178B
CN115797178B · CN202211556262.7A
Authority
CN
China
Prior art keywords
video
super
convolution
resolution
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211556262.7A
Other languages
Chinese (zh)
Other versions
CN115797178A (en)
Inventor
魏文应
张伟民
安欣赏
肖铁军
张世雄
龙仕强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Bohua Ultra Hd Innovation Center Co ltd
Original Assignee
Guangdong Bohua Ultra Hd Innovation Center Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Bohua Ultra Hd Innovation Center Co ltd filed Critical Guangdong Bohua Ultra Hd Innovation Center Co ltd
Priority to CN202211556262.7A priority Critical patent/CN115797178B/en
Publication of CN115797178A publication Critical patent/CN115797178A/en
Application granted granted Critical
Publication of CN115797178B publication Critical patent/CN115797178B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a video super-resolution method based on 3D convolution, comprising the following steps: S1, video frame grouping: reading video frames from a video file and grouping them; S2, 3D convolution calculation: constructing a feature extraction algorithm model based on a 3D convolutional neural network; S3, constructing a super-resolution algorithm model based on a generative adversarial network (GAN). The disclosed method solves the problem that existing super-resolution algorithms cannot extract dependency features from video frames over a long time span, so that reference information between video frames is lost.

Description

Video super-resolution method based on 3D convolution
Technical Field
The invention relates to the field of computer vision, in particular to a video super-resolution method based on 3D convolution.
Background
With the development of science and technology, low-resolution electronic displays are gradually being replaced by ultra-high-definition displays such as 2K/4K screens, and consumer demand for ultra-high-definition video sources keeps growing. However, old films have no ultra-high-definition source because of the limitations of early shooting equipment, which seriously affects the viewing experience. Video super-resolution algorithms based on deep neural networks are therefore widely applied to upscaling standard-definition video to ultra-high definition, and have made great progress. In existing methods, the video is decoded and then fed into a neural network. For a 1920×1080 video, a span of 100 frames occupies, after decoding to 64-bit values, about 100 × 1920 × 1080 × 3 × 64 bits ≈ 4.6 GiB of memory. Existing single-level deep neural network models have difficulty processing data at this scale, so the algorithm cannot establish the interdependence of image frames over a long time span, and the reference information of the video frames is lost.
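The memory estimate above can be checked directly: the factor 64 is the bit width of a double-precision value, so dividing by 8 converts to bytes (a quick arithmetic sketch, not part of the patent):

```python
# Decoded size of 100 frames of 1920x1080 RGB video stored as 64-bit values.
frames, width, height, channels = 100, 1920, 1080, 3
bits_per_value = 64                       # float64 per colour value
total_bits = frames * width * height * channels * bits_per_value
gib = total_bits / 8 / 2**30              # bits -> bytes -> GiB
print(f"{gib:.2f} GiB")                   # about 4.63 GiB, the ~4.6 GB in the text
```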
The difficulty of the problem is as follows: in the prior art, either a simple convolution is applied to each video frame, or a generative adversarial network is combined with an optical-flow algorithm to extract inter-frame dependency features, which are then used as reference information for the current frame. In super-resolution applications, however, the data volume of the video frames is so large that it is difficult to establish dependency relationships between image frames over a long time span.
When super-resolving a given frame of a video, the more original information is available as input, the closer the super-resolved picture is to the real scene. Because of how video is typically encoded, the tens or even hundreds of frames before and after a given frame are strongly correlated with it, yet the amount of data after decoding is very large. The significance of solving the problem is as follows: multi-level 3D convolution can process the data in batches, feeding it group by group into several shallow neural networks, and can rapidly reduce the data volume while extracting feature information. The computation cost of a generative adversarial network is determined by its channel count, width and depth; once the input data volume is reduced, the channel count and width shrink while the depth stays unchanged, so the GAN's computation cost drops sharply with the input size. This addresses the problems, intractable for current single-level neural networks, of excessively large input data and a computation load too great to handle.
Disclosure of Invention
The invention provides a video super-resolution method based on 3D convolution, which solves the problem that existing super-resolution algorithms cannot extract dependency features from video frames over a long time span, so that reference information between video frames is lost.
The technical scheme of the invention is as follows:
The video super-resolution method based on 3D convolution of the invention comprises the following steps: S1, video frame grouping: reading video frames from a video file and grouping them; S2, 3D convolution calculation: constructing a feature extraction algorithm model based on a 3D convolutional neural network; S3, constructing a super-resolution algorithm model based on a generative adversarial network (GAN).
Optionally, in the above method, in step S1, a general codec software tool is used to read the video file and decode the video frames into general array matrices, which are stored in memory in sequence; a certain frame is selected, and the m frames adjacent to it before and after are divided sequentially into n groups; the frames of each group are concatenated to serve as the input data of each level of 3D convolution.
Optionally, in the above method, in step S2, a multi-level, multi-input 3D convolutional neural network model is constructed. The video frame before super-resolution is input first, and a set of feature maps is obtained through a convolutional neural network; each level then takes a group of video frames and the feature map of the previous level as input and outputs a set of feature maps; after the multi-level output, the set of feature maps finally output by the 3D convolution algorithm model is obtained.
Optionally, in the above method, in step S3, an up-sampling algorithm model is constructed based on a generative adversarial network (GAN) to implement the super-resolution generation algorithm, which takes as input the set of feature maps finally output by the 3D convolution calculation and outputs the current super-resolved video frame.
Optionally, in the above method, the super-resolution algorithm model comprises: a CNN-based input network for the pre-super-resolution video frame, a 3D-convolution-based feature extraction network, and a GAN-based super-resolution enhancement network.
According to the technical scheme of the invention, the beneficial effects are as follows:
In the video super-resolution method based on 3D convolution of the invention, the dependency relationships between image frames are extracted with 3D convolution according to the correlation between video frames, and multi-level grouped input of image frames is combined with the generative capacity of a generative adversarial network (GAN). This avoids the deficiency of conventional methods, which cannot extract dependency features from video frames over a long time span and therefore lose reference information, and finally realizes a super-resolution function with long-time-span dependency feature extraction.
For a better understanding and explanation of the conception, working principle and inventive effect of the present invention, the present invention is described in detail below by way of specific examples with reference to the accompanying drawings, in which:
drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flow chart of a method of 3D convolution-based video super-resolution of the present invention;
FIG. 2 is a schematic diagram of the overall super-resolution algorithm model involved in the method of the present invention;
fig. 3 is a schematic diagram of a super-resolution algorithm model involved in the method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. These examples are illustrative only and do not limit the invention.
In the video super-resolution method based on 3D convolution of the invention, video frames are grouped as input data, the correlation between video frames is extracted with multi-level 3D convolution, and the generative capacity of a GAN is exploited, realizing video super-resolution with long-span inter-frame dependency feature extraction. Specifically, in a multi-level 3D convolution arrangement, groups of video frames serve as input data and the 3D convolutions successively extract features (i.e., the correlation between video frames), so that the relation between the pre-super-resolution frame and the other video frames is obtained. Based on the principle that adjacent video frames are more closely related, the multi-level grouped input reduces unnecessary feature extraction between frames and thereby the computation cost, and finally allows video frames with a long time span to be input and their dependency features to be extracted.
The working principle of the invention is to extract the correlation of video frames with 3D convolution and establish connections between them, thereby realizing dependency feature extraction over a long time span, and to generate a high-resolution image from the low-resolution feature map using the generative property of the GAN. In the method of the invention, features are extracted first and then used as input data of the GAN. A GAN can be designed to up-sample or to down-sample; since the resolution is to be increased rather than reduced, the GAN here is designed to up-sample, realizing the resolution enhancement function.
As shown in fig. 1, the method for video super-resolution based on 3D convolution of the present invention comprises the following steps:
S1, video frame grouping: video frames are read from the video file and grouped, and the data stream is preprocessed according to the data structure required by the algorithm model.
In this step, a general codec software tool is used to read the video file and decode the video frames into general array matrices, which are stored in memory in sequence; a certain frame is selected, and the m frames adjacent to it before and after are divided sequentially into n groups; the frames of each group are concatenated in a standard way to serve as the input data of each level of 3D convolution.
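As a rough illustration of step S1, the sketch below treats decoded frames as arrays (simulated here with NumPy) and splits the m neighbouring frames into n sequential groups; the function name, shapes and stacking convention are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def group_frames(frames, center, m, n):
    """Split the m frames surrounding frame `center` into n sequential
    groups, each stacked along a new depth axis as 3D-convolution input."""
    assert m % n == 0, "m must divide evenly into n groups"
    start = max(0, center - m // 2)
    window = frames[start:start + m]              # m neighbouring frames
    size = m // n
    return [np.stack(window[i * size:(i + 1) * size]) for i in range(n)]

# Toy stand-in for decoded video: 32 frames of 8x8 RGB as uint8 arrays.
video = [np.zeros((8, 8, 3), dtype=np.uint8) for _ in range(32)]
groups = group_frames(video, center=16, m=16, n=4)
print(len(groups), groups[0].shape)   # 4 groups, each of shape (4, 8, 8, 3)
```

Each group then becomes the input of one 3D-convolution level, as described in step S2.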
S2, 3D convolution calculation: a feature extraction algorithm model is constructed based on the 3D convolutional neural network.
A multi-level, multi-input 3D convolutional neural network model is constructed. The video frame before super-resolution is input first, and a set of feature maps is obtained through a convolutional neural network (CNN); each level then takes a group of video frames and the feature map of the previous level as input and outputs a set of feature maps; after the multi-level output, the set of feature maps finally output by the 3D convolution algorithm model (i.e., the feature extraction algorithm model) is obtained.
As shown in fig. 2: ① a chosen frame of the prepared video, before super-resolution (i.e., the "video frame before super-division" in fig. 2), is input into a general convolutional neural network (CNN), which extracts feature map C. ② The 1st-level 3D convolutional neural network (3D convolution 1) takes feature map C and the 1st group of video frames (video frame group 1) as input data and, after feature extraction, produces feature map 1. ③ The 2nd-level 3D convolutional neural network (3D convolution 2) takes feature map 1 and the 2nd group of video frames (video frame group 2) as input data and produces feature map 2. ④ And so on; the finally output feature map n becomes the input data of the GAN.
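The cascade of steps ①–④ can be sketched in PyTorch as follows; the layer widths, the average-pooling used to collapse the temporal axis, and the additive fusion of the previous level's feature map are assumptions made for illustration, not details fixed by the patent:

```python
import torch
import torch.nn as nn

class Level3D(nn.Module):
    """One cascade level: fuse a group of frames with the previous level's
    feature map via 3D convolution, then collapse the temporal axis."""
    def __init__(self, in_ch=3, feat_ch=16):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, feat_ch, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool3d((1, None, None))  # depth -> 1

    def forward(self, group, prev_feat):
        # group: (B, C, D, H, W); prev_feat: (B, feat_ch, H, W)
        x = self.conv(group)             # (B, feat_ch, D, H, W)
        x = self.pool(x).squeeze(2)      # (B, feat_ch, H, W)
        return x + prev_feat             # inject the previous level's features

cnn = nn.Conv2d(3, 16, kernel_size=3, padding=1)   # produces feature map C
levels = nn.ModuleList(Level3D() for _ in range(3))

frame = torch.randn(1, 3, 64, 64)                  # frame before super-resolution
feat = cnn(frame)                                  # feature map C
for level in levels:
    group = torch.randn(1, 3, 4, 64, 64)           # one video frame group (D=4)
    feat = level(group, feat)                      # feature map 1, 2, ..., n
print(feat.shape)                                  # torch.Size([1, 16, 64, 64])
```

The final `feat` plays the role of feature map n, which is handed to the GAN in step S3.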
FIG. 2 is a schematic diagram of the super-resolution algorithm model of the present invention. The algorithm model consists of three major parts: a CNN-based input network for the pre-super-resolution video frame, a 3D-convolution-based feature extraction network, and a GAN-based super-resolution enhancement network.
S3, GAN generation: a super-resolution algorithm model is constructed based on a generative adversarial network (GAN).
Based on the generative adversarial network (GAN), an up-sampling algorithm model (i.e., the super-resolution algorithm model) is constructed to implement the super-resolution generation algorithm, which takes as input the set of feature maps finally output by the 3D convolution calculation and outputs the current super-resolved video frame.
Fig. 3 is a schematic diagram of the super-resolution algorithm model of the present invention. Feature map n, obtained in step S2, describes the dependency relationship between the current video frame and the other video frames, together with the information carried by those frames. Feature map n is used as the input data of the resolution enhancement network (the GAN); the GAN raises the resolution and generates and outputs a high-resolution video frame, which is the final super-resolved output. This completes the algorithm model of 3D-convolution-based video super-resolution.
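A minimal sketch of the up-sampling generator of step S3 is given below, using sub-pixel (PixelShuffle) upscaling, one common choice for super-resolution generators; the patent does not fix the GAN architecture, so the layer choices here are assumptions, and only the generator (not the discriminator) is shown:

```python
import torch
import torch.nn as nn

class SRGenerator(nn.Module):
    """GAN generator: maps the final feature map n to a higher-resolution
    RGB frame via a sub-pixel convolution (PixelShuffle)."""
    def __init__(self, feat_ch=16, scale=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_ch, 3 * scale ** 2, kernel_size=3, padding=1),
            nn.PixelShuffle(scale),   # rearranges channels into H*scale x W*scale
            nn.Tanh(),                # pixel values in [-1, 1]
        )

    def forward(self, feat):
        return self.net(feat)

gen = SRGenerator(feat_ch=16, scale=2)
feat_n = torch.randn(1, 16, 64, 64)   # feature map n from step S2
sr_frame = gen(feat_n)
print(sr_frame.shape)                 # torch.Size([1, 3, 128, 128])
```

In adversarial training, a separate discriminator would score `sr_frame` against real high-resolution frames and drive the generator toward realistic detail.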
The method exploits the correlation between video frames and the feature extraction ability of 3D convolution to design a new video super-resolution algorithm model that supports dependency feature extraction over video frames with a long time span. In particular, a dependency-feature-extraction model based on 3D convolution realizes a multi-level, multi-input inter-frame dependency extraction algorithm, on top of which the video super-resolution method is built. The method supports inputting multiple groups of video frames in a multi-level fashion and, combined with the generative capacity of the GAN, finally achieves super-resolution with reference information drawn from video frames over a long time span.
The above description is of the best mode of carrying out the inventive concept and principles of operation. The above examples should not be construed as limiting the scope of the claims, but other embodiments and combinations of implementations according to the inventive concept are within the scope of the invention.

Claims (1)

1. A method for video super-resolution based on 3D convolution, comprising the following steps:
S1, video frame grouping: reading video frames from a video file and grouping them;
S2, 3D convolution calculation: constructing a feature extraction algorithm model based on a 3D convolutional neural network;
S3, constructing a super-resolution algorithm model based on a generative adversarial network (GAN);
wherein in step S1, a general codec software tool is used to read the video file and decode the video frames into general array matrices, which are stored in memory in sequence; a certain frame is selected, and the m frames adjacent to it before and after are divided sequentially into n groups; the frames of each group are concatenated to serve as the input data of each level of 3D convolution;
in step S2, a multi-level, multi-input 3D convolutional neural network model is constructed; the video frame before super-resolution is input first, and a set of feature maps is obtained through the convolutional neural network; each level takes a group of video frames and the feature map of the previous level as input and outputs a set of feature maps; after the multi-level output, the set of feature maps finally output by the 3D convolution algorithm model is obtained;
in step S3, an up-sampling algorithm model is constructed based on the generative adversarial network (GAN) to implement the super-resolution generation algorithm, which takes as input the set of feature maps finally output by the 3D convolution calculation and outputs the current super-resolved video frame;
the super-resolution algorithm model comprises: a CNN-based input network for the pre-super-resolution video frame, a 3D-convolution-based feature extraction network, and a GAN-based super-resolution enhancement network.
CN202211556262.7A 2022-12-06 2022-12-06 Video super-resolution method based on 3D convolution Active CN115797178B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211556262.7A CN115797178B (en) 2022-12-06 2022-12-06 Video super-resolution method based on 3D convolution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211556262.7A CN115797178B (en) 2022-12-06 2022-12-06 Video super-resolution method based on 3D convolution

Publications (2)

Publication Number Publication Date
CN115797178A CN115797178A (en) 2023-03-14
CN115797178B true CN115797178B (en) 2024-10-18

Family

ID=85417383

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211556262.7A Active CN115797178B (en) 2022-12-06 2022-12-06 Video super-resolution method based on 3D convolution

Country Status (1)

Country Link
CN (1) CN115797178B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116366861A (en) * 2023-03-30 2023-06-30 广东博华超高清创新中心有限公司 Video super-resolution method based on self-coding

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113409190A (en) * 2021-05-14 2021-09-17 广东工业大学 Video super-resolution method based on multi-frame grouping and feedback network
US11270124B1 (en) * 2020-11-16 2022-03-08 Branded Entertainment Network, Inc. Temporal bottleneck attention architecture for video action recognition

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110827200B (en) * 2019-11-04 2023-04-07 Oppo广东移动通信有限公司 Image super-resolution reconstruction method, image super-resolution reconstruction device and mobile terminal

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11270124B1 (en) * 2020-11-16 2022-03-08 Branded Entertainment Network, Inc. Temporal bottleneck attention architecture for video action recognition
CN113409190A (en) * 2021-05-14 2021-09-17 广东工业大学 Video super-resolution method based on multi-frame grouping and feedback network

Also Published As

Publication number Publication date
CN115797178A (en) 2023-03-14

Similar Documents

Publication Publication Date Title
CN114092330B (en) Light-weight multi-scale infrared image super-resolution reconstruction method
CN108986050A (en) A kind of image and video enhancement method based on multiple-limb convolutional neural networks
CN117173024B (en) Mine image super-resolution reconstruction system and method based on overall attention
CN110349087B (en) RGB-D image high-quality grid generation method based on adaptive convolution
CN111985281B (en) Image generation model generation method and device and image generation method and device
CN113379606B (en) Face super-resolution method based on pre-training generation model
CN117745541A (en) Image super-resolution reconstruction method based on lightweight mixed attention network
CN117952830B (en) A stereo image super-resolution reconstruction method based on iterative interactive guidance
CN118333898B (en) Image defogging method and system based on improved generation countermeasure network
CN115797178B (en) Video super-resolution method based on 3D convolution
CN116309067B (en) Light field image space super-resolution method
CN119048356A (en) Video resolution enhancement system and method based on pre-training video generation model
CN115713462A (en) Super-resolution model training method, image recognition method, device and equipment
CN115272082A (en) Model training, video quality improvement method, device and computer equipment
Gao et al. Multi-branch aware module with channel shuffle pixel-wise attention for lightweight image super-resolution
Lin et al. Deep and adaptive feature extraction attention network for single image super‐resolution
CN118657831A (en) Absolute pose regression method based on cascaded attention module
CN118195899A (en) A lightweight hybrid attention distillation network based image super-resolution model
Pang et al. Video super-resolution using a hierarchical recurrent multireceptive-field integration network
US20230186608A1 (en) Method, device, and computer program product for video processing
CN114663306B (en) Video bit depth enhancement method and device based on pyramid multi-level information fusion
Zhang et al. CVIformer: Cross-View Interactive Transformer for Efficient Stereoscopic Image Super-Resolution
Zhao et al. A GPU-Enabled Framework for Light Field Efficient Compression and Real-Time Rendering
Chen et al. A review of super resolution based on deep learning
Zhou et al. Real-world image super-resolution via spatio-temporal correlation network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant