CN115797178B - Video super-resolution method based on 3D convolution - Google Patents
Video super-resolution method based on 3D convolution
- Publication number: CN115797178B (application CN202211556262.7A)
- Authority
- CN
- China
- Prior art keywords
- video
- super
- convolution
- resolution
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Image Analysis (AREA)
Abstract
The invention provides a video super-resolution method based on 3D convolution, which comprises the following steps: S1, video frame grouping: reading video frames from a video file and grouping them; S2, 3D convolution calculation: constructing a feature extraction algorithm model based on a 3D convolutional neural network; S3, constructing a super-resolution algorithm model based on a generative adversarial network (GAN). The disclosed method solves the problem that existing super-resolution algorithms cannot extract dependency features from video frames spanning a long time range, which causes the reference information of those frames to be lost.
Description
Technical Field
The invention relates to the field of computer vision, in particular to a video super-resolution method based on 3D convolution.
Background
With the development of science and technology, low-resolution electronic displays are gradually being replaced by ultra-high-definition displays such as 2K/4K screens, and consumer demand for ultra-high-definition video sources keeps growing. However, old films have no ultra-high-definition source because early shooting equipment lagged behind, which seriously affects the viewing experience. Video super-resolution algorithms based on deep neural networks are therefore widely applied to upscaling standard-definition video to ultra-high definition, and great progress has been made. In existing methods, the video is decoded and then fed into a neural network. For a 1920 x 1080 video, 100 frames already span a long time range, and the memory required after decoding is about 100 x 1920 x 1080 x 3 values at 64 bits each, roughly 4.6 GB. A single-level deep neural network model has difficulty processing data at this scale, so the algorithm cannot establish the interdependence of image frames over a long time span and the reference information of the video frames is lost.
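The memory figure above can be checked directly; in the original expression the factor 64 is the bit width of a double-precision value, i.e. 8 bytes per element (a back-of-the-envelope sketch, not part of the patent):

```python
# Memory needed to hold 100 decoded 1920x1080 RGB frames as 64-bit floats.
frames, width, height, channels = 100, 1920, 1080, 3
bytes_per_value = 64 // 8  # 64-bit values -> 8 bytes each

total_bytes = frames * width * height * channels * bytes_per_value
total_gib = total_bytes / 2**30

print(round(total_gib, 2))  # close to the ~4.6 GB cited above
```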
The difficulty of the problem is as follows: in the prior art, either a simple convolution is computed over the video frames, or an adversarial generative network is combined with an optical flow algorithm to extract dependency features between image frames, and the extracted features are used as reference information for the current frame. In super-resolution applications, however, the data volume of the video frames is so large that it is difficult to establish dependencies between image frames over a long time span.
When super-resolving a given frame, the more original information is available as input, the closer the super-resolved picture is to the real scene. Owing to the coding characteristics of video, the tens or even hundreds of frames before and after the current frame are strongly correlated with it, yet the amount of data after decoding is very large. The significance of solving these problems is: multi-level 3D convolution can process the data in batches and feed it into the neural network batch by batch, in particular into several shallow networks, and a multi-level 3D convolutional network rapidly reduces the data volume while extracting feature information. The computational cost of the generative adversarial network is determined by the number of channels, the width, and the depth; once the data volume is reduced, the channel count and width shrink while the depth stays unchanged, so the cost of the adversarial network falls sharply with the reduced input. This overcomes the problems, hard to solve with a single-level neural network, of excessively large input data and prohibitive computation.
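The cost argument can be made concrete with the standard multiply-accumulate count for a convolution layer. All concrete sizes below are illustrative assumptions, not figures from the patent; they only show how shrinking the spatial size and channel count before the GAN cuts its per-layer cost:

```python
# Cost of one conv layer scales with H * W * C_in * C_out * k * k (multiply-accumulates).
# Illustrative, assumed numbers: compare feeding the GAN raw decoded frames
# versus the reduced feature maps produced by the 3D-convolution stages.
def conv_macs(h, w, c_in, c_out, k=3):
    return h * w * c_in * c_out * k * k

raw = conv_macs(1080, 1920, 100 * 3, 64)   # 100 stacked RGB frames at full size
reduced = conv_macs(270, 480, 64, 64)      # after assumed spatial/channel reduction

print(f"{raw / reduced:.0f}x fewer MACs")  # the reduced input is far cheaper
```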
Disclosure of Invention
The invention provides a video super-resolution method based on 3D convolution, which solves the problem that existing super-resolution algorithms cannot extract dependency features from video frames over a long time span, causing the reference information of the video frames to be lost.
The technical scheme of the invention is as follows:
The video super-resolution method based on 3D convolution of the invention comprises the following steps: S1, video frame grouping: reading video frames from a video file and grouping them; S2, 3D convolution calculation: constructing a feature extraction algorithm model based on a 3D convolutional neural network; S3, constructing a super-resolution algorithm model based on a generative adversarial network (GAN).
Optionally, in the above video super-resolution method based on 3D convolution, in step S1, a general-purpose codec tool is used to read the video file, decode the video frames into generic array matrices, and store them in memory in order; a given frame is selected, and the m frames adjacent to it before and after are divided in sequence into n groups; the frames of each group are concatenated to serve as the input data of each level of 3D convolution.
Optionally, in the above video super-resolution method based on 3D convolution, in step S2, a multi-level, multi-input 3D convolutional neural network model is constructed. The pre-super-resolution video frame is input first, and a set of feature maps is obtained through the convolutional neural network; each level then takes a group of video frames and the feature map of the previous level as input and outputs a set of feature maps; after the multi-level pass, the set of feature maps finally output by the 3D convolution model is obtained.
Optionally, in the above video super-resolution method based on 3D convolution, in step S3, an up-sampling algorithm model is constructed based on a generative adversarial network (GAN) to implement the super-resolution generation algorithm, which takes as input the set of feature maps finally output by the 3D convolution calculation and outputs the current super-resolved video frame.
Optionally, in the video super-resolution method based on 3D convolution, the super-resolution algorithm model comprises: a CNN-based input network for the pre-super-resolution video frame, a 3D-convolution-based feature extraction network, and a GAN-based super-resolution enhancement network.
The beneficial effects of the technical scheme of the invention are as follows:
According to the video super-resolution method based on 3D convolution, the dependency relations between image frames are extracted with 3D convolution according to the correlation between video frames, and multi-level grouped frame input is combined with the generative capacity of a generative adversarial network (GAN). This avoids the loss of frame reference information caused by traditional methods, which cannot extract dependency features from video frames over a long time span, and finally realizes a super-resolution function with long-time-span dependency feature extraction.
For a better understanding and explanation of the conception, working principle and inventive effect of the present invention, the present invention is described in detail below by way of specific examples with reference to the accompanying drawings, in which:
drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flow chart of a method of 3D convolution-based video super-resolution of the present invention;
FIG. 2 is a schematic diagram of the overall algorithm model involved in the method of the present invention;
fig. 3 is a schematic diagram of a super-resolution algorithm model involved in the method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples, in order to make the objects, technical methods and advantages of the present invention more apparent. These examples are illustrative only and are not limiting of the invention.
According to the video super-resolution method based on 3D convolution of the invention, video frames are grouped as input data, the correlation between frames is extracted by multi-level 3D convolution, and the generative capacity of a GAN is used to achieve video super-resolution with long-time-span dependency feature extraction. Specifically, a multi-level 3D convolution scheme takes groups of video frames as input and applies 3D convolution in sequence for feature extraction (i.e., extracting the correlation between frames), obtaining the relation between the pre-super-resolution frame and the other frames. Following the principle that adjacent frames are more closely related, the multi-level grouped input reduces unnecessary feature extraction between frames and cuts the computational cost, finally making it possible to input video frames over a long time span and extract their dependency features.
The working principle of the invention is to extract the correlation of video frames with 3D convolution and establish connections between frames, realizing dependency feature extraction over a long time span; the generative property of the adversarial network (GAN) is then used to generate a high-resolution image from the low-resolution feature map. In the method, features are extracted first and then used as input to the adversarial network. A GAN can be designed for either up-sampling or down-sampling; since the resolution is to be increased rather than reduced, the network here is designed for up-sampling to realize the resolution enhancement function.
As shown in fig. 1, the method for video super-resolution based on 3D convolution of the present invention comprises the following steps:
S1, video frame grouping: video frames are read from the video file and grouped, and the data stream is preprocessed according to the data structure required by the algorithm model.
In this step, a general-purpose codec tool is used to read the video file, decode the video frames into generic array matrices, and store them in memory in order; a given frame is selected, and the m frames adjacent to it before and after are divided in sequence into n groups; the frames of each group are concatenated in a standard way to serve as the input data of each level of 3D convolution.
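The grouping above can be sketched in plain Python. The helper below is hypothetical (the patent fixes neither m nor n, and the integer "frames" stand in for decoded H x W x 3 arrays); it only illustrates taking the m frames around a selected frame and splitting them in order into n groups:

```python
def group_adjacent_frames(frames, target_idx, m, n):
    """Take the m frames surrounding frames[target_idx] and split them
    in order into n equally sized groups (step S1 of the method)."""
    if m % n != 0:
        raise ValueError("m must be divisible by n for equal groups")
    half = m // 2
    lo = max(0, target_idx - half)
    window = frames[lo:lo + m]          # m adjacent frames around the target
    size = m // n
    return [window[i * size:(i + 1) * size] for i in range(n)]

# Toy example: 32 decoded "frames", take the 8 frames around frame 16
# and split them into 4 groups of 2, ready for the per-level 3D convolutions.
frames = list(range(32))
groups = group_adjacent_frames(frames, target_idx=16, m=8, n=4)
print(groups)  # [[12, 13], [14, 15], [16, 17], [18, 19]]
```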
S2, 3D convolution calculation: a feature extraction algorithm model is constructed based on the 3D convolutional neural network.
A multi-level, multi-input 3D convolutional neural network model is constructed. The pre-super-resolution video frame is input first, and a set of feature maps is obtained through a convolutional neural network (CNN); each level then takes a group of video frames and the feature map of the previous level as input and outputs a set of feature maps; after the multi-level pass, the set of feature maps finally output by the 3D convolution model (i.e., the feature extraction model) is obtained.
As shown in fig. 2: ① a given pre-super-resolution frame of the prepared video (the "video frame before super-resolution" in fig. 2) is input into a general convolutional neural network (CNN), which extracts features and outputs a feature map C. ② The 1st-level 3D convolutional neural network (3D convolution 1) takes the feature map C and the 1st group of video frames (video frame group 1) as input and, after feature extraction, produces feature map 1. ③ The 2nd-level 3D convolutional neural network (3D convolution 2) takes feature map 1 and the 2nd group of video frames (video frame group 2) as input and produces feature map 2. ④ And so on; the final output, feature map n, becomes the input data of the GAN.
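The cascade ①-④ can be expressed as a simple fold over the frame groups. `cnn` and `conv3d_level` below are stand-ins for the real networks; this is a structural sketch of the data flow only, not an implementation of the convolutions:

```python
def cnn(frame):
    # Stand-in for the CNN applied to the pre-super-resolution frame (feature map C).
    return {"source": frame, "level": 0}

def conv3d_level(feature_map, frame_group, level):
    # Stand-in for one 3D-convolution stage: consumes the previous level's
    # feature map plus one group of frames, emits the next feature map.
    return {"source": feature_map["source"],
            "level": level,
            "frames_seen": feature_map.get("frames_seen", 0) + len(frame_group)}

def extract_features(pre_sr_frame, frame_groups):
    fmap = cnn(pre_sr_frame)                      # step ①: feature map C
    for level, group in enumerate(frame_groups, start=1):
        fmap = conv3d_level(fmap, group, level)   # steps ②..④: feature maps 1..n
    return fmap                                   # feature map n -> GAN input

fmap_n = extract_features("frame_t", [[1, 2], [3, 4], [5, 6]])
print(fmap_n)  # {'source': 'frame_t', 'level': 3, 'frames_seen': 6}
```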
FIG. 2 is a schematic diagram of the algorithm model of the present invention. The model consists of three major parts: a CNN-based input network for the pre-super-resolution video frame, a 3D-convolution-based feature extraction network, and a GAN-based super-resolution enhancement network.
S3, GAN construction: a super-resolution algorithm model is constructed based on a generative adversarial network (GAN).
Based on the GAN, an up-sampling algorithm model (i.e., the super-resolution algorithm model) is constructed to implement the super-resolution generation algorithm, which takes as input the set of feature maps finally output by the 3D convolution calculation and outputs the current super-resolved video frame.
Fig. 3 is a schematic diagram of the super-resolution algorithm model of the invention. Step S2 yields feature map n, which describes the dependency relations between the current video frame and the other frames, as well as the information those frames carry. Feature map n is used as the input of the resolution enhancement network (the GAN); the adversarial network raises the resolution and generates and outputs a high-resolution video frame, which is the final super-resolved output. This completes the algorithm model for 3D-convolution-based video super-resolution.
The method uses the correlation between video frames together with the feature extraction capability of 3D convolution to design a new video super-resolution algorithm model that supports dependency feature extraction across video frames with a long time span. In particular, a 3D-convolution-based dependency feature extraction model implements a multi-level, multi-input frame dependency extraction algorithm to realize the video super-resolution method. The method supports inputting multiple groups of video frames in a multi-level manner and, combined with the generative capacity of the GAN, finally achieves a super-resolution function that draws reference information from video frames over a long time span.
The above description is of the best mode of carrying out the inventive concept and principles of operation. The above examples should not be construed as limiting the scope of the claims, but other embodiments and combinations of implementations according to the inventive concept are within the scope of the invention.
Claims (1)
1. A method for video super-resolution based on 3D convolution, comprising the steps of:
S1, video frame grouping: reading video frames from a video file and grouping them;
S2, 3D convolution calculation: constructing a feature extraction algorithm model based on the 3D convolutional neural network;
S3, constructing a super-resolution algorithm model based on a generative adversarial network (GAN);
in step S1, a general-purpose codec tool is used to read the video file, decode the video frames into generic array matrices, and store them in memory in order; a given frame is selected, and the m frames adjacent to it before and after are divided in sequence into n groups; the frames of each group are concatenated to serve as the input data of each level of 3D convolution;
in step S2, a multi-level, multi-input 3D convolutional neural network model is constructed; the pre-super-resolution video frame is input first, and a set of feature maps is obtained through the convolutional neural network; each level takes a group of video frames and the feature map of the previous level as input and outputs a set of feature maps; after the multi-level pass, the set of feature maps finally output by the 3D convolution model is obtained;
in step S3, an up-sampling algorithm model is constructed based on the generative adversarial network (GAN) to implement the super-resolution generation algorithm, which takes as input the set of feature maps finally output by the 3D convolution calculation and outputs the current super-resolved video frame;
the super-resolution algorithm model comprises: a CNN-based input network for the pre-super-resolution video frame, a 3D-convolution-based feature extraction network, and a GAN-based super-resolution enhancement network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211556262.7A CN115797178B (en) | 2022-12-06 | 2022-12-06 | Video super-resolution method based on 3D convolution |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211556262.7A CN115797178B (en) | 2022-12-06 | 2022-12-06 | Video super-resolution method based on 3D convolution |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115797178A CN115797178A (en) | 2023-03-14 |
CN115797178B true CN115797178B (en) | 2024-10-18 |
Family
ID=85417383
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211556262.7A Active CN115797178B (en) | 2022-12-06 | 2022-12-06 | Video super-resolution method based on 3D convolution |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115797178B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116366861A (en) * | 2023-03-30 | 2023-06-30 | 广东博华超高清创新中心有限公司 | Video super-resolution method based on self-coding |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113409190A (en) * | 2021-05-14 | 2021-09-17 | 广东工业大学 | Video super-resolution method based on multi-frame grouping and feedback network |
US11270124B1 (en) * | 2020-11-16 | 2022-03-08 | Branded Entertainment Network, Inc. | Temporal bottleneck attention architecture for video action recognition |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110827200B (en) * | 2019-11-04 | 2023-04-07 | Oppo广东移动通信有限公司 | Image super-resolution reconstruction method, image super-resolution reconstruction device and mobile terminal |
- 2022-12-06: application CN202211556262.7A filed in China; granted as CN115797178B (status: Active)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11270124B1 (en) * | 2020-11-16 | 2022-03-08 | Branded Entertainment Network, Inc. | Temporal bottleneck attention architecture for video action recognition |
CN113409190A (en) * | 2021-05-14 | 2021-09-17 | 广东工业大学 | Video super-resolution method based on multi-frame grouping and feedback network |
Also Published As
Publication number | Publication date |
---|---|
CN115797178A (en) | 2023-03-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||