CN114025168A - Video image processing method, processing device, electronic device and storage medium - Google Patents
Video image processing method, processing device, electronic device and storage medium
- Publication number
- CN114025168A (application CN202111166007.7A)
- Authority
- CN
- China
- Prior art keywords
- component
- sampling
- super
- image
- reconstruction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/186—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/80—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
Abstract
The present application discloses a video image processing method, a processing device, an electronic device, and a storage medium. The processing method includes: down-sampling at least one of the luminance component, the first chrominance component, and the second chrominance component of a video image using a reference picture resampling technique to obtain a down-sampled component; encoding, decoding, and reconstructing the down-sampled component to obtain a down-sampled reconstructed component; and inputting the down-sampled reconstructed component into a super-sampling network for super-sampling, so as to recover the video image. In this way, down-sampling with different strategies can be flexibly applied to each component, reducing bit consumption and improving image quality.
Description
Technical Field
The present application relates to the field of video coding technologies, and in particular to a video image processing method, a processing device, an electronic device, and a storage medium.
Background
Generally, video image data is large, so the raw video pixel data (RGB, YUV, etc.) usually must be compressed. The compressed data is called a video code stream, which is transmitted to the user end over a wired or wireless network and then decoded for viewing.
At present, the overall video coding process includes block division, prediction, transformation, quantization, coding, and so on, and various filtering steps may be added afterwards to make the image look more natural. When a video image is processed, all of its image components are often down-sampled, which causes a substantial loss in the image components; moreover, the down-sampled image is not super-sampled flexibly, which strongly affects the image quality after super-sampling.
Disclosure of Invention
To solve the foregoing technical problem, a first aspect of the present application provides a video image processing method, including: down-sampling at least one of the luminance component, the first chrominance component, and the second chrominance component of a video image using a reference picture resampling technique to obtain a down-sampled component; encoding, decoding, and reconstructing the down-sampled component to obtain a down-sampled reconstructed component; and inputting the down-sampled reconstructed component into a super-sampling network for super-sampling, so as to recover the video image.
To solve the above technical problem, a second aspect of the present application provides a processing device for processing a video image, including:
a down-sampling module, configured to down-sample at least one of the luminance component, the first chrominance component, and the second chrominance component of the video image using a reference picture resampling technique to obtain a down-sampled component;
an encoding-decoding reconstruction module, configured to encode, decode, and reconstruct the down-sampled component to obtain a down-sampled reconstructed component;
and a super-sampling module, configured to input the down-sampled reconstructed component into a super-sampling network for super-sampling, so as to recover the video image.
To solve the above technical problem, a third aspect of the present application provides an electronic device, including: a processor and a memory, the memory storing a computer program, and the processor being configured to execute the computer program to implement the processing method of the first aspect of the present application.
To solve the above technical problem, a fourth aspect of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the processing method of the first aspect of the present application.
The beneficial effects of the present application are: by down-sampling at least one of the luminance component, the first chrominance component, and the second chrominance component of the video image using a reference picture resampling technique and appropriately selecting which image components to process, bit consumption is reduced while the image components retain high image quality; moreover, the degree of processing applied to the image components is diversified, so the recovered video image is more natural and flexible.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a video image processing method according to the present application;
FIG. 2 is a schematic flowchart of scheme B of the present application, in which image components of a video image are selectively down-sampled;
FIG. 3 is a schematic flowchart of an embodiment including step S13 in FIG. 1;
FIG. 4 is a flowchart illustrating an embodiment of step S13 in FIG. 1;
FIG. 5 is a schematic flow chart of another embodiment of step S13 in FIG. 1;
FIG. 6 is a schematic diagram illustrating a further embodiment of step S13 in FIG. 1;
FIG. 7 is a schematic diagram of another embodiment of step S13 in FIG. 1;
FIG. 8 is a schematic diagram illustrating still another embodiment of step S13 in FIG. 1;
FIG. 9 is a block diagram schematically illustrating the structure of an embodiment of the processing apparatus of the present application;
FIG. 10 is a block diagram illustrating the structure of an embodiment of the electronic device of the present application;
FIG. 11 is a schematic block circuit diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to a determination", or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [described condition or event]", or "in response to detecting [described condition or event]".
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
To explain the technical solution of the present application, the video image processing method provided by the present application is described below through specific embodiments. Please refer to FIG. 1, which is a schematic flowchart of an embodiment of the video image processing method of the present application. The method specifically includes the following steps:
S11: down-sampling at least one of the luminance component, the first chrominance component, and the second chrominance component of the video image using a reference picture resampling technique to obtain a down-sampled component;
In video coding, the most common color encodings are YUV, RGB, and the like; the color encoding adopted in the present application is YUV. Y represents luminance; U and V (i.e., Cb and Cr) represent chrominance, which describes the color and saturation of an image (or, equivalently, the blue-difference channel Cb and the red-difference channel Cr). To distinguish the two meanings that chrominance covers, these are defined herein as the first chrominance component and the second chrominance component, respectively, where the first chrominance component may be a hue component and the second chrominance component may be a color saturation component. Each Y luma block corresponds to one Cb and one Cr chroma block, and each chroma block corresponds to only one luma block. Taking the 4:2:0 sampling format as an example, an N × M block has a luma block of size N × M and two corresponding chroma blocks each of size (N/2) × (M/2); each chroma block is thus 1/4 the size of the luma block.
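As an illustrative sketch of this size relationship (the helper name is hypothetical, not part of the original disclosure):

```python
def plane_sizes_420(n: int, m: int):
    """For an N x M block under 4:2:0 sampling: the luma block is N x M,
    and each of the two chroma blocks (Cb, Cr) is (N/2) x (M/2),
    i.e. 1/4 the luma area."""
    return (n, m), (n // 2, m // 2)

luma, chroma = plane_sizes_420(64, 64)
print(luma, chroma)  # (64, 64) (32, 32) -> each chroma block is 1/4 the luma size
```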
The overall video coding process includes block division, prediction, transformation, quantization, coding, and so on. Regarding block division: although whole image frames are input in video encoding, when encoding an image frame it must first be divided into a number of LCUs (largest coding units), and each coding unit is then recursively divided into CUs (coding units) of different sizes; video coding is performed in units of CUs.
Encoding converts data into numbers a computer can understand. Encoding methods include various forms of arithmetic coding and variable-length coding, which are not described here. The present application may employ the Reference Picture Resampling (RPR) technique: when transmission conditions are poor, an image of the original size is down-sampled into a smaller image before encoding and transmission, in order to save the bits consumed by encoding.
Therefore, before encoding, at least one of the luminance component, the first chrominance component, and the second chrominance component of the video image can be down-sampled using the reference picture resampling technique to obtain down-sampled components. By appropriately selecting which image components to process, the bits consumed during transmission can be reduced while the image components retain high image quality.
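An illustrative sketch of this selective down-sampling step follows; the names are hypothetical, and simple 2×2 averaging merely stands in for the dedicated RPR interpolation filters used in practice:

```python
import numpy as np

def downsample_2x(plane: np.ndarray) -> np.ndarray:
    """Halve a plane's width and height by 2x2 averaging
    (a stand-in for the RPR down-sampling filter)."""
    h, w = plane.shape
    plane = plane[: h // 2 * 2, : w // 2 * 2]          # crop to even dimensions
    return plane.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def rpr_downsample(planes: dict, targets=("Y",)) -> dict:
    """Down-sample only the selected components; leave the others unchanged.
    planes: {"Y": ..., "U": ..., "V": ...} as 2-D arrays."""
    return {k: downsample_2x(p) if k in targets else p for k, p in planes.items()}
```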
S12: encoding, decoding, and reconstructing the down-sampled component to obtain a down-sampled reconstructed component;
Considering the subsequent processing after down-sampling at least one of the luminance component, the first chrominance component, and the second chrominance component of the video image, the super-sampling network may have two output modes: in one, the image components output by the network match those input to it; in the other, the network outputs only the down-sampled image components. The objects to be encoded and decoded differ accordingly.
On the one hand, if the luminance component, the first chrominance component, and the second chrominance component are all down-sampled, and the image components input to the super-sampling network match its output, then encoding, decoding, and reconstructing the down-sampled components yields a more complete set of down-sampled reconstructed components.
On the other hand, this step may further include: if only one or two of the luminance component, the first chrominance component, and the second chrominance component are down-sampled, and/or the super-sampling network outputs only the down-sampled image components, then the non-down-sampled image components and the down-sampled components are separately encoded, decoded, and reconstructed to obtain non-down-sampled reconstructed components and down-sampled reconstructed components, respectively.
That is, when one or two of the luminance component, the first chrominance component, and the second chrominance component of the video image are down-sampled using the reference picture resampling technique, the processing method further includes: encoding, decoding, and reconstructing the non-down-sampled components to obtain non-down-sampled reconstructed components.
In this step, since the down-sampling of the luminance component, the first chrominance component, and the second chrominance component can be combined in many ways, and the super-sampling network may have two output modes, the step is not limited to encoding, decoding, and reconstructing only the down-sampled components; the non-down-sampled image components may also be encoded, decoded, and reconstructed. No specific limitation is placed here.
S13: inputting the down-sampled reconstructed components into a super-sampling network for super-sampling, so as to recover the video image.
Super-sampling here is also referred to as up-sampling: a small image is transformed into a larger image by interpolation filtering or other computational methods, including convolutional neural networks. Specifically, after the decoding end completes the decoding and reconstruction of the small image, the small image can be restored to the original-size image by an up-sampling filter.
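An illustrative sketch of the up-sampling step (bilinear interpolation stands in for the up-sampling filter or a learned super-sampling network; names are hypothetical):

```python
import torch
import torch.nn.functional as F

def supersample(plane: torch.Tensor, scale: int = 2) -> torch.Tensor:
    """Restore a decoded small plane to the original size.
    plane: tensor of shape (N, 1, H, W)."""
    return F.interpolate(plane, scale_factor=scale, mode="bilinear",
                         align_corners=False)

small = torch.rand(1, 1, 32, 32)
print(supersample(small).shape)  # torch.Size([1, 1, 64, 64])
```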
Specifically, the down-sampling of the luminance component, the first chrominance component, and the second chrominance component of the video image can be combined in multiple ways, so the resulting reconstructed components also occur in multiple combinations; the down-sampled reconstructed components under these combinations are then input into a super-sampling network for super-sampling, so as to recover the video image.
In this way, the present application down-samples at least one of the luminance component, the first chrominance component, and the second chrominance component of the video image using the reference picture resampling technique and appropriately selects which image components to process, reducing bit consumption while the image components retain high image quality; the degree of processing applied to the image components is diversified, so the recovered video image is more natural and flexible.
The RPR-sampled image is encoded, decoded, and reconstructed, and then input into the super-sampling network. The application strategy divides into two parts: the RPR down-sampling strategy and the image input/output strategy of the super-sampling network.
The RPR down-sampling strategy, i.e., down-sampling at least one of the luminance component, the first chrominance component, and the second chrominance component of the video image using the reference picture resampling technique to obtain down-sampled components, specifically includes:
Scheme A: down-sampling the luminance component, the first chrominance component, and the second chrominance component of the video image using the reference picture resampling technique to obtain a down-sampled luminance component, a down-sampled first chrominance component, and a down-sampled second chrominance component.
Or,
Scheme B: down-sampling one or two of the luminance component, the first chrominance component, and the second chrominance component of the video image using the reference picture resampling technique to obtain one or two of a down-sampled luminance component, a down-sampled first chrominance component, and a down-sampled second chrominance component.
Specifically, referring to FIG. 2, FIG. 2 is a schematic flowchart of scheme B of the present application, in which image components of the video image are selectively down-sampled: only the Y component is down-sampled to 1/2 of its width and height, while the U and V components are unchanged, so the down-sampled Y component and the original U and V components are obtained for encoding.
When only one super-sampling network is used and all components are input to the same network, the images must first be processed to the same size. The network takes as input only the image components that were decoded and reconstructed after RPR down-sampling and outputs the same image components; finally, the super-sampled components output by the network are combined with the decoded and reconstructed image components that were not input into the network, thereby obtaining the video image.
In addition, please refer to FIG. 3, which is a schematic flowchart of an embodiment including step S13 in FIG. 1; step S21 is similar to step S11 in FIG. 1 and is not repeated here. In this example, only one super-sampling network is used; besides all the image components reconstructed after RPR down-sampling, one or two additional decoded and reconstructed image components are also input into the network. The step of inputting the down-sampled reconstructed components into a super-sampling network for super-sampling to recover the video image further includes the following steps:
S22: encoding, decoding, and reconstructing the down-sampled components to obtain down-sampled reconstructed components;
Specifically, when one or two of the luminance component, the first chrominance component, and the second chrominance component of the video image are down-sampled using the reference picture resampling technique, one or two of a down-sampled luminance component, a down-sampled first chrominance component, and a down-sampled second chrominance component are obtained.
One or two of the down-sampled luminance component, the down-sampled first chrominance component, and the down-sampled second chrominance component can then be encoded, decoded, and reconstructed to obtain down-sampled reconstructed components.
S23: encoding, decoding, and reconstructing the non-down-sampled components to obtain non-down-sampled reconstructed components;
Specifically, when one or two of the luminance component, the first chrominance component, and the second chrominance component of the video image are down-sampled using the reference picture resampling technique, at least one image component is not down-sampled, so one or two non-down-sampled components are obtained.
The non-down-sampled components are encoded, decoded, and reconstructed to obtain the non-down-sampled reconstructed components. Steps S22 and S23 may be performed simultaneously, or step S22 may be performed first and step S23 afterwards; no limitation is placed here.
S24: based on the non-down-sampled reconstructed components, inputting the down-sampled reconstructed components into a super-sampling network for joint super-sampling;
The image input/output strategy of the super-sampling network involves two choices: in terms of the network's overall input/output mode, either a single network model can be used for super-sampling, or different network models can be used for different components. This part discusses only the input and output of images; the remaining side information (e.g., QP information) may or may not be input.
Inputting the down-sampled reconstructed components into the super-sampling network for joint super-sampling based on the non-down-sampled reconstructed components specifically means that only one network, such as a convolutional neural network, is used: besides all image components reconstructed after RPR down-sampling, one or two additional decoded and reconstructed image components are also input into the network for joint super-sampling.
It should be noted that joint super-sampling here means that, during super-sampling, the down-sampled reconstructed components can utilize information carried by the non-down-sampled reconstructed components, but the non-down-sampled reconstructed components themselves are not substantially super-sampled.
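An illustrative sketch of such a joint super-sampling network follows, assuming the down-sampled component is U while the decoded Y and V keep the original size; the layer sizes and names are hypothetical and not the network of the original disclosure:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointSupersampleNet(nn.Module):
    """Super-sample a down-sampled U plane, using the full-size decoded Y
    and V planes as guidance. Only U is actually super-sampled; Y and V
    contribute information but are not themselves up-sampled."""

    def __init__(self, feats: int = 32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, feats, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feats, feats, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feats, 1, 3, padding=1),
        )

    def forward(self, u_small, y_full, v_full):
        # Bring U to the common full size before concatenation.
        u_up = F.interpolate(u_small, size=y_full.shape[-2:],
                             mode="bilinear", align_corners=False)
        x = torch.cat([u_up, y_full, v_full], dim=1)   # (N, 3, H, W)
        return u_up + self.body(x)                     # residual refinement of U only

net = JointSupersampleNet()
u_rec = net(torch.rand(1, 1, 32, 32),                  # down-sampled U
            torch.rand(1, 1, 64, 64),                  # full-size decoded Y
            torch.rand(1, 1, 64, 64))                  # full-size decoded V
print(u_rec.shape)  # torch.Size([1, 1, 64, 64])
```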
S25: combining the super-sampled image components with the reconstructed components that were not input into the super-sampling network to obtain the video image.
Because the input has now been processed by the super-sampling network, the non-down-sampled reconstructed components output by the network are in fact not identical to the non-down-sampled reconstructed components that were never input into it; the two must be clearly distinguished.
Specifically, only one super-sampling network is used; besides all image components reconstructed after RPR down-sampling, one or two additional decoded and reconstructed image components are also input into the network, and the components output by the network match its input. The super-sampled image components output by the network are then combined with the decoded and reconstructed image components that were not input into the network to obtain the video image.
In addition, referring to FIG. 4, FIG. 4 is a schematic flowchart of another embodiment of step S13 in FIG. 1, where the step of inputting the down-sampled reconstructed components into a super-sampling network for super-sampling to recover the video image further includes the following steps:
S31: based on the non-down-sampled reconstructed components, inputting the down-sampled reconstructed components into a super-sampling network for joint super-sampling;
This step is similar to step S24 in FIG. 3 and is not described here again.
S32: merging the super-sampled image components to obtain a first merged image;
Specifically, if the super-sampling network outputs only the down-sampled image components, the super-sampled image components output by the network are merged to form the first merged image. However, some image components may be missing, in which case the video image cannot yet be recovered and further operations are required.
S33: judging whether the first merged image is the video image;
Generally, the system implementing the processing method is preset with what a complete image should contain, which is used to determine whether the first merged image is the video image.
If the first merged image is the video image, step S34 is executed, i.e., the first merged image is determined to be the video image; if not, i.e., when it is determined that the first merged image is not the video image and a full YUV image cannot be synthesized, step S35 is executed: the first merged image is combined with the non-down-sampled reconstructed components to obtain the video image, forming a full image.
For example, in one such embodiment, the RPR down-sampling strategy selects scheme B and down-samples only the U component. The network outputs only the down-sampled image component, and besides the down-sampled U component, the decoded and reconstructed Y and V components are also input, so that the information of the Y and V components can be utilized when the U component is super-sampled in the network. Finally, the super-sampled U component is output and combined with the decoded and reconstructed Y and V components to generate the complete image.
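Conceptually, the merging and completeness check of steps S32-S35 can be sketched as follows (a hypothetical helper, with planes keyed by component name):

```python
def assemble_frame(super_sampled: dict, reconstructed: dict) -> dict:
    """Merge the network's super-sampled outputs into a first merged image;
    if it is not yet a complete YUV image, fill in the non-down-sampled
    decoded reconstructed components."""
    merged = dict(super_sampled)               # the first merged image
    if set(merged) != {"Y", "U", "V"}:         # not yet a complete video image
        for name, plane in reconstructed.items():
            merged.setdefault(name, plane)     # combine with reconstructed comps
    return merged

# For the U-only example above: the network outputs U, and the decoded
# Y and V reconstructions complete the image.
frame = assemble_frame({"U": "U_super"}, {"Y": "Y_rec", "V": "V_rec"})
print(sorted(frame))  # ['U', 'V', 'Y']
```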
In addition, the super-sampling network includes at least a first convolutional neural network and a second convolutional neural network. Referring to FIG. 5, when the inputs and outputs of the first and second convolutional neural networks match, FIG. 5 is a schematic flowchart of another embodiment of step S13 in FIG. 1, and the step of inputting the down-sampled reconstructed components into a super-sampling network for super-sampling to recover the video image includes:
S41: inputting a down-sampled reconstructed component into the first convolutional neural network for super-sampling, and outputting a first super-sampled component;
Specifically, each down-sampled component has its own network. The network for the current down-sampled component may take only that component as input, or may additionally take one or two more decoded and reconstructed image components. The down-sampled components include the Y, U, and V components, and the decoded and reconstructed image components likewise include the reconstructed Y, U, and V components. Accordingly, the network outputs the down-sampled reconstructed component, or additionally one or two non-down-sampled reconstructed components.
For example, a network is built for each of the U and V components. For the U-component network, besides the input U component, the decoded and reconstructed Y component is also input, and the original-size U component U1 and Y component Y1 are output.
It should be noted that when one or two additional decoded and reconstructed image components are input into the first convolutional neural network for joint super-sampling, joint super-sampling here means that, during super-sampling, the down-sampled reconstructed components input into the network can utilize information carried by the non-down-sampled reconstructed components and/or the other down-sampled reconstructed components, but those components themselves are not substantially super-sampled.
S42: inputting a down-sampled reconstructed component into the second convolutional neural network for super-sampling, and outputting a second super-sampled component;
For example, for the V-component network, besides the input V component, the decoded and reconstructed Y component is also input, and the original-size Y component Y2 and V component V1 are output.
S43: performing a weighted average over the repeated image components in the first and second super-sampled components to obtain a third super-sampled component corresponding to a non-down-sampled reconstructed component, or a fourth super-sampled component corresponding to a down-sampled reconstructed component;
If the components output by a network match its input and a certain component serves as the input of several networks at the same time, the final result for that component is a weighted average of the outputs of those networks. For instance, the Y components Y1 and Y2 mentioned above are the repeated image components in the first and second super-sampled components, and a weighted average of Y1 and Y2 yields the final Y component.
Specifically, when the same component is input into different networks, a weighted average is required: weighted-averaging the repeated image components in the first and second super-sampled components yields the third super-sampled component corresponding to a non-down-sampled reconstructed component, or the fourth super-sampled component corresponding to a down-sampled reconstructed component.
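An illustrative sketch of this weighted fusion (the weights shown are placeholders; any weights summing to 1 may be used):

```python
def fuse_repeated(y1, y2, w1: float = 0.5, w2: float = 0.5):
    """Weighted average of the same component produced by two networks,
    e.g. the full-size Y output by both the U-network and the V-network."""
    assert abs(w1 + w2 - 1.0) < 1e-9, "weights should sum to 1"
    return w1 * y1 + w2 * y2
```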
S44: based on the third or fourth super-sampled component, combining the first super-sampled component with the repeated image components removed and the second super-sampled component with the repeated image components removed with the reconstructed components that were not input into the super-sampling network, to obtain the video image.
Specifically, the third or fourth super-sampled component, the first super-sampled component with the repeated image components removed, and the second super-sampled component with the repeated image components removed can be combined with the reconstructed components that were not input into the super-sampling network to obtain the video image, forming a complete image.
In one embodiment, the RPR down-sampling strategy selects scheme B and down-samples both the U and V components. The network outputs match the inputs, and a separate network is built for each of the U and V components. For the U-component network, besides the input U component, the decoded and reconstructed Y component is also input, and the original-size U component U1 and Y component Y1 are output; for the V-component network, besides the input V component, the decoded and reconstructed Y component is also input, and the original-size Y component Y2 and V component V1 are output. Finally, Y1 and Y2 are weighted-averaged to obtain the final Y component, which is combined with V1 and U1 to generate the complete image.
In addition, the super-sampling network includes at least a first convolutional neural network and a second convolutional neural network. Referring to FIG. 6, when the first and second convolutional neural networks output only down-sampled image components, FIG. 6 is a schematic flowchart of a further embodiment of step S13 in FIG. 1, and the step of inputting the down-sampled reconstructed components into a super-sampling network for super-sampling to recover the video image includes:
S51: inputting a down-sampled reconstructed component into the first convolutional neural network for super-sampling, and outputting a first super-sampled component;
Specifically, this step is similar to step S41 in FIG. 5 and is not described here again.
S52: inputting a down-sampled reconstructed component into the second convolutional neural network for super-sampling, and outputting a second super-sampled component;
Specifically, this step is similar to step S42 in FIG. 5 and is not described here again.
S53: merging the first and second super-sampled components to obtain a second merged image;
Because the networks output only the down-sampled image components, the first and second super-sampled components can be merged to obtain the second merged image. However, some image components may be missing, in which case the video image cannot yet be recovered and further operations are required.
S54: judging whether the second merged image is the video image;
If so, step S55 is executed, i.e., the second merged image is determined to be the video image; if not, i.e., when it is determined that the second merged image is not the video image and a full YUV image cannot be synthesized, the process proceeds to step S56: the second merged image is combined with the non-down-sampled reconstructed components to obtain the video image.
In addition, the super-sampling network includes a first convolutional neural network and a second convolutional neural network, where the first convolutional neural network is a super-sampling network for the luminance component and the second convolutional neural network is a super-sampling network for the chrominance components; the chrominance components include the first chrominance component and the second chrominance component.
Referring to FIG. 7, when the inputs and outputs of the first and second convolutional neural networks match, FIG. 7 is a schematic flowchart of another specific implementation of step S13 in FIG. 1, and the step of inputting the down-sampled reconstructed components into a super-sampling network for super-sampling to recover the video image includes:
S61: inputting the down-sampled luminance component into the first convolutional neural network for super-sampling, and outputting a first super-sampled component;
In this embodiment, separate networks are built for luminance and chrominance. Specifically, if the luminance is down-sampled, a super-sampling network for luminance must be built; only the down-sampled luminance component need be input into the first convolutional neural network for super-sampling, and the first super-sampled component is output.
S62: inputting at least one of the chrominance components into the second convolutional neural network for super-sampling, and outputting a second super-sampled component;
If at least one of the chrominance components is down-sampled, a super-sampling network for chrominance is built; it may take only the down-sampled chrominance component(s) as input, or additionally one or two more decoded and reconstructed image components.
Specifically, at least one of the chrominance components may be input into the second convolutional neural network for super-sampling, and the second super-sampled component is output. Alternatively, at least one of the chrominance components together with one or two reconstructed components may be input into the second convolutional neural network for super-sampling, and the second super-sampled component is output.
It should be noted that when one or two additional decoded and reconstructed image components are input into the second convolutional neural network for joint super-sampling, joint super-sampling means that, during super-sampling, the down-sampled reconstructed components input into the second convolutional neural network can utilize information carried by the non-down-sampled reconstructed components and/or other down-sampled reconstructed components, but those components themselves are not substantially super-sampled.
S63: performing a weighted average over the repeated image components in the first and second super-sampled components to obtain a third super-sampled component corresponding to a non-down-sampled reconstructed component, or a fourth super-sampled component corresponding to a down-sampled reconstructed component;
Specifically, this step is similar to step S43 in FIG. 5 and is not described here again.
S64: based on the third or fourth super-sampled component, combining the first and second super-sampled components with the repeated image components removed with the reconstructed components that were not input into the super-sampling network, to obtain the video image.
Specifically, this step is similar to step S44 in FIG. 5 and is not described here again.
In addition, the super-sampling network includes a first convolutional neural network and a second convolutional neural network, where the first convolutional neural network for the luminance component is built when the luminance component is down-sampled, and the second convolutional neural network for the chrominance components is built when a chrominance component is down-sampled; the chrominance components include the first chrominance component and the second chrominance component, where the first chrominance component may be one of a hue component and a color saturation component, and the second chrominance component may be the other;
referring to fig. 8, when the first convolutional neural network and the second convolutional neural network only output down-sampled image components, fig. 8 is a schematic flowchart of another specific implementation of step S13 in fig. 1, where the step of inputting the down-sampled reconstruction components into the super-sampling network for super-sampling to recover the video image includes:
s71: inputting the downsampled brightness component into a first convolution neural network for oversampling, and outputting a first oversampling component;
specifically, this step is similar to step S61 in fig. 7, and is not described here again.
S72: inputting at least one of the chrominance components into a second convolutional neural network for oversampling, and outputting a second oversampled component;
specifically, this step is similar to step S62 in fig. 7, and is not described here again.
S73: merging based on the first supersampling component and the second supersampling component to obtain a third merged image;
specifically, this step is similar to step S53 in fig. 6, and is not described here again.
S74: judging whether the third combined image is a video image;
if yes, the process proceeds to step S75, that is, the third merged image is determined to be a video image; if not, that is, when it is determined that the third combined image is not a video image, it indicates that the YUV full image cannot be synthesized, the process proceeds to step S76, that is, when it is determined that the third combined image is not a video image, the third combined image and the reconstructed component that is not downsampled are combined to obtain a video image.
Specifically, in one embodiment, the RPR down-sampling strategy selects scheme A, i.e., all of Y, U, and V are down-sampled. The super-sampling network outputs only the down-sampled image components. A network is built for the Y component, which takes only the down-sampled reconstructed Y image as input and outputs the original-size Y component. A network is built for the UV components, which, besides the down-sampled U and V components, also takes the down-sampled reconstructed Y component as input, so that the information of the Y component can be utilized when the U and V components are super-sampled in the network; it outputs original-size images of the U and V components. Finally, the Y component output by the first network and the UV components output by the second network are combined to generate the complete image.
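An illustrative sketch of this two-network wiring follows; for simplicity the shapes assume all three planes share one size after down-sampling (e.g., a 4:4:4 layout), and the layer choices and names are hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def up2(x):
    """2x bilinear up-sampling back to the original size."""
    return F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)

class YNet(nn.Module):
    """Luma network: input is only the down-sampled reconstructed Y."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 1, 3, padding=1)
    def forward(self, y_small):
        y_up = up2(y_small)
        return y_up + self.conv(y_up)            # original-size Y

class UVNet(nn.Module):
    """Chroma network: down-sampled U and V, plus the down-sampled
    reconstructed Y as guidance; outputs original-size U and V."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 2, 3, padding=1)
    def forward(self, u_small, v_small, y_small):
        x = up2(torch.cat([u_small, v_small, y_small], dim=1))
        return x[:, :2] + self.conv(x)           # original-size U and V

y_up = YNet()(torch.rand(1, 1, 32, 32))                          # (1, 1, 64, 64)
uv_up = UVNet()(torch.rand(1, 1, 32, 32), torch.rand(1, 1, 32, 32),
                torch.rand(1, 1, 32, 32))                        # (1, 2, 64, 64)
frame = torch.cat([y_up, uv_up], dim=1)                          # complete image
```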
In this way, the present application can flexibly apply down-sampling with different strategies to each component, so that a balance can be struck between bit consumption and image quality, which is more flexible. In addition, whether the luminance component or a chrominance component is currently being super-sampled, the information of the other one or two components can be utilized, which helps improve the quality of super-sampling.
To explain the technical solution of the present application, the present application further provides a processing device. Please refer to FIG. 9, which is a schematic block diagram of an embodiment of the processing device of the present application. The processing device 7 is configured to process a video image and may specifically include: a down-sampling module 71, an encoding-decoding reconstruction module 72, and a super-sampling module 73.
The down-sampling module 71 is configured to down-sample at least one of the luminance component, the first chrominance component, and the second chrominance component of the video image using a reference picture resampling technique to obtain a down-sampled component; the encoding-decoding reconstruction module 72 is configured to encode, decode, and reconstruct the down-sampled component to obtain a down-sampled reconstructed component; and the super-sampling module 73 is configured to input the down-sampled reconstructed component into a super-sampling network for super-sampling, so as to recover the video image.
In this way, the down-sampling module 71 down-samples at least one of the luminance component, the first chrominance component, and the second chrominance component of the video image using the reference picture resampling technique and appropriately selects which image components to process, reducing bit consumption while the image components retain high image quality; the degree of processing applied to the image components is diversified, so the recovered video image is more natural and flexible.
For explaining a technical solution of the present application, the present application further provides an electronic device, please refer to fig. 10, where fig. 10 is a schematic block diagram of a structure of an embodiment of the electronic device of the present application, and the electronic device 8 includes: a processor 81 and a memory 82, wherein the memory 82 stores a computer program 821, and the processor 81 is configured to execute the computer program 821 to implement the method according to the first aspect of the embodiment of the present application, which is not described herein again.
In addition, the present application further provides a computer-readable storage medium, please refer to fig. 11, where fig. 11 is a schematic circuit block diagram of an embodiment of the computer-readable storage medium of the present application, the computer-readable storage medium 9 stores a computer program 91, and the computer program 91 can be executed by a processor to implement the method according to the first aspect of the embodiment of the present application, which is not described herein again.
If implemented in the form of software functional units and sold or used as a stand-alone product, the methods may also be stored in a device having a storage function. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in whole or in part in the form of a software product. The software product is stored in a storage device and includes instructions (program data) for causing a computer (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage device includes media that can store program code, such as a USB flash drive, a portable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, as well as electronic devices, such as computers, mobile phones, notebook computers, tablet computers, and cameras, equipped with such storage media.
The description of the execution process of the program data in the device with a storage function may refer to the embodiments of the video image processing method of the present application, and will not be described herein again.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings, or which are directly or indirectly applied to other related technical fields, are intended to be included within the scope of the present application.
Claims (11)
1. A method for processing video images, the method comprising:
utilizing a reference image resampling technology to carry out down-sampling on at least one of the brightness component, the first chrominance component and the second chrominance component of the video image to obtain a down-sampling component;
coding, decoding and reconstructing the down-sampling component to obtain a down-sampling reconstruction component;
and inputting the down-sampled reconstruction components into a super-sampling network for super-sampling so as to recover and obtain the video image.
2. The processing method according to claim 1,
after the step of downsampling at least one of the luma component, the first chroma component, and the second chroma component of the video image using a reference image resampling technique, the processing method further comprises:
and coding, decoding and reconstructing the non-down-sampled component to obtain a reconstructed component without down-sampling.
3. The processing method according to claim 2,
the step of inputting the down-sampled reconstruction components into a super-sampling network for super-sampling to recover and obtain the video image comprises the following steps:
based on the reconstruction components which are not downsampled, inputting the downsampled reconstruction components into a super-sampling network for joint super-sampling;
and combining the image component after the super sampling with the reconstruction component which is not input into the super sampling network to obtain the video image.
4. The processing method according to claim 2,
the step of inputting the down-sampled reconstruction components into a super-sampling network for super-sampling to recover and obtain the video image comprises the following steps:
based on the reconstruction components which are not downsampled, inputting the downsampled reconstruction components into a super-sampling network for joint super-sampling;
merging the supersampled image components to obtain a first merged image;
judging whether the first combined image is the video image or not;
if so, determining the first combined image as the video image;
if not, combining the first combined image and the reconstructed component which is not down-sampled to obtain the video image.
5. The processing method according to claim 2,
the super-sampling network at least comprises a first convolutional neural network and a second convolutional neural network;
when the input and output of the first convolutional neural network and the second convolutional neural network are consistent, the step of inputting the down-sampled reconstruction component into a super-sampling network for super-sampling to recover and obtain the video image comprises:
inputting the down-sampled reconstruction component into the first convolutional neural network for oversampling, and outputting a first oversampling component;
inputting the down-sampled reconstruction component into the second convolutional neural network for super-sampling, and outputting a second super-sampling component;
carrying out weighted average on repeated image components in the first supersampled component and the second supersampled component to obtain a third supersampled component corresponding to the non-downsampled reconstruction component or a fourth supersampled component corresponding to the downsampled reconstruction component;
and combining the first supersampled component with the removed repeated image component and the second supersampled component with the removed repeated image component with the reconstructed component which is not input into the supersampled network based on the third supersampled component or the fourth supersampled component to obtain the video image.
6. The processing method according to claim 2,
the super-sampling network at least comprises a first convolutional neural network and a second convolutional neural network;
when the first convolutional neural network and the second convolutional neural network only output down-sampled image components, the step of inputting the down-sampled reconstruction components into a super-sampling network for super-sampling so as to recover and obtain the video image comprises:
inputting the down-sampled reconstruction component into the first convolutional neural network for oversampling, and outputting a first oversampling component;
inputting the down-sampled reconstruction component into the second convolutional neural network for super-sampling, and outputting a second super-sampling component;
combining the first supersampling component and the second supersampling component to obtain a second combined image;
judging whether the second merged image is the video image;
if so, determining the second combined image as the video image;
and if not, combining the second combined image with the reconstruction component which is not downsampled to obtain the video image.
7. The processing method according to claim 2,
the super-sampling network comprises a first convolutional neural network and a second convolutional neural network, wherein the first convolutional neural network is a super-sampling network for the luminance component, the second convolutional neural network is a super-sampling network for the chrominance component, and the chrominance component comprises the first chrominance component and the second chrominance component;
when the input and output of the first convolutional neural network and the second convolutional neural network are consistent, the step of inputting the down-sampled reconstruction component into a super-sampling network for super-sampling to recover and obtain the video image comprises:
inputting the downsampled brightness component into the first convolution neural network for oversampling, and outputting a first oversampling component;
inputting at least one of the chrominance components into the second convolutional neural network for oversampling, and outputting a second oversampled component;
carrying out weighted average on repeated image components in the first supersampled component and the second supersampled component to obtain a third supersampled component corresponding to the non-downsampled reconstruction component or a fourth supersampled component corresponding to the downsampled reconstruction component;
and combining the first supersampled component and the second supersampled component without repeated image components and a reconstruction component which is not input into the supersampled network based on the third supersampled component or the fourth supersampled component to obtain the video image.
8. The processing method according to claim 2,
the super-sampling network comprises a first convolutional neural network and a second convolutional neural network, wherein when the luminance component is downsampled, the first convolutional neural network of the luminance component is constructed, and when the chrominance component is downsampled, the second convolutional neural network of the chrominance component is constructed, and the chrominance component comprises the first chrominance component and the second chrominance component;
when the first convolutional neural network and the second convolutional neural network only output down-sampled image components, the step of inputting the down-sampled reconstruction components into a super-sampling network for super-sampling so as to recover and obtain the video image comprises:
inputting the downsampled brightness component into the first convolution neural network for oversampling, and outputting a first oversampling component;
inputting at least one of the chrominance components into the second convolutional neural network for oversampling, and outputting a second oversampled component;
merging based on the first supersampling component and the second supersampling component to obtain a third merged image;
judging whether the third combined image is the video image;
if so, determining the third merged image as the video image;
if not, combining the third combined image with the reconstruction component which is not down-sampled to obtain the video image.
9. A video image processing device, comprising:
a down-sampling module configured to down-sample at least one of the luminance component, the first chrominance component and the second chrominance component of a video image using a reference picture resampling technique, to obtain a down-sampled component;
a coding-and-decoding reconstruction module configured to encode, decode and reconstruct the down-sampled component, to obtain a down-sampled reconstruction component;
and a super-sampling module configured to input the down-sampled reconstruction component into a super-sampling network for super-sampling, so as to recover the video image.
10. An electronic device, comprising: a processor and a memory, the memory having stored therein a computer program for execution by the processor to implement the processing method of any one of claims 1-8.
11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when being executed by a processor, carries out the processing method of any one of claims 1 to 8.
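As a reading aid for the merge rule of claim 7, the following sketch (Python/NumPy; not part of the claims, and every name in it — merge_supersampled, the Y/U/V keys, the weight w1 — is an illustrative assumption) weighted-averages the "repeated" image components produced by both networks and passes everything else through unchanged:

```python
import numpy as np

def merge_supersampled(first_out, second_out, untouched, w1=0.5):
    """Illustrative merge in the spirit of claim 7: components emitted by
    both networks are the "repeated" image components and are weighted-
    averaged; the rest are combined with the reconstruction components
    that were never fed to a super-sampling network."""
    merged = dict(untouched)  # components that bypassed the networks
    for name in set(first_out) | set(second_out):
        if name in first_out and name in second_out:
            # weighted average of a repeated component
            merged[name] = w1 * first_out[name] + (1.0 - w1) * second_out[name]
        else:
            merged[name] = first_out.get(name, second_out.get(name))
    return merged

# toy 8x8 planes standing in for super-sampled Y/U/V components
y, u1, u2, v = (np.random.rand(8, 8) for _ in range(4))
frame = merge_supersampled({"Y": y, "U": u1}, {"U": u2}, {"V": v})
```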
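Claim 8's branch, in which the networks return only the components that were actually down-sampled, can be sketched the same way; assemble_frame and target_shapes are assumed names, and the shape test stands in for the claim's check of whether the third merged image already is the video image:

```python
import numpy as np

def assemble_frame(net_planes, full_planes, target_shapes):
    """Illustrative assembly in the spirit of claim 8."""
    frame = dict(net_planes)  # the "third merged image"
    complete = all(
        name in frame and frame[name].shape == shape
        for name, shape in target_shapes.items()
    )
    if not complete:
        # fill in the reconstruction components that were not down-sampled
        for name, plane in full_planes.items():
            frame.setdefault(name, plane)
    return frame

# only Y was down-sampled; U and V come from the conventional reconstruction
shapes = {"Y": (8, 8), "U": (4, 4), "V": (4, 4)}
frame = assemble_frame(
    {"Y": np.random.rand(8, 8)},
    {"U": np.random.rand(4, 4), "V": np.random.rand(4, 4)},
    shapes,
)
```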
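The three modules of the claim-9 device form a down-sample, encode/decode, super-sample pipeline. The PyTorch sketch below is a minimal stand-in under stated assumptions: bicubic resizing substitutes for reference picture resampling, an identity function substitutes for the real encoder/decoder, and SuperSampler is a one-layer placeholder rather than the trained network of the claims:

```python
import torch
import torch.nn.functional as F

def rpr_downsample(plane, factor=2):
    # stand-in for the reference-picture-resampling down-sampling module
    return F.interpolate(plane, scale_factor=1.0 / factor,
                         mode="bicubic", align_corners=False)

def codec_roundtrip(plane):
    # placeholder for the coding-and-decoding reconstruction module;
    # a real codec would return a distorted reconstruction
    return plane

class SuperSampler(torch.nn.Module):
    """One-branch placeholder for the super-sampling network."""
    def __init__(self, factor=2):
        super().__init__()
        self.factor = factor
        self.refine = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1)

    def forward(self, x):
        up = F.interpolate(x, scale_factor=self.factor,
                           mode="bicubic", align_corners=False)
        return up + self.refine(up)  # learned residual on the upscale

luma = torch.rand(1, 1, 64, 64)  # N, C, H, W luminance plane
restored = SuperSampler()(codec_roundtrip(rpr_downsample(luma)))
assert restored.shape == luma.shape
```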
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111166007.7A | 2021-09-30 | 2021-09-30 | Video image processing method, processing device, electronic device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114025168A (en) | 2022-02-08 |
CN114025168B (en) | 2023-08-04 |
Family
ID=80055576
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111166007.7A (Active) | 2021-09-30 | 2021-09-30 | Video image processing method, processing device, electronic device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114025168B (en) |
Patent Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000175061A (en) * | 1998-12-07 | 2000-06-23 | Fuji Xerox Co Ltd | Color transformation processing method and device |
US20080267494A1 (en) * | 2007-04-30 | 2008-10-30 | Microsoft Corporation | Joint bilateral upsampling |
US20110255608A1 (en) * | 2008-12-23 | 2011-10-20 | Sk Telecom Co., Ltd. | Method and apparatus for encoding/decoding color image |
WO2012175646A1 (en) * | 2011-06-22 | 2012-12-27 | Canon Kabushiki Kaisha | Method and device for processing colour information in an image |
US20130188686A1 (en) * | 2012-01-19 | 2013-07-25 | Magnum Semiconductor, Inc. | Methods and apparatuses for providing an adaptive reduced resolution update mode |
JP2014158270A (en) * | 2014-03-25 | 2014-08-28 | Nippon Hoso Kyokai <Nhk> | Encoding device, decoding device, and program |
CN104170388A (en) * | 2011-11-10 | 2014-11-26 | 卢卡·罗萨托 | Upsampling and downsampling of motion maps and other ancillary maps in layered signal quality levels |
EP2858367A2 (en) * | 2012-04-26 | 2015-04-08 | Sony Corporation | Quantisation parameter selection for different colour sampling formats |
WO2017045344A1 (en) * | 2015-09-17 | 2017-03-23 | 东南大学 | Down-sampling method, up-sampling method, and transmission processing method of video frames |
CN109274969A (en) * | 2017-07-17 | 2019-01-25 | 华为技术有限公司 | The method and apparatus of colorimetric prediction |
CN109919841A (en) * | 2019-01-24 | 2019-06-21 | 重庆邮电大学 | A Synthesis Method for Steering Graphs for Joint Upsampling of High Dynamic Range Images |
CN111970513A (en) * | 2020-08-14 | 2020-11-20 | 成都数字天空科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN112135136A (en) * | 2019-06-24 | 2020-12-25 | 无锡祥生医疗科技股份有限公司 | Ultrasonic remote medical treatment sending method and device and receiving method, device and system |
US20210203996A1 (en) * | 2019-12-30 | 2021-07-01 | Mediatek Inc. | Video Encoding or Decoding Methods and Apparatuses related to High-level Information Signaling |
TW202130186A (en) * | 2019-12-11 | 2021-08-01 | 聯發科技股份有限公司 | Method and apparatus for coding video sequence |
TW202131693A (en) * | 2020-01-14 | 2021-08-16 | 聯發科技股份有限公司 | Method and apparatus for coding video sequence |
Non-Patent Citations (1)
Title |
---|
Yang Lishan; You Kangyong; Guo Wenbin: "Weighted reconstruction strategy for band-limited graph signals based on diffusion operators", Journal of Electronics & Information Technology, no. 12 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114943643A (en) * | 2022-04-12 | 2022-08-26 | 浙江大华技术股份有限公司 | Image reconstruction method, image encoding and decoding method, and related equipment |
CN114943643B (en) * | 2022-04-12 | 2025-02-07 | 浙江大华技术股份有限公司 | Image reconstruction method, image encoding and decoding method and related equipment |
Also Published As
Publication number | Publication date |
---|---|
CN114025168B (en) | 2023-08-04 |
Similar Documents
Publication | Title |
---|---|
KR101074301B1 | Video coding considering postprocessing to be performed in the decoder |
KR101023632B1 | Dynamic range reduction before video encoding |
US6996184B2 | Image-data processing apparatus |
JP2003152547A | Method for compressing video |
TWI805085B | Handling method of chroma subsampled formats in machine-learning-based video coding |
JPH0970044A | Image signal processor and method therefor |
KR20060115739A | Method for processing digital image and/or video data |
WO2007040765A1 | Content adaptive noise reduction filtering for image signals |
CN111429357A | Training data determination method, video processing method, device, equipment and medium |
JPH08251422A | Block distortion correction device and image signal expander |
CN114025168B | Video image processing method, processing device, electronic device and storage medium |
CN1947146B | Methods for Downsampling Data Values |
Zhang et al. | Can lower resolution be better? |
Ibraheem et al. | Enhancing Versatile Video Coding Efficiency via Post-Processing of Decoded Frames Using Residual Network Integration in Deep Convolutional Neural Networks |
CN114463453A | Image reconstruction method, image coding method, image decoding method, image coding device, and image decoding device |
JPH06319129A | Spatio-temporal picture filter |
JP4037318B2 | Image processing apparatus and method, and program |
Dumitras et al. | FANN-based video chrominance subsampling |
JP4582378B2 | Image noise reduction apparatus and image noise reduction method |
WO2025077744A1 | Method, apparatus, and medium for visual data processing |
CN117834925B | Method, device, electronic equipment and readable medium for enhancing video quality after compression |
Rehman et al. | Classification-based de-mosaicing for digital cameras |
JP4957572B2 | Image processing apparatus, image processing system, image processing method, and image processing program |
WO2025001621A1 | Image quality enhancement method and apparatus for video layer data rate |
JP2001285882A | Device and method for reducing noise |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |