
CN110062226A - Video encoding method, video decoding method, apparatus and system - Google Patents

Video encoding method, video decoding method, apparatus and system

Info

Publication number
CN110062226A
CN110062226A
Authority
CN
China
Prior art keywords
image block
distorted
picture
distortion
target image
Prior art date
Legal status
Granted
Application number
CN201810050810.6A
Other languages
Chinese (zh)
Other versions
CN110062226B (en)
Inventor
周璐璐
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201810050810.6A priority Critical patent/CN110062226B/en
Priority to PCT/CN2019/072417 priority patent/WO2019141258A1/en
Publication of CN110062226A publication Critical patent/CN110062226A/en
Application granted granted Critical
Publication of CN110062226B publication Critical patent/CN110062226B/en

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/103 - Selection of coding mode or of prediction mode
    • H04N19/117 - Filters, e.g. for pre-processing or post-processing
    • H04N19/13 - Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/184 - Adaptive coding characterised by the coding unit, the unit being bits, e.g. of the compressed video stream
    • H04N19/61 - Transform coding in combination with predictive coding
    • H04N19/80 - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H04N19/82 - Filtering operations involving filtering within a prediction loop
    • H04N19/85 - Pre-processing or post-processing specially adapted for video compression

Landscapes

  • Engineering & Computer Science
  • Multimedia
  • Signal Processing
  • Compression Or Coding Systems Of Tv Signals

Abstract

The present application relates to the field of video encoding and decoding, and provides a video encoding method, a video decoding method, an apparatus, and a system. The method includes: acquiring a distorted picture; generating a side information component corresponding to the distorted picture; inputting the distorted picture and the side information component into a convolutional neural network model for filtering, to obtain a first de-distorted image block corresponding to each distorted image block included in the distorted picture; selecting an image block from an image block set corresponding to a distorted image block as the target de-distorted image block corresponding to that distorted image block, where the image block set includes the first de-distorted image block corresponding to the distorted image block and/or the distorted image block itself; and encoding the original video pictures following the current original video picture according to the target de-distorted image block corresponding to each distorted image block included in the distorted picture, to obtain a video bitstream. The present application can improve de-distortion performance.

Description

Video encoding method, video decoding method, apparatus and system
Technical Field
The present application relates to the field of video encoding and decoding, and in particular, to a video encoding method, a video decoding method, an apparatus, and a system.
Background
In a video coding system, when an original video picture is coded, the original video picture is processed multiple times to obtain a reconstructed picture. In the process of video coding, the reconstructed picture can be used as a reference picture to code the original video picture.
Because the original video picture undergoes multiple processing steps to produce the reconstructed picture, the pixel values of the reconstructed picture may be offset from those of the original video picture; that is, the reconstructed picture is distorted, which causes visible impairments or artifacts. These distortions degrade the subjective quality of the reconstructed picture, and because the reconstructed picture serves as a reference picture for video encoding, they also reduce the prediction accuracy of subsequent encoding and affect the size of the final bitstream.
Disclosure of Invention
In order to improve de-distortion performance, embodiments of the present application provide a video encoding method, a video decoding method, an apparatus, and a system. The technical solution is as follows:
in a first aspect, an embodiment of the present application provides a video encoding method, where the method includes:
acquiring a distorted picture, wherein the distorted picture is generated when a current original video picture is coded;
generating a side information component corresponding to a distorted picture, wherein the side information component represents the distortion characteristics of the distorted picture relative to the current original video picture;
inputting the distorted picture and the side information component into a convolutional neural network model for filtering to obtain a first de-distorted image block corresponding to each distorted image block included in the distorted picture, wherein the convolutional neural network model is obtained by training on a preset training set, and the preset training set includes original sample pictures, a plurality of distorted pictures corresponding to each original sample picture, and a side information component corresponding to each distorted picture;
selecting an image block from an image block set corresponding to the distorted image block as the target de-distorted image block corresponding to the distorted image block, wherein the image block set includes the first de-distorted image block corresponding to the distorted image block and/or the distorted image block itself;
and encoding the original video pictures following the current original video picture according to the target de-distorted image block corresponding to each distorted image block, to obtain a video bitstream.
Optionally, the inputting the distorted picture and the side information component into a convolutional neural network model for filtering to obtain a first de-distorted image block corresponding to a distorted image block included in the distorted picture includes:
dividing the distorted picture to obtain the distorted image blocks it includes, and inputting each distorted image block and the side information component corresponding to the distorted image block into the convolutional neural network model for filtering to obtain the first de-distorted image block corresponding to the distorted image block; or,
inputting the distorted picture and the side information component into the convolutional neural network model for filtering to obtain a de-distorted picture, and dividing the de-distorted picture to obtain the first de-distorted image block corresponding to each distorted image block included in the distorted picture.
Optionally, after the obtaining the distorted picture, the method further includes:
and inputting the distorted picture into at least one filter for filtering to obtain, from each filter, a second de-distorted image block corresponding to each distorted image block included in the distorted picture, wherein the image block set corresponding to the distorted image block further includes the second de-distorted image block corresponding to the distorted image block output by each filter.
Optionally, the selecting an image block from the image block set includes:
selecting an image block from an image block set according to an original image block corresponding to the distorted image block in the current original video picture; or,
and selecting one image block from the image block set according to the coding information of each coding unit included in the distorted image block.
Optionally, the selecting an image block from an image block set according to the original image block corresponding to the distorted image block in the current original video picture includes:
respectively calculating the difference value between each image block in the image block set and the original image block corresponding to the distorted image block;
and selecting, from the image block set, the image block with the minimum difference value from the original image block corresponding to the distorted image block.
Optionally, the video bitstream further includes a filtering flag map corresponding to the distorted picture, and the method further includes:
and when one image block is selected from the image block set according to the original image block corresponding to the distorted image block, filling, in the filtering flag map and according to the position of the distorted image block in the distorted picture, flag information identifying the data type of the target de-distorted image block.
In a second aspect, an embodiment of the present application provides a video decoding method, where the method includes:
entropy decoding is carried out on the received video bit stream to obtain current entropy decoding data;
acquiring each distorted image block included in a distorted image, wherein the distorted image is generated when the current entropy decoding data is decoded;
determining a data type corresponding to a target image block according to the current entropy decoding data, wherein the target image block is a distorted image block in the distorted image;
when the data type is used for representing data filtered by a convolutional neural network model, generating a side information component corresponding to the target image block, wherein the side information component represents distortion characteristics of the target image block relative to the original image block corresponding to the target image block in an original video picture, and the original video picture is the video picture corresponding to the current entropy decoding data;
inputting the target image block and the side information component into a convolutional neural network model for convolution filtering to obtain a de-distorted image block corresponding to the target image block, wherein the convolutional neural network model is obtained by training on a preset training set, and the preset training set includes original sample pictures, a plurality of distorted pictures corresponding to each original sample picture, and a side information component corresponding to each distorted picture;
and decoding the subsequently received video bit stream according to the distortion-removed image blocks corresponding to the distortion image blocks included in the distortion picture.
Optionally, the method further includes:
when the data type is used for representing data filtered by a filter, inputting the target image block into the filter for filtering processing to obtain a distortion-removed image block corresponding to the target image block; or,
and when the data type is used for representing the target image block, determining the target image block as a distortion-removed image block corresponding to the target image block.
Optionally, the current entropy decoding data includes a filtering flag map, where the filtering flag map includes flag information corresponding to each distorted image block in the distorted image, and the flag information corresponding to the distorted image block is used to identify a data type corresponding to the distorted image block;
the determining the data type corresponding to the target image block according to the current entropy decoding data includes:
reading the flag information corresponding to the target image block from the filtering flag map according to the position of the target image block in the distorted image;
and determining the data type corresponding to the target image block according to the flag information.
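For illustration, a minimal Python sketch of this lookup, assuming the filtering flag map is stored as a 2-D array with one flag per block and a fixed block size; the numeric flag values are assumptions, since the text does not fix an encoding:

    import numpy as np

    # Hypothetical flag values; the text does not fix a numeric encoding.
    FLAG_UNFILTERED = 0  # the target image block itself is used
    FLAG_FILTER = 1      # data filtered by a conventional filter
    FLAG_CNN = 2         # data filtered by the convolutional neural network model

    def data_type_for_block(flag_map, block_y, block_x, block_size):
        """Read the flag for the block whose top-left pixel is (block_y, block_x).

        flag_map holds one flag per distorted image block, indexed by block
        row and column, so pixel coordinates are divided by the block size.
        """
        return int(flag_map[block_y // block_size, block_x // block_size])

    # Usage: a 2x2 grid of 64x64 blocks.
    flag_map = np.array([[FLAG_CNN, FLAG_FILTER],
                         [FLAG_UNFILTERED, FLAG_CNN]])
    print(data_type_for_block(flag_map, block_y=64, block_x=0, block_size=64))  # 0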
Optionally, the current entropy decoding data includes a position and coding information of each coding unit in the original video picture;
the determining the data type corresponding to the target image block according to the current entropy decoding data includes:
determining each coding unit included in the target image block according to the position of the target image block in the distorted image and the position of each coding unit in the original video image;
and determining the data type corresponding to the target image block according to the coding information of each coding unit included in the target image block.
In a third aspect, an embodiment of the present application provides a video encoding apparatus, including:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a distorted picture, and the distorted picture is generated when a current original video picture is coded;
the generating module is used for generating a side information component corresponding to a distorted picture, wherein the side information component represents the distortion characteristics of the distorted picture relative to the current original video picture;
the filtering module is used for inputting the distorted picture and the side information component into a convolutional neural network model for filtering processing to obtain a first de-distorted image block corresponding to each distorted image block included in the distorted picture, wherein the convolutional neural network model is obtained by training on a preset training set, and the preset training set includes original sample pictures, a plurality of distorted pictures corresponding to each original sample picture, and a side information component corresponding to each distorted picture;
the selection module is used for selecting an image block from an image block set corresponding to the distorted image block as the target de-distorted image block corresponding to the distorted image block, wherein the image block set includes the first de-distorted image block corresponding to the distorted image block and/or the distorted image block itself;
and the coding module is used for coding the original video pictures following the current original video picture according to the target de-distorted image block corresponding to each distorted image block, to obtain a video bitstream.
Optionally, the filtering module includes:
the first filtering unit is used for dividing the distorted image to obtain distorted image blocks included in the distorted image, inputting the distorted image blocks and side information components corresponding to the distorted image blocks into a convolutional neural network model for filtering to obtain first distortion removing image blocks corresponding to the distorted image blocks; or,
and the second filtering unit is used for inputting the distorted picture and the side information component into a convolutional neural network model for filtering processing to obtain a de-distorted picture, and dividing the de-distorted picture to obtain a first de-distorted image block corresponding to a distorted image block included in the distorted picture.
Optionally, the filtering module is further configured to:
and inputting the distorted picture into at least one filter for filtering to obtain, from each filter, a second de-distorted image block corresponding to each distorted image block included in the distorted picture, wherein the image block set corresponding to the distorted image block further includes the second de-distorted image block corresponding to the distorted image block output by each filter.
Optionally, the selecting module includes:
the first selection unit is used for selecting an image block from an image block set according to an original image block corresponding to the distorted image block in the current original video picture; or,
and the second selection unit is used for selecting one image block from the image block set according to the coding information of each coding unit included in the distorted image block.
Optionally, the first selecting unit is configured to:
respectively calculating the difference value between each image block in the image block set and the original image block corresponding to the distorted image block;
and selecting, from the image block set, the image block with the minimum difference value from the original image block corresponding to the distorted image block.
Optionally, the video bitstream further includes a filtering flag map corresponding to the distorted picture, and the apparatus further includes:
and the filling module is used for, when one image block is selected from the image block set according to the original image block corresponding to the distorted image block, filling, in the filtering flag map and according to the position of the distorted image block in the distorted picture, flag information identifying the data type of the target de-distorted image block.
In a fourth aspect, an embodiment of the present application provides a video decoding apparatus, including:
the decoding module is used for carrying out entropy decoding on the received video bit stream to obtain current entropy decoding data;
the acquisition module is used for acquiring each distorted image block included in a distorted image, wherein the distorted image is generated when the current entropy decoding data is decoded;
the determining module is used for determining a data type corresponding to a target image block according to the current entropy decoding data, wherein the target image block is a distorted image block in the distorted image;
a generating module, configured to generate a side information component corresponding to a target image block when the data type is used to represent data filtered by a convolutional neural network model, where the side information component represents a distortion characteristic of the target image block relative to an original image block corresponding to the target image block in an original video picture, and the original video picture is a video picture corresponding to the current entropy decoding data;
the filtering module is used for inputting the target image block and the side information component into a convolutional neural network model for convolution filtering processing to obtain a de-distorted image block corresponding to the target image block, wherein the convolutional neural network model is obtained by training on a preset training set, and the preset training set includes original sample pictures, a plurality of distorted pictures corresponding to each original sample picture, and a side information component corresponding to each distorted picture;
the decoding module is further configured to decode a subsequently received video bitstream according to a de-distorted image block corresponding to each distorted image block included in the distorted image.
Optionally, the filtering module is further configured to:
when the data type is used for representing data filtered by a filter, inputting the target image block into the filter for filtering processing to obtain a distortion-removed image block corresponding to the target image block; or,
and when the data type is used for representing the target image block, determining the target image block as a distortion-removed image block corresponding to the target image block.
Optionally, the current entropy decoding data includes a filtering flag map, where the filtering flag map includes flag information corresponding to each distorted image block in the distorted image, and the flag information corresponding to the distorted image block is used to identify a data type corresponding to the distorted image block;
the determining module includes:
the reading unit is used for reading the flag information corresponding to the target image block from the filtering flag map according to the position of the target image block in the distorted image;
and the first determining unit is used for determining the data type corresponding to the target image block according to the flag information.
Optionally, the current entropy decoding data includes a position and coding information of each coding unit in the original video picture;
the determining module comprises:
a second determining unit, configured to determine, according to a position of the target image block in the distorted picture and positions of coding units in the original video picture, the coding units included in the target image block;
and the third determining unit is used for determining the data type corresponding to the target image block according to the coding information of each coding unit included in the target image block.
In a fifth aspect, an embodiment of the present application provides a video encoding method, where the method includes:
acquiring a distorted image block included in a distorted image, wherein the distorted image is generated when an original video image is coded;
determining a data type corresponding to a target image block according to coding information of a coding unit included in the target image block, wherein the target image block is any distorted image block in the distorted image;
when the data type is used for representing data filtered by a convolutional neural network model, generating a side information component corresponding to the target image block, wherein the side information component represents distortion characteristics of the target image block relative to the original image block corresponding to the target image block in the original video picture;
inputting the target image block and the side information component into a convolutional neural network model for convolution filtering to obtain a de-distorted image block corresponding to the target image block, wherein the convolutional neural network model is obtained by training on a preset training set, and the preset training set includes original sample pictures, a plurality of distorted pictures corresponding to each original sample picture, and a side information component corresponding to each distorted picture;
and encoding the original video pictures following the current original video picture according to the de-distorted image blocks corresponding to the distorted image blocks in the distorted picture, to obtain a video bitstream.
Optionally, the method further includes:
when the data type is used for representing data filtered by a filter, inputting the target image block into the filter for filtering processing to obtain a distortion-removed image block corresponding to the target image block; or,
and when the data type is used for representing the target image block, determining the target image block as a distortion-removed image block corresponding to the target image block.
In a sixth aspect, an embodiment of the present application provides a video encoding apparatus, including:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a distorted image block included in a distorted image, and the distorted image is generated when an original video image is coded;
the determining module is used for determining a data type corresponding to a target image block according to coding information of a coding unit included in the target image block, wherein the target image block is any distorted image block in the distorted image;
the generating module is used for generating a side information component corresponding to the target image block when the data type is used for representing the data filtered by the convolutional neural network model, wherein the side information component represents the distortion characteristics of the target image block relative to an original image block corresponding to the target image block in the original video picture;
the filtering module is used for inputting the target image block and the side information component into a convolutional neural network model for convolution filtering processing to obtain a de-distorted image block corresponding to the target image block, wherein the convolutional neural network model is obtained by training on a preset training set, and the preset training set includes original sample pictures, a plurality of distorted pictures corresponding to each original sample picture, and a side information component corresponding to each distorted picture;
and the coding module is used for coding the original video pictures following the current original video picture according to the de-distorted image blocks corresponding to the distorted image blocks in the distorted picture, to obtain the video bitstream.
Optionally, the filtering module is further configured to:
when the data type is used for representing data filtered by a filter, inputting the target image block into the filter for filtering processing to obtain a distortion-removed image block corresponding to the target image block; or,
and when the data type is used for representing the target image block, determining the target image block as a distortion-removed image block corresponding to the target image block.
In a seventh aspect, an embodiment of the present application provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the method steps provided in the first aspect or any optional manner of the first aspect, or implements the method steps provided in the fifth aspect or any optional manner of the fifth aspect.
In an eighth aspect, the present application provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the method steps provided in the second aspect or any optional manner of the second aspect.
In a ninth aspect, an embodiment of the present application provides a coding and decoding system, which includes the video encoding apparatus provided in the first aspect and the video decoding apparatus provided in the second aspect; or,
the system comprises the video encoding device provided by the sixth aspect and the video decoding device provided by the second aspect.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
the first distortion removing image block corresponding to the distortion image block of the distortion image is obtained after the distortion image is filtered, and then the image block is selected from the distortion image block and the first distortion removing image block corresponding to the distortion image block to be used as the image block obtained through final filtering, so that not only is the filtering performance improved, but also the distortion removing performance in the video coding process is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a flowchart of a video encoding method according to an embodiment of the present application;
fig. 2-1 is a flowchart of another video encoding method provided by an embodiment of the present application;
fig. 2-2 is a block diagram of a video coding system according to an embodiment of the present disclosure;
fig. 2-3 is a block diagram of another video coding system provided in an embodiment of the present application;
fig. 2-4 is a first schematic diagram of a side information component provided by an embodiment of the present application;
fig. 2-5 is a second schematic diagram of a side information component provided by an embodiment of the present application;
fig. 2-6 is a system architecture diagram of the solution provided by an embodiment of the present application;
fig. 2-7 is a data flow diagram of the technical solution provided by an embodiment of the present application;
fig. 2-8 is a schematic diagram of obtaining color components of a distorted image according to an embodiment of the present application;
fig. 2-9 is a flowchart of a method for de-distorting a distorted image according to an embodiment of the present application;
fig. 2-10 is a flowchart of a convolutional neural network model training method provided by an embodiment of the present application;
fig. 3 is a flowchart of a video decoding method according to an embodiment of the present application;
fig. 4-1 is a flowchart of another video decoding method provided by an embodiment of the present application;
fig. 4-2 is a block diagram of a video decoding system according to an embodiment of the present disclosure;
fig. 4-3 is a block diagram of another video decoding system provided in the embodiments of the present application;
fig. 5 is a schematic structural diagram of a video encoding apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a video decoding apparatus according to an embodiment of the present application;
fig. 7 is a flowchart of another video encoding method provided in an embodiment of the present application;
fig. 8 is a flowchart of another video encoding method provided in an embodiment of the present application;
fig. 9 is a block diagram of another video encoding apparatus according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a video encoding and decoding system according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of an apparatus according to an embodiment of the present application.
With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
Referring to fig. 1, an embodiment of the present application provides a video encoding method, where the method includes:
step 101: and acquiring a distorted picture, wherein the distorted picture is generated when the current original video picture is coded.
The encoding includes performing prediction, transformation, quantization and other processes on a current original video picture to obtain prediction data and residual information, performing entropy encoding according to the prediction data, the residual information and the like to obtain a video bit stream, and performing reconstruction processing according to the prediction data and the residual information to obtain a reconstructed picture. The distorted picture is the reconstructed picture or a picture obtained by filtering the reconstructed picture.
Step 102: and generating a side information component corresponding to the distorted picture, wherein the side information component represents the distortion characteristics of the distorted picture relative to the current original video picture.
Step 103: and inputting the distorted picture and the side information component into a convolutional neural network model for filtering to obtain a first distortion removing image block corresponding to the distorted image block included in the distorted picture.
The convolutional neural network model is obtained by training based on a preset training set, wherein the preset training set comprises an original sample picture, a plurality of distorted pictures corresponding to the original sample picture and side information components corresponding to each distorted picture.
Step 104: and selecting an image block from an image block set corresponding to a distorted image block as the target de-distorted image block corresponding to the distorted image block, wherein the image block set includes the first de-distorted image block corresponding to the distorted image block and/or the distorted image block itself.
In the selection, the original image block corresponding to the distorted image block may be used as a reference, and the image block with the smallest difference from the original image block may be selected from the image block set as the target de-distorted image block. Or,
one image block may be selected from the image block set as the target de-distorted image block according to the encoding information of each encoding unit included in the distorted image block. The encoding information of an encoding unit reflects the original image information corresponding to that encoding unit in the original video picture, so an image block with a small difference from the original image block can also be selected as the target de-distorted image block according to the encoding information. A sketch of the reference-based selection is given below.
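For illustration, a minimal Python sketch of the reference-based selection, assuming the mean squared error is used as the difference value (the text above only requires a difference measure, so MSE is an assumption):

    import numpy as np

    def select_target_block(candidates, original_block):
        """Pick the candidate with the smallest difference from the original block.

        `candidates` is the image block set (the distorted image block, the
        first de-distorted image block from the CNN, and any second
        de-distorted image blocks from other filters).  Mean squared error
        is used as the difference value here, which is an assumption.
        """
        errors = [np.mean((c.astype(np.float64) - original_block) ** 2)
                  for c in candidates]
        best = int(np.argmin(errors))
        return best, candidates[best]

    # Usage with toy 4x4 blocks: candidates with different noise levels.
    rng = np.random.default_rng(0)
    orig = rng.integers(0, 256, (4, 4)).astype(np.float64)
    cands = [orig + rng.normal(0, s, (4, 4)) for s in (8.0, 2.0, 5.0)]
    idx, target = select_target_block(cands, orig)
    print("selected candidate:", idx)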
Step 105: and encoding the original video pictures following the current original video picture according to the target de-distorted image blocks corresponding to the distorted image blocks, to obtain a video bitstream.
In practical implementation, the target de-distorted image blocks corresponding to the distorted image blocks can be combined into one frame of reference picture; when this reference picture is selected for encoding an original video picture that follows the current original video picture, that original video picture can be encoded according to the reference picture to obtain the video bitstream. A sketch of the block reassembly follows.
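A minimal sketch of combining the selected target de-distorted image blocks back into one reference picture, assuming a single color plane and blocks addressed by their top-left pixel position:

    import numpy as np

    def assemble_reference_picture(blocks, height, width):
        """Combine target de-distorted image blocks into one reference picture.

        `blocks` maps the (top, left) pixel position of each distorted image
        block to its selected target de-distorted block; block sizes may vary.
        """
        picture = np.zeros((height, width), dtype=np.float64)
        for (top, left), block in blocks.items():
            h, w = block.shape
            picture[top:top + h, left:left + w] = block
        return picture

    # Usage: two 2x2 blocks tiling a 2x4 picture.
    ref = assemble_reference_picture(
        {(0, 0): np.ones((2, 2)), (0, 2): np.zeros((2, 2))}, 2, 4)
    print(ref)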
In the embodiments of the present application, the distorted picture is filtered to obtain a first de-distorted image block corresponding to each distorted image block of the distorted picture, and then the image block with the smallest difference from the original image is selected from the distorted image block and its corresponding first de-distorted image block as the final filtered image block, so that the filtering performance is improved and the de-distortion performance of the video encoding process is improved.
For the video coding method shown in fig. 1, referring to fig. 2-1, a detailed implementation process of the method may include:
step 201: and acquiring a distorted picture generated in the video coding process.
During the video encoding process, a reconstructed picture may be generated, and the distorted picture may be the reconstructed picture or may be a picture obtained by filtering the reconstructed picture.
Referring to the structural schematic diagram of the video coding system shown in fig. 2-2, the video coding system includes a prediction module, an adder, a transformation unit, a quantization unit, an entropy encoder, an inverse quantization unit, an inverse transformation unit, a reconstruction unit, a CNN (convolutional neural network model), and a buffer.
The video coding system comprises the following coding processes: and the prediction module predicts the input current original video picture according to the reference picture in the buffer to obtain prediction data, and inputs the prediction data into the adder, the entropy coder and the reconstruction unit. The prediction module comprises an intra-frame prediction unit, a motion estimation and motion compensation unit and a switch. The intra-frame prediction unit can perform intra-frame prediction on the current original video picture to obtain intra-frame prediction data, the motion estimation and motion compensation unit performs inter-frame prediction on the current original video picture according to the reference picture cached in the buffer to obtain inter-frame prediction data, and the switch selects to output the intra-frame prediction data or the inter-frame prediction data to the adder and the reconstruction unit.
The adder generates prediction error information according to the prediction data and the current original video picture, and the transformation unit transforms the prediction error information and outputs the transformed prediction error information to the quantization unit; the quantization unit quantizes the transformed prediction error information according to the quantization parameter to obtain residual information and outputs the residual information to the entropy encoder and the inverse quantization unit; the entropy encoder encodes the residual information and the preset data to form a video bitstream, and the video bitstream may include the coding information of each coding unit in the original video picture.
Meanwhile, the inverse quantization unit and the inverse transformation unit respectively perform inverse quantization and inverse transformation processing on the residual error information to obtain prediction error information, and the prediction error information is input into the reconstruction unit; a reconstruction unit generates a reconstructed picture from the prediction error information and the prediction data. Accordingly, in this step, the reconstructed picture generated by the reconstruction unit may be acquired and taken as a distorted picture.
Optionally, referring to fig. 2-3, a filter may be further connected in series between the convolutional neural network model and the reconstruction unit; this filter further filters the reconstructed picture generated by the reconstruction unit and outputs the filtered reconstructed picture. Accordingly, in this step, the filtered reconstructed picture may be obtained and taken as the distorted picture.
Step 202: and dividing the distorted picture to obtain a plurality of distorted image blocks included in the distorted picture.
In the embodiment of the application, the distorted picture is divided into a plurality of distorted image blocks, and then each image block is filtered.
In the embodiments of the present application, a pre-trained convolutional neural network model and at least one filter are provided; the convolutional neural network model is used for filtering the distorted image blocks, and each of the at least one filter is also used for filtering the distorted image blocks. The at least one filter may be a convolutional neural network filter, an adaptive loop filter (ALF), or the like.
Step 202 is optional; it may be skipped, in which case the entire distorted picture is filtered directly as one frame. A sketch of the block division of step 202 is given below.
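A minimal Python sketch of the block division of step 202, assuming non-overlapping square blocks (the patent does not fix a block size, so 64 below is only an example):

    import numpy as np

    def split_into_blocks(picture, block_size):
        """Divide a distorted picture into non-overlapping blocks.

        Yields ((top, left), block).  Edge blocks are smaller when the
        picture dimensions are not multiples of block_size.
        """
        height, width = picture.shape[:2]
        for top in range(0, height, block_size):
            for left in range(0, width, block_size):
                yield (top, left), picture[top:top + block_size,
                                           left:left + block_size]

    # Usage: a 130x130 picture splits into a 3x3 grid of blocks.
    frame = np.zeros((130, 130))
    print(sum(1 for _ in split_into_blocks(frame, 64)))  # 9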
Step 203: and generating a side information component corresponding to the target image block, wherein the side information component represents the distortion characteristics of the target image block relative to the corresponding original image block in the original video picture, and the target image block is a distorted image block in the distorted picture.
When the convolutional neural network model is used for filtering the target image block, a side information component corresponding to the target image block needs to be used.
The side information component corresponding to the target image block may be obtained according to the quantization parameter or the coding information of each coding unit included in the target image block.
If the whole frame of distorted picture is filtered, a side information component corresponding to the distorted picture is generated in the step, and the side information component represents the distortion characteristics of the distorted picture relative to the original video picture.
The process of generating the side information component corresponding to the distorted picture is the same as the process of generating the side information component corresponding to the target image block, and only the side information component corresponding to the target image block is described next.
The side information component, which represents the distortion characteristics of the target image block relative to the corresponding original image block in the original picture, is an expression of the distortion introduced by the image processing process.
In practical applications, the distortion characteristics may include at least one of the following:
distortion degree, distortion position, and distortion type.
First, the side information component may represent the degree of distortion of the target image block relative to the original image block.
For example, mainstream video codecs generally divide a picture into multiple non-overlapping coding units of variable size, which are predictively coded and quantized to different degrees. Distortion is generally not consistent across coding units, and abrupt pixel changes typically occur at coding unit boundaries, so the boundary coordinates of the coding units can serve a priori as side information representing the distortion position.
The side information component may also indicate a distortion type of the distorted target image block relative to the original image, for example, in a video coding and decoding application, different prediction modes may be adopted by different coding units in an image, and the different prediction modes may affect distribution of residual data, thereby affecting characteristics of the distorted target image block, and thus, the prediction mode of the coding unit may be used as side information representing the distortion type.
Optionally, the side information component may be a combination of several of the above, or there may be multiple side information components of a single kind. For example, after image processing, the distortion degree of a target image block may be represented by one parameter with a single physical meaning, or by two parameters with different physical meanings; accordingly, one or more side information components, each representing the distortion degree, may be used as input data according to actual needs.
As shown in fig. 2-4, the matrix structure of the side information component is the same as the matrix structure of the color component of the distorted target image block, where the coordinates [0,0], [0,1] represent the distortion position, and the element value 1 of the matrix represents the distortion degree, i.e. the side information component can represent both the distortion degree and the distortion position.
As shown in fig. 2-5, the coordinates [0,0], [0,1], [2,0], [2,4] represent the distortion position, and the values of the elements 1, 2 of the matrix represent the distortion type, i.e., the side information component can represent both the distortion type and the distortion position.
Moreover, the solution provided by the embodiments of the present application may include the two side information components illustrated in figs. 2-4 and 2-5, respectively; a sketch of both matrices is given below.
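The two example matrices can be written out directly; a sketch with the positions and element values taken from the description above (the 3x5 shape and the assignment of type values to positions are assumptions, since the figures themselves are not reproduced here):

    import numpy as np

    # Side information component of fig. 2-4: element value 1 marks the
    # distortion degree at the distortion positions [0,0] and [0,1]; the
    # matrix has the same shape as the color component of the image block.
    degree_map = np.zeros((3, 5), dtype=np.int32)
    degree_map[0, 0] = 1
    degree_map[0, 1] = 1

    # Side information component of fig. 2-5: element values 1 and 2 encode
    # the distortion type at positions [0,0], [0,1], [2,0] and [2,4]; which
    # position carries which type value is an assumption here.
    type_map = np.zeros((3, 5), dtype=np.int32)
    type_map[0, 0] = 1
    type_map[0, 1] = 1
    type_map[2, 0] = 2
    type_map[2, 4] = 2

    print(degree_map)
    print(type_map)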
Further, when the color components of the distorted image include a plurality of types, the side information components may include side information components respectively corresponding to each of the color components of the distorted image, according to the practical application and requirements of the scheme.
The solution provided by the embodiment of the present application can be applied to various currently known practical application scenarios, for example, an application scenario in which super-resolution processing is performed on an image, and the present invention is not limited herein.
This step can be implemented in the following two sub-steps:
Step 2031, determining a distortion degree value of each pixel point in the target image block for the target image block to be processed.
In practical application, after the original image is subjected to image processing in different manners, the physical parameters representing the distortion degree may also be different, and therefore, in this step, the corresponding distortion degree value capable of accurately representing the distortion degree of the pixel point may be determined based on different image processing manners, and specifically may be as follows:
the first mode is as follows: for a target image block obtained through encoding and decoding, the quantization parameter of each coding unit in the target image block is known, that is, the quantization parameter of each coding unit in the target image block can be obtained, and the quantization parameter of the coding unit where each pixel point of the target image block is located is determined as the distortion degree value of each pixel point of the target image block;
the second mode is as follows: for a target image block obtained through encoding and decoding, the encoding information of each encoding unit in the target image block is known, that is, the encoding information of each encoding unit in the target image block can be obtained, the quantization parameter of each encoding unit is calculated according to the encoding information of each encoding unit in the target image block, and the quantization parameter of the encoding unit where each pixel point of the target image block is located is determined as the distortion degree value of each pixel point of the target image block.
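A minimal sketch of the first mode above: every pixel takes the quantization parameter of the coding unit it lies in as its distortion degree value (the (top, left, height, width, qp) tuple layout is an assumption for illustration):

    import numpy as np

    def distortion_degree_map(coding_units, height, width):
        """Per-pixel distortion degree values from per-coding-unit QPs.

        `coding_units` is a list of (top, left, h, w, qp) tuples; every
        pixel inside a coding unit gets that unit's quantization parameter
        as its distortion degree value.
        """
        degree = np.zeros((height, width), dtype=np.int32)
        for top, left, h, w, qp in coding_units:
            degree[top:top + h, left:left + w] = qp
        return degree

    # Usage: two coding units covering an 8x8 target image block.
    cus = [(0, 0, 8, 4, 32), (0, 4, 8, 4, 27)]
    print(distortion_degree_map(cus, 8, 8)[0])  # [32 32 32 32 27 27 27 27]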
Step 2032, based on the position of each pixel point in the target image block, using the obtained distortion degree value of each pixel point to generate a side information component corresponding to the target image block, wherein each component value included in the side information component corresponds to a pixel point at the same position on the target image block.
Because each component value included in the side information component corresponds to a pixel point at the same position on the target image block, the side information component has the same structure as the color component of the distorted image of the target image block, namely, the matrix representing the side information component and the matrix representing the color component of the target image block are of the same type.
In this step, the distortion degree value of each pixel point obtained can be determined as the component value of the same position of the pixel point in the side information component corresponding to the target image block based on the position of each pixel point in the target image block, that is, the distortion degree value of each pixel point is directly determined as the component value corresponding to the pixel point.
When the pixel value range of the target image block is different from the value range of the distortion degree value of the pixel point, the obtained distortion degree value of each pixel point can be subjected to standardization processing based on the pixel value range of the target image block to obtain a processed distortion degree value, and the value range of the processed distortion degree value is the same as the pixel value range;
and then determining the processed distortion degree value of each pixel point as a component value of the same position of the pixel point in the side information component corresponding to the target image block based on the position of each pixel point in the target image block.
In this step, the distortion degree value of a pixel point may be normalized by the following formula:
norm(x) = (x - QP_MIN) / (QP_MAX - QP_MIN) * (PIXEL_MAX - PIXEL_MIN) + PIXEL_MIN;
where norm(x) is the processed distortion degree value obtained after normalization, x is the distortion degree value of the pixel point, [PIXEL_MIN, PIXEL_MAX] is the pixel value range of the target image block, and [QP_MIN, QP_MAX] is the value range of the distortion degree values of the pixel points.
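As a concrete check of the normalization, a short Python sketch; the 8-bit pixel range and the H.265-style QP range [0, 51] are assumptions, not values fixed by the text:

    # Assumed ranges: 8-bit pixels and an H.265-style QP range.
    PIXEL_MIN, PIXEL_MAX = 0.0, 255.0
    QP_MIN, QP_MAX = 0.0, 51.0

    def norm(x):
        """Linearly map a distortion degree value into the pixel value range."""
        return (x - QP_MIN) / (QP_MAX - QP_MIN) * (PIXEL_MAX - PIXEL_MIN) + PIXEL_MIN

    print(norm(26.0))  # 130.0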
Through the above two sub-steps, i.e., the process of generating the side information component of the target image block, a side information guide map corresponding to the target image block may be generated. The side information guide map indicates the distortion degree of the target image block through the side information component, and has the same height and the same width as the target image block.
Step 204: and inputting the target image block and the side information component into a convolutional neural network model for filtering to obtain a first distortion-removed image block.
Optionally, referring to fig. 2-6, the convolutional neural network model comprises: a side information component generating module 11, a convolutional neural network 12 and a network training module 13;
the convolutional neural network 12 may include the following three layers:
an input layer processing unit 121, configured to receive an input of a convolutional neural network, where the input includes a color component of a distorted image of a target image block and a side information component of the target image block; performing a first layer of convolution filtering processing on the input data;
a hidden layer processing unit 122 for performing convolution filtering processing of at least one layer on the output data of the input layer processing unit 121;
and an output layer processing unit 123, configured to perform the last layer of convolution filtering processing on the output data of the hidden layer processing unit 122 and output the result as the de-distorted image color component, which is used to generate the de-distorted image block.
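A minimal PyTorch sketch of this three-part structure; the layer widths, kernel sizes and layer count are assumptions, and only the overall shape (concatenated color and side information components in, de-distorted color component out) follows the description:

    import torch
    import torch.nn as nn

    class DeDistortCNN(nn.Module):
        """Input layer, hidden layer(s) and output layer, as described above."""

        def __init__(self, c_y=1, c_m=1, n1=64, hidden_layers=1):
            super().__init__()
            # Input layer: first convolution over the c_y + c_m input channels.
            self.input_layer = nn.Sequential(
                nn.Conv2d(c_y + c_m, n1, kernel_size=5, padding=2), nn.ReLU())
            # Hidden layer: at least one further convolutional layer.
            self.hidden = nn.Sequential(*[
                nn.Sequential(nn.Conv2d(n1, n1, kernel_size=3, padding=1),
                              nn.ReLU())
                for _ in range(hidden_layers)])
            # Output layer: last convolution producing the de-distorted
            # color component.
            self.output_layer = nn.Conv2d(n1, c_y, kernel_size=3, padding=1)

        def forward(self, color, side_info):
            x = torch.cat([color, side_info], dim=1)  # combine along channels
            return self.output_layer(self.hidden(self.input_layer(x)))

    # Usage: one 64x64 Y block plus its side information guide map.
    y = torch.rand(1, 1, 64, 64)
    m = torch.rand(1, 1, 64, 64)
    print(DeDistortCNN()(y, m).shape)  # torch.Size([1, 1, 64, 64])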
Figs. 2-7 show the data flow for implementing the solution: the distorted image color component of the target image block and the side information component of the target image block are input as input data into a pre-trained convolutional neural network model; the convolutional neural network model may be represented by a convolutional neural network with a preset structure and a configured network parameter set, and the input data undergoes convolution filtering processing in the input layer, the hidden layer and the output layer to obtain the de-distorted image block.
As input data of the convolutional neural network model, according to actual needs, one or more side information components may be included, and one or more distorted image color components may also be included, for example, at least one of a Y color component, a U color component, and a V color component.
For example, in some image processing, there may be a distortion condition only for one color component of all color components, and then only the color component of the distorted image may be used as input data in the distortion removal processing, or if there is a distortion condition for two color components, both the two color components of the distorted image may be used as input data, and accordingly, the corresponding color components of the distorted image are both output.
When obtaining the distorted-image color components, the values of the required one or more color components can be extracted from the stored data of each pixel point as needed, thereby obtaining the distorted-image color components of the distorted image.
As shown in fig. 2-8, taking the YUV color space as an example, the value of the Y color component of each pixel point is extracted, thereby obtaining the Y color component of the distorted image.
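A minimal sketch of this extraction, assuming the distorted picture is stored as a planar H×W×3 YUV array (real pixel layouts such as YUV 4:2:0 differ):

```python
import numpy as np

def extract_y(yuv: np.ndarray) -> np.ndarray:
    # Take the Y value of every pixel point to form the distorted-image Y component.
    return yuv[:, :, 0]
```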
Referring to fig. 2-9, this step may specifically include the following processing steps:
In the embodiment of the present invention, the scheme is described by taking a convolutional neural network model whose structure comprises an input layer, a hidden layer and an output layer as an example.
Step 61, using the distorted image color component of the target image block and the generated side information component as input data of a pre-established convolutional neural network model, and performing a first layer of convolutional filtering processing by the input layer, which may specifically be as follows:
In the convolutional neural network model, the input data may be fed into the network through respective channels. In this step, the target image block color component Y of cy channels and the side information component M of cm channels are merged in the channel dimension to form input data I of cy+cm channels, and multidimensional convolution filtering and nonlinear mapping are performed on the input data I by the following formula to generate n1 image blocks represented in sparse form:
F1(I)=g(W1*I+B1);
wherein F1(I) is the output of the input layer, I is the input of the convolutional layer of the input layer, * is the convolution operation, W1 is the weight coefficients of the convolutional layer filter bank of the input layer, B1 is the offset coefficients of the convolutional layer filter bank of the input layer, and g() is a nonlinear mapping function.
Wherein W1 corresponds to n1 convolution filters, that is, n1 convolution filters act on the input of the convolutional layer of the input layer and output n1 image blocks; the size of the convolution kernel of each convolution filter is c1×f1×f1, where c1 is the number of input channels and f1 is the spatial size of each convolution kernel.
In a specific embodiment, the parameters of the input layer may be: c1 = 2, f1 = 5, n1 = 64, and the ReLU (rectified linear unit) function is used as g(), which is expressed as:
g(x)=max(0,x);
the input layer convolution processing expression in this embodiment is:
F1(I)=max(0,W1*I+B1);
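For illustration, the channel merging and input-layer convolution described above might look as follows in PyTorch, assuming cy = 1 (a single Y component) and cm = 1 (a single side information component), so that c1 = 2, f1 = 5 and n1 = 64 as in the embodiment above:

```python
import torch
import torch.nn.functional as F

# Input layer: c1 = 2 input channels, n1 = 64 filters of spatial size f1 = 5.
conv1 = torch.nn.Conv2d(in_channels=2, out_channels=64, kernel_size=5, padding=2)

def input_layer(y: torch.Tensor, m: torch.Tensor) -> torch.Tensor:
    # Merge the color component Y and the side information component M
    # in the channel dimension to form the input data I.
    i = torch.cat([y, m], dim=1)   # shape: (batch, cy + cm, H, W)
    return F.relu(conv1(i))        # F1(I) = max(0, W1 * I + B1)
```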
Step 62: the hidden layer performs further high-dimensional mapping on the sparsely represented image blocks F1(I) output by the input layer.
In the embodiment of the present invention, the number of convolutional layers, the connection manner of the convolutional layers, the attribute of the convolutional layers, and the like included in the hidden layer are not limited, and various structures known at present may be adopted, but the hidden layer includes at least 1 convolutional layer.
For example, if the hidden layer comprises N−1 (N ≥ 2) convolutional layers, the hidden layer processing is represented by the following formula:
Fi(I) = g(Wi * Fi−1(I) + Bi), i ∈ {2, 3, …, N};
wherein Fi(I) represents the output of the i-th convolutional layer in the convolutional neural network, * is the convolution operation, Wi is the weight coefficients of the i-th convolutional layer filter bank, Bi is the offset coefficients of the i-th convolutional layer filter bank, and g() is a nonlinear mapping function.
Wherein Wi corresponds to ni convolution filters, that is, ni convolution filters act on the input of the i-th convolutional layer and output ni image blocks; the size of the convolution kernel of each convolution filter is ci×fi×fi, where ci is the number of input channels and fi is the spatial size of each convolution kernel.
In one particular embodiment, the hidden layer may comprise 1 convolutional layer whose convolution filter parameters are: c2 = 64, f2 = 1, n2 = 32, and the ReLU (rectified linear unit) function is used as g(); the convolution processing expression of the hidden layer in this embodiment is:
F2(I)=max(0,W2*F1(I)+B2);
Step 63: the output layer aggregates the high-dimensional image blocks FN(I) output by the hidden layer, and outputs the de-distorted image color component, which is used for generating the de-distorted image block.
In the embodiment of the present invention, the structure of the output layer is not limited, and the output layer may be a Residual Learning structure, a Direct Learning structure, or another structure.
The process using the Residual Learning structure is as follows:
and performing convolution operation on the output of the hidden layer to obtain a compensation residual, and adding the compensation residual to the input distorted-image color component to obtain the de-distorted image color component, that is, the de-distorted image block. The output layer processing can be represented by the following formula:
F(I)=WN+1*FN(I)+BN+1+Y;
wherein F(I) is the output of the output layer, FN(I) is the output of the hidden layer, * is the convolution operation, WN+1 is the weight coefficients of the convolutional layer filter bank of the output layer, BN+1 is the offset coefficients of the convolutional layer filter bank of the output layer, and Y is the distorted-image color component that is to be de-distorted and has not undergone convolution filtering processing.
Wherein WN+1 corresponds to nN+1 convolution filters, that is, nN+1 convolution filters act on the input of the (N+1)-th convolutional layer and output nN+1 image blocks; nN+1 is the number of output de-distorted image color components, which is generally equal to the number of input distorted-image color components, and if only one de-distorted image color component is output, nN+1 generally takes the value 1. The size of the convolution kernel of each convolution filter is cN+1×fN+1×fN+1, where cN+1 is the number of input channels and fN+1 is the spatial size of each convolution kernel.
The process using the Direct Learning structure is as follows:
and after the convolution operation is performed on the output of the hidden layer, the de-distorted image color component is output directly, giving the de-distorted image block. The output layer processing can be represented by the following formula:
F(I) = WN+1 * FN(I) + BN+1;
wherein F(I) is the output of the output layer, FN(I) is the output of the hidden layer, * is the convolution operation, WN+1 is the weight coefficients of the convolutional layer filter bank of the output layer, and BN+1 is the offset coefficients of the convolutional layer filter bank of the output layer.
Wherein WN+1 corresponds to nN+1 convolution filters, that is, nN+1 convolution filters act on the input of the (N+1)-th convolutional layer and output nN+1 image blocks; nN+1 is the number of output de-distorted image color components, which is generally equal to the number of input distorted-image color components, and if only one de-distorted image color component is output, nN+1 generally takes the value 1. The size of the convolution kernel of each convolution filter is cN+1×fN+1×fN+1, where cN+1 is the number of input channels and fN+1 is the spatial size of each convolution kernel.
In a specific embodiment, the output layer adopts the Residual Learning structure and includes 1 convolutional layer, whose convolution filter parameters are: c3 = 32, f3 = 3, n3 = 1; the convolution processing expression of the output layer in this embodiment is:
F(I) = W3 * F2(I) + B3 + Y.
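Putting the three layers of these specific embodiments together, a minimal PyTorch sketch of the network might look as follows; the layer parameters (c1 = 2, f1 = 5, n1 = 64; c2 = 64, f2 = 1, n2 = 32; c3 = 32, f3 = 3, n3 = 1) are taken from the embodiments above, while the class and variable names are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeDistortCNN(nn.Module):
    """Three-layer de-distortion network with a Residual Learning output layer."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(2, 64, kernel_size=5, padding=2)  # input layer
        self.conv2 = nn.Conv2d(64, 32, kernel_size=1)            # hidden layer
        self.conv3 = nn.Conv2d(32, 1, kernel_size=3, padding=1)  # output layer

    def forward(self, y: torch.Tensor, m: torch.Tensor) -> torch.Tensor:
        i = torch.cat([y, m], dim=1)   # merge Y and side information channels
        f1 = F.relu(self.conv1(i))     # F1(I) = max(0, W1 * I + B1)
        f2 = F.relu(self.conv2(f1))    # F2(I) = max(0, W2 * F1(I) + B2)
        return self.conv3(f2) + y      # F(I) = W3 * F2(I) + B3 + Y
```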
In the solution provided by the embodiment of the present invention, a convolutional neural network model training method is further provided, as shown in fig. 2-10, which specifically includes the following processing steps:
step 71, obtaining a preset training set, where the preset training set includes an original sample image, color components of distorted images of multiple distorted images corresponding to the original sample image, and side information components corresponding to each distorted image, where the side information components corresponding to the distorted images represent distortion characteristics of the distorted images relative to the original sample image. The plurality of distorted images differ in distortion characteristics.
In this step, an original sample image (i.e., an undistorted natural image) may be subjected in advance to image processing of different distortion degrees to obtain the corresponding distorted images, and a corresponding side information component is generated for each distorted image according to the steps in the above de-distortion method, so that each original sample image, its corresponding distorted image, and the corresponding side information component form an image pair, and these image pairs form the preset training set Ω.
Further, the training set may include an original sample image, and the image processing is performed on the original sample image to obtain a plurality of distorted images with different distortion characteristics and a side information component corresponding to each distorted image;
the training set may also include a plurality of original sample images, and the image processing is performed on each original sample image to obtain a plurality of distorted images with different distortion characteristics and a side information component corresponding to each distorted image.
Step 72: for the convolutional neural network CNN with the preset structure, initialize the parameters in its network parameter set; the initialized parameter set may be denoted by Θ1, and the initialized parameters can be set according to actual needs and experience.
In this step, the training-related high-level parameters, such as the learning rate and the gradient descent algorithm, may also be set reasonably, and specifically, various manners in the prior art may be adopted, which are not described in detail herein.
Step 73, forward calculation is performed, specifically as follows:
and inputting the distorted image color component and the corresponding side information component of each distorted image in the preset training set into a convolutional neural network with a preset structure for convolutional filtering processing to obtain a de-distorted image color component corresponding to the distorted image.
In this step, specifically, for the preset training set Ω, forward calculation of the convolutional neural network CNN under the parameter set Θi is performed to obtain the output F(Y) of the convolutional neural network, i.e., the de-distorted image color component corresponding to each distorted image.
When this step is entered for the first time, the current parameter set is Θ1; when this step is subsequently performed again, the current parameter set Θi is obtained by adjusting the last used parameter set Θi−1, as described below.
Step 74, determining loss values for the plurality of original sample images based on the original image color components of the plurality of original sample images and the resulting de-distorted image color components.
Specifically, the mean squared error (MSE) formula may be used as the loss function to obtain the loss value L(Θi), see the following formula:
L(Θi) = (1/H) · Σ (h = 1 to H) ‖F(Ih; Θi) − Xh‖²
wherein H represents the number of image pairs selected from the preset training set in a single training pass, Ih represents the input data corresponding to the h-th distorted image, combined from the side information component and the distorted-image color component, F(Ih; Θi) represents the de-distorted image color component obtained by forward calculation of the convolutional neural network CNN for the h-th distorted image under the parameter set Θi, Xh represents the original-image color component corresponding to the h-th distorted image, and i is the count of the number of forward calculations performed so far.
And step 75, determining whether the convolutional neural network adopting the preset structure of the current parameter set is converged or not based on the loss value, if not, entering step 76, and if so, entering step 77.
Specifically, convergence may be determined when the loss value is less than a preset loss value threshold; or when the difference between the loss value obtained by the current calculation and the loss value obtained by the previous calculation is smaller than a preset change threshold, determining convergence, which is not limited herein.
Step 76, adjust the parameters in the current parameter set to obtain the adjusted parameter set, and then go to step 73 for the next forward calculation.
The parameters in the current parameter set may be specifically adjusted by using a back propagation algorithm.
Step 77: take the current parameter set as the output final parameter set Θfinal, and take the convolutional neural network of the preset structure that uses the final parameter set Θfinal as the trained convolutional neural network model.
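A schematic training loop for steps 71 to 77, under the assumptions of the network sketch above; `train_pairs` is a hypothetical iterable of (distorted Y, side information M, original X) tensor triples, and the optimizer, learning rate and convergence threshold are illustrative choices:

```python
import torch
import torch.nn as nn

def train(model: nn.Module, train_pairs, lr=1e-4, loss_eps=1e-6):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # step 72: initialize
    mse = nn.MSELoss()
    prev_loss = None
    for y, m, x in train_pairs:
        out = model(y, m)              # step 73: forward calculation
        loss = mse(out, x)             # step 74: loss value L(Θi)
        # Step 75: convergence test on the change of the loss value.
        if prev_loss is not None and abs(prev_loss - loss.item()) < loss_eps:
            break
        prev_loss = loss.item()
        optimizer.zero_grad()
        loss.backward()                # step 76: back propagation
        optimizer.step()               # adjust the current parameter set
    return model                       # step 77: network with the final parameter set
```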
Optionally, if the entire frame of distorted picture is filtered, the distorted picture and the side information component corresponding to the distorted picture may be input into the convolutional neural network model for convolution filtering processing to obtain a de-distorted picture, and the de-distorted picture is divided to obtain the first de-distorted image block corresponding to each distorted image block in the distorted picture.
Step 205: and inputting the target image block into at least one filter for filtering to obtain a second distortion-removed image block output by each filter.
If the whole frame of distorted picture is filtered, the distorted picture can be input into at least one filter for filtering processing to obtain a de-distorted picture output by each filter, and then the de-distorted picture output by each filter is divided to obtain a second de-distorted image block corresponding to a distorted image block included in the distorted picture filtered by each filter.
Step 206: and selecting an image block from the image block set as a target de-distorted image block corresponding to the target image block, wherein the image block set comprises each second de-distorted image block, each first de-distorted image block and each target image block.
In this step, the image blocks may be selected in the following two ways, respectively:
first, an image block is selected from an image block set according to an original image block corresponding to a target image block.
In implementation, the difference value between each image block in the image block set and the original image block corresponding to the target image block can be calculated respectively, and the image block with the smallest difference value from the original image block corresponding to the target image block is selected from the image block set.
The difference value may be, for example, the sum of squared differences (SSD) between the estimated values (the pixel values of the candidate image block) and the values being estimated (the pixel values of the original image block), or a similar measure.
Optionally, after the target undistorted image block is selected, according to the position of the target image block in the distorted picture, flag information for identifying the data type of the target undistorted image block may be filled in the filtering flag map.
The data type of the target undistorted image block may be data output by filtering of the convolutional neural network model, data output by a certain filter of the at least one filter, or the target image block.
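A sketch of this first selection method, with illustrative names: the candidate with the smallest sum of squared differences (SSD) against the original image block is picked, and its data type is recorded in the filtering flag map:

```python
import numpy as np

def select_block(candidates, data_types, original, flag_map, block_pos):
    # SSD between each candidate image block and the original image block.
    ssd = [np.sum((c.astype(np.int64) - original.astype(np.int64)) ** 2)
           for c in candidates]
    best = int(np.argmin(ssd))
    # Record the data type of the chosen block, e.g. 0 = CNN output,
    # 1..k = output of filter k, k+1 = unfiltered target image block.
    flag_map[block_pos] = data_types[best]
    return candidates[best]
```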
And secondly, selecting an image block from the image block set according to the coding information of each coding unit included in the target image block.
The video bit stream output by the entropy coder comprises the position and the coding information of each coding unit in the current original video picture.
Therefore, in implementation, the coding units included in the target image block can be determined according to the position of the target image block in the distorted picture and the position of each coding unit in the current original video picture; one image block is selected from the image block set according to the coding information of each coding unit included in the target image block.
The coding information of the coding units included in the target image block may be a prediction mode and/or a motion vector, etc.; one or more kinds of coding information are used to derive the selection result for the target image block. For example, if more than a preset first proportion of the coding units in the target image block are coded in an intra-frame coding mode, the first de-distorted image block filtered by the convolutional neural network model is selected; if more than a preset second proportion of the coding units in the target image block are coded in SKIP mode, the target image block itself is selected, wherein the second proportion is smaller than the first proportion; otherwise, a second de-distorted image block output by one of the filters is selected.
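The rule in this example might be sketched as follows; the proportions and mode labels are illustrative assumptions, not values fixed by the embodiment:

```python
def select_by_coding_info(cu_modes, first_ratio=0.5, second_ratio=0.3):
    # cu_modes: prediction mode of each coding unit covered by the target image block.
    intra = sum(m == "intra" for m in cu_modes) / len(cu_modes)
    skip = sum(m == "skip" for m in cu_modes) / len(cu_modes)
    if intra > first_ratio:
        return "cnn"         # first de-distorted image block (CNN output)
    if skip > second_ratio:
        return "unfiltered"  # keep the target image block itself
    return "filter"          # second de-distorted image block from a filter
```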
The target de-distorted image blocks corresponding to the distorted image blocks in the distorted picture are obtained according to the above steps 203 to 206. In this embodiment, an image block set corresponding to each distorted image block is obtained, where the image block set includes the distorted image block and the de-distorted image blocks obtained by filtering with different filters; then, according to the original image block corresponding to the distorted image block or the coding information of the coding units included in the distorted image block, the image block with the smallest difference from the original image block is selected from the image block set as the target de-distorted image block corresponding to the distorted image block, so that the filtering performance, the quality, and the de-distortion performance can be improved.
Step 207: and coding the original video picture to be coded according to the target distortion-removed image blocks corresponding to the distortion image blocks included in the distortion picture to obtain a video bit stream.
Specifically, according to the position of each distorted image block in the distorted picture, the target de-distorted image block corresponding to each distorted image block is filled into a blank reference picture, and the reference picture is cached in a buffer, so that when the reference picture is selected, an original video picture to be coded can be coded by using the reference picture to obtain a video bit stream. The original video picture to be coded refers to an original video picture that has not yet been coded, and may be an original video picture after the current original video picture.
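A minimal sketch of assembling the reference picture from the target de-distorted image blocks, assuming equally sized, non-overlapping blocks and illustrative names:

```python
import numpy as np

def build_reference_picture(blocks, positions, height, width):
    ref = np.zeros((height, width), dtype=np.uint8)  # blank reference picture
    for block, (top, left) in zip(blocks, positions):
        h, w = block.shape
        ref[top:top + h, left:left + w] = block      # fill by block position
    return ref
```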
In the embodiment of the application, for any distorted image block included in the distorted picture: the first de-distorted image block corresponding to the distorted image block is obtained after the target image block is filtered by the convolutional neural network model, the second de-distorted image blocks corresponding to the target image block output by the respective filters are obtained after the target image block is filtered by at least one filter, and then the image block with the smallest difference from the original image block corresponding to the target image block is selected from the target image block, the first de-distorted image block and the second de-distorted image blocks as the final filtered image block, so that not only is the filtering performance improved, but also the de-distortion performance in the video encoding process is improved. In addition, in the embodiment of the application, the side information component is added into the convolutional neural network model, which improves the generalization capability of the convolutional neural network model.
Referring to fig. 3, an embodiment of the present application provides a video decoding method, where the method includes:
step 301: and carrying out entropy decoding on the received video bit stream to obtain current entropy decoding data.
Step 302: and acquiring each distorted image block included in the distorted image, wherein the distorted image is generated when the current entropy decoding data is decoded.
Step 303: and determining the data type corresponding to the target image block according to the current entropy decoding data, wherein the target image block is a distorted image block in the distorted image.
Step 304: when the data type is used for representing the data filtered by the convolutional neural network model, a side information component corresponding to the target image block is generated.
Wherein the side information component represents a distortion characteristic of the target image block with respect to its corresponding original image block in an original video picture, the original video picture being a video picture to which the current entropy-decoded data corresponds.
Step 305: and inputting the target image block and the side information component into a convolutional neural network model for convolutional filtering processing to obtain a distortion-removed image block corresponding to the target image block.
The convolutional neural network model is obtained by training based on a preset training set, wherein the preset training set comprises an original sample picture, a plurality of distorted pictures corresponding to the original sample picture and side information components corresponding to each distorted picture.
Step 306: and decoding the subsequently received video bit stream according to the distortion-removed image blocks corresponding to the distortion image blocks included in the distortion picture.
In the embodiment of the application, the data type corresponding to the target image block is determined according to the current entropy decoding data, and the filter is selected according to the data type to filter the distorted image, so that the filtering performance is improved, and the distortion removal performance in the video decoding process is improved.
For the video decoding method shown in fig. 3, referring to fig. 4-1, a detailed implementation procedure of the method may include:
step 401: and carrying out entropy decoding on the received video bit stream to obtain current entropy decoding data.
Step 402: and obtaining each distorted image block included in the distorted image, wherein the distorted image is generated when the current entropy decoding data is decoded.
During the video decoding process, a reconstructed picture may be generated, and the distorted picture may be the reconstructed picture or may be a picture obtained by filtering the reconstructed picture.
Referring to the structural diagram of the video decoding system shown in fig. 4-2, the video decoding system includes a prediction module, an entropy decoder, an inverse quantization unit, an inverse transformation unit, a reconstruction unit, a CNN (convolutional neural network model), and a buffer.
The decoding process of the video decoding system is as follows: the received video bit stream is input into the entropy decoder, which decodes it to obtain entropy-decoded data, including the mode information, quantization parameters, residual information, coding information and/or filtering flag map of each coding unit included in the original video picture, etc.; the mode information is input into the prediction module, the quantization parameters are input into the convolutional neural network model, and the residual information is input into the inverse quantization unit. The prediction module performs prediction on the input mode information according to the reference picture in the buffer to obtain prediction data, and inputs the prediction data into the reconstruction unit. The prediction module includes an intra prediction unit, a motion compensation unit and a switch, and the mode information may include intra mode information and inter mode information: the intra prediction unit predicts from the intra mode information to obtain intra prediction data, the motion compensation unit performs inter prediction on the inter mode information according to the reference picture cached in the buffer to obtain inter prediction data, and the switch selects whether to output the intra prediction data or the inter prediction data to the reconstruction unit.
The inverse quantization unit and the inverse transformation unit respectively perform inverse quantization and inverse transformation processing on the residual error information to obtain prediction error information, and the prediction error information is input into the reconstruction unit; a reconstruction unit generates a reconstructed picture from the prediction error information and the prediction data. Accordingly, in this step, the reconstructed picture generated by the reconstruction unit may be acquired and taken as a distorted picture.
Optionally, referring to fig. 4-3, a filter may be further connected in series between the convolutional neural network model and the reconstruction unit, and the filter may further filter the reconstructed picture generated by the reconstruction unit and output the filtered reconstructed picture. Accordingly, in this step, a filtered reconstructed picture may be obtained and the filtered reconstructed picture may be taken as a distorted picture.
Step 403: and determining the data type corresponding to the target image block according to the current entropy decoding data, wherein the target image block is a distorted image block in the distorted image.
The current entropy decoding data comprises a filtering mark map, the filtering mark map comprises mark information corresponding to each distorted image block in the distorted image, and the mark information corresponding to the distorted image block is used for identifying the data type corresponding to the distorted image block.
The method comprises the following steps: reading mark information corresponding to the target image block from the filtering mark image according to the position of the target image block in the distorted image; and determining the data type corresponding to the target image block according to the mark information. Or,
the current entropy-decoded data includes the location and coding information of each coding unit in the original video picture. The method comprises the following steps: determining each coding unit included by the target image block according to the position of the target image block in the distorted image and the positions of each coding unit in the original video image; and determining the data type corresponding to the target image block according to the coding information of each coding unit included in the target image block.
The encoding information of the coding unit included in the target image block may be a prediction mode and/or a motion vector, etc. For example, if the coding units exceeding a preset first proportion in the target image block adopt an intra-frame coding mode, determining the data type to be data filtered by a convolutional neural network model; if the coding unit exceeding the preset second proportion in the target image block adopts a SKIP mode (SKIP) for coding, determining the data type as data filtered by a filter, wherein the second proportion is smaller than the first proportion; otherwise, determining the data type as the target image block.
Step 404: when the data type is used for representing the data filtered by the convolutional neural network model, a side information component corresponding to the target image block is generated.
Wherein the side information component represents a distortion characteristic of the target image block with respect to its corresponding original image block in an original video picture, the original video picture being a video picture to which the current entropy-decoded data corresponds.
The detailed implementation process of generating the side information component corresponding to the target image block can refer to the relevant content in step 203 in the embodiment described in fig. 2-1, and will not be described in detail here.
Step 405: and inputting the target image block and the side information component into a convolutional neural network model for convolutional filtering processing to obtain a distortion-removed image block corresponding to the target image block.
The detailed implementation process of the convolution filtering process performed by the convolution neural network model can be referred to in step 204 of the embodiment described in fig. 2-1, and will not be described in detail here.
Step 406: and when the data type is used for representing data output by a certain filter, inputting the target image block into the filter for filtering processing to obtain a distortion-removed image block.
Step 407: when the data type is used to represent a target image block, the target image block is determined to be a de-distorted image block.
And acquiring a distortion-removed image block corresponding to each distorted image block in the distorted picture according to the steps 403 to 407.
Step 408: and decoding the subsequently received video bit stream according to the distortion-removed image blocks corresponding to the distortion image blocks included in the distortion picture.
Specifically, according to the position of each distorted image block in the distorted picture, filling the undistorted image block corresponding to each distorted image block in a blank reference picture, and storing the reference picture in a buffer, so that the reference picture in the buffer can be used to decode a subsequently received video bitstream.
In the embodiment of the application, the data type corresponding to the target image block is determined according to the current entropy decoding data, and the filter is selected according to the data type to filter the distorted image, so that the filtering performance is improved, and the distortion removal performance in the video decoding process is improved.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Referring to fig. 5, an embodiment of the present application provides a video encoding apparatus 500, where the apparatus 500 includes:
an obtaining module 501, configured to obtain a distorted picture, where the distorted picture is generated when a current original video picture is encoded;
a generating module 502, configured to generate a side information component corresponding to a distorted picture, where the side information component represents a distortion characteristic of the distorted picture with respect to the current original video picture;
a filtering module 503, configured to input the distorted picture and the side information component into a convolutional neural network model for filtering to obtain a first distortion-removed image block corresponding to a distorted image block included in the distorted picture, where the convolutional neural network model is obtained by training based on a preset training set, and the preset training set includes an original sample picture, a plurality of distorted pictures corresponding to the original sample picture, and a side information component corresponding to the distorted picture corresponding to each original sample picture;
a selecting module 504, configured to select an image block from an image block set corresponding to the distorted image block as a target undistorted image block corresponding to the distorted image block, where the image block set includes a first undistorted image block and/or the distorted image block corresponding to the distorted image block;
and the encoding module 505 is configured to encode an original video picture after the current original video picture according to a target distortion-removed image block corresponding to the distorted image block to obtain a video bitstream.
Optionally, the filtering module 503 includes:
the first filtering unit is used for dividing the distorted image to obtain distorted image blocks included in the distorted image, inputting the distorted image blocks and side information components corresponding to the distorted image blocks into a convolutional neural network model for filtering to obtain first distortion removing image blocks corresponding to the distorted image blocks; or,
and the second filtering unit is used for inputting the distorted picture and the side information component into a convolutional neural network model for filtering processing to obtain a de-distorted picture, and dividing the de-distorted picture to obtain a first de-distorted image block corresponding to a distorted image block included in the distorted picture.
Optionally, the filtering module is further configured to:
and inputting the distorted image into at least one filter for filtering to obtain a second distortion-removed image block corresponding to a distorted image block included in the distorted image output by each filter, wherein an image block set corresponding to the distorted image block also includes the second distortion-removed image block corresponding to the distorted image block output by each filter.
Optionally, the selecting module 504 includes:
the first selection unit is used for selecting an image block from an image block set according to an original image block corresponding to the distorted image block in the current original video picture; or,
and the second selection unit is used for selecting one image block from the image block set according to the coding information of each coding unit included in the distorted image block.
Optionally, the first selecting unit is configured to:
respectively calculating difference values between each image block in the image block set and an original image block corresponding to the distorted image block;
and selecting an image block with the minimum difference value between original image blocks corresponding to the distorted image blocks from the image block set.
Optionally, the video bitstream further includes a filtering flag map corresponding to the distorted picture, and the apparatus further includes:
and the filling module is used for filling mark information for marking the data type of the target distortion-removed image block in the filtering mark map according to the position of the distorted image block in the distorted image when one image block is selected from the image block set according to the original image block corresponding to the distorted image block.
In the embodiment of the application, the distorted image is filtered to obtain a first distortion removing image block corresponding to the distorted image block of the distorted image, and then an image block with small difference with the original image is selected from the distorted image block and the first distortion removing image block corresponding to the distorted image block to be used as an image block obtained by final filtering, so that the filtering performance is improved, and the distortion removing performance in the video coding process is improved.
Referring to fig. 6, an embodiment of the present application provides a video decoding apparatus 600, the apparatus 600 including:
a decoding module 601, configured to perform entropy decoding on a received video bitstream to obtain current entropy-decoded data;
an obtaining module 602, configured to obtain each distorted image block included in a distorted picture, where the distorted picture is generated when the current entropy decoding data is decoded;
a determining module 603, configured to determine, according to the current entropy decoding data, a data type corresponding to a target image block, where the target image block is a distorted image block in the distorted image;
a generating module 604, configured to generate a side information component corresponding to a target image block when the data type is used to represent data filtered by a convolutional neural network model, where the side information component represents a distortion characteristic of the target image block relative to an original image block corresponding to the target image block in an original video picture, and the original video picture is a video picture corresponding to the current entropy decoding data;
a filtering module 605, configured to input the target image block and the side information component into a convolutional neural network model for convolutional filtering to obtain a distortion-removed image block corresponding to the target image block, where the convolutional neural network model is obtained by training based on a preset training set, and the preset training set includes an original sample picture, a plurality of distorted pictures corresponding to the original sample picture, and a side information component corresponding to a distorted picture corresponding to each original sample picture;
the decoding module 601 is further configured to decode a subsequently received video bitstream according to a de-distorted image block corresponding to each distorted image block included in the distorted image.
Optionally, the filtering module 605 is further configured to:
when the data type is used for representing data filtered by a filter, inputting the target image block into the filter for filtering processing to obtain a distortion-removed image block corresponding to the target image block; or,
and when the data type is used for representing the target image block, determining the target image block as a distortion-removed image block corresponding to the target image block.
Optionally, the current entropy decoding data includes a filtering flag map, where the filtering flag map includes flag information corresponding to each distorted image block in the distorted image, and the flag information corresponding to the distorted image block is used to identify a data type corresponding to the distorted image block;
the determining module 603 includes:
the reading unit is used for reading the mark information corresponding to the target image block from the filtering mark image according to the position of the target image block in the distorted image;
and the first determining unit is used for determining the data type corresponding to the target image block according to the mark information.
Optionally, the current entropy decoding data includes a position and coding information of each coding unit in the original video picture;
the determining module 603 comprises:
a second determining unit, configured to determine, according to a position of the target image block in the distorted picture and positions of coding units in the original video picture, the coding units included in the target image block;
and the third determining unit is used for determining the data type corresponding to the target image block according to the coding information of each coding unit included in the target image block.
In the embodiment of the application, the data type corresponding to the target image block is determined according to the current entropy decoding data, and the filter is selected according to the data type to filter the distorted image, so that the filtering performance is improved, and the distortion removal performance in the video decoding process is improved.
Referring to fig. 7, an embodiment of the present application provides a video encoding method, including:
step 701: and acquiring a distorted image block included in the distorted image, wherein the distorted image is generated when the original video image is coded.
Step 702: and determining the data type corresponding to the target image block according to the coding information of the coding unit included in the target image block, wherein the target image block is any distorted image block in the distorted image.
Step 703: when the data type is used for representing the data filtered by the convolutional neural network model, a side information component corresponding to the target image block is generated, and the side information component represents the distortion characteristics of the target image block relative to the corresponding original image block in the original video picture.
Step 704: and inputting the target image block and the side information component into a convolutional neural network model for convolutional filtering to obtain a distortion-removed image block corresponding to the target image block, wherein the convolutional neural network model is obtained by training based on a preset training set, and the preset training set comprises an original sample picture, a plurality of distorted pictures corresponding to the original sample picture and the side information component corresponding to the distorted picture corresponding to each original picture.
Step 705: and according to the distortion-removed image blocks corresponding to the distortion image blocks in the distortion image, encoding the original video image behind the current original video image to obtain the video bit stream.
In the embodiment of the application, the data type corresponding to the target image block is determined according to the coding information of the coding unit included in the distorted image block, and the filter is selected according to the data type to filter the distorted image, so that the filtering performance is improved, and the distortion removal performance in the video coding process is improved.
For the video coding method shown in fig. 7, referring to fig. 8, a detailed implementation procedure of the method may include:
Steps 801-802: the same as steps 201-202 in the embodiment shown in fig. 2-1, and will not be described in detail.
Step 803: and determining the data type corresponding to the target image block according to the coding information of the coding unit included in the target image block, wherein the target image block is any distorted image block in the distorted image.
When the current original video picture is subjected to video coding, a video bit stream is obtained, wherein the video bit stream comprises the position and the coding information of each coding unit in the current original video picture.
In this step, each coding unit included in the target image block may be determined according to the position of the target image block in the distorted picture and the position of each coding unit in the original video picture; and determining the data type corresponding to the target image block according to the coding information of each coding unit included in the target image block.
The encoding information of the coding unit included in the target image block may be a prediction mode and/or a motion vector, etc. For example, if the coding units exceeding a preset first proportion in the target image block adopt an intra-frame coding mode, determining the data type to be data filtered by a convolutional neural network model; if the coding unit exceeding the preset second proportion in the target image block adopts a SKIP mode (SKIP) for coding, determining the data type as data filtered by a filter, wherein the second proportion is smaller than the first proportion; otherwise, determining the data type as the target image block.
Step 804: when the data type is used for representing the data filtered by the convolutional neural network model, a side information component corresponding to the target image block is generated.
Wherein the side information component represents a distortion characteristic of the target image block with respect to its corresponding original image block in an original video picture, the original video picture being a video picture to which the current entropy-decoded data corresponds.
The detailed implementation process of generating the side information component corresponding to the target image block can refer to the relevant content in step 203 in the embodiment described in fig. 2-1, and will not be described in detail here.
Step 805: and inputting the target image block and the side information component into a convolutional neural network model for convolutional filtering processing to obtain a distortion-removed image block corresponding to the target image block.
The detailed implementation process of the convolution filtering process performed by the convolution neural network model can be referred to in step 204 of the embodiment described in fig. 2-1, and will not be described in detail here.
Step 806: and when the data type is used for representing data output by a certain filter, inputting the target image block into the filter for filtering processing to obtain a distortion-removed image block.
Step 807: when the data type is used to represent a target image block, the target image block is determined to be a de-distorted image block.
The undistorted image blocks corresponding to the distorted image blocks in the distorted picture are obtained according to the steps 803 to 807.
Step 808: and coding the original video picture behind the current original video picture according to the distortion-removed image blocks corresponding to the distortion image blocks included in the distortion picture to obtain the video bit stream.
Specifically, according to the position of each distorted image block in the distorted picture, the target de-distorted image block corresponding to each distorted image block is filled into a blank reference picture, and the reference picture is cached in a buffer, so that when the reference picture is selected, it can be used to encode an original video picture to be coded to obtain a video bit stream.
In the embodiment of the application, the data type corresponding to the target image block is determined according to the coding information of each coding unit included in the target image block, the filter is selected according to the data type to filter the distorted image, because the coding information can reflect the original image information of the coding unit in the original video image, a filtering mode with small distortion during filtering, namely the data type, can be determined according to the coding information, and the filter is selected according to the data type, so that the filtering performance is improved, and the distortion removal performance in the video coding process is improved.
Referring to fig. 9, an embodiment of the present application provides a video encoding apparatus 900, where the apparatus 900 includes:
an obtaining module 901, configured to obtain a distorted image block included in a distorted image, where the distorted image is generated when an original video image is encoded;
a determining module 902, configured to determine, according to coding information of a coding unit included in a target image block, a data type corresponding to the target image block, where the target image block is any distorted image block in the distorted image;
a generating module 903, configured to generate a side information component corresponding to the target image block when the data type is used to represent data filtered by a convolutional neural network model, where the side information component represents a distortion characteristic of the target image block relative to an original image block corresponding to the target image block in the original video picture;
a filtering module 904, configured to input the target image block and the side information component into a convolutional neural network model for convolutional filtering to obtain a de-distorted image block corresponding to the target image block, where the convolutional neural network model is obtained by training based on a preset training set, and the preset training set includes an original sample picture, a plurality of distorted pictures corresponding to the original sample picture, and a side information component corresponding to each distorted picture;
and the encoding module 905 is configured to encode an original video picture after the current original video picture according to a distortion-removed image block corresponding to a distortion image block in the distortion picture to obtain a video bitstream.
Optionally, the filtering module 904 is further configured to:
when the data type is used for representing data filtered by a filter, inputting the target image block into the filter for filtering processing to obtain a distortion-removed image block corresponding to the target image block; or,
and when the data type is used for representing the target image block, determining the target image block as a distortion-removed image block corresponding to the target image block.
In the embodiment of the application, the data type corresponding to the target image block is determined according to the coding information of each coding unit included in the target image block, and the filter is selected according to the data type to filter the distorted image, so that not only is the filtering performance improved, but also the distortion removal performance in the video coding process is improved.
Referring to fig. 10, an embodiment of the present application provides a coding and decoding system 1000, where the system 1000 includes a video encoding apparatus 1001 provided in the embodiment shown in fig. 5 and a video decoding apparatus 1002 provided in the embodiment shown in fig. 6; or,
the system 1000 includes a video encoding apparatus 1001 provided in the embodiment shown in fig. 9 and a video decoding apparatus 1002 provided in the embodiment shown in fig. 6.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 11 shows a block diagram of a terminal 1100 according to an exemplary embodiment of the present invention. The terminal 1100 may be a portable mobile terminal such as: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. Terminal 1100 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, and so forth.
In general, terminal 1100 includes: a processor 1101 and a memory 1102.
Processor 1101 may include one or more processing cores, such as a 4-core processor, an 8-core processor, or the like. The processor 1101 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 1101 may also include a main processor and a coprocessor, the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 1101 may be integrated with a GPU (Graphics Processing Unit) that is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1101 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 1102 may include one or more computer-readable storage media, which may be non-transitory. Memory 1102 can also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1102 is used to store at least one instruction for execution by processor 1101 to implement a video encoding method or a video decoding method provided by method embodiments herein.
In some embodiments, the terminal 1100 may further include: a peripheral interface 1103 and at least one peripheral. The processor 1101, memory 1102 and peripheral interface 1103 may be connected by a bus or signal lines. Various peripheral devices may be connected to the peripheral interface 1103 by buses, signal lines, or circuit boards. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1104, touch display screen 1105, camera 1106, audio circuitry 1107, positioning component 1108, and power supply 1109.
The peripheral interface 1103 may be used to connect at least one peripheral associated with I/O (Input/Output) to the processor 1101 and the memory 1102. In some embodiments, the processor 1101, memory 1102, and peripheral interface 1103 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1101, the memory 1102 and the peripheral device interface 1103 may be implemented on separate chips or circuit boards, which is not limited by this embodiment.
The Radio Frequency circuit 1104 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1104 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 1104 converts an electric signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electric signal. Optionally, the radio frequency circuit 1104 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1104 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the rf circuit 1104 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 1105 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1105 is a touch display screen, the display screen 1105 also has the ability to capture touch signals on or over the surface of the display screen 1105. The touch signal may be input to the processor 1101 as a control signal for processing. At this point, the display screen 1105 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, display 1105 may be one, providing the front panel of terminal 1100; in other embodiments, the display screens 1105 can be at least two, respectively disposed on different surfaces of the terminal 1100 or in a folded design; in still other embodiments, display 1105 can be a flexible display disposed on a curved surface or on a folded surface of terminal 1100. Even further, the display screen 1105 may be arranged in a non-rectangular irregular pattern, i.e., a shaped screen. The Display screen 1105 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), and the like.
Camera assembly 1106 is used to capture images or video. Optionally, camera assembly 1106 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 1106 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
The audio circuitry 1107 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, convert them into electric signals, and input the electric signals to the processor 1101 for processing, or to the radio frequency circuit 1104 for voice communication. For stereo capture or noise reduction, multiple microphones may be provided, each disposed at a different location of terminal 1100. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electric signals from the processor 1101 or the radio frequency circuit 1104 into sound waves. The speaker may be a conventional diaphragm speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can convert an electric signal not only into sound waves audible to humans but also into sound waves inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuitry 1107 may also include a headphone jack.
Positioning component 1108 is used to locate the current geographic position of terminal 1100 to implement navigation or LBS (Location-Based Services). The positioning component 1108 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
Power supply 1109 is configured to supply power to the various components in terminal 1100. The power supply 1109 may use alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 1109 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is charged through a wired line, and a wireless rechargeable battery is charged through a wireless coil. The rechargeable battery may also support fast-charge technology.
In some embodiments, terminal 1100 can also include one or more sensors 810. The one or more sensors 810 include, but are not limited to: acceleration sensor 811, gyro sensor 812, pressure sensor 813, fingerprint sensor 814, optical sensor 815 and proximity sensor 816.
The acceleration sensor 811 may detect the magnitude of acceleration along the three axes of the coordinate system established with respect to the terminal 1100. For example, the acceleration sensor 811 may be used to detect the components of gravitational acceleration along the three axes. The processor 1101 may control the touch display screen 1105 to display the user interface in a landscape or portrait view according to the gravitational acceleration signal collected by the acceleration sensor 811. The acceleration sensor 811 may also be used to collect motion data for games or for the user.
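As an illustration of the orientation logic just described, the sketch below chooses a landscape or portrait view from the gravity components reported along the device axes; the axis convention and the comparison rule are assumptions made for the example, not taken from this disclosure.

```python
# Minimal sketch: pick a UI orientation from accelerometer gravity
# components (the axis convention here is an illustrative assumption).
def choose_orientation(gx: float, gy: float) -> str:
    """Return 'portrait' when gravity lies mostly along the long (y)
    axis of the device, otherwise 'landscape'."""
    return "portrait" if abs(gy) >= abs(gx) else "landscape"

print(choose_orientation(0.3, 9.7))  # portrait: device held upright
print(choose_orientation(9.6, 0.5))  # landscape: device on its side
```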
The gyro sensor 812 may detect the body orientation and rotation angle of the terminal 1100, and may cooperate with the acceleration sensor 811 to capture the user's 3D motion on the terminal 1100. From the data collected by the gyro sensor 812, the processor 1101 may implement functions such as motion sensing (for example, changing the UI according to a tilting operation by the user), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 813 may be disposed on the side bezel of terminal 1100 and/or beneath the touch display screen 1105. When the pressure sensor 813 is disposed on the side bezel of the terminal 1100, it can detect the user's grip signal on the terminal 1100, and the processor 1101 performs left/right-hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 813. When the pressure sensor 813 is disposed beneath the touch display screen 1105, the processor 1101 controls the operable controls on the UI according to the user's pressure operation on the touch display screen 1105. The operable controls include at least one of a button control, a scroll-bar control, an icon control, and a menu control.
The fingerprint sensor 814 is used to collect the user's fingerprint, and the processor 1101 identifies the user according to the fingerprint collected by the fingerprint sensor 814, or the fingerprint sensor 814 itself identifies the user according to the collected fingerprint. When the user's identity is recognized as trusted, the processor 1101 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 814 may be disposed on the front, back, or side of terminal 1100. When a physical button or a vendor logo is provided on the terminal 1100, the fingerprint sensor 814 may be integrated with the physical button or the vendor logo.
The optical sensor 815 is used to collect the ambient light intensity. In one embodiment, the processor 1101 may control the display brightness of the touch display screen 1105 based on the ambient light intensity collected by the optical sensor 815: when the ambient light intensity is high, the display brightness of the touch display screen 1105 is increased; when the ambient light intensity is low, the display brightness is decreased. In another embodiment, the processor 1101 may also dynamically adjust the shooting parameters of the camera assembly 1106 according to the ambient light intensity collected by the optical sensor 815.
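A minimal sketch of such ambient-light-driven brightness control follows; the linear mapping curve, the lux range, and the brightness bounds are illustrative assumptions rather than values from this disclosure.

```python
# Minimal sketch: map an ambient light reading (lux) to a backlight
# level in [30, 255]; the curve and bounds are assumptions.
def backlight_level(ambient_lux: float, max_lux: float = 1000.0) -> int:
    ratio = min(max(ambient_lux / max_lux, 0.0), 1.0)  # clamp to [0, 1]
    return int(30 + ratio * (255 - 30))  # keep a readable floor of 30

print(backlight_level(50.0))   # dim room -> low brightness
print(backlight_level(900.0))  # bright room -> near-maximum brightness
```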
The proximity sensor 816, also known as a distance sensor, is typically disposed on the front panel of the terminal 1100. The proximity sensor 816 is used to collect the distance between the user and the front surface of the terminal 1100. In one embodiment, when the proximity sensor 816 detects that this distance gradually decreases, the processor 1101 controls the touch display screen 1105 to switch from the bright-screen state to the dark-screen state; when the proximity sensor 816 detects that the distance gradually increases, the processor 1101 controls the touch display screen 1105 to switch from the dark-screen state to the bright-screen state.
Those skilled in the art will appreciate that the structure shown in Fig. 11 does not constitute a limitation of terminal 1100; the terminal may include more or fewer components than shown, combine certain components, or employ a different arrangement of components.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (27)

1. A method of video encoding, the method comprising:
acquiring a distorted picture, wherein the distorted picture is generated when a current original video picture is encoded;
generating a side information component corresponding to the distorted picture, wherein the side information component represents distortion characteristics of the distorted picture relative to the current original video picture;
inputting the distorted picture and the side information component into a convolutional neural network model for filtering, to obtain a first de-distorted image block corresponding to a distorted image block included in the distorted picture, wherein the convolutional neural network model is trained on a preset training set, and the preset training set comprises an original sample picture, a plurality of distorted pictures corresponding to the original sample picture, and side information components corresponding to the distorted pictures of each original sample picture;
selecting an image block from an image block set corresponding to the distorted image block as a target de-distorted image block corresponding to the distorted image block, wherein the image block set comprises the first de-distorted image block corresponding to the distorted image block and/or the distorted image block; and
encoding original video pictures subsequent to the current original video picture according to the target de-distorted image blocks corresponding to the distorted image blocks, to obtain a video bitstream.
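To make the filtering step of claim 1 concrete, the sketch below feeds a distorted picture together with its side information component into a small convolutional network as stacked input channels. The network depth, channel counts, residual formulation, and the use of a normalized quantization-parameter plane as the side information component are illustrative assumptions, not the patented model.

```python
import torch
import torch.nn as nn

class DeDistortCNN(nn.Module):
    """Toy de-distortion filter: distorted luma plus a side information
    plane go in as two channels; a correction residual comes out."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),
        )

    def forward(self, distorted, side_info):
        x = torch.cat([distorted, side_info], dim=1)  # stack as channels
        return distorted + self.body(x)  # residual learning

distorted = torch.rand(1, 1, 64, 64)             # normalized distorted luma
side_info = torch.full((1, 1, 64, 64), 37 / 51)  # e.g. a normalized QP plane
restored = DeDistortCNN()(distorted, side_info)
```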
2. The method of claim 1, wherein inputting the distorted picture and the side information component into the convolutional neural network model for filtering, to obtain the first de-distorted image block corresponding to the distorted image block included in the distorted picture, comprises:
dividing the distorted picture to obtain the distorted image blocks included in the distorted picture, and inputting each distorted image block and the side information component corresponding to the distorted image block into the convolutional neural network model for filtering, to obtain the first de-distorted image block corresponding to each distorted image block; or
inputting the distorted picture and the side information component into the convolutional neural network model for filtering to obtain a de-distorted picture, and dividing the de-distorted picture to obtain the first de-distorted image block corresponding to the distorted image block included in the distorted picture.
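Claim 2 permits either order of operations: divide first and filter per block, or filter the whole picture and divide afterwards. The block partitioning common to both paths might look like the following sketch; the 64x64 block size is an assumption for illustration.

```python
import numpy as np

def split_into_blocks(picture: np.ndarray, block: int = 64):
    """Yield (row, col, tile) for each block of a 2D picture;
    tiles at the right and bottom edges may be smaller."""
    height, width = picture.shape
    for r in range(0, height, block):
        for c in range(0, width, block):
            yield r, c, picture[r:r + block, c:c + block]
```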
3. The method of claim 1, wherein after acquiring the distorted picture, the method further comprises:
inputting the distorted picture into at least one filter for filtering, to obtain a second de-distorted image block, output by each filter, corresponding to the distorted image block included in the distorted picture, wherein the image block set corresponding to the distorted image block further includes the second de-distorted image block output by each filter for the distorted image block.
4. The method of any of claims 1 to 3, wherein selecting an image block from the image block set comprises:
selecting an image block from the image block set according to an original image block corresponding to the distorted image block in the current original video picture; or
selecting an image block from the image block set according to coding information of each coding unit included in the distorted image block.
5. The method of claim 4, wherein selecting an image block from the image block set according to the original image block corresponding to the distorted image block in the current original video picture comprises:
calculating a difference value between each image block in the image block set and the original image block corresponding to the distorted image block; and
selecting, from the image block set, the image block having the minimum difference value from the original image block corresponding to the distorted image block.
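Claim 5's selection amounts to a nearest-candidate search against the original image block. A minimal sketch follows, using mean squared error as the difference value; the claim does not mandate a particular metric, so MSE is an assumption.

```python
import numpy as np

def select_target_block(candidates, original_block):
    """Return the candidate image block closest to the original block,
    using mean squared error as the (assumed) difference value."""
    def mse(block):
        diff = block.astype(np.float64) - original_block.astype(np.float64)
        return float(np.mean(diff ** 2))
    return min(candidates, key=mse)
```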
6. The method of claim 4, wherein the video bitstream further comprises a filter flag map corresponding to the distorted picture, and the method further comprises:
when an image block is selected from the image block set according to the original image block corresponding to the distorted image block, filling, in the filter flag map according to the position of the distorted image block in the distorted picture, flag information identifying the data type of the target de-distorted image block.
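The filter flag map of claim 6 can be viewed as a block-granularity grid written by the encoder. A sketch of the filling step follows; the grid layout and the numeric type codes are assumptions for illustration.

```python
def fill_flag(flag_map, block_row, block_col, data_type, block_size=64):
    """Record the chosen data type at the block's grid position.
    flag_map is a 2D list indexed in block units; assumed codes:
    0 = CNN-filtered, 1 = filter-filtered, 2 = unfiltered block."""
    flag_map[block_row // block_size][block_col // block_size] = data_type
```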
7. A method of video decoding, the method comprising:
performing entropy decoding on a received video bitstream to obtain current entropy-decoded data;
acquiring each distorted image block included in a distorted picture, wherein the distorted picture is generated when the current entropy-decoded data is decoded;
determining a data type corresponding to a target image block according to the current entropy-decoded data, wherein the target image block is a distorted image block in the distorted picture;
when the data type indicates data filtered by a convolutional neural network model, generating a side information component corresponding to the target image block, wherein the side information component represents distortion characteristics of the target image block relative to an original image block corresponding to the target image block in an original video picture, and the original video picture is the video picture corresponding to the current entropy-decoded data;
inputting the target image block and the side information component into the convolutional neural network model for convolutional filtering, to obtain a de-distorted image block corresponding to the target image block, wherein the convolutional neural network model is trained on a preset training set, and the preset training set comprises an original sample picture, a plurality of distorted pictures corresponding to the original sample picture, and side information components corresponding to the distorted pictures of each original sample picture; and
decoding a subsequently received video bitstream according to the de-distorted image blocks corresponding to the distorted image blocks included in the distorted picture.
8. The method of claim 7, further comprising:
when the data type indicates data filtered by a filter, inputting the target image block into the filter for filtering, to obtain the de-distorted image block corresponding to the target image block; or
when the data type indicates the target image block itself, determining the target image block as the de-distorted image block corresponding to the target image block.
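Claims 7 and 8 together describe a three-way dispatch on the decoded data type. A sketch of that dispatch follows; the numeric type codes and the callable interfaces are assumptions.

```python
def de_distort(target_block, data_type, cnn_model, side_info, loop_filter):
    """Dispatch per claims 7-8 (type codes assumed):
    0 -> CNN path, 1 -> conventional filter path, 2 -> pass-through."""
    if data_type == 0:
        return cnn_model(target_block, side_info)  # CNN-filtered data
    if data_type == 1:
        return loop_filter(target_block)           # conventional filter
    return target_block                            # use the block as-is
```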
9. The method of claim 7, wherein the current entropy-decoded data comprises a filter flag map, the filter flag map comprises flag information corresponding to each distorted image block in the distorted picture, and the flag information corresponding to a distorted image block identifies the data type corresponding to that distorted image block;
wherein determining the data type corresponding to the target image block according to the current entropy-decoded data comprises:
reading the flag information corresponding to the target image block from the filter flag map according to the position of the target image block in the distorted picture; and
determining the data type corresponding to the target image block according to the flag information.
10. The method of claim 7, wherein the current entropy-decoded data includes the position and coding information of each coding unit in the original video picture;
wherein determining the data type corresponding to the target image block according to the current entropy-decoded data comprises:
determining the coding units included in the target image block according to the position of the target image block in the distorted picture and the position of each coding unit in the original video picture; and
determining the data type corresponding to the target image block according to the coding information of the coding units included in the target image block.
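Determining which coding units fall inside the target image block, as claim 10 requires, is a rectangle-intersection test. A sketch follows; the (x, y, width, height) layout of the position data is an assumption.

```python
def coding_units_in_block(block_rect, coding_units):
    """Return the coding units whose rectangles intersect the target
    image block; rectangles are assumed (x, y, w, h) tuples."""
    bx, by, bw, bh = block_rect
    def overlaps(rect):
        x, y, w, h = rect
        return x < bx + bw and bx < x + w and y < by + bh and by < y + h
    return [cu for cu in coding_units if overlaps(cu["rect"])]
```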
11. A video encoding apparatus, characterized in that the apparatus comprises:
an acquisition module, used for acquiring a distorted picture, wherein the distorted picture is generated when a current original video picture is encoded;
a generation module, used for generating a side information component corresponding to the distorted picture, wherein the side information component represents distortion characteristics of the distorted picture relative to the current original video picture;
a filtering module, used for inputting the distorted picture and the side information component into a convolutional neural network model for filtering, to obtain a first de-distorted image block corresponding to a distorted image block included in the distorted picture, wherein the convolutional neural network model is trained on a preset training set, and the preset training set comprises an original sample picture, a plurality of distorted pictures corresponding to the original sample picture, and side information components corresponding to the distorted pictures of each original sample picture;
a selection module, used for selecting an image block from an image block set corresponding to the distorted image block as a target de-distorted image block corresponding to the distorted image block, wherein the image block set comprises the first de-distorted image block corresponding to the distorted image block and/or the distorted image block; and
an encoding module, used for encoding original video pictures subsequent to the current original video picture according to the target de-distorted image blocks corresponding to the distorted image blocks, to obtain a video bitstream.
12. The apparatus of claim 11, wherein the filtering module comprises:
a first filtering unit, used for dividing the distorted picture to obtain the distorted image blocks included in the distorted picture, and inputting each distorted image block and the side information component corresponding to the distorted image block into the convolutional neural network model for filtering, to obtain the first de-distorted image block corresponding to each distorted image block; or
a second filtering unit, used for inputting the distorted picture and the side information component into the convolutional neural network model for filtering to obtain a de-distorted picture, and dividing the de-distorted picture to obtain the first de-distorted image block corresponding to the distorted image block included in the distorted picture.
13. The apparatus of claim 11, wherein the filtering module is further configured to:
input the distorted picture into at least one filter for filtering, to obtain a second de-distorted image block, output by each filter, corresponding to the distorted image block included in the distorted picture, wherein the image block set corresponding to the distorted image block further includes the second de-distorted image block output by each filter for the distorted image block.
14. The apparatus of any of claims 11 to 13, wherein the selection module comprises:
a first selection unit, used for selecting an image block from the image block set according to an original image block corresponding to the distorted image block in the current original video picture; or
a second selection unit, used for selecting an image block from the image block set according to coding information of each coding unit included in the distorted image block.
15. The apparatus of claim 14, wherein the first selection unit is configured to:
calculate a difference value between each image block in the image block set and the original image block corresponding to the distorted image block; and
select, from the image block set, the image block having the minimum difference value from the original image block corresponding to the distorted image block.
16. The apparatus of claim 14, wherein the video bitstream further comprises a filter flag map corresponding to the distorted picture, and the apparatus further comprises:
a filling module, used for, when an image block is selected from the image block set according to the original image block corresponding to the distorted image block, filling, in the filter flag map according to the position of the distorted image block in the distorted picture, flag information identifying the data type of the target de-distorted image block.
17. A video decoding apparatus, characterized in that the apparatus comprises:
a decoding module, used for performing entropy decoding on a received video bitstream to obtain current entropy-decoded data;
an acquisition module, used for acquiring each distorted image block included in a distorted picture, wherein the distorted picture is generated when the current entropy-decoded data is decoded;
a determination module, used for determining a data type corresponding to a target image block according to the current entropy-decoded data, wherein the target image block is a distorted image block in the distorted picture;
a generation module, used for generating a side information component corresponding to the target image block when the data type indicates data filtered by a convolutional neural network model, wherein the side information component represents distortion characteristics of the target image block relative to an original image block corresponding to the target image block in an original video picture, and the original video picture is the video picture corresponding to the current entropy-decoded data; and
a filtering module, used for inputting the target image block and the side information component into the convolutional neural network model for convolutional filtering, to obtain a de-distorted image block corresponding to the target image block, wherein the convolutional neural network model is trained on a preset training set, and the preset training set comprises an original sample picture, a plurality of distorted pictures corresponding to the original sample picture, and side information components corresponding to the distorted pictures of each original sample picture;
wherein the decoding module is further used for decoding a subsequently received video bitstream according to the de-distorted image blocks corresponding to the distorted image blocks included in the distorted picture.
18. The apparatus of claim 17, wherein the filtering module is further configured to:
when the data type indicates data filtered by a filter, input the target image block into the filter for filtering, to obtain the de-distorted image block corresponding to the target image block; or
when the data type indicates the target image block itself, determine the target image block as the de-distorted image block corresponding to the target image block.
19. The apparatus of claim 17, wherein the current entropy-decoded data comprises a filter flag map, the filter flag map comprises flag information corresponding to each distorted image block in the distorted picture, and the flag information corresponding to a distorted image block identifies the data type corresponding to that distorted image block;
wherein the determination module comprises:
a reading unit, used for reading the flag information corresponding to the target image block from the filter flag map according to the position of the target image block in the distorted picture; and
a first determination unit, used for determining the data type corresponding to the target image block according to the flag information.
20. The apparatus of claim 17, wherein the current entropy-decoded data includes the position and coding information of each coding unit in the original video picture;
wherein the determination module comprises:
a second determination unit, used for determining, according to the position of the target image block in the distorted picture and the position of each coding unit in the original video picture, the coding units included in the target image block; and
a third determination unit, used for determining the data type corresponding to the target image block according to the coding information of the coding units included in the target image block.
21. A method of video encoding, the method comprising:
acquiring distorted image blocks included in a distorted picture, wherein the distorted picture is generated when a current original video picture is encoded;
determining a data type corresponding to a target image block according to coding information of the coding units included in the target image block, wherein the target image block is any distorted image block in the distorted picture;
when the data type indicates data filtered by a convolutional neural network model, generating a side information component corresponding to the target image block, wherein the side information component represents distortion characteristics of the target image block relative to an original image block corresponding to the target image block in the original video picture;
inputting the target image block and the side information component into the convolutional neural network model for convolutional filtering, to obtain a de-distorted image block corresponding to the target image block, wherein the convolutional neural network model is trained on a preset training set, and the preset training set comprises an original sample picture, a plurality of distorted pictures corresponding to the original sample picture, and side information components corresponding to the distorted pictures of each original sample picture; and
encoding original video pictures subsequent to the current original video picture according to the de-distorted image blocks corresponding to the distorted image blocks in the distorted picture, to obtain a video bitstream.
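Claim 21 leaves open how coding information maps to a data type. One plausible policy, sketched below purely as an assumption, passes through blocks whose coding units are all skip-coded (they introduce little new distortion) and routes everything else to the CNN filter.

```python
def data_type_from_coding_info(coding_units):
    """Illustrative policy only; the claim does not fix this mapping.
    Assumed codes: 0 = CNN-filtered, 2 = pass-through."""
    if coding_units and all(cu.get("mode") == "skip" for cu in coding_units):
        return 2  # skip-coded blocks are reused largely unchanged
    return 0      # otherwise, apply the CNN filter
```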
22. The method of claim 21, further comprising:
when the data type indicates data filtered by a filter, inputting the target image block into the filter for filtering, to obtain the de-distorted image block corresponding to the target image block; or
when the data type indicates the target image block itself, determining the target image block as the de-distorted image block corresponding to the target image block.
23. A video encoding apparatus, characterized in that the apparatus comprises:
an acquisition module, used for acquiring distorted image blocks included in a distorted picture, wherein the distorted picture is generated when a current original video picture is encoded;
a determination module, used for determining a data type corresponding to a target image block according to coding information of the coding units included in the target image block, wherein the target image block is any distorted image block in the distorted picture;
a generation module, used for generating a side information component corresponding to the target image block when the data type indicates data filtered by a convolutional neural network model, wherein the side information component represents distortion characteristics of the target image block relative to an original image block corresponding to the target image block in the original video picture;
a filtering module, used for inputting the target image block and the side information component into the convolutional neural network model for convolutional filtering, to obtain a de-distorted image block corresponding to the target image block, wherein the convolutional neural network model is trained on a preset training set, and the preset training set comprises an original sample picture, a plurality of distorted pictures corresponding to the original sample picture, and side information components corresponding to the distorted pictures of each original sample picture; and
an encoding module, used for encoding original video pictures subsequent to the current original video picture according to the de-distorted image blocks corresponding to the distorted image blocks in the distorted picture, to obtain a video bitstream.
24. The apparatus of claim 23, wherein the filtering module is further configured to:
when the data type indicates data filtered by a filter, input the target image block into the filter for filtering, to obtain the de-distorted image block corresponding to the target image block; or
when the data type indicates the target image block itself, determine the target image block as the de-distorted image block corresponding to the target image block.
25. A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and when executed by a processor, the computer program implements the method steps of any one of claims 1 to 6, or the method steps of claim 21 or 22.
26. A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and when executed by a processor, the computer program implements the method steps of any one of claims 7 to 10.
27. An encoding/decoding system, comprising the video encoding apparatus according to any one of claims 11 to 16 and the video decoding apparatus according to any one of claims 17 to 20; or
comprising the video encoding apparatus according to claim 23 or 24 and the video decoding apparatus according to any one of claims 17 to 20.
CN201810050810.6A 2018-01-18 2018-01-18 Video coding method, video decoding method, device, system and medium Active CN110062226B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810050810.6A CN110062226B (en) 2018-01-18 2018-01-18 Video coding method, video decoding method, device, system and medium
PCT/CN2019/072417 WO2019141258A1 (en) 2018-01-18 2019-01-18 Video encoding method, video decoding method, device, and system

Publications (2)

Publication Number Publication Date
CN110062226A (en) 2019-07-26
CN110062226B CN110062226B (en) 2021-06-11

Family

ID=67301287

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810050810.6A Active CN110062226B (en) 2018-01-18 2018-01-18 Video coding method, video decoding method, device, system and medium

Country Status (2)

Country Link
CN (1) CN110062226B (en)
WO (1) WO2019141258A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113132755B (en) * 2019-12-31 2022-04-01 北京大学 Method and system for encoding extensible man-machine cooperative image and method for training decoder

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
JP2788811B2 (en) * 1992-01-10 1998-08-20 シャープ株式会社 Block distortion corrector
KR101860606B1 (en) * 2011-06-30 2018-05-23 미쓰비시덴키 가부시키가이샤 Image encoding device, image decoding device, image encoding method, image decoding method and recording medium
CN108932697B (en) * 2017-05-26 2020-01-17 杭州海康威视数字技术股份有限公司 Distortion removing method and device for distorted image and electronic equipment
CN107197260B (en) * 2017-06-12 2019-09-13 清华大学深圳研究生院 Video coding post-filter method based on convolutional neural networks
CN109120937B (en) * 2017-06-26 2020-03-27 杭州海康威视数字技术股份有限公司 Video encoding method, decoding method, device and electronic equipment
CN109151475B (en) * 2017-06-27 2020-03-27 杭州海康威视数字技术股份有限公司 A video encoding method, decoding method, device and electronic device

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
WO2017036370A1 (en) * 2015-09-03 2017-03-09 Mediatek Inc. Method and apparatus of neural network based processing in video coding

Non-Patent Citations (2)

Title
CHUANMIN JIA: "Spatial-temporal residue network based in-loop filter for video coding", arXiv *
DAI Y et al.: "A Convolutional Neural Network Approach for Post-Processing in HEVC Intra Coding", MultiMedia Modeling, 23rd International Conference *

Also Published As

Publication number Publication date
WO2019141258A1 (en) 2019-07-25
CN110062226B (en) 2021-06-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant