CN112261409B - Residual encoding, decoding method and device, storage medium and electronic device - Google Patents
- Publication number
- CN112261409B CN112261409B CN201910663278.XA CN201910663278A CN112261409B CN 112261409 B CN112261409 B CN 112261409B CN 201910663278 A CN201910663278 A CN 201910663278A CN 112261409 B CN112261409 B CN 112261409B
- Authority
- CN
- China
- Prior art keywords
- block
- residual
- decoded
- encoded
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/172—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
Abstract
The invention provides a residual encoding method, a residual decoding method, a residual encoding device, a residual decoding device, a storage medium, and an electronic device. In the residual encoding method, the residuals of the reference blocks of a block to be encoded are determined, the residual of the block to be encoded is predicted based on the residuals of those reference blocks to obtain a predicted residual of the block to be encoded, and the residual difference between the predicted residual of the block to be encoded and the actual residual of the block to be encoded is encoded into the code stream. The invention solves the problem in the related art of how to reduce the bit rate of encoding the residual, thereby achieving the effect of saving code rate.
Description
Technical Field
The present invention relates to the field of communications, and in particular, to a method and apparatus for encoding and decoding a residual error, a storage medium, and an electronic apparatus.
Background
In recent years, with the rapid development of network communication and multimedia technology, video content has begun to be presented to viewers in high-resolution and ultra-high-resolution forms. The higher the resolution of a video, the better the visual experience when viewing the content compared with standard-resolution video. At the same time, high-resolution video places higher demands on video coding technology. To address this challenge, the Joint Collaborative Team on Video Coding (JCT-VC) developed the new-generation High Efficiency Video Coding (HEVC) standard. Compared with the previous-generation video coding standard H.264/MPEG-4 AVC, HEVC improves compression efficiency by about 50 percent while maintaining the same visual quality. On September 1, 2015, Google announced the establishment of the Alliance for Open Media (AOM) together with Amazon, Cisco, Intel, Microsoft, Mozilla (Firefox), and Netflix; AOM developed the new-generation video coding format AV1. In December 2017, the Chinese digital audio and video coding standard group (Audio Video coding Standard, AVS) proposed the new-generation AVS3 video coding. At its 10th meeting in San Diego, USA, in April 2018, the Joint Video Exploration Team (JVET) named the latest-generation video coding standard Versatile Video Coding (VVC); its main objective is to improve on the existing HEVC, providing higher compression performance while optimizing for emerging applications such as 360° panoramic video and high dynamic range imaging (HDR or HDRI). VVC is expected to be standardized before 2020, and the solutions proposed so far already improve compression performance by more than 40% relative to HEVC.
Inter-frame prediction is the core component of the video coding standards HEVC/AV1/AVS. It exploits temporal correlation in video, using pixels of temporally adjacent, already-encoded images to predict the pixels of the current image, so as to effectively remove temporal redundancy. Current mainstream video coding standards adopt block-based motion compensation: for each pixel block of the current image, a best matching block is found in a previously encoded image through motion estimation. The picture used for prediction is called a reference picture, the displacement from the reference block to the current pixel block is called a motion vector (MV), and the difference between the original pixel value of the current block and the pixel value of the prediction block obtained after motion compensation of the reference block is called a residual (also referred to as a residual error, residual value, or residual block). Inter prediction only needs to encode the optimal MV, the reference frame index, and the residual value of the coded block and write them into the code stream transmitted to the decoder. The decoder side finds the corresponding reference block in the reference frame according to the optimal MV and the reference frame index, and then adds the decoded residual value to recover the original pixel value of the decoded block. Most of the bit rate consumed by inter prediction goes to coding residual blocks, and traditional inter prediction directly codes the actual residual obtained after prediction. However, for most complex motion cases the values of the residual block are very large, which leads to a very high bit rate for the encoded residual.
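As an illustration of the conventional scheme just described, the following minimal sketch (Python with NumPy; the function names and the assumption of 8-bit samples are illustrative, not part of any codec standard) shows how the residual is formed at the encoder and added back at the decoder:

```python
import numpy as np

def conventional_residual(original_block: np.ndarray, prediction_block: np.ndarray) -> np.ndarray:
    # Encoder side: the residual is the difference between the original pixel
    # values of the current block and the motion-compensated prediction block.
    return original_block.astype(np.int32) - prediction_block.astype(np.int32)

def reconstruct_block(prediction_block: np.ndarray, decoded_residual: np.ndarray) -> np.ndarray:
    # Decoder side: the reference block is located via the optimal MV and the
    # reference frame index; adding the decoded residual recovers the pixels.
    pixels = prediction_block.astype(np.int32) + decoded_residual
    return np.clip(pixels, 0, 255).astype(np.uint8)  # assuming 8-bit samples
```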
Therefore, how to reduce the bit rate of the encoded residual to achieve the effect of saving the code rate is a problem to be solved.
Disclosure of Invention
The embodiment of the invention provides a residual coding and decoding method and device, a storage medium and an electronic device, which at least solve the problem of how to reduce the bit rate of coded residual in the related technology.
According to one embodiment of the invention, a residual encoding method is provided, comprising: determining residuals of reference blocks of a block to be encoded, where the reference blocks of the block to be encoded comprise at least two first reference blocks of the block to be encoded in the time domain, or at least two second reference blocks of the block to be encoded in the spatial domain, or at least one first reference block in the time domain and at least one second reference block in the spatial domain; predicting the residual of the block to be encoded based on the residuals of its reference blocks to obtain a predicted residual of the block to be encoded; and encoding a residual difference between the predicted residual of the block to be encoded and the actual residual of the block to be encoded into a code stream.
In at least one exemplary embodiment, in the case that the image frame where the block to be encoded is located is a P frame, the first reference block of the block to be encoded in the time domain includes an optimal prediction unit block PU of the block to be encoded in a forward reference frame, denoted as a first optimal PU, and/or an optimal PU of the block to be encoded in a frame preceding the forward reference frame, denoted as a second optimal PU, or in the case that the image frame where the block to be encoded is located is a B frame, the first reference block of the block to be encoded in the time domain includes an optimal PU of the block to be encoded in a forward reference frame, denoted as a third optimal PU, and/or an optimal PU of the block to be encoded in a backward reference frame, denoted as a fourth optimal PU.
In at least one exemplary embodiment, in the case that the image frame in which the block to be encoded is located is a P frame, a first reference block of the block to be encoded in the time domain is determined by determining the first optimal PU of the block to be encoded in the forward reference frame by motion estimation and determining a motion vector MV of the block to be encoded with respect to the first optimal PU, and determining the second optimal PU in a previous frame of the forward reference frame by motion estimation according to the position of the first optimal PU and the MV.
In at least one exemplary embodiment, the second reference block of the block to be encoded in the spatial domain is located in the image frame where the block to be encoded is located, and is adjacent to the block to be encoded in the spatial domain.
In at least one exemplary embodiment, the second reference block of the block to be encoded in the spatial domain includes a left block and/or an upper block adjacent to the block to be encoded in an image frame where the block to be encoded is located.
In at least one exemplary embodiment, the reference block of the block to be encoded includes:
Two first reference blocks with non-all zero corresponding residuals of the block to be coded in the time domain and two second reference blocks with non-all zero corresponding residuals of the block to be coded in the space domain, or
A first reference block with non-all zero corresponding residual error of the block to be coded in the time domain and a second reference block with non-all zero corresponding residual error of the block to be coded in the space domain, or
Two first reference blocks with non-all zero corresponding residuals of the block to be coded in the time domain and one second reference block with non-all zero corresponding residuals of the block to be coded in the space domain, or
A first reference block with non-all zero corresponding residual error of the block to be coded in the time domain and two second reference blocks with non-all zero corresponding residual error of the block to be coded in the space domain, or
Two first reference blocks with non-all zero corresponding residuals of the block to be coded in the time domain, or
And the corresponding residual errors of the block to be coded in the space domain are two second reference blocks which are not all zero.
In at least one exemplary embodiment, predicting the residual of the block to be encoded based on the residuals of its reference blocks to obtain the predicted residual of the block to be encoded comprises: inputting the residuals of the reference blocks into a residual prediction model to obtain the predicted residual of the block to be encoded, where the residual prediction model is trained with a deep learning network on training samples, each training sample comprising the residuals of the reference blocks of an encoded block whose residual is known together with the actual residual of that encoded block; or linearly weighting the residuals of the reference blocks to obtain the predicted residual of the block to be encoded, where the linear weighting is single-weight linear weighting or multi-weight linear weighting.
In at least one exemplary embodiment, the actual residual of the block to be encoded is the difference between the pixel values of the original image block of the block to be encoded and the pixel values of the prediction block of the block to be encoded, where the prediction block of the block to be encoded is the block obtained after motion compensation of the reference block of the block to be encoded.
According to another embodiment of the invention, a residual decoding method is provided, comprising: determining a prediction block of a block to be decoded based on a motion vector MV parsed from a code stream; obtaining residuals of reference blocks of the block to be decoded, where the reference blocks of the block to be decoded comprise at least two first reference blocks of the block to be decoded in the time domain, or at least two second reference blocks of the block to be decoded in the spatial domain, or at least one first reference block in the time domain and at least one second reference block in the spatial domain; predicting the residual of the block to be decoded based on the residuals of its reference blocks to obtain a predicted residual of the block to be decoded; and determining the actual residual of the block to be decoded according to the predicted residual of the block to be decoded and a residual difference, parsed from the code stream, between the predicted residual of the block to be decoded and the actual residual of the block to be decoded.
In at least one exemplary embodiment, in the case that the image frame where the block to be decoded is located is a P frame, the first reference block of the block to be decoded in the time domain includes an optimal prediction unit block PU of the block to be decoded in a forward reference frame, denoted as a first optimal PU, and/or an optimal PU of the block to be decoded in a frame previous to the forward reference frame, denoted as a second optimal PU, or in the case that the image frame where the block to be decoded is located is a B frame, the first reference block of the block to be decoded in the time domain includes an optimal PU of the block to be decoded in a forward reference frame, denoted as a third optimal PU, and/or an optimal PU of the block to be decoded in a backward reference frame, denoted as a fourth optimal PU.
In at least one exemplary embodiment, in the case that the image frame where the block to be decoded is located is a P frame, a first reference block of the block to be decoded in the time domain is determined by determining the first optimal PU of the block to be decoded in the forward reference frame according to the MV parsed in the code stream, and determining the second optimal PU in the previous frame of the forward reference frame according to the co-located PU of the first optimal PU in the previous frame of the forward reference frame by motion estimation.
In at least one exemplary embodiment, the second reference block of the block to be decoded in the spatial domain is located in the image frame in which the block to be decoded is located, and is adjacent to the block to be decoded in the spatial domain.
In at least one exemplary embodiment, the second reference block of the block to be decoded in the spatial domain includes a left block and/or an upper block adjacent to the block to be decoded in an image frame where the block to be decoded is located.
In at least one exemplary embodiment, predicting the residual of the block to be decoded based on the residuals of its reference blocks to obtain the predicted residual of the block to be decoded comprises: inputting the residuals of the reference blocks into a residual prediction model to obtain the predicted residual of the block to be decoded, where the residual prediction model is the same as the residual prediction model at the encoder side; or linearly weighting the residuals of the reference blocks to obtain the predicted residual of the block to be decoded, where the linear weighting is the same as the linear weighting at the encoder side.
In at least one exemplary embodiment, after determining the actual residual of the block to be decoded according to the predicted residual of the block to be decoded and the residual difference, parsed from the code stream, between the predicted residual and the actual residual of the block to be decoded, the method further comprises adding the actual residual to the prediction block of the block to be decoded to recover the original image block of the block to be decoded.
According to another embodiment of the invention, a residual encoding apparatus is provided, comprising: an encoder-side reference residual determining module configured to determine residuals of reference blocks of a block to be encoded, where the reference blocks of the block to be encoded comprise at least two first reference blocks of the block to be encoded in the time domain, or at least two second reference blocks of the block to be encoded in the spatial domain, or at least one first reference block in the time domain and at least one second reference block in the spatial domain; an encoder-side residual prediction module configured to predict the residual of the block to be encoded based on the residuals of its reference blocks to obtain a predicted residual of the block to be encoded; and a residual encoding module configured to encode a residual difference between the predicted residual of the block to be encoded and the actual residual of the block to be encoded into a code stream.
According to still another embodiment of the present invention, a residual decoding apparatus is provided, comprising: a decoder-side reference residual determining module configured to determine a prediction block of a block to be decoded based on a motion vector MV parsed from a code stream and to obtain residuals of reference blocks of the block to be decoded, where the reference blocks of the block to be decoded comprise at least two first reference blocks of the block to be decoded in the time domain, or at least two second reference blocks of the block to be decoded in the spatial domain, or at least one first reference block in the time domain and at least one second reference block in the spatial domain; a decoder-side residual prediction module configured to predict the residual of the block to be decoded based on the residuals of its reference blocks to obtain a predicted residual of the block to be decoded; and a residual decoding module configured to determine the actual residual of the block to be decoded according to the predicted residual of the block to be decoded and a residual difference, parsed from the code stream, between the predicted residual and the actual residual of the block to be decoded.
According to a further embodiment of the invention, there is also provided a storage medium having stored therein a computer program, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
According to a further embodiment of the invention, there is also provided an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
According to the invention, the residual of the block to be encoded is predicted from the residuals of its reference blocks to obtain a predicted residual, and only the residual difference between the predicted residual and the actual residual is encoded into the code stream, which reduces the bit rate required for encoding the residual. The same residual prediction process as at the encoder side is deployed at the decoder side: the decoder predicts the residual of the block to be decoded based on the residuals of its reference blocks and determines the actual residual of the block to be decoded according to the predicted residual and the residual difference carried in the code stream, so that the code stream encoded at the reduced code rate can still be correctly decoded to recover the actual residual of the block to be decoded.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
fig. 1 is a block diagram of a hardware structure of a mobile terminal 10 provided with an inter prediction module capable of applying a residual coding method and a residual decoding method according to an embodiment of the present invention;
fig. 2 is a flowchart of a residual coding method according to embodiment 1 of the present invention;
Fig. 3 is a block diagram of a residual coding apparatus according to embodiment 1 of the present invention;
Fig. 4 is a flowchart of a residual decoding method according to embodiment 2 of the present invention;
fig. 5 is a block diagram of a residual decoding apparatus according to embodiment 2 of the present invention;
FIG. 6 is a schematic diagram of a convolutional neural network for generating a residual prediction model in accordance with embodiment 3 of the present invention;
Fig. 7 is a schematic diagram showing selection of a time domain reference residual block in the case of P-frame and B-frame according to embodiment 3 of the present invention;
Fig. 8 is a schematic diagram illustrating selection of spatial reference residual blocks according to embodiment 3 of the present invention;
fig. 9 is a schematic flowchart of a method for improving video inter-coding performance by using space-time correlation in embodiment 3 of the present invention.
Detailed Description
The application will be described in detail hereinafter with reference to the drawings in conjunction with embodiments. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
The embodiment of the invention provides a scheme for improving coding performance (for example, video inter-frame coding performance) by utilizing spatio-temporal correlation: a residual prediction module predicts the residual value of a coding block during video inter-frame prediction so as to reduce the code rate of the coded residual and thereby improve inter-frame predictive coding efficiency in video coding. The codec adds a residual prediction module to inter prediction, which predicts the residual value of the current block to be encoded (hereinafter referred to as the prediction residual) using the residuals of the reference blocks of the current block in the spatial and temporal domains. Compared with a coding scheme that directly encodes the residual block of the block to be encoded into the code stream, the encoder only needs to transmit the difference between the actual residual and the predicted residual of the current block, which reduces the bit rate required for coding the residual and achieves the effect of saving code rate.
Embodiment 1 below describes a residual coding scheme from the encoder side that can reduce the bit rate required to code the residual, and embodiment 2 of the present application describes a residual decoding scheme from the decoder side that correctly parses the residual block of the block to be decoded, corresponding to the encoder side. The embodiment of the encoder-side provided in embodiment 1 and the embodiment of the decoder-side provided in embodiment 2 can be applied to inter-prediction modules of the codec-side (for example, to inter-prediction modules of existing video coding standards), and these inter-prediction modules can be provided in a mobile terminal, a computer terminal, or similar computing devices. Taking an example that the inter prediction module is disposed on the mobile terminal, fig. 1 is a block diagram of a hardware structure of a mobile terminal 10 provided with the inter prediction module capable of applying a residual coding method and a residual decoding method according to an embodiment of the present application. As shown in fig. 1, the mobile terminal 10 may include one or more (only one is shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a microprocessor MCU or a programmable logic device FPGA or the like, on which a corresponding program may be run for implementing functions of an inter prediction module capable of applying a residual coding method and a residual decoding method) and a memory 104 for storing data, and optionally, a transmission device 106 for communication functions and an input-output device 108. It will be appreciated by those skilled in the art that the structure shown in fig. 1 is merely illustrative and not limiting of the structure of the mobile terminal described above. For example, the mobile terminal 10 may also include more or fewer components than shown in FIG. 1 or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store a computer program, for example, a software program of application software and a module, such as a computer program corresponding to the encoder-side embodiment provided in embodiment 1 and the decoder-side embodiment provided in embodiment 2, and the processor 102 executes the computer program stored in the memory 104 to perform various functional applications and data processing, that is, implement the above-mentioned method. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the mobile terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission means 106 is arranged to receive or transmit data via a network. The specific examples of networks described above may include wireless networks provided by the communication provider of the mobile terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, simply referred to as a NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.
The residual coding scheme is described below from the encoder side and the decoder side by embodiments 1 and 2, respectively.
Example 1
In this embodiment, a residual coding method is provided, fig. 2 is a flowchart of the residual coding method according to embodiment 1 of the present invention, and as shown in fig. 2, the flowchart includes the following steps:
Step S202, determining residuals of reference blocks of a block to be encoded (in embodiments of the present invention, a residual is also referred to as a residual error, residual block, reference residual, or reference residual block), where the reference blocks of the block to be encoded comprise at least two first reference blocks of the block to be encoded in the time domain, or at least two second reference blocks of the block to be encoded in the spatial domain, or at least one first reference block in the time domain and at least one second reference block in the spatial domain;
step S204, predicting the residual error of the block to be encoded based on the residual error of the reference block of the block to be encoded, to obtain a predicted residual error of the block to be encoded;
Step S206, encoding a residual difference value between the prediction residual of the block to be encoded and an actual residual of the block to be encoded into a bitstream (e.g., a video inter-coded bitstream).
The residual of a reference block mentioned in step S202 is the actual coded residual of the reference block itself, obtained in the conventional way during encoding; it is available because the reference block has already been encoded.
The actual residual of the block to be encoded mentioned in step S206 is a difference between a pixel value of an original image block of the block to be encoded and a pixel value of a prediction block of the block to be encoded, wherein the prediction block of the block to be encoded is a block obtained after motion compensation of the reference block of the block to be encoded.
Through the above steps, the residual of the block to be encoded is predicted from the residuals of its reference blocks to obtain the predicted residual, and only the residual difference between the predicted residual and the actual residual is encoded into the code stream, so that the bit rate required for encoding the residual is reduced.
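A minimal sketch of steps S204 and S206 in Python (the predict_residual callable is a hypothetical stand-in for the residual prediction model or the linear weighting described below, not a normative API):

```python
import numpy as np

def encode_residual_difference(actual_residual: np.ndarray,
                               reference_residuals: list,
                               predict_residual) -> np.ndarray:
    # Step S204: predict the residual of the block to be encoded from the
    # residuals of its reference blocks (model-based or weighting-based).
    predicted_residual = predict_residual(reference_residuals)
    # Step S206: only the residual difference between the actual and predicted
    # residual is entropy-coded into the code stream.
    return actual_residual - predicted_residual
```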
Regarding the selection of reference blocks for a block to be encoded, in view of the continuity of the video image, selecting an appropriate reference block enables an accurate prediction of the residual of the block to be encoded.
In at least one exemplary embodiment, in the case that the image frame where the block to be encoded is located is a P frame, the first reference block of the block to be encoded in the time domain may include an optimal prediction unit block PU of the block to be encoded in a forward reference frame, denoted as a first optimal PU, and/or an optimal PU of the block to be encoded in a frame previous to the forward reference frame, denoted as a second optimal PU.
In the case that the image frame where the block to be encoded is located is a B frame, the first reference block of the block to be encoded in the time domain comprises an optimal PU of the block to be encoded in a forward reference frame, denoted as a third optimal PU, and/or an optimal PU of the block to be encoded in a backward reference frame, denoted as a fourth optimal PU.
In at least one exemplary embodiment, in the case that the image frame in which the block to be encoded is located is a P frame, a first reference block of the block to be encoded in the time domain may be determined by determining the first optimal PU of the block to be encoded in the forward reference frame by motion estimation and determining a motion vector MV of the block to be encoded with respect to the first optimal PU, and determining the second optimal PU in a previous frame of the forward reference frame by motion estimation according to the position of the first optimal PU and the MV.
In at least one exemplary embodiment, the second reference block of the block to be encoded in the spatial domain is located in the image frame where the block to be encoded is located, and is adjacent to the block to be encoded in the spatial domain.
In at least one exemplary embodiment, the second reference block of the block to be encoded in the spatial domain includes a left block and/or an upper block adjacent to the block to be encoded (preferably may be a spatially adjacent left and/or upper block having the same size) in an image frame where the block to be encoded is located.
It will be appreciated by those skilled in the art that in practical applications, the first reference block of the block to be encoded in the time domain and/or the second reference block of the block to be encoded in the spatial domain may be determined in the manner described above. The residual corresponding to the first reference block in the time domain may be referred to as a time domain reference residual block, the residual corresponding to the second reference block in the space domain may be referred to as a space domain reference residual block, and residual information of the block to be encoded may be predicted based on at least one time domain reference residual block and/or at least one space domain reference residual block.
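As an illustration of the spatial-domain reference blocks described above, the following sketch computes the positions of the left and upper neighbors, assuming a hypothetical (x, y, width, height) rectangle convention and omitting frame-boundary checks:

```python
def spatial_neighbors(x: int, y: int, w: int, h: int):
    # Spatially adjacent left and upper blocks of the same size as the block
    # to be encoded, located in the same image frame (assuming they exist).
    left_block = (x - w, y, w, h)
    upper_block = (x, y - h, w, h)
    return left_block, upper_block
```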
In a preferred embodiment, two time domain reference residual blocks and two spatial domain reference residual blocks may be selected for data symmetry consideration, and residual prediction may be performed based on the four reference residual blocks. At this time, the reference blocks of the block to be encoded include two first reference blocks whose corresponding residuals in the time domain are non-all zeros and two second reference blocks whose corresponding residuals in the space domain are non-all zeros.
In another preferred embodiment, when only one time domain reference residual block and one spatial reference residual block are non-all zero blocks, one time domain reference residual block and one spatial reference residual block can be selected and residual prediction is performed based on the two reference residual blocks. At this time, the reference blocks of the block to be encoded include a first reference block whose corresponding residual error in the time domain is non-all zero and a second reference block whose corresponding residual error in the space domain is non-all zero.
In another preferred embodiment, when only two time domain reference residual blocks and one spatial reference residual block are non-all zero blocks, two time domain reference residual blocks and one spatial reference residual block may be selected and residual prediction may be performed based on the three reference residual blocks. At this time, the reference blocks of the block to be encoded include two first reference blocks whose corresponding residuals in the time domain are non-all zeros and one second reference block whose corresponding residuals in the space domain are non-all zeros.
In another preferred embodiment, when only one time domain reference residual block and two spatial reference residual blocks are non-all zero blocks, one time domain reference residual block and two spatial reference residual blocks may be selected and residual prediction may be performed based on the three reference residual blocks. At this time, the reference blocks of the block to be encoded include one first reference block whose corresponding residual error in the time domain is non-all zero and two second reference blocks whose corresponding residual error in the space domain is non-all zero.
In another preferred embodiment, when only two time-domain reference residual blocks in the time domain are non-all-zero blocks, two time-domain reference residual blocks may be selected and residual prediction may be performed based on the two time-domain reference residual blocks. At this time, the reference blocks of the block to be encoded include two first reference blocks whose corresponding residuals in the time domain are non-all zeros.
In another preferred embodiment, when only two spatial reference residual blocks are non-all zero blocks, two spatial reference residual blocks may be selected and residual prediction may be performed based on the two spatial reference residual blocks. At this time, the reference blocks of the block to be encoded comprise two second reference blocks with non-all-zero corresponding residuals of the block to be encoded in the spatial domain.
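The selection logic running through the preferred embodiments above can be sketched as follows (an illustrative sketch, not the normative selection procedure; np.any tests whether a residual block is non-all-zero):

```python
import numpy as np

def select_reference_residuals(temporal_blocks, spatial_blocks):
    # Keep only candidate reference residual blocks that are not all zero,
    # up to two temporal and two spatial blocks, per the combinations above.
    temporal = [b for b in temporal_blocks[:2] if np.any(b)]
    spatial = [b for b in spatial_blocks[:2] if np.any(b)]
    return temporal, spatial
```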
The process of predicting the residual error of the block to be coded based on the residual error of the reference block can be implemented in various prediction modes. In at least one exemplary embodiment, predicting the residual of the block to be encoded based on the residual of the reference block of the block to be encoded, the obtaining the prediction residual of the block to be encoded may include one of:
(1) inputting the residuals of the reference blocks into a residual prediction model to obtain the predicted residual of the block to be encoded, where the residual prediction model is trained with a deep learning network on training samples, each training sample comprising the residuals of the reference blocks of an encoded block whose residual is known together with the actual residual of that encoded block;
(2) linearly weighting the residuals of the reference blocks to obtain the predicted residual of the block to be encoded, where the linear weighting is single-weight linear weighting or multi-weight linear weighting.
For example, a linear weighted sum of the individual weights may be calculated using the following formula:
ResiPred(i,j) = W1·ResiA(i,j) + W2·ResiB(i,j)
where W1 and W2 are scalar weights, ResiA(i,j) and ResiB(i,j) are the pixel values of the selected reference residual blocks at pixel (i,j), and ResiPred(i,j) is the pixel value of the predicted residual block at pixel (i,j).
For example, a linear weighted sum with multiple weights may be calculated using the following formula:
ResiPred(i,j) = W1ij·ResiA(i,j) + W2ij·ResiB(i,j)
where W1ij and W2ij are the weight values corresponding to each pixel of the reference residual blocks, which can be obtained through training; ResiA(i,j) and ResiB(i,j) are the pixel values of the selected reference residual blocks at pixel (i,j), and ResiPred(i,j) is the pixel value of the predicted residual block at pixel (i,j).
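A minimal sketch of the two weighting variants (Python with NumPy; function names are illustrative), assuming the reference residual blocks are arrays of equal shape and the weights come from configuration or training:

```python
import numpy as np

def predict_single_weight(resi_a: np.ndarray, resi_b: np.ndarray,
                          w1: float, w2: float) -> np.ndarray:
    # Single-weight linear weighting: one scalar weight per reference block.
    return w1 * resi_a + w2 * resi_b

def predict_multi_weight(resi_a: np.ndarray, resi_b: np.ndarray,
                         w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    # Multi-weight linear weighting: w1 and w2 are per-pixel weight maps of
    # the same shape as the reference residual blocks, obtained by training.
    return w1 * resi_a + w2 * resi_b
```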
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present invention.
In this embodiment, a residual coding device is provided corresponding to the residual coding method described above, so as to implement the foregoing embodiments and preferred embodiments, and the description is omitted herein. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
Fig. 3 is a block diagram of a residual coding apparatus according to embodiment 1 of the present invention, and as shown in fig. 3, the apparatus includes:
An encoder-side reference residual determination module 32 configured to determine a residual of a reference block of a block to be encoded, where the reference block of the block to be encoded includes at least two first reference blocks of the block to be encoded in a time domain, or at least two second reference blocks of the block to be encoded in a spatial domain, or at least one first reference block of the block to be encoded in a time domain and at least one second reference block of the block to be encoded in a spatial domain;
The encoder-side residual prediction module 34 is configured to predict the residual of the block to be encoded based on the residual of the reference block of the block to be encoded, so as to obtain a predicted residual of the block to be encoded;
A residual coding module 36 arranged to code a residual difference between the prediction residual of the block to be coded and an actual residual of the block to be coded into a code stream.
It should be noted that each of the above modules may be implemented by software or hardware, and the latter may be implemented by, but not limited to, the above modules all being located in the same processor, or each of the above modules being located in different processors in any combination.
The present embodiment also provides a storage medium having stored therein a computer program, wherein the computer program is arranged to perform the steps in the residual coding method above in the present embodiment when run.
Alternatively, in the present embodiment, the above-described storage medium may be configured to store a computer program for performing the steps of:
S1, determining residuals of reference blocks of a block to be encoded (in embodiments of the invention, a residual is also referred to as a residual error, residual block, reference residual, or reference residual block), where the reference blocks of the block to be encoded comprise at least two first reference blocks of the block to be encoded in the time domain, or at least two second reference blocks of the block to be encoded in the spatial domain, or at least one first reference block in the time domain and at least one second reference block in the spatial domain;
S2, predicting the residual error of the block to be encoded based on the residual error of the reference block of the block to be encoded to obtain a predicted residual error of the block to be encoded;
and S3, encoding a residual difference value between the predicted residual of the block to be encoded and an actual residual of the block to be encoded into a code stream (for example, a video inter-coded code stream).
Alternatively, in the present embodiment, the storage medium may include, but is not limited to, a USB flash disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a removable hard disk, a magnetic disk, or an optical disk, etc. various media in which a computer program may be stored.
An embodiment of the invention also provides an electronic device comprising a memory in which a computer program is stored and a processor arranged to run the computer program to perform the steps of the residual coding method described above in this embodiment.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
S1, determining residuals of reference blocks of a block to be encoded (in embodiments of the invention, a residual is also referred to as a residual error, residual block, reference residual, or reference residual block), where the reference blocks of the block to be encoded comprise at least two first reference blocks of the block to be encoded in the time domain, or at least two second reference blocks of the block to be encoded in the spatial domain, or at least one first reference block in the time domain and at least one second reference block in the spatial domain;
S2, predicting the residual error of the block to be encoded based on the residual error of the reference block of the block to be encoded to obtain a predicted residual error of the block to be encoded;
and S3, encoding a residual difference value between the predicted residual of the block to be encoded and an actual residual of the block to be encoded into a code stream (for example, a video inter-coded code stream).
Alternatively, specific examples in this embodiment may refer to examples described in the foregoing embodiments and optional implementations, and this embodiment is not described herein.
Example 2
In this embodiment, a residual decoding method is provided, fig. 4 is a flowchart of the residual decoding method according to embodiment 2 of the present invention, and as shown in fig. 4, the flowchart includes the following steps:
Step S402, determining a prediction block of a block to be decoded based on a motion vector MV parsed from a code stream (for example, a video inter-coded code stream), and obtaining residuals of reference blocks of the block to be decoded (in embodiments of the invention, a residual is also referred to as a residual error, residual block, reference residual, or reference residual block), where the reference blocks of the block to be decoded comprise at least two first reference blocks of the block to be decoded in the time domain, or at least two second reference blocks of the block to be decoded in the spatial domain, or at least one first reference block in the time domain and at least one second reference block in the spatial domain;
step S404, predicting the residual error of the block to be decoded based on the residual error of the reference block of the block to be decoded to obtain a predicted residual error of the block to be decoded;
Step S406, determining an actual residual error of the block to be decoded according to the prediction residual error of the block to be decoded and a residual error difference between the prediction residual error of the block to be decoded and the actual residual error of the block to be decoded, which are parsed from the code stream.
The residual of a reference block mentioned in step S402 is the actual coded residual of the reference block itself, obtained in the conventional way during encoding; it is available because the reference block has already been encoded.
In step S402, the prediction block of the block to be decoded is the block obtained after motion compensation of the reference block of the block to be decoded.
The actual residual of the block to be decoded mentioned in step S406 is the difference between the pixel value of the original image block of the block to be decoded and the pixel value of the prediction block of the block to be decoded.
Through the above steps, since the same residual prediction process as at the encoder side is deployed at the decoder side, the decoder predicts the residual of the block to be decoded based on the residuals of its reference blocks and determines the actual residual of the block to be decoded according to the predicted residual and the residual difference carried in the code stream; thus the code stream encoded at the reduced code rate can still be correctly decoded at the decoder side to recover the actual residual of the block to be decoded.
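Mirroring the encoder-side sketch in Embodiment 1, a minimal sketch of steps S404 and S406 (again with a hypothetical predict_residual callable that must be identical to the encoder's predictor):

```python
def decode_actual_residual(residual_difference, reference_residuals, predict_residual):
    # Step S404: reproduce the encoder's residual prediction from the same
    # reference residual blocks, so both sides derive an identical prediction.
    predicted_residual = predict_residual(reference_residuals)
    # Step S406: the actual residual is the predicted residual plus the residual
    # difference parsed from the code stream; it is then added to the prediction
    # block to recover the original image block.
    return predicted_residual + residual_difference
```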
In at least one exemplary embodiment, in the case that the image frame where the block to be decoded is located is a P frame, the first reference block of the block to be decoded in the time domain includes an optimal prediction unit block PU of the block to be decoded in a forward reference frame, denoted as a first optimal PU, and/or an optimal PU of the block to be decoded in a frame previous to the forward reference frame, denoted as a second optimal PU.
In at least one exemplary embodiment, in the case that the image frame where the block to be decoded is located is a B frame, the first reference block of the block to be decoded in the time domain includes an optimal PU of the block to be decoded in a forward reference frame, denoted as a third optimal PU, and/or an optimal PU of the block to be decoded in a backward reference frame, denoted as a fourth optimal PU.
It should be noted that, in order to ensure accuracy of residual prediction, a reference block selection mode identical to that of the encoder is adopted at the decoder to ensure consistency with a prediction residual obtained by residual prediction at the encoder.
In at least one exemplary embodiment, in the case that the image frame where the block to be decoded is located is a P frame, a first reference block of the block to be decoded in the time domain is determined by determining the first optimal PU of the block to be decoded in the forward reference frame according to the MV parsed in the code stream, and determining the second optimal PU in the previous frame of the forward reference frame according to the co-located PU of the first optimal PU in the previous frame of the forward reference frame by motion estimation.
In at least one exemplary embodiment, the second reference block of the block to be decoded in the spatial domain is located in the image frame in which the block to be decoded is located, and is adjacent to the block to be decoded in the spatial domain.
In at least one exemplary embodiment, the second reference block of the block to be decoded in the spatial domain includes a left block and/or an upper block adjacent to the block to be decoded (preferably may be a spatially adjacent left and/or upper block having the same size) in an image frame where the block to be decoded is located.
It will be appreciated by those skilled in the art that in practical applications, the first reference block of the block to be decoded in the time domain and/or the second reference block of the block to be decoded in the spatial domain may be determined in the manner described above. The residual corresponding to the first reference block in the time domain may be referred to as a time domain reference residual block, the residual corresponding to the second reference block in the space domain may be referred to as a space domain reference residual block, and residual information of the block to be decoded may be predicted based on at least one time domain reference residual block and/or at least one space domain reference residual block.
In a preferred embodiment, for data symmetry, two time domain reference residual blocks and two space domain reference residual blocks may be selected, and residual prediction performed based on these four reference residual blocks. In this case, the reference blocks of the block to be decoded include two first reference blocks whose corresponding residuals in the time domain are non-all-zero and two second reference blocks whose corresponding residuals in the space domain are non-all-zero.
In another preferred embodiment, when only one time domain reference residual block and one space domain reference residual block are non-all-zero, one of each may be selected, and residual prediction performed based on these two reference residual blocks. In this case, the reference blocks of the block to be decoded include one first reference block whose corresponding residual in the time domain is non-all-zero and one second reference block whose corresponding residual in the space domain is non-all-zero.
In another preferred embodiment, when only two time domain reference residual blocks and one space domain reference residual block are non-all-zero, these three may be selected, and residual prediction performed based on them. In this case, the reference blocks of the block to be decoded include two first reference blocks whose corresponding residuals in the time domain are non-all-zero and one second reference block whose corresponding residual in the space domain is non-all-zero.
In another preferred embodiment, when only one time domain reference residual block and two space domain reference residual blocks are non-all-zero, these three may be selected, and residual prediction performed based on them. In this case, the reference blocks of the block to be decoded include one first reference block whose corresponding residual in the time domain is non-all-zero and two second reference blocks whose corresponding residuals in the space domain are non-all-zero.
In another preferred embodiment, when only two time domain reference residual blocks are non-all-zero, these two may be selected, and residual prediction performed based on them. In this case, the reference blocks of the block to be decoded include two first reference blocks whose corresponding residuals in the time domain are non-all-zero.
In another preferred embodiment, when only two space domain reference residual blocks are non-all-zero, these two may be selected, and residual prediction performed based on them. In this case, the reference blocks of the block to be decoded include two second reference blocks whose corresponding residuals in the space domain are non-all-zero.
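As a rough illustration of the case analysis above, the following Python sketch filters out all-zero reference residual blocks before prediction; the list-of-arrays representation and the function name are assumptions made for illustration, not part of the patented scheme.

```python
import numpy as np

def select_nonzero_refs(temporal_refs, spatial_refs):
    """Keep only the non-all-zero reference residual blocks, matching
    the case analysis above. Both arguments are lists of 2-D numpy
    arrays (a hypothetical in-memory representation)."""
    nonzero = lambda blocks: [b for b in blocks if np.any(b)]
    return nonzero(temporal_refs), nonzero(spatial_refs)
```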
The residual of the block to be decoded may be predicted from the residuals of its reference blocks in various prediction modes. In at least one exemplary embodiment, predicting the residual of the block to be decoded based on the residuals of its reference blocks to obtain the predicted residual may include one of the following:
(1) inputting the residuals of the reference blocks into a residual prediction model to obtain the predicted residual of the block to be decoded, where the residual prediction model is the same as the residual prediction model at the encoder side;
(2) linearly weighting the residuals of the reference blocks to obtain the predicted residual of the block to be decoded, where the linear weighting is the same as the linear weighting at the encoder side.
For example, a linear weighted sum with a single weight per reference block may be calculated using the following formula:
ResiPred(i,j) = W1 · ResiA(i,j) + W2 · ResiB(i,j)
where W1 and W2 are the weights, ResiA(i,j) and ResiB(i,j) are the pixel values of the selected reference residual blocks at pixel point (i,j), and ResiPred(i,j) is the pixel value of the prediction residual block at pixel point (i,j).
For example, a linear weighted sum with per-pixel weights may be calculated using the following formula:
ResiPred(i,j) = W1(i,j) · ResiA(i,j) + W2(i,j) · ResiB(i,j)
where W1(i,j) and W2(i,j) are the weight values corresponding to each pixel point of the reference residual blocks, which may be obtained through training; ResiA(i,j) and ResiB(i,j) are the pixel values of the selected reference residual blocks at pixel point (i,j), and ResiPred(i,j) is the pixel value of the prediction residual block at pixel point (i,j).
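As a concrete illustration, the following Python sketch implements both weighting variants; thanks to numpy broadcasting, one function covers scalar weights and trained per-pixel weight maps. The function name and the example weights are assumptions for illustration only.

```python
import numpy as np

def predict_residual(resi_a, resi_b, w1, w2):
    """Linear weighting of two reference residual blocks. If w1 and w2
    are scalars this is the single-weight formula; if they are arrays of
    the same shape as the blocks it is the per-pixel formula, with the
    weight maps obtained by training."""
    return w1 * resi_a + w2 * resi_b

# Example with two 8x8 reference residual blocks and scalar weights:
resi_a = np.random.randn(8, 8)
resi_b = np.random.randn(8, 8)
resi_pred = predict_residual(resi_a, resi_b, 0.5, 0.5)
```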
In at least one exemplary embodiment, after the actual residual of the block to be decoded is determined from its predicted residual and the residual difference parsed from the code stream, the method further comprises adding the actual residual to the prediction block of the block to be decoded to recover the original image block of the block to be decoded.
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by software plus the necessary general-purpose hardware platform, or by hardware, although in many cases the former is preferred. Based on such understanding, the technical solution of the present invention, or the part of it contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to perform the method according to the embodiments of the present invention.
In this embodiment, a residual decoding apparatus corresponding to the residual decoding method described above is provided to implement the foregoing embodiments and preferred implementations; details already described are not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiment is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 5 is a block diagram of the structure of a residual decoding apparatus according to embodiment 2 of the present invention. As shown in Fig. 5, the apparatus includes:
The decoder-side reference residual determining module 52 is configured to determine a prediction block of a block to be decoded based on a motion vector MV parsed in a code stream, and obtain a residual of a reference block of the block to be decoded, where the reference block of the block to be decoded includes at least two first reference blocks of the block to be decoded in a time domain, or at least two second reference blocks of the block to be decoded in a space domain, or at least one first reference block of the block to be decoded in a time domain and at least one second reference block of the block to be decoded in a space domain;
A decoder-side residual prediction module 54, configured to predict a residual of the block to be decoded based on a residual of the reference block of the block to be decoded, to obtain a predicted residual of the block to be decoded;
The residual decoding module 56 is configured to determine the actual residual of the block to be decoded according to the predicted residual of the block to be decoded and the residual difference, parsed from the code stream, between the predicted residual and the actual residual of the block to be decoded.
It should be noted that each of the above modules may be implemented by software or hardware; in the latter case, the above modules may, for example (but not limited to), all be located in the same processor, or be distributed across different processors in any combination.
The present embodiment also provides a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of the residual decoding method described above in the present embodiment when run.
Alternatively, in the present embodiment, the above-described storage medium may be configured to store a computer program for performing the steps of:
S1: determining a prediction block of a block to be decoded based on a motion vector MV parsed from a code stream (for example, a video inter-coded code stream), and acquiring the residuals of the reference blocks of the block to be decoded (in the embodiments of the present invention this term is also written as residual, residual block, reference residual, or reference residual block), where the reference blocks of the block to be decoded include at least two first reference blocks of the block to be decoded in the time domain, or at least two second reference blocks of the block to be decoded in the space domain, or at least one first reference block in the time domain and at least one second reference block in the space domain;
S2: predicting the residual of the block to be decoded based on the residuals of its reference blocks to obtain the predicted residual of the block to be decoded;
S3: determining the actual residual of the block to be decoded according to its predicted residual and the residual difference, parsed from the code stream, between the predicted residual and the actual residual of the block to be decoded.
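A minimal Python sketch of these three steps follows, assuming the bitstream parsing has already yielded the prediction block and the residual difference; all names, and the callable `predictor`, are illustrative assumptions.

```python
def decode_block(prediction_block, ref_residuals, residual_diff, predictor):
    """Sketch of steps S1-S3. `prediction_block` is the block obtained
    from the parsed MV (S1), `ref_residuals` are the reference residual
    blocks, `residual_diff` is the residual difference parsed from the
    code stream, and `predictor` is the same residual prediction model
    or linear weighting used at the encoder."""
    predicted_residual = predictor(ref_residuals)          # S2
    actual_residual = predicted_residual + residual_diff   # S3
    return prediction_block + actual_residual              # reconstructed block
```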
Alternatively, in the present embodiment, the storage medium may include, but is not limited to, a USB flash disk, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other media capable of storing a computer program.
The present embodiment also provides an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of the residual decoding method described above in the present embodiment.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
S1: determining a prediction block of a block to be decoded based on a motion vector MV parsed from a code stream (for example, a video inter-coded code stream), and acquiring the residuals of the reference blocks of the block to be decoded (in the embodiments of the present invention this term is also written as residual, residual block, reference residual, or reference residual block), where the reference blocks of the block to be decoded include at least two first reference blocks of the block to be decoded in the time domain, or at least two second reference blocks of the block to be decoded in the space domain, or at least one first reference block in the time domain and at least one second reference block in the space domain;
S2: predicting the residual of the block to be decoded based on the residuals of its reference blocks to obtain the predicted residual of the block to be decoded;
S3: determining the actual residual of the block to be decoded according to its predicted residual and the residual difference, parsed from the code stream, between the predicted residual and the actual residual of the block to be decoded.
Alternatively, specific examples in this embodiment may refer to the examples described in the foregoing embodiments and optional implementations, and are not repeated here.
The following Embodiment 3 describes in detail a specific implementation of the residual coding scheme, taking as an example a residual prediction model (which may also be referred to as a residual prediction module) trained by a deep learning network. It should be noted that, for the scheme that performs residual prediction by a linear weighted sum, the overall principle of residual encoding and decoding is similar to the following embodiment and is not described again.
Embodiment 3
This embodiment designs a method for improving video inter-frame coding performance by using spatio-temporal correlation for the coding unit of video coding; in practical use, an encoder calls this method to complete inter-frame prediction. The details of the method are described below.
(1) Residual prediction module
The residual prediction module may be generated by a convolutional neural network. Fig. 6 is a schematic diagram of a convolutional neural network for generating the residual prediction module according to embodiment 3 of the present invention. As shown in Fig. 6, the convolutional neural network takes two time domain reference residual blocks and two space domain reference residual blocks related to the current coding block as inputs (other numbers of time domain and space domain reference residual blocks, only time domain reference residual blocks, or only space domain reference residual blocks may also be used), extracts the feature information of the residual image through convolution and pooling, and outputs the prediction residual through a deconvolution process.
The input of the convolutional neural network is thus four reference residual blocks: two selected from reference frames in the time domain, and two from adjacent blocks in the space domain.
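A minimal PyTorch sketch of such a network is given below; the framework choice, channel counts, kernel sizes, and the single down/up-sampling stage are all illustrative assumptions, not the patented architecture.

```python
import torch.nn as nn

class ResidualPredictionNet(nn.Module):
    """Sketch of the network in Fig. 6: four reference residual blocks
    stacked as input channels, convolution and pooling for feature
    extraction, deconvolution to produce the prediction residual."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                       # downsample by 2
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Deconvolution restores the original block resolution.
        self.deconv = nn.ConvTranspose2d(64, 1, kernel_size=2, stride=2)

    def forward(self, ref_residuals):              # (N, 4, H, W)
        return self.deconv(self.features(ref_residuals))  # (N, 1, H, W)
```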
In the time domain, different operations are performed according to the different prediction modes of P frames and B frames. Fig. 7 is a schematic diagram showing the selection of time domain reference residual blocks for P frames and B frames according to embodiment 3 of the present invention. As shown in Fig. 7:
For a P frame, since it uses unidirectional prediction, there is only one forward reference frame; the motion vector (MV) of the current coding block is generated by first finding the optimal prediction unit (PU) in the forward reference frame through motion estimation. Then, based on the position of this optimal reference PU and the MV of the current coding block, another optimal reference PU is searched by motion estimation in the frame preceding the forward reference frame, and the residual blocks corresponding to the two optimal reference PUs are used as the two time domain reference residual block inputs for the P frame.
For a B frame, there are a forward reference frame and a backward reference frame; in each of them, the residual block of an optimal PU is found by motion estimation and used as a time domain reference residual block input.
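The following Python sketch illustrates the P-frame case with a brute-force block-matching search; the SAD criterion, the search range, and the residual-frame inputs are assumptions made for illustration.

```python
import numpy as np

def motion_estimate(block, ref_frame, top, left, search=8):
    """Minimal full-search block matching (SAD criterion) around the
    co-located position (top, left); returns the best match position."""
    h, w = block.shape
    best_pos, best_cost = (top, left), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > ref_frame.shape[0] or x + w > ref_frame.shape[1]:
                continue
            cost = np.abs(ref_frame[y:y + h, x:x + w].astype(np.int64)
                          - block.astype(np.int64)).sum()
            if cost < best_cost:
                best_pos, best_cost = (y, x), cost
    return best_pos

def p_frame_temporal_refs(cur_block, fwd_frame, prev_frame,
                          fwd_resi, prev_resi, top, left):
    """P-frame case of Fig. 7: the first optimal PU is found in the
    forward reference frame; starting from its position, the second is
    found in the frame preceding the forward reference frame. The stored
    residual frames `fwd_resi` / `prev_resi` are assumed inputs."""
    h, w = cur_block.shape
    y1, x1 = motion_estimate(cur_block, fwd_frame, top, left)
    pu1 = fwd_frame[y1:y1 + h, x1:x1 + w]
    y2, x2 = motion_estimate(pu1, prev_frame, y1, x1)
    return (fwd_resi[y1:y1 + h, x1:x1 + w],
            prev_resi[y2:y2 + h, x2:x2 + w])

# For a B frame, run motion_estimate once in the forward reference frame
# and once in the backward reference frame instead (see Fig. 7).
```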
In the space domain, Fig. 8 is a schematic diagram illustrating the selection of space domain reference residual blocks according to embodiment 3 of the present invention. As shown in Fig. 8, the left and upper blocks that are adjacent to the current block and have the same size are selected as reference blocks, and the residual blocks corresponding to these reference blocks are used as the space domain reference residual block inputs.
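A corresponding sketch for the space domain; the array-based residual-frame representation is an assumption for illustration.

```python
def spatial_reference_residuals(resi_frame, top, left, h, w):
    """Fig. 8: take the left and upper neighbours of the same size as
    the current h-by-w block from the frame's residual data, when such
    neighbours exist inside the frame."""
    refs = []
    if left >= w:   # left neighbour of the same size
        refs.append(resi_frame[top:top + h, left - w:left])
    if top >= h:    # upper neighbour of the same size
        refs.append(resi_frame[top - h:top, left:left + w])
    return refs
```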
The output of the residual prediction module is the prediction residual block of the current block. At the encoder side, the prediction residual block may be subtracted from the actual residual block to obtain the coded residual; at the decoder side, the prediction residual block plus the coded residual recovers the actual residual block.
(2) Inter-frame coding block residual prediction method based on spatio-temporal correlation
Fig. 9 is a schematic process diagram of the method for improving video inter-frame coding performance by using spatio-temporal correlation according to embodiment 3 of the present invention. Taking a residual prediction module implemented by a neural network as an example, it shows the residual encoding and decoding operations at the encoder side and the decoder side, which are described in detail below.
Operation at the encoder side:
First, the video sequences provided by a common test standard are pre-encoded, and the residual data of the four reference blocks and the residual data of the current block are extracted as training data. A deep neural network constructed as in Fig. 6 is used for network training, and the trained network is then embedded into the inter-frame prediction of the encoder.
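A compact training loop for such a network might look as follows; the use of PyTorch, the MSE loss, the Adam optimizer, and all hyper-parameters are assumptions for illustration, as the patent fixes none of them.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_predictor(model, ref_blocks, target_residuals, epochs=10):
    """`ref_blocks` is an (N, 4, H, W) tensor of reference residual
    blocks extracted by pre-encoding; `target_residuals` is an
    (N, 1, H, W) tensor of the corresponding actual residuals."""
    loader = DataLoader(TensorDataset(ref_blocks, target_residuals),
                        batch_size=64, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(epochs):
        for refs, target in loader:
            opt.zero_grad()
            nn.functional.mse_loss(model(refs), target).backward()
            opt.step()
    return model
```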
Second, a block to be encoded is read from the input video image, a prediction block of the block to be encoded is obtained through motion estimation and prediction, and the pixel values of the prediction block are subtracted from the pixel values of the original image to obtain the conventional residual block.
Third, according to the prediction mode (P frame or B frame), the motion vector MV obtained through motion estimation is used to find the two time domain reference residual blocks shown in Fig. 7; two space domain reference residual blocks are then obtained as in Fig. 8, and the four reference residual blocks thus obtained are input into the residual prediction module to obtain the prediction residual block.
Fourth, the difference between the conventional residual block and the prediction residual block is taken as the coded residual of the current block to be encoded and written into the code stream file. In this step, the conventional residual block is the residual value to be encoded that the current coding block obtains according to the conventional inter-frame prediction method, and the prediction residual is the residual value predicted by the residual prediction module.
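Putting the second to fourth steps together, a sketch of the per-block encoder-side computation (with transform, quantization, and entropy coding omitted) might read:

```python
def encode_block(orig_block, prediction_block, ref_residuals, predictor):
    """Sketch of the encoder-side flow: the coded residual written to
    the code stream is the conventional residual minus the predicted
    residual. `predictor` is the trained residual prediction module or
    the linear weighting; its name is an assumption for illustration."""
    conventional_residual = orig_block - prediction_block      # step two
    predicted_residual = predictor(ref_residuals)              # step three
    return conventional_residual - predicted_residual          # step four
```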
In summary, at the encoder side a reference block is found in a reference frame by motion estimation, and a prediction block is obtained after the motion compensation operation; the difference between the original block and the prediction block is the actual residual block of the conventional inter prediction mode. In this process, a residual prediction module is established to predict the residual of the current block. The input of the residual prediction module is the two time domain and two space domain reference residual blocks corresponding to the reference blocks of the current coding block, and its output is the prediction residual of the current block. The difference between the actual residual block and the prediction residual block is taken as the coded residual for subsequent transform, quantization, and entropy coding.
Operation at the decoder side:
First, the residual prediction module is embedded into the inter prediction of the decoder.
Second, the prediction block of the current block to be decoded is obtained through the motion vector MV parsed from the code stream; the two time domain reference residual blocks shown in Fig. 7 are then found, the two space domain reference residual blocks shown in Fig. 8 are obtained, and the four reference residual blocks thus obtained are input into the residual prediction module to obtain the prediction residual block. In this step, the four reference residual blocks corresponding to the current block are generated in a manner consistent with the four reference residual blocks at the encoder side.
Third, the coded residual of the block to be decoded is read from the code stream, the prediction residual block is added to the coded residual to restore the actual residual of the block to be decoded, and the prediction block is then added to restore the original image block.
In summary, in the inter prediction decoding operation at the decoder side, the same residual prediction module as at the encoder is established to predict the residual of the current block. Its input is the two time domain and two space domain reference residual blocks corresponding to the reference blocks of the current decoding block, found in the same way as at the encoder; its output is the prediction residual of the current decoding block, which is added to the decoded residual and to the prediction block of the block to be decoded to complete the reconstruction of the current block.
It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented by a general-purpose computing device; they may be concentrated on a single computing device or distributed across a network of computing devices. Alternatively, they may be implemented by program code executable by computing devices, so that they may be stored in a storage device and executed by the computing devices, and in some cases the steps shown or described may be performed in an order different from that given here. They may also be fabricated separately as individual integrated circuit modules, or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the principle of the present invention should be included in the protection scope of the present invention.
Claims (17)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910663278.XA CN112261409B (en) | 2019-07-22 | 2019-07-22 | Residual encoding, decoding method and device, storage medium and electronic device |
PCT/CN2020/100558 WO2021012942A1 (en) | 2019-07-22 | 2020-07-07 | Residual coding method and device, residual decoding method and device, storage medium, and electronic device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910663278.XA CN112261409B (en) | 2019-07-22 | 2019-07-22 | Residual encoding, decoding method and device, storage medium and electronic device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112261409A CN112261409A (en) | 2021-01-22 |
CN112261409B true CN112261409B (en) | 2024-12-20 |
Family
ID=74193180
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910663278.XA Active CN112261409B (en) | 2019-07-22 | 2019-07-22 | Residual encoding, decoding method and device, storage medium and electronic device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112261409B (en) |
WO (1) | WO2021012942A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115190306A (en) * | 2021-04-01 | 2022-10-14 | Oppo广东移动通信有限公司 | Image processing method, image processing device, storage medium and electronic equipment |
CN115695812A (en) * | 2021-07-30 | 2023-02-03 | 中兴通讯股份有限公司 | Video encoding, video decoding method, device, electronic device and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101350927A (en) * | 2008-07-29 | 2009-01-21 | 北京中星微电子有限公司 | Method and apparatus for forecasting and selecting optimum estimation mode in a frame |
CN102196256A (en) * | 2010-03-11 | 2011-09-21 | 中国科学院微电子研究所 | A video coding method and device |
Family Cites Families (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100763181B1 (en) * | 2005-04-19 | 2007-10-05 | 삼성전자주식회사 | Method and apparatus for improving coding rate by coding prediction information from base layer and enhancement layer |
KR100678911B1 (en) * | 2005-07-21 | 2007-02-05 | 삼성전자주식회사 | Method and apparatus for encoding and decoding video signals by extending the application of directional intra prediction |
WO2007024106A1 (en) * | 2005-08-24 | 2007-03-01 | Samsung Electronics Co., Ltd. | Method for enhancing performance of residual prediction and video encoder and decoder using the same |
CN103037220B (en) * | 2008-01-04 | 2016-01-13 | 华为技术有限公司 | Video coding, coding/decoding method and device and processing system for video |
EP2343901B1 (en) * | 2010-01-08 | 2017-11-29 | BlackBerry Limited | Method and device for video encoding using predicted residuals |
CN102148989B (en) * | 2011-04-22 | 2012-07-25 | 西安交通大学 | Method for detecting all-zero blocks in H.264 |
WO2013009716A2 (en) * | 2011-07-08 | 2013-01-17 | Dolby Laboratories Licensing Corporation | Hybrid encoding and decoding methods for single and multiple layered video coding systems |
GB2506853B (en) * | 2012-09-28 | 2015-03-18 | Canon Kk | Method and apparatus for encoding an image into a video bitstream and decoding corresponding video bitstream |
KR20140072939A (en) * | 2012-12-04 | 2014-06-16 | 광운대학교 산학협력단 | Method and apparatus for residual prediction for multi-view video coding |
CN104244002B (en) * | 2013-06-14 | 2019-02-05 | 北京三星通信技术研究有限公司 | The acquisition methods and device of motion information in a kind of video coding/decoding |
CN104427345B (en) * | 2013-09-11 | 2019-01-08 | 华为技术有限公司 | Acquisition methods, acquisition device, Video Codec and its method of motion vector |
CN104702954B (en) * | 2013-12-05 | 2017-11-17 | 华为技术有限公司 | Method for video coding and device |
FI3958572T3 (en) * | 2014-01-02 | 2024-03-13 | Dolby Laboratories Licensing Corp | Method for encoding multi-view video, method for decoding multi-view video and recording medium therefore |
CN103916672B (en) * | 2014-03-21 | 2018-03-13 | 华为技术有限公司 | A kind of data decoding method, relevant apparatus and system |
EP3453178A1 (en) * | 2016-05-06 | 2019-03-13 | VID SCALE, Inc. | Systems and methods for motion compensated residual prediction |
CN108848380B (en) * | 2018-06-20 | 2021-11-30 | 腾讯科技(深圳)有限公司 | Video encoding and decoding method, device, computer device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2021012942A1 (en) | 2021-01-28 |
CN112261409A (en) | 2021-01-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||