CN112261409B - Residual encoding, decoding method and device, storage medium and electronic device - Google Patents
- Publication number
- CN112261409B CN112261409B CN201910663278.XA CN201910663278A CN112261409B CN 112261409 B CN112261409 B CN 112261409B CN 201910663278 A CN201910663278 A CN 201910663278A CN 112261409 B CN112261409 B CN 112261409B
- Authority
- CN
- China
- Prior art keywords
- block
- residual
- decoded
- encoded
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/172—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
Abstract
The invention provides a residual encoding method, a residual decoding method, a residual encoding device, a residual decoding device, a storage medium, and an electronic device. In the residual encoding method, the residuals of the reference blocks of a block to be encoded are determined, the residual of the block to be encoded is predicted based on the residuals of those reference blocks to obtain a predicted residual of the block to be encoded, and the residual difference between the predicted residual of the block to be encoded and the actual residual of the block to be encoded is encoded into the code stream. The invention solves the problem in the related art of how to reduce the bit rate of encoding the residual, thereby achieving the effect of saving code rate.
Description
Technical Field
The present invention relates to the field of communications, and in particular, to a method and apparatus for encoding and decoding a residual error, a storage medium, and an electronic apparatus.
Background
In recent years, with the rapid development of network communication and multimedia technology, video content has begun to be presented to viewers in high-resolution and ultra-high-resolution forms. The higher the resolution of a video, the better the visual experience when viewing the content compared with standard-resolution video. At the same time, high-resolution video places higher demands on video coding technology. To address this challenge, the Joint Collaborative Team on Video Coding (JCT-VC) developed the new-generation High Efficiency Video Coding (HEVC) standard. Compared with the previous-generation video coding standard H.264/MPEG-4 AVC, HEVC improves compression efficiency by about 50 percent while maintaining the same visual quality. On September 1, 2015, Google announced the establishment of the Alliance for Open Media (AOM) together with Amazon, Cisco, Intel, Microsoft, Mozilla (Firefox), and Netflix; AOM developed the new-generation video coding format AV1. In December 2017, the Chinese digital audio and video coding standard group (Audio Video coding Standard, AVS) proposed the new-generation AVS3 video coding. At its 10th meeting in San Diego, USA, in April 2018, the Joint Video Exploration Team (JVET) named the latest-generation video coding standard Versatile Video Coding (VVC); its main objective is to improve on the existing HEVC, providing higher compression performance while optimizing for emerging applications such as 360° panoramic video and high dynamic range imaging (HDR or HDRI). VVC is expected to be standardized before 2020, and the solutions proposed so far already improve compression performance by more than 40% relative to HEVC.
Inter-frame prediction is the core component of the video coding standards HEVC/AV1/AVS. It exploits temporal correlation in video, using pixels of temporally adjacent, already-encoded images to predict the pixels of the current image, so as to effectively remove temporal redundancy. Current mainstream video coding standards adopt block-based motion compensation: for each pixel block of the current image, a best matching block is found in a previously encoded image through motion estimation. The picture used for prediction is called a reference picture, the displacement from the reference block to the current pixel block is called a motion vector (MV), and the difference between the original pixel value of the current block and the pixel value of the prediction block obtained after motion compensation of the reference block is called a residual (also referred to as a residual error, residual value, or residual block). Inter prediction only needs to encode the optimal MV, the reference frame index, and the residual value of the coded block and write them into the code stream transmitted to the decoder. The decoder side finds the corresponding reference block in the reference frame according to the optimal MV and the reference frame index, and then adds the decoded residual value to recover the original pixel value of the decoded block. Most of the bit rate consumed by inter prediction goes to coding residual blocks, and traditional inter prediction directly codes the actual residual obtained after prediction. However, for most complex motion cases the values of the residual block are very large, which leads to a very high bit rate for the encoded residual.
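As an illustration of the conventional scheme just described, the following minimal sketch (Python with NumPy; the function names and the assumption of 8-bit samples are illustrative, not part of any codec standard) shows how the residual is formed at the encoder and added back at the decoder:

```python
import numpy as np

def conventional_residual(original_block: np.ndarray, prediction_block: np.ndarray) -> np.ndarray:
    # Encoder side: the residual is the difference between the original pixel
    # values of the current block and the motion-compensated prediction block.
    return original_block.astype(np.int32) - prediction_block.astype(np.int32)

def reconstruct_block(prediction_block: np.ndarray, decoded_residual: np.ndarray) -> np.ndarray:
    # Decoder side: the reference block is located via the optimal MV and the
    # reference frame index; adding the decoded residual recovers the pixels.
    pixels = prediction_block.astype(np.int32) + decoded_residual
    return np.clip(pixels, 0, 255).astype(np.uint8)  # assuming 8-bit samples
```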
Therefore, how to reduce the bit rate of the encoded residual to achieve the effect of saving the code rate is a problem to be solved.
Disclosure of Invention
The embodiment of the invention provides a residual coding and decoding method and device, a storage medium and an electronic device, which at least solve the problem of how to reduce the bit rate of coded residual in the related technology.
According to one embodiment of the invention, a residual encoding method is provided, comprising: determining residuals of reference blocks of a block to be encoded, where the reference blocks of the block to be encoded comprise at least two first reference blocks of the block to be encoded in the time domain, or at least two second reference blocks of the block to be encoded in the spatial domain, or at least one first reference block in the time domain and at least one second reference block in the spatial domain; predicting the residual of the block to be encoded based on the residuals of its reference blocks to obtain a predicted residual of the block to be encoded; and encoding a residual difference between the predicted residual of the block to be encoded and the actual residual of the block to be encoded into a code stream.
In at least one exemplary embodiment, in the case that the image frame where the block to be encoded is located is a P frame, the first reference block of the block to be encoded in the time domain includes an optimal prediction unit block PU of the block to be encoded in a forward reference frame, denoted as a first optimal PU, and/or an optimal PU of the block to be encoded in a frame preceding the forward reference frame, denoted as a second optimal PU, or in the case that the image frame where the block to be encoded is located is a B frame, the first reference block of the block to be encoded in the time domain includes an optimal PU of the block to be encoded in a forward reference frame, denoted as a third optimal PU, and/or an optimal PU of the block to be encoded in a backward reference frame, denoted as a fourth optimal PU.
In at least one exemplary embodiment, in the case that the image frame in which the block to be encoded is located is a P frame, a first reference block of the block to be encoded in the time domain is determined by determining the first optimal PU of the block to be encoded in the forward reference frame by motion estimation and determining a motion vector MV of the block to be encoded with respect to the first optimal PU, and determining the second optimal PU in a previous frame of the forward reference frame by motion estimation according to the position of the first optimal PU and the MV.
In at least one exemplary embodiment, the second reference block of the block to be encoded in the spatial domain is located in the image frame where the block to be encoded is located, and is adjacent to the block to be encoded in the spatial domain.
In at least one exemplary embodiment, the second reference block of the block to be encoded in the spatial domain includes a left block and/or an upper block adjacent to the block to be encoded in an image frame where the block to be encoded is located.
In at least one exemplary embodiment, the reference block of the block to be encoded includes:
Two first reference blocks with non-all zero corresponding residuals of the block to be coded in the time domain and two second reference blocks with non-all zero corresponding residuals of the block to be coded in the space domain, or
A first reference block with non-all zero corresponding residual error of the block to be coded in the time domain and a second reference block with non-all zero corresponding residual error of the block to be coded in the space domain, or
Two first reference blocks with non-all zero corresponding residuals of the block to be coded in the time domain and one second reference block with non-all zero corresponding residuals of the block to be coded in the space domain, or
A first reference block with non-all zero corresponding residual error of the block to be coded in the time domain and two second reference blocks with non-all zero corresponding residual error of the block to be coded in the space domain, or
Two first reference blocks with non-all zero corresponding residuals of the block to be coded in the time domain, or
And the corresponding residual errors of the block to be coded in the space domain are two second reference blocks which are not all zero.
In at least one exemplary embodiment, predicting the residual of the block to be encoded based on the residuals of its reference blocks to obtain the predicted residual of the block to be encoded comprises: inputting the residuals of the reference blocks into a residual prediction model to obtain the predicted residual of the block to be encoded, where the residual prediction model is trained with a deep learning network on training samples, each training sample comprising the residuals of the reference blocks of an encoded block whose residual is known together with the actual residual of that encoded block; or linearly weighting the residuals of the reference blocks to obtain the predicted residual of the block to be encoded, where the linear weighting is single-weight linear weighting or multi-weight linear weighting.
In at least one exemplary embodiment, the actual residual of the block to be encoded is the difference between the pixel values of the original image block of the block to be encoded and the pixel values of the prediction block of the block to be encoded, where the prediction block of the block to be encoded is the block obtained after motion compensation of the reference block of the block to be encoded.
According to another embodiment of the invention, a residual decoding method is provided, comprising: determining a prediction block of a block to be decoded based on a motion vector MV parsed from a code stream; obtaining residuals of reference blocks of the block to be decoded, where the reference blocks of the block to be decoded comprise at least two first reference blocks of the block to be decoded in the time domain, or at least two second reference blocks of the block to be decoded in the spatial domain, or at least one first reference block in the time domain and at least one second reference block in the spatial domain; predicting the residual of the block to be decoded based on the residuals of its reference blocks to obtain a predicted residual of the block to be decoded; and determining the actual residual of the block to be decoded according to the predicted residual of the block to be decoded and a residual difference, parsed from the code stream, between the predicted residual of the block to be decoded and the actual residual of the block to be decoded.
In at least one exemplary embodiment, in the case that the image frame where the block to be decoded is located is a P frame, the first reference block of the block to be decoded in the time domain includes an optimal prediction unit block PU of the block to be decoded in a forward reference frame, denoted as a first optimal PU, and/or an optimal PU of the block to be decoded in a frame previous to the forward reference frame, denoted as a second optimal PU, or in the case that the image frame where the block to be decoded is located is a B frame, the first reference block of the block to be decoded in the time domain includes an optimal PU of the block to be decoded in a forward reference frame, denoted as a third optimal PU, and/or an optimal PU of the block to be decoded in a backward reference frame, denoted as a fourth optimal PU.
In at least one exemplary embodiment, in the case that the image frame where the block to be decoded is located is a P frame, a first reference block of the block to be decoded in the time domain is determined by determining the first optimal PU of the block to be decoded in the forward reference frame according to the MV parsed in the code stream, and determining the second optimal PU in the previous frame of the forward reference frame according to the co-located PU of the first optimal PU in the previous frame of the forward reference frame by motion estimation.
In at least one exemplary embodiment, the second reference block of the block to be decoded in the spatial domain is located in the image frame in which the block to be decoded is located, and is adjacent to the block to be decoded in the spatial domain.
In at least one exemplary embodiment, the second reference block of the block to be decoded in the spatial domain includes a left block and/or an upper block adjacent to the block to be decoded in an image frame where the block to be decoded is located.
In at least one exemplary embodiment, predicting the residual of the block to be decoded based on the residuals of its reference blocks to obtain the predicted residual of the block to be decoded comprises: inputting the residuals of the reference blocks into a residual prediction model to obtain the predicted residual of the block to be decoded, where the residual prediction model is the same as the residual prediction model at the encoder side; or linearly weighting the residuals of the reference blocks to obtain the predicted residual of the block to be decoded, where the linear weighting is the same as the linear weighting at the encoder side.
In at least one exemplary embodiment, after determining the actual residual of the block to be decoded according to the predicted residual of the block to be decoded and the residual difference, parsed from the code stream, between the predicted residual and the actual residual of the block to be decoded, the method further comprises adding the actual residual to the prediction block of the block to be decoded to recover the original image block of the block to be decoded.
According to another embodiment of the invention, a residual encoding apparatus is provided, comprising: an encoder-side reference residual determining module configured to determine residuals of reference blocks of a block to be encoded, where the reference blocks of the block to be encoded comprise at least two first reference blocks of the block to be encoded in the time domain, or at least two second reference blocks of the block to be encoded in the spatial domain, or at least one first reference block in the time domain and at least one second reference block in the spatial domain; an encoder-side residual prediction module configured to predict the residual of the block to be encoded based on the residuals of its reference blocks to obtain a predicted residual of the block to be encoded; and a residual encoding module configured to encode a residual difference between the predicted residual of the block to be encoded and the actual residual of the block to be encoded into a code stream.
According to still another embodiment of the present invention, a residual decoding apparatus is provided, comprising: a decoder-side reference residual determining module configured to determine a prediction block of a block to be decoded based on a motion vector MV parsed from a code stream and to obtain residuals of reference blocks of the block to be decoded, where the reference blocks of the block to be decoded comprise at least two first reference blocks of the block to be decoded in the time domain, or at least two second reference blocks of the block to be decoded in the spatial domain, or at least one first reference block in the time domain and at least one second reference block in the spatial domain; a decoder-side residual prediction module configured to predict the residual of the block to be decoded based on the residuals of its reference blocks to obtain a predicted residual of the block to be decoded; and a residual decoding module configured to determine the actual residual of the block to be decoded according to the predicted residual of the block to be decoded and a residual difference, parsed from the code stream, between the predicted residual and the actual residual of the block to be decoded.
According to a further embodiment of the invention, there is also provided a storage medium having stored therein a computer program, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
According to a further embodiment of the invention, there is also provided an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
According to the invention, the residual of the block to be encoded is predicted from the residuals of its reference blocks to obtain a predicted residual, and only the residual difference between the predicted residual and the actual residual is encoded into the code stream, which reduces the bit rate required for encoding the residual. The same residual prediction process as at the encoder side is deployed at the decoder side: the decoder predicts the residual of the block to be decoded based on the residuals of its reference blocks and determines the actual residual of the block to be decoded according to the predicted residual and the residual difference carried in the code stream, so that the code stream encoded at the reduced code rate can still be correctly decoded to recover the actual residual of the block to be decoded.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
fig. 1 is a block diagram of a hardware structure of a mobile terminal 10 provided with an inter prediction module capable of applying a residual coding method and a residual decoding method according to an embodiment of the present invention;
fig. 2 is a flowchart of a residual coding method according to embodiment 1 of the present invention;
Fig. 3 is a block diagram of a residual coding apparatus according to embodiment 1 of the present invention;
Fig. 4 is a flowchart of a residual decoding method according to embodiment 2 of the present invention;
fig. 5 is a block diagram of a residual decoding apparatus according to embodiment 2 of the present invention;
FIG. 6 is a schematic diagram of a convolutional neural network for generating a residual prediction model in accordance with embodiment 3 of the present invention;
Fig. 7 is a schematic diagram showing selection of a time domain reference residual block in the case of P-frame and B-frame according to embodiment 3 of the present invention;
Fig. 8 is a schematic diagram illustrating selection of spatial reference residual blocks according to embodiment 3 of the present invention;
fig. 9 is a schematic flowchart of a method for improving video inter-coding performance by using space-time correlation in embodiment 3 of the present invention.
Detailed Description
The application will be described in detail hereinafter with reference to the drawings in conjunction with embodiments. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
The embodiment of the invention provides a scheme for improving coding performance (for example, video inter-frame coding performance) by utilizing spatio-temporal correlation: a residual prediction module predicts the residual value of a coding block during video inter-frame prediction so as to reduce the code rate of the coded residual and thereby improve inter-frame predictive coding efficiency in video coding. The codec adds a residual prediction module to inter prediction, which predicts the residual value of the current block to be encoded (hereinafter referred to as the prediction residual) using the residuals of the reference blocks of the current block in the spatial and temporal domains. Compared with a coding scheme that directly encodes the residual block of the block to be encoded into the code stream, the encoder only needs to transmit the difference between the actual residual and the predicted residual of the current block, which reduces the bit rate required for coding the residual and achieves the effect of saving code rate.
Embodiment 1 below describes a residual coding scheme from the encoder side that can reduce the bit rate required to code the residual, and embodiment 2 of the present application describes a residual decoding scheme from the decoder side that correctly parses the residual block of the block to be decoded, corresponding to the encoder side. The embodiment of the encoder-side provided in embodiment 1 and the embodiment of the decoder-side provided in embodiment 2 can be applied to inter-prediction modules of the codec-side (for example, to inter-prediction modules of existing video coding standards), and these inter-prediction modules can be provided in a mobile terminal, a computer terminal, or similar computing devices. Taking an example that the inter prediction module is disposed on the mobile terminal, fig. 1 is a block diagram of a hardware structure of a mobile terminal 10 provided with the inter prediction module capable of applying a residual coding method and a residual decoding method according to an embodiment of the present application. As shown in fig. 1, the mobile terminal 10 may include one or more (only one is shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a microprocessor MCU or a programmable logic device FPGA or the like, on which a corresponding program may be run for implementing functions of an inter prediction module capable of applying a residual coding method and a residual decoding method) and a memory 104 for storing data, and optionally, a transmission device 106 for communication functions and an input-output device 108. It will be appreciated by those skilled in the art that the structure shown in fig. 1 is merely illustrative and not limiting of the structure of the mobile terminal described above. For example, the mobile terminal 10 may also include more or fewer components than shown in FIG. 1 or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store a computer program, for example, a software program of application software and a module, such as a computer program corresponding to the encoder-side embodiment provided in embodiment 1 and the decoder-side embodiment provided in embodiment 2, and the processor 102 executes the computer program stored in the memory 104 to perform various functional applications and data processing, that is, implement the above-mentioned method. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the mobile terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission means 106 is arranged to receive or transmit data via a network. The specific examples of networks described above may include wireless networks provided by the communication provider of the mobile terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, simply referred to as a NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.
The residual coding scheme is described below from the encoder side and the decoder side by embodiments 1 and 2, respectively.
Example 1
In this embodiment, a residual coding method is provided, fig. 2 is a flowchart of the residual coding method according to embodiment 1 of the present invention, and as shown in fig. 2, the flowchart includes the following steps:
Step S202, determining residuals of reference blocks of a block to be encoded (in embodiments of the present invention, a residual is also referred to as a residual error, residual block, reference residual, or reference residual block), where the reference blocks of the block to be encoded comprise at least two first reference blocks of the block to be encoded in the time domain, or at least two second reference blocks of the block to be encoded in the spatial domain, or at least one first reference block in the time domain and at least one second reference block in the spatial domain;
step S204, predicting the residual error of the block to be encoded based on the residual error of the reference block of the block to be encoded, to obtain a predicted residual error of the block to be encoded;
Step S206, encoding a residual difference value between the prediction residual of the block to be encoded and an actual residual of the block to be encoded into a bitstream (e.g., a video inter-coded bitstream).
The residual of a reference block mentioned in step S202 is the actual coded residual of the reference block itself, obtained in the conventional way during encoding; it is available because the reference block has already been encoded.
The actual residual of the block to be encoded mentioned in step S206 is a difference between a pixel value of an original image block of the block to be encoded and a pixel value of a prediction block of the block to be encoded, wherein the prediction block of the block to be encoded is a block obtained after motion compensation of the reference block of the block to be encoded.
Through the above steps, the residual of the block to be encoded is predicted from the residuals of its reference blocks to obtain the predicted residual, and only the residual difference between the predicted residual and the actual residual is encoded into the code stream, so that the bit rate required for encoding the residual is reduced.
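A minimal sketch of steps S204 and S206 in Python (the predict_residual callable is a hypothetical stand-in for the residual prediction model or the linear weighting described below, not a normative API):

```python
import numpy as np

def encode_residual_difference(actual_residual: np.ndarray,
                               reference_residuals: list,
                               predict_residual) -> np.ndarray:
    # Step S204: predict the residual of the block to be encoded from the
    # residuals of its reference blocks (model-based or weighting-based).
    predicted_residual = predict_residual(reference_residuals)
    # Step S206: only the residual difference between the actual and predicted
    # residual is entropy-coded into the code stream.
    return actual_residual - predicted_residual
```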
Regarding the selection of reference blocks for a block to be encoded, in view of the continuity of the video image, selecting an appropriate reference block enables an accurate prediction of the residual of the block to be encoded.
In at least one exemplary embodiment, in the case that the image frame where the block to be encoded is located is a P frame, the first reference block of the block to be encoded in the time domain may include an optimal prediction unit block PU of the block to be encoded in a forward reference frame, denoted as a first optimal PU, and/or an optimal PU of the block to be encoded in a frame previous to the forward reference frame, denoted as a second optimal PU.
In the case that the image frame where the block to be encoded is located is a B frame, the first reference block of the block to be encoded in the time domain comprises an optimal PU of the block to be encoded in a forward reference frame, denoted as a third optimal PU, and/or an optimal PU of the block to be encoded in a backward reference frame, denoted as a fourth optimal PU.
In at least one exemplary embodiment, in the case that the image frame in which the block to be encoded is located is a P frame, a first reference block of the block to be encoded in the time domain may be determined by determining the first optimal PU of the block to be encoded in the forward reference frame by motion estimation and determining a motion vector MV of the block to be encoded with respect to the first optimal PU, and determining the second optimal PU in a previous frame of the forward reference frame by motion estimation according to the position of the first optimal PU and the MV.
In at least one exemplary embodiment, the second reference block of the block to be encoded in the spatial domain is located in the image frame where the block to be encoded is located, and is adjacent to the block to be encoded in the spatial domain.
In at least one exemplary embodiment, the second reference block of the block to be encoded in the spatial domain includes a left block and/or an upper block adjacent to the block to be encoded (preferably may be a spatially adjacent left and/or upper block having the same size) in an image frame where the block to be encoded is located.
It will be appreciated by those skilled in the art that in practical applications, the first reference block of the block to be encoded in the time domain and/or the second reference block of the block to be encoded in the spatial domain may be determined in the manner described above. The residual corresponding to the first reference block in the time domain may be referred to as a time domain reference residual block, the residual corresponding to the second reference block in the space domain may be referred to as a space domain reference residual block, and residual information of the block to be encoded may be predicted based on at least one time domain reference residual block and/or at least one space domain reference residual block.
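As an illustration of the spatial-domain reference blocks described above, the following sketch computes the positions of the left and upper neighbors, assuming a hypothetical (x, y, width, height) rectangle convention and omitting frame-boundary checks:

```python
def spatial_neighbors(x: int, y: int, w: int, h: int):
    # Spatially adjacent left and upper blocks of the same size as the block
    # to be encoded, located in the same image frame (assuming they exist).
    left_block = (x - w, y, w, h)
    upper_block = (x, y - h, w, h)
    return left_block, upper_block
```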
In a preferred embodiment, two time domain reference residual blocks and two spatial domain reference residual blocks may be selected for data symmetry consideration, and residual prediction may be performed based on the four reference residual blocks. At this time, the reference blocks of the block to be encoded include two first reference blocks whose corresponding residuals in the time domain are non-all zeros and two second reference blocks whose corresponding residuals in the space domain are non-all zeros.
In another preferred embodiment, when only one time domain reference residual block and one spatial reference residual block are non-all zero blocks, one time domain reference residual block and one spatial reference residual block can be selected and residual prediction is performed based on the two reference residual blocks. At this time, the reference blocks of the block to be encoded include a first reference block whose corresponding residual error in the time domain is non-all zero and a second reference block whose corresponding residual error in the space domain is non-all zero.
In another preferred embodiment, when only two time domain reference residual blocks and one spatial reference residual block are non-all zero blocks, two time domain reference residual blocks and one spatial reference residual block may be selected and residual prediction may be performed based on the three reference residual blocks. At this time, the reference blocks of the block to be encoded include two first reference blocks whose corresponding residuals in the time domain are non-all zeros and one second reference block whose corresponding residuals in the space domain are non-all zeros.
In another preferred embodiment, when only one time domain reference residual block and two spatial reference residual blocks are non-all zero blocks, one time domain reference residual block and two spatial reference residual blocks may be selected and residual prediction may be performed based on the three reference residual blocks. At this time, the reference blocks of the block to be encoded include one first reference block whose corresponding residual error in the time domain is non-all zero and two second reference blocks whose corresponding residual error in the space domain is non-all zero.
In another preferred embodiment, when only two time-domain reference residual blocks in the time domain are non-all-zero blocks, two time-domain reference residual blocks may be selected and residual prediction may be performed based on the two time-domain reference residual blocks. At this time, the reference blocks of the block to be encoded include two first reference blocks whose corresponding residuals in the time domain are non-all zeros.
In another preferred embodiment, when only two spatial reference residual blocks are non-all zero blocks, two spatial reference residual blocks may be selected and residual prediction may be performed based on the two spatial reference residual blocks. At this time, the reference blocks of the block to be encoded comprise two second reference blocks with non-all-zero corresponding residuals of the block to be encoded in the spatial domain.
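The selection logic running through the preferred embodiments above can be sketched as follows (an illustrative sketch, not the normative selection procedure; np.any tests whether a residual block is non-all-zero):

```python
import numpy as np

def select_reference_residuals(temporal_blocks, spatial_blocks):
    # Keep only candidate reference residual blocks that are not all zero,
    # up to two temporal and two spatial blocks, per the combinations above.
    temporal = [b for b in temporal_blocks[:2] if np.any(b)]
    spatial = [b for b in spatial_blocks[:2] if np.any(b)]
    return temporal, spatial
```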
The process of predicting the residual error of the block to be coded based on the residual error of the reference block can be implemented in various prediction modes. In at least one exemplary embodiment, predicting the residual of the block to be encoded based on the residual of the reference block of the block to be encoded, the obtaining the prediction residual of the block to be encoded may include one of:
(1) inputting the residuals of the reference blocks into a residual prediction model to obtain the predicted residual of the block to be encoded, where the residual prediction model is trained with a deep learning network on training samples, each training sample comprising the residuals of the reference blocks of an encoded block whose residual is known together with the actual residual of that encoded block;
(2) linearly weighting the residuals of the reference blocks to obtain the predicted residual of the block to be encoded, where the linear weighting is single-weight linear weighting or multi-weight linear weighting.
For example, a linear weighted sum of the individual weights may be calculated using the following formula:
ResiPred(i,j) = W1·ResiA(i,j) + W2·ResiB(i,j)
where W1 and W2 are scalar weights, ResiA(i,j) and ResiB(i,j) are the pixel values of the selected reference residual blocks at pixel (i,j), and ResiPred(i,j) is the pixel value of the predicted residual block at pixel (i,j).
For example, a linear weighted sum with multiple weights may be calculated using the following formula:
ResiPred(i,j) = W1ij·ResiA(i,j) + W2ij·ResiB(i,j)
where W1ij and W2ij are the weight values corresponding to each pixel of the reference residual blocks, which can be obtained through training; ResiA(i,j) and ResiB(i,j) are the pixel values of the selected reference residual blocks at pixel (i,j), and ResiPred(i,j) is the pixel value of the predicted residual block at pixel (i,j).
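A minimal sketch of the two weighting variants (Python with NumPy; function names are illustrative), assuming the reference residual blocks are arrays of equal shape and the weights come from configuration or training:

```python
import numpy as np

def predict_single_weight(resi_a: np.ndarray, resi_b: np.ndarray,
                          w1: float, w2: float) -> np.ndarray:
    # Single-weight linear weighting: one scalar weight per reference block.
    return w1 * resi_a + w2 * resi_b

def predict_multi_weight(resi_a: np.ndarray, resi_b: np.ndarray,
                         w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    # Multi-weight linear weighting: w1 and w2 are per-pixel weight maps of
    # the same shape as the reference residual blocks, obtained by training.
    return w1 * resi_a + w2 * resi_b
```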
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present invention.
In this embodiment, a residual coding device is provided corresponding to the residual coding method described above, so as to implement the foregoing embodiments and preferred embodiments, and the description is omitted herein. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
Fig. 3 is a block diagram of a residual coding apparatus according to embodiment 1 of the present invention, and as shown in fig. 3, the apparatus includes:
An encoder-side reference residual determination module 32 configured to determine a residual of a reference block of a block to be encoded, where the reference block of the block to be encoded includes at least two first reference blocks of the block to be encoded in a time domain, or at least two second reference blocks of the block to be encoded in a spatial domain, or at least one first reference block of the block to be encoded in a time domain and at least one second reference block of the block to be encoded in a spatial domain;
The encoder-side residual prediction module 34 is configured to predict the residual of the block to be encoded based on the residual of the reference block of the block to be encoded, so as to obtain a predicted residual of the block to be encoded;
A residual coding module 36 arranged to code a residual difference between the prediction residual of the block to be coded and an actual residual of the block to be coded into a code stream.
It should be noted that each of the above modules may be implemented by software or hardware, and the latter may be implemented by, but not limited to, the above modules all being located in the same processor, or each of the above modules being located in different processors in any combination.
The present embodiment also provides a storage medium having stored therein a computer program, wherein the computer program is arranged to perform the steps in the residual coding method above in the present embodiment when run.
Alternatively, in the present embodiment, the above-described storage medium may be configured to store a computer program for performing the steps of:
S1, determining residuals of reference blocks of a block to be encoded (in embodiments of the invention, a residual is also referred to as a residual error, residual block, reference residual, or reference residual block), where the reference blocks of the block to be encoded comprise at least two first reference blocks of the block to be encoded in the time domain, or at least two second reference blocks of the block to be encoded in the spatial domain, or at least one first reference block in the time domain and at least one second reference block in the spatial domain;
S2, predicting the residual error of the block to be encoded based on the residual error of the reference block of the block to be encoded to obtain a predicted residual error of the block to be encoded;
and S3, encoding a residual difference value between the predicted residual of the block to be encoded and an actual residual of the block to be encoded into a code stream (for example, a video inter-coded code stream).
Alternatively, in the present embodiment, the storage medium may include, but is not limited to, a USB flash disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a removable hard disk, a magnetic disk, or an optical disk, etc. various media in which a computer program may be stored.
An embodiment of the invention also provides an electronic device comprising a memory in which a computer program is stored and a processor arranged to run the computer program to perform the steps of the residual coding method described above in this embodiment.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
S1, determining residuals of reference blocks of a block to be encoded (in embodiments of the invention, a residual is also referred to as a residual error, residual block, reference residual, or reference residual block), where the reference blocks of the block to be encoded comprise at least two first reference blocks of the block to be encoded in the time domain, or at least two second reference blocks of the block to be encoded in the spatial domain, or at least one first reference block in the time domain and at least one second reference block in the spatial domain;
S2, predicting the residual error of the block to be encoded based on the residual error of the reference block of the block to be encoded to obtain a predicted residual error of the block to be encoded;
and S3, encoding a residual difference value between the predicted residual of the block to be encoded and an actual residual of the block to be encoded into a code stream (for example, a video inter-coded code stream).
Alternatively, specific examples in this embodiment may refer to examples described in the foregoing embodiments and optional implementations, and this embodiment is not described herein.
Example 2
In this embodiment, a residual decoding method is provided, fig. 4 is a flowchart of the residual decoding method according to embodiment 2 of the present invention, and as shown in fig. 4, the flowchart includes the following steps:
Step S402, determining a prediction block of a block to be decoded based on a motion vector MV parsed from a code stream (for example, a video inter-coded code stream), and obtaining residuals of reference blocks of the block to be decoded (in embodiments of the invention, a residual is also referred to as a residual error, residual block, reference residual, or reference residual block), where the reference blocks of the block to be decoded comprise at least two first reference blocks of the block to be decoded in the time domain, or at least two second reference blocks of the block to be decoded in the spatial domain, or at least one first reference block in the time domain and at least one second reference block in the spatial domain;
step S404, predicting the residual error of the block to be decoded based on the residual error of the reference block of the block to be decoded to obtain a predicted residual error of the block to be decoded;
Step S406, determining an actual residual error of the block to be decoded according to the prediction residual error of the block to be decoded and a residual error difference between the prediction residual error of the block to be decoded and the actual residual error of the block to be decoded, which are parsed from the code stream.
The residual of a reference block mentioned in step S402 is the actual coded residual of the reference block itself, obtained in the conventional way during encoding; it is available because the reference block has already been encoded.
In step S402, the prediction block of the block to be decoded is the block obtained after motion compensation of the reference block of the block to be decoded.
The actual residual of the block to be decoded mentioned in step S406 is the difference between the pixel value of the original image block of the block to be decoded and the pixel value of the prediction block of the block to be decoded.
Through the above steps, since the same residual prediction process as at the encoder side is deployed at the decoder side, the decoder predicts the residual of the block to be decoded based on the residuals of its reference blocks and determines the actual residual of the block to be decoded according to the predicted residual and the residual difference carried in the code stream; thus the code stream encoded at the reduced code rate can still be correctly decoded at the decoder side to recover the actual residual of the block to be decoded.
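Mirroring the encoder-side sketch in Embodiment 1, a minimal sketch of steps S404 and S406 (again with a hypothetical predict_residual callable that must be identical to the encoder's predictor):

```python
def decode_actual_residual(residual_difference, reference_residuals, predict_residual):
    # Step S404: reproduce the encoder's residual prediction from the same
    # reference residual blocks, so both sides derive an identical prediction.
    predicted_residual = predict_residual(reference_residuals)
    # Step S406: the actual residual is the predicted residual plus the residual
    # difference parsed from the code stream; it is then added to the prediction
    # block to recover the original image block.
    return predicted_residual + residual_difference
```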
In at least one exemplary embodiment, in the case that the image frame where the block to be decoded is located is a P frame, the first reference block of the block to be decoded in the time domain includes an optimal prediction unit block PU of the block to be decoded in a forward reference frame, denoted as a first optimal PU, and/or an optimal PU of the block to be decoded in a frame previous to the forward reference frame, denoted as a second optimal PU.
In at least one exemplary embodiment, in the case that the image frame where the block to be decoded is located is a B frame, the first reference block of the block to be decoded in the time domain includes an optimal PU of the block to be decoded in a forward reference frame, denoted as a third optimal PU, and/or an optimal PU of the block to be decoded in a backward reference frame, denoted as a fourth optimal PU.
It should be noted that, in order to ensure accuracy of residual prediction, a reference block selection mode identical to that of the encoder is adopted at the decoder to ensure consistency with a prediction residual obtained by residual prediction at the encoder.
In at least one exemplary embodiment, in the case that the image frame where the block to be decoded is located is a P frame, a first reference block of the block to be decoded in the time domain is determined by determining the first optimal PU of the block to be decoded in the forward reference frame according to the MV parsed in the code stream, and determining the second optimal PU in the previous frame of the forward reference frame according to the co-located PU of the first optimal PU in the previous frame of the forward reference frame by motion estimation.
In at least one exemplary embodiment, the second reference block of the block to be decoded in the spatial domain is located in the image frame in which the block to be decoded is located, and is adjacent to the block to be decoded in the spatial domain.
In at least one exemplary embodiment, the second reference block of the block to be decoded in the spatial domain includes a left block and/or an upper block adjacent to the block to be decoded (preferably may be a spatially adjacent left and/or upper block having the same size) in an image frame where the block to be decoded is located.
It will be appreciated by those skilled in the art that in practical applications, the first reference block of the block to be decoded in the time domain and/or the second reference block of the block to be decoded in the spatial domain may be determined in the manner described above. The residual corresponding to the first reference block in the time domain may be referred to as a time domain reference residual block, the residual corresponding to the second reference block in the space domain may be referred to as a space domain reference residual block, and residual information of the block to be decoded may be predicted based on at least one time domain reference residual block and/or at least one space domain reference residual block.
In a preferred embodiment, for data symmetry, two time domain reference residual blocks and two space domain reference residual blocks may be selected, and residual prediction performed based on these four reference residual blocks. In this case, the reference blocks of the block to be decoded include two first reference blocks whose corresponding residuals in the time domain are non-all-zero and two second reference blocks whose corresponding residuals in the space domain are non-all-zero.
In another preferred embodiment, when only one time domain reference residual block and one space domain reference residual block are non-all-zero, one of each may be selected, and residual prediction performed based on these two reference residual blocks. In this case, the reference blocks of the block to be decoded include one first reference block whose corresponding residual in the time domain is non-all-zero and one second reference block whose corresponding residual in the space domain is non-all-zero.
In another preferred embodiment, when only two time domain reference residual blocks and one space domain reference residual block are non-all-zero, these three may be selected, and residual prediction performed based on them. In this case, the reference blocks of the block to be decoded include two first reference blocks whose corresponding residuals in the time domain are non-all-zero and one second reference block whose corresponding residual in the space domain is non-all-zero.
In another preferred embodiment, when only one time domain reference residual block and two space domain reference residual blocks are non-all-zero, these three may be selected, and residual prediction performed based on them. In this case, the reference blocks of the block to be decoded include one first reference block whose corresponding residual in the time domain is non-all-zero and two second reference blocks whose corresponding residuals in the space domain are non-all-zero.
In another preferred embodiment, when only two time domain reference residual blocks are non-all-zero, these two may be selected, and residual prediction performed based on them. In this case, the reference blocks of the block to be decoded include two first reference blocks whose corresponding residuals in the time domain are non-all-zero.
In another preferred embodiment, when only two space domain reference residual blocks are non-all-zero, these two may be selected, and residual prediction performed based on them. In this case, the reference blocks of the block to be decoded include two second reference blocks whose corresponding residuals in the space domain are non-all-zero.
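As a rough illustration of the case analysis above, the following Python sketch filters out all-zero reference residual blocks before prediction; the list-of-arrays representation and the function name are assumptions made for illustration, not part of the patented scheme.

```python
import numpy as np

def select_nonzero_refs(temporal_refs, spatial_refs):
    """Keep only the non-all-zero reference residual blocks, matching
    the case analysis above. Both arguments are lists of 2-D numpy
    arrays (a hypothetical in-memory representation)."""
    nonzero = lambda blocks: [b for b in blocks if np.any(b)]
    return nonzero(temporal_refs), nonzero(spatial_refs)
```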
The residual of the block to be decoded may be predicted from the residuals of its reference blocks in various prediction modes. In at least one exemplary embodiment, predicting the residual of the block to be decoded based on the residuals of its reference blocks to obtain the predicted residual may include one of the following:
(1) inputting the residuals of the reference blocks into a residual prediction model to obtain the predicted residual of the block to be decoded, where the residual prediction model is the same as the residual prediction model at the encoder side;
(2) linearly weighting the residuals of the reference blocks to obtain the predicted residual of the block to be decoded, where the linear weighting is the same as the linear weighting at the encoder side.
For example, a linear weighted sum with a single weight per reference block may be calculated using the following formula:
ResiPred(i,j) = W1 · ResiA(i,j) + W2 · ResiB(i,j)
where W1 and W2 are the weights, ResiA(i,j) and ResiB(i,j) are the pixel values of the selected reference residual blocks at pixel point (i,j), and ResiPred(i,j) is the pixel value of the prediction residual block at pixel point (i,j).
For example, a linear weighted sum with per-pixel weights may be calculated using the following formula:
ResiPred(i,j) = W1(i,j) · ResiA(i,j) + W2(i,j) · ResiB(i,j)
where W1(i,j) and W2(i,j) are the weight values corresponding to each pixel point of the reference residual blocks, which may be obtained through training; ResiA(i,j) and ResiB(i,j) are the pixel values of the selected reference residual blocks at pixel point (i,j), and ResiPred(i,j) is the pixel value of the prediction residual block at pixel point (i,j).
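As a concrete illustration, the following Python sketch implements both weighting variants; thanks to numpy broadcasting, one function covers scalar weights and trained per-pixel weight maps. The function name and the example weights are assumptions for illustration only.

```python
import numpy as np

def predict_residual(resi_a, resi_b, w1, w2):
    """Linear weighting of two reference residual blocks. If w1 and w2
    are scalars this is the single-weight formula; if they are arrays of
    the same shape as the blocks it is the per-pixel formula, with the
    weight maps obtained by training."""
    return w1 * resi_a + w2 * resi_b

# Example with two 8x8 reference residual blocks and scalar weights:
resi_a = np.random.randn(8, 8)
resi_b = np.random.randn(8, 8)
resi_pred = predict_residual(resi_a, resi_b, 0.5, 0.5)
```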
In at least one exemplary embodiment, after the actual residual of the block to be decoded is determined from its predicted residual and the residual difference parsed from the code stream, the method further comprises adding the actual residual to the prediction block of the block to be decoded to recover the original image block of the block to be decoded.
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by software plus the necessary general-purpose hardware platform, or by hardware, although in many cases the former is preferred. Based on such understanding, the technical solution of the present invention, or the part of it contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to perform the method according to the embodiments of the present invention.
In this embodiment, a residual decoding apparatus corresponding to the residual decoding method described above is provided to implement the foregoing embodiments and preferred implementations; details already described are not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiment is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 5 is a block diagram of the structure of a residual decoding apparatus according to embodiment 2 of the present invention. As shown in Fig. 5, the apparatus includes:
The decoder-side reference residual determining module 52 is configured to determine a prediction block of a block to be decoded based on a motion vector MV parsed in a code stream, and obtain a residual of a reference block of the block to be decoded, where the reference block of the block to be decoded includes at least two first reference blocks of the block to be decoded in a time domain, or at least two second reference blocks of the block to be decoded in a space domain, or at least one first reference block of the block to be decoded in a time domain and at least one second reference block of the block to be decoded in a space domain;
A decoder-side residual prediction module 54, configured to predict a residual of the block to be decoded based on a residual of the reference block of the block to be decoded, to obtain a predicted residual of the block to be decoded;
The residual decoding module 56 is configured to determine the actual residual of the block to be decoded according to the predicted residual of the block to be decoded and the residual difference, parsed from the code stream, between the predicted residual and the actual residual of the block to be decoded.
It should be noted that each of the above modules may be implemented by software or hardware; in the latter case, the above modules may, for example (but not limited to), all be located in the same processor, or be distributed across different processors in any combination.
The present embodiment also provides a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of the residual decoding method described above in the present embodiment when run.
Alternatively, in the present embodiment, the above-described storage medium may be configured to store a computer program for performing the steps of:
S1: determining a prediction block of a block to be decoded based on a motion vector MV parsed from a code stream (for example, a video inter-coded code stream), and acquiring the residuals of the reference blocks of the block to be decoded (in the embodiments of the present invention this term is also written as residual, residual block, reference residual, or reference residual block), where the reference blocks of the block to be decoded include at least two first reference blocks of the block to be decoded in the time domain, or at least two second reference blocks of the block to be decoded in the space domain, or at least one first reference block in the time domain and at least one second reference block in the space domain;
S2: predicting the residual of the block to be decoded based on the residuals of its reference blocks to obtain the predicted residual of the block to be decoded;
S3: determining the actual residual of the block to be decoded according to its predicted residual and the residual difference, parsed from the code stream, between the predicted residual and the actual residual of the block to be decoded.
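A minimal Python sketch of these three steps follows, assuming the bitstream parsing has already yielded the prediction block and the residual difference; all names, and the callable `predictor`, are illustrative assumptions.

```python
def decode_block(prediction_block, ref_residuals, residual_diff, predictor):
    """Sketch of steps S1-S3. `prediction_block` is the block obtained
    from the parsed MV (S1), `ref_residuals` are the reference residual
    blocks, `residual_diff` is the residual difference parsed from the
    code stream, and `predictor` is the same residual prediction model
    or linear weighting used at the encoder."""
    predicted_residual = predictor(ref_residuals)          # S2
    actual_residual = predicted_residual + residual_diff   # S3
    return prediction_block + actual_residual              # reconstructed block
```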
Alternatively, in the present embodiment, the storage medium may include, but is not limited to, a USB flash disk, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other media capable of storing a computer program.
The present embodiment also provides an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of the residual decoding method described above in the present embodiment.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
S1: determining a prediction block of a block to be decoded based on a motion vector MV parsed from a code stream (for example, a video inter-coded code stream), and acquiring the residuals of the reference blocks of the block to be decoded (in the embodiments of the present invention this term is also written as residual, residual block, reference residual, or reference residual block), where the reference blocks of the block to be decoded include at least two first reference blocks of the block to be decoded in the time domain, or at least two second reference blocks of the block to be decoded in the space domain, or at least one first reference block in the time domain and at least one second reference block in the space domain;
S2: predicting the residual of the block to be decoded based on the residuals of its reference blocks to obtain the predicted residual of the block to be decoded;
S3: determining the actual residual of the block to be decoded according to its predicted residual and the residual difference, parsed from the code stream, between the predicted residual and the actual residual of the block to be decoded.
Alternatively, specific examples in this embodiment may refer to the examples described in the foregoing embodiments and optional implementations, and are not repeated here.
The following Embodiment 3 describes in detail a specific implementation of the residual coding scheme, taking as an example a residual prediction model (which may also be referred to as a residual prediction module) trained by a deep learning network. It should be noted that, for the scheme that performs residual prediction by a linear weighted sum, the overall principle of residual encoding and decoding is similar to the following embodiment and is not described again.
Embodiment 3
This embodiment designs a method for improving video inter-frame coding performance by using spatio-temporal correlation for the coding unit of video coding; in practical use, an encoder calls this method to complete inter-frame prediction. The details of the method are described below.
(1) Residual prediction module
The residual prediction module may be generated by a convolutional neural network. Fig. 6 is a schematic diagram of a convolutional neural network for generating the residual prediction module according to embodiment 3 of the present invention. As shown in Fig. 6, the convolutional neural network takes two time domain reference residual blocks and two space domain reference residual blocks related to the current coding block as inputs (other numbers of time domain and space domain reference residual blocks, only time domain reference residual blocks, or only space domain reference residual blocks may also be used), extracts the feature information of the residual image through convolution and pooling, and outputs the prediction residual through a deconvolution process.
The input of the convolutional neural network is thus four reference residual blocks: two selected from reference frames in the time domain, and two from adjacent blocks in the space domain.
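A minimal PyTorch sketch of such a network is given below; the framework choice, channel counts, kernel sizes, and the single down/up-sampling stage are all illustrative assumptions, not the patented architecture.

```python
import torch.nn as nn

class ResidualPredictionNet(nn.Module):
    """Sketch of the network in Fig. 6: four reference residual blocks
    stacked as input channels, convolution and pooling for feature
    extraction, deconvolution to produce the prediction residual."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                       # downsample by 2
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Deconvolution restores the original block resolution.
        self.deconv = nn.ConvTranspose2d(64, 1, kernel_size=2, stride=2)

    def forward(self, ref_residuals):              # (N, 4, H, W)
        return self.deconv(self.features(ref_residuals))  # (N, 1, H, W)
```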
In the time domain, different operations are performed according to the different prediction modes of P frames and B frames. Fig. 7 is a schematic diagram showing the selection of time domain reference residual blocks for P frames and B frames according to embodiment 3 of the present invention. As shown in Fig. 7:
For a P frame, since it uses unidirectional prediction, there is only one forward reference frame; the motion vector (MV) of the current coding block is generated by first finding the optimal prediction unit (PU) in the forward reference frame through motion estimation. Then, based on the position of this optimal reference PU and the MV of the current coding block, another optimal reference PU is searched by motion estimation in the frame preceding the forward reference frame, and the residual blocks corresponding to the two optimal reference PUs are used as the two time domain reference residual block inputs for the P frame.
For a B frame, there are a forward reference frame and a backward reference frame; in each of them, the residual block of an optimal PU is found by motion estimation and used as a time domain reference residual block input.
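The following Python sketch illustrates the P-frame case with a brute-force block-matching search; the SAD criterion, the search range, and the residual-frame inputs are assumptions made for illustration.

```python
import numpy as np

def motion_estimate(block, ref_frame, top, left, search=8):
    """Minimal full-search block matching (SAD criterion) around the
    co-located position (top, left); returns the best match position."""
    h, w = block.shape
    best_pos, best_cost = (top, left), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > ref_frame.shape[0] or x + w > ref_frame.shape[1]:
                continue
            cost = np.abs(ref_frame[y:y + h, x:x + w].astype(np.int64)
                          - block.astype(np.int64)).sum()
            if cost < best_cost:
                best_pos, best_cost = (y, x), cost
    return best_pos

def p_frame_temporal_refs(cur_block, fwd_frame, prev_frame,
                          fwd_resi, prev_resi, top, left):
    """P-frame case of Fig. 7: the first optimal PU is found in the
    forward reference frame; starting from its position, the second is
    found in the frame preceding the forward reference frame. The stored
    residual frames `fwd_resi` / `prev_resi` are assumed inputs."""
    h, w = cur_block.shape
    y1, x1 = motion_estimate(cur_block, fwd_frame, top, left)
    pu1 = fwd_frame[y1:y1 + h, x1:x1 + w]
    y2, x2 = motion_estimate(pu1, prev_frame, y1, x1)
    return (fwd_resi[y1:y1 + h, x1:x1 + w],
            prev_resi[y2:y2 + h, x2:x2 + w])

# For a B frame, run motion_estimate once in the forward reference frame
# and once in the backward reference frame instead (see Fig. 7).
```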
In the space domain, Fig. 8 is a schematic diagram illustrating the selection of space domain reference residual blocks according to embodiment 3 of the present invention. As shown in Fig. 8, the left and upper blocks that are adjacent to the current block and have the same size are selected as reference blocks, and the residual blocks corresponding to these reference blocks are used as the space domain reference residual block inputs.
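A corresponding sketch for the space domain; the array-based residual-frame representation is an assumption for illustration.

```python
def spatial_reference_residuals(resi_frame, top, left, h, w):
    """Fig. 8: take the left and upper neighbours of the same size as
    the current h-by-w block from the frame's residual data, when such
    neighbours exist inside the frame."""
    refs = []
    if left >= w:   # left neighbour of the same size
        refs.append(resi_frame[top:top + h, left - w:left])
    if top >= h:    # upper neighbour of the same size
        refs.append(resi_frame[top - h:top, left:left + w])
    return refs
```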
The output of the residual prediction module is the prediction residual block of the current block. At the encoder side, the prediction residual block may be subtracted from the actual residual block to obtain the coded residual; at the decoder side, the prediction residual block plus the coded residual recovers the actual residual block.
(2) Inter-frame coding block residual prediction method based on spatio-temporal correlation
Fig. 9 is a schematic process diagram of the method for improving video inter-frame coding performance by using spatio-temporal correlation according to embodiment 3 of the present invention. Taking a residual prediction module implemented by a neural network as an example, it shows the residual encoding and decoding operations at the encoder side and the decoder side, which are described in detail below.
Operation at the encoder side:
First, the video sequences provided by a common test standard are pre-encoded, and the residual data of the four reference blocks and the residual data of the current block are extracted as training data. A deep neural network constructed as in Fig. 6 is used for network training, and the trained network is then embedded into the inter-frame prediction of the encoder.
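A compact training loop for such a network might look as follows; the use of PyTorch, the MSE loss, the Adam optimizer, and all hyper-parameters are assumptions for illustration, as the patent fixes none of them.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_predictor(model, ref_blocks, target_residuals, epochs=10):
    """`ref_blocks` is an (N, 4, H, W) tensor of reference residual
    blocks extracted by pre-encoding; `target_residuals` is an
    (N, 1, H, W) tensor of the corresponding actual residuals."""
    loader = DataLoader(TensorDataset(ref_blocks, target_residuals),
                        batch_size=64, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(epochs):
        for refs, target in loader:
            opt.zero_grad()
            nn.functional.mse_loss(model(refs), target).backward()
            opt.step()
    return model
```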
Second, a block to be encoded is read from the input video image, a prediction block of the block to be encoded is obtained through motion estimation and prediction, and the pixel values of the prediction block are subtracted from the pixel values of the original image to obtain the conventional residual block.
Third, according to the prediction mode (P frame or B frame), the motion vector MV obtained through motion estimation is used to find the two time domain reference residual blocks shown in Fig. 7; two space domain reference residual blocks are then obtained as in Fig. 8, and the four reference residual blocks thus obtained are input into the residual prediction module to obtain the prediction residual block.
Fourth, the difference between the conventional residual block and the prediction residual block is taken as the coded residual of the current block to be encoded and written into the code stream file. In this step, the conventional residual block is the residual value to be encoded that the current coding block obtains according to the conventional inter-frame prediction method, and the prediction residual is the residual value predicted by the residual prediction module.
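Putting the second to fourth steps together, a sketch of the per-block encoder-side computation (with transform, quantization, and entropy coding omitted) might read:

```python
def encode_block(orig_block, prediction_block, ref_residuals, predictor):
    """Sketch of the encoder-side flow: the coded residual written to
    the code stream is the conventional residual minus the predicted
    residual. `predictor` is the trained residual prediction module or
    the linear weighting; its name is an assumption for illustration."""
    conventional_residual = orig_block - prediction_block      # step two
    predicted_residual = predictor(ref_residuals)              # step three
    return conventional_residual - predicted_residual          # step four
```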
In summary, at the encoder side a reference block is found in a reference frame by motion estimation, and a prediction block is obtained after the motion compensation operation; the difference between the original block and the prediction block is the actual residual block of the conventional inter prediction mode. In this process, a residual prediction module is established to predict the residual of the current block. The input of the residual prediction module is the two time domain and two space domain reference residual blocks corresponding to the reference blocks of the current coding block, and its output is the prediction residual of the current block. The difference between the actual residual block and the prediction residual block is taken as the coded residual for subsequent transform, quantization, and entropy coding.
Operation at the decoder side:
First, the residual prediction module is embedded into the inter prediction of the decoder.
Second, the prediction block of the current block to be decoded is obtained through the motion vector MV parsed from the code stream; the two time domain reference residual blocks shown in Fig. 7 are then found, the two space domain reference residual blocks shown in Fig. 8 are obtained, and the four reference residual blocks thus obtained are input into the residual prediction module to obtain the prediction residual block. In this step, the four reference residual blocks corresponding to the current block are generated in a manner consistent with the four reference residual blocks at the encoder side.
Third, the coded residual of the block to be decoded is read from the code stream, the prediction residual block is added to the coded residual to restore the actual residual of the block to be decoded, and the prediction block is then added to restore the original image block.
In summary, in the inter prediction decoding operation at the decoder side, the same residual prediction module as at the encoder is established to predict the residual of the current block. Its input is the two time domain and two space domain reference residual blocks corresponding to the reference blocks of the current decoding block, found in the same way as at the encoder; its output is the prediction residual of the current decoding block, which is added to the decoded residual and to the prediction block of the block to be decoded to complete the reconstruction of the current block.
It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented by a general-purpose computing device; they may be concentrated on a single computing device or distributed across a network of computing devices. Alternatively, they may be implemented by program code executable by computing devices, so that they may be stored in a storage device and executed by the computing devices, and in some cases the steps shown or described may be performed in an order different from that given here. They may also be fabricated separately as individual integrated circuit modules, or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the principle of the present invention should be included in the protection scope of the present invention.
Claims (17)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910663278.XA CN112261409B (en) | 2019-07-22 | 2019-07-22 | Residual encoding, decoding method and device, storage medium and electronic device |
PCT/CN2020/100558 WO2021012942A1 (en) | 2019-07-22 | 2020-07-07 | Residual coding method and device, residual decoding method and device, storage medium, and electronic device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910663278.XA CN112261409B (en) | 2019-07-22 | 2019-07-22 | Residual encoding, decoding method and device, storage medium and electronic device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112261409A CN112261409A (en) | 2021-01-22 |
CN112261409B true CN112261409B (en) | 2024-12-20 |
Family
ID=74193180
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910663278.XA Active CN112261409B (en) | 2019-07-22 | 2019-07-22 | Residual encoding, decoding method and device, storage medium and electronic device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112261409B (en) |
WO (1) | WO2021012942A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115190306A (en) * | 2021-04-01 | 2022-10-14 | Oppo广东移动通信有限公司 | Image processing method, image processing device, storage medium and electronic equipment |
CN115695812A (en) * | 2021-07-30 | 2023-02-03 | 中兴通讯股份有限公司 | Video encoding, video decoding method, device, electronic device and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101350927A (en) * | 2008-07-29 | 2009-01-21 | 北京中星微电子有限公司 | Method and apparatus for forecasting and selecting optimum estimation mode in a frame |
CN102196256A (en) * | 2010-03-11 | 2011-09-21 | 中国科学院微电子研究所 | A video coding method and device |
Family Cites Families (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100763181B1 (en) * | 2005-04-19 | 2007-10-05 | 삼성전자주식회사 | Method and apparatus for improving coding rate by coding prediction information from base layer and enhancement layer |
KR100678911B1 (en) * | 2005-07-21 | 2007-02-05 | 삼성전자주식회사 | Method and apparatus for encoding and decoding video signals by extending the application of directional intra prediction |
WO2007024106A1 (en) * | 2005-08-24 | 2007-03-01 | Samsung Electronics Co., Ltd. | Method for enhancing performance of residual prediction and video encoder and decoder using the same |
CN103037220B (en) * | 2008-01-04 | 2016-01-13 | 华为技术有限公司 | Video coding, coding/decoding method and device and processing system for video |
EP2343901B1 (en) * | 2010-01-08 | 2017-11-29 | BlackBerry Limited | Method and device for video encoding using predicted residuals |
CN102148989B (en) * | 2011-04-22 | 2012-07-25 | 西安交通大学 | Method for detecting all-zero blocks in H.264 |
WO2013009716A2 (en) * | 2011-07-08 | 2013-01-17 | Dolby Laboratories Licensing Corporation | Hybrid encoding and decoding methods for single and multiple layered video coding systems |
GB2506853B (en) * | 2012-09-28 | 2015-03-18 | Canon Kk | Method and apparatus for encoding an image into a video bitstream and decoding corresponding video bitstream |
KR20140072939A (en) * | 2012-12-04 | 2014-06-16 | 광운대학교 산학협력단 | Method and apparatus for residual prediction for multi-view video coding |
CN104244002B (en) * | 2013-06-14 | 2019-02-05 | 北京三星通信技术研究有限公司 | The acquisition methods and device of motion information in a kind of video coding/decoding |
CN104427345B (en) * | 2013-09-11 | 2019-01-08 | 华为技术有限公司 | Acquisition methods, acquisition device, Video Codec and its method of motion vector |
CN104702954B (en) * | 2013-12-05 | 2017-11-17 | 华为技术有限公司 | Method for video coding and device |
FI3958572T3 (en) * | 2014-01-02 | 2024-03-13 | Dolby Laboratories Licensing Corp | Method for encoding multi-view video, method for decoding multi-view video and recording medium therefore |
CN103916672B (en) * | 2014-03-21 | 2018-03-13 | 华为技术有限公司 | A kind of data decoding method, relevant apparatus and system |
EP3453178A1 (en) * | 2016-05-06 | 2019-03-13 | VID SCALE, Inc. | Systems and methods for motion compensated residual prediction |
CN108848380B (en) * | 2018-06-20 | 2021-11-30 | 腾讯科技(深圳)有限公司 | Video encoding and decoding method, device, computer device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2021012942A1 (en) | 2021-01-28 |
CN112261409A (en) | 2021-01-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||