CN113489978A - Distortion optimization quantization circuit for AVS3 - Google Patents
- Publication number
- CN113489978A (application CN202110584035.4A)
- Authority
- CN
- China
- Prior art keywords
- coefficient
- zero
- unit
- decision
- quantization
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/147—Data rate or code amount at the encoder output according to rate distortion criteria
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
- H04N19/423—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The present disclosure relates to a distortion optimized quantization circuit for AVS3, applied to small-size transform blocks of 16x16 and smaller, comprising: a register for storing all transform coefficients of the transform block and the corresponding zigzag scan coordinates; and a distortion optimized quantization circuit consisting of a pre-quantization unit, an optimal-coefficient decision unit, a last-non-zero-coefficient position decision unit, and a zeroing unit. The present disclosure also relates to a distortion optimized quantization circuit for AVS3, applied to large-size transform blocks larger than 16x16, comprising: an SRAM for storing all transform coefficients of the transform block; and a distortion optimized quantization circuit consisting of a pre-quantization unit, an optimal-coefficient decision unit, a last-non-zero-coefficient position decision unit, and a zeroing unit.
Description
Technical Field
The present disclosure relates to the field of coding circuit technology, and more particularly, to a distortion optimized quantization circuit for AVS3.
Background
Rate-distortion optimized quantization (RDOQ) is an important technique in the AVS3 video coding standard. It combines rate-distortion optimization (RDO) with the ordinary scalar quantization used in video coding to obtain better quantization results, thereby improving overall coding performance. However, because RDOQ has inherently strong data dependencies, it is difficult for an RDOQ module to meet timing requirements in a hardware design. In addition, the new-generation coding standard AVS3 introduces larger transform block sizes, so RDOQ must be supported for sizes from 4x4 to 64x64. Implementing RDOQ for all of these sizes while keeping the circuit area reasonable is a further major difficulty in hardware design.
Disclosure of Invention
The present disclosure aims to solve the technical problem that the prior art cannot meet the real-time requirement of 4K video at 30 frames per second within a reasonable circuit area.
To achieve the above technical object, the present disclosure provides a distortion optimized quantization circuit for AVS3, applied to small-size transform blocks of 16x16 and smaller, comprising:
a register for storing all transform coefficients of the transform block and the corresponding zigzag scan coordinates; and
a distortion optimized quantization circuit consisting of a pre-quantization unit, an optimal-coefficient decision unit, a last-non-zero-coefficient position decision unit, and a zeroing unit.
Further, the pre-quantization unit is specifically configured to perform standard quantization in advance.
Further, the optimal-coefficient decision unit is specifically configured to make an optimal coefficient adjustment decision on the data quantized by the pre-quantization unit.
Further, the last-non-zero-coefficient position decision unit is specifically configured to perform a condition judgment on each coefficient in the scan area: for a coefficient that meets the condition, the distortion incurred by setting the coefficient to zero is taken as the rate-distortion cost; for a coefficient that does not meet the condition, the rate-distortion cost is that distortion plus the bits required to code the flag indicating whether the coefficient is zero.
Further, performing the condition judgment on each coefficient in the scan area specifically includes:
if the coefficient is on the lower boundary or the right boundary of the scan area, and it is the only non-zero coefficient in the row corresponding to the lower boundary or in the column corresponding to the right boundary, the coefficient meets the condition;
if the coefficient is not on the lower boundary or the right boundary of the scan area, the coefficient does not meet the condition;
and if the coefficient is on the lower boundary or the right boundary of the scan area but is not the only non-zero coefficient in the row corresponding to the lower boundary or in the column corresponding to the right boundary, the coefficient does not meet the condition.
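The three cases of the condition judgment and the two cost formulas can be sketched as follows. This is a minimal illustration, not the patented circuit: the function names, the rectangular scan area bounded by `last_row` and `last_col`, and the Lagrange weight `lam` applied to the flag bits are all assumptions introduced here.

```python
def meets_condition(levels, row, col, last_row, last_col):
    """Condition judgment for one coefficient (hypothetical helper).

    `levels` is a 2-D list of quantized levels; the scan area is assumed to be
    the rectangle rows 0..last_row, cols 0..last_col.
    """
    if levels[row][col] == 0:
        return False
    # Lower boundary: the coefficient must be the only non-zero in that row.
    if row == last_row:
        nonzeros = sum(1 for c in range(last_col + 1) if levels[last_row][c] != 0)
        if nonzeros == 1:
            return True
    # Right boundary: the coefficient must be the only non-zero in that column.
    if col == last_col:
        nonzeros = sum(1 for r in range(last_row + 1) if levels[r][last_col] != 0)
        if nonzeros == 1:
            return True
    return False


def zeroing_cost(distortion_if_zero, flag_bits, lam, condition_met):
    """Rate-distortion cost of zeroing a coefficient.

    Condition met: distortion only. Otherwise: distortion plus the bits of the
    'is zero' flag. Weighting the bits by `lam` is an assumption of this sketch.
    """
    if condition_met:
        return distortion_if_zero
    return distortion_if_zero + lam * flag_bits


levels = [[5, 0, 2],
          [0, 3, 0],
          [0, 1, 0]]
print(meets_condition(levels, 2, 1, 2, 2))  # sole non-zero on the lower boundary
print(meets_condition(levels, 1, 1, 2, 2))  # interior coefficient
```

The interior case and the "boundary but not unique" case both fall through to `False`, which mirrors the second and third clauses of the list.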
To solve the above technical problem, the present disclosure further provides a distortion optimized quantization circuit for AVS3, applied to large-size transform blocks larger than 16x16, comprising:
an SRAM for storing all transform coefficients of the transform block; and
a distortion optimized quantization circuit consisting of a pre-quantization unit, an optimal-coefficient decision unit, a last-non-zero-coefficient position decision unit, and a zeroing unit.
Further, the pre-quantization unit is specifically configured to perform standard quantization in advance.
Further, the optimal-coefficient decision unit is specifically configured to make an optimal coefficient adjustment decision on the data quantized by the pre-quantization unit.
Further, the last-non-zero-coefficient position decision unit is specifically configured to perform a condition judgment on each coefficient in the scan area: for a coefficient that meets the condition, the distortion incurred by setting the coefficient to zero is taken as the rate-distortion cost; for a coefficient that does not meet the condition, the rate-distortion cost is that distortion plus the bits required to code the flag indicating whether the coefficient is zero.
Further, performing the condition judgment on each coefficient in the scan area specifically includes:
if the coefficient is on the lower boundary or the right boundary of the scan area, and it is the only non-zero coefficient in the row corresponding to the lower boundary or in the column corresponding to the right boundary, the coefficient meets the condition;
if the coefficient is not on the lower boundary or the right boundary of the scan area, the coefficient does not meet the condition;
and if the coefficient is on the lower boundary or the right boundary of the scan area but is not the only non-zero coefficient in the row corresponding to the lower boundary or in the column corresponding to the right boundary, the coefficient does not meet the condition.
The beneficial effects of the present disclosure are as follows:
The scheme implements AVS3 RDOQ circuits for the different transform block sizes and achieves an effective balance between circuit area and speed. For the 64x64 circuit, the pipeline structure is shown in FIG. 1 and the maximum parallelism of the circuit is 64. Processing one group of 64 coefficients in parallel (reading, pre-quantization, optimal coefficient decision, last non-zero coefficient position decision, zeroing, and writing out) takes 15 cycles, and completing a TU of size 64x64 takes 79 cycles (64 + 15). The final circuits were synthesized with Vivado HLS 2019.2; the circuit resources and cycle counts for each size are listed in Table 1.
Table 1. Synthesis results for each circuit size
Size | Cycles | BRAM | DSP | FF | LUT |
---|---|---|---|---|---|
4x4 | 5 | 1 | 1100 | 27353 | 101651 |
8x8 | 10 | 4 | 3689 | 100003 | 251796 |
16x16 | 15 | 118 | 1406 | 87393 | 218651 |
32x32 | 40 | 166 | 1136 | 159009 | 432429 |
64x64 | 79 | 168 | 1154 | 271210 | 717142 |
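The 64x64 cycle count can be reproduced with a one-line calculation. The sketch below only restates the source's own accounting (64 issue cycles plus a 15-cycle pipeline latency); the function name is hypothetical, and no attempt is made to derive the cycle counts of the other sizes, since the text does not give their breakdown.

```python
def pipelined_cycles(groups, pipeline_latency):
    """Total cycles when `groups` data groups are issued one per cycle into a
    pipeline with the given latency, using the source's accounting
    (issue cycles + latency)."""
    return groups + pipeline_latency


# 64x64 TU: 64 groups of up to 64 coefficients, 15-cycle pipeline.
cycles_64x64 = pipelined_cycles(groups=64, pipeline_latency=15)
print(cycles_64x64)  # 79
```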
Drawings
FIG. 1 shows a schematic diagram of the pipeline architecture of the 64x64 circuit;
FIG. 2 shows a schematic structural diagram of Embodiment 1 of the present disclosure;
FIG. 3 shows a schematic structural diagram of Embodiment 2 of the present disclosure;
FIG. 4 shows a schematic diagram of the ladder-type data reading structure of Embodiment 2 of the present disclosure;
FIG. 5 shows a schematic diagram of the zigzag scan order of Embodiment 2 of the present disclosure;
FIG. 6 shows a schematic diagram of how Embodiment 2 of the present disclosure exploits the property that the DCT concentrates energy in the upper-left corner to further reduce the amount of data that the large-size circuit must process.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
Various structural schematics according to embodiments of the present disclosure are shown in the figures. The figures are not drawn to scale, wherein certain details are exaggerated and possibly omitted for clarity of presentation. The shapes of various regions, layers, and relative sizes and positional relationships therebetween shown in the drawings are merely exemplary, and deviations may occur in practice due to manufacturing tolerances or technical limitations, and a person skilled in the art may additionally design regions/layers having different shapes, sizes, relative positions, as actually required.
Embodiment 1:
As shown in FIG. 2:
The present disclosure provides a distortion optimized quantization circuit for AVS3, applied to small-size transform blocks of 16x16 and smaller, comprising:
a register for storing all transform coefficients of the transform block and the corresponding zigzag scan coordinates; and
a distortion optimized quantization circuit consisting of a pre-quantization unit, an optimal-coefficient decision unit, a last-non-zero-coefficient position decision unit, and a zeroing unit.
Further, the pre-quantization unit is specifically configured to perform standard quantization in advance.
Further, the optimal-coefficient decision unit is specifically configured to make an optimal coefficient adjustment decision on the data quantized by the pre-quantization unit.
Further, the last-non-zero-coefficient position decision unit is specifically configured to perform a condition judgment on each coefficient in the scan area: for a coefficient that meets the condition, the distortion incurred by setting the coefficient to zero is taken as the rate-distortion cost; for a coefficient that does not meet the condition, the rate-distortion cost is that distortion plus the bits required to code the flag indicating whether the coefficient is zero.
Further, performing the condition judgment on each coefficient in the scan area specifically includes:
if the coefficient is on the lower boundary or the right boundary of the scan area, and it is the only non-zero coefficient in the row corresponding to the lower boundary or in the column corresponding to the right boundary, the coefficient meets the condition;
if the coefficient is not on the lower boundary or the right boundary of the scan area, the coefficient does not meet the condition;
and if the coefficient is on the lower boundary or the right boundary of the scan area but is not the only non-zero coefficient in the row corresponding to the lower boundary or in the column corresponding to the right boundary, the coefficient does not meet the condition.
For a small-size transform block (TU) of 16x16 or smaller, all transform coefficients and their corresponding zigzag scan coordinates are loaded into the register. The small-size RDOQ circuit reads all data in the register in parallel, performs the four steps of pre-quantization, optimal coefficient decision, last non-zero coefficient position decision, and zeroing, and outputs the final results to the register in parallel.
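The four steps can be illustrated in software with a heavily simplified sketch. Everything specific here is an assumption made for illustration and is not taken from the patent or from AVS3: the round-to-nearest pre-quantization, the candidate set {level, level - 1}, the toy rate model `toy_rate_bits`, and the way the last position is chosen. Coefficients are assumed to be already arranged in zigzag scan order.

```python
def toy_rate_bits(level):
    """Hypothetical rate model: 1 bit for a zero, more for larger magnitudes."""
    return 1.0 if level == 0 else 2.0 * abs(level) + 1.0


def rd_cost(coeff, level, step, lam):
    """Squared reconstruction error plus lam-weighted toy rate."""
    err = abs(coeff) - abs(level) * step
    return err * err + lam * toy_rate_bits(level)


def rdoq_scan(coeffs, step, lam):
    # Step 1: pre-quantization (plain round-to-nearest scalar quantization).
    pre = [int(abs(c) / step + 0.5) for c in coeffs]

    # Step 2: optimal coefficient decision between level and level - 1.
    best = []
    for c, q in zip(coeffs, pre):
        candidates = [q] if q == 0 else [q, q - 1]
        best.append(min(candidates, key=lambda l: rd_cost(c, l, step, lam)))

    # Step 3: last non-zero position decision. Ending the scan at position p
    # costs the RD cost of everything up to p plus the distortion of zeroing
    # everything after p; p = -1 means the whole block is zeroed.
    tail = sum(c * c for c in coeffs)
    head = 0.0
    best_p, best_cost = -1, tail
    for p, (c, l) in enumerate(zip(coeffs, best)):
        head += rd_cost(c, l, step, lam)
        tail -= c * c
        if l != 0 and head + tail < best_cost:
            best_p, best_cost = p, head + tail

    # Step 4: zeroing of everything after the chosen last position.
    return [(l if c >= 0 else -l) if i <= best_p else 0
            for i, (c, l) in enumerate(zip(coeffs, best))]


print(rdoq_scan([100, -37, 12, 3, 0, 1], step=10, lam=4.0))  # [10, -4, 1, 0, 0, 0]
print(rdoq_scan([50, 0, 0, 0, 6], step=10, lam=4.0))         # [5, 0, 0, 0, 0]
```

In the second call the isolated trailing coefficient is dropped, because coding the intermediate zero flags costs more than the distortion saved by keeping it. In hardware, steps 1 and 2 are independent per coefficient and so map naturally to the fully parallel register-based circuit described above, whereas step 3 carries the serial dependency.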
Embodiment 2:
As shown in FIG. 3:
The present disclosure also provides a distortion optimized quantization circuit for AVS3, applied to large-size transform blocks larger than 16x16, comprising:
an SRAM for storing all transform coefficients of the transform block; and
a distortion optimized quantization circuit consisting of a pre-quantization unit, an optimal-coefficient decision unit, a last-non-zero-coefficient position decision unit, and a zeroing unit.
Further, the pre-quantization unit is specifically configured to perform standard quantization in advance.
Further, the optimal-coefficient decision unit is specifically configured to make an optimal coefficient adjustment decision on the data quantized by the pre-quantization unit.
Further, the last-non-zero-coefficient position decision unit is specifically configured to perform a condition judgment on each coefficient in the scan area: for a coefficient that meets the condition, the distortion incurred by setting the coefficient to zero is taken as the rate-distortion cost; for a coefficient that does not meet the condition, the rate-distortion cost is that distortion plus the bits required to code the flag indicating whether the coefficient is zero.
Further, performing the condition judgment on each coefficient in the scan area specifically includes:
if the coefficient is on the lower boundary or the right boundary of the scan area, and it is the only non-zero coefficient in the row corresponding to the lower boundary or in the column corresponding to the right boundary, the coefficient meets the condition;
if the coefficient is not on the lower boundary or the right boundary of the scan area, the coefficient does not meet the condition;
and if the coefficient is on the lower boundary or the right boundary of the scan area but is not the only non-zero coefficient in the row corresponding to the lower boundary or in the column corresponding to the right boundary, the coefficient does not meet the condition.
For a large-size transform block (TU) larger than 16x16, the circuit area required to process all transform coefficients fully in parallel is unacceptable. For a large-size TU of size MxN, all transform coefficients are stored in an SRAM with M banks. In each cycle, a ladder-type data reading structure reads a certain amount of data from the SRAM and sends it to the large-size RDOQ circuit for processing, and the results are finally written back to the SRAM through a ladder-type data writing structure. The ladder-type data reading structure makes the order in which the circuit reads data follow the zigzag scan order while also limiting the parallelism of the circuit. FIG. 4 shows the first 8 banks of an MxN TU: data of the same color are read in the same cycle, and the number in each square is the cycle in which that data is read. It can be seen that the order in which the circuit reads data follows the zigzag scan order shown in FIG. 5, and that the maximum parallelism of an MxN circuit is limited to N.
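One way to picture the ladder-type read is as an anti-diagonal schedule over the banks. The sketch below is an assumption-laden illustration, not the layout of FIG. 4: it assumes bank `b` holds row `b` of the TU and that, in cycle `t`, bank `b` is read at address `t - b`, so that each cycle fetches one anti-diagonal of the zigzag scan with at most one access per bank.

```python
def ladder_read_schedule(m, n):
    """Return, for each cycle, the (bank, address) pairs read together.

    Assumptions of this sketch: M banks, bank b stores row b (N entries), and
    cycle t reads address t - b from bank b whenever that address exists.
    """
    schedule = []
    for t in range(m + n - 1):
        reads = [(b, t - b) for b in range(m) if 0 <= t - b < n]
        schedule.append(reads)
    return schedule


for cycle, reads in enumerate(ladder_read_schedule(4, 4)):
    print(cycle, reads)
```

Because every access within a cycle hits a different bank, there are no bank conflicts, and the number of reads per cycle never exceeds min(M, N), which equals N for a square TU.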
As shown in FIG. 6, the property that the DCT concentrates energy in the upper-left corner is exploited to further reduce the amount of data that the large-size circuit must process: the circuit performs the RDOQ operation only on the data in the black upper-left region and directly clears the data in the white lower-right region.
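The clearing step can be sketched as a simple mask. The exact shape of the retained region is given only in FIG. 6, so the triangular region used here (positions with `row + col < keep`) and the parameter name `keep` are assumptions for illustration.

```python
def keep_upper_left(levels, keep):
    """Zero every entry outside the assumed upper-left triangle row + col < keep."""
    return [[v if r + c < keep else 0 for c, v in enumerate(row)]
            for r, row in enumerate(levels)]


block = [[1, 1, 1],
         [1, 1, 1],
         [1, 1, 1]]
print(keep_upper_left(block, keep=2))  # [[1, 1, 0], [1, 0, 0], [0, 0, 0]]
```

Under this assumption, a 64x64 TU with `keep = 64` retains exactly 64 anti-diagonals, which would be consistent with the 64 issue cycles in the 79-cycle figure, although the source does not state this link explicitly.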
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110584035.4A CN113489978A (en) | 2021-05-27 | 2021-05-27 | Distortion optimization quantization circuit for AVS3 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110584035.4A CN113489978A (en) | 2021-05-27 | 2021-05-27 | Distortion optimization quantization circuit for AVS3 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113489978A (en) | 2021-10-08 |
Family
ID=77933169
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110584035.4A Pending CN113489978A (en) | 2021-05-27 | 2021-05-27 | Distortion optimization quantization circuit for AVS3 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113489978A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114786010A (en) * | 2022-03-07 | 2022-07-22 | 杭州未名信科科技有限公司 | Rate distortion optimization quantization method and device, storage medium and electronic equipment |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103748876A (en) * | 2011-04-22 | 2014-04-23 | 汤姆逊许可公司 | Method and device for lossy compress-encoding data and corresponding method and device for reconstructing data |
CN111787324A (en) * | 2020-06-29 | 2020-10-16 | 北京大学 | Rate-distortion optimized quantization method, coding method and apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20211008 |