CN113489978A - Distortion optimization quantization circuit for AVS3

Info

Publication number: CN113489978A
Application number: CN202110584035.4A
Authority: CN
Inventors: 向国庆; 徐锦畅; 张鹏; 张广耀; 严韫瑶; 宋磊
Original assignee: Hangzhou Boya Hongtu Video Technology Co ltd
Current assignee: Hangzhou Boya Hongtu Video Technology Co ltd
Priority date: 2021-05-27
Filing date: 2021-05-27
Publication date: 2021-10-08

Abstract

The present disclosure relates to a distortion optimized quantization circuit for AVS3, applied to small-size transform blocks of 16x16 and smaller, comprising: a register for storing all transform coefficients of the transform block and their corresponding zigzag scan coordinates; and a distortion optimized quantization circuit consisting of a pre-quantization unit, an optimal coefficient decision unit, a final non-zero coefficient position decision unit, and a zero-setting unit. The present disclosure also relates to a distortion optimized quantization circuit for AVS3, applied to large-size transform blocks of size 16x16 or larger, comprising: an SRAM for storing all transform coefficients of the transform block; and a distortion optimized quantization circuit consisting of a pre-quantization unit, an optimal coefficient decision unit, a final non-zero coefficient position decision unit, and a zero-setting unit.

Description

Distortion optimization quantization circuit for AVS3

Technical Field

The present disclosure relates to the field of coding circuit technology, and more particularly to a distortion optimized quantization circuit for AVS3.

Background

Rate-distortion optimized quantization (RDOQ) is an important technique in the AVS3 video coding standard. It combines rate-distortion optimization (RDO) with the ordinary scalar quantization used in video coding to obtain a better quantization result, thereby improving overall coding performance. However, because RDOQ has inherently strong data dependencies, it is difficult for an RDOQ module to meet timing requirements in a hardware design. In addition, the new-generation coding standard AVS3 introduces larger transform block sizes, so RDOQ must be supported for sizes from 4x4 to 64x64. Implementing RDOQ for all of these sizes while keeping the circuit area reasonable is a further major difficulty in hardware design.

Disclosure of Invention

The present disclosure aims to solve the technical problem that the prior art cannot meet the real-time requirement of 4K video at 30 frames per second within a reasonable circuit area.

To achieve the above technical object, the present disclosure provides a distortion optimized quantization circuit for AVS3, applied to small-size transform blocks of 16x16 and smaller, comprising:

a register for storing all transform coefficients of the transform block and their corresponding zigzag scan coordinates;

a distortion optimized quantization circuit consisting of a pre-quantization unit, an optimal coefficient decision unit, a final non-zero coefficient position decision unit, and a zero-setting unit.

Further, the pre-quantization unit is specifically configured to perform standard quantization in advance.

Further, the optimal coefficient decision unit is specifically configured to perform an optimal coefficient adjustment decision on the data quantized by the pre-quantization unit.

Further, the final non-zero coefficient position decision unit is specifically configured to perform a condition judgment on each coefficient in the scan area. For a coefficient that meets the condition, the distortion of setting the coefficient to zero is taken as the rate-distortion cost. For a coefficient that does not meet the condition, the distortion of setting the coefficient to zero, together with the bits required to encode the flag indicating whether the coefficient is zero, is taken as the rate-distortion cost.

Further, performing the condition judgment on each coefficient in the scan area specifically includes:

if the coefficient is at the lower boundary or the right boundary of the scan area, and the coefficient is the only non-zero coefficient in the row corresponding to the lower boundary or the column corresponding to the right boundary, the coefficient meets the condition;

if the coefficient is not at the lower boundary or the right boundary of the scan area, the coefficient does not meet the condition;

if the coefficient is at the lower boundary or the right boundary of the scan area, and the coefficient is not the only non-zero coefficient in the row corresponding to the lower boundary or the column corresponding to the right boundary, the coefficient does not meet the condition.
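The condition judgment and the two cost cases above can be illustrated with a minimal software sketch. It is not the circuit of the disclosure. The function names `meets_condition` and `zeroing_cost`, the rectangular scan area spanning rows 0..`last_row` and columns 0..`last_col`, and the weighting of the flag bits by a Lagrange multiplier `lam` are all assumptions made for illustration.

```python
def meets_condition(levels, r, c, last_row, last_col):
    """True if (r, c) lies on the lower or right boundary of the scan area
    and is the only non-zero coefficient on that boundary row or column.

    Assumption: the scan area is the rectangle rows 0..last_row, cols 0..last_col.
    """
    if levels[r][c] == 0:
        return False  # a zero coefficient cannot be "the only non-zero" one
    if r == last_row:
        boundary_row = levels[last_row][: last_col + 1]
        if sum(1 for v in boundary_row if v != 0) == 1:
            return True
    if c == last_col:
        boundary_col = [levels[i][last_col] for i in range(last_row + 1)]
        if sum(1 for v in boundary_col if v != 0) == 1:
            return True
    return False


def zeroing_cost(distortion_if_zero, flag_bits, lam, meets):
    """Rate-distortion cost of setting one coefficient to zero.

    Assumption: bits are converted to cost with a Lagrange multiplier lam.
    """
    if meets:
        return distortion_if_zero  # distortion only
    return distortion_if_zero + lam * flag_bits  # distortion plus flag bits
```

For example, in a 3x3 scan area whose bottom row holds a single non-zero coefficient, that coefficient meets the condition and its zeroing cost is the distortion alone.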

To solve the above technical problem, the present disclosure further provides a distortion optimized quantization circuit for AVS3, applied to a large-sized transform block with a size of 16x16 or larger, including:

an SRAM for storing all transform coefficients of the transform block;

The beneficial effects of the present disclosure are as follows:

The scheme implements RDOQ circuits for the different AVS3 transform sizes and achieves an effective balance between circuit area and speed. For the 64x64 circuit, the pipeline structure is shown in FIG. 1 and the maximum parallelism of the circuit is 64. Reading, pre-quantizing, deciding the optimal coefficients, deciding the final non-zero coefficient position, zero setting, and writing out 64 data items in parallel takes 15 cycles, so a 64x64 TU is completed in 79 cycles (64 + 15). The final circuit was synthesized with Vivado HLS 2019.2; the circuit resources and cycle counts for each size are listed in Table 1.

Table 1. Synthesis results for each circuit size

Size	Cycles	BRAM	DSP	FF	LUT
4x4	5	1	1100	27353	101651
8x8	10	4	3689	100003	251796
16x16	15	118	1406	87393	218651
32x32	40	166	1136	159009	432429
64x64	79	168	1154	271210	717142

Drawings

FIG. 1 shows a schematic diagram of the pipeline architecture of the 64x64 circuit;

FIG. 2 shows a schematic structural diagram of embodiment 1 of the present disclosure;

FIG. 3 shows a schematic structural diagram of embodiment 2 of the present disclosure;

FIG. 4 shows a schematic diagram of the ladder-type data reading structure of embodiment 2 of the present disclosure;

FIG. 5 shows a schematic diagram of the zigzag scan order of embodiment 2 of the present disclosure;

FIG. 6 shows a schematic diagram of how embodiment 2 of the present disclosure uses the property that the DCT concentrates energy in the upper left corner to further reduce the amount of data that the large-size circuit needs to process.

Detailed Description

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.

Various structural schematics according to embodiments of the present disclosure are shown in the figures. The figures are not drawn to scale, wherein certain details are exaggerated and possibly omitted for clarity of presentation. The shapes of various regions, layers, and relative sizes and positional relationships therebetween shown in the drawings are merely exemplary, and deviations may occur in practice due to manufacturing tolerances or technical limitations, and a person skilled in the art may additionally design regions/layers having different shapes, sizes, relative positions, as actually required.

Embodiment 1:

As shown in FIG. 2:

The present disclosure provides a distortion optimized quantization circuit for AVS3, applied to small-size transform blocks of 16x16 and smaller.

For a small-size transform block (TU) of size 16x16 or smaller, all transform coefficients and their corresponding zigzag scan coordinates are loaded into a register. The small-size RDOQ circuit reads all data in the register in parallel and performs four steps: pre-quantization, optimal coefficient decision, final non-zero coefficient position decision, and zero setting. The results are then written back to the register in parallel.
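The four steps can be illustrated by the following minimal software sketch, which processes coefficients that are already in scan order. It is a simplified model under stated assumptions, not the circuit of the disclosure: the rate model `rate_bits`, the candidate set (the pre-quantized level and the level one below it), the quantization step `step`, and the Lagrange multiplier `lam` are hypothetical, and the hardware performs these steps in parallel rather than in loops.

```python
def pre_quantize(coeffs, step):
    """Step 1: plain scalar quantization of magnitudes, rounding to nearest."""
    return [int(abs(c) / step + 0.5) for c in coeffs]


def rate_bits(level):
    """Hypothetical rate model: larger levels cost more bits."""
    return 1 if level == 0 else 2 + level.bit_length()


def coeff_cost(c, level, step, lam):
    """Distortion plus lam-weighted rate for coding c as the given level."""
    return (abs(c) - level * step) ** 2 + lam * rate_bits(level)


def decide_levels(coeffs, levels, step, lam):
    """Step 2: per coefficient, keep the cheaper of level and level - 1."""
    out = []
    for c, q in zip(coeffs, levels):
        candidates = {q, max(q - 1, 0)}
        # Ties are broken toward the smaller level for determinism.
        out.append(min(candidates, key=lambda k: (coeff_cost(c, k, step, lam), k)))
    return out


def decide_last_position(coeffs, levels, step, lam):
    """Step 3: index of the final non-zero coefficient to keep (-1 for none)."""
    total = sum(c * c for c in coeffs)  # cost if every coefficient is zeroed
    best_pos, best_cost = -1, total
    for i, (c, k) in enumerate(zip(coeffs, levels)):
        total += coeff_cost(c, k, step, lam) - c * c  # keep position i as coded
        if k != 0 and total < best_cost:
            best_pos, best_cost = i, total
    return best_pos


def zero_out(levels, last_pos):
    """Step 4: clear every level after the chosen final position."""
    return [k if i <= last_pos else 0 for i, k in enumerate(levels)]


def rdoq(coeffs, step, lam):
    """Run the four steps in order and restore the signs."""
    levels = pre_quantize(coeffs, step)
    levels = decide_levels(coeffs, levels, step, lam)
    last = decide_last_position(coeffs, levels, step, lam)
    levels = zero_out(levels, last)
    return [k if c >= 0 else -k for c, k in zip(coeffs, levels)]
```

With a small `lam` the result equals plain scalar quantization. As `lam` grows, levels are reduced and isolated trailing coefficients are dropped, which is the trade-off between distortion and rate that RDOQ is meant to capture.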

Embodiment 2:

As shown in FIG. 3:

The present disclosure also provides a distortion optimized quantization circuit for AVS3, applied to large-size transform blocks of size 16x16 or larger, comprising:

an SRAM for storing all transform coefficients of the transform block;

For a large-size transform block (TU) larger than 16x16, processing all transform coefficients fully in parallel would require an unacceptable circuit area. For a large-size TU of size MxN, all transform coefficients are therefore stored in an SRAM with M banks. In each cycle, a ladder-type data reading structure reads a limited amount of data from the SRAM and sends it to the large-size RDOQ circuit for processing, and the results are finally written back to the SRAM through a ladder-type data writing structure. The ladder-type data reading structure makes the order in which the circuit reads data follow the zigzag scan order and, at the same time, limits the parallelism of the circuit. FIG. 4 shows the first 8 banks of an MxN block: data of the same color is read in the same cycle, and the number in each square is the cycle in which that data is read. The read order follows the zigzag scan order shown in FIG. 5, and the maximum parallelism of an MxN circuit is limited to N.
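One way to picture a ladder-type read schedule is the sketch below. Because FIG. 4 is not reproduced here, the mapping is an assumption: bank b is taken to hold row b of the block, and each bank is taken to start one cycle after the previous one, so that the coefficient at (row, col) is read in cycle row + col. The function names are hypothetical.

```python
def ladder_read_schedule(m, n):
    """Group the positions of an m x n block by the cycle in which they are read.

    Assumption: bank b stores row b and starts one cycle after bank b - 1,
    so position (row, col) is read in cycle row + col.
    """
    cycles = [[] for _ in range(m + n - 1)]
    for row in range(m):
        for col in range(n):
            cycles[row + col].append((row, col))
    return cycles


def max_parallelism(cycles):
    """Largest number of coefficients read in any single cycle."""
    return max(len(group) for group in cycles)


if __name__ == "__main__":
    for cycle, group in enumerate(ladder_read_schedule(4, 4)):
        print(cycle, group)
```

Under this assumption every cycle reads one anti-diagonal of the block, each element comes from a different bank, and no cycle reads more than min(M, N) coefficients, which stays within the bound of N stated above.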

As shown in FIG. 6, the property that the DCT concentrates energy in the upper left corner is used to further reduce the amount of data that the large-size circuit needs to process: the circuit performs the RDOQ operation only on the data in the black region in the upper left corner and directly sets the data in the white region in the lower right corner to zero.
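The clearing of the lower right region can be sketched as a simple mask. The exact shape of the retained region is defined by FIG. 6, which is not reproduced here, so the region is passed in as a predicate. The triangular region `upper_left_triangle` is purely a hypothetical example, and both function names are assumptions.

```python
def clear_outside(block, keep):
    """Return a copy of block with every coefficient outside the kept region set to 0.

    keep(row, col) returns True for positions that still go through RDOQ.
    """
    return [
        [value if keep(r, c) else 0 for c, value in enumerate(row)]
        for r, row in enumerate(block)
    ]


def upper_left_triangle(size):
    """Hypothetical region: keep positions with row + col below size."""
    return lambda r, c: r + c < size


if __name__ == "__main__":
    block = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(clear_outside(block, upper_left_triangle(3)))
```

Only the positions accepted by the predicate would need to be read and processed, which is how the reduction in processed data described above could be obtained.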

The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims

1. A distortion optimized quantization circuit for AVS3, applied to small-size transform blocks of 16x16 and smaller, characterized by comprising:

a register for storing all transform coefficients of the transform block and their corresponding zigzag scan coordinates;

a distortion optimized quantization circuit consisting of a pre-quantization unit, an optimal coefficient decision unit, a final non-zero coefficient position decision unit, and a zero-setting unit.

2. The circuit according to claim 1, wherein the pre-quantization unit is specifically configured to perform standard quantization in advance.

3. The circuit according to claim 1, wherein the optimal coefficient decision unit is specifically configured to perform an optimal coefficient adjustment decision on the data quantized by the pre-quantization unit.

4. The circuit according to claim 1, wherein the final non-zero coefficient position decision unit is specifically configured to perform a condition judgment on each coefficient in the scan area; for a coefficient that meets the condition, the distortion of setting the coefficient to zero is taken as the rate-distortion cost; and for a coefficient that does not meet the condition, the distortion of setting the coefficient to zero, together with the bits required to encode the flag indicating whether the coefficient is zero, is taken as the rate-distortion cost.

5. The circuit according to claim 4, wherein the condition judgment on each coefficient in the scan area specifically comprises:

if the coefficient is at the lower boundary or the right boundary of the scan area, and the coefficient is the only non-zero coefficient in the row corresponding to the lower boundary or the column corresponding to the right boundary, the coefficient meets the condition;

if the coefficient is not at the lower boundary or the right boundary of the scan area, the coefficient does not meet the condition;

if the coefficient is at the lower boundary or the right boundary of the scan area, and the coefficient is not the only non-zero coefficient in the row corresponding to the lower boundary or the column corresponding to the right boundary, the coefficient does not meet the condition.

6. A distortion optimized quantization circuit for AVS3, applied to large-size transform blocks of a size above 16x16, characterized by comprising:

an SRAM for storing all transform coefficients of the transform block;

7. The circuit according to claim 6, wherein the pre-quantization unit is specifically configured to perform standard quantization in advance.

8. The circuit according to claim 6, wherein the optimal coefficient decision unit is specifically configured to perform an optimal coefficient adjustment decision on the data quantized by the pre-quantization unit.

9. The circuit according to claim 6, wherein the final non-zero coefficient position decision unit is specifically configured to perform a condition judgment on each coefficient in the scan area; for a coefficient that meets the condition, the distortion of setting the coefficient to zero is taken as the rate-distortion cost; and for a coefficient that does not meet the condition, the distortion of setting the coefficient to zero, together with the bits required to encode the flag indicating whether the coefficient is zero, is taken as the rate-distortion cost.

10. The circuit according to claim 9, wherein the condition judgment on each coefficient in the scan area specifically comprises:

if the coefficient is at the lower boundary or the right boundary of the scan area, and the coefficient is not the only non-zero coefficient in the row corresponding to the lower boundary or the column corresponding to the right boundary, the coefficient does not meet the condition.