CN100452880C

CN100452880C - Integral discrete cosine transform method in use for encoding video

Info

Publication number: CN100452880C
Application number: CNB2006100121618A
Authority: CN
Inventors: 赵欣; 王宇; 李凤亭
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2006-06-09
Filing date: 2006-06-09
Publication date: 2009-01-14
Anticipated expiration: 2026-06-09
Also published as: CN1874510A

Abstract

The present invention provides an integral discrete cosine transform method for video encoding, which belongs to the technical field of video transmission. The method is that firstly, every element in an integral discrete cosine transform core is disassembled in an equivalent mode, so N matrixes are obtained. The N matrixes are grouped and added, so M auxiliary transform cores are obtained. According to the M auxiliary transform cores, M auxiliary transforms are calculated, and according to the low-to-high order of a suffix, M auxiliary transform results are combined. A transformed result DX<T> matrix of a first processing unit is used as a Y matrix of a second processing unit. The steps are repeated, so an integral discrete cosine transform coefficient is obtained. The method of the present invention has the advantages that by using the cutting operation of integral DCT transform, before the cutting operation is carried out, partial computation redundancy is removed, so the whole bit wide of an adder in PU is decreased, and hardware resources are saved.

Description

A Method of Integer Discrete Cosine Transform for Video Coding

技术领域 technical field

本发明涉及一种用于视频编码的整数离散余弦变换方法，属于视频传输技术领域。The invention relates to an integer discrete cosine transform method for video coding, belonging to the technical field of video transmission.

背景技术 Background technique

在已有技术中，名称为“Development of integer cosine transforms by theprinciple of dyadic symmetry”(Proc.Inst.Elect.Eng.，Partl，vol.136，Aug.1989，pp.276-282.)的论文公开了可以用于视频编码的用于视频编码的整数离散余弦变换方法，其原理是：离散余弦变换的数学式为F＝DXD^T，其中D，X为N×N的矩阵，D^T为D的转置，D称为该变换的变换核。对于整数离散余弦变换而言，D的元素全部为整数，D的大小和数值不唯一，比如应用在H.264视频编解码标准的基准档次当中的变换核为 $[\begin{matrix} 1 & 1 & 1 & 1 \\ 2 & 1 & - 1 & - 2 \\ 1 & - 1 & - 1 & 1 \\ 1 & - 2 & 2 & - 1 \end{matrix}],$ 而在数字音视频编解码标准(以下简称AVS)当中的变换核为 $[\begin{matrix} 8 & 8 & 8 & 8 & 8 & 8 & 8 & 8 \\ 10 & 9 & 6 & 2 & - 2 & - 6 & - 9 & - 10 \\ 10 & 4 & - 4 & - 10 & - 10 & - 4 & 4 & 10 \\ 9 & - 2 & - 10 & - 6 & 6 & 10 & 2 & - 9 \\ 8 & - 8 & - 8 & 8 & 8 & - 8 & - 8 & 8 \\ 6 & - 10 & 2 & 9 & - 9 & - 2 & 10 & - 6 \\ 4 & - 10 & 10 & - 4 & - 4 & 10 & - 10 & 4 \\ 2 & - 6 & 9 & - 10 & 10 & - 9 & 6 & - 2 \end{matrix}] .$ In the prior art, a paper titled "Development of integer cosine transforms by the principle of dyadic symmetry" (Proc.Inst.Elect.Eng., Partl, vol.136, Aug.1989, pp.276-282.) is published An integer discrete cosine transform method for video coding that can be used for video coding is provided. Its principle is: the mathematical formula of discrete cosine transform is F=D×D ^T , wherein D, X is a matrix of N×N, and D ^T is a matrix of D Transpose, D is called the transformation kernel of this transformation. For integer discrete cosine transform, the elements of D are all integers, and the size and value of D are not unique. For example, the transformation kernel used in the benchmark grade of the H.264 video codec standard is $[\begin{matrix} 1 & 1 & 1 & 1 \\ 2 & 1 & - 1 & - 2 \\ 1 & - 1 & - 1 & 1 \\ 1 & - 2 & 2 & - 1 \end{matrix}],$ In the digital audio and video codec standard (hereinafter referred to as AVS), the transform kernel is $[\begin{matrix} 8 & 8 & 8 & 8 & 8 & 8 & 8 & 8 \\ 10 & 9 & 6 & 2 & - 2 & - 6 & - 9 & - 10 \\ 10 & 4 & - 4 & - 10 & - 10 & - 4 & 4 & 10 \\ 9 & - 2 & - 10 & - 6 & 6 & 10 & 2 & - 9 \\ 8 & - 8 & - 8 & 8 & 8 & - 8 & - 8 & 8 \\ 6 & - 10 & 2 & 9 & - 9 & - 2 & 10 & - 6 \\ 4 & - 10 & 10 & - 4 & - 4 & 10 & - 10 & 4 \\ 2 & - 6 & 9 & - 10 & 10 & - 9 & 6 & - 2 \end{matrix}] .$

传统的快速实现上述整数离散余弦变换的方法，是采用一种流图形式的蝶形计算结构来计算1维的整数离散余弦变换(以下简称DCT)，然后通过两个这样结构相同的处理单元(以下简称PU)的串连来实现整个二维整数DCT变换，称之为行列分解的方法，其流程框图如图1所示，用数学式描述为F＝DXD^T＝D(DX^T)^T，一个PU所完成的计算就是DX^T，也就是一维的整数DCT变换，那么整个变换可以通过两个PU串联连接来实现。因此整数离散余弦变换的核心计算就是PU，PU所完成的数学运算包括：矩阵转置和矩阵相乘，对于第一个PU，可以通过将输入矩阵按照转置后的矩阵输入，而对于第二个PU，需要一个专门计算转置的运算单元。The traditional method for quickly realizing the above-mentioned integer discrete cosine transform is to use a butterfly computing structure in the form of a flow graph to calculate the 1-dimensional integer discrete cosine transform (hereinafter referred to as DCT), and then use two processing units with the same structure ( ^hereinafter referred to as PU) in series to realize the whole two-dimensional integer DCT transformation, which is ^called the method of decomposing rows and columns ^. The calculation completed by one PU is DX ^T , that is, the one-dimensional integer DCT transformation, and the entire transformation can be realized by connecting two PUs in series. Therefore, the core calculation of the integer discrete cosine transform is the PU. The mathematical operations performed by the PU include: matrix transposition and matrix multiplication. For the first PU, the input matrix can be input according to the transposed matrix, while for the second A PU requires an arithmetic unit dedicated to calculating the transpose.

上述传统的DCT算法中，以AVS所采用的变换核为例，基于蝶形结构的实现流图如图2所示：In the above-mentioned traditional DCT algorithm, taking the transformation kernel adopted by AVS as an example, the implementation flow diagram based on the butterfly structure is shown in Figure 2:

x0～x7为X^T的一列数据，整个流图实现了X的一列数据与变换核相乘的结果X0~X7，，分别将X^T的8列数据经过上面的流图计算，可以得到一个PU的计算结果DX^T。这种结构利于用硬件实现，因为乘法操作转化为小整数的乘法，小整数的乘法可以用移位操作来实现。x0~x7 is a column of data of X ^T , and the whole flow graph realizes the result X0~X7 of multiplying a column of data of X and the transformation kernel, respectively, the 8 columns of data of X ^T are calculated through the above flow graph, and a PU can be obtained The calculation result of DX ^T . This structure is beneficial to realize by hardware, because the multiplication operation is transformed into the multiplication of small integers, and the multiplication of small integers can be realized by shift operation.

已有传统蝶形结构的缺点是，由于整数变换核元素全部为整数，而标准的离散余弦变换核为正交矩阵，元素的绝对值全部小于1，因此与整数核相乘会有较大的数据增益，为了抵消这一增益需要额外的截断操作，由于数据采用二进制表示，截断操作具体指将数据二进制表示的地若干位直接舍弃，这样的操作等效为对数据进行2的某次幂的除法。截断操作带来的结果是数据二进制表示的低若干位被舍弃，如图1所示，每个PU后面都有一个右移的操作。第一个PU右移a比特，第二个PU右移b比特，以AVS所采用的变换核为例，a和b的数值满足a+b＝9，因为该整数变换核和标准的浮点数变换核相比，增益为2的9次幂，这个数值因为不同的变换核而不同。由于右移操作导致计算结果的低若干位的数据被舍弃，因此PU当中凡是有关这些低位比特的计算都是冗余的，这些冗余的计算导致硬件资源的浪费，但是由于传统蝶形结构所固有的特性，这些冗余无法很好地单独抽取出来，对硬件资源的有效利用不利。The disadvantage of the existing traditional butterfly structure is that since the integer transform kernel elements are all integers, while the standard discrete cosine transform kernel is an orthogonal matrix, the absolute value of the elements is all less than 1, so the multiplication with the integer kernel will have a large Data gain, in order to offset this gain, an additional truncation operation is required. Since the data is represented in binary, the truncation operation specifically refers to directly discarding several bits of the binary representation of the data. Such an operation is equivalent to performing a certain power of 2 on the data. division. The result of the truncation operation is that the lower bits of the data binary representation are discarded. As shown in Figure 1, each PU is followed by a right shift operation. The first PU shifts a bit to the right, and the second PU shifts to the right by b bits. Taking the transformation kernel used by AVS as an example, the values of a and b satisfy a+b=9, because the integer transformation kernel is the same as the standard floating-point number Compared with the transformation kernel, the gain is 2 to the 9th power, and this value is different due to different transformation kernels. Due to the right shift operation, the lower bits of the calculation result are discarded, so all calculations related to these lower bits in the PU are redundant, and these redundant calculations lead to waste of hardware resources. However, due to the traditional butterfly structure Inherent in nature, these redundancies cannot be extracted separately, which is not good for the effective use of hardware resources.

发明内容 Contents of the invention

本发明的目的是提出一种用于视频编码的整数离散余弦变换方法，针对已有技术的缺点，在PU的计算过程当中直接去除部分计算冗余，以便节约硬件资源。The purpose of the present invention is to propose an integer discrete cosine transform method for video coding, to address the shortcomings of the prior art, to directly remove part of the calculation redundancy in the PU calculation process, so as to save hardware resources.

本发明提出的用于视频编码的整数离散余弦变换方法，包括以下步骤：The integer discrete cosine transform method that the present invention proposes for video coding comprises the following steps:

(1)将整数离散余弦变换核中的每个元素进行等效拆分，得到N个矩阵，D₀，D₁，……D_N-1，则原整数离散余弦变换核D的表达式D＝(D₀+D₁+…+D_N-1)；(1) Each element in the integer discrete cosine transform kernel is equivalently split to obtain N matrices, D ₀ , D ₁ , ... D _N-1 , then the expression D of the original integer discrete cosine transform kernel D =(D ₀ +D ₁ +...+D _N-1 );

(2)将上述N个矩阵分组相加得到M个子变换核H₀，H₁，H_M-1，其中每个子变换核为一组n矩阵的和，1＜n＜N；(2) adding the above N matrices into groups to obtain M sub-transformation kernels H ₀ , H ₁ , H _M-1 , wherein each sub-transformation kernel is the sum of a set of n matrices, 1<n<N;

(3)根据上述M个子变换核，计算M个子变换H₀X^T，H₁X^T，H₂X^T…H_M-1X^T，并按照下标由低到高的顺序对M个子变换结果进行合并，得到第一个处理单元的变换结果，即DX^T＝(H₀X^T+H₁X^T+…+H_M-1X^T)^T，其中X为视频帧亮度块矩阵，X^T是X的转置；(3) According to the above M sub-transform kernels, calculate M sub-transformations H ₀ X ^T , H ₁ X ^T , H ₂ X ^T ... H _M-1 X ^T , and perform M sub-transformations in the order of subscripts from low to high The results are combined to obtain the transformation result of the first processing unit, that is, DX ^T =(H ₀ X ^T +H ₁ X ^T +…+H _M-1 X ^T ) ^T , where X is the brightness block matrix of the video frame, and X ^T is the transpose of X;

(4)将上述第一个处理单元的变换结果DX^T矩阵作为第二个处理单元的Y矩阵，重复步骤(1)～(3)，得到DY^T，则整数离散余弦变换系数为：F＝DXD^T＝(H₀+H₁+…H_M-1)X(H₀+H₁+…+H_M-1)^T＝(H₀+H₁+…+H_M-1)(H₀X^T+H₁X^T+…+H_M-1X^T)^T。(4) the transformation result DX ^T matrix of the above-mentioned first processing unit is used as the Y matrix of the second processing unit, repeat steps (1)～(3), obtain DY ^T , then the integer discrete cosine transform coefficient is: F= DXD ^T ＝(H ₀ +H ₁ +…H _M-1 )X(H ₀ +H ₁ +…+H _M-1 ) ^T ＝(H ₀ +H ₁ +…+H _M-1 )(H ₀ X ^T +H ₁ X ^T +…+H _M-1 X ^T ) ^T .

上述方法中，对整数离散余弦变换核中的每个元素进行等效拆分的方法包括以下步骤：In the above method, the method for equivalently splitting each element in the integer discrete cosine transform kernel includes the following steps:

(1)按照变换核中整数元素的二进制表示，拆分为多个2的幂的和；(1) According to the binary representation of the integer element in the transformation kernel, it is split into the sum of multiple powers of 2;

(2)按照从低次幂到高次幂的顺序，将整数离散余弦变换核中各元素的i次幂的拆分项组合成矩阵Di，其中0≤i≤N-1。(2) According to the order from low power to high power, the i-th power split items of each element in the integer discrete cosine transform kernel are combined into a matrix Di, where 0≤i≤N-1.

上述方法中，对M个子变换进行合并的方法为：In the above method, the method of merging the M sub-transformations is:

(1)对M个子变换中的每个子变换，分别提出2的j(0≤j≤N-1)次幂的公因子；(1) For each of the M sub-transformations, a common factor of 2 to the power of j (0≤j≤N-1) is proposed;

(2)对H₀X^T和H₁X^T进行合并，设从H₁X^T中提出的公因子为2^j1，若j1≤a，则只对矩阵元素二进制表示中高于j1位的部分合并，若j1＞a，则只对矩阵元素二进制表示中高于a位的部分合并，合并的结果记为X_Temp，其中a由待进行整数离散余弦变换核决定；(2) Merge H ₀ X ^T and H ₁ X ^T , set the common factor proposed from H ₁ X ^T to be 2 ^j1 , if j1≤a, then only merge the parts higher than j1 in the binary representation of matrix elements , if j1>a, then only the part of the binary representation of the matrix elements higher than a is merged, and the merged result is recorded as X _Temp , where a is determined by the integer discrete cosine transform kernel to be performed;

(3)将上述合并结果X_Temp与H₂X^T按上述步骤(2)的方法进行合并，合并的结果记为X_Temp；(3) Merge the above-mentioned merged result X _Temp and H ₂ X ^T according to the method of the above-mentioned step (2), and the merged result is recorded as X _Temp ;

(4)重复步骤(2)和(3)，依次逐个合并所有子变换，完成M个子变换的合并。(4) Steps (2) and (3) are repeated, and all sub-transformations are merged one by one in order to complete the merger of M sub-transformations.

本发明提出的用于视频编码的整数离散余弦变换方法，其特点和优点是，采用一种和传统快速算法完全不同的方法，从整数变换核入手，将元素为小整数的变换核拆分为若干个子变换核，由于各个子变换核能够提出一个2的幂的公因子，因此可以利用整数DCT变换的截断操作，减小PU当中的加法器的整体位宽，节省硬件资源。与传统的蝶形结构相比，本发明方法能够单独分离出整数离散余弦变换当中的一部分计算冗余，在截断操作进行之前去除掉这些计算冗余，从而降低了PU的加法器的整体位宽，节省了硬件资源，与已有的蝶形结构相比，本发明方法采用可编程门阵列(FPGA)或者可定制集成电路(ASIC)实现，硬件资源可以节省10％以上。The integer discrete cosine transform method for video coding proposed by the present invention has the characteristics and advantages of adopting a method completely different from the traditional fast algorithm, starting from the integer transform kernel, and splitting the transform kernel whose elements are small integers into For several sub-transformation cores, since each sub-transformation core can propose a common factor of a power of 2, the truncation operation of the integer DCT transformation can be used to reduce the overall bit width of the adder in the PU and save hardware resources. Compared with the traditional butterfly structure, the method of the present invention can separately separate a part of the calculation redundancy in the integer discrete cosine transform, and remove these calculation redundancy before the truncation operation, thereby reducing the overall bit width of the adder of the PU , saving hardware resources, compared with the existing butterfly structure, the method of the present invention adopts programmable gate array (FPGA) or customizable integrated circuit (ASIC) to realize, and hardware resources can save more than 10%.

附图说明 Description of drawings

图1是已有的采用行列分解的结构实现整数DCT的流程框图。FIG. 1 is a flowchart of an existing integer DCT using a decomposed structure.

图2是传统的实现AVS所采用的DCT的流图结构图。FIG. 2 is a flow graph structure diagram of a traditional DCT used to realize AVS.

图3是已有的实现转置运算所采用的寄存器堆结构框图。FIG. 3 is a structural block diagram of a conventional register file used to implement a transpose operation.

图4是采用本发明方法实现DCT的部分流图结构图。Fig. 4 is a partial flow graph structure diagram for realizing DCT by adopting the method of the present invention.

图5是图2所示的流图结构图中的一部分。FIG. 5 is a part of the flowchart structure diagram shown in FIG. 2 .

图6是用于实现本发明DCT的硬件结构图。Fig. 6 is a hardware structure diagram for realizing the DCT of the present invention.

图7和图8是本发明方法与已有方法所采用的加法器资源的效果比较图。Fig. 7 and Fig. 8 are comparison diagrams of the effect of the adder resource adopted by the method of the present invention and the existing method.

具体实施方式 Detailed ways

本发明提出的用于视频编码的整数离散余弦变换方法，首先将整数离散余弦变换核中的每个元素进行等效拆分，得到N个矩阵，D₀，D₁，……D_N-1，则原整数离散余弦变换核D的表达式D＝(D₀+D₁+…+D_M-1)；将上述N个矩阵分组相加得到M个子变换核H₀，H₁，H_M-1，其中每个子变换核为一组N矩阵的和；根据上述M个子变换核，计算M个子变换H₀X^T，H₁X^T，H₂X^T…H_M-1X^T，并按照下标由低到高的顺序对M个子变换结果进行合并，得到第一个处理单元的变换结果，即DX^T＝(H₀X^T+H₁X^T+…+H_M-1X^T)^T，其中X为视频帧亮度块矩阵，X^T是X的转置；将上述第一个处理单元的变换结果DX^T矩阵作为第二个处理单元的Y矩阵，重复上述步骤，得到DY^T，则整数离散余弦变换系数为：F＝DXD^T＝(H₀+H₁+…H_M-1)X(H₀+H₁+…+H_M-1)^T＝(H₀+H₁+…+H_M-1)(H₀X^T+H₀X^T+…+H_M-1X^T)^T。The integer discrete cosine transform method for video coding proposed by the present invention first performs equivalent splitting of each element in the integer discrete cosine transform kernel to obtain N matrices, D ₀ , D ₁ , ... D _N-1 , then the expression D=(D ₀ +D ₁ +…+D _M-1 ) of the original integer discrete cosine transform kernel D; the above N matrices are grouped and added to obtain M sub-transform kernels H ₀ , H ₁ , H _{M -1} , where each sub-transformation kernel is the sum of a set of N matrices; according to the above M sub-transformation kernels, calculate M sub-transformations H ₀ X ^T , H ₁ X ^T , H ₂ X ^T ... H _M-1 X ^T , and Merge the M sub-transformation results in order of subscripts from low to high to obtain the transformation result of the first processing unit, that is, DX ^T =(H ₀ X ^T +H ₁ X ^T +...+H _M-1 X ^T ) ^T , where X is the brightness block matrix of the video frame, and X ^T is the transposition of X; the transformation result DX ^T matrix of the first processing unit is used as the Y matrix of the second processing unit, and the above steps are repeated to obtain DY ^T , then the integer discrete cosine transform coefficients are: F＝DXD ^T ＝(H ₀ +H ₁ +…H _M-1 )X(H ₀ +H ₁ +…+H _M-1 ) ^T ＝(H ₀ +H ₁ +…+H _M-1 )(H ₀ X ^T +H ₀ X ^T +…+H _M-1 X ^T ) ^T .

上述方法中，对整数离散余弦变换核中的每个元素进行等效拆分的方法可以包括以下步骤：(1)按照变换核中整数元素的二进制表示，拆分为多个2的幂的和；(2)按照从低次幂到高次幂的顺序，将整数离散余弦变换核中各元素的i次幂的拆分项组合成矩阵Di，其中0≤i≤N-1。In the above method, the method for equivalently splitting each element in the integer discrete cosine transform kernel may include the following steps: (1) according to the binary representation of the integer element in the transform kernel, split into a sum of multiple powers of 2 ; (2) According to the order from the low power to the high power, the i-th power split items of each element in the integer discrete cosine transform kernel are combined into a matrix Di, where 0≤i≤N-1.

上述方法中，对M个子变换进行合并的方法可以为：(1)对M个子变换中的每个子变换，分别提出2的j(0≤j≤N-1)次幂的公因子；(2)对H₀X^T和H₁X^T进行合并，设从H₁X^T中提出的公因子为2^j1，若j1≤a，则只对矩阵元素二进制表示中高于j1位的部分合并，若j1＞a，则只对矩阵元素二进制表示中高于a位的部分合并，合并的结果记为X_Temp，其中a由待进行整数离散余弦变换核决定；(3)将上述合并结果X_Temp与H₂X^T按上述步骤(2)的方法进行合并，合并的结果记为X_Temp；(4)重复步骤(2)和(3)，依次逐个合并所有子变换，完成M个子变换的合并。In the above method, the method for merging the M sub-transformations may be: (1) for each sub-transformation in the M sub-transformations, respectively propose a common factor of 2 to the power of j (0≤j≤N-1); (2 ) merge H ₀ X ^T and H ₁ X ^T , set the common factor proposed from H ₁ X ^T to be 2 ^j1 , if j1≤a, then only merge the parts higher than j1 in the binary representation of matrix elements, if j1>a, then only the part of the binary representation of the matrix elements that is higher than a is combined, and the combined result is recorded as X _Temp , where a is determined by the integer discrete cosine transform kernel to be performed; (3) the above combined result X _Temp and H ₂ X ^T is merged according to the method of the above step (2), and the merged result is recorded as X _Temp ; (4) Steps (2) and (3) are repeated, and all sub-transformations are merged one by one in order to complete the merging of M sub-transformations.

本发明属于图像编码和图像处理领域，特别应用于整数离散余弦变换的快速实现。The invention belongs to the field of image coding and image processing, and is especially applied to the rapid realization of integer discrete cosine transform.

本发明中DCT变换以8点一维DCT变换为基本单元，采用行列分解的结构实现二维DCT变换。行列分解的数学表示为F＝DXD^T＝D(DX^T)^T，即以DX^T为一个PU，一个PU完成的功能是，对将输入的矩阵转置，并左乘矩阵D，通过两个PU的串行即可通过一维DCT变换实现二维DCT变换，如图1所示。虽然两个PU的数学运算的内容相同，但是它们的转置操作有所不同，第一个PU的转置操作可以通过将输入按照转置后的矩阵输入，而第二个PU必须有专门的转置运算单元，本发明所采用的转置运算单元是通过已有技术中如图3所示的寄存器堆来实现的。In the present invention, the DCT transformation takes the 8-point one-dimensional DCT transformation as the basic unit, and adopts the decomposed structure of rows and columns to realize the two-dimensional DCT transformation. The mathematical expression of row and column decomposition is F=DXD ^T ＝D(DX ^T ) ^T , that is, DX ^T is a PU, and the function completed by a PU is to transpose the input matrix and multiply the matrix D to the left, through two The serialization of PUs can realize two-dimensional DCT transformation through one-dimensional DCT transformation, as shown in Figure 1. Although the content of the mathematical operations of the two PUs is the same, their transpose operations are different. The transpose operation of the first PU can be input according to the transposed matrix, while the second PU must have a dedicated The transposition operation unit, the transposition operation unit adopted in the present invention is realized through the register file shown in FIG. 3 in the prior art.

本发明所针对的是整数DCT变换，整数DCT变换的变换核是由整数构成，便于使用加法器和移位器来硬件实现。The present invention is aimed at the integer DCT transformation, and the transformation kernel of the integer DCT transformation is composed of integers, which is convenient for hardware realization by using an adder and a shifter.

对于任何一个整数变换核，特别是小整数变换核，本发明将其中的每个元素进行等效的拆分，拆分的方法是按照该整数元素的二进制表示，拆分为若干2的某次幂的加法和，比如元素6可以拆分为2¹+2²。按照从低次幂到高次幂的顺序，将变换核的元素的i次幂的拆分项组合成矩阵Di，则Di仅含有2ⁱ和0元素。按照上面的操作得到N个矩阵D₀，D₁，……D_N-1，将这N个矩阵再次分组相加得到M个新的子变换核H₀，H₁，H_M-1，其中每个变换核的元素为组合成该组的矩阵的对应位置的元素的和。For any integer transformation kernel, especially a small integer transformation kernel, the present invention splits each element thereof equivalently. Additive sums of powers, such as element 6, can be split into 2 ¹ +2 ² . According to the order from the low power to the high power, the i-th power split items of the transformation kernel elements are combined into a matrix Di, then Di only contains 2 ⁱ and 0 elements. According to the above operation, N matrices D ₀ , D ₁ , ... D _N-1 are obtained, and these N matrices are grouped and added again to obtain M new sub-transformation kernels H ₀ , H ₁ , H _M-1 , where The elements of each transform kernel are the sum of the correspondingly positioned elements of the matrices that make up the set.

得到的这M个子变换核H₀，H₁，H_M-1，在本发明称为子变换核。则整个变换的结果为这M个子变换的结果之和，即The obtained M sub-transformation kernels H ₀ , H ₁ , and H _M-1 are called sub-transformation kernels in the present invention. Then the result of the whole transformation is the sum of the results of these M sub-transformations, namely

F＝DXD^T＝(H₀+H₁+…+H_M-1)X(H₀+H₁+…+H_M-1)^T F＝DXD ^T ＝(H ₀ +H ₁ +…+H _M-1 )X(H ₀ +H ₁ +…+H _M-1 ) ^T

＝(H₀+H₁+…+H_M-1)(H₀X^T+H₁X^T+…+H_M-1X^T)^T ＝(H ₀ +H ₁ +…+H _M-1 )(H ₀ X ^T +H ₁ X ^T +…+H _M-1 X ^T ) ^T

其中对于H₀X^T，H₁X^T…H_M-1X^T，本发明定义为子变换。Wherein for H ₀ X ^T , H ₁ X ^T ... H _M-1 X ^T , the present invention defines sub-transformation.

对于每个子变换可以提出一个2的j(0≤j≤N-1)次幂的公因子，提出这个公因子，可以降低变换核元素的绝对值大小，使得子变换核的元素都在一个小的范围内，这样可以大大简化子变换的计算，而且可以利用截断操作对加法器的输入位宽作一定程度的减小。减小输入位宽的方法如下描述：For each sub-transformation, a common factor of 2 to the power of j (0≤j≤N-1) can be proposed. This common factor can reduce the absolute value of the transformation kernel elements, so that the elements of the sub-transformation kernels are all in a small In this way, the calculation of the sub-transform can be greatly simplified, and the input bit width of the adder can be reduced to a certain extent by using the truncation operation. The method of reducing the input bit width is described as follows:

假设截断操作为对结果二进制表示的最低a位进行截断。Suppose the truncation operation is to truncate the lowest a bit of the resulting binary representation.

采用行列分解的结构实现本发明所提出的整数DCT算法，在各个子变换的结果计算出来之后需要对结果进行合并，首先合并H₀X^T和H₁X^T，合并的结果再继续与H₂X^T合并，重复这个过程，直到全部的子变换被合并完毕。The integer DCT algorithm proposed by the present invention is realized by adopting the decomposed structure of ranks ^and ^columns . After the results of each sub-transformation _are _calculated , the results need to be combined _. X ^T merge, and repeat this process until all sub-transforms have been merged.

由于H₁X^T可以提出公因子2^j1，那么在对两个子变换结果的对应元素相加的时候，元素的二进制表示的低j1位的相加不会出现溢出，如果j1≤a，那么可以直接将H₀X^T的结果的低j1位舍弃不进行相加操作，如果j1＞a，那么可以直接将H₀X^T的结果的低a位舍弃不进行相加，合并的结果记为X_Temp。Since H ₁ X ^T can propose a common factor 2 ^j1 , when adding the corresponding elements of the two sub-transformation results, the addition of the low j1 bits of the binary representation of the elements will not overflow. If j1≤a, then it can be Directly discard the lower j1 bits of the result of H ₀ X ^T without performing the addition operation. If j1>a, then directly discard the lower a bits of the result of H ₀ X ^T without performing the addition operation, and record the combined result as X _temp .

如上描述对前两个子变换进行合并，合并的结果记为X_Temp，。之后再将X_Temp与子变换H₂X^T进行合并，子变换H₂X^T可以提出公因子2^j2，按照和上面同样的过程，得到合并结果，仍然记为X_Temp。Merge the first two sub-transforms as described above, and record the result of the merge as X _Temp . Then merge X _Temp with the sub-transformation H ₂ X ^T , the sub-transformation H ₂ X ^T can come up with the common factor 2 ^j2 , follow the same process as above to get the combined result, which is still recorded as X _Temp .

之后再将X_Temp与剩余的子变换依次合并，直到全部的子变换被合并完毕得到最终的结果为H₀X^T+H₁X^T+…+H_M-1X^T，即DX^T，记为Y矩阵。Then merge X _Temp with the remaining sub-transforms in turn until all sub-transforms are merged to get the final result H ₀ X ^T +H ₁ X ^T +…+H _M-1 X ^T , that is, DX ^T , record is the Y matrix.

采用行列分解的结构进行正变换，那么需要将上面第一个PU的计算结果DX^T作为Y矩阵再次输入到结构相同的第二个PU上，计算DY^T，经过第二个PU的计算，才能完成一个完整的整数DCT变换。If the structure of row and column decomposition is used for forward transformation, then the calculation result DX ^T of the first PU above needs to be re-inputted as a Y matrix to the second PU with the same structure to calculate DY ^T . After the calculation of the second PU, the Complete a complete integer DCT transform.

图4所示，为本发明所提出的方法的流图描述的部分实现结构，图5所示为传统蝶形运算相同部分的实现形式，将两者作比较可以看出，本发明所提出的算法数据流动的方向上位宽的增加比传统蝶形算法要小。图4和图5仅给出了一部分变换系数的流图形式作为比较，是因为本发明所提出的算法和传统蝶形算法结构上有较大差异，不便于用流图的形式描述，否则会比较杂乱，不利于说明算法的差异。As shown in Fig. 4, it is the partial realization structure described by the flow diagram of the method proposed by the present invention, and Fig. 5 shows the realization form of the same part of the traditional butterfly operation, and it can be seen that the two are compared, and the present invention proposes The increase of the bit width in the direction of the algorithm data flow is smaller than that of the traditional butterfly algorithm. Fig. 4 and Fig. 5 only provide the flow graph form of a part of transformation coefficients as a comparison, because the algorithm proposed by the present invention is quite different from the structure of the traditional butterfly algorithm, and it is not convenient to describe in the form of a flow graph, otherwise it will be It is messy and not conducive to explaining the difference in algorithms.

以下介绍本发明方法的一个实施例：An embodiment of the inventive method is introduced below:

以AVS视频编码标准所采用的整数变换核为例来说明本发明所提出的基于变换核位平面分解的快速整数DCT实现算法。Taking the integer transform kernel used in the AVS video coding standard as an example to illustrate the fast integer DCT implementation algorithm based on transform kernel bit-plane decomposition proposed by the present invention.

$[\begin{matrix} 88 & 88 & 88 & 88 & 88 & 88 & 88 & 88 \\ 1010 & 99 & 66 & 22 & - - 22 & - - 66 & - - 99 & - - 1010 \\ 1010 & 44 & - - 44 & - - 1010 & - - 1010 & - - 44 & 44 & 1010 \\ 99 & - - 22 & - - 1010 & - - 66 & 66 & 1010 & 22 & - - 99 \\ 88 & - - 88 & - - 88 & 88 & 88 & - - 88 & - - 88 & 88 \\ 66 & - - 1010 & 22 & 99 & - - 99 & - - 22 & 1010 & - - 66 \\ 44 & - - 1010 & 1010 & - - 44 & - - 44 & 1010 & - - 1010 & 44 \\ 22 & - - 66 & 99 & - - 1010 & 1010 & - - 99 & 66 & - - 22 \end{matrix}] . .$

该整数变换核的元素全部为绝对值小于16的整数，二进制表示的第4位以上全部为零，因此可以将该变换核拆分为四个矩阵(其中元素6和-6较为特殊，它们各自可以有两种不同的拆分方法：6＝4+2＝8-2，-6＝-4-2＝-8+2，经过实践的比较发现6＝8-2，-6＝-8+2这种拆分形式较为有利)The elements of the integer transformation kernel are all integers whose absolute value is less than 16, and the fourth and above digits of the binary representation are all zeros, so the transformation kernel can be split into four matrices (the elements 6 and -6 are special, and their respective There can be two different split methods: 6=4+2=8-2, -6=-4-2=-8+2, after practical comparison, it is found that 6=8-2, -6=-8+ 2 This form of splitting is more favorable)

$[\begin{matrix} 00 & 00 & 00 & 00 & 00 & 00 & 00 & 00 \\ 00 & 11 & 00 & 00 & 00 & 00 & - - 11 & 00 \\ 00 & 00 & 00 & 00 & 00 & 00 & 00 & 00 \\ 11 & 00 & 00 & 00 & 00 & 00 & 00 & - - 11 \\ 00 & 00 & 00 & 00 & 00 & 00 & 00 & 00 \\ 00 & 00 & 00 & 11 & - - 11 & 00 & 00 & 00 \\ 00 & 00 & 00 & 00 & 00 & 00 & 00 & 00 \\ 00 & 00 & 11 & 00 & 00 & - - 11 & 00 & 00 \end{matrix}],,$ $[\begin{matrix} 00 & 00 & 00 & 00 & 00 & 00 & 00 & 00 \\ 22 & 00 & - - 22 & 22 & - - 22 & 22 & 00 & - - 22 \\ 22 & 00 & 00 & - - 22 & - - 22 & 00 & 00 & 22 \\ 00 & - - 22 & - - 22 & 22 & - - 22 & 22 & 22 & 00 \\ 00 & 00 & 00 & 00 & 00 & 00 & 00 & 00 \\ - - 22 & - - 22 & 22 & 00 & 00 & - - 22 & 22 & 22 \\ 00 & - - 22 & 22 & 00 & 00 & 22 & - - 22 & 00 \\ 22 & 22 & 00 & - - 22 & 22 & 00 & - - 22 & - - 22 \end{matrix}]$

$[\begin{matrix} 00 & 00 & 00 & 00 & 00 & 00 & 00 & 00 \\ 00 & 00 & 00 & 00 & 00 & 00 & 00 & 00 \\ 00 & 44 & - - 44 & 00 & 00 & - - 44 & 44 & 00 \\ 00 & 00 & 00 & 00 & 00 & 00 & 00 & 00 \\ 00 & 00 & 00 & 00 & 00 & 00 & 00 & 00 \\ 00 & 00 & 00 & 00 & 00 & 00 & 00 & 00 \\ 44 & 00 & 00 & - - 44 & - - 44 & 00 & 00 & 44 \\ 00 & 00 & 00 & 00 & 00 & 00 & 00 & 00 \end{matrix}],,$ $[\begin{matrix} 88 & 88 & 88 & 88 & 88 & 88 & 88 & 88 \\ 88 & 88 & 88 & 00 & 00 & - - 88 & - - 88 & - - 88 \\ 88 & 00 & 00 & - - 88 & - - 88 & 00 & 00 & 88 \\ 88 & 00 & - - 88 & - - 88 & 88 & 88 & 00 & - - 88 \\ 88 & - - 88 & - - 88 & 88 & 88 & - - 88 & - - 88 & 88 \\ 88 & - - 88 & 00 & 88 & - - 88 & 00 & 88 & - - 88 \\ 00 & - - 88 & 88 & 00 & 00 & 88 & - - 88 & 00 \\ 00 & - - 88 & 88 & - - 88 & 88 & - - 88 & 88 & 00 \end{matrix}]$

其中，上面两个矩阵可以合并为Among them, the above two matrices can be combined as

${H h}_{00} = = [\begin{matrix} 00 & 00 & 00 & 00 & 00 & 00 & 00 & 00 \\ 22 & 11 & - - 22 & 22 & - - 22 & 22 & - - 11 & - - 22 \\ 22 & 00 & 00 & - - 22 & - - 22 & 00 & 00 & 22 \\ 11 & - - 22 & - - 22 & 22 & - - 22 & 22 & 22 & - - 11 \\ 00 & 00 & 00 & 00 & 00 & 00 & 00 & 00 \\ - - 22 & - - 22 & 22 & 11 & - - 11 & - - 22 & 22 & 22 \\ 00 & - - 22 & 22 & 00 & 00 & 22 & - - 22 & 00 \\ 22 & 22 & 11 & - - 22 & 22 & - - 11 & - - 22 & - - 22 \end{matrix}],,$

下两个矩阵可以合并为The next two matrices can be combined as

${H h}_{11} = = [\begin{matrix} 88 & 88 & 88 & 88 & 88 & 88 & 88 & 88 \\ 88 & 88 & 88 & 00 & 00 & - - 88 & - - 88 & - - 88 \\ 88 & 44 & - - 44 & - - 88 & - - 88 & - - 44 & 44 & 88 \\ 88 & 00 & - - 88 & - - 88 & 88 & 88 & 00 & - - 88 \\ 88 & - - 88 & - - 88 & 88 & 88 & - - 88 & - - 88 & 88 \\ 88 & - - 88 & 00 & 88 & - - 88 & 00 & 88 & - - 88 \\ 44 & - - 88 & 88 & - - 44 & - - 44 & 88 & - - 88 & 44 \\ 00 & - - 88 & 88 & - - 88 & 88 & - - 88 & 88 & 00 \end{matrix}]$

这样一个PU被分解为DX^T＝H₀X^T+H₁X^T，其中H₁亦可以写作 $H_{1} = 4 \times [\begin{matrix} 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 0 & 0 & - 2 & - 2 & - 2 \\ 2 & 1 & - 1 & - 2 & - 2 & - 1 & 1 & 2 \\ 2 & 0 & - 2 & - 2 & 2 & 2 & 0 & - 2 \\ 2 & - 2 & - 2 & 2 & 2 & - 2 & - 2 & 2 \\ 2 & - 2 & 0 & 2 & - 2 & 0 & 2 & - 2 \\ 1 & - 2 & 2 & - 1 & - 1 & 2 & - 2 & 1 \\ 0 & - 2 & 2 & - 2 & 2 & - 2 & 2 & 0 \end{matrix}],$ 这样一个PU的两个子变换核就仅含-2，-1，1，2四个整数，由于系数之间的相近，运算的重复性就会加大，加法器的复用度可以充分地利用。Such a PU is decomposed into DX ^T ＝H ₀ X ^T +H ₁ X ^T , where H ₁ can also be written as $h_{1} = 4 \times [\begin{matrix} 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 0 & 0 & - 2 & - 2 & - 2 \\ 2 & 1 & - 1 & - 2 & - 2 & - 1 & 1 & 2 \\ 2 & 0 & - 2 & - 2 & 2 & 2 & 0 & - 2 \\ 2 & - 2 & - 2 & 2 & 2 & - 2 & - 2 & 2 \\ 2 & - 2 & 0 & 2 & - 2 & 0 & 2 & - 2 \\ 1 & - 2 & 2 & - 1 & - 1 & 2 & - 2 & 1 \\ 0 & - 2 & 2 & - 2 & 2 & - 2 & 2 & 0 \end{matrix}],$ The two sub-transform cores of such a PU only contain four integers -2, -1, 1, and 2. Due to the similarity between the coefficients, the repeatability of the operation will increase, and the multiplexing degree of the adder can be fully utilized. .

根据整数DCT变换核的左右对称性，可以将两个子变换H₀X^T和H₁X^T的变换核简化为According to the left-right symmetry of the integer DCT transformation kernel, the transformation kernels of the two sub-transformations H ₀ X ^T and H ₁ X ^T can be simplified as

${H h}_{00} = = [\begin{matrix} 00 & 00 & 00 & 00 \\ 22 & 11 & - - 22 & 22 \\ 22 & 00 & 00 & - - 22 \\ 11 & - - 22 & - - 22 & 22 \\ 00 & 00 & 00 & 00 \\ - - 22 & - - 22 & 22 & 11 \\ 00 & - - 22 & 22 & 00 \\ 22 & 22 & 11 & - - 22 \end{matrix}],,$ ${H h}_{11} = = 44 \times \times [\begin{matrix} 22 & 22 & 22 & 22 \\ 22 & 22 & 22 & 00 \\ 22 & 11 & - - 11 & - - 22 \\ 22 & 00 & - - 22 & - - 22 \\ 22 & - - 22 & - - 22 & 22 \\ 22 & - - 22 & 00 & 22 \\ 11 & - - 22 & 22 & - - 11 \\ 00 & - - 22 & 22 & - - 22 \end{matrix}]$

那么两个子变换的偶数行输入相同，奇数行输入相同，因此两个子变换的偶数行之间和奇数行之间也可以实现加法器复用，以H₀X^T和H₁X^T的奇数行为例，H₀X^T可以分解为奇数行变换H_0eX_p ^T和偶数行变换H_0oX_m ^T，其中H_0eX_p ^T的变换核对应着H₀的奇数行元素，H_0oX_m ^T的变换核对应着H₀的偶数行元素，两个变换的结果对应为H₀X^T的奇数行结果和偶数行结果，对于H₁X^T也有同样的结论。如上面的描述，H_0eX_p ^T和H_1eX_p ^T由于输入相同，因此可以加法器复用。Then the input of the even-numbered rows of the two sub-transformations is the same, and the input of the odd-numbered rows is the same, so the adder multiplexing can also be realized between the even-numbered rows and between the odd-numbered rows of the two sub-transformations, with the odd-numbered behavior of H ₀ X ^T and H ₁ X ^T For example, H ₀ X ^T can be decomposed into odd row transformation H _0e X _p ^T and even row transformation H _0o X _m ^T , where the transformation kernel of H _0e X _p ^T corresponds to the odd row elements of H ₀ , H _0o X _m ^T The transformation kernel of corresponds to the even-numbered row elements of H ₀ , and the results of the two transformations correspond to the odd-numbered and even-numbered row results of H ₀ X ^T , and the same conclusion is drawn for H ₁ X ^T. As described above, since H _0e X _p ^T and H _1e X _p ^T have the same input, they can be multiplexed by the adder.

${H h}_{00 e e} = = [\begin{matrix} 00 & 00 & 00 & 00 \\ 22 & 00 & 00 & - - 22 \\ 00 & 00 & 00 & 00 \\ 00 & - - 22 & 22 & 00 \end{matrix}],,$ ${H h}_{11 e e} = = 44 \times \times [\begin{matrix} 22 & 22 & 22 & 22 \\ 22 & 11 & - - 11 & - - 22 \\ 22 & - - 22 & - - 22 & 22 \\ 11 & - - 22 & 22 & - - 11 \end{matrix}],,$ ${X x}_{p p}^{T T} = = [\begin{matrix} {X x}_{i i 00} + + {X x}_{i i 77} \\ {X x}_{i i 11} + + {X x}_{i i 66} \\ {X x}_{i i 22} + + {X x}_{i i 55} \\ {X x}_{i i 33} + + {X x}_{i i 44} \end{matrix}],,$

观察H_0e和H_1e可以发现，这两个变换核的计算冗余度很大，可以先计算X_p ^T[0]±X_p ^T[3]和X_p ^T[1]±X_p ^T[2]，通过这四个计算结果的组合加减就可以得到H_0eX_p ^T和H_1eX_p ^T计算结果。Observing H _0e and H _1e , it can be found that the calculation redundancy of these two transformation kernels is very large, and X _p ^T [0]±X _p ^T [3] and X _p ^T [1]±X _p ^T [ 2], the calculation results of H _0e X _p ^T and H _1e X _p ^T can be obtained by adding and subtracting the combination of these four calculation results.

同样对于same for

${H h}_{00 o o} = = [\begin{matrix} 22 & 11 & - - 22 & 22 \\ 11 & - - 22 & - - 22 & 22 \\ - - 22 & - - 22 & 22 & 11 \\ 22 & 22 & 11 & - - 22 \end{matrix}],,$ ${H h}_{11 o o} = = 44 \times \times [\begin{matrix} 22 & 22 & 22 & 00 \\ 22 & 00 & - - 22 & - - 22 \\ 22 & - - 22 & 00 & 22 \\ 00 & - - 22 & 22 & - - 22 \end{matrix}] = = 88 \times \times [\begin{matrix} 11 & 11 & 11 & 00 \\ 11 & 00 & - - 11 & - - 11 \\ 11 & - - 11 & 00 & 11 \\ 00 & - - 11 & 11 & - - 11 \end{matrix}],,$

$X_{m}^{T} = [\begin{matrix} X_{i 0} - X_{i 7} \\ X_{i 1} - X_{i 6} \\ X_{i 2} - X_{i 5} \\ X_{i 3} - X_{i 4} \end{matrix}],$ 同样可以利用这两个变换核的计算冗余度来压缩加法器的使用数量，而且由于H_1e可以提出公因子4，H_1o可以提出公因子8，这对于利用视频编码当中的截断处理来减少加法器输入位宽非常有用，因为在视频编码的DCT变换，如果是基于行列分解，第一个PU之后，会对变换结果的最低3bit进行截断，而由于H_1e可以提出公因子4，H_1o可以提出公因子8，因此在进行子变换计算结果合并的时候，可以把3bit截断一起考虑，H_0eX_p ^T+H_1eX_p ^T的低2位加法不会出现溢出，因此可以直接忽略这两位，减小加法器的输入位宽，H_0oX_m ^T+H_1oX_m ^T的低3位加法不会出现溢出，因此可以直接忽略这三位，减小加法器的输入位宽。另外注意到H_0e的第一行和第三行元素全部为零，因此在子变换合并的时候对于这两行的合并，不需要额外的加法器。 $x_{m}^{T} = [\begin{matrix} x_{i 0} - x_{i 7} \\ x_{i 1} - x_{i 6} \\ x_{i 2} - x_{i 5} \\ x_{i 3} - x_{i 4} \end{matrix}],$ The calculation redundancy of these two transformation cores can also be used to compress the number of adders used, and since H _1e can propose a common factor of 4, H _1o can propose a common factor of 8, which is useful for reducing The input bit width of the adder is very useful, because in the DCT transformation of video coding, if it is based on row-column decomposition, after the first PU, the lowest 3 bits of the transformation result will be truncated, and since H _1e can propose a common factor of 4, H _1o A common factor of 8 can be proposed, so when combining sub-transform calculation results, 3-bit truncation can be considered together, and the addition of the lower 2 bits of H _0e X _p ^T +H _1e X _p ^T will not overflow, so this can be ignored directly Two bits, reduce the input bit width of the adder, the addition of the lower 3 bits of H _0o X _m ^T +H _1o X _m ^T will not overflow, so these three bits can be directly ignored to reduce the input bit width of the adder. Also notice that the elements of the first row and the third row of H _0e are all zero, so no additional adder is needed for the combination of these two rows when the sub-transforms are merged.

经过上面描述的两个PU的串联，可以完成本发明的方法，实现快速无损的整数DCT变换。Through the series connection of the two PUs described above, the method of the present invention can be completed to realize fast and lossless integer DCT transformation.

采用本发明中提到的基于变换核位平面分解的算法，来实现符合AVS标准的整数DCT变换，可以统计得到本发明所提出的算法消耗的主要硬件资源：Adopt the algorithm based on transform nuclear bit plane decomposition that mentions in the present invention, realize the integer DCT transform that meets AVS standard, can obtain the main hardware resources that the proposed algorithm of the present invention consumes statistically:

名称 name 数目 number 8bit加法器 8bit adder 8 8 9bit加法器 9bit adder 8 8 10bit加法器 10bit adder 10 10 11bit加法器 11bit adder 10 10 12bit加法器 12bit adder 2 2

以及传统的蝶形运算所消耗的主要硬件资源：And the main hardware resources consumed by traditional butterfly operations:

名称 name 数目 number 8bit加法器 8bit adder 8 8 9bit加法器 9bit adder 8 8 10bit加法器 10bit adder 2 2 11bit加法器 11bit adder 6 6 12bit加法器 12bit adder 4 4 13bit加法器 13bit adder 6 6

14bit加法器 14bit adder 15bit加法器 15bit adder 4 4

以下通过图7和图8说明本发明方法与已有技术相比，所具有的优点和效果：图7是蝶形算法，图8是本发明的方法，横轴代表加法器的输入位宽，纵轴代表使用到的数目，符合AVS标准的整数DCT变换的第一个PU有3比特的截断，而对于第二个PU有7比特的截断。可以比较看出，两种算法所采用的加法器的数目是相同的(38个)，但是基于位平面分解的算法所采用的加法器集中在8，9，10，11输入，而蝶形运算从8到15不等，从而基于位平面分解的算法输入位宽整体偏低，因此如果DCT模块采用FPGA或者ASIC实现时，基于位平面分解的算法会更加节省资源。采用FPGA实现，以Xilinx VirtexIV为例，基于位平面分解的算法会节省10％以上的资源。Below by Fig. 7 and Fig. 8 illustrate the present invention method compared with prior art, advantage and effect that have: Fig. 7 is a butterfly algorithm, Fig. 8 is the method of the present invention, and horizontal axis represents the input bit width of adder, The vertical axis represents the number used, the first PU of the integer DCT transform conforming to the AVS standard has a 3-bit truncation, and there is a 7-bit truncation for the second PU. It can be seen by comparison that the number of adders used by the two algorithms is the same (38), but the adders used in the algorithm based on bit-plane decomposition are concentrated in 8, 9, 10, 11 inputs, while the butterfly operation It varies from 8 to 15, so the input bit width of the algorithm based on bit-plane decomposition is generally low. Therefore, if the DCT module is implemented by FPGA or ASIC, the algorithm based on bit-plane decomposition will save more resources. Using FPGA implementation, taking Xilinx VirtexIV as an example, the algorithm based on bit-plane decomposition will save more than 10% of resources.

对于不同的整数变换核，都可以采用位平面分解的算法，但是对于不同的整数变换核，分解得到的位平面矩阵和具有最大加法器复用度的位平面分组可能会不同，但都是尽量提取大系数，降低子变换核的元素的范围，利用截断减小加法器位宽，以及调整位平面系数，使加法器的复用度尽量大，从这两个方面着手来减小对资源的开销。For different integer transformation kernels, bit-plane decomposition algorithms can be used, but for different integer transformation kernels, the decomposed bit-plane matrix and the bit-plane grouping with the maximum adder multiplexing degree may be different, but they are all as far as possible Extract large coefficients, reduce the range of elements of the sub-transform kernel, use truncation to reduce the bit width of the adder, and adjust the bit plane coefficients to make the multiplexing of the adder as large as possible, and reduce the resource consumption from these two aspects overhead.

Claims

1. An integer discrete cosine transform method for video coding, characterized in that the method comprises the following steps:

(1) Each element in the integer discrete cosine transform kernel is equivalently split to obtain N matrices, D ₀ , D ₁ , ... D _N-1 , then the original integer discrete cosine transform kernel D The expression D=(D ₀ +D ₁ +...+D _N-1 ), wherein the method for equivalently splitting each element in the integer discrete cosine transform kernel includes the following steps:

(1-1) According to the binary representation of the integer elements in the transformation kernel, it is split into the sum of multiple powers of 2;

(1-2) According to the order from the low power to the high power, the splitting items of the i power of each element in the integer discrete cosine transform kernel are combined into a matrix Di, where 0≤i≤N-1;

(2) adding the above N matrices into groups to obtain M sub-transformation kernels H ₀ , H ₁ , H _M-1 ;

(3) According to the above M sub-transform kernels, calculate M sub-transforms H ₀ X ^T , H ₁ X ^T , H ₂ X ^T ... H _M-1 X ^T , and perform M sub-transformations in order of subscripts from low to high The sub-transformation results are combined to obtain the transformation result of the first processing unit, that is, DX ^T =(H ₀ X ^T +H ₁ X ^T +...+H _M-1 X ^T ) ^T , where X is the brightness of the video frame Block matrix, X ^T is the transposition of X, wherein the method for merging M sub-transformation results includes the following steps:

(3-1) For each of the M sub-transformations, a common factor of 2 to the power of j is proposed, where 0≤j≤N-1;

(3-2) Merge H ₀ X ^T and H ₁ X ^T , set the common factor proposed from H ₁ X ^T to be 2 ^j1 , if j1≤a, then only the bit higher than j1 in the binary representation of matrix elements Partial merging, if j1>a, then only the part of the binary representation of the matrix elements higher than a is merged, and the result of the merging is recorded as X _Temp , where a is determined by the integer discrete cosine transform kernel to be performed;

(3-3) Merge the above-mentioned merged result X _Temp and H ₂ X ^T according to the method of the above-mentioned step (3-2), and record the merged result as X _Temp ;

(3-4) Repeating steps (3-2) and (3-3), merging all sub-transformations one by one in turn, completing the merging of M sub-transformations;

(4) make above-mentioned conversion result D X ^T matrix=Y, with Y matrix as the X matrix in the step (3), repeat steps (1)～(3), obtain DY ^T , then integer discrete cosine transform coefficient is: F= DXD ^T ＝(H ₀ +H ₁ +...H _M-1 )X(H ₀ +H ₁ +...+H _M-1 ) ^T ＝(H ₀ +H ₁ +...+H _{M -1} )(H ₀ X ^T +H ₁ X ^T +...+H _M-1 X ^T ) ^T .