CN100452880C - Integral discrete cosine transform method in use for encoding video - Google Patents
Integral discrete cosine transform method in use for encoding video Download PDFInfo
- Publication number
- CN100452880C CN100452880C CNB2006100121618A CN200610012161A CN100452880C CN 100452880 C CN100452880 C CN 100452880C CN B2006100121618 A CNB2006100121618 A CN B2006100121618A CN 200610012161 A CN200610012161 A CN 200610012161A CN 100452880 C CN100452880 C CN 100452880C
- Authority
- CN
- China
- Prior art keywords
- sub
- discrete cosine
- cosine transform
- transformation
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Links
Images
Landscapes
- Compression Or Coding Systems Of Tv Signals (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Complex Calculations (AREA)
Abstract
Description
技术领域 technical field
本发明涉及一种用于视频编码的整数离散余弦变换方法,属于视频传输技术领域。The invention relates to an integer discrete cosine transform method for video coding, belonging to the technical field of video transmission.
背景技术 Background technique
在已有技术中,名称为“Development of integer cosine transforms by theprinciple of dyadic symmetry”(Proc.Inst.Elect.Eng.,Partl,vol.136,Aug.1989,pp.276-282.)的论文公开了可以用于视频编码的用于视频编码的整数离散余弦变换方法,其原理是:离散余弦变换的数学式为F=DXDT,其中D,X为N×N的矩阵,DT为D的转置,D称为该变换的变换核。对于整数离散余弦变换而言,D的元素全部为整数,D的大小和数值不唯一,比如应用在H.264视频编解码标准的基准档次当中的变换核为
传统的快速实现上述整数离散余弦变换的方法,是采用一种流图形式的蝶形计算结构来计算1维的整数离散余弦变换(以下简称DCT),然后通过两个这样结构相同的处理单元(以下简称PU)的串连来实现整个二维整数DCT变换,称之为行列分解的方法,其流程框图如图1所示,用数学式描述为F=DXDT=D(DXT)T,一个PU所完成的计算就是DXT,也就是一维的整数DCT变换,那么整个变换可以通过两个PU串联连接来实现。因此整数离散余弦变换的核心计算就是PU,PU所完成的数学运算包括:矩阵转置和矩阵相乘,对于第一个PU,可以通过将输入矩阵按照转置后的矩阵输入,而对于第二个PU,需要一个专门计算转置的运算单元。The traditional method for quickly realizing the above-mentioned integer discrete cosine transform is to use a butterfly computing structure in the form of a flow graph to calculate the 1-dimensional integer discrete cosine transform (hereinafter referred to as DCT), and then use two processing units with the same structure ( hereinafter referred to as PU) in series to realize the whole two-dimensional integer DCT transformation, which is called the method of decomposing rows and columns . The calculation completed by one PU is DX T , that is, the one-dimensional integer DCT transformation, and the entire transformation can be realized by connecting two PUs in series. Therefore, the core calculation of the integer discrete cosine transform is the PU. The mathematical operations performed by the PU include: matrix transposition and matrix multiplication. For the first PU, the input matrix can be input according to the transposed matrix, while for the second A PU requires an arithmetic unit dedicated to calculating the transpose.
上述传统的DCT算法中,以AVS所采用的变换核为例,基于蝶形结构的实现流图如图2所示:In the above-mentioned traditional DCT algorithm, taking the transformation kernel adopted by AVS as an example, the implementation flow diagram based on the butterfly structure is shown in Figure 2:
x0~x7为XT的一列数据,整个流图实现了X的一列数据与变换核相乘的结果X0~X7,,分别将XT的8列数据经过上面的流图计算,可以得到一个PU的计算结果DXT。这种结构利于用硬件实现,因为乘法操作转化为小整数的乘法,小整数的乘法可以用移位操作来实现。x0~x7 is a column of data of X T , and the whole flow graph realizes the result X0~X7 of multiplying a column of data of X and the transformation kernel, respectively, the 8 columns of data of X T are calculated through the above flow graph, and a PU can be obtained The calculation result of DX T . This structure is beneficial to realize by hardware, because the multiplication operation is transformed into the multiplication of small integers, and the multiplication of small integers can be realized by shift operation.
已有传统蝶形结构的缺点是,由于整数变换核元素全部为整数,而标准的离散余弦变换核为正交矩阵,元素的绝对值全部小于1,因此与整数核相乘会有较大的数据增益,为了抵消这一增益需要额外的截断操作,由于数据采用二进制表示,截断操作具体指将数据二进制表示的地若干位直接舍弃,这样的操作等效为对数据进行2的某次幂的除法。截断操作带来的结果是数据二进制表示的低若干位被舍弃,如图1所示,每个PU后面都有一个右移的操作。第一个PU右移a比特,第二个PU右移b比特,以AVS所采用的变换核为例,a和b的数值满足a+b=9,因为该整数变换核和标准的浮点数变换核相比,增益为2的9次幂,这个数值因为不同的变换核而不同。由于右移操作导致计算结果的低若干位的数据被舍弃,因此PU当中凡是有关这些低位比特的计算都是冗余的,这些冗余的计算导致硬件资源的浪费,但是由于传统蝶形结构所固有的特性,这些冗余无法很好地单独抽取出来,对硬件资源的有效利用不利。The disadvantage of the existing traditional butterfly structure is that since the integer transform kernel elements are all integers, while the standard discrete cosine transform kernel is an orthogonal matrix, the absolute value of the elements is all less than 1, so the multiplication with the integer kernel will have a large Data gain, in order to offset this gain, an additional truncation operation is required. Since the data is represented in binary, the truncation operation specifically refers to directly discarding several bits of the binary representation of the data. Such an operation is equivalent to performing a certain power of 2 on the data. division. The result of the truncation operation is that the lower bits of the data binary representation are discarded. As shown in Figure 1, each PU is followed by a right shift operation. The first PU shifts a bit to the right, and the second PU shifts to the right by b bits. Taking the transformation kernel used by AVS as an example, the values of a and b satisfy a+b=9, because the integer transformation kernel is the same as the standard floating-point number Compared with the transformation kernel, the gain is 2 to the 9th power, and this value is different due to different transformation kernels. Due to the right shift operation, the lower bits of the calculation result are discarded, so all calculations related to these lower bits in the PU are redundant, and these redundant calculations lead to waste of hardware resources. However, due to the traditional butterfly structure Inherent in nature, these redundancies cannot be extracted separately, which is not good for the effective use of hardware resources.
发明内容 Contents of the invention
本发明的目的是提出一种用于视频编码的整数离散余弦变换方法,针对已有技术的缺点,在PU的计算过程当中直接去除部分计算冗余,以便节约硬件资源。The purpose of the present invention is to propose an integer discrete cosine transform method for video coding, to address the shortcomings of the prior art, to directly remove part of the calculation redundancy in the PU calculation process, so as to save hardware resources.
本发明提出的用于视频编码的整数离散余弦变换方法,包括以下步骤:The integer discrete cosine transform method that the present invention proposes for video coding comprises the following steps:
(1)将整数离散余弦变换核中的每个元素进行等效拆分,得到N个矩阵,D0,D1,……DN-1,则原整数离散余弦变换核D的表达式D=(D0+D1+…+DN-1);(1) Each element in the integer discrete cosine transform kernel is equivalently split to obtain N matrices, D 0 , D 1 , ... D N-1 , then the expression D of the original integer discrete cosine transform kernel D =(D 0 +D 1 +...+D N-1 );
(2)将上述N个矩阵分组相加得到M个子变换核H0,H1,HM-1,其中每个子变换核为一组n矩阵的和,1<n<N;(2) adding the above N matrices into groups to obtain M sub-transformation kernels H 0 , H 1 , H M-1 , wherein each sub-transformation kernel is the sum of a set of n matrices, 1<n<N;
(3)根据上述M个子变换核,计算M个子变换H0XT,H1XT,H2XT…HM-1XT,并按照下标由低到高的顺序对M个子变换结果进行合并,得到第一个处理单元的变换结果,即DXT=(H0XT+H1XT+…+HM-1XT)T,其中X为视频帧亮度块矩阵,XT是X的转置;(3) According to the above M sub-transform kernels, calculate M sub-transformations H 0 X T , H 1 X T , H 2 X T ... H M-1 X T , and perform M sub-transformations in the order of subscripts from low to high The results are combined to obtain the transformation result of the first processing unit, that is, DX T =(H 0 X T +H 1 X T +…+H M-1 X T ) T , where X is the brightness block matrix of the video frame, and X T is the transpose of X;
(4)将上述第一个处理单元的变换结果DXT矩阵作为第二个处理单元的Y矩阵,重复步骤(1)~(3),得到DYT,则整数离散余弦变换系数为:F=DXDT=(H0+H1+…HM-1)X(H0+H1+…+HM-1)T=(H0+H1+…+HM-1)(H0XT+H1XT+…+HM-1XT)T。(4) the transformation result DX T matrix of the above-mentioned first processing unit is used as the Y matrix of the second processing unit, repeat steps (1)~(3), obtain DY T , then the integer discrete cosine transform coefficient is: F= DXD T =(H 0 +H 1 +…H M-1 )X(H 0 +H 1 +…+H M-1 ) T =(H 0 +H 1 +…+H M-1 )(H 0 X T +H 1 X T +…+H M-1 X T ) T .
上述方法中,对整数离散余弦变换核中的每个元素进行等效拆分的方法包括以下步骤:In the above method, the method for equivalently splitting each element in the integer discrete cosine transform kernel includes the following steps:
(1)按照变换核中整数元素的二进制表示,拆分为多个2的幂的和;(1) According to the binary representation of the integer element in the transformation kernel, it is split into the sum of multiple powers of 2;
(2)按照从低次幂到高次幂的顺序,将整数离散余弦变换核中各元素的i次幂的拆分项组合成矩阵Di,其中0≤i≤N-1。(2) According to the order from low power to high power, the i-th power split items of each element in the integer discrete cosine transform kernel are combined into a matrix Di, where 0≤i≤N-1.
上述方法中,对M个子变换进行合并的方法为:In the above method, the method of merging the M sub-transformations is:
(1)对M个子变换中的每个子变换,分别提出2的j(0≤j≤N-1)次幂的公因子;(1) For each of the M sub-transformations, a common factor of 2 to the power of j (0≤j≤N-1) is proposed;
(2)对H0XT和H1XT进行合并,设从H1XT中提出的公因子为2j1,若j1≤a,则只对矩阵元素二进制表示中高于j1位的部分合并,若j1>a,则只对矩阵元素二进制表示中高于a位的部分合并,合并的结果记为XTemp,其中a由待进行整数离散余弦变换核决定;(2) Merge H 0 X T and H 1 X T , set the common factor proposed from H 1 X T to be 2 j1 , if j1≤a, then only merge the parts higher than j1 in the binary representation of matrix elements , if j1>a, then only the part of the binary representation of the matrix elements higher than a is merged, and the merged result is recorded as X Temp , where a is determined by the integer discrete cosine transform kernel to be performed;
(3)将上述合并结果XTemp与H2XT按上述步骤(2)的方法进行合并,合并的结果记为XTemp;(3) Merge the above-mentioned merged result X Temp and H 2 X T according to the method of the above-mentioned step (2), and the merged result is recorded as X Temp ;
(4)重复步骤(2)和(3),依次逐个合并所有子变换,完成M个子变换的合并。(4) Steps (2) and (3) are repeated, and all sub-transformations are merged one by one in order to complete the merger of M sub-transformations.
本发明提出的用于视频编码的整数离散余弦变换方法,其特点和优点是,采用一种和传统快速算法完全不同的方法,从整数变换核入手,将元素为小整数的变换核拆分为若干个子变换核,由于各个子变换核能够提出一个2的幂的公因子,因此可以利用整数DCT变换的截断操作,减小PU当中的加法器的整体位宽,节省硬件资源。与传统的蝶形结构相比,本发明方法能够单独分离出整数离散余弦变换当中的一部分计算冗余,在截断操作进行之前去除掉这些计算冗余,从而降低了PU的加法器的整体位宽,节省了硬件资源,与已有的蝶形结构相比,本发明方法采用可编程门阵列(FPGA)或者可定制集成电路(ASIC)实现,硬件资源可以节省10%以上。The integer discrete cosine transform method for video coding proposed by the present invention has the characteristics and advantages of adopting a method completely different from the traditional fast algorithm, starting from the integer transform kernel, and splitting the transform kernel whose elements are small integers into For several sub-transformation cores, since each sub-transformation core can propose a common factor of a power of 2, the truncation operation of the integer DCT transformation can be used to reduce the overall bit width of the adder in the PU and save hardware resources. Compared with the traditional butterfly structure, the method of the present invention can separately separate a part of the calculation redundancy in the integer discrete cosine transform, and remove these calculation redundancy before the truncation operation, thereby reducing the overall bit width of the adder of the PU , saving hardware resources, compared with the existing butterfly structure, the method of the present invention adopts programmable gate array (FPGA) or customizable integrated circuit (ASIC) to realize, and hardware resources can save more than 10%.
附图说明 Description of drawings
图1是已有的采用行列分解的结构实现整数DCT的流程框图。FIG. 1 is a flowchart of an existing integer DCT using a decomposed structure.
图2是传统的实现AVS所采用的DCT的流图结构图。FIG. 2 is a flow graph structure diagram of a traditional DCT used to realize AVS.
图3是已有的实现转置运算所采用的寄存器堆结构框图。FIG. 3 is a structural block diagram of a conventional register file used to implement a transpose operation.
图4是采用本发明方法实现DCT的部分流图结构图。Fig. 4 is a partial flow graph structure diagram for realizing DCT by adopting the method of the present invention.
图5是图2所示的流图结构图中的一部分。FIG. 5 is a part of the flowchart structure diagram shown in FIG. 2 .
图6是用于实现本发明DCT的硬件结构图。Fig. 6 is a hardware structure diagram for realizing the DCT of the present invention.
图7和图8是本发明方法与已有方法所采用的加法器资源的效果比较图。Fig. 7 and Fig. 8 are comparison diagrams of the effect of the adder resource adopted by the method of the present invention and the existing method.
具体实施方式 Detailed ways
本发明提出的用于视频编码的整数离散余弦变换方法,首先将整数离散余弦变换核中的每个元素进行等效拆分,得到N个矩阵,D0,D1,……DN-1,则原整数离散余弦变换核D的表达式D=(D0+D1+…+DM-1);将上述N个矩阵分组相加得到M个子变换核H0,H1,HM-1,其中每个子变换核为一组N矩阵的和;根据上述M个子变换核,计算M个子变换H0XT,H1XT,H2XT…HM-1XT,并按照下标由低到高的顺序对M个子变换结果进行合并,得到第一个处理单元的变换结果,即DXT=(H0XT+H1XT+…+HM-1XT)T,其中X为视频帧亮度块矩阵,XT是X的转置;将上述第一个处理单元的变换结果DXT矩阵作为第二个处理单元的Y矩阵,重复上述步骤,得到DYT,则整数离散余弦变换系数为:F=DXDT=(H0+H1+…HM-1)X(H0+H1+…+HM-1)T=(H0+H1+…+HM-1)(H0XT+H0XT+…+HM-1XT)T。The integer discrete cosine transform method for video coding proposed by the present invention first performs equivalent splitting of each element in the integer discrete cosine transform kernel to obtain N matrices, D 0 , D 1 , ... D N-1 , then the expression D=(D 0 +D 1 +…+D M-1 ) of the original integer discrete cosine transform kernel D; the above N matrices are grouped and added to obtain M sub-transform kernels H 0 , H 1 , H M -1 , where each sub-transformation kernel is the sum of a set of N matrices; according to the above M sub-transformation kernels, calculate M sub-transformations H 0 X T , H 1 X T , H 2 X T ... H M-1 X T , and Merge the M sub-transformation results in order of subscripts from low to high to obtain the transformation result of the first processing unit, that is, DX T =(H 0 X T +H 1 X T +...+H M-1 X T ) T , where X is the brightness block matrix of the video frame, and X T is the transposition of X; the transformation result DX T matrix of the first processing unit is used as the Y matrix of the second processing unit, and the above steps are repeated to obtain DY T , then the integer discrete cosine transform coefficients are: F=DXD T =(H 0 +H 1 +…H M-1 )X(H 0 +H 1 +…+H M-1 ) T =(H 0 +H 1 +…+H M-1 )(H 0 X T +H 0 X T +…+H M-1 X T ) T .
上述方法中,对整数离散余弦变换核中的每个元素进行等效拆分的方法可以包括以下步骤:(1)按照变换核中整数元素的二进制表示,拆分为多个2的幂的和;(2)按照从低次幂到高次幂的顺序,将整数离散余弦变换核中各元素的i次幂的拆分项组合成矩阵Di,其中0≤i≤N-1。In the above method, the method for equivalently splitting each element in the integer discrete cosine transform kernel may include the following steps: (1) according to the binary representation of the integer element in the transform kernel, split into a sum of multiple powers of 2 ; (2) According to the order from the low power to the high power, the i-th power split items of each element in the integer discrete cosine transform kernel are combined into a matrix Di, where 0≤i≤N-1.
上述方法中,对M个子变换进行合并的方法可以为:(1)对M个子变换中的每个子变换,分别提出2的j(0≤j≤N-1)次幂的公因子;(2)对H0XT和H1XT进行合并,设从H1XT中提出的公因子为2j1,若j1≤a,则只对矩阵元素二进制表示中高于j1位的部分合并,若j1>a,则只对矩阵元素二进制表示中高于a位的部分合并,合并的结果记为XTemp,其中a由待进行整数离散余弦变换核决定;(3)将上述合并结果XTemp与H2XT按上述步骤(2)的方法进行合并,合并的结果记为XTemp;(4)重复步骤(2)和(3),依次逐个合并所有子变换,完成M个子变换的合并。In the above method, the method for merging the M sub-transformations may be: (1) for each sub-transformation in the M sub-transformations, respectively propose a common factor of 2 to the power of j (0≤j≤N-1); (2 ) merge H 0 X T and H 1 X T , set the common factor proposed from H 1 X T to be 2 j1 , if j1≤a, then only merge the parts higher than j1 in the binary representation of matrix elements, if j1>a, then only the part of the binary representation of the matrix elements that is higher than a is combined, and the combined result is recorded as X Temp , where a is determined by the integer discrete cosine transform kernel to be performed; (3) the above combined result X Temp and H 2 X T is merged according to the method of the above step (2), and the merged result is recorded as X Temp ; (4) Steps (2) and (3) are repeated, and all sub-transformations are merged one by one in order to complete the merging of M sub-transformations.
本发明属于图像编码和图像处理领域,特别应用于整数离散余弦变换的快速实现。The invention belongs to the field of image coding and image processing, and is especially applied to the rapid realization of integer discrete cosine transform.
本发明中DCT变换以8点一维DCT变换为基本单元,采用行列分解的结构实现二维DCT变换。行列分解的数学表示为F=DXDT=D(DXT)T,即以DXT为一个PU,一个PU完成的功能是,对将输入的矩阵转置,并左乘矩阵D,通过两个PU的串行即可通过一维DCT变换实现二维DCT变换,如图1所示。虽然两个PU的数学运算的内容相同,但是它们的转置操作有所不同,第一个PU的转置操作可以通过将输入按照转置后的矩阵输入,而第二个PU必须有专门的转置运算单元,本发明所采用的转置运算单元是通过已有技术中如图3所示的寄存器堆来实现的。In the present invention, the DCT transformation takes the 8-point one-dimensional DCT transformation as the basic unit, and adopts the decomposed structure of rows and columns to realize the two-dimensional DCT transformation. The mathematical expression of row and column decomposition is F=DXD T =D(DX T ) T , that is, DX T is a PU, and the function completed by a PU is to transpose the input matrix and multiply the matrix D to the left, through two The serialization of PUs can realize two-dimensional DCT transformation through one-dimensional DCT transformation, as shown in Figure 1. Although the content of the mathematical operations of the two PUs is the same, their transpose operations are different. The transpose operation of the first PU can be input according to the transposed matrix, while the second PU must have a dedicated The transposition operation unit, the transposition operation unit adopted in the present invention is realized through the register file shown in FIG. 3 in the prior art.
本发明所针对的是整数DCT变换,整数DCT变换的变换核是由整数构成,便于使用加法器和移位器来硬件实现。The present invention is aimed at the integer DCT transformation, and the transformation kernel of the integer DCT transformation is composed of integers, which is convenient for hardware realization by using an adder and a shifter.
对于任何一个整数变换核,特别是小整数变换核,本发明将其中的每个元素进行等效的拆分,拆分的方法是按照该整数元素的二进制表示,拆分为若干2的某次幂的加法和,比如元素6可以拆分为21+22。按照从低次幂到高次幂的顺序,将变换核的元素的i次幂的拆分项组合成矩阵Di,则Di仅含有2i和0元素。按照上面的操作得到N个矩阵D0,D1,……DN-1,将这N个矩阵再次分组相加得到M个新的子变换核H0,H1,HM-1,其中每个变换核的元素为组合成该组的矩阵的对应位置的元素的和。For any integer transformation kernel, especially a small integer transformation kernel, the present invention splits each element thereof equivalently. Additive sums of powers, such as element 6, can be split into 2 1 +2 2 . According to the order from the low power to the high power, the i-th power split items of the transformation kernel elements are combined into a matrix Di, then Di only contains 2 i and 0 elements. According to the above operation, N matrices D 0 , D 1 , ... D N-1 are obtained, and these N matrices are grouped and added again to obtain M new sub-transformation kernels H 0 , H 1 , H M-1 , where The elements of each transform kernel are the sum of the correspondingly positioned elements of the matrices that make up the set.
得到的这M个子变换核H0,H1,HM-1,在本发明称为子变换核。则整个变换的结果为这M个子变换的结果之和,即The obtained M sub-transformation kernels H 0 , H 1 , and H M-1 are called sub-transformation kernels in the present invention. Then the result of the whole transformation is the sum of the results of these M sub-transformations, namely
F=DXDT=(H0+H1+…+HM-1)X(H0+H1+…+HM-1)T F=DXD T =(H 0 +H 1 +…+H M-1 )X(H 0 +H 1 +…+H M-1 ) T
=(H0+H1+…+HM-1)(H0XT+H1XT+…+HM-1XT)T =(H 0 +H 1 +…+H M-1 )(H 0 X T +H 1 X T +…+H M-1 X T ) T
其中对于H0XT,H1XT…HM-1XT,本发明定义为子变换。Wherein for H 0 X T , H 1 X T ... H M-1 X T , the present invention defines sub-transformation.
对于每个子变换可以提出一个2的j(0≤j≤N-1)次幂的公因子,提出这个公因子,可以降低变换核元素的绝对值大小,使得子变换核的元素都在一个小的范围内,这样可以大大简化子变换的计算,而且可以利用截断操作对加法器的输入位宽作一定程度的减小。减小输入位宽的方法如下描述:For each sub-transformation, a common factor of 2 to the power of j (0≤j≤N-1) can be proposed. This common factor can reduce the absolute value of the transformation kernel elements, so that the elements of the sub-transformation kernels are all in a small In this way, the calculation of the sub-transform can be greatly simplified, and the input bit width of the adder can be reduced to a certain extent by using the truncation operation. The method of reducing the input bit width is described as follows:
假设截断操作为对结果二进制表示的最低a位进行截断。Suppose the truncation operation is to truncate the lowest a bit of the resulting binary representation.
采用行列分解的结构实现本发明所提出的整数DCT算法,在各个子变换的结果计算出来之后需要对结果进行合并,首先合并H0XT和H1XT,合并的结果再继续与H2XT合并,重复这个过程,直到全部的子变换被合并完毕。The integer DCT algorithm proposed by the present invention is realized by adopting the decomposed structure of ranks and columns . After the results of each sub-transformation are calculated , the results need to be combined . X T merge, and repeat this process until all sub-transforms have been merged.
由于H1XT可以提出公因子2j1,那么在对两个子变换结果的对应元素相加的时候,元素的二进制表示的低j1位的相加不会出现溢出,如果j1≤a,那么可以直接将H0XT的结果的低j1位舍弃不进行相加操作,如果j1>a,那么可以直接将H0XT的结果的低a位舍弃不进行相加,合并的结果记为XTemp。Since H 1 X T can propose a
如上描述对前两个子变换进行合并,合并的结果记为XTemp,。之后再将XTemp与子变换H2XT进行合并,子变换H2XT可以提出公因子2j2,按照和上面同样的过程,得到合并结果,仍然记为XTemp。Merge the first two sub-transforms as described above, and record the result of the merge as X Temp . Then merge X Temp with the sub-transformation H 2 X T , the sub-transformation H 2 X T can come up with the
之后再将XTemp与剩余的子变换依次合并,直到全部的子变换被合并完毕得到最终的结果为H0XT+H1XT+…+HM-1XT,即DXT,记为Y矩阵。Then merge X Temp with the remaining sub-transforms in turn until all sub-transforms are merged to get the final result H 0 X T +H 1 X T +…+H M-1 X T , that is, DX T , record is the Y matrix.
采用行列分解的结构进行正变换,那么需要将上面第一个PU的计算结果DXT作为Y矩阵再次输入到结构相同的第二个PU上,计算DYT,经过第二个PU的计算,才能完成一个完整的整数DCT变换。If the structure of row and column decomposition is used for forward transformation, then the calculation result DX T of the first PU above needs to be re-inputted as a Y matrix to the second PU with the same structure to calculate DY T . After the calculation of the second PU, the Complete a complete integer DCT transform.
图4所示,为本发明所提出的方法的流图描述的部分实现结构,图5所示为传统蝶形运算相同部分的实现形式,将两者作比较可以看出,本发明所提出的算法数据流动的方向上位宽的增加比传统蝶形算法要小。图4和图5仅给出了一部分变换系数的流图形式作为比较,是因为本发明所提出的算法和传统蝶形算法结构上有较大差异,不便于用流图的形式描述,否则会比较杂乱,不利于说明算法的差异。As shown in Fig. 4, it is the partial realization structure described by the flow diagram of the method proposed by the present invention, and Fig. 5 shows the realization form of the same part of the traditional butterfly operation, and it can be seen that the two are compared, and the present invention proposes The increase of the bit width in the direction of the algorithm data flow is smaller than that of the traditional butterfly algorithm. Fig. 4 and Fig. 5 only provide the flow graph form of a part of transformation coefficients as a comparison, because the algorithm proposed by the present invention is quite different from the structure of the traditional butterfly algorithm, and it is not convenient to describe in the form of a flow graph, otherwise it will be It is messy and not conducive to explaining the difference in algorithms.
以下介绍本发明方法的一个实施例:An embodiment of the inventive method is introduced below:
以AVS视频编码标准所采用的整数变换核为例来说明本发明所提出的基于变换核位平面分解的快速整数DCT实现算法。Taking the integer transform kernel used in the AVS video coding standard as an example to illustrate the fast integer DCT implementation algorithm based on transform kernel bit-plane decomposition proposed by the present invention.
该整数变换核的元素全部为绝对值小于16的整数,二进制表示的第4位以上全部为零,因此可以将该变换核拆分为四个矩阵(其中元素6和-6较为特殊,它们各自可以有两种不同的拆分方法:6=4+2=8-2,-6=-4-2=-8+2,经过实践的比较发现6=8-2,-6=-8+2这种拆分形式较为有利)The elements of the integer transformation kernel are all integers whose absolute value is less than 16, and the fourth and above digits of the binary representation are all zeros, so the transformation kernel can be split into four matrices (the elements 6 and -6 are special, and their respective There can be two different split methods: 6=4+2=8-2, -6=-4-2=-8+2, after practical comparison, it is found that 6=8-2, -6=-8+ 2 This form of splitting is more favorable)
其中,上面两个矩阵可以合并为Among them, the above two matrices can be combined as
下两个矩阵可以合并为The next two matrices can be combined as
这样一个PU被分解为DXT=H0XT+H1XT,其中H1亦可以写作
根据整数DCT变换核的左右对称性,可以将两个子变换H0XT和H1XT的变换核简化为According to the left-right symmetry of the integer DCT transformation kernel, the transformation kernels of the two sub-transformations H 0 X T and H 1 X T can be simplified as
那么两个子变换的偶数行输入相同,奇数行输入相同,因此两个子变换的偶数行之间和奇数行之间也可以实现加法器复用,以H0XT和H1XT的奇数行为例,H0XT可以分解为奇数行变换H0eXp T和偶数行变换H0oXm T,其中H0eXp T的变换核对应着H0的奇数行元素,H0oXm T的变换核对应着H0的偶数行元素,两个变换的结果对应为H0XT的奇数行结果和偶数行结果,对于H1XT也有同样的结论。如上面的描述,H0eXp T和H1eXp T由于输入相同,因此可以加法器复用。Then the input of the even-numbered rows of the two sub-transformations is the same, and the input of the odd-numbered rows is the same, so the adder multiplexing can also be realized between the even-numbered rows and between the odd-numbered rows of the two sub-transformations, with the odd-numbered behavior of H 0 X T and H 1 X T For example, H 0 X T can be decomposed into odd row transformation H 0e X p T and even row transformation H 0o X m T , where the transformation kernel of H 0e X p T corresponds to the odd row elements of H 0 , H 0o X m T The transformation kernel of corresponds to the even-numbered row elements of H 0 , and the results of the two transformations correspond to the odd-numbered and even-numbered row results of H 0 X T , and the same conclusion is drawn for H 1 X T. As described above, since H 0e X p T and H 1e X p T have the same input, they can be multiplexed by the adder.
观察H0e和H1e可以发现,这两个变换核的计算冗余度很大,可以先计算Xp T[0]±Xp T[3]和Xp T[1]±Xp T[2],通过这四个计算结果的组合加减就可以得到H0eXp T和H1eXp T计算结果。Observing H 0e and H 1e , it can be found that the calculation redundancy of these two transformation kernels is very large, and X p T [0]±X p T [3] and X p T [1]±X p T [ 2], the calculation results of H 0e X p T and H 1e X p T can be obtained by adding and subtracting the combination of these four calculation results.
同样对于same for
经过上面描述的两个PU的串联,可以完成本发明的方法,实现快速无损的整数DCT变换。Through the series connection of the two PUs described above, the method of the present invention can be completed to realize fast and lossless integer DCT transformation.
采用本发明中提到的基于变换核位平面分解的算法,来实现符合AVS标准的整数DCT变换,可以统计得到本发明所提出的算法消耗的主要硬件资源:Adopt the algorithm based on transform nuclear bit plane decomposition that mentions in the present invention, realize the integer DCT transform that meets AVS standard, can obtain the main hardware resources that the proposed algorithm of the present invention consumes statistically:
以及传统的蝶形运算所消耗的主要硬件资源:And the main hardware resources consumed by traditional butterfly operations:
以下通过图7和图8说明本发明方法与已有技术相比,所具有的优点和效果:图7是蝶形算法,图8是本发明的方法,横轴代表加法器的输入位宽,纵轴代表使用到的数目,符合AVS标准的整数DCT变换的第一个PU有3比特的截断,而对于第二个PU有7比特的截断。可以比较看出,两种算法所采用的加法器的数目是相同的(38个),但是基于位平面分解的算法所采用的加法器集中在8,9,10,11输入,而蝶形运算从8到15不等,从而基于位平面分解的算法输入位宽整体偏低,因此如果DCT模块采用FPGA或者ASIC实现时,基于位平面分解的算法会更加节省资源。采用FPGA实现,以Xilinx VirtexIV为例,基于位平面分解的算法会节省10%以上的资源。Below by Fig. 7 and Fig. 8 illustrate the present invention method compared with prior art, advantage and effect that have: Fig. 7 is a butterfly algorithm, Fig. 8 is the method of the present invention, and horizontal axis represents the input bit width of adder, The vertical axis represents the number used, the first PU of the integer DCT transform conforming to the AVS standard has a 3-bit truncation, and there is a 7-bit truncation for the second PU. It can be seen by comparison that the number of adders used by the two algorithms is the same (38), but the adders used in the algorithm based on bit-plane decomposition are concentrated in 8, 9, 10, 11 inputs, while the butterfly operation It varies from 8 to 15, so the input bit width of the algorithm based on bit-plane decomposition is generally low. Therefore, if the DCT module is implemented by FPGA or ASIC, the algorithm based on bit-plane decomposition will save more resources. Using FPGA implementation, taking Xilinx VirtexIV as an example, the algorithm based on bit-plane decomposition will save more than 10% of resources.
对于不同的整数变换核,都可以采用位平面分解的算法,但是对于不同的整数变换核,分解得到的位平面矩阵和具有最大加法器复用度的位平面分组可能会不同,但都是尽量提取大系数,降低子变换核的元素的范围,利用截断减小加法器位宽,以及调整位平面系数,使加法器的复用度尽量大,从这两个方面着手来减小对资源的开销。For different integer transformation kernels, bit-plane decomposition algorithms can be used, but for different integer transformation kernels, the decomposed bit-plane matrix and the bit-plane grouping with the maximum adder multiplexing degree may be different, but they are all as far as possible Extract large coefficients, reduce the range of elements of the sub-transform kernel, use truncation to reduce the bit width of the adder, and adjust the bit plane coefficients to make the multiplexing of the adder as large as possible, and reduce the resource consumption from these two aspects overhead.
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2006100121618A CN100452880C (en) | 2006-06-09 | 2006-06-09 | Integral discrete cosine transform method in use for encoding video |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2006100121618A CN100452880C (en) | 2006-06-09 | 2006-06-09 | Integral discrete cosine transform method in use for encoding video |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1874510A CN1874510A (en) | 2006-12-06 |
CN100452880C true CN100452880C (en) | 2009-01-14 |
Family
ID=37484730
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2006100121618A Expired - Fee Related CN100452880C (en) | 2006-06-09 | 2006-06-09 | Integral discrete cosine transform method in use for encoding video |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN100452880C (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106231304A (en) * | 2016-08-30 | 2016-12-14 | 成都金本华电子有限公司 | A kind of video decoding integer transform method based on one-dimensional quick dish algorithm improvement |
CN106550267B (en) * | 2016-11-25 | 2019-03-29 | 广州酷狗计算机科技有限公司 | Multimedia messages coding/decoding method and device |
CN107018420B (en) * | 2017-05-08 | 2019-07-12 | 电子科技大学 | A kind of low-power consumption two-dimension discrete cosine transform method and its circuit |
CN112804531A (en) * | 2021-04-12 | 2021-05-14 | 北京电信易通信息技术股份有限公司 | Method for constructing HEVC (high efficiency video coding) coding chip based on HLS (hyper text transfer system) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN88102019A (en) * | 1987-04-10 | 1988-12-21 | 菲利浦光灯制造公司 | Use the television transmission system of transition coding |
US5226002A (en) * | 1991-06-28 | 1993-07-06 | Industrial Technology Research Institute | Matrix multiplier circuit |
CN1253340A (en) * | 1998-10-30 | 2000-05-17 | 惠普公司 | Signal processing of distributed operation architecture |
-
2006
- 2006-06-09 CN CNB2006100121618A patent/CN100452880C/en not_active Expired - Fee Related
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN88102019A (en) * | 1987-04-10 | 1988-12-21 | 菲利浦光灯制造公司 | Use the television transmission system of transition coding |
US5226002A (en) * | 1991-06-28 | 1993-07-06 | Industrial Technology Research Institute | Matrix multiplier circuit |
CN1253340A (en) * | 1998-10-30 | 2000-05-17 | 惠普公司 | Signal processing of distributed operation architecture |
Non-Patent Citations (2)
Title |
---|
一种基于高度并行结构的二维DCT/IDCT处理器设计. 刘锋,代国定,庄奕琪.电路与系统学报,第8卷第3期. 2003 |
一种基于高度并行结构的二维DCT/IDCT处理器设计. 刘锋,代国定,庄奕琪.电路与系统学报,第8卷第3期. 2003 * |
Also Published As
Publication number | Publication date |
---|---|
CN1874510A (en) | 2006-12-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108491926B (en) | A low-bit efficient deep convolutional neural network hardware acceleration design method, module and system based on logarithmic quantization | |
Chen et al. | Efficient architecture of variable size HEVC 2D-DCT for FPGA platforms | |
CN110933445B (en) | A DCT operation method based on coefficient matrix transformation and its transformation device | |
CN100452880C (en) | Integral discrete cosine transform method in use for encoding video | |
CN105183425A (en) | Fixed-bit-width multiplier with high accuracy and low complexity properties | |
CN102036075A (en) | Image and digital video coding and decoding methods | |
Shams et al. | A low power high performance distributed DCT architecture | |
CN108184127B (en) | A Configurable Multi-Size DCT Transform Hardware Multiplexing Architecture | |
CN114007079A (en) | Conversion circuit, method, device and encoder | |
CN102300092A (en) | Lifting scheme-based 9/7 wavelet inverse transformation image decompressing method | |
Kiruba et al. | Register pre-allocation based folded discrete Tchebichef transformation technique for image compression | |
Wahid et al. | On the error-free realization of a scaled DCT algorithm and its VLSI implementation | |
Kim et al. | Low-power multiplierless DCT for image/video coders | |
Kumar et al. | 2D-Discrete cosine transform based dynamically controllable image compression technique | |
Anas et al. | FPGA implementation of a pipelined 2D-DCT and simplified quantization for real-time applications | |
Agostini et al. | A FPGA based design of a multiplierless and fully pipelined JPEG compressor | |
Devarani et al. | Design and implementation of truncated multipliers for precision improvement | |
Panchani et al. | Fast and multiplierless integer DCT for HEVC | |
CN109451307B (en) | One-dimensional DCT operation method and DCT transformation device based on approximate coefficient | |
Sekhar | Precision-Aware and Quantization of Lifting Based DWT Hardware Architecture | |
Knifati | FPGA implementation of discrete cosine transform using difference based adder graph algorith | |
CN102025988B (en) | Mode-related fast transformation method | |
CN100388316C (en) | High-precision digital cosine transform circuit without multiplier and its transform method | |
CN101316367B (en) | A Two-Dimensional Inverse Transformation Method in Video Codec Standard and Its Implementation Circuit | |
Teja et al. | Verilog implementation of fully pipelined and multiplierless 2D DCT/IDCT JPEG architecture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C17 | Cessation of patent right | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20090114 Termination date: 20100609 |