CN100566419C

CN100566419C - Apparatus and method for encoding digital image data in a lossless manner

Info

Publication number: CN100566419C
Application number: CN 200610146476
Authority: CN
Inventors: V. R. Raveendran; K. Thyagarajan; J. Ratzel; S. A. Morley
Original assignee: Qualcomm Inc
Current assignee: Qualcomm Inc
Priority date: 2001-07-02
Filing date: 2002-07-02
Publication date: 2009-12-02
Anticipated expiration: 2022-07-02
Also published as: CN1992896A

Abstract

A method (900) of losslessly compressing and encoding a signal representing image information is presented. A lossy compressed data file (922) and a residual compressed data file (960) are generated. When the lossy compressed data file and the residual compressed data file are combined, a lossless data file that is substantially identical to the source data file is created.

Description

Apparatus and method for encoding digital image data in a lossless manner

This application is a divisional of the invention patent application filed on July 2, 2002, with application number 02817150.0 and entitled "Apparatus and method for encoding digital image data in a lossless manner".

Technical field

The present invention relates to image processing and compression. More specifically, the invention relates to lossless encoding of video image and audio information in the frequency domain.

Background

Digital picture processing has a prominent position in the general discipline of digital signal processing. The importance of human visual perception has greatly encouraged interest and advances in the art and science of digital picture processing. In the field of transmitting and receiving video signals, such as those used for projecting films or movies, various improvements have been made to image compression techniques. Many current and proposed video systems use digital encoding techniques. Aspects of this field include image coding, image restoration, and image feature selection. Image coding is the attempt to transmit pictures over a digital communication channel efficiently, using as few bits as possible to minimize the required bandwidth while keeping distortion within certain limits. Image restoration is the effort to recover the true image of the object. A coded image transmitted over a communication channel may be distorted by various factors, and the process of creating the image from the object may itself be a source of degradation. Feature selection refers to selecting certain attributes of the picture, which may be required for recognition, classification, and decision in a wider context.

Digital encoding of video, such as that performed in digital cameras, is an area that benefits from improved image compression techniques. Digital image compression can generally be divided into two categories: lossless methods and lossy methods. An image compressed losslessly can be recovered without any loss of information. A lossy method causes some information to be lost irrecoverably, to a degree that depends on the compression ratio, the quality of the compression algorithm, and the implementation of the algorithm. Lossy compression is generally considered in order to obtain the compression ratios desired for a cost-effective digital cinema approach. To reach digital cinema quality levels, the compression method should provide a visually lossless level of performance. That is, although the compression process still causes a mathematical loss of information, the image distortion caused by this loss should be imperceptible to a viewer under normal viewing conditions.

Existing digital image compression techniques were developed for other applications, namely television systems. Their design compromises suit those intended applications, but they do not meet the quality requirements of cinema presentation.

Digital cinema compression technology should provide the visual quality that moviegoers have previously experienced. Ideally, the visual quality of digital cinema should attempt to exceed that of a high-quality release print film. At the same time, the compression technique should have a coding efficiency high enough to be practical. As defined herein, coding efficiency refers to the bit rate needed for the compressed image quality to meet a certain quality level. Furthermore, the system and coding technique should have built-in flexibility to accommodate different formats and should be cost-effective, that is, a small and efficient decoder or encoder process.

Many existing compression techniques offer significant levels of compression but degrade the quality of the video signal. Typically, techniques for transferring compressed information require the compressed information to be transferred at a constant bit rate.

One compression technique that can provide a significant level of compression while preserving the desired level of video signal quality uses adaptively sized blocks and sub-blocks of encoded discrete cosine transform (DCT) coefficient data. This technique is hereinafter referred to as the Adaptive Block Size Discrete Cosine Transform (ABSDCT) method. It is disclosed in U.S. Patent No. 5,021,891, entitled "Adaptive Block Size Image Compression Method And System," assigned to the assignee of the present application and incorporated herein by reference. DCT techniques are also disclosed in U.S. Patent No. 5,107,345, entitled "Adaptive Block Size Image Compression Method And System," assigned to the assignee of the present application and incorporated herein by reference. In addition, the use of the ABSDCT technique in combination with a differential quadtree transform technique is disclosed in U.S. Patent No. 5,452,104, entitled "Adaptive Block Size Image Compression Method And System," also assigned to the assignee of the present application and incorporated herein by reference. The systems disclosed in these patents use what is called "intra-frame" encoding, in which each frame of image data is encoded without regard to the content of any other frame. Using the ABSDCT technique, the achievable data rate can be reduced from around 1.5 billion bits per second to approximately 50 million bits per second without discernible degradation of image quality.

The ABSDCT technique can be used to compress either a black-and-white or a color image, or a signal representing the image. The color input signal may be in YIQ format, where Y is the luminance (brightness) sample and I and Q are the chrominance (color) samples, for each 4:4:4 or alternate format. Other known formats such as YUV, YCbCr, or RGB may also be used. Because the eye has low spatial sensitivity to color, most research has shown that sub-sampling the color components by a factor of four in the horizontal and vertical directions is reasonable. Accordingly, a video signal may be represented by four luminance samples and two chrominance samples.

Using ABSDCT, a video signal is generally segmented into blocks of pixels for processing. For each block, the luminance and chrominance components are passed to a block size assignment element, or block interleaver. For example, a 16×16 (pixel) block may be presented to the block interleaver, which orders or organizes the image samples within each 16×16 block to produce blocks and composite sub-blocks of data for discrete cosine transform (DCT) analysis. The DCT operator is one method of converting a time- and space-sampled signal to a frequency representation of the same signal. By converting to a frequency representation, DCT techniques have been shown to allow very high levels of compression, because quantizers can be designed to take advantage of the frequency distribution characteristics of an image. In a preferred embodiment, one 16×16 DCT is applied to a first ordering, four 8×8 DCTs are applied to a second ordering, sixteen 4×4 DCTs are applied to a third ordering, and sixty-four 2×2 DCTs are applied to a fourth ordering.

The DCT operation reduces the spatial redundancy inherent in the video source. After the DCT is performed, most of the video signal energy tends to be concentrated in a few DCT coefficients. An additional transform, the differential quadtree transform (DQT), may be used to reduce the redundancy among the DCT coefficients.

For the 16×16 block and each sub-block, the DCT coefficient values and the DQT values (if the DQT is used) are analyzed to determine the number of bits required to encode the block or sub-block. The block or combination of sub-blocks that requires the fewest bits to encode is then chosen to represent the image segment. For example, two 8×8 sub-blocks, six 4×4 sub-blocks, and eight 2×2 sub-blocks may be chosen to represent the image segment.
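The selection rule described above can be sketched as a bottom-up comparison: at each level, keep the block whole unless its four sub-blocks are cheaper in total. This is only an illustration under stated assumptions. The cost function `toy_bit_cost` is hypothetical and stands in for the actual bit count of the coded DCT/DQT values, and the names `quadrants` and `cheapest_partition` are not from the patent.

```python
from typing import Callable, List

Block = List[List[int]]  # square block of pixel or coefficient values


def quadrants(block: Block) -> List[Block]:
    """Split a square block with even side length into four equal sub-blocks."""
    h = len(block) // 2
    return [
        [row[c:c + h] for row in block[r:r + h]]
        for r in (0, h)
        for c in (0, h)
    ]


def cheapest_partition(block: Block, bit_cost: Callable[[Block], int], min_size: int = 2):
    """Return (total_bits, tree).

    tree is the side length for a block kept whole, or a list of four
    subtrees when splitting into sub-blocks needs fewer bits.
    """
    whole = bit_cost(block)
    size = len(block)
    if size <= min_size:
        return whole, size
    parts = [cheapest_partition(q, bit_cost, min_size) for q in quadrants(block)]
    split = sum(bits for bits, _ in parts)
    if split < whole:
        return split, [tree for _, tree in parts]
    return whole, size


def toy_bit_cost(block: Block) -> int:
    """Hypothetical cost: fixed overhead plus bits proportional to the value spread."""
    values = [v for row in block for v in row]
    return 8 + len(values) * (max(values) - min(values)).bit_length()


flat = [[5] * 4 for _ in range(4)]
busy_corner = [[0, 255, 0, 0], [255, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(cheapest_partition(flat, toy_bit_cost))         # block kept whole
print(cheapest_partition(busy_corner, toy_bit_cost))  # block split into four 2x2 sub-blocks
```

With this toy cost, a flat block is cheapest as a single block, while a block with one busy corner is cheaper when only that corner pays for the large value spread.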

The chosen block or combination of sub-blocks is then properly arranged in order into a 16×16 block. The DCT/DQT coefficient values may then undergo frequency weighting, quantization, and coding (such as variable-length coding) in preparation for transmission. Although the ABSDCT technique described above performs remarkably well, it is computationally intensive.

Further, although ABSDCT is visually lossless, it is sometimes desirable to recover the data exactly as it was before encoding. For example, mastering and archival purposes require data to be compressed in such a way that it can be recovered exactly in its native domain.

Traditionally, a lossless compression system for images includes a predictor, which estimates the value of the current pixel to be encoded. A residual pixel is obtained as the difference between the actual and the predicted pixel. The residual pixel is then entropy encoded and stored or transmitted. Because the prediction removes correlation between pixels, the residual pixels have a reduced dynamic range with a characteristic two-sided exponential (Laplacian) distribution, and this is what makes compression possible. The amount of compression of the residuals depends on both the prediction and the subsequent encoding method. The most commonly used prediction methods are differential pulse code modulation (DPCM) and its variants, such as adaptive DPCM (ADPCM).
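As a minimal sketch of pixel-based prediction, the following uses the simplest DPCM predictor, in which each pixel is predicted by its left neighbour. The choice of predictor and the convention that the first pixel is predicted as 0 are assumptions made for illustration; practical DPCM and ADPCM schemes use more elaborate predictors.

```python
from typing import List


def dpcm_residuals(row: List[int]) -> List[int]:
    """Residual = actual pixel minus predicted pixel (the previous pixel, 0 for the first)."""
    residuals = []
    previous = 0
    for pixel in row:
        residuals.append(pixel - previous)
        previous = pixel
    return residuals


def dpcm_reconstruct(residuals: List[int]) -> List[int]:
    """Invert dpcm_residuals exactly by accumulating the residuals."""
    pixels = []
    previous = 0
    for r in residuals:
        previous += r
        pixels.append(previous)
    return pixels


row = [100, 102, 101, 105, 105]
res = dpcm_residuals(row)
print(res)                    # small values after the first sample
print(dpcm_reconstruct(res))  # identical to row
```

After the first sample, the residuals cluster around zero, which is the reduced dynamic range the entropy coder exploits.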

One problem with pixel-based prediction is that the residual still has high energy, because only a small number of neighboring pixels are used in the prediction process. There is therefore considerable room to improve the coding efficiency of pixel-based prediction schemes.

Summary of the invention

Embodiments of the present invention describe a system that encodes digital image and video data in a lossless manner to achieve compression. The system is hybrid, meaning that one part compresses the data in a lossy manner and another part compresses the residual data in a lossless manner. For the lossy part, the system uses the Adaptive Block Size Discrete Cosine Transform (ABSDCT) algorithm. The ABSDCT system compresses the data and provides high visual quality and a high compression ratio. A residual image is obtained as the difference between the source image and the image decompressed from the ABSDCT system. This residual is encoded losslessly using the Golomb-Rice coding algorithm. Because of the visually based adaptive block sizing and the quantization of the DCT coefficients, the residual has very low energy, which yields a good overall lossless compression ratio.

The ABSDCT system achieves high compression ratios at cinema quality. Because it is block-based, it removes pixel correlation much better than any pixel-based scheme. It is therefore used as the predictor in the lossless system described here. Adding a lossless encoding system to this predictor forms a hybrid lossless compression system. Note that the system can compress both still images and motion images. For a still image, only the ABSDCT compressed data and the entropy-encoded residual data are used as the compressed output. For a motion sequence, a decision is made whether to use intra-frame or inter-frame compression. For example, if f(t) represents the image frame at time t, then F(t) and F(t+Δt) denote the DCTs of the image frames at times t and t+Δt, respectively. Here Δt is the time interval between two consecutive frames.

The present invention is embodied in an apparatus and method for compressing data that allows the data to be recovered exactly as it was before encoding. Embodiments include a system that performs intra-frame coding, inter-frame coding, or a hybrid of the two. The system is a quality-based system that uses adaptively sized blocks and sub-blocks of discrete cosine transform coefficient data. A block of pixel data is input to an encoder. The encoder includes a block size assignment (BSA) element, which segments the input block of pixels for processing. The block size assignment is based on the variances of the input block and of its further subdivided sub-blocks. In general, areas with larger variances are subdivided into smaller blocks and areas with smaller variances are not subdivided, provided the block and sub-block mean values fall into different predetermined ranges. Thus, the variance threshold of a block is first modified from its nominal value according to the block's mean value; the variance of the block is then compared with that threshold, and if the variance exceeds the threshold, the block is subdivided.
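The block size assignment described above can be sketched as a recursive variance test. The threshold values in `NOMINAL_THRESHOLDS` and the mean-dependent rule in `adjusted_threshold` are placeholders chosen for illustration, not values from the patent, and the function names are hypothetical.

```python
from statistics import mean, pvariance
from typing import Dict, List, Tuple

Block = List[List[int]]

# Placeholder nominal variance thresholds per block size (illustrative only).
NOMINAL_THRESHOLDS: Dict[int, float] = {16: 50.0, 8: 100.0, 4: 200.0}


def adjusted_threshold(size: int, block_mean: float) -> float:
    """Hypothetical rule: relax the threshold for very dark or very bright blocks."""
    nominal = NOMINAL_THRESHOLDS[size]
    return nominal * 2.0 if block_mean < 32 or block_mean > 224 else nominal


def assign_block_sizes(block: Block, top: int = 0, left: int = 0) -> List[Tuple[int, int, int]]:
    """Return (top, left, size) for every block or sub-block selected for the transform."""
    size = len(block)
    if size not in NOMINAL_THRESHOLDS:
        return [(top, left, size)]  # smallest size is never subdivided
    values = [v for row in block for v in row]
    if pvariance(values) <= adjusted_threshold(size, mean(values)):
        return [(top, left, size)]  # low variance: keep the block whole
    half = size // 2
    selected: List[Tuple[int, int, int]] = []
    for r in (0, half):
        for c in (0, half):
            sub = [row[c:c + half] for row in block[r:r + half]]
            selected += assign_block_sizes(sub, top + r, left + c)
    return selected


flat = [[128] * 16 for _ in range(16)]
spike = [row[:] for row in flat]
spike[0][0] = 255  # one bright pixel in the top-left corner
print(assign_block_sizes(flat))
print(assign_block_sizes(spike))
```

In this sketch a flat 16×16 block stays whole, while a single outlier pixel drives subdivision down to 2×2 only in the quadrant that contains it.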

The block size assignment is provided to a transform element, which transforms the pixel data into frequency-domain data. The transform is performed only on the blocks and sub-blocks selected by the block size assignment. For AC elements, the transform data then undergoes scaling through quantization and serialization. Quantization of the transform data is based on an image quality metric, such as a scale factor that adjusts for contrast, coefficient count, rate distortion, density of the block size assignments, and/or past scale factors. Serialization, such as zigzag scanning, is based on creating the longest possible run lengths of the same value. The data stream is then coded by a variable-length coder in preparation for transmission. The coding may be Huffman coding, or it may be based on an exponential distribution, such as Golomb-Rice coding.
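To illustrate serialization, the sketch below generates the common zigzag order (the one familiar from JPEG) and collects the AC coefficients as (zero run, amplitude) pairs. The exact scan order and pairing used by the patent are not specified in this section, so both are assumptions, as is the separate handling of the DC coefficient here.

```python
from typing import List, Tuple


def zigzag_order(n: int) -> List[Tuple[int, int]]:
    """(row, col) positions of an n x n block in zigzag order, starting at (0, 0)."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Sort by anti-diagonal, alternating the direction of travel on each one.
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def run_length_pairs(block: List[List[int]]):
    """Return (dc, pairs, trailing_zeros); pairs hold (zero_run, amplitude) per nonzero AC value."""
    coeffs = [block[r][c] for r, c in zigzag_order(len(block))]
    pairs, run = [], 0
    for value in coeffs[1:]:  # the DC coefficient is kept apart
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return coeffs[0], pairs, run


quantized = [
    [50, 3, 0, 0],
    [-2, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]
print(zigzag_order(4)[:6])
print(run_length_pairs(quantized))
```

Because quantized high-frequency coefficients are mostly zero, the zigzag order tends to group them into long runs, which is what the variable-length coder then exploits.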

A hybrid compression system built on ABSDCT acts as a good predictor of pixel or DCT values. It therefore yields a higher lossless compression ratio than systems that use pixel-based prediction. The lossy portion provides digital cinema quality results, that is, the compression produces a file that is visually lossless. For the lossless portion, unlike Huffman codes, Golomb-Rice coding does not require any a priori code generation, so it does not need to store an extensive code book as Huffman coding does. This makes more efficient use of chip resources, so the chip size of a hardware implementation can be reduced. Golomb-Rice coding is also much simpler to implement than Huffman coding. Moreover, because DCT coefficients and residuals naturally have an exponential distribution, Golomb-Rice coding achieves higher coding efficiency than Huffman coding. Furthermore, because the lossy portion of the compression system uses visually significant information in the block subdivisions, context modeling is inherent in the residual encoding. This is important because no extra storage registers are needed to gather contextual data for the residual encoding. Since no motion estimation is used, the system is also very simple to implement.
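The following is a generic Rice coder (Golomb-Rice with a power-of-two parameter 2^k), shown to make concrete why no code book is needed: each codeword is computed directly from the value. The signed-to-unsigned mapping and the unary convention (ones terminated by a zero) are common choices assumed here and may differ from the process the patent illustrates in its figures.

```python
from typing import List


def to_unsigned(n: int) -> int:
    """Interleave signed values: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ..."""
    return 2 * n if n >= 0 else -2 * n - 1


def from_unsigned(u: int) -> int:
    return u >> 1 if u % 2 == 0 else -((u + 1) >> 1)


def rice_encode(values: List[int], k: int) -> str:
    """Encode each value as a unary quotient, a '0' terminator, and a k-bit remainder."""
    out = []
    for v in values:
        u = to_unsigned(v)
        q, r = u >> k, u & ((1 << k) - 1)
        out.append("1" * q + "0" + (format(r, f"0{k}b") if k else ""))
    return "".join(out)


def rice_decode(bits: str, k: int, count: int) -> List[int]:
    values, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":  # read the unary quotient
            q += 1
            pos += 1
        pos += 1  # skip the '0' terminator
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        values.append(from_unsigned((q << k) | r))
    return values


residuals = [0, -1, 3, -4, 2]
code = rice_encode(residuals, k=1)
print(code)
print(rice_decode(code, k=1, count=len(residuals)))
```

Small magnitudes get short codewords, which suits the roughly exponential distribution of residuals and DCT coefficients described above. Choosing k is left out of this sketch.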

An apparatus and method for losslessly compressing and encoding a signal representing image information is presented. The signal representing image information is compressed to create a compressed version of the image. The compressed version of the image is quantized, creating a lossy version of the image. The compressed version of the image is also serialized, creating a serialized, quantized, compressed version of the image. This version is then decompressed, and the differences between the source image and the decompressed version are determined, creating a residual version of the image. The lossy version of the image and the residual version of the image may be output separately or in combination, such that combining the decompressed lossy version with the residual version yields an image substantially identical to the source image.

A method of losslessly compressing and encoding a signal representing image information is presented. A lossy compressed data file and a residual compressed data file are generated. When the lossy compressed data file and the residual compressed data file are combined, a lossless data file is created that is substantially identical to the source data file.
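The lossy-plus-residual structure can be shown end to end with a deliberately simple stand-in for the lossy stage. Here uniform quantization of pixel values replaces ABSDCT, purely as an assumption for illustration; the point is only that source minus decoded-lossy gives a residual that restores the source exactly. The function names and the step size are hypothetical.

```python
from typing import List, Tuple


def lossy_encode(pixels: List[int], step: int = 8) -> List[int]:
    """Stand-in lossy stage: uniform quantization (NOT the ABSDCT algorithm)."""
    return [(p + step // 2) // step for p in pixels]


def lossy_decode(codes: List[int], step: int = 8) -> List[int]:
    return [c * step for c in codes]


def hybrid_encode(pixels: List[int], step: int = 8) -> Tuple[List[int], List[int]]:
    """Return the lossy data and the residual (source minus decoded lossy image)."""
    lossy = lossy_encode(pixels, step)
    decoded = lossy_decode(lossy, step)
    residual = [p - d for p, d in zip(pixels, decoded)]
    return lossy, residual


def hybrid_decode(lossy: List[int], residual: List[int], step: int = 8) -> List[int]:
    """Combine the decoded lossy image with the residual to recover the source."""
    return [d + r for d, r in zip(lossy_decode(lossy, step), residual)]


source = [0, 7, 100, 255, 131]
lossy, residual = hybrid_encode(source)
print(lossy, residual)
print(hybrid_decode(lossy, residual) == source)
```

The residual values stay within half a quantization step of zero, so they are cheap to entropy code; a better lossy predictor shrinks them further.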

Accordingly, one aspect of an embodiment is to provide an apparatus and method that efficiently provide lossless compression.

Another aspect of an embodiment is to losslessly compress digital image and audio information in a manner suited to mastering and archival purposes.

Yet another aspect of an embodiment is to provide a lossless compression system based on inter-frame coding.

A further aspect of an embodiment is to provide a lossless compression system based on intra-frame coding.

One embodiment of the invention describes an apparatus for encoding data comprising a source image, the apparatus comprising: means for compressing data representing the source image, thereby creating a compressed version of the source image, wherein the compression uses data previously generated by adaptive block sizing of the source image; means for quantizing the compressed version of the source image, thereby creating a lossy version of the source image; means for decompressing the compressed version of the source image to create a decompressed image, wherein the decompression uses the data previously generated by adaptive block sizing of the source image; means for determining the difference between the source image and the decompressed image, thereby creating residual data associated with the source image; and means for outputting the lossy version of the source image and the residual data, wherein the lossy version of the source image and the residual data can be used to create an image substantially identical to the source image.

Another embodiment of the invention describes a method of encoding data comprising a plurality of source frames from a source image, the method comprising: compressing data representing a first source frame of the plurality of source frames, thereby creating a compressed version of the first source frame, wherein the compression uses data previously generated by adaptive block sizing of the plurality of source frames; quantizing the compressed version of the first source frame, thereby creating a lossy version of the first source frame; decompressing the compressed version of the first source frame to create a decompressed frame, wherein the decompression uses the data previously generated by adaptive block sizing of the plurality of source frames; determining the difference between a second source frame and the decompressed frame, thereby creating residual data associated with the first source frame; and outputting the lossy version of the first source frame and the residual data, wherein the lossy version of the first source frame and the residual data can be used to create a frame substantially identical to the first source frame.

Yet another embodiment of the invention describes an apparatus for encoding data comprising a plurality of source frames from a source image, the apparatus comprising: means for compressing data representing a first source frame of the plurality of source frames, thereby creating a compressed version of the first source frame, wherein the compression uses data previously generated by adaptive block sizing of the plurality of source frames; means for quantizing the compressed version of the first source frame, thereby creating a lossy version of the first source frame; means for decompressing the compressed version of the first source frame to create a decompressed frame, wherein the decompression uses the data previously generated by adaptive block sizing of the plurality of source frames; means for determining the difference between a second source frame and the decompressed frame, thereby creating residual data associated with the first source frame; and means for outputting the lossy version of the first source frame and the residual data, wherein the lossy version of the first source frame and the residual data can be used to create a frame substantially identical to the first source frame.

Brief description of the drawings

The features and advantages of the present invention will become more apparent from the detailed description below, taken in conjunction with the drawings, in which like reference characters identify like features throughout, and wherein:

Fig. 1 is a block diagram of the encoder portion of an image compression and processing system;

Fig. 2 is a block diagram of the decoder portion of an image compression and processing system;

Fig. 3 is a flow diagram illustrating the processing steps involved in variance-based block size assignment;

Fig. 4a illustrates an exponential distribution of the run lengths of the Y component in a DCT coefficient matrix;

Fig. 4b illustrates an exponential distribution of the run lengths of the C_b component in a DCT coefficient matrix;

Fig. 4c illustrates an exponential distribution of the run lengths of the C_r component in a DCT coefficient matrix;

Fig. 5a illustrates an exponential distribution of the amplitude size of the Y component in a DCT coefficient matrix;

Fig. 5b illustrates an exponential distribution of the amplitude size of the C_b component in a DCT coefficient matrix;

Fig. 5c illustrates an exponential distribution of the amplitude size of the C_r component in a DCT coefficient matrix;

Fig. 6 illustrates a Golomb-Rice encoding process;

Fig. 7 illustrates an apparatus for Golomb-Rice encoding;

Fig. 8 illustrates a process of encoding DC component values;

Fig. 9 illustrates an apparatus for lossless compression; and

Fig. 10 illustrates a method of hybrid lossless compression.

Detailed description

To make digital transmission of digital signals practical and to enjoy its benefits, it is generally necessary to employ some form of signal compression. While achieving a high compression ratio in the resulting image, it is equally important to maintain high image quality. Computational efficiency for compact hardware implementation is also desirable, which is important in many applications.

Before one embodiment of the invention is explained in detail, it is to be understood that the invention is not limited in its application to the details of construction and the arrangement of components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments and of being carried out in various ways. Also, it is to be understood that the phraseology and terminology used herein are for the purpose of description and should not be regarded as limiting.

The image compression used in one aspect of an embodiment is based on discrete cosine transform (DCT) techniques, such as those disclosed in co-pending U.S. patent application "Contrast Sensitive Variance Based Adaptive Block Size DCT Image Compression," Serial No. 09/436,085, filed November 8, 1999, assigned to the assignee of the present application and incorporated herein by reference. Image compression and decompression systems using the DCT are described in co-pending U.S. patent application "Quality Based Image Compression," Serial No. 09/494,192, filed January 28, 2000, assigned to the assignee of the present application and incorporated herein by reference. Generally, an image to be processed in the digital domain is composed of pixel data divided into an array of non-overlapping blocks of size N×N. A two-dimensional DCT may be performed on each block. The two-dimensional DCT is defined by the following relationship:

$X(k, l) = \frac{\alpha(k)\,\beta(l)}{\sqrt{N \cdot M}} \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} x(m, n) \cos\left[\frac{(2m+1)\pi k}{2N}\right] \cos\left[\frac{(2n+1)\pi l}{2N}\right], \quad 0 \leq k, l \leq N-1$

where $\alpha(k), \beta(k) = \begin{cases} 1, & \text{if } k = 0 \\ \sqrt{2}, & \text{if } k \neq 0 \end{cases}$, and

x(m, n) is the pixel at location (m, n) within an N×M block, and

X(k, l) is the corresponding DCT coefficient.
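A direct, unoptimized evaluation of the relation above can be written as follows. The sketch assumes a square block (M = N, so the normalization reduces to 1/N) and assumes that β is defined exactly like α. It is meant to check the formula on small blocks, not to be an efficient transform.

```python
import math
from typing import List


def alpha(k: int) -> float:
    """Scale factor from the definition above: 1 for k = 0, sqrt(2) otherwise."""
    return 1.0 if k == 0 else math.sqrt(2.0)


def dct2(block: List[List[float]]) -> List[List[float]]:
    """Two-dimensional DCT of a square N x N block, evaluated term by term."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in range(n):
            total = 0.0
            for m in range(n):
                for j in range(n):  # j plays the role of the column index n in the formula
                    total += (
                        block[m][j]
                        * math.cos((2 * m + 1) * math.pi * k / (2 * n))
                        * math.cos((2 * j + 1) * math.pi * l / (2 * n))
                    )
            out[k][l] = alpha(k) * alpha(l) * total / n
    return out


constant = [[10.0] * 4 for _ in range(4)]
coeffs = dct2(constant)
print(round(coeffs[0][0], 6))  # all of the energy of a constant block lands in X(0, 0)
```

For a constant block every coefficient except X(0, 0) evaluates to zero up to rounding error, which is the extreme case of the energy compaction that makes the DCT attractive for compression.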

Since pixel values are non-negative, the DCT component X(0, 0) is always positive and usually has the most energy. In fact, for typical images, most of the transform energy is concentrated around the component X(0, 0). This energy compaction property is what makes the DCT technique such an attractive compression method.

The image compression technique uses contrast adaptive coding to achieve further bit rate reduction. It has been observed that most natural images are made up of relatively slowly varying flat areas and busy areas such as object boundaries and high-contrast texture. Contrast adaptive coding schemes take advantage of this by assigning more bits to the busy areas and fewer bits to the less busy areas.

Contrast adaptive methods use intraframe coding (spatial processing) instead of interframe coding (spatio-temporal processing). Interframe coding inherently requires multiple frame buffers in addition to more complex processing circuits. In many applications, reduced complexity is needed for actual implementation. Intraframe coding is also useful in situations that can make a spatio-temporal coding scheme break down and perform poorly. For example, 24 frame per second movies can fall into this category because the integration time, due to the mechanical shutter, is relatively short. The short integration time allows a higher degree of temporal aliasing. The assumption of frame-to-frame correlation breaks down for rapid motion as it becomes jerky. Intraframe coding is also easier to standardize when both 50 Hz and 60 Hz power line frequencies are involved. Television is currently transmitted at either 50 Hz or 60 Hz. An intraframe scheme, being a digital approach, can adapt to both 50 Hz and 60 Hz operation, or even to 24 frame per second movies, by trading off frame rate against spatial resolution.

For image processing purposes, the DCT operation is performed on pixel data that is divided into an array of non-overlapping blocks. Note that although block sizes are discussed herein as being NxN in size, it is envisioned that various block sizes may be used. For example, an NxM block size may be utilized, where N and M are integers and M may be either greater than or less than N. Another important aspect is that the block is divisible into at least one level of sub-blocks, such as N/ixN/i, N/ixN/j, N/ixM/j, and so on, where i and j are integers. Furthermore, the exemplary block size discussed herein is a 16×16 pixel block with corresponding blocks and sub-blocks of DCT coefficients. It is further envisioned that various other integer sizes, whether both even or both odd, may be used, for example 9×9.

FIGS. 1 and 2 illustrate an image processing system 100 incorporating the concept of configurable serializers. The image processing system 100 comprises an encoder 104 that compresses a received video signal. The compressed signal is transmitted using a transmission channel or a physical medium 108 and is received by a decoder 112. The decoder 112 decodes the received encoded data into image samples, which may then be displayed.

Generally, an image is divided into blocks of pixels for processing. A color signal may be converted from RGB space to YC₁C₂ space using an RGB to YC₁C₂ converter 116, where Y is the luminance, or brightness, component, and C₁ and C₂ are the chrominance, or color, components. Because of the low spatial sensitivity of the eye to color, many systems sub-sample the C₁ and C₂ components by a factor of four in the horizontal and vertical directions. However, sub-sampling is not necessary. A full resolution image, known as the 4:4:4 format, may be either very useful or necessary in some applications such as those referred to as "digital cinema". Two possible YC₁C₂ representations are the YIQ representation and the YUV representation, both of which are well known in the art. It is also possible to employ a variation of the YUV representation known as YC_bC_r. This may be further broken into odd and even components. Accordingly, in one embodiment the representation Y-even, Y-odd, C_b-even, C_b-odd, C_r-even, C_r-odd is used.

In a preferred embodiment, each of the even and odd Y, C_b, and C_r components is processed without sub-sampling. Thus, an input of each of the six components of a 16×16 block of pixels is provided to the encoder 104. For illustration purposes, the encoder 104 for the Y-even component is shown. Similar encoders are used for the Y-odd component and for the even and odd C_b and C_r components. The encoder 104 comprises a block size assignment element 120, which performs block size assignment in preparation for video compression. The block size assignment element 120 determines the block decomposition of the 16×16 block based on the perceptual characteristics of the image in the block. Block size assignment subdivides each 16×16 block into smaller blocks, such as 8×8, 4×4, and 2×2, in a quad-tree fashion depending on the activity within the 16×16 block. The block size assignment element 120 generates quad-tree data, called the PQR data, whose length can be between 1 and 21 bits. Thus, if block size assignment determines that a 16×16 block is to be divided, the R bit of the PQR data is set and is followed by four additional bits of Q data corresponding to the four divided 8×8 blocks. If block size assignment determines that any of the 8×8 blocks is to be subdivided, then four additional bits of P data are added for each subdivided 8×8 block.

Referring now to FIG. 3, a flow diagram showing details of the operation of the block size assignment element 120 is provided. The variance of a block is used as a metric in the decision to subdivide a block. Beginning at step 202, a 16×16 block of pixels is read. At step 204, the variance, v16, of the 16×16 block is computed. The variance is computed as follows:

$\mathrm{var} = \frac{1}{N^{2}} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} x_{i,j}^{2} - \left( \frac{1}{N^{2}} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} x_{i,j} \right)^{2}$

where N = 16, and x_{i,j} is the pixel in the i-th row and j-th column within the NxN block. At step 206, the variance threshold T16 is first modified to provide a new threshold T'16 if the mean value of the block is between two predetermined values; the block variance is then compared against the new threshold T'16.

If the variance v16 is not greater than the threshold T16, then at step 208 the starting address of the 16×16 block is written into temporary storage, and the R bit of the PQR data is set to 0 to indicate that the 16×16 block is not subdivided. The algorithm then reads the next 16×16 block of pixels. If the variance v16 is greater than the threshold T16, then at step 210 the R bit of the PQR data is set to 1 to indicate that the 16×16 block is to be subdivided into four 8×8 blocks.

The four 8×8 blocks, i = 1:4, are considered in turn for further subdivision, as shown in step 212. For each 8×8 block, the variance, v8_i, is computed at step 214. At step 216, the variance threshold T8 is first modified to provide a new threshold T'8 if the mean value of the block is between two predetermined values, and the block variance is then compared against this new threshold.

If the variance v8_i is not greater than the threshold T8, then at step 218 the starting address of the 8×8 block is written into temporary storage, and the corresponding Q bit, Q_i, is set to 0. The next 8×8 block is then processed. If the variance v8_i is greater than the threshold T8, then at step 220 the corresponding Q bit, Q_i, is set to 1 to indicate that the 8×8 block is to be subdivided into four 4×4 blocks.

The four 4×4 blocks, j_i = 1:4, are considered in turn for further subdivision, as shown in step 222. For each 4×4 block, the variance, v4_ij, is computed at step 224. At step 226, the variance threshold T4 is first modified to provide a new threshold T'4 if the mean value of the block is between two predetermined values, and the block variance is then compared against this new threshold.

If the variance v4_ij is not greater than the threshold T4, then at step 228 the address of the 4×4 block is written, and the corresponding P bit, P_ij, is set to 0. The next 4×4 block is then processed. If the variance v4_ij is greater than the threshold T4, then at step 230 the corresponding P bit, P_ij, is set to 1 to indicate that the 4×4 block is to be subdivided into four 2×2 blocks. In addition, the four 2×2 blocks are written into temporary storage.
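
The top-down procedure of steps 202 through 230 can be sketched as follows. This is a minimal illustration that uses fixed thresholds only (no mean-dependent adjustment) and omits the address bookkeeping; the function names and the returned grouping of R, Q, and P bits are assumptions made for this example rather than the layout of the actual PQR data.

```python
def variance(block):
    """Mean of squares minus square of the mean over a square block."""
    n = len(block)
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / (n * n)
    return sum(p * p for p in pixels) / (n * n) - mean * mean

def quadrants(block):
    """Split a square block into four equal sub-blocks (row-major order)."""
    h = len(block) // 2
    return [[row[c:c + h] for row in block[r:r + h]]
            for r in (0, h) for c in (0, h)]

def assign_block_sizes(block16, t16, t8, t4):
    """Return (r_bit, q_bits, p_bits) for one 16x16 block using fixed thresholds."""
    if variance(block16) <= t16:
        return 0, [], []
    q_bits, p_bits = [], []
    for block8 in quadrants(block16):
        if variance(block8) <= t8:
            q_bits.append(0)
            continue
        q_bits.append(1)
        # Each subdivided 8x8 block contributes four P bits, one per 4x4 block.
        p_bits.extend(1 if variance(b4) > t4 else 0 for b4 in quadrants(block8))
    return 1, q_bits, p_bits
```

A flat block yields the single bit R = 0, while a block that is busy everywhere yields 1 + 4 + 16 = 21 bits, matching the stated range of 1 to 21 bits for the PQR data.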

The thresholds T16, T8, and T4 may be predetermined constants. This is known as a hard decision. Alternatively, an adaptive or soft decision may be implemented. For example, the soft decision varies the thresholds for the variances depending on the mean pixel value of the 2Nx2N blocks, where N can be 8, 4, or 2. Thus, functions of the mean pixel values may be used as the thresholds.

For purposes of illustration, consider the following example. Let the predetermined variance thresholds for the Y component be 50, 1100, and 880 for the 16×16, 8×8, and 4×4 blocks, respectively. In other words, T16 = 50, T8 = 1100, and T4 = 880. Let the range of mean values be 80 to 100. Suppose the computed variance for the 16×16 block is 60. Since 60 is greater than T16, and the mean value 90 is between 80 and 100, the 16×16 block is subdivided into four 8×8 sub-blocks. Suppose the computed variances for the 8×8 blocks are 1180, 935, 980, and 1210. Since two of the 8×8 blocks have variances that exceed T8, these two blocks are further subdivided to produce a total of eight 4×4 sub-blocks. Finally, suppose the variances of the eight 4×4 blocks are 620, 630, 670, 610, 590, 525, 930, and 690, with corresponding mean values 90, 120, 110, 115. Since the mean value of the first 4×4 block falls in the range (80, 100), its threshold is lowered to T'4 = 200, which is less than 880. So, this 4×4 block is subdivided, as is the seventh 4×4 block, whose variance of 930 exceeds T4.
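
The 4×4 stage of this example can be checked with a short sketch of the soft decision. The helper name `should_split` is hypothetical, the range is treated as inclusive, and, because the text gives mean values only for the first four 4×4 blocks, the remaining means are assumed to lie outside the range.

```python
def should_split(variance, mean, threshold, lowered_threshold, mean_range=(80, 100)):
    """Soft decision: use the lowered threshold when the block mean is in range."""
    low, high = mean_range
    in_range = mean is not None and low <= mean <= high
    effective = lowered_threshold if in_range else threshold
    return variance > effective

# Variances of the eight 4x4 blocks from the example above.
variances = [620, 630, 670, 610, 590, 525, 930, 690]
# Only four means are stated; None marks a mean assumed to be out of range.
means = [90, 120, 110, 115, None, None, None, None]

split = [should_split(v, m, threshold=880, lowered_threshold=200)
         for v, m in zip(variances, means)]
```

Under these assumptions only the first and seventh entries of `split` are true, in agreement with the example.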

Note that a similar procedure is used to assign block sizes for the luminance component Y-odd and the color components C_b and C_r. The color components may be decimated horizontally, vertically, or both.

Additionally, note that although block size assignment has been described as a top-down approach, in which the largest block (16×16 in the present example) is evaluated first, a bottom-up approach may be used instead. The bottom-up approach evaluates the smallest blocks (2×2 in the present example) first.

Referring back to FIG. 1, the PQR data, along with the addresses of the selected blocks, are provided to a DCT element 124. The DCT element 124 uses the PQR data to perform discrete cosine transforms of the appropriate sizes on the selected blocks. Only the selected blocks need to undergo DCT processing.

The image processing system 100 also comprises a DQT element 128 for reducing the redundancy among the DC coefficients of the DCTs. A DC coefficient is found at the top left corner of each DCT block. The DC coefficients are, in general, large compared to the AC coefficients. This discrepancy in magnitude makes it difficult to design an efficient variable length coder. Accordingly, it is advantageous to reduce the redundancy among the DC coefficients.

The DQT element 128 performs 2-D DCTs on the DC coefficients, taken 2×2 at a time. Starting with the 2×2 blocks within the 4×4 blocks, a 2-D DCT is performed on the four DC coefficients. This 2×2 DCT is called the differential quad-tree transform, or DQT, of the four DC coefficients. Next, the DC coefficient of the DQT, along with the three neighboring DC coefficients within an 8×8 block, is used to compute the next level DQT. Finally, the DC coefficients of the four 8×8 blocks within a 16×16 block are used to compute the DQT. Thus, in a 16×16 block, there is one true DC coefficient, and the rest are AC coefficients corresponding to the DCT and DQT.
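
One level of the DQT can be sketched as the N = 2 case of the DCT relationship given earlier. The function name `dqt_2x2` is hypothetical, and the scaling assumes that relationship's normalization; it is an illustration rather than the element's actual implementation.

```python
def dqt_2x2(dc):
    """2-D DCT of a 2x2 group of DC coefficients (one DQT level)."""
    (a, b), (c, d) = dc
    return [
        [(a + b + c + d) / 2, (a - b + c - d) / 2],
        [(a + b - c - d) / 2, (a - b - c + d) / 2],
    ]
```

When the four DC coefficients are equal, only the top-left output is non-zero, which is the redundancy reduction the DQT is intended to provide.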

The transform coefficients (both DCT and DQT) are provided to a quantizer for quantization. In a preferred embodiment, the DCT coefficients are quantized using frequency weighting masks (FWMs) and a quantization scale factor. An FWM is a table of frequency weights with the same dimensions as the block of input DCT coefficients. The frequency weights apply different weights to the different DCT coefficients. The weights are designed to emphasize input samples having frequency content to which the human visual or optical system is more sensitive, and to de-emphasize samples having frequency content to which the visual or optical system is less sensitive. The weights may also be designed based on factors such as viewing distance.

The weights are selected based on empirical data. A method for designing the weighting masks for 8×8 DCT coefficients is disclosed in ISO/IEC JTC1 CD 10918, "Digital compression and encoding of continuous-tone still images - part 1: Requirements and guidelines", International Standards Organization, 1994, which is incorporated herein by reference. In general, two FWMs are designed, one for the luminance component and one for the chrominance components. The FWM tables for block sizes 2×2 and 4×4 are obtained by decimation, and the table for 16×16 by interpolation, of the table for the 8×8 block. The scale factor controls the quality and bit rate of the quantized coefficients.

Thus, each DCT coefficient is quantized according to the following relationship:

where DCT(i,j) is the input DCT coefficient, fwm(i,j) is the frequency weighting mask, q is the scale factor, and DCTq(i,j) is the quantized coefficient. Note that, depending on the sign of the DCT coefficient, the first term inside the brackets is rounded up or down. The DQT coefficients are also quantized using a suitable weighting mask. However, multiple tables or masks can be used and applied to each of the Y, C_b, and C_r components.

At block 130, the AC values are then separated from the DC values and processed separately. For the DC elements, the first DC component value of each segment is encoded. Each subsequent DC component value of each segment is then represented as the difference between it and the DC component value preceding it, and encoded, as in block 134. For lossless encoding, the initial DC component value of each segment and the differences are encoded using Golomb-Rice coding, as described with respect to FIGS. 6 and 8, as in block 138. Golomb-Rice coding of the differences between successive DC component values is advantageous because the differentials of the DC component values tend to have a two-sided exponential distribution. The data may then be temporarily stored using a buffer 142, and then transferred or transmitted to the decoder 112 through the transmission channel 108.

FIG. 8 illustrates a process of encoding DC component values. The process is equally applicable to still images, video images (such as, but not limited to, motion pictures or high-definition television), and audio. For a given segment of data, as in step 804, the first DC component value of the segment is retrieved, as in step 808. The first DC component value is then encoded, as in step 812. Unlike AC component values, DC component values need not be quantized. In one embodiment, a single DC value is used for a 16×16 block regardless of how block size assignment breaks the block down. It is contemplated that any fixed-size block, such as 8×8 or 4×4, or any variable block size as defined by the block size assignment, may be used. The second, or next, DC component value of the given segment is then retrieved, as in step 816. The second DC component value is compared with the first DC component value, and the difference, or residual, is encoded, as in step 820. Thus, the second DC component value need only be represented as the difference between it and the first value. This process is repeated for each DC component value of the segment. Accordingly, an inquiry is made, as in step 824, as to whether the end of the segment (the last block and therefore the last DC value) has been reached. If not (828), the next DC value of the segment is retrieved, as in step 816, and the process repeats. If so (832), the next segment is retrieved, as in step 804, and the process repeats until all of the segments of a frame, and all of the frames of the file, have been processed.
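
The differencing of DC values within one segment might look like the sketch below. The function names are hypothetical, each difference is taken against the immediately preceding DC value, and the subsequent Golomb-Rice coding of the outputs is left out.

```python
def dc_residuals(dc_values):
    """First DC value of a segment as-is, then differences between successive values."""
    if not dc_values:
        return []
    out = [dc_values[0]]
    for prev, cur in zip(dc_values, dc_values[1:]):
        out.append(cur - prev)
    return out

def dc_reconstruct(residuals):
    """Inverse of dc_residuals: a running sum restores the original DC values."""
    values, total = [], 0
    for i, r in enumerate(residuals):
        total = r if i == 0 else total + r
        values.append(total)
    return values
```

Because neighboring DC values tend to be close, the differences cluster around zero, which suits the Golomb-Rice coding that follows.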

The purpose of lossless encoding of the DC component values is to generate residual values that tend to have low variance. With the DCT, the DC coefficient component contributes the most pixel energy. Therefore, by not quantizing the DC component values, the variance of the residuals is reduced.

For the AC elements, the block of data and the frequency weighting masks are then scaled by a quantizer 146, or scale factor element. Quantization of the DCT coefficients reduces a large number of them to zero, which results in compression. In a preferred embodiment, there are 32 scale factors corresponding to average bit rates. Unlike other compression methods such as MPEG2, the average bit rate is controlled based on the quality of the processed image rather than on a target bit rate and buffer status.

To increase compression further, the quantized coefficients are provided to a scan serializer 150. The serializer 150 scans the blocks of quantized coefficients to produce a serialized stream of quantized coefficients. Zigzag scans, column scans, or row scans may be employed. A number of different zigzag scanning patterns, as well as patterns other than zigzag, may also be chosen. A preferred technique employs 8×8 block sizes for the zigzag scanning. Zigzag scanning of the quantized coefficients improves the chances of encountering a large run of zero values. Such zero runs inherently have a decreasing probability and may be efficiently encoded using Huffman codes.
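
A zigzag scan can be sketched as below. The text does not fix a particular zigzag pattern, so this example assumes the familiar JPEG-style ordering; the function names are hypothetical.

```python
def zigzag_order(n):
    """(row, col) visiting order for a JPEG-style zigzag scan of an n x n block."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Anti-diagonals are visited in order of r + c; the direction of travel
    # alternates from one diagonal to the next.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def zigzag_scan(block):
    """Serialize a square block of quantized coefficients in zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

Since quantization tends to zero the high-frequency coefficients, this ordering pushes them to the end of the stream, where they form long zero runs.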

The stream of serialized, quantized AC coefficients is provided to a variable length coder 154. The AC component values may be encoded using either Huffman coding or Golomb-Rice coding. For DC component values, Golomb-Rice coding is used. A run-length coder separates the coefficients into zero and non-zero coefficients, as described in detail with respect to FIG. 6. In one embodiment, Golomb-Rice coding is used. Golomb-Rice coding is efficient at coding non-negative integers with an exponential distribution. Golomb codes are preferable for compression here because they provide shorter codes for exponentially distributed variables.

In Golomb coding of run lengths, Golomb codes are parameterized by a non-negative integer m. For example, given a parameter m, the Golomb code of a positive integer n is represented by the quotient n/m in unary code followed by the remainder in a modified binary code, which is $\lfloor \log_{2} m \rfloor$ bits long if the remainder is less than $2^{\lceil \log_{2} m \rceil} - m$, and $\lceil \log_{2} m \rceil$ bits long otherwise. Golomb-Rice coding is a special case of Golomb coding in which the parameter m is expressed as m = 2^k. In this case, the quotient n/m is obtained by shifting the binary representation of the integer n to the right by k bits, and the remainder is given by the k least significant bits of n. Thus, the Golomb-Rice code is the concatenation of the two. Golomb-Rice coding can be used to encode both positive and negative integers with a two-sided geometric (exponential) distribution given by

$p_{\alpha}(x) = c\,\alpha^{|x|}$ (1)

In (1), α is a parameter that characterizes the decay of the probability of x, and c is a normalization constant. Since $p_{\alpha}(x)$ is monotonic, a sequence of integer values should satisfy

$p_{\alpha}(x_{i} = 0) \geq p_{\alpha}(x_{i} = -1) \geq p_{\alpha}(x_{i} = +1) \geq p_{\alpha}(x_{i} = -2) \geq \cdots$ (2)

As illustrated in FIGS. 4a, 4b, 4c and 5a, 5b, 5c, both the zero runs and the amplitudes in a quantized DCT coefficient matrix have exponential distributions. The distributions illustrated in these figures are based on data from real images. FIG. 4a illustrates the Y component distribution 400 of zero run lengths versus relative frequency. Similarly, FIGS. 4b and 4c illustrate the C_b and C_r component distributions 410 and 420, respectively, of zero run lengths versus relative frequency. FIG. 5a illustrates the Y component distribution 500 of amplitude size versus relative frequency. Similarly, FIGS. 5b and 5c illustrate the C_b and C_r component distributions 510 and 520, respectively, of amplitude size versus relative frequency. Note that the plots in FIGS. 5a, 5b, and 5c represent the distribution of the size of the DCT coefficients. Each size represents a range of coefficient values. For example, a size value of 4 covers the range {-15, -14, ..., -8, 8, ..., 14, 15}, a total of 16 values. Similarly, a size value of 10 covers the range {-1023, -1022, ..., -512, 512, ..., 1022, 1023}, a total of 1024 values. It is seen from FIGS. 4a, 4b, 4c, 5a, 5b, and 5c that both run lengths and amplitude sizes have exponential distributions. The actual distribution of the amplitudes can be shown to fit the following equation:

$p(X_{k,l}) = \frac{\sqrt{2\lambda}}{2} \exp\left\{ -\sqrt{2\lambda}\, |X_{k,l}| \right\}, \quad k, l \neq 0$ (3)

In (3), $X_{k,l}$ represents the DCT coefficient corresponding to frequency k in the vertical dimension and frequency l in the horizontal dimension, with mean $\mu_{x} = \frac{1}{\sqrt{2\lambda}}$ and variance $\sigma^{2} = \frac{1}{2\lambda}$. Accordingly, Golomb-Rice coding used in the manner described is well suited to processing data in DCTs.

Although the following is described with respect to the compression of image data, the embodiments are equally applicable to embodiments that compress audio data. In compressing image data, the image or video data may be, for example, RGB, YIQ, YUV, or YC_bC_r components with linearly or logarithmically encoded pixel values.

FIG. 6 illustrates a process 600 of encoding zero and non-zero coefficients. As the DCT matrix is scanned, the zero and non-zero coefficients are separated and processed separately, as in step 604. For zero data, the length of the zero run is determined, as in step 608. Note that run lengths are positive integers. For example, if the run length is found to be n, then a Golomb parameter m is determined, as in step 612. In one embodiment, the Golomb parameter is determined as a function of the run length. In another embodiment, the Golomb parameter (m) is determined by equation (4).

Optionally, the run lengths and the associated Golomb parameters are counted by a counter or register, as in step 616. To encode the zero run length n, a quotient is encoded, as in step 620. In one embodiment, the quotient is determined as a function of the zero run length and the Golomb parameter. In another embodiment, the quotient (Q) is determined by the following equation (5):

$Q = \lfloor n / 2^{m} \rfloor$ (5)

In one embodiment, the quotient Q is encoded in unary code, which requires Q+1 bits. Next, a remainder is encoded, as in step 624. In one embodiment, the remainder is encoded as a function of the run length and the quotient. In another embodiment, the remainder (R) is determined using the following equation (6):

$R = n - 2^{m} Q$ (6)

In one embodiment, the remainder R is encoded as an m-bit binary code. Once the quotient Q and the remainder R have been determined, the codes for Q and R are concatenated, as in step 628, to form the overall code for the zero run length n.
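
The coding of a run length n by steps 620 through 628 can be sketched as follows. The quotient is taken as n divided by 2^m, rounded down, which is consistent with equation (6); the unary convention (Q ones followed by a terminating zero) and the function names are assumptions for this example, and the choice of the parameter m is left to the caller.

```python
def rice_encode(n, m):
    """Golomb-Rice code of a non-negative integer n with divisor 2**m, as a bit string."""
    q = n >> m                      # quotient: n divided by 2**m, rounded down
    r = n - (q << m)                # remainder: R = n - 2**m * Q
    unary = "1" * q + "0"           # Q + 1 bits; ones-then-zero is an assumed convention
    binary = format(r, "b").zfill(m) if m > 0 else ""
    return unary + binary           # concatenation of the two codes

def rice_decode(bits, m):
    """Inverse of rice_encode for a single code word."""
    q = bits.index("0")
    r = int(bits[q + 1:q + 1 + m], 2) if m > 0 else 0
    return (q << m) + r
```

For example, with m = 2 a run length of 9 gives Q = 2 and R = 1, so the code word is the unary part `110` followed by the two-bit remainder `01`.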

Non-zero coefficients are also encoded using Golomb-Rice coding. Since a coefficient amplitude can be positive or negative, it would ordinarily be necessary to use a sign bit and to encode the absolute value of the given amplitude. Given that the amplitude of a non-zero coefficient is x, the amplitude may be expressed as a function of its absolute value and its sign. Accordingly, the amplitude may be expressed as y using the following equation (7):

$y = \begin{cases} 2x, & \text{if } x \geq 0 \\ 2|x| - 1, & \text{otherwise} \end{cases}$ (7)

Accordingly, the value of a non-zero coefficient is optionally counted by a counter or register, as in step 632. It is then determined, in step 636, whether the amplitude is greater than or equal to zero. If it is, the value is encoded as twice the given value, as in step 640. If not, the value is encoded as one less than twice its absolute value, as in step 644. It is contemplated that other mapping schemes may also be employed. The key is that an extra bit to distinguish the sign of the value is not needed.

Encoding amplitudes as expressed by equation (7) results in positive values of x becoming even integers and negative values becoming odd integers. Further, this mapping preserves the probability ordering of x given in (2). An advantage of the encoding illustrated in equation (7) is that it avoids using a sign bit to represent positive and negative numbers. After the mapping is done, y is encoded in the same manner as was done for the zero runs. The procedure continues until all coefficients in the current block have been scanned.
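
The sign-free mapping of steps 636 through 644 can be written out in a few lines. The function names are hypothetical, and the inverse mapping is included only to show that no information is lost.

```python
def map_signed(x):
    """Map a signed amplitude to a non-negative integer: evens for x >= 0, odds for x < 0."""
    return 2 * x if x >= 0 else 2 * abs(x) - 1

def unmap_signed(y):
    """Recover the signed amplitude from the mapped value."""
    return y // 2 if y % 2 == 0 else -((y + 1) // 2)
```

The amplitudes 0, -1, +1, -2, +2 map to 0, 1, 2, 3, 4, so the ordering by probability in (2) carries over to the mapped values.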

It is important to note that although embodiments of the invention determine the values of coefficients and run lengths as functions of equations (1)-(7), the exact equations (1)-(7) need not be used. It is the exploitation of Golomb-Rice encoding together with the exponential distribution of DCT coefficients that allows more efficient compression of image and audio data.

Because zero runs cannot be distinguished from non-zero amplitudes once they have been encoded, a special fixed-length prefix code is needed to mark the occurrence of the first zero run. It is common for all remaining coefficients in a block to be zero after a non-zero amplitude has been encountered. In such cases, it may be more efficient to use an end-of-block (EOB) code than a Golomb-Rice code. The EOB code may likewise be a special fixed-length code.
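As an illustration only, the following sketch splits a scanned coefficient sequence into zero-run symbols, amplitude symbols, and a trailing end-of-block marker. The tags `"RUN"`, `"AMP"`, and `"EOB"` are hypothetical stand-ins for the prefix and EOB codes, whose bit patterns are not given in the text, and the function name `tokenize_block` is likewise an assumption.

```python
from typing import List, Tuple

Token = Tuple[str, int]


def tokenize_block(coeffs: List[int]) -> List[Token]:
    """Split a serialized coefficient block into run, amplitude and EOB tokens."""
    # Find the position just past the last non-zero coefficient.
    last = len(coeffs)
    while last > 0 and coeffs[last - 1] == 0:
        last -= 1

    tokens: List[Token] = []
    run = 0
    for c in coeffs[:last]:
        if c == 0:
            run += 1
        else:
            if run:
                tokens.append(("RUN", run))  # zero run preceding this amplitude
                run = 0
            tokens.append(("AMP", c))
    if last < len(coeffs):
        # Trailing zeros are replaced by one EOB marker instead of a coded run.
        tokens.append(("EOB", 0))
    return tokens
```

For example, the sequence 5, 0, 0, -3, 1, 0, 0, 0 would yield an amplitude, a run of two, two more amplitudes, and a single EOB marker in place of the three trailing zeros.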

According to equation (1) or (3), the probability distribution of the amplitudes or run lengths in the DCT coefficient matrix is parameterized by α or λ. This implies that coding efficiency can be improved by taking into account the context in which a particular DCT coefficient block arises. An appropriate Golomb-Rice coefficient can then be used to encode the quantity of interest. In one embodiment, counters and registers are used for each run-length and amplitude value to compute the respective cumulative value and the corresponding number of times that value occurs. For example, if the registers holding the cumulative value and the number of accumulated elements are R_r1 and N_r1, respectively, the following equation (6) can be used as the Golomb-Rice coefficient for encoding the run length:

A similar procedure can be used for the amplitudes.
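The counter-and-register arrangement can be sketched as follows. Equation (6) is not reproduced in this text, so the estimator below assumes one plausible form, the ceiling of the base-2 logarithm of the running mean R/N; that formula, the class name `RiceParameterEstimator`, and the handling of small means are all assumptions made for illustration rather than the patented equation.

```python
import math


class RiceParameterEstimator:
    """Tracks a cumulative value R and a count N for one kind of symbol."""

    def __init__(self) -> None:
        self.total = 0  # R: cumulative sum of the observed values
        self.count = 0  # N: number of values accumulated

    def update(self, value: int) -> None:
        self.total += value
        self.count += 1

    def parameter(self) -> int:
        # Assumed rule: m = ceil(log2(R / N)), clamped to 0 when the
        # running mean is at most 1 or nothing has been observed yet.
        if self.count == 0 or self.total <= self.count:
            return 0
        return math.ceil(math.log2(self.total / self.count))
```

One estimator instance would be kept for run lengths and another for amplitudes, so that each quantity is coded with a parameter adapted to its own statistics.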

Residual pixels are generated by first decompressing the compressed data with the ABSDCT decoder and then subtracting the result from the source data. The smaller the dynamic range of the residual, the higher the compression ratio. Because the compression is block-based, the residual is also generated block by block. It is well known that residual pixels have a two-sided exponential distribution, usually centered at zero. Because Golomb-Rice codes are well suited to such data, a Golomb-Rice encoding process is used to compress the residual data. However, since there are no run lengths to encode, no special codes are needed, and no EOB code is needed either. The compressed data therefore consists of two components: one from the lossy compressor and the other from the lossless compressor.
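The relationship between the lossy component and the residual component can be shown with a small sketch. The function `lossy_roundtrip` is a hypothetical stand-in for ABSDCT compression followed by decompression (it merely applies coarse uniform quantization to each pixel); only the subtraction and the exact reconstruction reflect the scheme described above.

```python
from typing import List

Block = List[List[int]]


def lossy_roundtrip(block: Block, step: int = 8) -> Block:
    """Stand-in for lossy encode + decode: quantize each pixel to a bin centre."""
    return [[(p // step) * step + step // 2 for p in row] for row in block]


def residual(source: Block, decoded: Block) -> Block:
    """Residual pixels: source minus the decompressed lossy data."""
    return [[s - d for s, d in zip(rs, rd)] for rs, rd in zip(source, decoded)]


def reconstruct(decoded: Block, res: Block) -> Block:
    """Adding the residual back to the lossy data restores the source exactly."""
    return [[d + r for d, r in zip(rd, rr)] for rd, rr in zip(decoded, res)]
```

With a step of 8 the residual of every pixel lies between -4 and 3, which illustrates why a good lossy stage leaves a residual with a small dynamic range.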

Temporal correlation can also be exploited when encoding motion sequences. To exploit temporal correlation fully, the pixel displacement due to motion is first estimated, and motion-compensated prediction is then performed to obtain the residual pixels. Because ABSDCT performs adaptive block-size encoding, the block size information may alternatively be used as a measure of the displacement due to motion. As a further simplification, scene change detection is not used. Instead, intra-frame compressed data is first obtained for each frame in the sequence. The difference between the DCT of the current frame and the DCT of the previous frame is then generated on a frame-by-frame basis. This is further described in U.S. Patent Application Serial No. 09/877,578, filed June 7, 2001, which is incorporated herein by reference. These DCT-domain residuals are encoded using Huffman and Golomb-Rice encoding. The final compressed output is whichever one uses the fewest bits per frame.

The lossless compression algorithm is a hybrid scheme that lends itself to repurposing and transcoding: the lossless portion can simply be stripped off. Using ABSDCT maximizes pixel correlation in the spatial domain, so the residual pixels have a lower variance than those produced by prediction schemes. The lossy portion of the overall system allows the user to achieve the quality and data rate required for distribution without resorting to inter-frame processing, which eliminates motion-related artifacts and significantly reduces implementation complexity. This is especially important for programs distributed for digital cinema applications, because the lossy portion of the compressed material must be distributed at a high level of quality.

FIG. 9 illustrates a hybrid lossless encoding apparatus 900. FIG. 10 illustrates a process that may run on that apparatus. Source digital information 904 resides on a storage device or is transmitted. Many of the elements in FIG. 9 are described in detail with respect to FIGS. 1 and 2. Frames of data are sent to a compressor 908, which comprises a block size assignment element 912, a DCT/DQT transform element 916, and a quantizer 920. Once the DCT/DQT has been performed on the data, the data is in the frequency domain. As one output 922, the data quantized by quantizer 920 is passed to an output 924, which comprises storage and/or a switch. All of the processing described above is performed on an intra-frame basis.

The quantizer output is also sent to a decompressor 928. Decompressor 928 undoes the compressor processing by means of an inverse quantizer 932 and an IDQT/IDCT 936, together with knowledge of the PQR data as defined by the BSA. The decompressor result 940 is supplied to a subtractor 944, where it is compared with the source data. Subtractor 944 may be any of a variety of elements, such as a differencer, that computes the residual pixels as the difference between the uncompressed pixels and the compressed-then-decompressed pixels of each block. In addition, the differencer may obtain residuals in the DCT domain for conditional inter-frame coding of each block. The result 948 of the comparison between the decompressed data and the source data is the pixel residual file. That is, result 948 represents the loss incurred by the data in being compressed and decompressed. The source data therefore equals the combination of output 922 and result 948. Result 948 is then serialized 952, Huffman and/or Golomb-Rice encoded 956, and provided as a second output 960. The Huffman and/or Golomb-Rice encoder 956 is a type of entropy encoder and encodes the residual pixels using Golomb-Rice coding. The decision between intra-frame and inter-frame coding is made for each frame according to which uses fewer bits. Golomb-Rice coding of the residuals improves the compression ratio of the overall system.

Thus, the lossless, inter-frame output is a combination, or hybrid, of two sets of data: the lossy, high-quality image file (922, or A) and the residual file (960, or C).

Inter-frame coding may also be used. The quantizer output, together with knowledge of the BSA, is sent to a memory 964. Once a frame's worth of data has been collected, a subtractor 966 compares the stored frame 964 with the next frame 968. The difference yields a DCT residual 970, which is then serialized and/or Golomb-Rice encoded 974 to provide a third output data set 976 to output 924. In this way, the inter-frame lossless files B and C are compiled. Either combination (A+C or B+C) can then be chosen based on size considerations. Further, a purely inter-frame output may be desirable for editing purposes.
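The size-based choice between the intra-frame data and the DCT-domain difference can be sketched as below. The bit-cost estimate `rice_bits` (a Rice code length with a fixed parameter `k`) and the function names are assumptions introduced for illustration; the text states only that the alternative using fewer bits per frame is selected.

```python
from typing import List, Tuple


def rice_bits(value: int, k: int = 2) -> int:
    """Assumed cost model: length in bits of a Rice code with parameter k."""
    # Fold the sign into an unsigned value (zero is allowed here).
    u = 2 * value if value >= 0 else -2 * value - 1
    return (u >> k) + 1 + k  # unary quotient, terminator, k remainder bits


def frame_cost(coeffs: List[int], k: int = 2) -> int:
    return sum(rice_bits(c, k) for c in coeffs)


def choose_mode(current: List[int], previous: List[int], k: int = 2) -> Tuple[str, List[int]]:
    """Return ("inter", difference) if the DCT-domain difference is cheaper,
    otherwise ("intra", current coefficients)."""
    diff = [c - p for c, p in zip(current, previous)]
    if frame_cost(diff, k) < frame_cost(current, k):
        return "inter", diff
    return "intra", list(current)
```

When consecutive frames are similar the difference is small and the inter-frame alternative wins; after an abrupt change the difference is large and the intra-frame data is kept, which is why no separate scene change detector is required in this arrangement.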

Returning to FIG. 1, the compressed image signal generated by encoder 104 may be temporarily stored in a buffer 142 and then transmitted to decoder 112 over transmission channel 108. Transmission channel 108 may be a physical medium, such as a magnetic or optical storage device, or a wired or wireless conveyance process or apparatus. The PQR data, which contains the block size assignment information, is also provided to decoder 112 (FIG. 2). Decoder 112 comprises a buffer 164 and a variable-length decoder 168, which decodes the run lengths and non-zero values. Variable-length decoder 168 operates in a manner similar but opposite to that described with respect to FIG. 6.

The output of variable-length decoder 168 is provided to a deserializer 172, which orders the coefficients according to the scan scheme that was used. For example, if a mixture of zigzag, vertical, and horizontal scanning was used, deserializer 172 re-orders the coefficients appropriately using its knowledge of the scan type employed. Deserializer 172 also receives the PQR data to help it arrange the coefficients correctly into a composite coefficient block.
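The re-ordering performed by the deserializer can be sketched for a single square block. The exact zigzag pattern is not specified in this passage, so the sketch assumes the conventional pattern that starts at the top-left coefficient and alternates direction along anti-diagonals; the function names and the `scan` argument are hypothetical.

```python
from typing import List, Tuple


def zigzag_order(n: int) -> List[Tuple[int, int]]:
    """(row, col) positions of an n x n block in the assumed zigzag order."""
    order: List[Tuple[int, int]] = []
    for s in range(2 * n - 1):
        # Positions on anti-diagonal s, listed by increasing row.
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Even anti-diagonals run from bottom-left to top-right, odd ones the reverse.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def deserialize(coeffs: List[int], n: int, scan: str = "zigzag") -> List[List[int]]:
    """Place a serialized coefficient list back into an n x n block."""
    if len(coeffs) != n * n:
        raise ValueError("expected n*n coefficients")
    if scan == "zigzag":
        order = zigzag_order(n)
    elif scan == "horizontal":
        order = [(r, c) for r in range(n) for c in range(n)]
    elif scan == "vertical":
        order = [(r, c) for c in range(n) for r in range(n)]
    else:
        raise ValueError("unknown scan type")
    block = [[0] * n for _ in range(n)]
    for value, (r, c) in zip(coeffs, order):
        block[r][c] = value
    return block
```

In a full decoder the block size `n` for each block would come from the PQR data, since the adaptive block size assignment determines how many coefficients belong to each block.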

The composite block is provided to an inverse quantizer 174, which undoes the processing performed with the quantizer scale factor and the frequency weighting masks.

If the differential quadtree transform was applied, the coefficient block is then provided to an IDQT element 186, followed by an IDCT element 190. Otherwise, the coefficient block is provided directly to IDCT element 190. IDQT element 186 and IDCT element 190 inverse-transform the coefficients to produce a block of pixel data. The pixel data must then be interpolated, converted to RGB form, and stored for later display.

FIG. 7 illustrates an apparatus 700 for Golomb-Rice encoding. The apparatus of FIG. 7 preferably implements the process described with respect to FIG. 6. A determiner 704 determines the run length (n) and the Golomb coefficient (m). Optionally, a counter or register 708 is used for each run-length and amplitude value to compute the respective cumulative value and the corresponding number of times that value occurs. An encoder 712 encodes the quotient (Q) as a function of the run length and the Golomb coefficient. Encoder 712 also encodes the remainder (R) as a function of the run length, the Golomb coefficient, and the quotient. In other embodiments, encoder 712 also encodes non-zero data as a function of the non-zero data value and its sign. A concatenator 716 concatenates the Q value with the R value.
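The quotient, remainder, and concatenation steps can be sketched as follows, assuming the usual Golomb-Rice convention in which the divisor is 2 to the power m, the quotient is sent in unary (here as Q ones followed by a zero), and the remainder is sent as m binary digits. That bit convention and the function names are assumptions; the passage above states only that Q and R are functions of n and m and are concatenated.

```python
def rice_encode(n: int, m: int) -> str:
    """Encode a non-negative integer n with Golomb coefficient m as a bit string."""
    if n < 0 or m < 0:
        raise ValueError("n and m must be non-negative")
    q = n >> m            # quotient Q = floor(n / 2**m)
    r = n - (q << m)      # remainder R = n - Q * 2**m
    unary = "1" * q + "0"
    binary = format(r, "b").zfill(m) if m > 0 else ""
    return unary + binary  # concatenation of the Q code and the R code


def rice_decode(bits: str, m: int) -> int:
    """Decode one codeword produced by rice_encode."""
    q = bits.index("0")   # number of leading ones is the quotient
    r = int(bits[q + 1:q + 1 + m], 2) if m > 0 else 0
    return (q << m) + r
```

Under these assumptions, a run length of 9 with m equal to 2 gives Q = 2 and R = 1, so the codeword is the unary code `110` followed by the two remainder bits `01`.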

By way of example, the various illustrative logical blocks, flowcharts, and steps described in connection with the embodiments disclosed herein may be implemented or performed in hardware or software with an application-specific integrated circuit (ASIC), a programmable logic device, discrete gate or transistor logic, discrete hardware components such as registers and FIFOs, a processor executing a set of firmware instructions, any conventional programmable software and a processor, or any combination thereof. The processor is preferably a microprocessor, but may alternatively be any conventional processor, controller, microcontroller, or state machine. The software may reside in RAM memory, flash memory, ROM memory, registers, a hard disk, a removable disk, a CD-ROM, a DVD-ROM, or any other form of storage medium known in the art.

The foregoing description of the preferred embodiments is provided to enable any person skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments without the exercise of inventive faculty. The invention is therefore not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Other features and advantages of the invention are set forth in the claims.

Claims

1. An apparatus for encoding data comprising a source image, the apparatus comprising:

means for compressing data representing said source image and thereby creating a compressed version of said source image, wherein said compression uses data previously generated by adaptive block resizing of the source image;

means for quantizing a compressed version of said source image and thereby creating a lossy version of said source image;

means for decompressing a compressed version of the source image to create a decompressed image, wherein the decompression uses data previously generated by adaptive block resizing of the source image;

means for determining a difference between said source image and said decompressed image and thereby creating residual data associated with said source image; and

means for outputting the lossy version of the source image and the residual data, wherein the lossy version of the source image and the residual data are operable to create an image substantially identical to the source image.

2. The apparatus of claim 1, wherein the means for compressing is on an intraframe or interframe basis.

3. The apparatus of claim 1, wherein the means for compressing uses a combination of discrete cosine transform and discrete quadtree transform techniques.

4. A method of encoding data comprising a plurality of source frames from a source image, the method comprising:

compressing data representing a first source frame of said plurality of source frames, and thereby creating a compressed version of said first source frame, wherein said compression uses data previously generated by adaptive block resizing of said plurality of source frames;

quantizing the compressed version of the first source frame and thereby creating a lossy version of the first source frame;

decompressing a compressed version of the first source frame to create a decompressed frame, wherein the decompressing uses data previously generated by adaptive block resizing of the plurality of source frames;

determining a difference between a second source frame and said decompressed frame and thereby creating residual data associated with said first source frame; and

outputting the lossy version of the first source frame and the residual data, wherein the lossy version of the first source frame and the residual data may be used to create a frame substantially identical to the first source frame.

5. The method of claim 4, wherein the compressing is performed on an interframe basis.

6. The method of claim 4, wherein said compressing uses a combination of discrete cosine transform and discrete quadtree transform techniques.

7. An apparatus for encoding data comprising a plurality of source frames from a source image, the apparatus comprising:

means for compressing data representing a first source frame of said plurality of source frames, and thereby creating a compressed version of said first source frame, wherein said compression uses data previously generated by adaptive block resizing of said plurality of source frames;

means for quantizing a compressed version of said first source frame and thereby creating a lossy version of said first source frame;

means for decompressing a compressed version of the first source frame to create a decompressed frame, wherein the decompression uses data previously generated by adaptive block resizing of the plurality of source frames;

means for determining a difference between a second source frame and said decompressed frame and thereby creating residual data associated with said first source frame; and

means for outputting the lossy version of said first source frame and said residual data, wherein said lossy version of said first source frame and said residual data can be used to create a frame substantially identical to said first source frame.

8. The apparatus of claim 7, wherein the means for compressing is on an interframe basis.

9. The apparatus of claim 7, wherein the means for compressing uses a combination of discrete cosine transform and discrete quadtree transform techniques.