
CN101247525A - A Method of Improving the Intra-Frame Coding Rate of Image - Google Patents


Info

Publication number
CN101247525A
CN101247525A CN200810102517A CN101247525B
Authority
CN
China
Prior art keywords
mode
prediction
sigma
candidate set
blocks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 200810102517
Other languages
Chinese (zh)
Other versions
CN101247525B (en)
Inventor
邓中亮
段大高
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN 200810102517 priority Critical patent/CN101247525B/en
Publication of CN101247525A publication Critical patent/CN101247525A/en
Application granted granted Critical
Publication of CN101247525B publication Critical patent/CN101247525B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present invention discloses a method for increasing intra-frame encoding speed based on the H.264 standard. According to the texture direction of the image, the method screens out the prediction modes lying along the texture direction and only then performs the RDO calculation, greatly reducing RDO computation time. The method effectively increases image encoding speed while maintaining good image quality.

Description

A Method of Improving the Intra-Frame Coding Rate of Images

Technical Field

The present invention relates to video compression coding, and in particular to a method for improving the intra-frame coding rate of H.264-based image coding.

Background Art

Video/image coding methods are numerous and their applications touch every field. To enable terminals from different manufacturers to exchange information with one another, or to receive information from a common signal source, several international organizations began working on video/image coding standardization in the late 1980s. Major manufacturers took a strong interest and directly pushed the standardization effort forward. Finally, in 1988, the International Telephone and Telegraph Consultative Committee (CCITT, renamed ITU-T, the International Telecommunication Union Telecommunication Standardization Sector, in 1992) issued the first video coding standard, H.261, a milestone in the history of video coding. Subsequently, ISO MPEG (Moving Picture Experts Group) and ITU-T VCEG (Video Coding Experts Group), addressing different application environments and requirements and taking H.261 as foundation and core, formulated a series of video coding standards. ITU-T, focusing on real-time video communication, released the H.26x series (H.261, H.262, H.263, H.264, etc.), while ISO MPEG, oriented mainly toward video storage media, television broadcasting and multimedia communication, produced the MPEG-x series (MPEG-1, MPEG-2, MPEG-4, etc.). Figure 1 shows the evolution of video coding standards; the development of the international standards is briefly reviewed below.

In 1985 CCITT organized Study Group 15 to pursue the standardization of videoconferencing, and in 1988 it issued the H.261 standard for 64 kbit/s videophone/videoconference applications [30]. H.261 defines intra-frame (I-frame) and inter-frame (P-frame) coding and employs techniques such as inter-frame prediction, the DCT and Huffman coding. To allow flexibility, H.261 places strict normative requirements only on compatibility-related aspects such as bitstream syntax, bitstream multiplexing and the decoding process, while leaving unrestricted those parts, such as adaptive control of the quantization level, motion estimation and rate control, that strongly affect reconstructed image quality but not compatibility, giving developers, manufacturers and users considerable latitude. The successful launch of H.261 greatly encouraged manufacturers, standards bodies and research institutes, and set off a wave of research on and application of video compression coding.

In 1991 ISO MPEG began work on the MPEG-1 standard, whose main goal was a standard for the coding of moving pictures, associated audio and their combination, suitable for digital storage, media storage and retrieval; it became an international standard in November 1993. Built on the framework of H.261, MPEG-1 introduced bidirectionally predicted frames (B-frames), half-pixel-precision motion estimation and the concept of the group of pictures (GOP), enabling functions such as random access, fast forward/backward search and reverse playback.

ISO MPEG, jointly with ITU-T VCEG, then started the MPEG-2 draft [32] (known as H.262 in the ITU-T series), which was finalized as a standard in 1994. MPEG-2 builds on MPEG-1 with important extensions. For interlaced conventional television it provides dedicated frame-coding and field-coding methods. It created a two-level stream structure, the program stream and the transport stream: the transport stream is designed for environments where serious errors may occur, whereas the program stream targets environments where errors are rare. According to the complexity of the coding tools, it introduced for the first time the concepts of profiles and levels, elegantly solving the interchangeability and internationalization of bitstreams. It also added the notion of scalability, allowing video signals of different quality levels or different spatio-temporal resolutions to be derived from a single coded stream; scalability covers the spatial, SNR and temporal dimensions, among others. MPEG-2 has been a very successful standard and is deeply woven into everyday life and work.

With the development and spread of networking, network bandwidth increasingly became a bottleneck for video/image applications. To alleviate this, ITU-T VCEG proposed the H.263 coding standard for low-bit-rate applications in 1995. Based on H.261, it absorbed the effective and reasonable techniques of MPEG-1/2 and offered four optional coding modes, namely the unrestricted motion vector mode, syntax-based arithmetic coding, the advanced prediction mode and the PB-frames mode, further improving coding efficiency. H.263 also extended the supported picture formats to include QCIF, Sub-QCIF, CIF, 4CIF and 16CIF. Moreover, the standard does not fix the number of frames per second, so the maximum rate can be limited by reducing the frame rate. On the basis of H.263, ITU-T VCEG launched two work plans: a short-term plan to add new optional features to H.263, further improving compression efficiency and extending functionality, and a long-term plan to develop a new international standard suited to low-bit-rate video communication. The short-term plan produced the successive H.263+ and H.263++ versions. The standard expected from the long-term plan was called H.263L, renamed H.26L in 1998.

In 1998 ISO MPEG put forward the MPEG-4 standard, which integrates technologies and functions from digital television, interactive graphics and the Internet, extending and complementing H.263, MPEG-1 and MPEG-2. MPEG-4 covers a very wide range of coding rates, from as low as 5 kbit/s to above 2 Mbit/s, and proposed a coding concept entirely different from earlier image coding standards: drawing on object-based coding, its scheme is built on arbitrarily shaped object models and, compared with traditional standards, adds shape information to the description of an image. MPEG-4 is no longer a single standardized fixed algorithm but an extensible set of coding tools from which various algorithms can be constructed. Without decoding, it supports content-based processing and bitstream editing, the composition of synthetic and natural images/audio, content-based random access, and so on, while offering good robustness in different application environments and supporting content-based scalable coding. In parallel, ISO MPEG also developed standards such as MPEG-7 and MPEG-21, providing a standardized description for all kinds of multimedia information and an efficient, transparent and interoperable media framework.

To further improve video coding efficiency, ISO and ITU-T formally established the Joint Video Team (JVT) in December 2001 and began the standardization of H.264/MPEG-4 Part 10 (AVC, Advanced Video Coding; referred to simply as H.264 herein), which was formally adopted as an international standard in May 2003. Based on H.26L, it extends the range of applications from low to high bit rates. Besides inheriting the strengths of earlier standards, H.264 introduces many new techniques, so that at the same decoded quality its coding efficiency is nearly 50% higher than that of H.263 and MPEG-4.

It is also worth noting that in June 2002 China's Ministry of Information Industry began to formulate an audio-video coding standard with independent intellectual property rights (AVS, Audio Video coding Standard). The goal of AVS is to establish common technical standards for the coding, decoding, processing and representation of digital audio and video, providing efficient and economical codec technology for digital audio/video equipment and systems, aimed mainly at major information-industry applications such as HDTV, HD-DVD, wireless broadband multimedia communication and broadband Internet streaming.

Like earlier video coding standards, H.264 adopts the MC-DCT structure, a hybrid of motion compensation and transform coding; its coding framework is shown in Figure 2. H.264 coding mainly comprises intra prediction, inter prediction (motion estimation and compensation), integer transform, quantization and entropy coding.

H.264 uses two coding modes, intra and inter. Supported video source formats include (YUV) 4:2:0, 4:2:2 and 4:4:4, with both progressive and interlaced sequences; for interlaced frames, H.264 can encode the two fields either independently or together. I-frames (intra-coded frames) are coded in intra mode; P-frames (forward-predicted frames) and B-frames (bidirectionally predicted frames) are coded in inter mode, although intra mode may still be selected at the macroblock layer. Coding proceeds in units of non-overlapping macroblocks (MB, Macro-Block), generally defined as blocks of 16×16 pixels.

For an I-frame, intra prediction is performed first; the prediction residual (the difference between the original and predicted values) is then integer-transformed and quantized, and the quantized coefficients are variable-length or arithmetic coded to produce the compressed bitstream, while the image is reconstructed through inverse transform and inverse quantization to serve as a reference for coding subsequent frames. For a P-frame, high-precision multi-mode, multi-reference-frame motion estimation and intra prediction are carried out first, the inter/intra coding mode and the corresponding partitioning are chosen by rate-distortion optimization (RDO, Rate-Distortion Optimization), and the residual is then transformed, quantized and entropy coded to form the bitstream, with the image again reconstructed by inverse transform and inverse quantization. B-frames are handled similarly to P-frames, except that bidirectional prediction is used for multi-mode, multi-reference-frame motion estimation and intra prediction; the best coding mode is selected by rate-distortion optimization and the residual is transformed, quantized and entropy coded. H.264 additionally defines SI and SP frames.

To improve the network adaptability of the coded stream, H.264 separates the video coding layer (VCL, Video Coding Layer) from the network abstraction layer (NAL, Network Abstraction Layer), as shown in Figure 3: the VCL performs efficient compression of the video images, while the NAL packages and delivers the data in whatever form the network requires.

H.264 achieves a higher video compression ratio, better image quality and good network adaptability, so its applications are wide-ranging, including videophones (fixed or mobile), real-time videoconferencing, video surveillance, Internet video transmission and multimedia storage. To suit such broad use, H.264 specifies only the bitstream, the syntax elements and the decoding process, placing no constraints on the encoder, which makes encoder implementations very flexible.

Intra Prediction in H.264

In H.264, the set of nine optional prediction modes for 4×4 blocks is {mode 0, mode 1, ..., mode 8}: the vertical prediction direction is mode 0, the horizontal direction is mode 1, and so on. The set of four optional modes for 16×16 blocks is {mode 0, mode 1, ..., mode 3}, corresponding to the vertical, horizontal, DC and plane prediction directions. The 8×8 chroma modes are the same as the 16×16 modes. Except for DC, every prediction mode has a corresponding prediction direction and prediction weights; DC prediction uses the average of the neighbouring boundary pixels, i.e. every pixel of the predicted block equals that average.

The basic intra prediction procedure for a macroblock is as follows:

(1) Determine which neighbouring macroblocks of the current macroblock are available, i.e. whether the macroblocks above, to the left and above-right can be used.

(2) Divide the luma macroblock into sixteen 4×4 blocks and, for each of them in turn, evaluate the nine prediction modes by computing the rate-distortion cost, defined by the Lagrangian function:

J(s,c,\mathrm{mode}\,|\,QP,\lambda_{\mathrm{mode}}) = SSD(s,c,\mathrm{mode}\,|\,QP) + \lambda_{\mathrm{mode}}\cdot R(s,c,\mathrm{mode}\,|\,QP)

Here s is the original pixel-block signal, c is the reconstructed block signal, QP is the macroblock's quantization parameter, λ_mode is the Lagrangian multiplier, λ_mode = 0.85 · 2^(QP/3), and R(s, c, mode|QP) is the coding rate under the given mode and QP, comprising the coded bits of the header information, the prediction mode and all DCT coefficients. SSD(·) is the sum of squared differences between the original 4×4 pixel block and the reconstructed block:

SSD(s,c,\mathrm{mode}\,|\,QP) = \sum_{y=0}^{3}\sum_{x=0}^{3}\bigl(s(x,y)-c(x,y)\bigr)^{2}
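As an illustrative sketch (not part of the patent text), the cost of step (2) can be transcribed into a few lines of Python; the function names `ssd` and `rd_cost` are invented here, and the rate term R is passed in as a pre-computed bit count:

```python
import numpy as np

def ssd(s, c):
    """Sum of squared differences between an original block s and its
    reconstruction c (the SSD term of the Lagrangian cost above)."""
    s = np.asarray(s, dtype=np.int64)
    c = np.asarray(c, dtype=np.int64)
    return int(((s - c) ** 2).sum())

def rd_cost(s, c, rate_bits, qp):
    """Lagrangian rate-distortion cost J = SSD + lambda_mode * R,
    with lambda_mode = 0.85 * 2**(QP/3) as given in the text."""
    lambda_mode = 0.85 * 2 ** (qp / 3)
    return ssd(s, c) + lambda_mode * rate_bits
```

The encoder would evaluate `rd_cost` once per candidate mode and keep the minimum.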

Mode selection is completed in this way for all 4×4 intra blocks, and the rate-distortion costs of all 4×4 blocks are summed.

(3) In 16×16 block mode, select the best of the four prediction modes by computing the SATD (Sum of Absolute Transformed Differences) for every mode and taking the mode with the smallest value as the best 16×16 prediction mode. The SATD serves as the rate-distortion (RD) cost of the 16×16 block: the residual is split into sixteen 4×4 blocks, each is Hadamard-transformed, and the SATD is half the sum of the absolute values of the transform coefficients, as in the formula below, where DiffT(i, j) are the Hadamard transform coefficients.

SATD = \Bigl(\sum_{i,j}\bigl|\mathrm{DiffT}(i,j)\bigr|\Bigr)\big/\,2
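A minimal Python sketch of the SATD computation described above, assuming the residual is already formed as a 16×16 array; the 4×4 Hadamard matrix and the helper names are illustrative, not taken from the patent:

```python
import numpy as np

# 4x4 Hadamard matrix (all entries +/-1, mutually orthogonal rows)
H4 = np.array([[1,  1,  1,  1],
               [1,  1, -1, -1],
               [1, -1, -1,  1],
               [1, -1,  1, -1]])

def satd_4x4(diff):
    """SATD of one 4x4 residual block: half the sum of absolute
    values of its 2-D Hadamard transform coefficients."""
    t = H4 @ np.asarray(diff, dtype=np.int64) @ H4.T
    return int(np.abs(t).sum()) // 2

def satd_16x16(residual):
    """SATD of a 16x16 residual, computed as in the text: split into
    sixteen 4x4 blocks and accumulate each block's SATD."""
    r = np.asarray(residual, dtype=np.int64)
    return sum(satd_4x4(r[i:i + 4, j:j + 4])
               for i in range(0, 16, 4) for j in range(0, 16, 4))
```

Because the Hadamard transform needs only additions and subtractions, SATD is much cheaper to evaluate per mode than a full transform-quantize-reconstruct RDO pass.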

(4) Compare the rate-distortion costs of (2) and (3) and select the better prediction mode as the macroblock's coding mode. Chroma blocks are handled in the same way as 16×16 blocks.

For the luma and chroma blocks of one macroblock there are N8 × (N4 × 16 + N16) mode combinations in total [7], where N8, N4 and N16 denote the numbers of prediction modes for the chroma block and the 4×4 and 16×16 luma blocks, respectively. In other words, finding the best RDO mode for one macroblock requires 592 separate RDO computations. The computational complexity of intra prediction mode selection is therefore very high and limits the encoding speed of H.264.
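The figure of 592 follows directly from the mode counts quoted earlier (N4 = 9 modes for each of the sixteen 4×4 blocks, N16 = 4 modes for the 16×16 block, N8 = 4 chroma modes), as this quick check shows:

```python
# Mode combinations searched by exhaustive RDO for one macroblock:
# N8 chroma modes x (N4 modes per 4x4 block x 16 blocks + N16 modes).
N8, N4, N16 = 4, 9, 4
combinations = N8 * (N4 * 16 + N16)
print(combinations)  # 592
```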

To reduce the complexity of intra prediction, many researchers have studied the problem extensively. Feng Pan (F. Pan, X. Lin. Fast Mode Decision for Intra Prediction. ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, JVT-G013.doc, JVT 7th Meeting, Pattaya II, Thailand, 7-14 March 2003; hereinafter the Feng Pan method) proposed a fast intra-prediction mode selection algorithm whose basic idea is first to apply the Sobel operator for edge detection, obtaining the edge orientation of the objects in an image block, and then to determine the corresponding candidate prediction modes from that orientation. Changsung Kim [C. Kim. Feature-based intra-prediction mode decision for H.264. IEEE, 2004, pp. 769-772] proposed a feature-based fast intra-prediction algorithm that characterizes features by the two factors SAD and SATD and obtains the final coding mode through elaborate computation and comparison. These methods speed up intra coding to some extent but introduce complex pre-computation, and so have limitations; among them, Feng Pan's algorithm is widely regarded as the best and has been adopted by the JVT.

Summary of the Invention

The object of the present invention is to increase the image coding rate while maintaining good image quality.

To achieve this object, the idea of the invention is to raise the coding rate by reducing the number of candidate modes in the intra prediction mode set. According to the texture direction of the image, the method screens out the prediction modes lying along the texture direction and only then performs the RDO calculation.

The texture direction of the image is obtained as follows:

First, the neighbouring boundary pixels are extended to two lines and, following the nine prediction directions of the standard, four texture directions are defined: 0°, 45°, 90° and 135°.

Next, the average gray-level differences between the two adjacent lines of pixels along each texture direction are computed and denoted D_0, D_45, D_90 and D_135 respectively, and we let:

D_{min} = \min(D_0, D_{45}, D_{90}, D_{135})

The direction corresponding to the minimum value D_min among D_0, D_45, D_90 and D_135 is then taken as the texture direction of the edge.

Specifically, the values of D_0, D_45, D_90 and D_135 can be computed from the following formulas:

D_0 = \frac{1}{N}\sum_{i=0}^{N-1}\bigl|I(x_0-2,\,y_0+i)-I(x_0-1,\,y_0+i)\bigr|

D_{45} = \frac{1}{3N}\Bigl(\sum_{i=0}^{2N-1}\bigl|I(x_0+i-1,\,y_0-2)-I(x_0+i,\,y_0-1)\bigr| + \sum_{i=0}^{N-1}\bigl|I(x_0-2,\,y_0+i-1)-I(x_0-1,\,y_0+i)\bigr|\Bigr)

D_{90} = \frac{1}{N}\sum_{i=0}^{N-1}\bigl|I(x_0+i,\,y_0-2)-I(x_0+i,\,y_0-1)\bigr|

D_{135} = \frac{1}{3N}\Bigl(\sum_{i=0}^{2N-1}\bigl|I(x_0+i+1,\,y_0-2)-I(x_0+i-1,\,y_0-1)\bigr| + \sum_{i=0}^{N-1}\bigl|I(x_0-2,\,y_0+i)-I(x_0-1,\,y_0+i+1)\bigr|\Bigr)

Here I(x, y) is the gray value of the pixel at coordinate (x, y), (x_0, y_0) is the coordinate of the top-left pixel of the predicted block, and N is the size of the coding block. In the present invention, N is preferably 4, 8 or 16. When N is 8 or 16, the direction corresponding to the minimum D_min of D_0, D_45 and D_90 is taken as the texture direction of the edge.
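The four difference measures can be transcribed directly into Python. This is an illustrative sketch: the image is assumed to be indexed as I[x, y] to match the formulas, and the function name is invented:

```python
import numpy as np

def texture_direction(I, x0, y0, N):
    """Average gray-level differences D0, D45, D90 and D135 between the
    two neighbouring lines of boundary pixels, following the formulas
    above; returns the direction (degrees) of the minimum difference
    together with the four measures."""
    D0 = sum(abs(int(I[x0-2, y0+i]) - int(I[x0-1, y0+i]))
             for i in range(N)) / N
    D45 = (sum(abs(int(I[x0+i-1, y0-2]) - int(I[x0+i, y0-1]))
               for i in range(2 * N))
           + sum(abs(int(I[x0-2, y0+i-1]) - int(I[x0-1, y0+i]))
                 for i in range(N))) / (3 * N)
    D90 = sum(abs(int(I[x0+i, y0-2]) - int(I[x0+i, y0-1]))
              for i in range(N)) / N
    D135 = (sum(abs(int(I[x0+i+1, y0-2]) - int(I[x0+i-1, y0-1]))
                for i in range(2 * N))
            + sum(abs(int(I[x0-2, y0+i]) - int(I[x0-1, y0+i+1]))
                  for i in range(N))) / (3 * N)
    diffs = {0: D0, 45: D45, 90: D90, 135: D135}
    return min(diffs, key=diffs.get), diffs
```

For example, on an image whose gray value varies only horizontally, the two rows above the block are identical, so D_90 is zero and 90° is returned as the texture direction.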

The prediction modes along the texture direction can be screened out as follows.

Let the candidate set of the nine prediction modes of a 4×4 block be F_4×4 = {mode 0, mode 1, ..., mode 8}, the candidate set of the four modes of a 16×16 block be F_16×16 = {mode 0, mode 1, ..., mode 3}, and the candidate set of the four modes of an 8×8 block be F_8×8 = {mode 0, mode 1, ..., mode 3}. The prediction modes along the texture direction are then screened out by the following procedure:

(1) Determine which neighbouring macroblocks of the current macroblock are available, including the macroblocks above, to the left and above-right.

(2) If both the macroblock above and the macroblock to the left use the 16×16 prediction mode, evaluate the intra complexity of the current macroblock; if it is below a specified threshold, jump to step (4). The complexity of the current macroblock can be computed as:

X_I = \sum_{y=0}^{M-1}\sum_{x=0}^{M-1}\Bigl|\,I(x,y) - \frac{1}{M\cdot M}\sum_{y=0}^{M-1}\sum_{x=0}^{M-1} I(x,y)\Bigr|

where M is the macroblock size.
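X_I is simply the sum of absolute deviations of the macroblock's pixels from their mean; a one-line sketch (the function name is invented, `mb` is the M×M macroblock):

```python
import numpy as np

def intra_complexity(mb):
    """X_I: sum of absolute deviations of a macroblock's pixels from
    their mean, used as a simple texture-complexity measure."""
    mb = np.asarray(mb, dtype=np.float64)
    return float(np.abs(mb - mb.mean()).sum())
```

A flat (low-texture) macroblock yields X_I near zero, which is exactly the case where skipping the 4×4 search in favour of 16×16 prediction is cheap and safe.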

(3) 4×4 block prediction mode selection:

Divide the macroblock into 4×4 blocks and choose the prediction mode candidate set by the following strategy:

1) if D_min = D_90, the candidate set is F_4×4 = {mode 0, mode 7, mode 5, mode 2};

2) if D_min = D_0, the candidate set is F_4×4 = {mode 1, mode 8, mode 6, mode 2};

3) if D_min = D_45, the candidate set is F_4×4 = {mode 4, mode 5, mode 6, mode 2};

4) if D_min = D_135, the candidate set is F_4×4 = {mode 3, mode 7, mode 8, mode 2}.

After this step the number of candidate prediction modes drops from nine to four. The encoder performs the RDO calculation over the modes in the candidate set to find the best mode, and sums the rate-distortion costs of all the blocks.
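The direction-to-candidate-set mapping of step (3) can be captured in a small lookup table; the Python names below are illustrative (mode numbers as in H.264: 0 vertical, 1 horizontal, 2 DC, and so on):

```python
# Reduced candidate sets for 4x4 blocks, keyed by the texture
# direction (degrees) that minimises D. DC (mode 2) is always kept.
CANDIDATES_4x4 = {
    90:  [0, 7, 5, 2],
    0:   [1, 8, 6, 2],
    45:  [4, 5, 6, 2],
    135: [3, 7, 8, 2],
}

def candidate_set_4x4(direction):
    """Return the reduced 4-mode candidate set for the given
    texture direction."""
    return CANDIDATES_4x4[direction]
```

The RDO loop then iterates over `candidate_set_4x4(direction)` instead of all nine modes.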

(4) 16×16 luma block and 8×8 chroma block prediction mode selection:

Predict the macroblock as a 16×16 block, choosing the prediction mode candidate set by the following strategy:

1) if D'_min = D_90, the candidate set is F_16×16 = {mode 0, mode 2};

2) if D'_min = D_0, the candidate set is F_16×16 = {mode 1, mode 2};

3) if D'_min = D_45, the candidate set is F_16×16 = {mode 3, mode 2}.

The 8×8 chroma block uses the same prediction modes as the 16×16 block. After this step the number of prediction modes drops from four to two. The encoder performs the RDO calculation over the modes in the candidate set to find the best mode.
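Likewise for step (4), a sketch of the reduced 16×16 (and 8×8 chroma) candidate sets as a lookup table (illustrative names; mode numbers: 0 vertical, 1 horizontal, 2 DC, 3 plane):

```python
# Reduced candidate sets for 16x16 luma and 8x8 chroma prediction,
# keyed by the texture direction (degrees) minimising D'.
# DC (mode 2) is always retained as a fallback.
CANDIDATES_16x16 = {
    90: [0, 2],
    0:  [1, 2],
    45: [3, 2],
}
```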

(5) Compare the rate-distortion costs of (3) and (4) and select the minimum-cost mode as the final coding mode.

This method reduces the RDO computation and effectively increases the intra-frame encoding speed of the image, while the image quality and bit rate change very little.

Description of Drawings

Figure 1 shows the development history of the international standards;

Figure 2 is a schematic diagram of the H.264 coding framework;

Figure 3 is a schematic diagram of the H.264 layered design;

Figure 4 is the first frame of the Foreman (QCIF) sequence;

Figure 5 is the 69th macroblock (a 16×16 macroblock) of Figure 4;

Figure 6 is the 12th 4×4 sub-block of Figure 5;

Figure 7 is a schematic diagram of extending the adjacent boundary of a 4×4 block by two rows;

Figure 8 shows the defined texture directions;

Figure 9 is the PSNR comparison for the Foreman sequence;

Figure 10 is the PSNR comparison for the Stefan sequence;

Figure 11 is the PSNR comparison for the Carphone sequence;

Figure 12 is the PSNR comparison for the Tempete sequence.

Detailed Description

The present invention is further described below with reference to the accompanying drawings. It should be understood that the following examples are intended only to illustrate the present invention, not to limit it; any modification or replacement made without departing from the spirit and essence of the present invention falls within the scope of the present invention.

Example 1

1. Texture direction estimation

It is well known that natural images have strong spatial correlation, and the texture orientation of adjacent macroblocks is very similar; for 4×4 blocks in particular, the correlation is even stronger. Figure 4 shows the first frame of the Foreman (QCIF) sequence, Figure 5 shows the 69th macroblock of that image, and the white box in Figure 6 marks the 12th 4×4 sub-block of that macroblock. As can be seen, the texture orientations of adjacent macroblocks (or blocks) are very similar. Texture direction estimation for 4×4 luma, 16×16 luma and 8×8 chroma blocks is described below.

(1) 4×4 block texture direction estimation

First, the adjacent boundary pixels are extended to two rows, as shown in Figure 7. Then, based on the 9 prediction directions defined in the standard, 4 texture directions are defined: 0°, 45°, 90° and 135°, as shown in Figure 8.

Next, the average gray-level difference between the two adjacent boundary rows of pixels is computed along each texture direction and denoted D0, D45, D90 and D135, respectively, as follows:

$$D_0 = \frac{1}{N}\sum_{i=0}^{N-1}\left|I(x_0-2,\,y_0+i) - I(x_0-1,\,y_0+i)\right|$$

$$D_{45} = \frac{1}{3N}\left(\sum_{i=0}^{2N-1}\left|I(x_0+i-1,\,y_0-2) - I(x_0+i,\,y_0-1)\right| + \sum_{i=0}^{N-1}\left|I(x_0-2,\,y_0+i-1) - I(x_0-1,\,y_0+i)\right|\right)$$

$$D_{90} = \frac{1}{N}\sum_{i=0}^{N-1}\left|I(x_0+i,\,y_0-2) - I(x_0+i,\,y_0-1)\right|$$

$$D_{135} = \frac{1}{3N}\left(\sum_{i=0}^{2N-1}\left|I(x_0+i+1,\,y_0-2) - I(x_0+i-1,\,y_0-1)\right| + \sum_{i=0}^{N-1}\left|I(x_0-2,\,y_0+i) - I(x_0-1,\,y_0+i+1)\right|\right)$$

where I(x, y) is the gray value of the pixel at coordinates (x, y), (x0, y0) is the coordinate of the top-left pixel of the prediction block, and N is the size of the coding block (N = 4 for a 4×4 block). The direction corresponding to the minimum of D0, D45, D90 and D135 is then taken as the texture direction of the edge, letting:

$$D_{\min} = \min(D_0,\,D_{45},\,D_{90},\,D_{135})$$
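The four directional differences and the Dmin decision above can be sketched in Python. This is a minimal sketch, not the patent's implementation: the function names and the `img[x, y]` indexing convention (first index horizontal, to match the I(x, y) notation) are illustrative assumptions.

```python
import numpy as np

def directional_differences(img, x0, y0, N):
    """Average gray-level differences between the two boundary rows/columns
    next to the N x N block whose top-left pixel is (x0, y0), one value per
    texture direction (0, 45, 90, 135 degrees), per the formulas above.
    img is indexed as img[x, y] to match the I(x, y) notation."""
    I = np.asarray(img, dtype=np.int64)
    D0 = sum(abs(I[x0 - 2, y0 + i] - I[x0 - 1, y0 + i]) for i in range(N)) / N
    D45 = (sum(abs(I[x0 + i - 1, y0 - 2] - I[x0 + i, y0 - 1]) for i in range(2 * N))
           + sum(abs(I[x0 - 2, y0 + i - 1] - I[x0 - 1, y0 + i]) for i in range(N))) / (3 * N)
    D90 = sum(abs(I[x0 + i, y0 - 2] - I[x0 + i, y0 - 1]) for i in range(N)) / N
    D135 = (sum(abs(I[x0 + i + 1, y0 - 2] - I[x0 + i - 1, y0 - 1]) for i in range(2 * N))
            + sum(abs(I[x0 - 2, y0 + i] - I[x0 - 1, y0 + i + 1]) for i in range(N))) / (3 * N)
    return {0: D0, 45: D45, 90: D90, 135: D135}

def texture_direction(img, x0, y0, N=4):
    """Return (angle with the minimum difference, D_min) for the block."""
    d = directional_differences(img, x0, y0, N)
    angle = min(d, key=d.get)
    return angle, d[angle]

# Toy check: if the gray level depends only on x, pixels sharing the same x
# are identical, so the 90-degree difference vanishes and D_min = D90.
img = np.zeros((16, 16), dtype=np.int64)
for x in range(16):
    img[x, :] = 10 * x
angle, d_min = texture_direction(img, 4, 4)
```

Note that D45 and D135 average 3N terms (2N along the top boundary and N along the left boundary), which is where the 1/(3N) factor comes from.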

(2) 16×16 luma block and 8×8 chroma block texture direction estimation

For 16×16 luma blocks and 8×8 chroma blocks there are only the vertical, horizontal and plane prediction directions, plus the DC prediction mode, so only 3 texture directions need to be defined: 0°, 45° and 90°. Following the same procedure as for 4×4 blocks, the minimum value D′min is obtained, and the corresponding direction is taken as the texture direction of the edge.

The above calculation gives a preliminary estimate of the texture direction of the coding block.

2. Intra mode selection algorithm

Let the candidate set of the 9 prediction modes of a 4×4 block be F4×4 = {mode 0, mode 1, ..., mode 8}, the candidate set of the 4 prediction modes of a 16×16 block be F16×16 = {mode 0, mode 1, ..., mode 3}, and the candidate set of the 4 prediction modes of an 8×8 block be F8×8 = {mode 0, mode 1, ..., mode 3}.

The algorithm proceeds as follows:

(1) Determine which macroblocks around the current macroblock are available, including the upper, left and upper-right macroblocks;

(2) If both the upper and left macroblocks use the 16×16 prediction mode, compute the intra complexity of the current macroblock:

$$X_I = \sum_{y=0}^{M-1}\sum_{x=0}^{M-1}\left|\,I(x,y) - \frac{1}{M \cdot M}\sum_{y=0}^{M-1}\sum_{x=0}^{M-1} I(x,y)\,\right|$$

where M is the macroblock size, equal to 16 here. If XI is smaller than a threshold T (obtained by experiment; T = 256 in this example), skip directly to step (4) and predict the macroblock with the 16×16 modes.
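The complexity test in step (2) is just a sum of absolute deviations from the block mean. A minimal sketch, assuming NumPy and an illustrative helper name:

```python
import numpy as np

def intra_complexity(mb):
    """X_I: sum of absolute deviations of the macroblock's pixels from
    their mean, as in the formula above (mb is an M x M array, M = 16)."""
    mb = np.asarray(mb, dtype=np.float64)
    return float(np.abs(mb - mb.mean()).sum())

T = 256  # experimentally determined threshold quoted in the text

# A flat (low-detail) macroblock falls below T, so the encoder would skip
# the 4x4 search and go straight to the 16x16 prediction modes.
flat = np.full((16, 16), 128)
skip_to_16x16 = intra_complexity(flat) < T
```

A strongly textured block (e.g. a gradient) yields a much larger X_I and therefore goes through the full 4×4 mode selection as well.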

(3) 4×4 block prediction mode selection:

The macroblock is divided into 4×4 blocks, and the prediction mode candidate set is selected with the following strategy:

1) If Dmin = D90, the candidate set is F4×4 = {mode 0, mode 7, mode 5, mode 2};

2) If Dmin = D0, the candidate set is F4×4 = {mode 1, mode 8, mode 6, mode 2};

3) If Dmin = D45, the candidate set is F4×4 = {mode 4, mode 5, mode 6, mode 2};

4) If Dmin = D135, the candidate set is F4×4 = {mode 3, mode 7, mode 8, mode 2};

After the above calculation, the number of candidate prediction modes drops from the original 9 to 4. The encoder performs RDO calculation only on the modes in the candidate set, obtains the best mode, and computes the sum of the rate-distortion costs of all blocks.
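The direction-to-candidate-set rules 1)-4) above amount to a small lookup table. In the sketch below, mode numbering follows H.264 intra 4×4 prediction (0 vertical, 1 horizontal, 2 DC, 3 diagonal down-left, 4 diagonal down-right, 5 vertical-right, 6 horizontal-down, 7 vertical-left, 8 horizontal-up); the function name is an illustrative assumption:

```python
# Pruned candidate sets from rules 1)-4), keyed by the texture direction
# (in degrees) whose average gray-level difference is minimal.
CANDIDATES_4X4 = {
    90:  [0, 7, 5, 2],   # D_min = D_90  -> near-vertical modes + DC
    0:   [1, 8, 6, 2],   # D_min = D_0   -> near-horizontal modes + DC
    45:  [4, 5, 6, 2],   # D_min = D_45
    135: [3, 7, 8, 2],   # D_min = D_135
}

def candidate_modes_4x4(diffs):
    """diffs: {angle_degrees: average gray-level difference}.
    Returns the 4 candidate intra modes to pass to the RDO search
    instead of all 9."""
    best_angle = min(diffs, key=diffs.get)
    return CANDIDATES_4X4[best_angle]

modes = candidate_modes_4x4({0: 5.2, 45: 1.9, 90: 7.4, 135: 3.0})
```

Each set pairs the mode whose prediction direction matches the estimated texture with its two angular neighbors, and always retains DC (mode 2) as a fallback.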

(4) Prediction mode selection for 16×16 luma and 8×8 chroma blocks:

The macroblock is predicted as a whole 16×16 block, and the prediction mode candidate set is selected with the following strategy:

1) If D′min = D90, the candidate set is F16×16 = {mode 0, mode 2};

2) If D′min = D0, the candidate set is F16×16 = {mode 1, mode 2};

3) If D′min = D45, the candidate set is F16×16 = {mode 3, mode 2}. The chroma blocks are selected in the same way. After the above calculation, the number of prediction modes drops from the original 4 to 2. The encoder performs RDO calculation only on the modes in the candidate set and obtains the best mode.

(5) The rate-distortion costs of (3) and (4) are compared, and the minimum-cost mode is selected as the final coding mode.

Experimental results and analysis

To verify the effectiveness of the algorithm, the method was implemented on the H.264 reference software JM7.5 (H.S. Malvar and A. Hallapuro, "Low-complexity transform and quantization in H.264/AVC," IEEE Trans. CSVT, vol. 13(7), pp. 598-602, 2003). The experimental settings were: CAVLC entropy coding, 2 reference frames, a search range of 32, the Hadamard transform enabled, rate-distortion optimization (RDO) enabled, IPPP and all-I sequence structures, and quantization parameters of 28 and 32; multiple standard video sequences were tested.

The experimental results of this method are compared with those of the full-mode algorithm (which exhaustively evaluates every prediction mode) and of the Feng Pan algorithm. Because an algorithm's running speed depends heavily on the hardware platform, all three algorithms were run on the same platform and the ratios of their results compared, which removes the influence of the hardware environment. Specifically, the bit-rate change (B_CHG), encoding-time change (T_CHG) and image-quality change (PSNR_CHG) on the test sequences are compared, computed as follows:

$$B\_CHG = \frac{Bits\_ours - Bits\_all}{Bits\_all} \times 100\%$$

$$T\_CHG = \frac{Time\_all - Time\_ours}{Time\_all} \times 100\%$$

$$PSNR\_CHG = PSNR\_ours - PSNR\_all$$

where Bits_ours and Bits_all are the numbers of bits produced by the method of the present invention and by the full-mode algorithm, respectively; Time_ours and Time_all are the corresponding encoding times; and PSNR_ours and PSNR_all are the corresponding image qualities. The metrics for the Feng Pan algorithm are computed in the same way.
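The three comparison metrics follow directly from the definitions above. A small sketch; the function name and the sample numbers are illustrative, not measured values:

```python
def compare_to_full_mode(bits_ours, bits_all, time_ours, time_all,
                         psnr_ours, psnr_all):
    """B_CHG and T_CHG in percent, PSNR_CHG in dB, per the formulas above.
    Positive B_CHG means more bits than full-mode search; positive T_CHG
    means time saved; negative PSNR_CHG means a quality loss."""
    b_chg = (bits_ours - bits_all) / bits_all * 100.0
    t_chg = (time_all - time_ours) / time_all * 100.0
    psnr_chg = psnr_ours - psnr_all
    return b_chg, t_chg, psnr_chg

# Hypothetical numbers: 4% more bits, 43% less time, 0.12 dB quality loss
b, t, p = compare_to_full_mode(10400, 10000, 57.0, 100.0, 36.78, 36.90)
```

Note that T_CHG is defined with Time_all - Time_ours in the numerator, so a faster encoder gives a positive percentage.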

1. IPPP structure, 120 frames encoded, frame rate = 30 f/s, GOP = 15; the coding results are compared as follows:

Table 1  Experimental results for QCIF standard sequences (QP = 28)


Table 2  Experimental results for CIF standard sequences (QP = 32)


From Tables 1 and 2 it can be seen that, for the IPPP structure, this method increases the encoding speed by 21.5% on average over the full-mode method, while the image quality drops by only 0.026 dB and the bit rate increases by about 0.593% on average. Compared with the algorithm of Ref. [7] (Feng Pan's algorithm), the bit-rate increase and quality loss of the two relative to the full-mode algorithm are comparable, but this method is on average about 2% faster. The main reason is that Feng Pan's algorithm uses the computationally heavier Sobel operator to estimate edges. Figures 9 and 10 below compare the luma PSNR of the Foreman (QCIF) and Stefan (CIF) sequences; the dashed line is the image quality under the full-mode algorithm and the solid line is the image quality under this method. The quality drop of the P frames is caused by the quality drop of the I frames used for prediction.

2. All-I-frame structure, 120 frames encoded, frame rate = 30 f/s; the coding results are compared as follows:

Table 3  Experimental results for QCIF standard sequences (QP = 28)


Table 4  Experimental results for CIF standard sequences (QP = 32)


Sequence    B_CHG(%)   T_CHG(%)   PSNR_CHG(dB)    B_CHG(%)   T_CHG(%)   PSNR_CHG(dB)
Tempete     3.87       38.26      -0.129          3.54       43.66      -0.125
Stefan      4.12       37.64      -0.098          4.77       39.48      -0.084
Mobile      4.36       33.63      -0.135          4.51       37.89      -0.139
Average     4.07       38.18      -0.117          4.18       42.36      -0.120

From Tables 3 and 4 it can be seen that, for the all-I-frame structure, the advantage of this method is even more pronounced: the encoding speed is improved by 43% on average over the full-mode method, while the image quality drops by only 0.12 dB and the bit rate increases by about 4.0% on average. The method again outperforms the algorithm of Ref. [7]: the bit-rate increase and quality loss of the two are nearly the same, while this method is on average nearly 4% faster. Figures 11 and 12 below compare the luma PSNR of the Carphone (QCIF) and Tempete (CIF) sequences; the dashed line is the image quality under the full-mode algorithm and the solid line is the image quality under this method.

The above experimental results show that this method effectively increases the intra-frame coding speed of images, while the image quality and bit rate change very little.

Claims (7)

1. A method for increasing the intra-frame coding speed of an image based on the H.264 standard, wherein, according to the texture direction of the image, the prediction modes along the texture direction are screened out and RDO calculation is then performed.

2. The method of claim 1, characterized in that the texture direction of the image is obtained as follows: first, the adjacent boundary pixels are extended to two rows, and, based on the 9 prediction directions defined in the standard, 4 texture directions are defined: 0°, 45°, 90° and 135°; next, the average gray-level difference between the two adjacent rows of pixels is computed along each texture direction and the results are denoted D0, D45, D90 and D135, respectively; then, letting

$$D_{\min} = \min(D_0,\,D_{45},\,D_{90},\,D_{135})$$

the direction corresponding to the minimum value Dmin is taken as the texture direction of the edge.

3. The method of claim 2, characterized in that the values of D0, D45, D90 and D135 are computed as follows:

$$D_0 = \frac{1}{N}\sum_{i=0}^{N-1}\left|I(x_0-2,\,y_0+i) - I(x_0-1,\,y_0+i)\right|$$

$$D_{45} = \frac{1}{3N}\left(\sum_{i=0}^{2N-1}\left|I(x_0+i-1,\,y_0-2) - I(x_0+i,\,y_0-1)\right| + \sum_{i=0}^{N-1}\left|I(x_0-2,\,y_0+i-1) - I(x_0-1,\,y_0+i)\right|\right)$$

$$D_{90} = \frac{1}{N}\sum_{i=0}^{N-1}\left|I(x_0+i,\,y_0-2) - I(x_0+i,\,y_0-1)\right|$$

$$D_{135} = \frac{1}{3N}\left(\sum_{i=0}^{2N-1}\left|I(x_0+i+1,\,y_0-2) - I(x_0+i-1,\,y_0-1)\right| + \sum_{i=0}^{N-1}\left|I(x_0-2,\,y_0+i) - I(x_0-1,\,y_0+i+1)\right|\right)$$

where I(x, y) is the gray value of the pixel at coordinates (x, y), (x0, y0) is the coordinate of the top-left pixel of the prediction block, and N is the size of the coding block.

4. The method of claim 3, characterized in that N is 4, 8 or 16.

5. The method of claim 4, characterized in that, when N is 8 or 16, the direction corresponding to the minimum value Dmin among D0, D45 and D90 is taken as the texture direction of the edge.

6. The method of any one of claims 1 to 5, characterized in that, letting the candidate set of the 9 prediction modes of a 4×4 block be F4×4 = {mode 0, mode 1, ..., mode 8}, the candidate set of the 4 prediction modes of a 16×16 block be F16×16 = {mode 0, mode 1, ..., mode 3}, and the candidate set of the 4 prediction modes of an 8×8 block be F8×8 = {mode 0, mode 1, ..., mode 3}, the prediction modes along the texture direction are screened out as follows:

(1) determine which macroblocks around the current macroblock are available, including the upper, left and upper-right macroblocks;

(2) if both the upper and left macroblocks use the 16×16 prediction mode, evaluate the intra complexity of the current macroblock; if the complexity of the current macroblock is below a specified threshold, jump to step (4);

(3) 4×4 block prediction mode selection: the macroblock is divided into 4×4 blocks, and the prediction mode candidate set is selected with the following strategy:

1) if Dmin = D90, the candidate set is F4×4 = {mode 0, mode 7, mode 5, mode 2};

2) if Dmin = D0, the candidate set is F4×4 = {mode 1, mode 8, mode 6, mode 2};

3) if Dmin = D45, the candidate set is F4×4 = {mode 4, mode 5, mode 6, mode 2};

4) if Dmin = D135, the candidate set is F4×4 = {mode 3, mode 7, mode 8, mode 2};

the encoder performs RDO calculation on the modes in the candidate set, obtains the best mode, and computes the sum of the rate-distortion costs of all blocks;

(4) prediction mode selection for 16×16 luma and 8×8 chroma blocks: the macroblock is predicted as a whole 16×16 block, and the prediction mode candidate set is selected with the following strategy:

1) if D′min = D90, the candidate set is F16×16 = {mode 0, mode 2};

2) if D′min = D0, the candidate set is F16×16 = {mode 1, mode 2};

3) if D′min = D45, the candidate set is F16×16 = {mode 3, mode 2};

the 8×8 chroma block uses the same prediction mode as the 16×16 block; the encoder performs RDO calculation on the modes in the candidate set and obtains the best mode;

(5) the rate-distortion costs of (3) and (4) are compared, and the minimum-cost mode is selected as the final coding mode.

7. The method of claim 6, characterized in that the complexity of the current macroblock is computed by the following formula:

$$X_I = \sum_{y=0}^{M-1}\sum_{x=0}^{M-1}\left|\,I(x,y) - \frac{1}{M \cdot M}\sum_{y=0}^{M-1}\sum_{x=0}^{M-1} I(x,y)\,\right|$$

where M is the macroblock size.
CN 200810102517 2008-03-24 2008-03-24 A Method of Improving the Intra-Frame Coding Rate of Image Expired - Fee Related CN101247525B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200810102517 CN101247525B (en) 2008-03-24 2008-03-24 A Method of Improving the Intra-Frame Coding Rate of Image


Publications (2)

Publication Number Publication Date
CN101247525A true CN101247525A (en) 2008-08-20
CN101247525B CN101247525B (en) 2010-06-02

Family

ID=39947685

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200810102517 Expired - Fee Related CN101247525B (en) 2008-03-24 2008-03-24 A Method of Improving the Intra-Frame Coding Rate of Image

Country Status (1)

Country Link
CN (1) CN101247525B (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102215384A (en) * 2010-04-02 2011-10-12 富士通株式会社 Image compressing method and system
CN103283238A (en) * 2011-01-04 2013-09-04 Sk电信有限公司 Method and device for encoding and decoding by using parallel intra-rediction by a coding unit
CN103327325A (en) * 2013-05-13 2013-09-25 西安电子科技大学 Intra-frame prediction mode rapid self-adaptation selection method based on HEVC standard
WO2013155817A1 (en) * 2012-04-16 2013-10-24 华为技术有限公司 Method and device for predicting video image components
CN104883565A (en) * 2014-12-31 2015-09-02 乐视网信息技术(北京)股份有限公司 Decision-making method and device for intra-frame prediction mode of high efficiency video coding
CN107396110A (en) * 2011-11-04 2017-11-24 英孚布瑞智有限私人贸易公司 The decoding device of video data
CN107426568A (en) * 2011-11-25 2017-12-01 英孚布瑞智有限私人贸易公司 For the method to colourity image decoding
CN107592534A (en) * 2017-09-27 2018-01-16 天津工业大学 A kind of vehicle-mounted HD video play system of subway based on built-in Linux
CN105120263B (en) * 2010-07-14 2018-09-14 株式会社Ntt都科摩 Low-complexity intra prediction for Video coding
CN109068133A (en) * 2018-09-17 2018-12-21 鲍金龙 Video encoding/decoding method and device
WO2019047664A1 (en) * 2017-09-06 2019-03-14 浙江宇视科技有限公司 Code rate control method and apparatus, image acquisition device, and readable storage medium
CN109640087A (en) * 2018-12-30 2019-04-16 深圳市网心科技有限公司 A kind of intra prediction mode decision method, device and equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100370836C (en) * 2004-08-02 2008-02-20 华为技术有限公司 Motion forecast method based on rate-distortion optimization
EP1727370A1 (en) * 2005-05-25 2006-11-29 Thomson Licensing Rate-distortion based video coding mode selection foreseeing the esitmation of bit rate and distortion using a simplified transform on low activity prediction residuals

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102215384A (en) * 2010-04-02 2011-10-12 富士通株式会社 Image compressing method and system
CN105120263B (en) * 2010-07-14 2018-09-14 株式会社Ntt都科摩 Low-complexity intra prediction for Video coding
CN103283238A (en) * 2011-01-04 2013-09-04 Sk电信有限公司 Method and device for encoding and decoding by using parallel intra-rediction by a coding unit
CN103283238B (en) * 2011-01-04 2016-08-03 Sk电信有限公司 Utilization carries out the method and apparatus encoding and decoding according to the parallel infra-frame prediction of coding unit
CN107396110A (en) * 2011-11-04 2017-11-24 英孚布瑞智有限私人贸易公司 The decoding device of video data
CN107396110B (en) * 2011-11-04 2021-05-07 英孚布瑞智有限私人贸易公司 Decoding equipment for video data
CN107426568A (en) * 2011-11-25 2017-12-01 英孚布瑞智有限私人贸易公司 For the method to colourity image decoding
WO2013155817A1 (en) * 2012-04-16 2013-10-24 华为技术有限公司 Method and device for predicting video image components
CN103327325A (en) * 2013-05-13 2013-09-25 西安电子科技大学 Intra-frame prediction mode rapid self-adaptation selection method based on HEVC standard
CN103327325B (en) * 2013-05-13 2016-05-25 西安电子科技大学 The quick self-adapted system of selection of intra prediction mode based on HEVC standard
CN104883565A (en) * 2014-12-31 2015-09-02 乐视网信息技术(北京)股份有限公司 Decision-making method and device for intra-frame prediction mode of high efficiency video coding
WO2019047664A1 (en) * 2017-09-06 2019-03-14 浙江宇视科技有限公司 Code rate control method and apparatus, image acquisition device, and readable storage medium
US11902533B2 (en) 2017-09-06 2024-02-13 Zhejiang Uniview Technologies Co., Ltd. Code rate control method and apparatus, image acquisition device, and readable storage medium
CN107592534A (en) * 2017-09-27 2018-01-16 天津工业大学 A kind of vehicle-mounted HD video play system of subway based on built-in Linux
CN109068133A (en) * 2018-09-17 2018-12-21 鲍金龙 Video encoding/decoding method and device
CN109068133B (en) * 2018-09-17 2022-04-29 鲍金龙 Video decoding method and device
CN109640087A (en) * 2018-12-30 2019-04-16 深圳市网心科技有限公司 A kind of intra prediction mode decision method, device and equipment

Also Published As

Publication number Publication date
CN101247525B (en) 2010-06-02

Similar Documents

Publication Publication Date Title
CN101247525B (en) A Method of Improving the Intra-Frame Coding Rate of Image
CN101783957B (en) A video predictive coding method and device
TWI626842B (en) Motion picture coding device and its operation method
CN104994396B (en) Video decoding apparatus
US20050114093A1 (en) Method and apparatus for motion estimation using variable block size of hierarchy structure
CN101621687B Method for converting video code stream from H.264 to AVS and device thereof
WO2009052697A1 (en) A dual prediction video encoding and decoding method and a device
US20120195377A1 (en) Method to optimize the transforms and/or predictions in a video codec
CN103442228B (en) Code-transferring method and transcoder thereof in from standard H.264/AVC to the fast frame of HEVC standard
CN101984665A (en) Video transmission quality evaluating method and system
CN101022555A (en) Interframe predictive coding mode quick selecting method
US20070133689A1 (en) Low-cost motion estimation apparatus and method thereof
CN106101709A (en) A kind of Primary layer inter-frame prediction method of the SHVC quality scalability combining enhancement layer
Zeng et al. A tutorial on image/video coding standards
CN100555332C (en) Use comprises that the prediction of a plurality of macro blocks and nonanticipating picture are to picture sequence Methods for Coding and device
CN201282535Y (en) Device for converting H.264 to AVS video code stream
CN104768008B (en) A kind of fast encoding method for scalable video
CN102065297B (en) MPEG-2 (Moving Pictures Experts Group-2) to H.264 fast video transcoding method
Patnaik et al. H. 264/AVC/MPEG video coding with an emphasis to bidirectional prediction frames
KR100733991B1 (en) Transcoding Method from MP2 to H.264
CN100466733C (en) System and method for coding motive picture of mobile communication terminal
Tran Video Coding
Wang et al. A fast multiple reference frame selection algorithm based on H. 264/AVC
Wang et al. Transcoding of H. 264 bitstream to AVS bitstream
Yoon et al. Hierarchical integer pixel and adaptive fractional pixel motion estimation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20100602

Termination date: 20120324