CN102833538B - Multi-pass video encoding

Info

Publication number: CN102833538B
Application number: CN201210271592.1A
Authority: CN
Inventors: 童歆; 吴锡荣; 托马斯·彭; 安德里亚那·杜米特拉; 巴林·哈斯凯尔; 吉姆·诺米勒
Original assignee: Apple Computer Inc
Current assignee: Apple Inc
Priority date: 2004-06-27
Filing date: 2005-06-24
Publication date: 2015-04-22
Anticipated expiration: 2025-06-24
Also published as: JP2008504750A; CN1926863B; WO2006004605A2; WO2006004605A3; KR100988402B1; JP2011151838A; JP4988567B2; KR100909541B1; EP1762093A2; CN102833539B; CN102833538A; KR20070011294A; CN102833539A; JP5318134B2; KR20090034992A; EP1762093A4; KR100997298B1; CN1926863A; WO2006004605B1; HK1101052A1

Abstract

Some embodiments of the invention provide a multi-pass encoding method that encodes several images (e.g., several frames of a video sequence). The method iteratively performs an encoding operation that encodes these images. The encoding operation is based on a nominal quantization parameter, which the method uses to compute quantization parameters for the images. During several different iterations of the encoding operation, the method uses several different nominal quantization parameters. The method stops iterations when it reaches a terminating criterion (e.g., it identifies an acceptable encoding of the images).

Description

Multi-pass Video Coding

This application is a divisional of the invention patent application filed on June 24, 2005, with application number 200580006363.5 and titled "Multi-pass video encoding".

Background

A video encoder encodes a sequence of video images (e.g., video frames) using a variety of encoding schemes. Video encoding schemes typically encode video frames, or portions of video frames (e.g., sets of pixels within a frame), in an intraframe or interframe manner. An intra-coded frame or pixel set is encoded independently of other frames or of pixel sets in other frames. An inter-coded frame or pixel set is encoded by reference to one or more other frames or pixel sets in other frames.

When compressing video frames, some encoders implement a "rate controller" that provides a "bit budget" for the video frame or set of video frames to be encoded. The bit budget specifies the number of bits allocated to encode that frame or set of frames. By allocating the bit budget efficiently, the rate controller attempts to generate the highest-quality compressed video stream that satisfies certain constraints (e.g., a target bit rate).

To date, a variety of single-pass and multi-pass rate controllers have been proposed. A single-pass rate controller provides bit budgets for an encoding scheme that encodes a series of video images in one pass, whereas a multi-pass rate controller provides bit budgets for an encoding scheme that encodes a series of video images in multiple passes.

Single-pass rate controllers are useful in real-time encoding situations. Multi-pass rate controllers, on the other hand, optimize the encoding for a particular bit rate based on a set of constraints. To date, few rate controllers take the spatial or temporal complexity of frames, or of pixel sets within frames, into account when controlling the bit rate. In addition, most multi-pass rate controllers do not adequately search the solution space for encodings that use optimal quantization parameters for frames and/or pixel sets within frames while meeting a desired bit rate.

Therefore, there is a need in the art for a rate controller that uses novel techniques to account for the spatial or temporal complexity of video images and/or portions of video images while controlling the bit rate used to encode a set of video images. There is also a need in the art for a multi-pass rate controller that adequately examines various encoding solutions to identify one that uses an optimal set of quantization parameters for the video images and/or portions of the video images.

Summary of the Invention

Some embodiments of the invention provide a multi-pass encoding method that encodes several images (e.g., several frames of a video sequence). The method iteratively performs an encoding operation that encodes these images. The encoding operation is based on a nominal quantization parameter, which the method uses to compute quantization parameters for the images. During several different iterations of the encoding operation, the method uses several different nominal quantization parameters. The method stops iterating when it reaches a terminating criterion (e.g., when it identifies an acceptable encoding of the images).

Some embodiments of the invention provide a method for encoding a video sequence. The method identifies a first attribute that quantifies the complexity of a first image in the video. It also identifies a quantization parameter for encoding the first image based on the identified first attribute. The method then encodes the first image based on the identified quantization parameter. In some embodiments, the method performs these three operations for several images in the video.

Some embodiments of the invention encode a sequence of video images based on a "visual masking" attribute of the video images and/or portions of the video images. Visual masking of an image or a portion of an image is an indication of how much coding artifact can be tolerated in that image or portion. To express the visual masking attribute of an image or a portion of an image, some embodiments compute a visual masking strength that quantifies the brightness energy of the image or portion. In some embodiments, the brightness energy is measured as a function of the average luma or pixel energy of the image or portion.

Instead of, or in conjunction with, the brightness energy, the visual masking strength of an image or a portion of an image can also quantify the activity energy of the image or portion. Activity energy expresses the complexity of the image or portion. In some embodiments, the activity energy includes a spatial component that quantifies the spatial complexity of the image or portion, and/or a motion component that quantifies the amount of distortion that can be tolerated (masked) because of motion between images.

Some embodiments of the invention provide a method for encoding a video sequence. The method identifies a visual masking attribute of a first image in the video. It also identifies a quantization parameter for encoding the first image based on the identified visual masking attribute. The method then encodes the first image based on the identified quantization parameter.

Brief Description of the Drawings

The novel features of the invention are set forth in the appended claims. However, for purposes of explanation, several embodiments of the invention are set forth in the following figures.

Figure 1 presents a process that conceptually illustrates the encoding method of some embodiments of the invention;

Figure 2 conceptually illustrates the codec system of some embodiments;

Figure 3 is a flowchart illustrating the encoding process of some embodiments;

Figure 4a is a plot, for some embodiments, of the difference between the nominal removal time and the final arrival time of images versus image number, illustrating an underflow condition;

Figure 4b is a plot of the difference between the nominal removal time and the final arrival time versus image number for the same images as in Figure 4a, after the underflow condition has been eliminated;

Figure 5 illustrates a process that the encoder of some embodiments uses to perform underflow detection;

Figure 6 illustrates a process that the encoder of some embodiments uses to eliminate an underflow condition in a single segment of images;

Figure 7 illustrates an application of buffer underflow management in a video streaming application;

Figure 8 illustrates an application of buffer underflow management in an HD-DVD system;

Figure 9 presents a computer system with which one embodiment of the invention is implemented.

Detailed Description

In the following detailed description of the invention, numerous details, examples, and embodiments of the invention are set forth and described. However, it will be clear and apparent to one skilled in the art that the invention is not limited to the embodiments set forth and that the invention may be practiced without some of the specific details and examples discussed.

I. Definitions

This section provides definitions for several symbols used in this document.

R_T represents the target bit rate, which is the desired bit rate for encoding a frame sequence. Typically, this bit rate is expressed in bits per second and is calculated from the desired final file size, the number of frames in the sequence, and the frame rate.

R_p represents the bit rate of the encoded bitstream at the end of pass p.

E_p represents the percentage error in the bit rate at the end of pass p. In some cases, this percentage is calculated from R_p and R_T (the formula is not reproduced in this text).

ε represents the error tolerance in the final bit rate.

ε_C represents the error tolerance in the bit rate for the first QP search stage.

QP represents the quantization parameter.

QP_Nom(p) represents the nominal quantization parameter used in pass p of encoding the frame sequence. The multi-pass encoder of the invention adjusts the value of QP_Nom(p) in the first QP adjustment stage to reach the target bit rate.

MQP_p(k) represents the masked frame QP, which is the quantization parameter (QP) of frame k in pass p. Some embodiments compute this value from the nominal QP and frame-level visual masking.

MQP_MB(p)(k,m) represents the masked macroblock QP, which is the quantization parameter (QP) of an individual macroblock (with macroblock index m) of frame k in pass p. Some embodiments compute MQP_MB(p)(k,m) from MQP_p(k) and macroblock-level visual masking.

φ_F(k) represents a value referred to as the masking strength of frame k. The masking strength φ_F(k) is a measure of the frame's complexity; in some embodiments, this value is used to determine how visible coding artifacts/noise would appear and to compute MQP_p(k) for frame k.

φ_R(p) represents the reference masking strength in pass p. The reference masking strength is used to compute MQP_p(k) for frame k, and the multi-pass encoder of the invention adjusts it in the second stage to reach the target bit rate.

φ_MB(k,m) represents the masking strength of the macroblock with index m in frame k. The masking strength φ_MB(k,m) is a measure of the macroblock's complexity; in some embodiments, it is used to determine how visible coding artifacts/noise would appear and to compute MQP_MB(p)(k,m).

AMQP_p represents the average masked QP over the frames in pass p. In some embodiments, this value is computed as the average of MQP_p(k) over all frames in pass p.
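The definition of R_T above says the target bit rate follows from the desired file size, the number of frames, and the frame rate. The following minimal sketch illustrates that arithmetic. The function name and the choice of bytes as the file-size unit are assumptions made for this example and are not taken from the text.

```python
def target_bit_rate(file_size_bytes: int, num_frames: int, frame_rate: float) -> float:
    """Return a target bit rate R_T in bits per second.

    Assumption: the desired file size is given in bytes and the whole
    file is spent on the frame sequence (no container overhead).
    """
    if num_frames <= 0 or frame_rate <= 0:
        raise ValueError("num_frames and frame_rate must be positive")
    duration_seconds = num_frames / frame_rate
    return file_size_bytes * 8 / duration_seconds


# Example: 90,000,000 bytes for 7200 frames shown at 24 frames per second.
# The sequence lasts 300 seconds, so R_T is 2.4 Mbit/s.
print(target_bit_rate(90_000_000, 7200, 24.0))
```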

II. Overview

Some embodiments of the invention provide an encoding method that achieves the best visual quality for encoding a frame sequence at a given bit rate. In some embodiments, this method uses a visual masking process that assigns a quantization parameter QP to every macroblock. This assignment is based on the recognition that coding artifacts/noise in brighter or spatially more complex areas of an image or video frame are less visible than those in darker or flat areas.

In some embodiments, this visual masking process is performed as part of an inventive multi-pass encoding process. To make the final encoded bitstream reach the target bit rate, this encoding process adjusts a nominal quantization parameter and controls the visual masking process through a reference masking strength parameter φ_R. As described further below, adjusting the nominal quantization parameter and controlling the masking algorithm adjusts the QP values for each picture (i.e., each frame, in typical video encoding schemes) and for each macroblock within each picture.

In some embodiments, the multi-pass encoding process globally adjusts the nominal QP and φ_R for the entire sequence. In other embodiments, the process divides the video sequence into segments and adjusts the nominal QP and φ_R for each segment. The description below refers to a sequence of frames on which the multi-pass encoding process is applied. One of ordinary skill will realize that this sequence is the entire video sequence in some embodiments and only a segment of it in others.

In some embodiments, the method has three encoding stages: (1) an initial analysis stage performed in pass 0, (2) a first search stage performed in passes 1 through N₁, and (3) a second search stage performed in passes N₁+1 through N₁+N₂.

In the initial analysis stage (i.e., during pass 0), the method identifies an initial value for the nominal QP (QP_Nom(1), to be used in pass 1 of the encoding). During the initial analysis stage, the method also identifies a value of the reference masking strength φ_R, which is used in all passes of the first search stage.

In the first search stage, the method performs N₁ iterations (i.e., N₁ passes) of the encoding process. For each frame k in each pass p, the process encodes the frame using a particular quantization parameter MQP_p(k) and particular quantization parameters MQP_MB(p)(k,m) for the individual macroblocks m within frame k, where MQP_MB(p)(k,m) is computed using MQP_p(k).

In the first search stage, the quantization parameter MQP_p(k) changes between passes because it is derived from the nominal quantization parameter QP_Nom(p), which changes between passes. In other words, at the end of each pass p during the first search stage, the process computes a nominal QP_Nom(p+1) for pass p+1. In some embodiments, the nominal QP_Nom(p+1) is based on the nominal QP value(s) and bit rate error(s) from previous pass(es). In other embodiments, the nominal QP_Nom(p+1) value is computed differently at the end of each pass in the second search stage.

In the second search stage, the method performs N₂ iterations (i.e., N₂ passes) of the encoding process. As in the first search stage, the process encodes each frame k during each pass p using a particular quantization parameter MQP_p(k) and particular quantization parameters MQP_MB(p)(k,m) for the individual macroblocks m within frame k, where MQP_MB(p)(k,m) is derived from MQP_p(k).

Also as in the first search stage, the quantization parameter MQP_p(k) changes between passes. During the second search stage, however, this parameter changes because it is computed using a reference masking strength φ_R(p) that changes between passes. In some embodiments, the reference masking strength φ_R(p) is computed based on the bit rate error(s) and the φ_R value(s) from previous pass(es). In other embodiments, this reference masking strength is computed to be a different value at the end of each pass in the second search stage.

Although the multi-pass encoding process is described in conjunction with the visual masking process, one of ordinary skill in the art will realize that an encoder does not need to use the two processes together. For instance, in some embodiments, the multi-pass encoding process is used to encode a bitstream near a given target bit rate without visual masking, by ignoring φ_R and omitting the second search stage described above.
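The masking-free variant just described (first search stage only) can be sketched as a loop that re-encodes with a new nominal QP until the bit rate error falls within the tolerance ε_C. This is only an illustration under stated assumptions: `toy_encode` is a hypothetical stand-in for a real encoding pass, the sign convention of the error percentage is chosen for this example, and the QP update rule (a log-domain step assuming the rate halves for every 6 QP units) is not taken from the text, which only says the next nominal QP depends on earlier QP values and bit rate errors.

```python
import math


def toy_encode(qp_nom: float) -> float:
    """Hypothetical stand-in for one full encoding pass.

    Returns the resulting bit rate in bits per second. In this toy model
    the rate halves for every increase of 5 in the nominal QP.
    """
    return 8_000_000.0 * 2.0 ** (-(qp_nom - 20.0) / 5.0)


def first_search_stage(target_rate, qp_init, eps_c, max_passes, encode=toy_encode):
    """Repeat encoding passes with different nominal QPs.

    Stops when the absolute bit rate error (in percent) is within eps_c
    or after max_passes passes. Returns the nominal QP of the last pass
    and a history of (pass, qp_nom, rate, error_percent) tuples.
    """
    if max_passes < 1:
        raise ValueError("max_passes must be at least 1")
    qp_nom = qp_init
    history = []
    for p in range(1, max_passes + 1):
        rate = encode(qp_nom)
        # Assumed convention: positive error means the pass overshot the target.
        error_pct = 100.0 * (rate - target_rate) / target_rate
        history.append((p, qp_nom, rate, error_pct))
        if abs(error_pct) <= eps_c:
            break  # terminating criterion: acceptable encoding found
        # Assumed update rule: the controller believes 6 QP units halve the rate.
        qp_nom += 6.0 * math.log2(rate / target_rate)
    return history[-1][1], history


final_qp, passes = first_search_stage(2_000_000.0, 20.0, eps_c=2.0, max_passes=10)
for p, qp, rate, err in passes:
    print(f"pass {p}: QP_Nom={qp:.2f} rate={rate:.0f} error={err:+.2f}%")
```

Because the controller's model of the encoder (6 QP units per halving) differs from the toy encoder's actual behavior (5 units), each pass overshoots slightly in the opposite direction, which is why several passes are needed before the error drops below the tolerance.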

The visual masking and multi-pass encoding processes are further described in Sections III and IV of this application.

III. Visual Masking

Given a nominal quantization parameter, the visual masking process first computes a masked frame quantization parameter (MQP) for each frame using the reference masking strength (φ_R) and the frame masking strength (φ_F). The process then computes a masked macroblock quantization parameter (MQP_MB) for each macroblock based on the frame-level and macroblock-level masking strengths (φ_F and φ_MB). When the visual masking process is used in a multi-pass encoding process, the reference masking strength (φ_R) in some embodiments is identified during the first encoding pass, as mentioned above and described further below.

A. Computing the Frame-Level Masking Strength

1. First Approach

To compute the frame-level masking strength φ_F(k), some embodiments use the following equation (A):

φ_F(k) = C*power(E*avgFrameLuma(k), β)*power(D*avgFrameSAD(k), α_F) (A)

where:

● avgFrameLuma(k) is the average pixel intensity in frame k computed using b×b regions, where b is an integer greater than or equal to 1 (e.g., b=1 or b=4);

● avgFrameSAD(k) is the average of MbSAD(k,m) over all macroblocks in frame k;

● MbSAD(k,m) is the sum of the values given by the function Calc4x4MeanRemovedSAD(4x4_block_pixel_value) for all 4x4 blocks in the macroblock with index m;

● α_F, C, D, and E are constants and/or are adapted to local statistics; and

● power(a, b) means a^b.
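Equation (A) translates directly into a one-line function. The sketch below is illustrative only: the default constant values are placeholders chosen for this example, since the text gives no numeric values for C, D, E, β, or α_F.

```python
def frame_masking_strength(avg_frame_luma: float, avg_frame_sad: float, *,
                           C: float = 1.0, D: float = 1.0, E: float = 1.0,
                           beta: float = 0.5, alpha_f: float = 0.5) -> float:
    """Equation (A): phi_F(k) = C * (E*avgFrameLuma)^beta * (D*avgFrameSAD)^alpha_F.

    All default constants are placeholders, not values from the source.
    """
    return C * (E * avg_frame_luma) ** beta * (D * avg_frame_sad) ** alpha_f


# With positive exponents, a brighter or spatially busier frame gets a
# larger masking strength than a darker, flatter one.
dark_flat = frame_masking_strength(30.0, 50.0)
bright_busy = frame_masking_strength(180.0, 900.0)
print(dark_flat < bright_busy)
```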

The pseudocode for the function Calc4x4MeanRemovedSAD is as follows:
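The pseudocode listing itself does not appear in this text. The following is a minimal sketch of what the function name and the surrounding definitions suggest: subtract the mean of a 4x4 block from each of its pixels and sum the absolute differences. The Python names, the 16x16 macroblock size, and the list-of-rows data layout are assumptions made for this example.

```python
def calc_4x4_mean_removed_sad(block):
    """Sum of absolute differences between each pixel of a 4x4 block and the block mean.

    `block` is a sequence of 4 rows, each holding 4 pixel values.
    """
    pixels = [p for row in block for p in row]
    if len(pixels) != 16:
        raise ValueError("expected a 4x4 block")
    mean = sum(pixels) / 16.0
    return sum(abs(p - mean) for p in pixels)


def mb_sad(macroblock):
    """MbSAD: sum of the mean-removed SADs of all 4x4 blocks in a macroblock.

    Assumption: the macroblock is 16x16 pixels, given as 16 rows of 16 values.
    """
    total = 0.0
    for by in range(0, 16, 4):
        for bx in range(0, 16, 4):
            block = [row[bx:bx + 4] for row in macroblock[by:by + 4]]
            total += calc_4x4_mean_removed_sad(block)
    return total


def avg_frame_sad(macroblocks):
    """avgFrameSAD: average MbSAD over all macroblocks of a frame."""
    return sum(mb_sad(mb) for mb in macroblocks) / len(macroblocks)


flat = [[128] * 16 for _ in range(16)]                                 # no texture
checker = [[2 * ((x + y) % 2) for x in range(16)] for y in range(16)]  # fine texture
print(mb_sad(flat), mb_sad(checker), avg_frame_sad([flat, checker]))
```

A flat macroblock yields zero, while a textured one yields a positive value, which matches the role of MbSAD as a spatial activity measure.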

2. Second Approach

Other embodiments compute the frame-level masking strength differently. For instance, equation (A) above essentially computes the frame masking strength as follows:

φ_F(k) = C*power(E*Brightness_Attribute, exponent0)*power(scalar*Spatial_Activity_Attribute, exponent1)

In equation (A), the Brightness_Attribute of the frame equals avgFrameLuma(k), and the Spatial_Activity_Attribute equals avgFrameSAD(k), which is the average macroblock SAD (MbSAD(k,m)) value over all macroblocks in the frame, where a macroblock's SAD equals the sum of the absolute values of the mean-removed 4x4 pixel variations (as given by Calc4x4MeanRemovedSAD) over all 4x4 blocks in the macroblock. The Spatial_Activity_Attribute measures the amount of spatial variation in a region of pixels within the frame being encoded.

Other embodiments expand the activity measure to include the amount of temporal variation in a region of pixels across a number of successive frames. Specifically, these embodiments compute the frame masking strength as follows:

φ_F(k) = C*power(E*Brightness_Attribute, exponent0)*power(scalar*Activity_Attribute, exponent1) (B)

In this equation, Activity_Attribute is given by the following equation (C):

Activity_Attribute = scalar*power(scalar*Spatial_Activity_Attribute, exponent_beta) + E*power(F*Temporal_Activity_Attribute, exponent_delta) (C)

In some embodiments, the Temporal_Activity_Attribute quantifies the amount of distortion that can be tolerated (i.e., masked) because of motion between frames. In some of these embodiments, the Temporal_Activity_Attribute of a frame equals a constant times the sum of the absolute values of the motion-compensated error signal of the pixel regions defined within the frame. In other embodiments, Temporal_Activity_Attribute is given by the following equation (D):

$\text{Temporal\_Activity\_Attribute} = \sum_{j=-1}^{-N} \left( W_{j} \cdot \text{avgFrameSAD}(j) \right) + \sum_{j=1}^{M} \left( W_{j} \cdot \text{avgFrameSAD}(j) \right) + W_{0} \cdot \text{avgFrameSAD}(0) \qquad (D)$

In equation (D), "avgFrameSAD" denotes (as described above) the average macroblock SAD (MbSAD(k,m)) value within a frame, avgFrameSAD(0) is the avgFrameSAD of the current frame, a negative j indexes time instances before the current frame, and a positive j indexes time instances after it. Hence, avgFrameSAD(j=-2) denotes the average frame SAD of the frame two frames before the current frame, and avgFrameSAD(j=3) denotes the average frame SAD of the frame three frames after the current frame.

Also in equation (D), the variables N and M refer to the number of frames before and after the current frame, respectively. Instead of simply selecting the values N and M based on a particular number of frames, some embodiments compute N and M based on a particular duration of time before and after the time of the current frame. Associating the motion masking with a time duration is more advantageous than associating it with a set number of frames, because a time duration corresponds directly to the viewer's time-based visual perception. Associating the masking with a number of frames, on the other hand, suffers from variable display durations, since different displays present video at different frame rates.
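A short sketch of deriving N and M from a time window rather than from a fixed frame count follows. The function name, the rounding policy, and the window lengths are assumptions made for this example.

```python
def frames_in_window(window_seconds: float, frame_rate: float) -> int:
    """Number of frames that span a given time window at a given frame rate.

    Assumption: round to the nearest whole frame.
    """
    return int(round(window_seconds * frame_rate))


# The same 0.5 s look-behind and 0.25 s look-ahead windows cover a
# different number of frames at different frame rates.
for fps in (24.0, 30.0, 60.0):
    n_back = frames_in_window(0.5, fps)    # N in equation (D)
    m_ahead = frames_in_window(0.25, fps)  # M in equation (D)
    print(fps, n_back, m_ahead)
```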

In equation (D), "W" refers to a weighting factor which, in some embodiments, decreases as frame j moves further away from the current frame. Also in this equation, the first summation expresses the amount of motion that can be masked before the current frame, the second summation expresses the amount of motion that can be masked after the current frame, and the last expression (avgFrameSAD(0)) expresses the frame SAD of the current frame.

In some embodiments, the weighting factors are adjusted to account for scene changes. For instance, some embodiments account for an upcoming scene change within the look-ahead range (i.e., within the M frames) but not for any frames after the scene change. For example, these embodiments may set to zero the weighting factors of frames within the look-ahead range that come after the scene change. Likewise, some embodiments do not account for frames prior to or at a scene change within the look-behind range (i.e., within the N frames). For example, these embodiments may set to zero the weighting factors of frames in the look-behind range that belong to a previous scene or that precede the previous scene change.
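Equation (D) together with the scene-change handling just described can be sketched as a weighted sum in which frames on the far side of a scene change receive zero weight. The function names, the representation of scene changes as the indices of the first frame of each new scene, the clipping at sequence boundaries, and the example weight function are all assumptions made for this illustration.

```python
def same_scene(a: int, b: int, scene_starts) -> bool:
    """True if no scene starts after the earlier frame and at or before the later one."""
    lo, hi = min(a, b), max(a, b)
    return not any(lo < s <= hi for s in scene_starts)


def temporal_activity_attribute(avg_frame_sad, current, n_back, m_ahead,
                                weight, scene_starts=()):
    """Weighted sum of avgFrameSAD over frames current-N .. current+M.

    avg_frame_sad: per-frame avgFrameSAD values, indexed by frame number.
    weight:        function mapping the offset j to the weight W_j.
    scene_starts:  frame numbers at which a new scene begins.
    """
    total = 0.0
    for j in range(-n_back, m_ahead + 1):
        k = current + j
        if k < 0 or k >= len(avg_frame_sad):
            continue  # assumption: frames outside the sequence contribute nothing
        if not same_scene(current, k, scene_starts):
            continue  # weight treated as zero across a scene change
        total += weight(j) * avg_frame_sad[k]
    return total


sads = [10.0, 20.0, 30.0, 40.0, 50.0]
decaying = lambda j: 1.0 / (1 + abs(j))  # weight shrinks with distance from the current frame
print(temporal_activity_attribute(sads, 2, 2, 2, decaying))
print(temporal_activity_attribute(sads, 2, 2, 2, decaying, scene_starts=(3,)))
```

In the second call a new scene starts at frame 3, so frames 3 and 4 drop out of the sum and the attribute is smaller.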

3. Variations on the Second Approach

a) Limiting the influence of past and future frames on Temporal_Activity_Attribute

Equation (D) above essentially expresses Temporal_Activity_Attribute in the following terms:

Temporal_Activity_Attribute = Past_Frame_Activity + Future_Frame_Activity + Current_Frame_Activity,

where Past_Frame_Activity (PFA) equals $\sum_{i=1}^{N} (W_{i} \cdot \text{avgFrameSAD}(i))$, with i counting the N frames before the current frame, Future_Frame_Activity (FFA) equals $\sum_{j=1}^{M} (W_{j} \cdot \text{avgFrameSAD}(j))$, with j counting the M frames after it, and Current_Frame_Activity (CFA) equals avgFrameSAD(current).

Some embodiments modify the computation of Temporal_Activity_Attribute so that neither Past_Frame_Activity nor Future_Frame_Activity unduly dominates its value. For instance, some embodiments initially define PFA to equal $\sum_{i=1}^{N} (W_{i} \cdot \text{avgFrameSAD}(i))$ and FFA to equal $\sum_{j=1}^{M} (W_{j} \cdot \text{avgFrameSAD}(j))$.

These embodiments then determine whether PFA is greater than a scalar times FFA. If so, they set PFA equal to an upper PFA limit (e.g., a scalar times FFA). In addition to setting PFA equal to the upper PFA limit, some embodiments may perform a combination of setting FFA to zero and setting CFA to zero. Other embodiments may set either or both of PFA and CFA to a weighted combination of PFA, CFA, and FFA.

Similarly, after initially defining the PFA and FFA values based on the weighted sums, some embodiments also determine whether FFA is greater than a scalar times PFA. If so, they set FFA equal to an upper FFA limit (e.g., a scalar times PFA). In addition to setting FFA equal to the upper FFA limit, some embodiments may perform a combination of setting PFA to zero and setting CFA to zero. Other embodiments may set either or both of FFA and CFA to a weighted combination of FFA, CFA, and PFA.

The potential subsequent adjustment of the PFA and FFA values (after their initial computation based on the weighted sums) prevents either of these values from unduly dominating the Temporal_Activity_Attribute.
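The basic capping rule for PFA and FFA can be sketched as follows. The function name and the scalar value 2.0 are assumptions made for this example; the optional follow-on adjustments (zeroing FFA or CFA, or using weighted combinations) are not shown.

```python
def limit_past_future(pfa: float, ffa: float, scalar: float = 2.0):
    """Cap PFA at scalar*FFA, or FFA at scalar*PFA, whichever is exceeded.

    Assumption: scalar >= 1 and both activities are non-negative, so at
    most one of the two caps can apply.
    """
    if pfa > scalar * ffa:
        pfa = scalar * ffa   # past activity dominates: clamp it
    elif ffa > scalar * pfa:
        ffa = scalar * pfa   # future activity dominates: clamp it
    return pfa, ffa


cfa = 30.0
pfa, ffa = limit_past_future(100.0, 10.0)
print(pfa, ffa, pfa + ffa + cfa)  # Temporal_Activity_Attribute after limiting
```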

b）限制Spatial_Activity_Attribute和Temporal_Activity_Attribute对Activity_Attribute的影响以上的公式（C）基本从以下条件表述Activity_Attribute:Activity_Attribute＝Spatia_Activity+Temporal_ctivityb) Limiting the influence of Spatial_Activity_Attribute and Temporal_Activity_Attribute on Activity_Attribute The above formula (C) basically expresses Activity_Attribute from the following conditions: Activity_Attribute=Spatia_Activity+Temporal_ctivity

其中，Spatial_Activity等于scalar*（scalar*Spatial_Activity_Attribute）^β，而Temporal_Activity等于scalar*（scalar*Temporal_Activity_Attribute）^Δ。Among them, Spatial_Activity is equal to scalar*(scalar*Spatial_Activity_Attribute) ^β , and Temporal_Activity is equal to scalar*(scalar*Temporal_Activity_Attribute) ^Δ .

一些实施例修改Activity_Attribute的计算以便Spatial_Activity和Temporal_Activity任一个都不会过度控制Activity_Attribute的值。例如，一些实施例初始定义Spatial_Activity（SA）等于scalar*（scalar*Spatial_Activity_Attribute）^β，以及定义Temporal_Activity（TA）等于scalar*（scalar*Temporal_Activity_Attribute）^Δ。Some embodiments modify the calculation of Activity_Attribute so that neither Spatial_Activity nor Temporal_Activity overdominates the value of Activity_Attribute. For example, some embodiments initially define Spatial_Activity(SA) equal to scalar*(scalar*Spatial_Activity_Attribute) ^β , and define Temporal_Activity(TA) equal to scalar*(scalar*Temporal_Activity_Attribute) ^Δ .

These embodiments then determine whether SA is greater than a scalar times TA. If so, these embodiments set SA equal to an SA upper limit (e.g., a scalar times TA). In addition to setting SA equal to the SA upper limit in this case, some embodiments may also set the TA value to zero or to a weighted combination of TA and SA.

Similarly, after initially defining the SA and TA values based on the exponential equations, some embodiments also determine whether the TA value is greater than a scalar times SA. If so, these embodiments set TA equal to a TA upper limit (e.g., a scalar times SA). In addition to setting TA equal to the TA upper limit in this case, some embodiments may also set the SA value to zero or to a weighted combination of SA and TA.

The potential subsequent adjustment of the SA and TA values (after the initial computation of these values based on the exponential equations) prevents either of these values from unduly dominating the Activity_Attribute.
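The capping of SA and TA can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation of any embodiment: the function names, the default scalars and exponents, and the cap ratio `max_ratio` are hypothetical placeholders, and only the "set to the upper limit" variant is shown (not the set-to-zero or weighted-combination variants).

```python
def limit_activity(sa, ta, max_ratio=4.0):
    """Cap each term at max_ratio times the other (max_ratio is a hypothetical scalar)."""
    sa_capped = min(sa, max_ratio * ta)  # SA may not exceed scalar * TA
    ta_capped = min(ta, max_ratio * sa)  # TA may not exceed scalar * SA
    return sa_capped, ta_capped


def activity_attribute(spatial_attr, temporal_attr,
                       s_outer=1.0, s_inner=1.0, beta=1.0,
                       t_outer=1.0, t_inner=1.0, delta=1.0,
                       max_ratio=4.0):
    """Activity_Attribute = SA + TA, with both terms capped first."""
    sa = s_outer * (s_inner * spatial_attr) ** beta      # initial Spatial_Activity
    ta = t_outer * (t_inner * temporal_attr) ** delta    # initial Temporal_Activity
    sa, ta = limit_activity(sa, ta, max_ratio)
    return sa + ta
```

With `max_ratio` of at least 1, at most one of the two terms is ever capped, so the order of the two checks does not matter in this sketch.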

B. Computing the macroblock-level masking strength

1. First approach

In some embodiments, the macroblock-level masking strength φ_MB(k,m) is computed as follows:

φ_MB(k,m) = A*power(C*avgMbLuma(k,m), β)*power(B*MbSAD(k,m), α_MB), (F)

where:

avgMbLuma(k,m) is the average pixel intensity within macroblock m of frame k; and α_MB, β, A, B, and C are constants and/or are adapted to local statistics.
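As a worked illustration of formula (F), the following Python sketch evaluates the macroblock masking strength for one macroblock. The function name and all default constant values are hypothetical; the text only says that they are constants or adapted to local statistics.

```python
def mb_masking_strength(avg_mb_luma, mb_sad,
                        a=1.0, b=1.0, c=1.0, alpha_mb=0.5, beta=0.5):
    """Formula (F): A * (C * avgMbLuma)^beta * (B * MbSAD)^alpha_MB.

    The default constants are placeholders chosen only for illustration.
    """
    return a * (c * avg_mb_luma) ** beta * (b * mb_sad) ** alpha_mb
```

For example, with the placeholder defaults, an average luma of 16 and a SAD of 4 give 1 * 16^0.5 * 4^0.5 = 8.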

2. Second approach

Formula (F) described above essentially computes the macroblock masking strength as follows:

φ_MB(k,m) = D*power(E*Mb_Brightness_Attribute, exponent0)*power(scalar*Mb_Spatial_Activity_Attribute, exponent1)

In formula (F), the macroblock's Mb_Brightness_Attribute equals avgMbLuma(k,m), and Mb_Spatial_Activity_Attribute equals MbSAD(k,m). The Mb_Spatial_Activity_Attribute measures the amount of spatial variation in the region of pixels within the macroblock being coded.

Just as in the case of the frame masking strength, some embodiments may extend the activity measure in the macroblock masking strength to include the amount of temporal variation in a region of pixels across a number of successive frames. Specifically, these embodiments compute the macroblock masking strength as follows:

φ_MB(k,m) = D*power(E*Mb_Brightness_Attribute, exponent0)*power(scalar*Mb_Activity_Attribute, exponent1), (G)

where Mb_Activity_Attribute is given by the following formula (H):

Mb_Activity_Attribute = F*power(D*Mb_Spatial_Activity_Attribute, exponent_beta) + G*power(F*Mb_Temporal_Activity_Attribute, exponent_delta) (H)

The computation of the Mb_Temporal_Activity_Attribute for a macroblock can be analogous to the computation of the frame Temporal_Activity_Attribute described above. For example, in some of these embodiments, Mb_Temporal_Activity_Attribute is given by the following formula (I):

$\mathrm{Mb\_Temporal\_Activity\_Attribute} = \sum_{i=1}^{N} \left( W_i \cdot \mathrm{MbSAD}(i,m) \right) + \sum_{j=1}^{M} \left( W_j \cdot \mathrm{MbSAD}(j,m) \right) + \mathrm{MbSAD}(m) \qquad (I)$

The variables in formula (I) are defined in Section III. In formula (I), macroblock m in frame i or j may be the macroblock at the same location as macroblock m in the current frame, or may be the macroblock in frame i or j that is initially predicted to correspond to macroblock m in the current frame.
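Formula (I) is a weighted sum over co-located (or predicted) macroblocks in past and future frames plus the current macroblock's own SAD. A minimal Python sketch follows; the function and parameter names are hypothetical, and how the weights are chosen is not specified here.

```python
def mb_temporal_activity(past_sads, past_weights,
                         future_sads, future_weights,
                         current_sad):
    """Formula (I): weighted past SADs + weighted future SADs + current MbSAD(m).

    past_sads[i] and future_sads[j] are the SAD values of macroblock m in the
    N past and M future frames; the weight lists hold the matching W_i and W_j.
    """
    if len(past_sads) != len(past_weights):
        raise ValueError("one weight is required per past frame")
    if len(future_sads) != len(future_weights):
        raise ValueError("one weight is required per future frame")
    past_term = sum(w * s for w, s in zip(past_weights, past_sads))
    future_term = sum(w * s for w, s in zip(future_weights, future_sads))
    return past_term + future_term + current_sad
```

For instance, past SADs [10, 20] with weights [0.5, 0.25], one future SAD 8 with weight 0.5, and a current SAD of 6 give 10 + 4 + 6 = 20.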

The Mb_Temporal_Activity_Attribute given by formula (I) can be modified in a manner analogous to the modification of the frame Temporal_Activity_Attribute given by formula (D) (discussed in Section III.A.3 above). Specifically, the Mb_Temporal_Activity_Attribute given by formula (I) can be modified to limit the undue influence of macroblocks in past and future frames.

Similarly, the Mb_Activity_Attribute given by formula (H) can be modified in a manner analogous to the modification of the frame Activity_Attribute given by formula (C) (discussed in the section above). Specifically, the Mb_Activity_Attribute given by formula (H) can be modified to limit the undue influence of Mb_Spatial_Activity_Attribute and Mb_Temporal_Activity_Attribute.

C. Computing the masked QP values

Based on the masking strength values (φ_F and φ_MB) and the reference masking strength value (φ_R), the visual masking process can compute the masked QP values at the frame level and the macroblock level by using two functions, CalcMQP and CalcMQPforMB. The pseudocode for these two functions is as follows:

In the above functions, β_F and β_MB may be predetermined constants or may be adapted to local statistics.

IV. Multi-pass encoding

Figure 1 presents a process 100 that conceptually illustrates the multi-pass encoding method of some embodiments of the invention. As shown in this figure, process 100 has three stages, which are described in the following three subsections.

A. Analysis and initial QP selection

As shown in Figure 1, process 100 initially computes (step 105) an initial value of the reference masking strength (φ_R(1)) and an initial value of the nominal quantization parameter (QP_Nom(1)) during the initial analysis stage of the multi-pass encoding process (i.e., during pass 0). The initial reference masking strength (φ_R(1)) is used during the first search stage, while the initial nominal quantization parameter (QP_Nom(1)) is used during the first pass of the first search stage (i.e., during pass 1 of the multi-pass encoding process).

At the beginning of pass 0, φ_R(0) can be some arbitrary value or a value selected based on experimental results (e.g., the middle value of a typical range of φ_R values). During the analysis of the sequence, a masking strength φ_F(k) is computed for each frame, and at the end of pass 0 the reference masking strength φ_R(1) is set equal to avg(φ_F(k)). Other choices for the reference masking strength φ_R are also possible. For example, it may be computed as the median or another arithmetic function of the values φ_F(k), such as a weighted average of the values φ_F(k).

There are several approaches to initial QP selection, with varying complexity. For example, the initial nominal QP can be selected as an arbitrary value (e.g., 26). Alternatively, a value can be selected that is known, based on coding experiments, to produce acceptable quality at the target bit rate.

The initial nominal QP value can also be selected from a look-up table based on spatial resolution, frame rate, spatial/temporal complexity, and target bit rate. In some embodiments, this initial nominal QP value is selected from the table using a distance measure that depends on each of these parameters, or it may be selected using a weighted distance measure of these parameters.

The initial nominal QP value can also be set to the adjusted average of the frame QP values as selected during a fast encoding with a rate controller (without masking), where the average has been adjusted based on the percentage bit rate error E₀ of pass 0. Similarly, the initial nominal QP can be set to a weighted adjusted average of the frame QP values, where the weight of each frame is determined by the percentage of macroblocks in that frame that are not coded as skipped macroblocks. Alternatively, the initial nominal QP can be set to an adjusted average or adjusted weighted average of the frame QP values as selected during a fast encoding with a rate controller (with masking), provided that the effect of changing the reference masking strength from φ_R(0) to φ_R(1) is taken into account.
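Two of the pass-0 quantities described in this subsection can be sketched in Python as follows. The function names are hypothetical, and the sketch deliberately leaves out the adjustment by the pass-0 rate error E₀, because the text does not specify the form of that adjustment.

```python
from statistics import mean, median


def reference_masking_strength(frame_strengths, use_median=False):
    """phi_R(1) from the per-frame masking strengths phi_F(k): mean or median."""
    return median(frame_strengths) if use_median else mean(frame_strengths)


def weighted_average_qp(frame_qps, non_skipped_fraction):
    """Average of the frame QPs, each frame weighted by the fraction of its
    macroblocks that were not coded as skipped macroblocks."""
    total_weight = sum(non_skipped_fraction)
    if total_weight == 0:
        raise ValueError("at least one frame must have non-skipped macroblocks")
    weighted_sum = sum(q * w for q, w in zip(frame_qps, non_skipped_fraction))
    return weighted_sum / total_weight
```

Frames consisting mostly of skipped macroblocks thus contribute little to the initial nominal QP, which matches the weighting rule stated in the text.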

B. First search stage: nominal QP adjustment

After step 105, the multi-pass encoding process 100 enters the first search stage. In the first search stage, process 100 performs N₁ encodings of the sequence, where N₁ denotes the number of passes through the first search stage. During each pass of the first stage, the process uses a changing nominal quantization parameter with a constant reference masking strength.

Specifically, during each pass p of the first search stage, process 100 computes (step 107) a particular quantization parameter MQP_p(k) for each frame k, and a particular quantization parameter MQP_MB(p)(k,m) for each individual macroblock m within frame k. The computation of the parameters MQP_p(k) and MQP_MB(p)(k,m) for a given nominal quantization parameter QP_Nom(p) and reference masking strength φ_R(p) is described in Section III (where MQP_p(k) and MQP_MB(p)(k,m) are computed by using the functions CalcMQP and CalcMQPforMB). In the first pass (i.e., pass 1) through step 107, the nominal quantization parameter and the first-stage reference masking strength are the parameter QP_Nom(1) and the reference masking strength φ_R(1), which were computed during the initial analysis stage (step 105).

After step 107, the process encodes the sequence (step 110) based on the quantization parameter values computed at step 107. Next, the encoding process 100 determines (step 115) whether it should terminate. Different embodiments have different criteria for terminating the overall encoding process. Examples of exit conditions that completely terminate the multi-pass encoding process include:

● |E_p| < ε, where ε is the error tolerance in the final bit rate.

● QP_Nom(p) is at the upper or lower bound of the valid range of QP values.

● The number of passes has exceeded the maximum number of allowable passes P_MAX.

Some embodiments might use all of these exit conditions, while other embodiments might use only some of them. Yet other embodiments might use other exit conditions for terminating the encoding process.
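The three example exit conditions can be combined into a single predicate, sketched below in Python. The function name and the default values are hypothetical; in particular, the QP range 0 to 51 is assumed here only because it is the range used by H.264, and an embodiment may use any subset of these checks.

```python
def should_terminate(error_pct, qp_nom, pass_count,
                     eps=1.0, qp_min=0, qp_max=51, max_passes=10):
    """Return True if any of the three example exit conditions holds."""
    within_tolerance = abs(error_pct) < eps                 # |E_p| < epsilon
    at_qp_bound = qp_nom <= qp_min or qp_nom >= qp_max      # QP_Nom(p) at a range bound
    out_of_passes = pass_count > max_passes                 # pass count exceeded P_MAX
    return within_tolerance or at_qp_bound or out_of_passes
```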

When the multi-pass encoding process decides to terminate (step 115), process 100 skips the second search stage and transitions to step 145. At step 145, the process saves the bitstream from the last pass p as the final result and then ends.

On the other hand, when the process determines (step 115) that it should not terminate, it then determines (step 120) whether it should end the first search stage. Again, different embodiments have different criteria for ending the first search stage. Examples of exit conditions that end the first search stage of the multi-pass encoding process include:

● QP_Nom(p+1) is the same as QP_Nom(q) for some q ≤ p (in this case, the error in the bit rate cannot be reduced any further by modifying the nominal QP).

● |E_p| < ε_C, with ε_C > ε, where ε_C is the error tolerance in the bit rate for the first search stage.

● The number of passes has exceeded P₁, where P₁ is less than P_MAX.

● The number of passes has exceeded P₂, which is less than P₁, and |E_p| < ε₂, with ε₂ > ε_C.

Some embodiments might use all of these exit conditions, while other embodiments might use only some of them. Yet other embodiments might use other exit conditions for ending the first search stage.

When the multi-pass encoding process decides (step 120) to end the first search stage, process 100 proceeds to the second search stage, which is described in the next subsection. On the other hand, when the process determines (step 120) that it should not end the first search stage, it updates (step 125) the nominal QP for the next pass in the first search stage (i.e., it defines QP_Nom(p+1)). In some embodiments, the nominal QP_Nom(p+1) is updated as follows. At the end of pass 1, these embodiments define:

QP_Nom(p+1) = QP_Nom(p) + χE_p,

where χ is a constant. At the end of each pass from pass 2 to pass N₁, these embodiments then define:

QP_Nom(p+1) = InterpExtrap(0, E_q1, E_q2, QP_Nom(q1), QP_Nom(q2)),

where InterpExtrap is a function that is further described below. Also, in the above equation, q1 and q2 are the pass numbers corresponding to the lowest bit rate errors among all passes up to pass p, and q1, q2, and p have the following relationship:

1 ≤ q1 < q2 ≤ p

The following is the pseudocode for the InterpExtrap function. Note that if x is not between x1 and x2, this function is an extrapolation function. Otherwise, it is an interpolation function.

The nominal QP value is typically rounded to an integer value and clipped to lie within the valid range of QP values. One of ordinary skill in the art will recognize that other embodiments may compute the nominal QP_Nom(p+1) in ways different from those described above.
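The update rule for the nominal QP can be sketched in Python as follows. This sketch makes several assumptions that are not stated in the text: InterpExtrap is taken to be a straight line through (x1, y1) and (x2, y2), which is consistent with the interpolation/extrapolation remark but is not confirmed as the original pseudocode; the constant `chi`, the QP range 0 to 51, and the function and parameter names are hypothetical.

```python
def interp_extrap(x, x1, x2, y1, y2):
    """Evaluate at x the line through (x1, y1) and (x2, y2).

    Interpolates when x lies between x1 and x2, extrapolates otherwise.
    Assumed linear form; falls back to y1 when the two x values coincide.
    """
    if x2 == x1:
        return y1
    return y1 + (x - x1) * (y2 - y1) / (x2 - x1)


def next_nominal_qp(history, chi=0.1, qp_min=0, qp_max=51):
    """history: list of (qp_nom, error_pct) tuples for passes 1..p, in pass order."""
    if len(history) == 1:
        qp, err = history[0]
        candidate = qp + chi * err                      # QP_Nom(2) = QP_Nom(1) + chi * E_1
    else:
        # q1 < q2: the two passes with the smallest |error| so far, in pass order
        ranked = sorted(range(len(history)), key=lambda i: abs(history[i][1]))
        q1, q2 = sorted(ranked[:2])
        (qp1, e1), (qp2, e2) = history[q1], history[q2]
        candidate = interp_extrap(0.0, e1, e2, qp1, qp2)  # QP at which the error would be 0
    return max(qp_min, min(qp_max, round(candidate)))     # round, then clip to valid range
```

The target x value is 0 because the goal is the QP at which the bit rate error vanishes; the two best passes so far supply the points through which the line is drawn.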

After step 125, the process transitions back to step 107 to start the next pass (i.e., p := p+1). For this pass p, it computes (step 107) a particular quantization parameter MQP_p(k) for each frame k and a particular quantization parameter MQP_MB(p)(k,m) for each individual macroblock m within frame k. Next, the process encodes the sequence of frames based on these newly computed quantization parameters (step 110). The process then transitions from step 110 to step 115, which was described above.

C. Second search stage: reference masking strength adjustment

When process 100 determines (step 120) that it should end the first search stage, it transitions to step 130. In the second search stage, process 100 performs N₂ encodings of the sequence, where N₂ denotes the number of passes through the second search stage. During each pass, the process uses the same nominal quantization parameter and a changing reference masking strength.

At step 130, process 100 computes the reference masking strength φ_R(p+1) for the next pass, i.e., pass p+1, which is pass N₁+1. In pass N₁+1, process 100 encodes the sequence of frames at step 135. Different embodiments compute the reference masking strength φ_R(p+1) at the end of pass p (step 130) in different ways. Two alternative approaches are described below.

Some embodiments compute the reference masking strength φ_R(p) based on the error in the bit rate and the value of φ_R from the previous passes. For example, at the end of pass N₁, some embodiments define:

φ_R(N1+1) = φ_R(N1) + φ_R(N1) × Konst × E_N1

At the end of pass N₁+m, where m is an integer greater than 1, some embodiments define:

φ_R(N1+m) = InterpExtrap(0, E_(N1+m-2), E_(N1+m-1), φ_R(N1+m-2), φ_R(N1+m-1))

Alternatively, some embodiments define:

φ_R(N1+m) = InterpExtrap(0, E_(N1+m-q2), E_(N1+m-q1), φ_R(N1+m-q2), φ_R(N1+m-q1))

where q1 and q2 are the previous passes that gave the best errors.
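The first approach to updating the reference masking strength can be sketched in Python as follows. As assumptions, InterpExtrap is again taken to be linear, the value of `konst` is a placeholder, and the function names are hypothetical; only the variant that uses the two most recent passes is shown.

```python
def interp_extrap(x, x1, x2, y1, y2):
    """Assumed linear form of InterpExtrap; returns y1 if x1 == x2."""
    if x2 == x1:
        return y1
    return y1 + (x - x1) * (y2 - y1) / (x2 - x1)


def next_reference_strength(phi_history, err_history, konst=0.01):
    """phi_history and err_history hold phi_R and the rate error E for pass N1
    and for every second-stage pass completed so far, in pass order."""
    if len(phi_history) == 1:
        phi, err = phi_history[0], err_history[0]
        return phi + phi * konst * err      # phi_R(N1+1) = phi_R(N1) + phi_R(N1)*Konst*E_N1
    # Later passes: line through the last two (error, phi_R) points, evaluated at error 0.
    return interp_extrap(0.0, err_history[-2], err_history[-1],
                         phi_history[-2], phi_history[-1])
```

The first step scales φ_R in proportion to the rate error because no second data point exists yet; every later step fits the last two (error, φ_R) pairs.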

Other embodiments compute the reference masking strength at the end of each pass in the second search stage by using AMQP, which was defined in Section I. One way of computing AMQP for a given nominal QP and some value of φ_R is described below with reference to the pseudocode of the function GetAvgMaskedQP:

Some embodiments that use AMQP compute a desired AMQP for pass p+1 based on the error in the bit rate and the value of AMQP from the previous passes. The φ_R(p+1) corresponding to this AMQP is then found through a search procedure given by the function Search(AMQP_(p+1), φ_R(p)), the pseudocode for which is given at the end of this subsection.

For example, some embodiments compute AMQP_(N1+1) at the end of pass N₁, where:

AMQP_(N1+1) = InterpExtrap(0, E_(N1-1), E_N1, AMQP_(N1-1), AMQP_N1), when N₁ > 1,

and

AMQP_(N1+1) = AMQP_N1, when N₁ = 1.

These embodiments then define:

φ_R(N1+1) = Search(AMQP_(N1+1), φ_R(N1))

At the end of pass N₁+m (where m is an integer greater than 1), some embodiments define:

AMQP_(N1+m) = InterpExtrap(0, E_(N1+m-2), E_(N1+m-1), AMQP_(N1+m-2), AMQP_(N1+m-1)),

and

φ_R(N1+m) = Search(AMQP_(N1+m), φ_R(N1+m-1))

Given the desired AMQP and some default value of φ_R, the φ_R corresponding to the desired AMQP can be found using the Search function, which in some embodiments has the following pseudocode:

In the above pseudocode, the numbers 10, 12, and 0.05 may be replaced with suitably chosen thresholds.

After computing the reference masking strength for the next pass (pass p+1), process 100 transitions to step 132 and starts the next pass (i.e., p := p+1). During each encoding pass p, the process computes (step 132) a particular quantization parameter MQP_p(k) for each frame k and a particular quantization parameter MQP_MB(p)(k,m) for each individual macroblock m within frame k. The computation of the parameters MQP_p(k) and MQP_MB(p)(k,m) for a given nominal quantization parameter QP_Nom(p) and reference masking strength φ_R(p) is described in Section III (where MQP_p(k) and MQP_MB(p)(k,m) are computed by using the functions CalcMQP and CalcMQPforMB). During the first pass through step 132, the reference masking strength is the value just computed at step 130. Also, the nominal QP remains constant throughout the second search stage. In some embodiments, the nominal QP in the second search stage is the nominal QP that resulted in the best encoding solution (i.e., the encoding solution with the lowest bit rate error) during the first search stage.

After step 132, the process encodes the sequence of frames (step 135) using the quantization parameters computed at step 132. After step 135, the process determines (step 140) whether it should end the second search stage. Different embodiments use different criteria for ending the second search stage at the end of pass p. Examples of such conditions are:

● The number of passes has exceeded the maximum number of allowable passes P_MAX.

Some embodiments might use all of these exit conditions, while other embodiments might use only some of them. Yet other embodiments might use other exit conditions for ending the second search stage.

When process 100 determines (step 140) that it should not end the second search stage, it returns to step 130 to recompute the reference masking strength for the next encoding pass. The process transitions from step 130 to step 132 to compute the quantization parameters, and then to step 135 to encode the video sequence using the newly computed quantization parameters.

On the other hand, when the process decides (step 140) to end the second search stage, it transitions to step 145. At step 145, process 100 saves the bitstream from the last pass p as the final result and then ends.
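The overall control flow of process 100 can be summarized with the following Python skeleton. It is only a sketch under stated assumptions: `encode`, `next_qp`, and `next_phi` are hypothetical callbacks standing in for steps 107 to 110 (or 132 to 135), step 125, and step 130; the thresholds are placeholders; the pass budget is treated as used up once `p_max` passes have run; and the |E_p| < ε check is also applied in the second stage, which is an assumption beyond the single example condition listed for step 140.

```python
def multipass_encode(encode, qp_nom, phi_ref, next_qp, next_phi,
                     eps=1.0, eps_c=5.0, p1=4, p_max=8, qp_bounds=(0, 51)):
    """encode(qp_nom, phi_ref) must return (bitstream, rate_error_percent)."""
    history = []   # one (qp_nom, phi_ref, error) tuple per completed pass
    stage = 1
    while True:
        bitstream, err = encode(qp_nom, phi_ref)         # compute masked QPs, then encode
        history.append((qp_nom, phi_ref, err))
        passes = len(history)
        if abs(err) < eps or passes >= p_max:            # step 115 / step 140
            return bitstream, history                    # step 145: keep the last pass
        if stage == 1:
            if qp_nom in qp_bounds:                      # nominal QP hit a range bound
                return bitstream, history
            candidate = next_qp(history)                 # step 125 candidate
            repeated = any(candidate == h[0] for h in history)
            if not (repeated or abs(err) < eps_c or passes >= p1):   # step 120
                qp_nom = candidate                       # stay in the first search stage
                continue
            stage = 2                                    # enter the second search stage
            qp_nom = min(history, key=lambda h: abs(h[2]))[0]   # best nominal QP so far
        phi_ref = next_phi(history)                      # step 130
```

In the first stage only the nominal QP changes between passes; once the stage switches, the nominal QP is frozen at the value with the lowest rate error and only the reference masking strength is updated.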

V. Decoder input buffer underflow control

Some embodiments of the invention provide a multi-pass encoding process that examines various encodings of a video sequence for a target bit rate, in order to identify an optimal encoding solution with respect to the usage of the input buffer used by the decoder. In some embodiments, this multi-pass process follows the multi-pass encoding process 100 of Figure 1.

The usage of the decoder input buffer (the "decoder buffer") fluctuates to some degree during the decoding of a sequence of encoded images (e.g., frames), because of variations in factors such as the size of the encoded images, the speed at which the decoder receives the encoded data, the size of the decoder buffer, and the speed of the decoding process.

Decoder buffer underflow refers to the situation in which the decoder is ready to decode the next image before that image has completely arrived at the decoder side. The multi-pass encoder of some embodiments simulates the decoder buffer and re-encodes selected segments of the sequence to prevent decoder buffer underflow.

Figure 2 conceptually illustrates an encoding system 200 of some embodiments of the invention. This system includes a decoder 205 and an encoder 210. In this figure, the encoder 210 has several components that enable it to simulate the operation of similar components of the decoder 205.

Specifically, the decoder 205 has an input buffer 215, a decoding process 220, and an output buffer 225. The encoder 210 simulates these modules by maintaining a simulated decoder input buffer 230, a simulated decoding process 235, and a simulated decoder output buffer 240. In order not to obscure the description of the invention, Figure 2 is simplified to show the decoding process 220 and the encoding process 245 as single blocks. Also, in some embodiments, the simulated decoding process 235 and the simulated decoder output buffer 240 are not used for buffer underflow management, and are therefore shown in this figure for illustration only.

The decoder maintains the input buffer 215 to smooth out variations in the rate and arrival time of incoming encoded images. If the decoder runs out of data (underflow) or fills up the input buffer (overflow), there will be visible decoding discontinuities, as picture decoding halts or incoming data is discarded. Both of these cases are undesirable.

To eliminate underflow conditions, the encoder 210 in some embodiments first encodes a sequence of images and stores them in a storage 255. For example, the encoder 210 uses the multi-pass encoding process 100 to obtain a first encoding of the sequence of images. It then simulates the decoder input buffer 215 and re-encodes the images that would cause buffer underflow. After all buffer underflow conditions have been eliminated, the re-encoded images are supplied to the decoder 205 through a connection 255, which may be a network connection (Internet, cable, PSTN lines, etc.), a non-network direct connection, a medium (DVD, etc.), and so on.

Figure 3 illustrates an encoding process 300 of the encoder of some embodiments. This process tries to find an optimal encoding solution that does not cause the decoder buffer to underflow. As shown in Figure 3, process 300 identifies (step 302) a first encoding of the sequence of images that meets a desired target bit rate (e.g., the average bit rate for each image in the sequence meets a desired average target bit rate). For example, process 300 may use (step 302) the multi-pass encoding process 100 to obtain the first encoding of the sequence of images.

After step 302, the encoding process 300 simulates (step 305) the decoder input buffer 215 by considering a variety of factors, such as the connection speed (i.e., the speed at which the decoder receives encoded data), the size of the decoder input buffer, the size of the encoded images, and the decoding process speed. At step 310, process 300 determines whether any segment of the encoded images would cause the decoder input buffer to underflow. The techniques that the encoder uses to determine (and subsequently eliminate) underflow conditions are described further below.

If process 300 determines (step 310) that the encoded images do not create an underflow condition, the process ends. On the other hand, if process 300 determines (step 310) that a buffer underflow condition exists in any segment of the encoded images, it refines the encoding parameters (step 315) based on the values of these parameters from previous encoding passes. The process then re-encodes (step 320) the segment with the underflow to reduce the bit size of the segment. After re-encoding the segment, process 300 examines (step 325) the segment to determine whether the underflow condition has been eliminated.

When the process determines (step 325) that the segment still causes underflow, process 300 transitions to step 315 to further refine the encoding parameters to eliminate the underflow. Alternatively, when the process determines (step 325) that the segment will not cause any underflow, the process specifies (step 330) the starting point for re-examining and re-encoding the video sequence as the frame after the end of the segment re-encoded in the last iteration of step 320. Next, at step 335, the process re-encodes the portion of the video sequence specified at step 330, up to (and excluding) the first IDR frame following the underflow segment specified at steps 315 and 320. After step 335, the process transitions back to step 305 to simulate the decoder buffer and determine whether the rest of the video sequence still causes buffer underflow after the re-encoding. The flow of process 300 from step 305 was described above.

A.确定已编码图像序列中的下溢片段A. Determining Underflow Fragments in an Encoded Image Sequence

如上所述，编码器模拟解码器缓冲区条件以确定已编码或重新编码的图像的序列中的任何片段是否会导致解码器缓冲区中的下溢。在一些实施例中，编码器使用考虑了编码图像的大小、诸如带宽的网络条件、解码器因素（例如，输入缓冲区大小，移除图像的初始和标称时间，解码处理时间，每个图像的显示时间等）的模拟模型。As described above, the encoder simulates decoder buffer conditions to determine whether any segment in the sequence of encoded or re-encoded pictures would cause an underflow in the decoder buffer. In some embodiments, the encoder uses a per-image display time, etc.) of the simulation model.

In some embodiments, the MPEG-4 AVC Coded Picture Buffer (CPB) model is used to simulate the decoder input buffer conditions. CPB is the term used in the MPEG-4 H.264 standard for the simulated input buffer of the Hypothetical Reference Decoder (HRD). The HRD is a hypothetical decoder model that specifies constraints on the variability of the conforming streams that an encoding process may produce. The CPB model is well known and is described in Section 1 below for convenience. More detailed descriptions of the CPB and HRD can be found in the Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264 / ISO/IEC 14496-10 AVC).

1. Using the CPB Model to Simulate the Decoder Buffer

The following paragraphs describe how the decoder input buffer is simulated in some embodiments using the CPB model. The time at which the first bit of image n begins to enter the CPB is referred to as the initial arrival time t_ai(n), which is derived as follows:

● t_ai(0) = 0, when the image is the first image (i.e., image 0);

● t_ai(n) = Max(t_af(n-1), t_ai,earliest(n)), when the image is not the first image in the sequence being encoded or re-encoded (i.e., n > 0).

In the above equation:

● t_ai,earliest(n) = t_r,n(n) - initial_cpb_removal_delay,

where t_r,n(n) is the nominal removal time of image n from the CPB as specified below, and initial_cpb_removal_delay is the initial buffering period.

The final arrival time of image n is derived by:

t_af(n) = t_ai(n) + b(n) / BitRate,

where b(n) is the size of image n in bits.

In some embodiments, the encoder computes the nominal removal times itself, as described below, rather than reading them from the optional part of the bitstream as in the H.264 specification. For image 0, the nominal removal time of the image from the CPB is specified by:

t_r,n(0) = initial_cpb_removal_delay

For image n (n > 0), the nominal removal time of the image from the CPB is specified by:

t_r,n(n) = t_r,n(0) + sum_{i=0 to n-1}(t_i)

where t_r,n(n) is the nominal removal time of image n, and t_i is the display duration of image i.

The removal time of image n is specified as follows:

● t_r(n) = t_r,n(n), when t_r,n(n) >= t_af(n);

● t_r(n) = t_af(n), when t_r,n(n) < t_af(n).

The latter case indicates that the size b(n) of image n is so large that it prevents removal at the nominal removal time.
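The timing rules above can be collected into a short simulation. The following Python sketch is only an illustration under stated assumptions: the names `CpbTiming` and `simulate_cpb` are hypothetical, all times are taken to be in seconds, and the bit rate is assumed constant.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CpbTiming:
    t_ai: float  # initial arrival time t_ai(n)
    t_af: float  # final arrival time t_af(n)
    t_rn: float  # nominal removal time t_r,n(n)
    t_r: float   # actual removal time t_r(n)


def simulate_cpb(
    sizes_bits: Sequence[float],       # b(n), size of each image in bits
    durations: Sequence[float],        # t_i, display duration of each image
    bit_rate: float,                   # BitRate, in bits per second
    initial_cpb_removal_delay: float,  # initial buffering period, in seconds
) -> List[CpbTiming]:
    timings: List[CpbTiming] = []
    t_rn = initial_cpb_removal_delay   # t_r,n(0)
    prev_af = 0.0
    for n, bits in enumerate(sizes_bits):
        if n == 0:
            t_ai = 0.0
        else:
            # t_r,n(n) = t_r,n(0) + sum of t_i for i = 0 .. n-1
            t_rn += durations[n - 1]
            t_ai_earliest = t_rn - initial_cpb_removal_delay
            t_ai = max(prev_af, t_ai_earliest)
        t_af = t_ai + bits / bit_rate
        # An image that arrives late is removed when it finishes arriving.
        t_r = t_rn if t_rn >= t_af else t_af
        timings.append(CpbTiming(t_ai, t_af, t_rn, t_r))
        prev_af = t_af
    return timings
```

For example, with three images of 1000, 1000, and 4000 bits, one-second display durations, a bit rate of 1000 bits per second, and a one-second initial delay, the third image finishes arriving at 6 seconds although its nominal removal time is 3 seconds, so its removal is delayed.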

2. Detection of Underflow Segments

As described in the previous section, the encoder can simulate the decoder input buffer state and obtain the number of bits in the buffer at any given time instant. Alternatively, the encoder can track how each individual image changes the decoder input buffer state through the difference between its nominal removal time and its final arrival time (i.e., t_b(n) = t_r,n(n) - t_af(n)). When t_b(n) is less than 0, the buffer suffers underflow between the time instants t_r,n(n) and t_af(n), and possibly before t_r,n(n) and after t_af(n).

Images directly involved in an underflow can easily be found by testing whether t_b(n) is less than 0. However, an image with t_b(n) less than 0 does not necessarily cause the underflow, and conversely an image that causes the underflow does not necessarily have t_b(n) less than 0. Some embodiments define an underflow segment as a stretch of consecutive images (in decoding order) that cause underflow by continuously draining the decoder input buffer until the underflow reaches its lowest point.

Figure 4 is a plot of the difference between the nominal removal time and the final arrival time of an image, t_b(n), versus the image number in some embodiments. The plot is drawn for a sequence of 1500 encoded images. Figure 4(a) shows an underflow segment whose beginning and end are marked with arrows. Note that another underflow segment occurs after the first one in Figure 4(a); for simplicity, it is not explicitly marked with arrows.

Figure 5 illustrates a process 500 that the encoder uses to perform the underflow detection operation at step 305. Process 500 first determines (step 505) the final arrival time t_af and the nominal removal time t_r,n of each image by simulating the decoder input buffer conditions as explained above. Note that because this process may be called several times during the iterative buffer underflow management, it receives an image number as a starting point and examines the image sequence from that starting point. For the first iteration, the starting point is the first image in the sequence.

At step 510, process 500 compares the final arrival time of each image at the decoder input buffer with the nominal removal time of that image by the decoder. If the process determines that no image has a final arrival time after its nominal removal time (i.e., no underflow condition exists), the process exits. On the other hand, when an image is found whose final arrival time is after its nominal removal time, the process determines that an underflow exists and proceeds to step 515 to identify the underflow segment.
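The comparison at step 510 can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation; the function name `images_missing_deadline` is hypothetical, and the arrival and removal times are assumed to come from a buffer simulation such as the CPB model described earlier.

```python
from typing import List, Sequence


def images_missing_deadline(
    t_af: Sequence[float],  # final arrival time of each image
    t_rn: Sequence[float],  # nominal removal time of each image
    start: int = 0,         # image number at which the check begins
) -> List[int]:
    # An image signals underflow when it finishes arriving after the
    # time at which the decoder nominally removes it, i.e. t_b(n) < 0.
    return [n for n in range(start, len(t_af)) if t_af[n] > t_rn[n]]
```

An empty result corresponds to the case in which the process exits without finding underflow; a non-empty result corresponds to proceeding to segment identification.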

At step 515, process 500 identifies the underflow segment as the segment of images during which the decoder buffer drains continuously, up to the next global minimum, where the underflow condition starts to improve (i.e., t_b(n) does not become more negative over the stretch of images). Process 500 then exits. In some embodiments, the beginning of the underflow segment is further adjusted to start with an I-frame, which is an intra-coded image that marks the start of a group of related inter-coded images. Once one or more segments that cause underflow have been identified, the encoder proceeds to eliminate the underflow. Section B below describes underflow elimination in the single-segment case (i.e., when the entire sequence of encoded images contains only a single underflow segment). Section C then describes underflow elimination in the multi-segment case.

B. Single-Segment Underflow Elimination

Referring to Figure 4(a), if the curve of t_b(n) versus n crosses the n-axis only once with a descending slope, then there is only one underflow segment in the entire sequence. The underflow segment begins at the nearest local maximum preceding the zero-crossing point and ends at the next global minimum between the zero-crossing point and the end of the sequence. If the buffer recovers from the underflow, the end point of the segment may be followed by another zero-crossing point at which the curve has an ascending slope.
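The single-segment rule above can be sketched as follows. This is an illustrative reading only: the function name `single_underflow_segment` is hypothetical, the zero crossing is taken to be the first image with t_b(n) < 0, and the segment is reported as the pair (index of the local maximum, index of the global minimum).

```python
from typing import Optional, Sequence, Tuple


def single_underflow_segment(t_b: Sequence[float]) -> Optional[Tuple[int, int]]:
    # First image at which the curve has dropped below the n-axis.
    cross = next((n for n, v in enumerate(t_b) if v < 0), None)
    if cross is None:
        return None  # no underflow in the sequence
    # Walk backwards while the curve is still descending, which stops
    # at the nearest local maximum preceding the zero crossing.
    begin = cross
    while begin > 0 and t_b[begin - 1] > t_b[begin]:
        begin -= 1
    # The segment ends at the global minimum between the zero crossing
    # and the end of the sequence.
    end = min(range(cross, len(t_b)), key=lambda n: t_b[n])
    return begin, end
```

Some embodiments further move the start of the segment to an I-frame; that adjustment is not modeled here.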

Figure 6 illustrates a process 600 that the encoder uses in some embodiments (steps 315, 320, and 325) to eliminate the underflow condition within a single segment of images. At step 605, process 600 estimates the total number of bits to be removed from the underflow segment (ΔB) by computing the product of the input bit rate into the buffer and the longest delay found at the end of the segment (e.g., the minimum t_b(n)).

Next, at step 610, process 600 uses the average masked frame QP (AMQP) and the total number of bits in the current segment from the previous encoding pass (or passes) to estimate the desired AMQP for achieving the desired number of bits for the segment, B_T = B - ΔB_p, where p is the current iteration number of process 600 for the segment. If this is the first iteration of process 600 for the particular segment, the AMQP and total number of bits are those of the segment as derived from the initial encoding solution identified at step 302. On the other hand, when this is not the first iteration of process 600, these parameters can be derived from the encoding solution or solutions obtained in the last pass or last several passes of process 600.
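The bit budget used in steps 605 and 610 reduces to two small formulas. The sketch below assumes that the "longest delay" is the magnitude of the most negative t_b(n) in the segment; the function names are hypothetical, and the estimation of the desired AMQP from the target is not shown because this section does not specify it.

```python
from typing import Sequence


def bits_to_remove(bit_rate: float, t_b_segment: Sequence[float]) -> float:
    # Delta-B: input bit rate times the longest delay in the segment,
    # taken here as the magnitude of the minimum t_b(n).
    longest_delay = max(0.0, -min(t_b_segment))
    return bit_rate * longest_delay


def target_bits(current_bits: float, delta_bits: float) -> float:
    # B_T = B - Delta-B for the current iteration of the process.
    return current_bits - delta_bits
```

For instance, at 1000 bits per second, a segment whose worst t_b(n) is -2 seconds needs roughly 2000 bits removed, so a 10000-bit segment would target 8000 bits.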

Next, at step 615, process 600 uses the desired AMQP to modify the average masked frame QP, MQP(n), based on the masking strength φ_F(n), so that images that can tolerate more masking receive larger bit reductions. The process then re-encodes (step 620) the video segment based on the parameters defined at step 315. The process then checks (step 625) the segment to determine whether the underflow condition has been eliminated. Figure 4(b) illustrates the elimination of the underflow condition of Figure 4(a) after process 600 has been applied to the underflow segment to re-encode it. When the underflow condition has been eliminated, the process exits. Otherwise, it returns to step 605 to further adjust the encoding parameters and reduce the total bit size.

C. Underflow Elimination with Multiple Underflow Segments

When there are multiple underflow segments in a sequence, re-encoding a segment changes the buffer fullness time t_b(n) for all the ensuing frames. To account for the modified buffer conditions, the encoder searches for one underflow segment at a time, starting from the first zero-crossing point with a descending slope (i.e., at the lowest n).

The underflow segment begins at the nearest local maximum preceding this zero-crossing point and ends at the next global minimum between the zero-crossing point and the next zero-crossing point (or the end of the sequence if there are no more zero crossings). After finding one segment, the encoder hypothetically removes the underflow in this segment and estimates the updated buffer fullness by setting t_b(n) to 0 at the end of the segment and redoing the buffer simulation for all subsequent frames.

The encoder then continues searching for the next segment using the modified buffer fullness. Once all underflow segments have been identified as described above, the encoder derives the AMQP and modifies the masked frame QPs of each segment independently of the other segments, just as in the single-segment case.
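The one-segment-at-a-time search can be sketched as a loop around the buffer simulation. This is a rough illustration under several assumptions that are not taken from the text: the name `find_underflow_segments` is hypothetical, times are in seconds, the "next zero crossing" is read as the point where t_b(n) stops being negative, and the hypothetical removal of underflow is modeled by letting the last image of the segment finish arriving exactly at its nominal removal time (so that t_b is 0 there).

```python
from typing import List, Sequence, Tuple


def find_underflow_segments(
    sizes_bits: Sequence[float],
    durations: Sequence[float],
    bit_rate: float,
    initial_delay: float,
) -> List[Tuple[int, int]]:
    count = len(sizes_bits)
    # Nominal removal times: t_r,n(n) = t_r,n(0) + sum of t_i for i < n.
    t_rn = [initial_delay]
    for n in range(1, count):
        t_rn.append(t_rn[-1] + durations[n - 1])

    segments: List[Tuple[int, int]] = []
    start, prev_af = 0, 0.0
    while start < count:
        # Redo the buffer simulation for the images from `start` onward.
        t_b = []
        af = prev_af
        for n in range(start, count):
            t_ai = 0.0 if n == 0 else max(af, t_rn[n] - initial_delay)
            af = t_ai + sizes_bits[n] / bit_rate
            t_b.append(t_rn[n] - af)
        # First zero crossing with a descending slope.
        cross = next((i for i, v in enumerate(t_b) if v < 0), None)
        if cross is None:
            break
        # Nearest local maximum preceding the crossing.
        begin = cross
        while begin > 0 and t_b[begin - 1] > t_b[begin]:
            begin -= 1
        # Minimum before the curve recovers (or the sequence ends).
        stop = cross
        while stop < len(t_b) and t_b[stop] < 0:
            stop += 1
        end = min(range(cross, stop), key=lambda i: t_b[i])
        segments.append((start + begin, start + end))
        # Hypothetically remove the underflow: t_b = 0 at the segment end.
        prev_af = t_rn[start + end]
        start += end + 1
    return segments
```

Each identified segment would then be handled independently, in the same way as a single segment.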

One of ordinary skill will realize that other embodiments might be implemented differently. For example, some embodiments do not identify multiple segments that cause underflow of the decoder's input buffer. Instead, some embodiments perform the buffer simulation as described above to identify the first segment that causes underflow. After identifying such a segment, these embodiments modify the segment to correct the underflow condition within it, and then resume encoding after the corrected portion. After encoding the remainder of the sequence, these embodiments repeat this process for the next underflow segment.

D. Applications of Buffer Underflow Management

The decoder buffer underflow techniques described above apply to numerous encoding and decoding systems. Several examples of such systems are described below.

Figure 7 illustrates a network 705 that connects a video streaming server 710 with several client decoders 715-725. The clients connect to network 705 through links with different bandwidths, such as 300 Kb/sec and 3 Mb/sec. The video streaming server 710 controls the streaming of encoded video images from an encoder 730 to the client decoders 715-725.

The video streaming server may decide to stream the encoded video images using the lowest bandwidth in the network (i.e., 300 Kb/sec) and the smallest client buffer size. In this case, the streaming server 710 needs only one set of encoded images, optimized for a target bit rate of 300 Kb/sec. Alternatively, the server may generate and store different encodings that are optimized for different bandwidths and different client buffer conditions.
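When a server stores several encodings, it needs some rule for matching an encoding to a client. The text does not specify one; the sketch below shows one plausible rule (the highest target bit rate that fits the client's link and buffer). The names `StoredEncoding` and `pick_encoding`, and the idea of a per-encoding buffer requirement, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class StoredEncoding:
    name: str
    target_bit_rate: int       # bits per second the encoding was optimized for
    required_buffer_bits: int  # client buffer size assumed during encoding


def pick_encoding(
    encodings: Sequence[StoredEncoding],
    link_bit_rate: int,
    client_buffer_bits: int,
) -> Optional[StoredEncoding]:
    # Keep only encodings the client's link and buffer can sustain.
    usable = [
        e for e in encodings
        if e.target_bit_rate <= link_bit_rate
        and e.required_buffer_bits <= client_buffer_bits
    ]
    # Prefer the highest bit rate among them; None if nothing fits.
    return max(usable, key=lambda e: e.target_bit_rate, default=None)
```

A server that keeps only the single 300 Kb/sec encoding corresponds to calling this with a one-element list.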

Figure 8 illustrates another example application of decoder underflow management. In this example, an HD-DVD player 805 receives encoded video images from an HD-DVD 840 on which encoded video data from a video encoder 810 has been stored. The HD-DVD player 805 has an input buffer 815, a set of decoding modules shown as one block 820 for simplicity, and an output buffer 825.

The output of the player 805 is sent to a display device such as a TV 830 or a computer display terminal 835. The HD-DVD player can have a very high bandwidth, e.g., 29.4 Mb/sec. To maintain a high-quality image on the display device, the encoder ensures that the video images are encoded such that no segment in the image sequence is too large to be delivered to the decoder input buffer on time.

VI. Computer System

Figure 9 presents a computer system with which one embodiment of the invention is implemented. Computer system 900 includes a bus 905, a processor 910, a system memory 915, a read-only memory 920, a permanent storage device 925, input devices 930, and output devices 935. The bus 905 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of computer system 900. For example, the bus 905 communicatively connects the processor 910 with the read-only memory 920, the system memory 915, and the permanent storage device 925.

To execute the processes of the invention, the processor 910 retrieves instructions to execute and data to process from these various memory units. The read-only memory (ROM) 920 stores static data and instructions needed by the processor 910 and other modules of the computer system.

The permanent storage device 925, on the other hand, is a read-write memory device. This device is a non-volatile memory unit that stores instructions and data even when computer system 900 is off. Some embodiments of the invention use a mass storage device (such as a magnetic or optical disk and its corresponding disk drive) as the permanent storage device 925.

Other embodiments use a removable storage device (such as a floppy disk or compact disk, and its corresponding disk drive) as the permanent storage device. Like the permanent storage device 925, the system memory 915 is a read-write memory device. Unlike storage device 925, however, the system memory is a volatile read-write memory, such as random access memory. The system memory stores some of the instructions and data that the processor needs at runtime. In some embodiments, the processes of the invention are stored in the system memory 915, the permanent storage device 925, and/or the read-only memory 920.

The bus 905 also connects to the input and output devices 930 and 935. The input devices enable the user to communicate information and select commands to the computer system. The input devices 930 include alphanumeric keyboards and cursor controllers. The output devices 935 display images generated by the computer system. The output devices include printers and display devices, such as cathode ray tubes (CRT) or liquid crystal displays (LCD).

Finally, as shown in Figure 9, the bus 905 also couples computer 900 to a network 965 through a network adapter (not shown). In this manner, the computer can be part of a network of computers (such as a local area network ("LAN"), a wide area network ("WAN"), or an intranet) or a network of networks (such as the Internet). Any or all of the components of computer system 900 may be used in conjunction with the invention. However, one of ordinary skill in the art will appreciate that any other system configuration may also be used in conjunction with the invention.

While the invention has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the invention can be embodied in other specific forms without departing from its spirit. For example, instead of the H.264 method of simulating the decoder input buffer, other simulation methods can be used that consider the buffer size, the arrival and removal times of images in the buffer, and the decoding and display times of the images.

Several of the embodiments described above compute the mean-removed SAD to obtain an indication of the image variance in a macroblock. Other embodiments, however, may identify the image variance differently. For example, some embodiments may predict an expected image value for the pixels of a macroblock. These embodiments then generate the macroblock SAD by subtracting this predicted value from the luminance values of the pixels of the macroblock and summing the absolute values of the differences. In some embodiments, the predicted value is based not only on the values of the pixels within the macroblock but also on the values of the pixels in one or more neighboring macroblocks.
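The two SAD variants mentioned above differ only in what is subtracted from each pixel. The Python sketch below illustrates both on a flat list of luminance values; the function names are hypothetical, and how the predicted values are obtained is left outside the sketch because the text does not specify it.

```python
from typing import Sequence


def mean_removed_sad(luma: Sequence[float]) -> float:
    # Subtract the macroblock's own mean from every pixel, then sum
    # the absolute differences.
    mean = sum(luma) / len(luma)
    return sum(abs(p - mean) for p in luma)


def predicted_sad(luma: Sequence[float], predicted: Sequence[float]) -> float:
    # Subtract a per-pixel predicted value instead of the mean; the
    # prediction may draw on neighboring macroblocks (not modeled here).
    return sum(abs(p - q) for p, q in zip(luma, predicted))
```

Passing a constant prediction equal to the mean to `predicted_sad` reproduces `mean_removed_sad`, which shows how the second form generalizes the first.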

Also, the embodiments described above use the derived spatial and temporal masking values directly. Other embodiments apply a smoothing filter to successive spatial masking values and/or successive temporal masking values before using them, in order to pick out the general trend of these values across the video images. Thus, one of ordinary skill in the art will understand that the invention is not limited by the foregoing illustrative details.
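The text does not say which smoothing filter is applied to successive masking values. As one simple possibility, the sketch below uses a centered moving average whose window shrinks at the ends of the sequence; the function name `smooth_masking_values` and the `radius` parameter are assumptions for illustration.

```python
from typing import List, Sequence


def smooth_masking_values(values: Sequence[float], radius: int = 1) -> List[float]:
    smoothed: List[float] = []
    for i in range(len(values)):
        # Average over the images within `radius` of image i, clipping
        # the window at the start and end of the sequence.
        lo = max(0, i - radius)
        hi = min(len(values), i + radius + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

A larger radius follows the general trend more closely and reacts less to the masking value of any single image.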

Claims

1. An apparatus for encoding a plurality of images, said apparatus comprising:

In a first stage comprising a first plurality of encoding passes:

means for defining a nominal quantization parameter for encoding said image;

means for deriving at least one image-specific quantization parameter for at least one image based on said nominal quantization parameter;

means for encoding said picture based on said picture-specific quantization parameter; and

means for iteratively performing said defining, deriving and encoding operations to optimize said encoding, wherein said iteratively performing said defining comprises changing said nominal quantization parameter after each encoding pass of the first plurality of encoding passes; and

In a second stage comprising a second plurality of coding passes:

means for defining a reference masking strength for encoding said image;

means for deriving at least one image-specific quantization parameter for at least one image based on a reference masking strength used to quantify the complexity of said plurality of images;

means for encoding the plurality of pictures based on picture-specific quantization parameters derived for the second stage; and

means for iteratively performing said defining, deriving and encoding operations to optimize said encoding, wherein said iteratively performing said defining comprises changing said reference masking strength after each encoding pass of the second plurality of encoding passes.

2. The apparatus according to claim 1, further comprising means for stopping said iterations when the encoding operation satisfies a set of termination criteria.

3. The apparatus of claim 2, wherein said set of termination criteria includes an identification of an acceptable encoding of said image.

4. The apparatus of claim 3, wherein said acceptable encoding of said image is an encoding of an image within a certain target bitrate range.

5. An apparatus for encoding a plurality of images, said apparatus comprising:

means for identifying a plurality of image attributes, each particular image attribute quantifying at least the complexity of a particular portion of a particular image, wherein each of said image attributes is a visual masking strength indicating how much encoding artifact can be tolerated in said image or a portion of said image;

means for identifying a reference attribute quantifying the complexity of the plurality of images;

means for identifying quantization parameters for encoding said plurality of pictures based on the identified picture properties, said reference property and a nominal quantization parameter;

means for encoding the plurality of images based on the identified quantization parameters; and

means for iteratively performing said identifying and encoding operations to optimize said encoding, wherein a plurality of different iterations use a plurality of different reference attributes.

6. The apparatus of claim 5, wherein in indicating the complexity of the portion of the image, the visual masking strength quantifies an amount of distortion that can be tolerated due to movement between images.

7. A method of iteratively encoding a video sequence comprising a plurality of images using a plurality of encoding passes, the method comprising:

specifying, by the video encoder, a first nominal quantization parameter for the first encoding pass;

during the first encoding pass, encoding each picture by (i) using the first nominal quantization parameter to generate a first picture-specific quantization parameter for the picture, and (ii) encoding said picture based on said first picture-specific quantization parameter;

specifying a second nominal quantization parameter different from said first nominal quantization parameter for a second coding pass; and

during the second encoding pass, encoding each picture by (i) using the second nominal quantization parameter to generate a second picture-specific quantization parameter for the picture, and (ii) encoding the picture based on the second picture-specific quantization parameter, wherein, for each picture in a set of pictures, the second picture-specific quantization parameter specified during the second encoding pass is different from the first picture-specific quantization parameter specified during said first encoding pass.

8. The method of claim 7, further comprising stopping the encoding pass when an acceptable encoding for the plurality of images is identified.

9. The method of claim 8, wherein said acceptable encoding of said plurality of images is an encoding of said images within a particular target bitrate range for encoding said video sequence.

10. The method of claim 7, wherein said image-specific quantization parameters of an image are derived based on average pixel intensities in said image.

11. The method of claim 7, wherein said image-specific quantization parameter of an image is derived based on a power function of the average pixel intensity in said image.

12. The method according to claim 7, wherein an image comprises groups of pixels, wherein said image-specific quantization parameter of an image is derived based on a power function of the mean of the sums of absolute differences (SAD) of pixel values of all groups of pixels in said image.

13. The method of claim 7, wherein the image-specific quantization parameter of an image is derived based on a temporal attribute quantifying an amount of distortion that can be tolerated due to movement between images.

14. The method of claim 7, wherein said image-specific quantization parameter of an image is derived based on a sum of absolute values of motion compensation error signals for a plurality of pixel regions defined in said image.

15. The method according to claim 7, wherein said image-specific quantization parameter of the current image is derived based on: the mean of the sum of absolute differences (SAD) of the pixel values of the current image, the mean of the sum of absolute differences (SAD) of the pixel values of a group of images temporally preceding the current image, and the mean of the sum of absolute differences (SAD) of the pixel values of a group of images temporally subsequent to the current image.

16. A method of iteratively encoding a video sequence comprising a plurality of images using a plurality of encoding passes, the method comprising:

defining, by the video encoder, a first nominal quantization parameter for the first encoding pass;

identifying a first coding scheme for said video sequence by performing a quantization operation on each picture using picture-specific quantization parameters derived for each particular picture from said first nominal quantization parameter;

identifying a second nominal quantization parameter for a second encoding pass based on the first nominal quantization parameter for the first encoding pass; and

identifying a second coding scheme for said video sequence by performing a quantization operation on each picture using picture-specific quantization parameters derived for each specific picture from said second nominal quantization parameter, wherein each coding scheme for said video sequence comprises an encoding for each picture in said video sequence, wherein said first coding scheme specifies, for each picture in a set of pictures in said video sequence, an encoding different from that of said second coding scheme.

17. The method of claim 16, further comprising stopping said iterating upon identifying an acceptable encoding for said plurality of images.

18. The method of claim 16, wherein said second nominal quantization parameter is also based on measurements taken during said first encoding pass.

19. The method of claim 18, wherein an acceptable encoding of the image is an encoding of the image that is within a particular target bitrate range for encoding the plurality of images.

20. The method according to claim 16, wherein said image-specific quantization parameter for each particular image is derived based on: the mean of the sum of absolute differences (SAD) of pixel values of said particular image, the mean of the sum of absolute differences (SAD) of pixel values of a group of images temporally preceding the particular image, and the mean of the sum of absolute differences (SAD) of pixel values of a group of images temporally subsequent to the particular image.

21. A method of encoding a video sequence comprising a plurality of images using multiple coding passes, the method comprising:

specifying, by the video encoder, a first reference masking strength for the first encoding pass;

during the first encoding pass, encoding each picture by (i) using the first reference masking strength and a picture-specific visual masking strength to generate a picture-specific quantization parameter, the picture-specific visual masking strength quantifying the amount of encoding artifacts that will be perceptible in the picture, and (ii) encoding the picture by performing a quantization operation on the picture using the picture-specific quantization parameter generated during the first encoding pass;

assigning a second reference masking strength different from said first reference masking strength for a second encoding pass; and

during said second encoding pass, encoding each picture by (i) generating a picture-specific quantization parameter using said second reference masking strength and a picture-specific visual masking strength for said picture, and (ii) encoding said picture by performing a quantization operation on said picture using said picture-specific quantization parameter generated during said second encoding pass, wherein, for each picture of a group of pictures in said video sequence, the picture-specific quantization parameter produced during said second encoding pass is different from the picture-specific quantization parameter produced during said first encoding pass.

22. The method of claim 21, wherein the second reference masking strength is based on an error in bit rate from the first encoding pass.

23. The method of claim 21, wherein said second reference masking strength is based on an average picture-specific quantization parameter over said plurality of pictures during said first encoding pass.

24. The method of claim 21, wherein the visual masking strength of each image is for a portion of the image that is less than the entire image.

25. The method of claim 21, wherein the visual masking strength of each image is for the entire image.

26. The method of claim 21, wherein the image-specific quantization parameters for each image are for a portion of the image that is less than the entire image.

27. The method of claim 21, wherein said picture-specific quantization parameters for each picture are for the whole picture.

28. An encoding device comprising means for performing the steps of the method according to any one of claims 7-27.