
HK1152178B - Video encoding and decoding method - Google Patents


Info

Publication number
HK1152178B
HK1152178B HK11105843.5A HK11105843A HK1152178B HK 1152178 B HK1152178 B HK 1152178B HK 11105843 A HK11105843 A HK 11105843A HK 1152178 B HK1152178 B HK 1152178B
Authority
HK
Hong Kong
Prior art keywords
motion vector
macroblock
field
interlaced
frame
Prior art date
Application number
HK11105843.5A
Other languages
Chinese (zh)
Other versions
HK1152178A1 (en)
Inventor
K. Mukerjee
T. W. Holcomb
Original Assignee
Microsoft Technology Licensing, Llc
Priority date
Filing date
Publication date
Priority claimed from US 10/882,135 (published as US 8,064,520 B2)
Application filed by Microsoft Technology Licensing, LLC
Publication of HK1152178A1
Publication of HK1152178B


Description

Video encoding and decoding method
The present application is a divisional application of the patent application with a filing date of September 3, 2004, international application number PCT/US2004/029000, Chinese national application number 200480024621.8, and the invention title "Advanced bi-directional predictive coding of interlaced video".
RELATED APPLICATIONS
This application claims the benefit of U.S. provisional patent application No. 60/501,081, entitled "Video Encoding and Decoding Tools and Techniques" and filed on September 7, 2003, which is incorporated herein by reference.
The following co-pending U.S. patent applications are related to the present application and are incorporated herein by reference: 1) U.S. patent application serial No. 10/622,378, entitled "Advanced Bi-directional Coding of Video Frames" and filed on July 18, 2003; 2) U.S. patent application serial No. 10/622,284, entitled "Intraframe and Interframe Interlace Coding and Decoding" and filed on July 18, 2003; 3) U.S. patent application serial No. 10/622,841, entitled "Coding of Motion Vector Information" and filed on July 18, 2003; 4) U.S. patent application serial No. 10/857,473, entitled "Predicting Motion Vectors for Fields of Forward-predicted Interlaced Video Frames" and filed on May 27, 2004.
Technical Field
Techniques and tools for interlaced video encoding and decoding are described. For example, a video encoder encodes bi-directionally predicted macroblocks in interlaced video.
Background
Digital video consumes a large amount of storage and transmission capacity. Typical original digital video sequences comprise 15 or 30 pictures per second. Each picture may comprise tens or hundreds of thousands of pixels (also called pels). Each pixel represents a small element of the picture. In raw form, a computer typically represents a pixel with 24 bits or more. Thus, the number of bits per second or bit rate of a typical original digital video sequence may be 5 megabits per second or more.
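The bit-rate arithmetic behind these figures is easy to check; the sketch below (with an illustrative frame size and frame rate, not taken from the text) computes the raw bit rate:

```python
def raw_bitrate_bps(width, height, bits_per_pixel, frames_per_second):
    """Raw (uncompressed) bit rate of a digital video sequence, in bits/s."""
    return width * height * bits_per_pixel * frames_per_second

# An illustrative sequence: 320x240 pixels, 24 bits per pixel, 30 pictures/s.
bps = raw_bitrate_bps(320, 240, 24, 30)
print(bps)  # 55296000 bits/s, i.e. about 55 Mbit/s
```

Even a modest frame size far exceeds the 5 Mbit/s figure once frame rate and bit depth are factored in.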
Most computers and computer networks lack the resources to process raw digital video. For this reason, engineers use compression (also called coding or encoding) to reduce the bit rate of digital video. Compression can be lossless, where the quality of the video is not compromised, but the reduction in bit rate is limited by the complexity of the video. Alternatively, compression can be lossy, in which the quality of the video suffers but the achievable reduction in bit rate is greater. Decompression is the reverse process of compression.
In general, video compression techniques include "intra" compression and "inter" or predictive compression. Intra-frame compression techniques compress individual pictures, commonly referred to as I-frames or key frames. Inter-frame compression techniques compress frames with reference to previous and/or subsequent frames, and inter-compressed frames are commonly referred to as predicted frames, P-frames, or B-frames.
Inter-frame compression in Windows Media Video versions 8 and 9
Microsoft Corporation's Windows Media Video version 8 ["WMV8"] includes a video encoder and a video decoder. The WMV8 encoder uses intra-frame compression and inter-frame compression, and the WMV8 decoder uses intra-frame decompression and inter-frame decompression. Windows Media Video version 9 ["WMV9"] uses a similar architecture for many operations.
Inter-frame compression in the WMV8 encoder uses block-based motion-compensated predictive coding, followed by transform coding of the residual error. Figs. 1 and 2 illustrate block-based inter-frame compression of a predicted frame in the WMV8 encoder. In particular, fig. 1 illustrates motion estimation for a predicted frame 110, while fig. 2 illustrates compression of the prediction residual for a motion-compensated block of the predicted frame.
For example, in fig. 1, the WMV8 encoder calculates motion vectors for predicting macroblocks 115 in a frame 110. To calculate the motion vector, the encoder searches in a search area 135 of the reference frame 130. Within search area 135, the encoder compares macroblock 115 from predicted frame 110 with various candidate macroblocks in order to find a candidate macroblock that is a good match. The encoder outputs a (entropy encoded) motion vector that specifies the matching macroblock.
Because motion vector values are often correlated with the values of spatially surrounding motion vectors, compression of the data used to convey motion vector information can be achieved by selecting a motion vector predictor from neighboring macroblocks and using that predictor to predict the motion vector of the current macroblock. The encoder can encode the difference between the motion vector and the predictor. After reconstructing the motion vector by adding the difference to the predictor, the decoder uses the motion vector to compute a prediction macroblock for macroblock 115, using information from the reference frame 130, which is a previously reconstructed frame available at both the encoder and the decoder. The prediction is rarely perfect, so the encoder often encodes blocks of pixel differences (also called error or residual blocks) between the prediction macroblock and macroblock 115 itself.
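The differential coding of motion vectors described above can be sketched as follows (a minimal illustration; the entropy coding of the difference is omitted, and the tuple representation is an assumption):

```python
def encode_mv_difference(mv, predictor):
    # Encoder side: only the difference between the motion vector and its
    # predictor is transmitted (after entropy coding, not shown here).
    return (mv[0] - predictor[0], mv[1] - predictor[1])

def reconstruct_mv(difference, predictor):
    # Decoder side: add the decoded difference back to the same predictor.
    return (difference[0] + predictor[0], difference[1] + predictor[1])

predictor = (3, -1)   # e.g., selected from neighboring macroblocks
mv = (5, -2)          # motion vector found by the motion search
diff = encode_mv_difference(mv, predictor)
assert reconstruct_mv(diff, predictor) == mv
```

Both sides must derive the same predictor from previously reconstructed vectors; otherwise the reconstruction drifts.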
Fig. 2 shows an example of the calculation and encoding of the error block 235 in the WMV8 encoder. The error block 235 is the difference between the prediction block 215 and the original current block 225. The encoder applies the discrete cosine transform [ "DCT" ]240 to the error block 235, resulting in an 8x8 block 245 of coefficients. The encoder then quantizes (250) the DCT coefficients, resulting in 8x8 blocks 255 of quantized DCT coefficients. The encoder scans (260) the 8x8 blocks 255 into a one-dimensional array 265 so that the coefficients are typically ordered from lowest frequency to highest frequency. The encoder entropy encodes the scanned coefficients using a variant of run-length encoding 270. The encoder selects an entropy code from one or more run/level/last tables 275 and outputs the entropy code.
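The scan and run/level stages can be sketched as below. The diagonal zigzag order shown is the conventional one; WMV8's actual scan arrays and run/level/last code tables differ, so this is only an illustration:

```python
def zigzag_order(n):
    # Diagonal (zigzag) scan order for an n x n block: visits coefficients
    # roughly from lowest to highest spatial frequency.
    def key(rc):
        s = rc[0] + rc[1]
        return (s, rc[0] if s % 2 else -rc[0])
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

def run_level_pairs(coeffs):
    # Turn a scanned 1-D coefficient array into (run, level, last) triples:
    # run = number of zeros before a nonzero level; last marks the final one.
    triples, run = [], 0
    nonzero_idx = [i for i, c in enumerate(coeffs) if c]
    for i, c in enumerate(coeffs):
        if c == 0:
            run += 1
        else:
            triples.append((run, c, i == nonzero_idx[-1]))
            run = 0
    return triples
```

An 8x8 block would use `zigzag_order(8)`; the resulting (run, level, last) triples are what the run/level/last tables map to entropy codes.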
Fig. 3 shows an example of a corresponding decoding process 300 for an inter-coded block. Summarizing fig. 3, the decoder decodes (310, 320) entropy-coded information representing the prediction residual using variable length decoding 310 and run length decoding 320 with one or more run/level/last tables 315. The decoder inverse scans 330 the one-dimensional array 325 storing the entropy-decoded information into a two-dimensional block 335. The decoder inverse quantizes and inverse discrete cosine transforms (together, 340) the data, producing a reconstructed error block 345. In a separate motion compensation path, the decoder uses the motion vector information 355 to compute a displaced prediction block 365 from the reference frame. The decoder combines 370 the prediction block 365 with the reconstructed error block 345 to form the reconstructed block 375.
The amount of change between the original and reconstructed frames is the distortion, and the number of bits required to encode the frame indicates the rate for the frame. The amount of distortion is roughly inversely proportional to the rate.
Interlaced video and progressive video
A video frame contains the spatial information of each line of the video signal. For progressive video, these lines contain samples starting from one time instant and continuing in raster order through successive lines to the bottom of the frame. A progressive I-frame is an intra-coded progressive video frame. A progressive P-frame is a progressive video frame encoded using forward prediction, and a progressive B-frame is a progressive video frame encoded using bi-directional prediction.
A typical interlaced video frame consists of two fields scanned starting at different times. For example, referring to fig. 4, an interlaced video frame 400 includes a top field 410 and a bottom field 420. Typically, the even-numbered lines (top field) start scanning at one time (e.g., time t) and the odd-numbered lines (bottom field) start scanning at a different (typically later) time (e.g., time t + 1). This timing can create jagged tooth-like features in areas of an interlaced video frame where motion is present, because the two fields start scanning at different times. For this reason, interlaced video frames can be rearranged according to a field structure, with the odd lines grouped in one field and the even lines grouped in another field. This arrangement, known as field coding, is useful in high-motion pictures to reduce such jagged edge artifacts. On the other hand, in static areas, image detail in the interlaced video frame may be preserved more efficiently without such rearrangement. Accordingly, frame coding, in which the original alternating field line arrangement is preserved, is often used for static or low-motion interlaced video frames.
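The field/frame rearrangement described above amounts to de-interleaving the scan lines; a minimal sketch:

```python
def split_fields(frame):
    # Even-numbered lines (0, 2, 4, ...) form the top field,
    # odd-numbered lines (1, 3, 5, ...) form the bottom field.
    return frame[0::2], frame[1::2]

def interleave_fields(top, bottom):
    # Inverse operation: restore the original alternating line order.
    frame = []
    for t, b in zip(top, bottom):
        frame.extend([t, b])
    return frame
```

Field coding operates on the two de-interleaved halves; frame coding leaves the alternating arrangement in place.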
A typical progressive video frame consists of a frame of content having non-alternating lines. In contrast to interlaced video, progressive video does not divide a video frame into fields, and the entire frame is scanned from left to right, top to bottom, starting from a certain time.
P-frame encoding and decoding in previous WMV encoders and decoders
Previous WMV encoders and decoders use both progressive and interlaced P-frames. In interlaced and progressive P-frames, a motion vector is encoded at the encoder by computing the difference between the motion vector and a motion vector predictor, where the motion vector predictor is computed from neighboring motion vectors. In the decoder, the motion vector is reconstructed by adding the motion vector difference to the motion vector predictor, which is again computed from neighboring motion vectors (this time at the decoder). A predictor for the current macroblock, or for a field of the current macroblock, is selected from candidate predictors, and a motion vector difference is computed based on that predictor. On either the encoder side or the decoder side, the motion vector is reconstructed by adding the motion vector difference to the selected motion vector predictor. Typically, luma motion vectors are reconstructed from the signaled motion information, while chroma motion vectors are derived from the reconstructed luma motion vectors.
A. Encoding and decoding of progressive P-frames
For example, in previous WMV encoders and decoders, a progressive P-frame can contain macroblocks encoded in one motion vector (1MV) mode or four motion vector (4MV) mode, or skipped macroblocks, where the decision is typically made on a macroblock-by-macroblock basis. P-frames with only 1MV macroblocks (and possibly skipped macroblocks) are referred to as 1MV P-frames, while P-frames with both 1MV and 4MV macroblocks (and possibly skipped macroblocks) are referred to as mixed-MV P-frames. One motion vector is associated with each 1MV macroblock and four motion vectors are associated with each 4MV macroblock (one per block).
Figs. 5A and 5B are diagrams showing the locations of macroblocks considered for candidate motion vector predictors for a macroblock in a 1MV progressive P-frame. The candidate predictors are taken from the left, top, and top-right macroblocks, except in the case where the macroblock is the last macroblock in the row. In that case, predictor B is taken from the top-left macroblock rather than the top-right. For the special case where the frame is one macroblock wide, the predictor is always predictor A (the top predictor). When predictor A is out of bounds because the macroblock is in the top row, the predictor is predictor C. Various other rules address other special cases, such as intra-coded predictors.
Figs. 6A-10 show the locations of the blocks or macroblocks considered for the up to 3 candidate motion vector predictors for a 1MV or 4MV macroblock in a mixed-MV frame. In the following figures, the larger squares are macroblock boundaries and the smaller squares are block boundaries. For the special case where the frame is one macroblock wide, the predictor is always predictor A (the top predictor). Various other rules address other special cases, such as top-row blocks of top-row 4MV macroblocks, top-row 1MV macroblocks, and intra-coded predictors.
Figs. 6A and 6B are diagrams showing the locations of blocks considered for candidate motion vector predictors for a 1MV current macroblock in a mixed-MV frame. The neighboring macroblocks may be 1MV or 4MV macroblocks. Figs. 6A and 6B show the locations of the candidate motion vectors assuming the neighbors are all 4MV (i.e., predictor A is the motion vector of block 2 in the macroblock above the current macroblock, and predictor C is the motion vector of block 1 in the macroblock immediately to the left of the current macroblock). If any of the neighbors is a 1MV macroblock, the motion vector predictor shown in figs. 5A and 5B is taken as the motion vector predictor for the entire macroblock. As shown in fig. 6B, if the macroblock is the last macroblock in the row, predictor B is taken from block 3 of the top-left macroblock rather than from block 2 of the top-right macroblock as in the other cases.
Fig. 7A-10 show the positions of blocks considered as candidate motion vector predictors for each of the 4 luma blocks in a 4MV macroblock. Fig. 7A and 7B are diagrams illustrating the positions of blocks that are considered as candidate motion vector predictors for a block at position 0; fig. 8A and 8B are diagrams illustrating the positions of blocks that are considered as candidate motion vector predictors for a block at position 1; fig. 9 is a diagram showing the positions of blocks that are regarded as candidate motion vector predictors for a block at position 2; and fig. 10 is a diagram showing the positions of blocks that are regarded as candidate motion vector predictors for a block at position 3. Again, if the neighbor is a 1MV macroblock, the motion vector predictor of that macroblock is used for each block of that macroblock.
For the case where the macroblock is the first macroblock in a row, predictor B of block 0 is processed differently from block 0 of the remaining macroblocks in the row (see fig. 7A and 7B). In this case, predictor B is taken from block 3 of the macroblock immediately above the current macroblock, and not from block 3 of the macroblock above and to the left of the current macroblock as in the other cases. Similarly, for the case where the macroblock is the last macroblock in a row, predictor B for block 1 is handled differently (see fig. 8A and 8B). In this case, the predictor is taken from block 2 of the macroblock immediately above the current macroblock, and not from block 2 of the macroblock to the top right of the current macroblock as in the other cases. In general, if the macroblock is in the first macroblock column, predictor C for blocks 0 and 2 is set equal to 0.
B. Encoding and decoding of interlaced P-frames in previous WMV encoders and decoders
Previous WMV encoders and decoders use a 4:1:1 macroblock format for interlaced P-frames, which can include macroblocks encoded in field mode or in frame mode, or skipped macroblocks, where the decision is typically made on a macroblock-by-macroblock basis. Two motion vectors are associated with each field-coded macroblock (one motion vector per field), and one motion vector is associated with each frame-coded macroblock. The encoder jointly encodes motion information, including horizontal and vertical motion vector difference components, possibly together with other signaling information.
FIGS. 11 and 12A-B show examples of candidate predictors for motion vector prediction for frame-coded 4:1:1 macroblocks and field-coded 4:1:1 macroblocks, respectively, in interlaced P-frames of previous WMV encoders and decoders. FIG. 11 shows candidate predictors A, B and C for a current frame-coded 4:1:1 macroblock at an interior position in an interlaced P-frame (not the first or last macroblock in a macroblock row, and not in the top row). Predictors can be obtained from candidate directions other than those labeled A, B and C (e.g., in special cases such as when the current macroblock is the first or last macroblock in a row, or is in the top row, since certain predictors are unavailable in those cases). For a current frame-coded macroblock, the candidate predictors are computed differently depending on whether the neighboring macroblock is field-coded or frame-coded. For a neighboring frame-coded macroblock, its motion vector is simply taken as the candidate predictor. For a neighboring field-coded macroblock, the candidate motion vector is determined by averaging its top-field and bottom-field motion vectors.
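That candidate computation can be sketched as follows (the dictionary layout and the integer-average rounding are illustrative assumptions, not the codec's exact rule):

```python
def candidate_from_neighbor(neighbor):
    """Candidate MV predictor from a neighboring macroblock of an
    interlaced P-frame, for a frame-coded current macroblock."""
    if not neighbor['field_coded']:
        # Frame-coded neighbor: its motion vector is used directly.
        return neighbor['mv']
    # Field-coded neighbor: average the top- and bottom-field vectors.
    tx, ty = neighbor['top_mv']
    bx, by = neighbor['bottom_mv']
    return ((tx + bx) // 2, (ty + by) // 2)  # rounding mode assumed
```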
FIGS. 12A-B show candidate predictors A, B and C for the current field of a field-coded 4:1:1 macroblock at an interior position in the field. In fig. 12A, the current field is a bottom field, and the bottom-field motion vectors in the neighboring macroblocks are used as candidate predictors. In fig. 12B, the current field is a top field, and the top-field motion vectors in the neighboring macroblocks are used as candidate predictors. Thus, for each field in a current field-coded macroblock, the number of candidate motion vector predictors is at most 3, with each candidate coming from the same field type (e.g., top or bottom) as the current field. Again, various special cases (not shown) apply when the current macroblock is the first or last macroblock in a row, or is in the top row, since certain predictors are unavailable in those cases.
To select a predictor from a set of candidate predictors, the previous WMV encoders and decoders use different selection algorithms, such as a median-of-3 or a median-of-4 algorithm. Median-of-3 prediction is described in the pseudo code 1300 of fig. 13. Median-of-4 prediction is described in the pseudo code 1400 of fig. 14.
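The componentwise median selection can be sketched as follows. The exact procedures are in the pseudo code of figs. 13 and 14 (not reproduced here); the median-of-4 definition below, averaging the two middle values, is one common choice and is an assumption:

```python
def median3(a, b, c):
    # Median of three values (applied per component of the candidate MVs).
    return a + b + c - min(a, b, c) - max(a, b, c)

def median4(a, b, c, d):
    # One definition of a median of four: mean of the two middle values.
    vals = sorted((a, b, c, d))
    return (vals[1] + vals[2]) // 2

def median3_mv(candidates):
    # Apply median-of-3 independently to the x and y components.
    xs = [c[0] for c in candidates]
    ys = [c[1] for c in candidates]
    return (median3(*xs), median3(*ys))
```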
Bidirectional prediction
Bi-directionally predicted frames (or B-frames) use two frames from the source video as reference (or anchor) frames, rather than the one anchor used in P-frames. Among the anchor frames of a typical B-frame, one anchor frame is from the temporal past and one anchor frame is from the temporal future. Referring to fig. 15, a B-frame 1510 in a video sequence has a temporally previous reference frame 1520 and a temporally future reference frame 1530. B-frames offer the advantage of efficient compression in the form of greater bit rate savings (e.g., for certain types of motion such as occlusion). An encoded bitstream with B-frames typically uses fewer bits than an encoded bitstream without B-frames, while providing similar visual quality. B-frames also provide more options and greater flexibility for space-constrained devices. For example, a decoder can accommodate space and time restrictions by choosing not to decode or display B-frames, since B-frames are not typically used as reference frames. Estimates of the rate-distortion improvement from using B-frames in a video sequence range from 0 to 50%.
Encoding and decoding of B-frames in previous WMV encoders and decoders
Previous WMV encoders and decoders use B-frames. While a macroblock in a forward-predicted frame (e.g., a P-frame) has only one directional prediction mode (forward, from a previous I- or P-frame), a macroblock in a B-frame can be predicted using five different prediction modes: forward, backward, direct, interpolated, and intra. The encoder selects and signals the different prediction modes in the bitstream. For example, previous WMV encoders send a compressed bitplane at the frame level representing the direct/non-direct mode decision for each macroblock of the B-frame, while the non-direct modes (such as forward, backward, and interpolated modes) are signaled at the macroblock level.
The forward mode is similar to conventional P-frame prediction. In the forward mode, the macroblock is derived from a temporally previous anchor. In the backward mode, the macroblock is derived from a temporally subsequent anchor. Macroblocks predicted in direct or interpolated mode use both the forward and backward anchors for prediction. The direct and interpolated modes use a rounded average to combine the two reference pixel values into one set of macroblock pixels, as shown by the following equation:
average pixel value = (forward pixel value + backward pixel value + 1) >> 1
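The rounded average can be written directly in code (a minimal illustration over 1-D pixel lists):

```python
def average_prediction(forward_pixels, backward_pixels):
    # Rounded average combining the forward and backward reference pixels,
    # as used by the direct and interpolated modes.
    return [(f + b + 1) >> 1 for f, b in zip(forward_pixels, backward_pixels)]
```

The `+ 1` makes the right shift round to nearest rather than truncate.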
A. Fractional coding and scaling of co-located motion vectors
In previous WMV encoders and decoders, the encoder implicitly derives the motion vectors for direct mode by scaling the motion vector of the co-located macroblock in the subsequent anchor. The scaling operation depends on the temporal position of the current B-frame relative to its anchors. To encode the temporal position of a reference picture, the encoder uses fractional coding.
In fractional coding, the encoder explicitly encodes the temporal position of the current B-frame as a fraction of the distance between its two anchors. The variable BFRACTION is used to represent the different fractions and is sent at the frame level. The fraction takes on a limited set of discrete values between 0 and 1. For motion vectors in direct mode, the encoder and decoder use this fraction to scale the co-located motion vector (MV) in the reference frame, deriving the implicit direct mode motion vectors (MV_F and MV_B) for the current B-frame with the following scaling operations:
MV_F = fraction * MV
MV_B = (fraction - 1) * MV
Fig. 16 shows how fractional coding enables the encoder to arbitrarily scale the motion between the surrounding reference frames. To derive MV_F and MV_B for a current macroblock in a B-frame 1620, the encoder and decoder scale the motion vector (MV) of the corresponding macroblock in the future reference frame 1630 using fractional coding. In the example shown in fig. 16, p + q = 1 for the fractions p and q. The encoder and decoder address the macroblocks in the previous reference frame 1640 and the future reference frame 1630 using the two implicit motion vectors, and use the average of these as the prediction for the current macroblock 1610. For example, in fig. 16, MV_F = (dx*p, dy*p) and MV_B = (-dx*q, -dy*q).
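The scaling of the co-located motion vector can be sketched as follows (floating-point for clarity; the actual decoder works with an integer scaling factor derived per the pseudo code of fig. 18, and the function name is illustrative):

```python
def direct_mode_mvs(colocated_mv, fraction):
    """Derive the implicit direct-mode vectors from the co-located MV of the
    future anchor:  MV_F = fraction * MV,  MV_B = (fraction - 1) * MV."""
    dx, dy = colocated_mv
    p = fraction        # temporal distance of the B-frame to its forward anchor
    q = 1.0 - fraction  # distance to the backward anchor, so p + q = 1
    mv_f = (dx * p, dy * p)      # points into the forward (previous) anchor
    mv_b = (-dx * q, -dy * q)    # points into the backward (subsequent) anchor
    return mv_f, mv_b
```

A B-frame exactly halfway between its anchors (fraction = 1/2) splits the co-located motion evenly between the two implicit vectors.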
Table 1700 in fig. 17 is the variable length code (VLC) table for the bitstream element BFRACTION. In the example shown in table 1700, the 3-bit codewords are the "short" codewords and the 7-bit codewords are the "long" codewords. The decoder finds a scaling factor based on the numerator and denominator of the fraction according to the pseudo code 1800 shown in fig. 18.
Once the scaling factor has been determined, the decoder uses it to scale the x- and y-components of the motion vector of the co-located macroblock. Given that the subsequent anchor frame is a P-frame (for I-frames, all motion vectors are assumed to be (0, 0)) and that the co-located macroblock contains the motion vector (MV_X, MV_Y), the decoder derives two motion vectors, one of which (MV_X_F, MV_Y_F) references the forward (previous) anchor frame, while the other (MV_X_B, MV_Y_B) references the backward (subsequent) anchor frame.
The decoder performs scaling according to the pseudo code 1900 shown in fig. 19. In the function Scale_Direct_MV of the pseudo code 1900, the inputs MV_X and MV_Y are the x- and y-components of the motion vector taken from the co-located macroblock of the future reference picture, while the outputs MV_X_F, MV_Y_F, MV_X_B and MV_Y_B are the x- and y-components of the forward- and backward-pointing motion vectors for the macroblock being decoded.
B. B/I frames
Previous WMV encoders and decoders also use intra B-frames ("B/I-frames") in progressive coding and decoding. B/I-frames are coded like I-frames in that they do not depend on reference frames. However, unlike I-frames, B/I-frames are not key frames; other frames are not allowed to use a B/I-frame as an anchor.
C. Interlaced B-frames
Previous WMV encoders and decoders also use interlaced B-frames. Macroblocks in an interlaced B-frame can be field-coded or frame-coded. A frame-coded macroblock has one, two (e.g., forward and backward motion vectors for interpolated mode, or derived forward and backward motion vectors for direct mode), or no motion vectors, while a field-coded macroblock can have up to four motion vectors, depending on the prediction mode. For example, in a direct mode field-coded macroblock, four implicit motion vectors are derived: forward and backward motion vectors for the top field, and forward and backward motion vectors for the bottom field.
Although previous WMV encoders and decoders use interlaced B-frames, they are limited in several important respects. For example, only one macroblock prediction mode (e.g., direct mode, forward mode, etc.) is allowed per macroblock, no 4MV coding is available (i.e., with one motion vector for each block in a macroblock), and no part of a B-frame can be a reference for motion compensation of any frame. As another example, the interlaced encoding and decoding (including interlaced B-frames) of previous WMV encoders and decoders is performed using only a 4:1:1 macroblock format.
Standards for video compression and decompression
In addition to previous WMV encoders and decoders, several international standards relate to video compression and decompression. These standards include the Motion Picture Experts Group ["MPEG"] 1, 2 and 4 standards and the H.261, H.262, H.263 and H.264 standards from the International Telecommunication Union ["ITU"]. One of the main methods used in the international standards to achieve data compression of digital video sequences is to reduce the temporal redundancy between pictures. These popular compression schemes (MPEG-1, MPEG-2, MPEG-4, H.261, H.263, etc.) use motion estimation and compensation. For example, a current frame is divided into uniform square regions (e.g., blocks and/or macroblocks). A matching region for each current region is specified by transmitting motion vector information for the region. The motion vector indicates the location, in a previously encoded (and reconstructed) frame, of the region to be used as a predictor for the current region. A pixel-by-pixel difference, called the error signal, between the current region and the region in the reference frame is derived. The error signal usually has lower entropy than the original signal, so the information can be encoded at a lower rate. As in previous WMV encoders and decoders, since motion vector values are often correlated with spatially surrounding motion vectors, compression of the data used to represent the motion vector information can be achieved by encoding the difference between the current motion vector and a predictor based on previously encoded, neighboring motion vectors.
Some international standards describe motion estimation and compensation for interlaced video frames. The H.262 standard allows an interlaced video frame to be encoded as a single frame or as two fields, where frame encoding or field encoding can be adaptively selected on a frame-by-frame basis. The H.262 standard describes field-based prediction, which is a prediction mode that uses only one field of a reference frame. The H.262 standard also describes dual-prime prediction, which is a prediction mode in which two forward field-based predictions are averaged for a 16x16 block in an interlaced P-picture. Section 7.6 of the H.262 standard describes "field prediction", which involves selecting between two reference fields to use for motion compensation of a macroblock of the current field of an interlaced video frame. Section 7.6.3 describes motion vector prediction and reconstruction, in which the reconstructed motion vector for a given macroblock becomes the motion vector predictor for the subsequently encoded/decoded macroblock. Such motion vector prediction fails in many cases to adequately predict motion vectors for macroblocks of fields of an interlaced video frame.
In addition, section 7.6 of the H.262 standard describes "field prediction" and "frame prediction" for B-pictures. In field prediction and frame prediction, a B-picture is predicted using the two most recently reconstructed reference frames (skipping over intervening B-pictures), each of which may have been encoded as two fields or as a single frame.
Given the critical importance of video compression and decompression to digital video, it is not surprising that video compression and decompression are richly developed fields. Whatever the benefits of previous video compression and decompression techniques, however, they do not have the advantages of the following techniques and tools.
Disclosure of Invention
In summary, the detailed description is directed to various techniques and tools for encoding and decoding bi-directionally predicted interlaced video frames (e.g., interlaced B-fields, interlaced B-frames). The techniques and tools improve rate/distortion performance and facilitate better support for devices with lower CPU resources (e.g., devices with smaller form factors).
The various embodiments implement one or more of the described techniques and tools for encoding and/or decoding interlaced B-pictures, including but not limited to the following:
in one aspect, for an interlaced B-frame, the encoder/decoder switches the prediction mode between fields in a field-coded macroblock of the interlaced B-frame. For example, the encoder/decoder switches between a forward prediction mode for the top field and a reverse mode for the bottom field in a field coded macroblock. Switching between forward and reverse prediction within the same field coded macroblock allows more flexibility in finding valid prediction modes for different parts of an interlaced B-frame.
In another aspect, for interlaced B-frames, the encoder/decoder computes direct mode motion vectors for a current macroblock by selecting at most one representative motion vector for each of the top and bottom fields of the co-located macroblock of a previously decoded, temporally subsequent anchor. For example, the selection is performed based at least in part on the mode (e.g., 1MV mode, 2 field MV mode, etc.) in which the macroblock of the current interlaced B-frame is encoded.
In yet another aspect, the encoder/decoder uses 4MV coding for interlaced B-fields or interlaced B-frames. For example, 4MV is used for unidirectional prediction modes (forward or backward), but not for other available prediction modes (e.g., direct, interpolated). The use of 4MV allows for more accurate motion compensation for interlaced B-fields and interlaced B-frames; limiting the 4MV to forward and reverse modes reduces the coding overhead and avoids the decoding complexity associated with combining 4MV with modes such as direct and interpolated.
In another aspect, for interlaced B-fields or interlaced B-frames, forward motion vectors are predicted by the encoder/decoder using previously reconstructed (or estimated) forward motion vectors from a forward motion vector buffer, while reverse motion vectors are predicted using previously reconstructed (or estimated) reverse motion vectors from a reverse motion vector buffer. The resulting motion vectors are added to the corresponding buffers. Holes in the motion vector buffer may be filled with estimated motion vector values. For example, for interlaced B-frames, when forward prediction is used to predict a motion vector and the motion vector is added to the forward motion vector buffer, the corresponding location in the reverse motion vector buffer is filled with the predicted motion vector using only the reverse motion vector as the predictor ("hole filling"). As another example, for an interlaced B-field, to select between motion vectors of different polarities (e.g., "same polarity" or "opposite polarity") for hole padding, the encoder/decoder selects the dominant polarity field motion vector. The distance between the anchor and the current frame is calculated using various syntax elements, and the calculated distance is used to scale the reference field motion vectors. The respective motion vector buffers and the hole filling in the respective motion vector buffers enable more accurate motion vector prediction for interlaced B-fields and interlaced B-frames.
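A minimal sketch of the separate-buffer bookkeeping with hole filling follows. The data structures (dictionaries keyed by macroblock position), the three-neighbor candidate set, and the zero-vector fallback for missing neighbors are assumptions of this sketch, not the codec's actual neighborhood rules.

```python
def median3(a, b, c):
    return sorted((a, b, c))[1]

def predict_from(buffer_mvs, neighbor_keys):
    """Component-wise median-of-3 over three buffered neighbor motion
    vectors; a missing neighbor contributes a zero vector (an assumption)."""
    cand = [buffer_mvs.get(k, (0, 0)) for k in neighbor_keys]
    return (median3(*[c[0] for c in cand]),
            median3(*[c[1] for c in cand]))

def store_forward_mv(fwd_buf, rev_buf, key, mv, neighbor_keys):
    """A forward-predicted macroblock stores its actual motion vector in the
    forward buffer and fills the hole at the same position in the reverse
    buffer with a prediction computed only from reverse-buffer values."""
    fwd_buf[key] = mv
    rev_buf[key] = predict_from(rev_buf, neighbor_keys)  # hole filling
```

A reverse-predicted macroblock would do the mirror image: store its vector in the reverse buffer and fill the forward-buffer hole from forward-only predictors.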
In yet another aspect, for interlaced B-fields, the encoder/decoder uses "self-reference" frames. For example, the second B-field in the current frame refers to the first B-field from the current frame in motion compensated prediction. Having the first B-field in a frame used as a reference for the second B-field in the frame allows the prediction of the second field to be more accurate while also retaining the temporal scalability advantage of having B-fields in the current frame.
In another aspect, for an interlaced B-field, the encoder sends binary information indicating whether the prediction mode for one or more macroblocks in the interlaced B-field is forward or non-forward. For example, the encoder sends forward/non-forward decision information at the B-field level of the compressed bitplane. Sending the forward/non-forward prediction mode decision information at the B-field level of the compressed bitplanes may reduce the coding overhead of prediction mode coding. The decoder performs the corresponding decoding.
In yet another aspect, for interlaced B-fields, if the corresponding macroblock in the corresponding field of the next anchor picture is encoded using four motion vectors, the encoder/decoder selects the motion vector for direct mode using logic that favors dominant polarity. For example, if the same-polarity motion vectors of the corresponding macroblock outnumber the motion vectors of its opposite polarity, the encoder/decoder calculates the median of the same-polarity motion vectors to obtain the motion vectors used to derive the direct-mode motion vectors. This selection process allows the derivation of accurate direct mode motion vectors for interlaced B-fields with 4MV macroblock anchors.
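The dominant-polarity selection for a 4MV anchor can be sketched as follows. The component-wise median, the integer averaging of the two middle values for four inputs, and the tie-break toward same polarity are assumptions of this sketch; the actual median rule follows the codec's definition.

```python
def component_median(values):
    """Component-wise median; for four values this averages the two middle
    values (integer arithmetic assumed), in the style of a median4 rule."""
    s = sorted(values)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) // 2

def direct_mode_basis_mv(field_mvs, polarities):
    """Pick the motion vector used to derive direct-mode vectors when the
    co-located anchor macroblock has four motion vectors.

    field_mvs: four (x, y) motion vectors of the co-located macroblock.
    polarities: 'same' or 'opposite' for each, relative to the current field."""
    same = [mv for mv, p in zip(field_mvs, polarities) if p == "same"]
    opp = [mv for mv, p in zip(field_mvs, polarities) if p == "opposite"]
    group = same if len(same) >= len(opp) else opp  # favor dominant polarity
    return (component_median([mv[0] for mv in group]),
            component_median([mv[1] for mv in group]))
```

With three same-polarity vectors and one opposite-polarity outlier, the outlier is ignored and the median of the same-polarity group is returned.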
The various techniques and tools may be used in combination or alone.
Other features and advantages of the present invention will become apparent from the following detailed description of various embodiments, which proceeds with reference to the accompanying drawings.
Drawings
Fig. 1 is a diagram illustrating motion estimation in a video encoder according to the related art.
Fig. 2 is a diagram illustrating block-based compression of an 8x8 block of prediction residuals in a video encoder according to the prior art.
Fig. 3 is a diagram illustrating block-based decompression of an 8x8 block of prediction residuals in a video decoder according to the prior art.
Fig. 4 is a diagram illustrating an interlaced frame according to the related art.
Fig. 5A and 5B are diagrams illustrating positions of macroblocks of candidate motion vector predictors for 1MV macroblocks in a progressive P-frame according to the related art.
Fig. 6A and 6B are diagrams illustrating positions of blocks of candidate motion vector predictors for 1MV macroblocks in a hybrid 1MV/4MV progressive P-frame according to the related art.
Fig. 7A, 7B, 8A, 8B, 9 and 10 are diagrams illustrating the positions of blocks used for candidate motion vector predictors for blocks at respective positions in 4MV macroblocks in a hybrid 1MV/4MV progressive P-frame according to the prior art.
Fig. 11 is a diagram illustrating candidate motion vector predictors for a current frame coded macroblock in an interlaced P-frame according to the related art.
Fig. 12A-12B are diagrams illustrating candidate motion vector predictors for a current field coded macroblock in an interlaced P-frame according to the prior art.
Fig. 13 and 14 are code listings illustrating pseudo code for performing median-of-3 and median-of-4 calculations, respectively, according to the prior art.
Fig. 15 is a diagram illustrating a B-frame having past and future reference frames according to the related art.
Fig. 16 is a diagram illustrating direct mode prediction using fractional coding according to the related art.
Fig. 17 shows a VLC table for the bitstream element BFRACTION according to the related art.
Fig. 18 is a code listing showing pseudo code for finding the scaling factor used to scale the motion vectors of a co-located macroblock in direct mode prediction according to the related art.
FIG. 19 is a code listing showing pseudo code for scaling the x- and y-elements of a motion vector of a co-located macroblock according to the scaling factor, according to the prior art.
FIG. 20 is a block diagram of a suitable computing environment in connection with which several described embodiments may be implemented.
FIG. 21 is a block diagram of a generic video encoder system in conjunction with which several of the described embodiments may be implemented.
FIG. 22 is a block diagram of a generic video decoder system in conjunction with which several of the described embodiments may be implemented.
Fig. 23 is a diagram of a macroblock format used in several of the described embodiments.
FIG. 24A is a diagram of a portion of an interlaced video frame showing alternating lines of the top field and the bottom field. Fig. 24B is a diagram of an interlaced video frame organized into frames for encoding/decoding, and fig. 24C is a diagram of an interlaced video frame organized into fields for encoding/decoding.
Fig. 25 and 26 are diagrams illustrating interlaced P-fields with two reference fields.
Fig. 27 and 28 are diagrams illustrating interlaced P-fields using the latest reference field allowed.
Fig. 29 and 30 are diagrams illustrating interlaced P-fields using the allowed second latest reference field.
Fig. 31 is a diagram showing the relationship between the respective spatial positions of the vertical components of the motion vectors and the different combinations of current and reference field polarities.
Fig. 32 is a diagram showing two sets of three candidate motion vector predictors for a current macroblock.
Figs. 33A-33F are code listings illustrating pseudo code for computing motion vector predictors in interlaced P- or B-fields with two reference fields.
Figs. 34A-34B are code listings illustrating pseudo code for scaling predictors from one field to derive predictors for another field.
Fig. 35 and 36 are tables showing scaling operation values associated with different reference frame distances.
Fig. 37 is a diagram showing motion vectors of luminance blocks and derived motion vectors of chrominance blocks in 2-field MV macroblocks of an interlaced P-frame.
Fig. 38 is a diagram showing different motion vectors for each of 4 luminance blocks and derived motion vectors for each of 4 chrominance sub-blocks in a 4-frame MV macroblock of an interlaced P-frame.
Fig. 39 is a diagram showing motion vectors for luminance blocks and derived motion vectors for chrominance blocks in a 4-field MV macroblock of an interlaced P-frame.
Fig. 40A-40B are diagrams illustrating candidate predictors for a current macroblock of an interlaced P-frame.
FIG. 41 is a flow diagram illustrating a technique for predicting motion vectors for respective fields in a field-coded macroblock of an interlaced B-frame using different prediction modes.
FIG. 42 is a flow diagram illustrating a technique for calculating direct mode motion vectors for macroblocks of an interlaced B-frame.
FIG. 43 is a diagram of buffered motion vectors for respective blocks of a co-located macroblock of a previously decoded temporally subsequent anchor frame, which are used to calculate direct mode motion vectors for macroblocks of an interlaced B-frame.
FIG. 44 is a flow diagram illustrating a technique for predicting motion vectors for a current macroblock of an interlaced B-frame using forward and/or reverse motion vector buffers.
Fig. 45 is a diagram showing motion vectors in forward and reverse motion vector buffers used for predicting the motion vectors of a macroblock.
Fig. 46 is a diagram showing the motion vectors of the top field and the bottom field of a reconstructed macroblock in the forward and reverse motion vector buffers.
FIG. 47 is a code listing showing pseudo code describing a polarity selection process for real-valued buffering and hole filling in interlaced B-field motion vector prediction.
Fig. 48A-48B are code listings illustrating pseudo code for scaling a prediction value from one field to derive a prediction value from another field, for a reverse-predicted interlaced B-field.
FIG. 49 is a table showing scaling operation values associated with different reference frame distances for a first interlaced B-field.
Fig. 50A and 50B are diagrams illustrating reference fields of interlaced B-fields.
FIG. 51 is a flow diagram illustrating a technique for encoding forward/non-forward prediction mode decision information for a macroblock of an interlaced B-field in a video encoder having one or more bit plane coding modes.
Fig. 52 is a flow diagram illustrating a technique for decoding forward/non-forward prediction mode decision information for macroblocks of an interlaced B-field, where the decision information is encoded by a video encoder having one or more bit-plane encoding modes.
Fig. 53 is a code listing showing pseudo code describing the selection process for the motion vector used as the basis for a direct mode motion vector in an interlaced B-field.
Fig. 54 is a diagram illustrating a frame-layer bitstream syntax for a combined implementation of an interlaced B-field or BI-field.
Fig. 55 is a diagram showing a field-layer bitstream syntax for a combined implementation of an interlaced B-field.
Fig. 56 is a diagram illustrating a field-layer bitstream syntax for a combined implementation of a interlaced BI-field.
Fig. 57 is a diagram of the macroblock-layer bitstream syntax for macroblocks of an interlaced B-field in a combined implementation.
Fig. 58 is a diagram of the macroblock-layer bitstream syntax for macroblocks of an interlaced BI-field in a combined implementation.
FIG. 59 is a diagram of a frame-layer bitstream syntax for a combined implementation of an interlaced B-frame.
Fig. 60 is a diagram of the macroblock-layer bitstream syntax for macroblocks of an interlaced B-frame in a combined implementation.
Fig. 61A-61B are code listings illustrating pseudo code for decoding motion vector difference values and dominant/non-dominant predictor information in a combined implementation.
FIGS. 62A-62F are code listings illustrating pseudo code for computing motion vector predictors in dual reference interlaced P-fields in a combined implementation.
FIG. 63 is a code listing showing pseudo code for determining a reference field for an interlaced B-field in a combined implementation.
FIG. 64 is a code listing showing pseudo code for collecting candidate motion vectors for 1MV macroblocks of an interlaced P-frame in a combined implementation.
Fig. 65, 66, 67 and 68 are code listings showing pseudo-code for collecting candidate motion vectors for 4 frame MV macroblocks of an interlaced P-frame in a combined implementation.
Fig. 69 and 70 are code listings showing pseudo code for collecting candidate motion vectors for 2 field MV macroblocks of an interlaced P-frame in a combined implementation.
Fig. 71, 72, 73 and 74 are code listings showing pseudo-code for collecting candidate motion vectors for 4 field MV macroblocks of an interlaced P-frame in a combined implementation.
Fig. 75 is a code listing showing pseudo code for calculating a motion vector predictor for a frame motion vector of an interlaced P-frame in a combined implementation.
FIG. 76 is a code listing illustrating pseudo code for computing motion vector predictors for field motion vectors of interlaced P-frames in a combined implementation.
Fig. 77A and 77B are code listings showing pseudo code for decoding motion vector difference values for interlaced P-frames and B-frames in a combined implementation.
FIG. 78 is a code listing illustrating pseudo code for deriving chroma motion vectors for interlaced P-frames in a combined implementation.
FIGS. 79A-79C are diagrams illustrating tiling of the Norm-6 and Diff-6 bit-plane coding modes for forward/non-forward prediction mode decision information for macroblocks of an interlaced B-field.
Detailed Description
The present application relates to techniques and tools for efficient compression and decompression of interlaced video. In various described embodiments, a video encoder and decoder incorporate techniques for encoding and decoding bi-directionally predicted interlaced video frames and corresponding signal representation techniques used in bitstream formats or syntax that include various layers or levels (e.g., sequence level, frame level, field level, macroblock level, and/or block level).
Various alternatives to the described implementation are possible. For example, the methods described with reference to the flowcharts may be altered by changing the order of the stages shown in the flowcharts, or by repeating or omitting certain stages, etc. As another example, although some implementations are described with reference to a particular macroblock format, other formats may be used. Furthermore, the techniques and tools described with reference to bi-directional prediction may also be applied to other types of prediction.
The various techniques and tools may be used in combination or independently. Different embodiments implement one or more of the described techniques and tools. Some of the techniques and tools described herein can be used in a video encoder or decoder, or in some other system not specifically limited to video encoding or decoding.
I. Computing environment
FIG. 20 illustrates a general example of a suitable computing environment 2000 in which several of the described embodiments may be implemented. The computing environment 2000 is not intended to suggest any limitation as to the scope of use or functionality of the invention, as the techniques and tools of the present invention may be implemented in diverse general-purpose or special-purpose computing environments.
With reference to FIG. 20, the computing environment 2000 includes at least one processing unit 2010 and storage 2020. In fig. 20, this most basic configuration 2030 is included within a dashed line. Processing unit 2010 executes computer-executable instructions and may be a real or virtual processor. In a multi-processing system, multiple processing units execute computer-executable instructions to increase processing power. The memory 2020 may be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory, etc.), or some combination of the two. The memory 2020 stores software 2080 that implements a video encoder or decoder that performs bi-directional prediction of interlaced video frames.
The computing environment may have additional features. For example, the computing environment 2000 includes storage 2040, one or more input devices 2050, one or more output devices 2060, and one or more communication connections 2070. An interconnection mechanism (not shown) such as a bus, controller, or network interconnects the various components of the computing environment 2000. Typically, operating system software (not shown) provides an operating environment for other software executing in the computing environment 2000 and coordinates actions by the various components of the computing environment 2000.
Storage 2040 may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, CD-RWs, DVDs, or any other medium which can be used to store information and which can be accessed within the computing environment 2000. The storage 2040 stores instructions for the software 2080 to implement a video encoder or decoder.
The input device 2050 may be, for example, a keyboard, a mouse, an electronic pen, or a trackball, a voice input device, a scanning device, or another device that provides input to the computing environment 2000. For audio or video encoding, the input device 2050 may be a sound card, a video card, a TV tuner card, or similar device that accepts audio or video input in analog or digital form, or a CD-ROM or CD-RW that reads audio or video samples into the computing environment 2000. The output device 2060 may be a display, a printer, a speaker, a CD-recorder, or another device that provides output from the computing environment 2000.
Communication connection(s) 2070 allow communication with another computing entity over a communication medium. Communication media conveys information such as computer-executable instructions, audio or video input or output, or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired or wireless techniques implemented with an electronic, optical, RF, infrared, acoustic, or other carrier.
These techniques and tools may be described in the general context of computer-readable media. Computer readable media are any available media that can be accessed within a computing environment. By way of example, and not limitation, for the computing environment 2000, computer-readable media include memory 2020, storage 2040, communication media, and combinations of any of the above.
The techniques and tools may be described in the general context of computer-executable instructions, such as those included in program modules, being executed in a computing environment on a target real or virtual processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or divided between program modules as desired in various embodiments. Computer-executable instructions of program modules may be executed within a local or distributed computing environment.
For purposes of illustration, the detailed description uses terms like "estimate," "compensate," "predict," and "apply" to describe computer operations in a computing environment. These terms are high-level abstractions for operations performed by a computer, and should not be confused with acts performed by a human being. The actual computer operations corresponding to these terms vary depending on the implementation.
General video encoder and decoder
Fig. 21 is a block diagram of a generalized video encoder 2100 in connection with which several of the described embodiments may be implemented. Fig. 22 is a block diagram of a generic video decoder 2200 in conjunction with which several of the described embodiments may be implemented.
The illustrated relationships between the modules within the encoder 2100 and decoder 2200 represent the general information flow in the encoder and decoder; other relationships are not shown for simplicity. In particular, fig. 21 and 22 generally do not show auxiliary information representing encoder settings, modes, tables, etc. for video sequences, pictures, macroblocks, blocks, etc. This side information is typically sent in the output bitstream after entropy encoding of the side information. The format of the output bitstream may be Windows Media Video version 9 format or another format.
The encoder 2100 and decoder 2200 process video pictures, which may be video frames, video fields, or a combination of frames and fields. The bitstream syntax and semantics at the picture and macroblock level may depend on whether a frame or a field is used. There may also be changes to the macroblock organization and overall timing. The encoder 2100 and decoder 2200 are block-based and use a 4:2:0 macroblock format for frames, where each macroblock includes 4 8x8 luma blocks (often considered as a 16x16 macroblock) and 2 8x8 chroma blocks. For fields, the same or different macroblock organization and format may be used. These 8x8 blocks may be further subdivided at different stages, for example at the frequency transform and entropy coding stages. Example video frame organization is described in more detail below. Alternatively, the encoder 2100 and decoder 2200 are object-based, use different macroblock or block formats, or perform operations on sets of pixels that differ in size or configuration from 8x8 blocks and 16x16 macroblocks.
Depending on the desired compression implementation and type, modules of the encoder or decoder may be added, omitted, divided into multiple modules, combined with other modules, and/or replaced with similar modules. In alternative embodiments, encoders or decoders having different modules and/or other configurations of modules perform one or more of the described techniques.
A. Video frame organization
In some implementations, the encoder 2100 and decoder 2200 process video frames organized as follows. A frame contains lines of spatial information of a video signal. For progressive video, these lines contain samples starting from one time instant and continuing through successive lines to the bottom of the frame. A progressive video frame is divided into macroblocks such as the macroblock 2300 shown in fig. 23. The macroblock 2300 includes 4 8x8 luma blocks (Y1 through Y4) and 2 8x8 chroma blocks that are co-located with the 4 luma blocks but half resolution horizontally and vertically, following the conventional 4:2:0 macroblock format. These 8x8 blocks may be further subdivided at different stages, such as at the frequency transform (e.g., 8x4, 4x8, or 4x4 DCT) and entropy coding stages. A progressive I-frame is an intra-coded progressive video frame. A progressive P-frame is a progressive video frame encoded using forward prediction, and a progressive B-frame is a progressive video frame encoded using bi-directional prediction. Progressive P- and B-frames may include intra-coded macroblocks as well as different types of predicted macroblocks.
An interlaced video frame consists of two scans of a frame: one includes the even lines of the frame (the top field) and the other includes the odd lines of the frame (the bottom field). The two fields may represent two different time periods, or they may be from the same time period. Fig. 24A shows a portion of an interlaced video frame 2400, including the alternating lines of the top field and the bottom field in the top left part of the interlaced video frame 2400.
Fig. 24B shows the interlaced video frame 2400 of fig. 24A organized as a frame 2430 for encoding/decoding. The interlaced video frame 2400 has been divided into macroblocks such as macroblocks 2431 and 2432 using a 4:2:0 format as shown in fig. 23. In the luminance plane, each macroblock 2431, 2432 comprises 8 lines from the top field alternating with 8 lines from the bottom field (16 lines in total), and each line is 16 pixels long. (The actual organization and placement of luma and chroma blocks within macroblocks 2431, 2432 is not shown, and may vary for different coding decisions.) Within a given macroblock, the top field information and the bottom field information may be jointly or independently coded at any of the various stages. An interlaced I-frame is two intra-coded fields of an interlaced video frame, where a macroblock includes information for both fields. An interlaced P-frame is two fields of an interlaced video frame encoded using forward prediction, and an interlaced B-frame is two fields of an interlaced video frame encoded using bi-directional prediction, where a macroblock comprises information of both fields. Interlaced P- and B-frames may include intra-coded macroblocks as well as different types of predicted macroblocks.
Fig. 24C shows the interlaced video frame 2400 of fig. 24A organized as fields 2460 for encoding/decoding. Each of the two fields of the interlaced video frame 2400 is divided into macroblocks. The top field is divided into macroblocks such as macroblock 2461, and the bottom field is divided into macroblocks such as macroblock 2462. (Again, each macroblock uses a 4:2:0 format as shown in fig. 23, and the organization and placement of luma and chroma blocks within each macroblock is not shown.) In the luma plane, macroblock 2461 includes 16 lines from the top field, macroblock 2462 includes 16 lines from the bottom field, and each line is 16 pixels long. An interlaced I-field is a single, separately represented field of an interlaced video frame. An interlaced P-field is a single, separately represented field of an interlaced video frame encoded using forward prediction, while an interlaced B-field is a single, separately represented field of an interlaced video frame encoded using bi-directional prediction. Interlaced P- and B-fields may include intra-coded macroblocks as well as different types of predicted macroblocks.
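The frame and field organizations above differ only in how the interleaved lines are grouped. As an illustrative sketch (the helper names are not part of any described embodiment), the separation of an interlaced frame into its two fields and the inverse reassembly are:

```python
def split_into_fields(frame):
    """Separate an interlaced frame (a list of rows) into its two fields:
    even-numbered lines form the top field, odd-numbered lines the bottom."""
    return frame[0::2], frame[1::2]

def interleave_fields(top, bottom):
    """Rebuild the frame from its fields (the inverse of split_into_fields)."""
    frame = []
    for t, b in zip(top, bottom):
        frame += [t, b]
    return frame
```

Encoding the frame as a whole (fig. 24B) keeps the rows interleaved inside each macroblock, while field coding (fig. 24C) runs each output of split_into_fields through the pipeline separately.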
The term picture generally refers to source, encoded or reconstructed image data. For progressive video, a picture is a progressive video frame. For interlaced video, a picture may refer to an interlaced video frame, the top field of a frame, or the bottom field of a frame, depending on the context.
Alternatively, the encoder 2100 or decoder 2200 is object-based, uses different macroblock or block formats, or performs operations on a set of pixels that are different from the size or configuration of the 8x8 blocks and the 16x16 macroblocks.
B. Video encoder
Fig. 21 is a block diagram of a generic video encoder system 2100. Encoder system 2100 receives a sequence of video pictures (e.g., progressive video frames, interlaced video frames, or fields of an interlaced video frame) including current picture 2105 and generates compressed video information 2195 as output. Particular embodiments of the video encoder typically use a variant or complementary version of the generic encoder 2100.
The encoder system 2100 compresses the prediction picture and the key picture. For demonstration purposes, fig. 21 shows the path of a key picture through the encoder system 2100, as well as the path of a predicted picture. Many components of the encoder system 2100 are used to compress the key pictures and the prediction pictures. The exact operations performed by those components may vary depending on the type of information being compressed.
A predicted picture (e.g., a progressive P-frame or B-frame, an interlaced P-field or B-field, or an interlaced P-frame or B-frame) is represented based on a prediction (or difference) from one or more other pictures (commonly referred to as reference pictures or anchors). The prediction residual is the difference between the predicted picture and the original picture. In contrast, key pictures (e.g., progressive I-frames, interlaced I-fields, or interlaced I-frames) are compressed without reference to other pictures.
If the current picture 2105 is a forward predicted picture, the motion estimator 2110 estimates motion of a macroblock or other set of pixels of the current picture 2105 with reference to one or more reference pictures, such as a reconstructed previous picture 2125 cached in a picture store 2120. If the current picture 2105 is a bi-directionally predicted picture, the motion estimator 2110 estimates motion in the current picture 2105 with reference to up to 4 reconstructed reference pictures (e.g., of interlaced B-fields). In general, a motion estimator estimates motion in a B-picture with reference to one or more temporally previous reference pictures and one or more temporally future reference pictures. Accordingly, the encoder system 2100 may use separate stores 2120 and 2122 for multiple reference pictures. For more information on progressive B-Frames, see U.S. patent application serial No. 10/622,378 entitled Advanced Bi-Directional coding of Video Frames (Advanced Bi-Directional predictive coding of Video Frames) and filed on 7/18/2003.
The motion estimator 2110 may estimate motion in pixel, 1/2-pixel, 1/4-pixel, or other increments, and may switch the resolution of the motion estimation on a picture-by-picture basis or other basis. The motion estimator 2110 (and compensator 2130) may also switch between types of reference picture pixel interpolation (e.g., bicubic and bilinear) on a per-frame or other basis. The resolution of the motion estimation may be the same or different horizontally and vertically. The motion estimator 2110 outputs, as side information, motion information 2115 such as differential motion vector information. The encoder 2100 encodes the motion information by, for example, computing one or more predictors for a motion vector, computing the difference between the motion vector and the predictor, and entropy-encoding the difference. To reconstruct a motion vector, the motion compensator 2130 combines the predictor with the differential motion vector information. Various techniques for computing motion vector predictors, computing differential motion vectors, and reconstructing motion vectors for interlaced B-fields and interlaced B-frames are described below.
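The predictor/differential scheme just described can be sketched as follows, using a component-wise median-of-3 predictor. The candidate set and the exact median rule vary with context in the actual codec; this is only a minimal illustration of the encode/decode symmetry.

```python
def median3(a, b, c):
    return sorted((a, b, c))[1]

def mv_predictor(candidates):
    """Component-wise median-of-3 over three candidate motion vectors."""
    (x1, y1), (x2, y2), (x3, y3) = candidates
    return median3(x1, x2, x3), median3(y1, y2, y3)

def encode_mv(mv, candidates):
    """Differential to be entropy coded: actual vector minus its predictor."""
    px, py = mv_predictor(candidates)
    return mv[0] - px, mv[1] - py

def decode_mv(diff, candidates):
    """Reconstruction: predictor plus the decoded differential."""
    px, py = mv_predictor(candidates)
    return px + diff[0], py + diff[1]
```

Because the encoder and decoder compute the predictor from the same (already reconstructed) candidates, transmitting only the small differential is enough to reconstruct the vector exactly.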
Motion compensator 2130 applies the reconstructed motion vectors to reconstructed picture 2125 to form motion compensated current picture 2135. However, the prediction is rarely perfect, and the difference between the motion compensated current picture 2135 and the original current picture 2105 is the prediction residual 2145. In subsequent reconstruction of the picture, the prediction residual 2145 is added to the motion compensated current picture 2135 to obtain a reconstructed picture that is closer to the original current picture 2105. However, in lossy compression, some information is still lost from the original current picture 2105. Alternatively, the motion estimator and motion compensator apply another type of motion estimation/compensation.
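The compensate/residual/reconstruct relationship above can be sketched with integer-pel motion and no subpixel interpolation (illustrative only; the actual compensator supports fractional motion and interpolation filters):

```python
def motion_compensate(ref, y0, x0, mv, h, w):
    """Fetch the h-by-w block displaced by mv=(dy, dx) from the reference."""
    dy, dx = mv
    return [[ref[y0 + dy + r][x0 + dx + c] for c in range(w)]
            for r in range(h)]

def block_residual(current, predicted):
    """Prediction residual: current block minus motion-compensated block."""
    return [[c - p for c, p in zip(cr, pr)]
            for cr, pr in zip(current, predicted)]

def block_reconstruct(predicted, resid):
    """Decoder side: compensated prediction plus the decoded residual."""
    return [[p + e for p, e in zip(pr, er)]
            for pr, er in zip(predicted, resid)]
```

With lossless residual coding, block_reconstruct recovers the current block exactly; in lossy compression the residual itself is quantized, so the reconstruction only approximates the original.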
The frequency transformer 2160 converts spatial domain video information into frequency domain (i.e., spectral) data. For block-based video pictures, the frequency transformer 2160 applies a DCT, a variant of a DCT, or another block transform to blocks of pixel data or prediction residual data, producing blocks of frequency transform coefficients. Alternatively, the frequency transformer 2160 applies another conventional frequency transform such as a Fourier transform, or uses wavelet or subband analysis. The frequency transformer 2160 may apply an 8x8, 8x4, 4x8, 4x4, or other size frequency transform.
The quantizer 2170 then quantizes the respective blocks of spectral data coefficients. The quantizer applies uniform scalar quantization to the spectral data, with the step size varying on a picture-by-picture or other basis. Alternatively, the quantizer applies another type of quantization to the spectral data coefficients, such as non-uniform vector or non-adaptive quantization, or quantizes the spatial domain data directly in an encoder system that does not use frequency transforms. In addition to adaptive quantization, the encoder 2100 may use frame dropping, adaptive filtering, or other techniques for rate control.
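A toy illustration of uniform scalar quantization with a fixed step size follows. Truncation toward zero is an assumption of this sketch; the actual quantizer's rounding and dead-zone behavior differ, and the reconstruction error it introduces is the main source of loss noted above.

```python
def quantize(coeffs, step):
    """Uniform scalar quantization: map each coefficient to a level index."""
    return [int(c / step) for c in coeffs]  # truncates toward zero

def dequantize(levels, step):
    """Inverse quantization: levels back to approximate coefficients."""
    return [lv * step for lv in levels]
```

A larger step size coarsens the levels, lowering the bit rate at the cost of larger reconstruction error, which is why step size is a natural rate-control knob.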
The encoder 2100 may use special signaling for a skipped macroblock, i.e., a macroblock with no information of certain types (e.g., no motion information and no residual information for the macroblock).
When the reconstructed current frame is required for subsequent motion estimation/compensation, the inverse quantizer 2176 performs inverse quantization on the quantized spectral data coefficients. The inverse frequency transformer 2166 then performs the inverse operation of the frequency transformer 2160, generating a reconstructed prediction residual (for the prediction picture) or a reconstructed key picture. If the current picture 2105 is a key picture, the reconstructed key picture is taken as a reconstructed current picture (not shown). If the current picture 2105 is a prediction picture, the reconstructed prediction residual is added to the motion compensated current picture 2135 to form a reconstructed current picture. One or more picture stores 2120, 2122 buffer the reconstructed current picture for motion compensated prediction. In some embodiments, the encoder applies a deblocking filter to the reconstructed frame to adaptively smooth discontinuities and other artifacts in the picture.
The entropy coder 2180 compresses the output of the quantizer 2170 and some side information (e.g., motion information 2115, quantization step size). Typical entropy encoding techniques include arithmetic encoding, differential encoding, huffman encoding, run-length encoding, LZ encoding, dictionary encoding, and combinations thereof. The entropy encoder 2180 typically uses different coding techniques for different types of information (e.g., DC coefficients, AC coefficients, different types of side information) and may select from multiple code tables within a particular coding technique.
The entropy encoder 2180 provides compressed video information 2195 to the multiplexer ["MUX"] 2190. The MUX 2190 may include a buffer, and a buffer fullness level indicator may be fed back to bitrate-adaptive modules for rate control. Before or after the MUX 2190, the compressed video information 2195 may be channel coded for transmission over a network. The channel coding may apply error detection and correction data to the compressed video information 2195.
C. Video decoder
Fig. 22 is a block diagram of a generic video decoder system 2200. The decoder system 2200 receives information 2295 for a compressed sequence of video pictures and generates an output (e.g., a progressive video frame, an interlaced video frame, or a field of an interlaced video frame) that includes a reconstructed picture 2205. Particular embodiments of the video decoder typically use a variant or complementary version of the generic decoder 2200.
The decoder system 2200 decompresses the predicted picture and the key picture. For the sake of illustration, fig. 22 shows the path of a key picture through the decoder system 2200, as well as the path of a forward predicted picture. Many components of the decoder system 2200 are used to decompress key pictures and prediction pictures. The exact operations performed by those components vary depending on the type of information being decompressed.
A DEMUX (demultiplexer) 2290 receives the information 2295 for the compressed video sequence and makes the received information available to the entropy decoder 2280. The DEMUX 2290 may include a jitter buffer and other buffers. Before or after the DEMUX 2290, the compressed video information may be channel decoded and processed for error detection and correction.
The entropy decoder 2280 typically entropy decodes the entropy-encoded quantized data and entropy-encoded side information (e.g., motion information 2215, quantization step size) using the inverse of the entropy encoding performed in the encoder. Entropy decoding techniques include arithmetic decoding, differential decoding, huffman decoding, run-length decoding, LZ decoding, dictionary decoding, and combinations thereof. The entropy decoder 2280 often uses different decoding techniques for different types of information (e.g., DC coefficients, AC coefficients, different types of side information) and may select from multiple code tables within a particular decoding technique.
The decoder 2200 decodes the motion information 2215 by, for example, calculating one or more predictors for a motion vector, entropy-decoding the differential motion vector, and combining the decoded differential motion vector with the predictor to reconstruct the motion vector. Various techniques for calculating motion vector predictors, calculating differential motion vectors, and reconstructing motion vectors for interlaced B-fields and interlaced B-frames are described below.
The motion compensator 2230 applies the motion information 2215 to one or more reference pictures 2225 to form a prediction value 2235 for the reconstructed picture 2205. For example, motion compensator 2230 uses one or more macroblock motion vectors to find a macroblock in reference picture 2225. One or more picture stores (e.g., picture stores 2220, 2222) store previously reconstructed pictures for use as reference pictures. Typically, a B-picture has more than one reference picture (e.g., at least one temporally previous reference picture and at least one temporally future reference picture). Accordingly, the decoder system 2200 may use separate picture stores 2220 and 2222 for multiple reference pictures. The motion compensator 2230 may compensate motion by pixel, 1/2 pixel, 1/4 pixel, or other increments, and may switch the resolution of motion compensation on a picture-by-picture basis or other basis. Motion compensator 2230 may also switch between reference picture pixel interpolation types (e.g., bicubic and bilinear) on a per-frame or other basis. The resolution of the motion compensation may be the same or different horizontally or vertically. Alternatively, the motion compensator applies another type of motion compensation. The prediction by the motion compensator is rarely perfect, so the decoder 2200 also reconstructs the prediction residual.
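The core of block motion compensation, using a motion vector to locate a predictor block in a reference picture, can be sketched as below. This is an illustrative sketch with assumed names; it handles integer-pel motion only for brevity, whereas the compensator described above also interpolates 1/2- and 1/4-pel positions.

```python
# Hedged sketch: a motion vector (mvx, mvy) locates a predictor block in
# the reference picture at an offset from the current block's position.
# Integer-pel displacement only; no boundary handling.

def motion_compensate(ref, x, y, mv, size=2):
    """Fetch a size x size predictor block from reference picture `ref`
    (a 2D list), displaced by motion vector `mv` from position (x, y)."""
    mvx, mvy = mv
    return [[ref[y + mvy + j][x + mvx + i] for i in range(size)]
            for j in range(size)]

# Toy 4x4 reference picture whose sample at (row, col) is row*10 + col.
ref = [[r * 10 + c for c in range(4)] for r in range(4)]
pred = motion_compensate(ref, x=1, y=1, mv=(1, -1), size=2)
assert pred == [[2, 3], [12, 13]]
```

The decoder then adds the decoded prediction residual to this predictor block to form the reconstructed block.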
The inverse quantizer 2270 performs inverse quantization on the entropy-decoded data. In general, the inverse quantizer applies uniform scalar inverse quantization to entropy decoded data, where the step size varies on a frame-by-frame or other basis. Alternatively, the inverse quantizer applies another type of inverse quantization to the data, such as reconstruction after non-uniform vector or non-adaptive quantization, or directly inverse quantizing spatial domain data in a decoder system that does not use inverse frequency transform.
The inverse frequency transformer 2260 converts the quantized frequency domain data into spatial domain video information. For block-based video pictures, the inverse frequency transformer 2260 applies an inverse DCT [ "IDCT" ], a variant of IDCT, or other inverse block transform to the block of frequency transform coefficients, producing pixel data or prediction residual data for the key picture or prediction picture, respectively. Alternatively, the inverse frequency transformer 2260 applies another conventional inverse frequency transform, such as an inverse fourier transform, or uses wavelet or subband analysis. The inverse frequency transformer 2260 may apply inverse frequency transforms of 8x8, 8x4, 4x8, 4x4, or other sizes.
For predicted pictures, the decoder 2200 combines the reconstructed prediction residual 2245 and the motion compensated prediction 2235 to form a reconstructed picture 2205. When the decoder needs the reconstructed picture 2205 for subsequent motion compensation, one or both picture stores (e.g., picture store 2220) cache the reconstructed picture 2205 for use in predicting the next picture. In some embodiments, the decoder 2200 applies a deblocking filter to the reconstructed picture to adaptively smooth discontinuities and other artifacts in the picture.
Interlaced P-fields and interlaced P-frames
A typical interlaced video frame consists of two fields (e.g., top and bottom fields) that are scanned at different times. In general, it is more efficient to encode static regions of an interlaced video frame by encoding fields together ("frame mode" encoding). On the other hand, it is generally more efficient to encode moving regions of an interlaced video frame by separately encoding fields ("field mode" encoding), since the two fields tend to have different motion. A forward predicted interlaced video frame can be encoded as two separate forward predicted fields-interlaced P-fields. For example, when there is high motion on an interlaced video frame and thus there is a large difference between the fields, it may be effective to separately encode the fields of the forward predicted interlaced video frame.
Alternatively, forward predicted interlaced video frames may be encoded as interlaced P-frames using a mix of field coding and frame coding. For a macroblock of an interlaced P-frame, the macroblock includes rows of pixels of the top field and the bottom field, and the rows may be encoded together in a frame encoding mode or separately in a field encoding mode.
A. Interlaced P-field
Interlaced P-fields refer to one or more previously decoded fields. For example, in some implementations, interlaced P-fields reference one or two previously decoded fields, while interlaced B-fields reference up to two previous and two future reference fields (i.e., up to a total of four reference fields). (encoding and decoding techniques for interlaced B-fields are described in detail below.)
Fig. 25 and 26 show examples of interlaced P-fields with two reference fields. In fig. 25, the current field 2510 references a top field 2520 and a bottom field 2530 of a temporally previous interlaced video frame. Because fields 2540 and 2550 are interlaced B-fields, they are not used as reference fields. In fig. 26, the current field 2610 references the top field 2620 and the bottom field 2630 of the interlaced video frame immediately preceding the interlaced video frame containing the current field 2610. For more information on interlaced P-fields with two reference fields, see U.S. patent application serial No. xx/yyy,zzz entitled "Predicting Motion Vectors for Fields of Forward-predicted Interlaced Video Frames," filed May 27, 2004.
Fig. 27 and 28 show examples of interlaced P-fields with one reference field - the temporally most recent allowed reference field. In fig. 27, the current field 2710 references the bottom field 2730 of a temporally previous interlaced video frame, but does not reference the less recent top field 2720 of that frame. In the example shown in fig. 27, fields 2740 and 2750 are interlaced B-fields and are not allowed as reference fields. In fig. 28, the current field 2810 references the bottom field 2830 of the interlaced video frame immediately preceding the interlaced video frame containing the current field 2810, while the less recent top field 2820 is not referenced.
Fig. 29 and 30 show examples of interlaced P-fields using the second most recent allowed reference field. In fig. 29, the current field 2910 references the top field 2920 of the temporally previous interlaced video frame, but does not reference the more recent bottom field 2930. In the example shown in fig. 29, fields 2940 and 2950 are interlaced B-fields and are not allowed as reference fields. In fig. 30, the current field 3010 references the top field 3020, and does not reference the more recent bottom field 3030.
In one implementation, all of the scenarios shown in FIGS. 25-30 are permissible in interlaced P-field syntax. Other implementations are also possible. For example, a picture may use fields from other pictures of different types or temporal locations as reference fields.
1. Field picture coordinate system and field polarity
The motion vectors represent horizontal and vertical displacements in units of 1/4 pixel. For example, if the vertical component of a motion vector has the value 6, it represents a displacement of 6 quarter-pixel units, meaning that the reference block is 1.5 field lines below the current block position (6 x 1/4 = 1 1/2).
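The quarter-pixel convention above amounts to a simple unit conversion, sketched here for illustration (the function name is an assumption):

```python
# Hedged sketch: a motion vector component v, expressed in quarter-pixel
# units, corresponds to a displacement of v/4 pixels (field lines, for
# the vertical component).

def mv_component_to_pixels(v):
    return v / 4.0

assert mv_component_to_pixels(6) == 1.5    # the example in the text
assert mv_component_to_pixels(-2) == -0.5  # negative = opposite direction
```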
Fig. 31 illustrates the relationship between the vertical component of a motion vector and the spatial position in one implementation. The example shown in fig. 31 covers three different cases 3110, 3120 and 3130 for three different combinations of current and reference field types (e.g., top and bottom). If the field types differ between the current and reference fields, the polarity is "opposite". If the field types are the same, the polarity is "same". For each case, fig. 31 shows one vertical column of pixels in the current field and a second vertical column of pixels in the reference field. In practice, the two columns are horizontally aligned. A circle represents an actual integer pixel position, and an X represents an interpolated 1/2- or 1/4-pixel position. The horizontal component values (not shown) need not account for any offset due to interlacing, since the fields are horizontally aligned. Negative values represent offsets in the opposite direction (further upward) from the positive vertical offsets shown.
In case 3110, the polarity is "opposite". The current field is a top field and the reference field is a bottom field. Due to the interlacing, the position of the reference field is offset 1/2 pixel downward relative to the current field. A vertical motion vector component value of 0 is the "no vertical motion" offset and represents a position in the reference field at the same vertical level (in absolute terms) as the position in the current field; a vertical motion vector component value of +2 represents a position in the reference field offset 1/2 pixel (in absolute terms) below the position in the current field, which is an actual value in the reference field; and a vertical component value of +4 represents a position in the reference field offset one full pixel (in absolute terms) below the position in the current field, which is an interpolated value in the reference field.
In case 3120, the polarity is also "opposite". The current field is the bottom field and the reference field is the top field. The position of the reference field is shifted 1/2 pixels in the upward direction relative to the current field due to the interlacing. A vertical motion vector component value of-2 represents the position in the reference field offset to 1/2 pixels (absolute) above the position in the current field; a vertical component value of 0 indicates a position in the reference field that is at the same level (absolute value) as the position in the current field; and a vertical component value of +2 indicates a position in the reference field that is offset to 1/2 pixels (absolute value) below the position in the current field.
In case 3130, the polarity is "same". The position of the reference field relative to the current field is the same in the vertical direction. A vertical motion vector component value of 0 is the "no vertical motion" offset and represents a position in the reference field at the same vertical level (in absolute terms) as the position in the current field; a vertical motion vector component value of +2 represents a position in the reference field offset 1/2 pixel (in absolute terms) below the position in the current field, which is an interpolated value in the reference field; and a vertical component value of +4 represents a position in the reference field offset one full pixel (in absolute terms) below the position in the current field, which is an actual value in the reference field.
Alternatively, the displacement of the motion vector is expressed according to different conventions.
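The three polarity cases above follow one rule: the vertical component (in quarter-pixel units) gives an absolute displacement, and whether the addressed position lands on a real line of the reference field or on an interpolated position depends on that field's polarity. The sketch below illustrates this under the assumed convention that top-field lines sit at integer positions and bottom-field lines 1/2 pixel lower, per fig. 31; the function names are assumptions.

```python
# Hedged sketch of the vertical-offset interpretation in fig. 31.

def reference_position(current_line, v):
    """Absolute vertical position addressed in the reference field, for a
    vertical motion vector component v in quarter-pixel units."""
    return current_line + v / 4.0

def is_actual_line(position, reference_is_top):
    """True if the position coincides with a real (non-interpolated)
    line of the reference field."""
    frac = position % 1.0
    return frac == 0.0 if reference_is_top else frac == 0.5

# Case 3110 ("opposite": current = top at line 0, reference = bottom):
assert is_actual_line(reference_position(0, +2), reference_is_top=False)      # +2: actual
assert not is_actual_line(reference_position(0, +4), reference_is_top=False)  # +4: interpolated
# Case 3130 ("same": both top):
assert not is_actual_line(reference_position(0, +2), reference_is_top=True)   # +2: interpolated
assert is_actual_line(reference_position(0, +4), reference_is_top=True)       # +4: actual
```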
2. Motion vector prediction in interlaced P-fields of dual reference fields
A two-reference-field interlaced P-field references two fields in the same temporal direction (e.g., the two most recent previous reference fields). Two motion vector predictors are computed for each macroblock. In some implementations, one predictor is from a reference field of the same polarity and the other predictor is from the reference field of opposite polarity. Other combinations of polarities are also possible. (Interlaced B-fields that use two reference fields in each direction are described below. In some implementations, these interlaced B-fields use the same techniques for calculating motion vector predictors as interlaced P-fields.)
In some implementations, the encoder/decoder calculates a motion vector predictor for a current block or macroblock by determining an odd field predictor and an even field predictor, and selecting one of the two predictors to use in processing the motion vector. For example, the encoder/decoder determines an odd field motion vector predictor and an even field motion vector predictor. One of the motion vector predictors thus has the same polarity as the current field, while the other motion vector predictor has the opposite polarity. The encoder/decoder selects a motion vector predictor from the odd field motion vector predictor and the even field motion vector predictor. For example, the encoder selects between the motion vector predictors based on which gives the better prediction. The encoder signals which motion vector predictor to use with a simple selection signal, or with more complex signaling that incorporates context information to improve coding efficiency. The context information may indicate which of the odd or even fields, or which of the same-polarity or opposite-polarity fields, has been used predominantly in the neighborhood around the block or macroblock. The decoder selects which motion vector predictor to use based on the selection signal and/or the context information. The encoder/decoder then processes the motion vector using the selected motion vector predictor. For example, the encoder encodes the difference between the motion vector and the motion vector predictor. Conversely, the decoder decodes the motion vector by combining the motion vector differential with the motion vector predictor.
Alternatively, the encoder and/or decoder may skip determining the odd field motion vector predictor or skip determining the even field motion vector predictor. For example, if the encoder determines that the odd field will be used for motion compensation for a particular block or macroblock, the encoder only determines the odd field motion vector predictor. Alternatively, if the decoder determines from the context and/or signaling information that the odd field is to be used for motion compensation, the decoder only determines the odd field motion vector predictor. In this way, the encoder and decoder can avoid unnecessary operations.
The decoder may employ the following techniques to determine the motion vector predictor for the current interlaced P-field.
Two sets of three candidate motion vector predictors are available for each block or macroblock with a motion vector in an interlaced P-field. The positions, relative to the current macroblock 3200, of the neighboring macroblocks from which these candidate motion vector predictors are obtained are shown in fig. 32. Three of the candidates are from the even reference field and the other three are from the odd reference field. Since a neighboring macroblock (A, B, or C) in each candidate direction may be intra-coded or may have an actual motion vector that references either the even or the odd field, it is necessary to derive the motion vector for the other field (or to derive both odd and even field motion vector candidates for an intra-coded macroblock). For example, for a given macroblock, suppose predictor A has a motion vector that references the odd field. In this case, the "even field" candidate predictor A is derived from the motion vector of the "odd field" candidate predictor A. The derivation is done using a scaling operation. (See, e.g., the explanation of fig. 34A and 34B below.) Alternatively, the derivation is done in another way.
Once the three odd field candidate motion vector predictors have been obtained, a median operation is used to derive an odd field motion vector predictor from the three odd field candidates. Similarly, once the three even field candidate motion vector predictors have been obtained, a median operation is used to derive an even field motion vector predictor from the three even field candidates. Alternatively, another mechanism is used to select a field motion vector predictor based on the candidate field motion vector predictors. The decoder decides whether to use the even or odd field as a motion vector predictor (e.g., by selecting the primary predictor), and whether the even or odd motion vector predictor is used to reconstruct the motion vector.
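The selection between the two field motion vector predictors hinges on which polarity is "dominant" among the neighbor motion vectors. The sketch below illustrates one plausible reading of that rule; the function name, the list-based interface, and the tie-breaking rule (taken from the left-edge case described later, where a tie favors the opposite field) are assumptions.

```python
# Hedged sketch of dominant-predictor selection: count how many neighbor
# motion vectors reference the same-polarity vs. opposite-polarity
# reference field, and treat the majority polarity's predictor as
# dominant.

def dominant_polarity(neighbor_polarities):
    """neighbor_polarities: list of 'same'/'opposite' labels for the
    available neighbor motion vectors (A, B, C)."""
    samecount = neighbor_polarities.count('same')
    oppositecount = neighbor_polarities.count('opposite')
    # Assumed tie rule: the opposite field dominates on a tie.
    return 'same' if samecount > oppositecount else 'opposite'

assert dominant_polarity(['same', 'same', 'opposite']) == 'same'
assert dominant_polarity(['opposite', 'same', 'opposite']) == 'opposite'
assert dominant_polarity(['same', 'opposite']) == 'opposite'  # tie
```

A flag decoded with the motion vector differential then signals whether the dominant or non-dominant predictor is actually used.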
Pseudo code 3300 in fig. 33A-33F illustrates a process for generating motion vector predictors from predictors A, B and C arranged as shown in fig. 32. Although fig. 32 shows the neighborhood of a typical macroblock in a current interlaced P-field, the pseudo code 3300 of fig. 33A-33F addresses each particular case of macroblock location. In addition, the pseudo code 3300 may be used to calculate motion vector predictors for motion vectors of blocks at various locations.
In the pseudo code 3300, the terms "same field" and "opposite field" are understood relative to the field currently being encoded or decoded. For example, if the current field is an even field, the "same field" is the even reference field and the "opposite field" is the odd reference field. The variables samefieldpred_x and samefieldpred_y in the pseudo code 3300 represent the horizontal and vertical components of the motion vector predictor from the same field, and the variables oppositefieldpred_x and oppositefieldpred_y represent the horizontal and vertical components of the motion vector predictor from the opposite field. The variables samecount and oppositecount track how many motion vectors in the neighborhood of the current block or macroblock reference the "same" polarity reference field of the current field and how many reference the "opposite" polarity reference field, respectively. The variables samecount and oppositecount are initialized to 0 at the beginning of the pseudo code.
The scaling operations scaleforsame() and scaleforopposite() mentioned in the pseudo code 3300 are used to derive the candidate motion vector predictor for the "other" field from the actual motion vector value of a neighbor. The scaling operations are implementation dependent. Example scaling operations are described below with reference to fig. 34A, 34B, 35, and 36. Alternatively, other scaling operations may be used, for example, to compensate for vertical displacements such as those shown in fig. 31. (Scaling operations specific to interlaced B-fields are described in detail below.)
Fig. 33A and 33B show pseudo code for calculating the motion vector predictor for a typical block or macroblock at an interior position within the field. The motion vector of an "intra" neighbor is set to 0. For each neighbor, the same field motion vector predictor and the opposite field motion vector predictor are set; when one is set from the actual value of the neighbor's motion vector, the other is derived from it. The median of the candidate values is calculated for the same field motion vector predictor and the opposite field motion vector predictor, while the dominant predictor is determined from samecount and oppositecount. The variable dominantpredictor indicates which field contains the dominant motion vector predictor. A motion vector predictor is the dominant predictor if it has the same polarity as the majority of the three candidate predictors. (The value of a predictor flag, decoded along with the motion vector differential data, signals whether the dominant or non-dominant predictor is used.)
The pseudo code in fig. 33C addresses the case of an interlaced P-field having only one macroblock per row (which has no neighbors B or C). The pseudo code in fig. 33D or 33E addresses the case where a block or macroblock is at the left edge of an interlaced P-field (no neighbor C). Here, a motion vector predictor is the dominant predictor if it has the same polarity as both of the two candidate predictors; in the case of a tie, the opposite field motion vector predictor is the dominant predictor. Finally, the pseudo code in fig. 33F addresses cases such as a macroblock in the first row of an interlaced P-field.
3. Scaling of one field motion vector predictor derived from another field motion vector predictor
In one implementation, the encoder/decoder derives one field's motion vector predictor from the other field's motion vector predictor using the scaling operation shown in the pseudo code 3400 of fig. 34A and 34B. The values of SCALEOPP, SCALESAME1, SCALESAME2, SCALEZONE1_X, SCALEZONE1_Y, ZONE1OFFSET_X and ZONE1OFFSET_Y are implementation dependent. Two possible sets of values are shown: table 3500 of fig. 35 for the case where the current field is the first field in the interlaced video frame, and table 3600 of fig. 36 for the case where the current field is the second field in the interlaced video frame. For a P-frame, the reference frame distance is the number of B-frames (i.e., video frames containing two B-fields) between the current P-frame and its reference frame. If there are no intervening B-frames, the reference frame distance is 0. The encoder encodes the reference frame distance, for example, using a variable-size syntax element (e.g., the REFDIST syntax element described in detail in section XIV below).
In each of the examples shown in tables 3500 and 3600, the value of N (used as a multiplier for the SCALEZONE1_X, SCALEZONE1_Y, ZONE1OFFSET_X and ZONE1OFFSET_Y values in the tables) depends on the motion vector range. For example, an extended motion vector range may be signaled by the syntax element EXTENDED_MV = 1. If EXTENDED_MV = 1, the MVRANGE syntax element is present in the picture header and signals the motion vector range. If EXTENDED_MV = 0, a default motion vector range is used. Table 1 below shows the relationship between N and MVRANGE.
Table 1: derivation of N in FIGS. 35 and 36
MVRANGE N
0 or default value 1
10 2
110 8
111 16
The various values shown in tables 3500 and 3600 may vary depending on the implementation.
Alternatively, N is assumed to be 1 (i.e., scaling is not dependent on N), or scaling may be performed in some other manner.
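The mapping of Table 1 can be expressed directly as a lookup; this sketch assumes the MVRANGE codeword arrives as a bit string (the function name and interface are assumptions for illustration).

```python
# Hedged sketch of Table 1: mapping the MVRANGE code to the multiplier N
# used with the SCALEZONE1_* and ZONE1OFFSET_* values.

def derive_n(mvrange_bits):
    """mvrange_bits: the MVRANGE codeword as a bit string, or None when
    EXTENDED_MV = 0 (default motion vector range)."""
    table = {None: 1, '0': 1, '10': 2, '110': 8, '111': 16}
    return table[mvrange_bits]

assert derive_n(None) == 1    # default motion vector range
assert derive_n('110') == 8
```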
B. Interlaced P-frame
In some implementations, macroblocks in an interlaced P-frame may be one of 5 types: 1MV, 2 field MV, 4 frame MV, 4 field MV, and intra.
In a 1MV macroblock, the displacement of the 4 luma blocks in the macroblock is represented by a single motion vector. A corresponding chroma motion vector can be derived from the luma motion vector to represent the displacement of each of the 2 8x8 chroma blocks. For example, referring again to the macroblock arrangement shown in fig. 23, a 1MV macroblock 2300 includes 4 8x8 luminance blocks and 2 8x8 chrominance blocks. The displacement of the luma blocks (Y1-Y4) is represented by the single motion vector, and a corresponding chroma motion vector can be derived from the luma motion vector to represent the displacement of each of the 2 chroma blocks (U and V).
In a 2 field MV macroblock, the displacement of each field of the 4 luminance blocks in the macroblock is described by a different motion vector. For example, fig. 37 shows that the top field motion vector describes the displacement of the even lines of all 4 luminance blocks, and the bottom field motion vector describes the displacement of the odd lines of all 4 luminance blocks. Using the top field motion vector, the encoder can derive a corresponding top field chroma motion vector that describes the displacement of the even lines of the chroma blocks. Similarly, the encoder can derive a bottom field chroma motion vector that describes the displacement of the odd lines of the chroma blocks.
Referring to fig. 38, in a 4 frame MV macroblock, the displacement of each of the 4 luminance blocks is described by a different motion vector (MV1, MV2, MV3, and MV4). Each chroma block can be motion compensated using 4 derived chroma motion vectors that describe the displacement of four 4x4 chroma sub-blocks. The motion vector for each 4x4 chroma sub-block can be derived from the motion vector of the spatially corresponding luma block.
Referring to fig. 39, in a 4 field MV macroblock, the displacement of each field of the luminance blocks is described by two different motion vectors. The even lines of the luminance blocks are subdivided vertically to form 2 8x8 regions. For the even lines, the displacement of the left region is described by the top left field block motion vector, and the displacement of the right region is described by the top right field block motion vector. The odd lines of the luminance blocks are also subdivided vertically to form 2 8x8 regions. The displacement of the left region is described by the bottom left field block motion vector, and the displacement of the right region is described by the bottom right field block motion vector. Each chroma block can likewise be divided into 4 regions in the same manner as the luma blocks, and each chroma block region can be motion compensated using a derived motion vector.
For intra macroblocks, the motion is assumed to be 0.
In general, the process of calculating a motion vector predictor for a current macroblock in an interlaced P-frame includes two steps. First, three candidate motion vectors of the current macroblock are collected from its neighboring macroblocks. For example, in one implementation, candidate motion vectors are collected based on the arrangement shown in fig. 40A-40B (and various specific cases of leading row macroblocks, etc.). Alternatively, candidate motion vectors may be collected in some other order or arrangement. Second, a motion vector predictor of the current macroblock is computed from the set of candidate motion vectors. For example, the predicted value may be calculated using the median of 3 predicted values, or by other methods.
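The two steps above can be sketched as follows. This is an illustrative sketch with assumed names and a simple tuple interface; the zero motion vector for intra neighbors follows the convention stated earlier for intra neighbors in interlaced P-fields, and the median rule is one of the prediction methods the text mentions.

```python
# Hedged sketch of the two-step predictor computation: gather up to
# three candidate motion vectors from neighbors A, B, C, then take the
# component-wise median. Intra neighbors contribute a zero motion vector.

def median3(a, b, c):
    return sorted((a, b, c))[1]

def predict_mv(candidates):
    """candidates: list of (x, y) motion vectors from neighbors A, B, C;
    None marks an intra-coded neighbor."""
    mvs = [(0, 0) if c is None else c for c in candidates]
    xs = [m[0] for m in mvs]
    ys = [m[1] for m in mvs]
    return (median3(*xs), median3(*ys))

assert predict_mv([(4, 0), None, (-2, 6)]) == (0, 0)
assert predict_mv([(4, 0), (2, 2), (-2, 6)]) == (2, 2)
```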
For additional details regarding the predictor calculation and chroma motion vector derivation for macroblocks of interlaced P-frames, see U.S. provisional patent application No. 60/501,081 entitled "Video Encoding and Decoding Tools and techniques" filed on 9/7/2003 as described in section XIV below.
Bidirectional prediction of progressive video frames
As described above, macroblocks in progressive B-frames can be predicted using 5 different prediction modes: forward, backward, direct, interpolated and intra. The encoder selects and signals different prediction modes in the bitstream at the macroblock level or some other level. In forward mode, macroblocks in the current progressive B-frame are derived from temporally preceding anchors. In the reverse mode, macroblocks in the current progressive B-frame are derived from temporally subsequent anchors. Macroblocks predicted in direct or interpolated mode use forward and backward anchors for prediction. Because there are two reference frames for direct and interpolated modes, there are typically at least two motion vectors (explicitly coded or derived) for each macroblock. (aspects of encoding, signal representation, and decoding for progressive B-frames may also be used for interlaced B-frames, as described below.)
In some implementations, the encoder implicitly derives the motion vectors in direct mode by scaling the motion vector of the co-located macroblock in the subsequent anchor by a fractional value. The fraction can reflect the relative temporal position of the current progressive B-frame within the interval formed by its anchors, but need not reflect the true interframe distances. Thus, the encoder need not assume a constant frame rate. This gives the encoder additional freedom to describe the true motion between the anchors and the current progressive B-frame accurately and easily, by varying the fraction away from the "actual" temporal position so as to improve motion compensated prediction. The syntax element BFRACTION represents the different fractions that can be sent in the bitstream (e.g., at picture level or some other level) to represent the relative temporal position. The different fractions form a limited set of discrete values between 0 and 1.
Referring back to fig. 17, table 1700 is a variable-length code (VLC) table for the bitstream element BFRACTION. There is no restriction requiring BFRACTION to be unique among progressive B-frames between the same two anchors; different progressive B-frames with the same anchors may have the same BFRACTION value. The codes in table 1700 may be changed or rearranged to represent different fractions with different codes. Other possible codes not shown in table 1700 (e.g., 1111110 or 1111111) may be treated as invalid codes, or may be used for other purposes. For example, the entry 1111110 may be used to explicitly encode BFRACTION in a fixed-point format. As another example, the entry 1111111 may be used to signal a particular frame type (e.g., an intra-coded progressive B-frame).
Referring again to fig. 18, the decoder obtains the scaling factor from the pseudo-code 1800. Referring back to fig. 19, the decoder uses the scaling factor to scale the x and y elements of the motion vector of the co-located macroblock in the subsequent reference picture. The function Scale_Direct_MV in the pseudo-code 1900 takes the inputs MV_X and MV_Y and derives two motion vectors for the direct mode, where one motion vector references the forward (previous) anchor picture (MV_XF, MV_YF) while the other motion vector references the backward (subsequent) anchor picture (MV_XB, MV_YB).
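The fraction-based scaling described above can be sketched as follows. This is a simplified illustration, not the exact Scale_Direct_MV pseudo-code of FIG. 19 (which operates with integer scale factors); the function name and the plain rounding are assumptions.

```python
def scale_direct_mv(mv_x, mv_y, numerator, denominator):
    # BFRACTION gives the relative temporal position of the B-frame
    # between its two anchors as numerator/denominator in (0, 1).
    frac = numerator / denominator
    # Forward MV (references the previous anchor): scale by the fraction.
    mv_x_f = round(frac * mv_x)
    mv_y_f = round(frac * mv_y)
    # Backward MV (references the subsequent anchor): scale by (frac - 1).
    mv_x_b = round((frac - 1) * mv_x)
    mv_y_b = round((frac - 1) * mv_y)
    return (mv_x_f, mv_y_f), (mv_x_b, mv_y_b)
```

For a B-frame halfway between its anchors (BFRACTION = 1/2), a co-located motion vector of (8, -4) yields a forward vector of (4, -2) and a backward vector of (-4, 2).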
The "skipped" macroblock signal in a progressive B-frame indicates that no motion vector prediction error has occurred for a given macroblock. The predicted motion vector will be exactly identical to the motion vector used by the encoder/decoder in reconstructing the macroblock (i.e. no motion vector prediction error is applied). The encoder still signals the prediction mode of the macroblock because the macroblock can be skipped using direct, forward, backward or interpolated prediction.
Overview of innovations in predictive encoding/decoding of interlaced B-pictures
The described embodiments include techniques and tools for encoding and decoding interlaced B-pictures (e.g., interlaced B-fields, interlaced B-frames). The various described embodiments implement one or more of the described techniques and tools for encoding and/or decoding bi-directionally predicted interlaced pictures, including but not limited to:
1. for interlaced B-frames, the encoder/decoder switches the prediction mode between the top field and the bottom field in a macroblock of the interlaced B-frame.
2. For interlaced B-frames, the encoder/decoder calculates direct mode motion vectors for a current macroblock by selecting a representative motion vector for each of the top and bottom fields of the co-located macroblock in the previously decoded, temporally subsequent anchor. The selection may be performed based at least in part on the mode (e.g., 1MV mode, 2 field MV mode, etc.) in which the macroblock of the current interlaced B-frame is coded.
3. For interlaced B-fields or interlaced B-frames, the encoder/decoder uses 4MV coding. For example, 4MV can be used for unidirectional prediction modes (forward or backward), but not for other available prediction modes (e.g., direct, interpolated).
4. For interlaced B-fields or interlaced B-frames, the forward motion vectors are predicted using previously reconstructed (or estimated) forward motion vectors from a forward motion vector buffer, and the backward motion vectors are predicted using previously reconstructed (or estimated) backward motion vectors from a backward motion vector buffer. The resulting motion vectors are added to the respective buffers and the holes in the motion vector buffers can be filled with estimated motion vector values.
a. For interlaced B-frames, when forward prediction is used for a motion vector and the motion vector is added to the forward buffer, the corresponding location in the backward buffer is filled with a predicted motion vector computed using only backward motion vectors as predictors ("hole filling"). Similarly, when backward prediction is used for a motion vector and the motion vector is added to the backward buffer, the corresponding location in the forward motion vector buffer is filled with a predicted motion vector computed using only forward motion vectors as predictors.
b. For interlaced B-fields, to select between motion vectors of different polarities (e.g., "same polarity" or "opposite polarity") for the hole fill, the encoder/decoder selects the dominant polarity field motion vector. The distance between the anchor and the current frame is calculated using various syntax elements, and the calculated distance is used to scale the reference field motion vectors.
5. For interlaced B-fields, the encoder/decoder uses "self-reference" frames. For example, the second B-field in the current frame refers to the first B-field from the current frame in motion compensated prediction.
6. For an interlaced B-field, the encoder sends binary information (e.g., at the B-field level of the compressed bitplane) indicating whether the prediction mode for one or more macroblocks in the interlaced B-field is forward or non-forward. The decoder performs the corresponding decoding.
7. For interlaced B-fields, if the corresponding macroblock in the corresponding field of the next anchor picture is encoded using four motion vectors, the encoder/decoder uses logic that favors the dominant polarity to select the motion vector for direct mode.
8. Intra-coded fields: when no good motion compensation is possible for a B field, it can be coded as an intra (i.e., non-predicted) B field ("BI field").
The various described techniques and tools may be combined with each other, with other techniques, or may be used alone.
VI. Switching prediction modes within field coded macroblocks in interlaced B-frames
In some implementations, the encoder performs prediction mode switching within macroblocks of an interlaced B-frame. For example, the encoder allows the prediction mode to be switched from forward to reverse, or vice versa, when going from the top field to the bottom field in a macroblock of an interlaced B-frame. Instead of encoding an entire macroblock with one prediction direction mode, a combination of prediction direction modes is used to encode a single macroblock. The ability to change the prediction direction mode in the respective fields of a macroblock leads in many cases to a more efficient coding of interlaced B-frames.
Fig. 41 illustrates one technique 4100 for predicting motion vectors for respective fields in a field coded macroblock of an interlaced B-frame using different prediction modes. At 4110, in an interlaced B-frame, the encoder/decoder predicts a motion vector of a first field in a field coded macroblock using a first prediction mode. In some implementations, the "first field" may be the top field or the bottom field, the decision of which is signaled independently. At 4120, the encoder/decoder predicts the motion vector of the second field in the same macroblock using a different prediction mode.
For example, for a macroblock encoded using two motion vector fields, the top field may be forward predicted (i.e., the top field motion vector references the previous anchor picture), while the bottom field may be backward predicted (i.e., the bottom field references the subsequent anchor picture). In some implementations, field-coded macroblocks in interlaced B-frames are not coded using 4 motion vectors. Alternatively, if the macroblock is field coded using 4 motion vectors (e.g., two motion vectors per field), the two motion vectors of the top field will reference one anchor (forward or backward) and the motion vectors of the bottom field will reference the other anchor.
This switching of prediction modes requires only one additional bit when the macroblock type is not direct or interpolated, as shown in the following pseudo-code for interlaced B-frames:
If MB is field coded AND MB type is forward or backward then
If MVSwitch == 1 then the prediction mode switches between the top and bottom fields (from forward to backward, or vice versa)
Thus, limiting prediction mode switching to the forward and backward modes avoids the need for more bits to signal the second mode, since the second mode is implied by the first (previously signaled) mode and the switch value.
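The mode resolution described by the pseudo-code above can be sketched as follows; the function name and string-valued modes are illustrative, not part of the bitstream syntax.

```python
def field_prediction_modes(first_mode, mv_switch):
    """Return (first_field_mode, second_field_mode) for a field-coded
    macroblock of an interlaced B-frame with prediction mode switching."""
    # Only forward and backward macroblock types carry the switch bit;
    # direct and interpolated macroblocks never do.
    assert first_mode in ('forward', 'backward')
    if mv_switch:
        # The second mode is the opposite of the first mode.
        second_mode = 'backward' if first_mode == 'forward' else 'forward'
    else:
        second_mode = first_mode
    return first_mode, second_mode
```

No bit beyond the switch flag is needed, since the second mode is fully determined by the first mode and the flag.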
If there is high motion in the area covered by a macroblock of an interlaced B-frame, the macroblock is likely to be coded in field mode. In these cases, forward or backward prediction is more likely to give accurate motion compensation results than the direct or interpolated modes (which involve pixel averaging). The direct and interpolated modes are not the best way to encode such macroblocks, since averaging the individual predictions acts as a smoothing operation (e.g., high-frequency detail associated with high motion is lost). Experimental results have also shown that signaling all four prediction modes as switching options at the field level within a field coded macroblock is inefficient because of the increased signaling overhead.
Alternatively, the encoder may switch more than two prediction modes within a field coded macroblock of an interlaced B-frame, or may switch between different prediction modes.
VII. Computing direct mode motion vectors in interlaced B-frames
In some implementations, the encoder/decoder buffers motion vectors from a previously decoded anchor I-frame or P-frame (a temporally subsequent reference frame used as the backward prediction reference) and selects one or more of the buffered motion vectors to use in calculating a direct mode motion vector for a current macroblock in an interlaced B-frame. For example, the encoder/decoder buffers a representative motion vector for each of the top field and the bottom field of each macroblock of the anchor frame and calculates motion vectors for the current direct mode macroblock using one or more of the buffered motion vectors. The selection is performed based at least in part on the coding mode of the current macroblock (e.g., 1MV mode, 2 field MV mode, etc.).
Fig. 42 illustrates one technique 4200 for calculating direct mode motion vectors for macroblocks in interlaced B-frames in one implementation. At 4210, the encoder/decoder buffers motion vectors for the co-located macroblock in the previously reconstructed, temporally future anchor frame. If the co-located macroblock has only one motion vector, that motion vector is buffered as the motion vector value for the respective blocks of the co-located macroblock, as needed. At 4220, the encoder/decoder selects one or more buffered motion vectors of the co-located macroblock for direct mode prediction of the current macroblock in the interlaced B-frame, depending in part on the number of motion vectors needed for the current macroblock.
In one implementation, the decoder buffers two motion vectors per co-located macroblock, i.e., half of the maximum possible number (four) of decoded luma motion vectors from the future anchor frame. Macroblocks in the anchor frame may be coded in different ways, with up to four motion vectors per macroblock, but at most two motion vectors are buffered, as described below. Also, the number of forward/backward motion vector pairs generated for the current macroblock depends on the coding mode of the current macroblock, not only on the coding mode of the co-located macroblock of the previously decoded future anchor frame.
For example, if the current direct mode macroblock is 1MV coded, the decoder takes the buffered top field motion vector from the co-located macroblock of the anchor frame and generates one pair of direct mode motion vectors: one forward and one backward. If the current direct mode macroblock is field coded, the decoder takes the buffered top and bottom field motion vectors from the co-located macroblock of the anchor frame and generates two pairs of motion vectors, for four motion vectors total for the current direct mode macroblock: one forward and one backward for each field.
Fig. 43 shows motion vectors MV1, MV2, MV3, and MV4 for the respective blocks of a co-located macroblock 4300 of a previously decoded, temporally future anchor frame. If the co-located macroblock is a 1MV macroblock, MV1, MV2, MV3, and MV4 are all equal. If the co-located macroblock is a 2 field MV macroblock, MV1 and MV2 are equal to one value, and MV3 and MV4 are equal to another value. If the co-located macroblock of the anchor frame is a 4 field MV or 4 frame MV macroblock, MV1, MV2, MV3, and MV4 may all have different values. However, even when MV1, MV2, MV3, and MV4 are all available, the decoder buffers only MV1 and MV3.
In the example shown in fig. 43, the decoder buffers MV1 and MV3. If the current macroblock uses 1MV mode, the decoder selects MV1 to calculate the forward and backward direct mode motion vectors of the current macroblock and ignores MV3. If the current macroblock uses 2 field MV mode, the decoder uses MV1 and MV3 to calculate four direct mode motion vectors. This produces a good representation of the motion of the top and bottom fields of the current macroblock.
When a motion vector from a co-located macroblock in the anchor frame has been selected, the decoder applies scaling logic to derive corresponding forward and backward pointing motion vectors for direct mode prediction of B frame macroblocks. For example, the decoder may apply the function Scale _ Direct _ MV in fig. 19. Alternatively, the decoder applies a different scaling function.
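The selection-and-scaling procedure above can be sketched as follows. The simple fractional scaling stands in for the Scale_Direct_MV logic of FIG. 19, and all names are illustrative assumptions.

```python
def direct_mode_mvs(mv1, mv3, current_mode, frac):
    """Derive direct mode MV pairs for the current macroblock from the
    buffered anchor motion vectors MV1 (top field) and MV3 (bottom field,
    see Fig. 43), depending on the current macroblock's coding mode."""
    def scale_pair(mv):
        # Forward MV scaled toward the previous anchor, backward MV
        # toward the subsequent anchor (simplified fractional scaling).
        forward = (round(frac * mv[0]), round(frac * mv[1]))
        backward = (round((frac - 1) * mv[0]), round((frac - 1) * mv[1]))
        return forward, backward

    if current_mode == '1MV':
        # Only MV1 is used; MV3 is ignored.
        return [scale_pair(mv1)]
    if current_mode == '2FieldMV':
        # One forward/backward pair per field.
        return [scale_pair(mv1), scale_pair(mv3)]
    raise ValueError('unsupported mode: ' + current_mode)
```

Note that the number of pairs generated depends on the current macroblock's mode, not on how the anchor macroblock itself was coded.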
Alternatively, the encoder/decoder may buffer four motion vectors for each anchor frame macroblock. For example, if the current macroblock is 1MV coded, the encoder/decoder may take the top-left motion vector of the co-located macroblock in the anchor frame and generate one pair of direct mode motion vectors, or may take the average of the four motion vectors of the anchor frame macroblock. If the current macroblock is field coded, the encoder/decoder may take the top-left and bottom-left motion vectors and generate two pairs (one pair per field), or may take the average of the top motion vectors and the average of the bottom motion vectors of the anchor frame macroblock.
The direct mode motion vector is considered to be (0, 0) when the co-located macroblock in the anchor frame is intra, or when the anchor frame is an I-frame.
VIII. 4MV coding in interlaced B-fields and interlaced B-frames
In some implementations, the encoder encodes interlaced B-fields and interlaced B-frames using a four motion vector (4MV) coding mode. 4MV coding can represent complex motion trajectories more accurately than one motion vector (1MV) coding (e.g., by allowing the four luma blocks of a macroblock to be independently predicted and motion compensated). The use of 4MV may be limited to certain prediction modes. For example, in some implementations, the encoder uses 4MV for the forward and backward modes (including both field and frame variants), but not for the direct or interpolated modes. This differs from progressive coding, where 4MV is not used for progressive B-frames.
The direct and interpolated modes involve averaging the pixel values of motion-compensated predictions, which tends to smooth fine detail. If such smoothing is acceptable, the 1MV mode can be used instead of the 4MV mode, because 1MV is cheaper to encode and can adequately describe a smooth motion trajectory. Experiments have shown that it is advantageous to use the 4MV mode in macroblocks of interlaced B-fields and interlaced B-frames while limiting the 4MV mode to forward- and backward-predicted macroblocks. Another factor favoring the limitation of 4MV to the forward and backward modes is that combining 4MV with the direct or interpolated mode would result in a total of eight motion vectors in each case. The accuracy advantage is generally offset by the signaling overhead (for the interpolated mode) and the implementation and decoding complexity associated with eight motion vectors. Furthermore, it is often not practical to encode an interlaced B-picture with eight motion vectors, because P-pictures, which are typically encoded at higher quality settings (i.e., quantized less strongly), can use only one or four motion vectors per macroblock for motion compensation.
Limiting the 4MV to certain prediction modes also has other advantages. For example, if the 4MV is limited to forward and backward prediction modes only, and if the forward/non-forward mode decision is signaled (e.g., using a bit-plane coding technique such as that described in section XI below), the encoder need not send any additional bits to signal the prediction mode of the 4MV macroblock.
The following pseudo-code may be applied to macroblocks of an interlaced B-field, where the forward/non-forward decision is bit-plane coded and sent before any macroblock level information (e.g., sent at picture level):
If MB is 4MV coded AND the prediction mode is non-forward
Then the prediction mode is backward (no more bits are sent to signal the mode)
In some implementations, the direct/non-direct prediction mode decision is sent before any macroblock level information (e.g., in a compressed bitplane at picture level). (For more information on coding direct/non-direct information, see U.S. patent application serial No. 10/622,378, entitled "Advanced Bi-Directional Predictive Coding of Video Frames," filed on 7/18/2003.) The following pseudo-code may be applied to macroblocks of interlaced B-frames, where 4MV is limited to the forward and backward modes in these implementations:
If MB is 4MV coded AND the prediction mode is non-direct
Then send an additional bit to signal the prediction mode (forward or backward)
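The two pseudo-code fragments above can be combined into a small decoding sketch. The callable bit-reader interface and the bit value 0 mapping to forward are assumptions for illustration, not taken from the text.

```python
def decode_4mv_prediction_mode(picture_type, forward_flag=None,
                               next_bit=None):
    """Resolve the prediction mode of a 4MV macroblock."""
    if picture_type == 'B-field':
        # For interlaced B-fields the forward/non-forward decision comes
        # from the picture-level bitplane, so no extra bits are needed:
        # a non-forward 4MV macroblock must be backward.
        return 'forward' if forward_flag else 'backward'
    if picture_type == 'B-frame':
        # For a non-direct 4MV macroblock in an interlaced B-frame, one
        # additional bit selects forward vs. backward (0 -> forward is
        # an assumed mapping).
        return 'forward' if next_bit() == 0 else 'backward'
    raise ValueError(picture_type)
```

Either way, because 4MV excludes the direct and interpolated modes, at most one extra bit per macroblock is needed to resolve the prediction mode.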
Alternatively, 4MV is used for prediction modes other than or in addition to the forward and backward modes, or is not used for the forward mode, the backward mode, or any prediction mode at all. For example, in some implementations, 4MV is used for interlaced B-fields but not for interlaced B-frames. In other alternative implementations, other codes or code lengths may be used to signal the prediction mode in conjunction with 4MV coding.
IX. Using separate forward and backward motion vector buffers to predict motion vectors in interlaced B-pictures
Motion vectors of interlaced B-pictures are predicted using separate forward and reverse motion vector contexts. In general, a forward motion vector is predicted using a motion vector stored in a forward motion vector buffer, and a backward motion vector is predicted using a motion vector stored in a backward motion vector buffer. The resulting motion vector for the current macroblock is then stored in an appropriate buffer and available for subsequent motion vector predictors for other macroblocks. Typically, the respective spaces in the forward and backward motion vector buffers are filled for each macroblock, even if a given macroblock is predicted with only forward motion vectors (in the case of a forward predicted macroblock) or only backward motion vectors (in the case of a backward predicted macroblock). The following sections describe techniques for predicting motion vectors in interlaced B-pictures (e.g., interlaced B-fields, interlaced B-frames), and for "filling" corresponding spaces in a motion vector buffer for "missing" forward or reverse motion vectors.
A. Forward and reverse buffers
When predicting motion vectors for interlaced B-pictures, the encoder/decoder uses previously reconstructed motion vectors in a forward motion vector buffer and/or a backward motion vector buffer. In forward mode, the encoder/decoder uses reconstructed forward motion vectors from the forward motion vector buffer to predict the current motion vector for forward motion compensation. In backward mode, the encoder/decoder uses reconstructed backward motion vectors from the backward motion vector buffer to predict the current motion vector for backward motion compensation. For direct mode or interpolated mode macroblocks, the encoder/decoder uses the forward motion vector buffer to predict the forward motion vector component(s) and the backward motion vector buffer to predict the backward motion vector component(s).
After reconstructing the motion vectors of an interlaced B-picture, the encoder/decoder buffers the reconstructed forward motion vectors in the forward motion vector buffer and the reconstructed backward motion vectors in the backward motion vector buffer. In forward mode, the encoder/decoder stores the reconstructed forward motion vector in the forward motion vector buffer. In backward mode, the encoder/decoder stores the reconstructed backward motion vector in the backward motion vector buffer. For macroblocks that use the direct or interpolated prediction modes, the encoder/decoder stores the forward motion vector component(s) in the forward motion vector buffer and the backward motion vector component(s) in the backward motion vector buffer.
For example, if the encoder encodes a forward-predicted macroblock at macroblock position (12, 13) in an interlaced B-picture, the encoder calculates a forward motion vector predictor and sends the forward motion vector residual in the bitstream (assuming the macroblock is not "skipped"). The decoder decodes the residual (i.e., the difference) and reconstructs the motion vector. The encoder/decoder inserts the reconstructed motion vector into the forward motion vector buffer. The encoder/decoder then uses motion vector prediction logic to calculate a backward motion vector predictor to fill in the backward motion vector, and places the backward motion vector at location (12, 13) in the backward motion vector buffer. For example, with median-of-three prediction, the encoder/decoder may take the median of the buffered backward motion vectors at locations (11, 13), (12, 12), and (13, 12) (the left, top, and top-right neighbors of the current forward-predicted macroblock) to fill in the backward motion vector at (12, 13).
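The median-of-three hole filling in this example can be sketched as follows, assuming the buffer is a dictionary keyed by macroblock coordinates; border handling for missing neighbors is omitted.

```python
def median3(a, b, c):
    # Component-wise median of three scalar values.
    return sorted((a, b, c))[1]

def fill_backward_hole(backward_buffer, x, y):
    """Fill the backward-buffer slot at (x, y) for a forward-predicted
    macroblock from its left, top, and top-right neighbors."""
    cands = [backward_buffer[(x - 1, y)],      # left neighbor
             backward_buffer[(x, y - 1)],      # top neighbor
             backward_buffer[(x + 1, y - 1)]]  # top-right neighbor
    filled = (median3(*(mv[0] for mv in cands)),
              median3(*(mv[1] for mv in cands)))
    backward_buffer[(x, y)] = filled
    return filled
```

With buffered backward motion vectors (2, 0), (6, 4), and (4, -2) at the three neighbor positions, the slot at (12, 13) is filled with the component-wise median (4, 0).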
Fig. 44 illustrates one technique 4400 for predicting motion vectors for a current macroblock in an interlaced B-picture using forward and/or backward motion vector buffers. At 4410, the encoder/decoder selects the forward or backward motion vector buffer depending on whether the motion vector to be predicted is a forward or backward motion vector. If the current motion vector is a forward motion vector, the encoder/decoder selects a set of candidate motion vector predictors from the forward motion vector buffer at 4420. If the current motion vector is a backward motion vector, the encoder/decoder selects a set of candidate motion vector predictors from the backward motion vector buffer at 4430. At 4440, the encoder/decoder calculates a motion vector predictor based on the set of candidate motion vector predictors. For example, the encoder/decoder computes the median of the set of candidate motion vector predictors. In the simple case, the encoder/decoder calculates the motion vector predictor for a 1MV current macroblock based on predictors from 1MV neighboring macroblocks. More complex variations, in which the current macroblock and/or the neighboring macroblocks have different modes, are described below.
Fig. 45 illustrates motion vectors in the forward motion vector buffer 4510 and the backward motion vector buffer 4520. In the example shown in fig. 45, for reconstructed macroblocks 4530 through 4570, the encoder/decoder stores the forward motion vectors in the forward motion vector buffer 4510 and the backward motion vectors in the backward motion vector buffer 4520. To predict a motion vector of the current macroblock 4580, the encoder/decoder uses candidate predictors from neighboring macroblocks. For example, if the current macroblock 4580 is predicted in forward mode, the encoder predicts the forward motion vector using the neighboring forward motion vectors in the forward motion vector buffer (e.g., using median-of-three prediction), and then fills the current macroblock location in the forward motion vector buffer with the reconstructed motion vector value. To fill the corresponding current macroblock position in the backward motion vector buffer 4520, the encoder/decoder may predict a backward motion vector using the neighboring backward motion vectors in the backward motion vector buffer and place the predicted value at the current macroblock's position in the backward motion vector buffer.
B. Motion vector prediction in interlaced B-frames
In some implementations, the encoder/decoder employs the following scheme to predict motion vectors for macroblocks in an interlaced B-frame (including their different fields), using separate forward and reverse motion vector contexts. Fig. 40A-40B illustrate neighboring macroblocks from which candidate motion vectors are collected.
If a 1MV macroblock is forward predicted, the encoder/decoder predicts its forward motion vector from candidate motion vectors in the forward motion vector buffer (e.g., using median-of-three prediction with candidate positions such as those shown in FIGS. 40A and 40B). The encoder/decoder stores the forward motion vector (after adding the motion vector prediction error) in the forward motion vector buffer. The encoder/decoder fills the "hole" by predicting a backward motion vector from the candidate motion vectors of the backward motion vector buffer and stores that backward motion vector (here, the predictor itself) in the backward motion vector buffer.
If the 1MV macroblock is backward predicted, the encoder/decoder predicts its backward motion vector from the candidate motion vectors of the backward motion vector buffer (e.g., as in the forward prediction case). The encoder/decoder stores the backward motion vector (after adding the prediction error) in a backward motion vector buffer. The encoder/decoder fills in the "hole" by predicting a forward motion vector from the candidate motion vectors of the forward motion vector buffer and stores the forward motion vector (here, the predictor) in the forward motion vector buffer.
Intra-coded neighboring macroblocks are omitted as candidates in both the forward and backward motion vector buffers.
Various special cases address combinations of 1MV and field coded 2MV macroblocks in interlaced B-frames. If the neighboring macroblock at position A, B, or C of a current 1MV macroblock is a field coded 2MV macroblock, the encoder/decoder takes the average of the field motion vectors of the 2MV macroblock as the motion vector predictor candidate for that position.
For a forward-predicted current 2 field MV macroblock, candidate motion vectors from neighbors are collected from the forward motion vector buffer for each of the two forward-predicted field motion vectors. The encoder/decoder selects a set of candidate motion vectors based on the coding modes (e.g., intra, 1MV, 2 field MV) of the neighboring macroblocks represented in the forward motion vector buffer. If a neighboring macroblock exists and is not intra-coded, the encoder/decoder considers the motion vector(s) of that macroblock for addition to the candidate set. In some embodiments, the encoder/decoder proceeds as follows. For the top field forward motion vector, if the neighboring macroblock at position A, B, or C is a 1MV macroblock, the encoder/decoder adds the motion vector from the corresponding position of the forward motion vector buffer to the candidate set. For a neighboring macroblock at position A, B, or C that is a 2 field MV macroblock, the encoder/decoder adds the top field motion vector from the corresponding position of the forward motion vector buffer to the set.
For the bottom field forward motion vector, if the neighboring macroblock at position A, B, or C is a 1MV macroblock, the encoder/decoder adds the motion vector from the corresponding position of the forward motion vector buffer to the candidate set. For a neighboring macroblock at position A, B, or C that is a 2 field MV macroblock, the encoder/decoder adds the bottom field motion vector from the corresponding position of the forward motion vector buffer to the set.
To calculate the predictor of the field motion vector for the 2 field MV macroblock, the encoder/decoder then calculates the median of the candidate set.
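Candidate collection for one field motion vector, as described above, might look like the following sketch; the data layout (a map from neighbor position letter to buffer coordinates and coding mode) is an illustrative assumption.

```python
def field_mv_candidates(field, neighbors, forward_buffer):
    """Collect candidate predictors for the 'top' or 'bottom' field MV
    of a forward-predicted 2 field MV macroblock.

    neighbors: position letter -> (buffer_coords, coding_mode).
    forward_buffer: coords -> single MV (1MV macroblock) or a
    (top_mv, bottom_mv) pair (2 field MV macroblock).
    """
    candidates = []
    for pos in ('A', 'B', 'C'):
        if pos not in neighbors:
            continue                      # neighbor does not exist
        coords, mode = neighbors[pos]
        if mode == 'intra':
            continue                      # intra neighbors are omitted
        entry = forward_buffer[coords]
        if mode == '1MV':
            candidates.append(entry)
        elif mode == '2FieldMV':
            # Take the field MV matching the field being predicted.
            candidates.append(entry[0] if field == 'top' else entry[1])
    return candidates
```

The median of the returned candidate set then serves as the field motion vector predictor; for backward prediction the same routine would draw from the backward motion vector buffer instead.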
To calculate the backward predicted motion vector for a 2 field MV macroblock, the logic is the same as for the forward prediction case, but the candidate motion vectors from the neighbors are collected from the backward motion vector buffer.
Again, for motion vector prediction, the intra-coded neighbors in position A, B or C are omitted.
After reconstructing the motion vectors of the 2-field MV macroblock (e.g., by adding motion vector difference information), the reconstructed actual motion vectors are placed into either a forward motion vector buffer or a backward motion vector buffer in a prediction direction that is appropriate for the reconstructed motion vectors. The corresponding empty slot of the motion vector buffer for the missing direction is filled by calculating a motion vector predictor for the missing direction and storing the motion vector predictor in the empty slot.
A special case arises in hole filling for field coded macroblocks of interlaced B-frames if prediction mode switching is used (see section VI above). In this case, a given field coded 2MV macroblock has one forward motion vector and one backward motion vector. After reconstructing such a macroblock, when the field coded macroblock switches prediction direction between the top and bottom fields, the encoder/decoder fills both the top and bottom field motion vector "slots" of the forward motion vector buffer with the forward motion vector, and fills the top and bottom field motion vector slots of the backward buffer with the backward motion vector. Although the forward motion vector is sent for only one field (e.g., the top field), the encoder places the same motion vector into both the top and bottom field motion vector slots of the forward motion vector buffer. Similarly, although the backward motion vector is sent for only the bottom field, the encoder places it in the top and bottom field slots of the backward motion vector buffer.
For example, fig. 46 shows the top and bottom field motion vectors of reconstructed macroblock 4680 in the forward motion vector buffer 4610 and the backward motion vector buffer 4620. In the example shown in fig. 46, for reconstructed macroblocks 4630 through 4670, the encoder/decoder stores forward motion vectors in the forward motion vector buffer 4610 and backward motion vectors in the backward buffer 4620. The reconstructed macroblock 4680 is field coded with prediction switching, and its top field motion vector is stored at both the top and bottom positions of the forward or backward motion vector buffer (depending on the prediction direction of the top field motion vector). The bottom field motion vector of macroblock 4680 is stored at the top and bottom positions of the other motion vector buffer. Although the forward and backward motion vectors are each sent for only one field, the encoder places each motion vector into both the top and bottom field motion vector slots of the corresponding forward or backward motion vector buffer.
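The slot-duplication behavior of FIG. 46 can be sketched as follows, assuming each buffer slot holds a (top, bottom) pair of field motion vectors; the layout and names are illustrative.

```python
def store_switched_field_mvs(forward_buf, backward_buf, pos,
                             top_mv, top_dir, bottom_mv):
    """Store the motion vectors of a field-coded macroblock that
    switches prediction direction between its two fields."""
    # The single MV sent for each direction is duplicated into both
    # field slots of its buffer, so no hole remains in either buffer.
    if top_dir == 'forward':
        forward_buf[pos] = (top_mv, top_mv)
        backward_buf[pos] = (bottom_mv, bottom_mv)
    else:  # top field is backward predicted, bottom field is forward
        backward_buf[pos] = (top_mv, top_mv)
        forward_buf[pos] = (bottom_mv, bottom_mv)
```

After this step, both buffers hold a value at the macroblock's position, so later neighbors can draw candidates from either direction.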
If the current macroblock is interpolated, the encoder/decoder predicts the forward motion vector (or forward motion vectors for a 2 field MV macroblock) using the forward motion vector buffer, predicts the backward motion vector (or backward motion vectors for a 2 field MV macroblock) using the backward motion vector buffer, and stores the forward and backward motion vectors in the forward and backward motion vector buffers, respectively (after adding the prediction errors).
If the macroblock is directly predicted in an interlaced B-frame, the encoder/decoder may use the techniques described in section VII above.
In some implementations, 1MV macroblocks, 2 field MV macroblocks, and intra macroblocks are allowed for interlaced B-frames (but not other MV macroblock types), which simplifies the logic for predicting motion vectors because fewer current/neighbor mode combinations need to be handled. Alternatively, other and/or additional MV modes are allowed, such as 4 frame MV macroblocks and 4 field MV macroblocks. For example, portions of the pseudo-code shown in FIGS. 64, 69, and 70 may be used to handle such other combinations for interlaced B-frames.
C. Motion vector prediction for interlaced B-fields
In general, for interlaced B-fields, previously reconstructed (or derived) forward field motion vectors are used as predictors for the current forward field motion vector, and previously reconstructed (or derived) backward field motion vectors are used as predictors for the current backward field motion vector. In forward or backward mode, the motion vector for the current forward or backward field is added to the appropriate motion vector buffer, and a motion vector for the other (missing) direction (e.g., the backward direction in forward mode, or the forward direction in backward mode) is derived and buffered for use as a predictor in the future.
In some implementations, field motion vector predictor selection is made according to the two-reference-field motion vector prediction logic described in detail in section III.A.2 above and in section XIV.B.3 below. For example, the pseudo-code shown in FIGS. 33A-33F is used to calculate two forward motion vector predictors (one per reference field polarity) for a macroblock of an interlaced B-field, and one motion vector predictor is selected for reconstructing the forward motion vector. The reconstructed motion vector is then placed in the forward motion vector buffer. The pseudo-code is also used to calculate two backward motion vector predictors for the macroblock, and one predictor is selected for use as a fill value for the backward motion vector buffer. For interlaced B-fields, the encoder/decoder selects between motion vector predictors of the same and opposite polarity in order to fill the "holes" in the motion vector buffer for the missing direction. This selection between polarities is needed because two predictors are generated for a given missing direction: one with the same polarity as the current field and one with the opposite polarity. Thus, in some implementations, the encoder/decoder selects a primary or "dominant" polarity predictor for the missing-direction motion vector. In this way, a complete set of forward and backward motion vectors is made available for motion vector prediction. Alternatively, the dominant polarity determination and predictor selection are performed first, and only the selected motion vector predictor is calculated.
Pseudo-code 4700 in fig. 47 illustrates, for one implementation, this process of buffering actual values and filling holes by selecting among field motion vector predictors of different polarities. Pseudo-code 4700 shows that during hole-filling prediction, no actual motion vector is available for the missing direction, so the encoder/decoder selects the predicted missing-direction motion vector with the dominant polarity.
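The dominant-polarity choice for a missing-direction predictor can be sketched as follows. This is a hypothetical illustration: the counting rule shown (majority polarity, with ties favoring the same polarity) is one plausible reading of the selection described above, not a normative definition, and all names are invented.

```python
def select_dominant_predictor(same_pred, opp_pred, num_same_refs, num_opp_refs):
    """Between the same-polarity and opposite-polarity predicted motion
    vectors for the missing direction, keep the one whose polarity is
    dominant (here: used by more neighbors; ties favor same polarity)."""
    return same_pred if num_same_refs >= num_opp_refs else opp_pred

# The selected predictor fills the empty slot of the missing-direction buffer.
filled = select_dominant_predictor((2, 0), (5, -3), num_same_refs=2, num_opp_refs=1)
```

The filled value then behaves like an actual motion vector for prediction purposes in later macroblocks.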
In some implementations, the entire scheme for motion vector prediction for interlaced B-fields is as follows.
If the macroblock is forward predicted, the encoder/decoder predicts its forward motion vector from candidate same- and/or opposite-polarity motion vectors in the forward motion vector buffer (e.g., using median-of-three prediction from the left, top, and top-right neighbors in most cases) or from motion vectors derived from the buffered motion vectors. The encoder/decoder stores the reconstructed forward motion vector in the forward motion vector buffer, calculates a dominant backward motion vector predictor (similarly using median-of-three prediction from the spatial neighbors in the backward motion vector buffer), and stores it in the corresponding location in the backward motion vector buffer.
If the macroblock is backward predicted, the encoder/decoder predicts its backward motion vector from candidate same- and/or opposite-polarity motion vectors in the backward motion vector buffer (e.g., using median-of-three prediction from the left, top, and top-right neighbors in most cases) or from motion vectors derived from the buffered motion vectors. The encoder/decoder stores the reconstructed backward motion vector in the backward motion vector buffer, calculates a dominant forward motion vector predictor (similarly using median-of-three prediction from the spatial neighbors in the forward motion vector buffer), and stores it in the corresponding location in the forward motion vector buffer.
If the macroblock is interpolated, the encoder/decoder predicts the forward motion vector component using the forward motion vector buffer, predicts the backward motion vector component using the backward motion vector buffer, and stores the reconstructed forward and backward motion vectors in the forward and backward motion vector buffers, respectively (after adding the motion vector prediction error, if any, to the predictor).
If the macroblock is directly predicted, the encoder/decoder calculates a direct mode motion vector for the current field and stores the forward and backward motion vector components in the corresponding motion vector buffers.
In motion vector prediction, intra-coded neighbors at positions A, B, or C are omitted.
Various special cases apply to the combinations of 1MV and 4MV macroblocks in an interlaced B-field. Figs. 6A-10 illustrate the predictor patterns used in motion vector prediction for progressive P-frames. These same patterns show the locations of the blocks or macroblocks considered to provide candidate motion vectors for motion vector prediction of 1MV or 4MV macroblocks in a mixed-MV interlaced B-field. For the special case where the frame is one macroblock wide, the predictor is always Predictor A (the top predictor). Various other rules address other special cases, such as top-row 4MV macroblocks, top-row 1MV macroblocks, and intra-coded predictors.
The predictor patterns shown in figs. 6A-10 are used for forward prediction with candidates from locations in the forward motion vector buffer, and for backward prediction with candidates from locations in the backward motion vector buffer. In addition, the predictor patterns shown in FIGS. 6A-10 are used in conjunction with the two-reference-field motion vector prediction logic described above for interlaced B-fields.
Figs. 6A and 6B show the locations of the blocks considered to provide candidate motion vector predictors for a 1MV current macroblock in a mixed-MV interlaced B-field. Each of the neighboring macroblocks may be a 1MV or 4MV macroblock. Figs. 6A and 6B show the locations of the candidate motion vectors assuming the neighbors are 4MV (i.e., predictor A is the motion vector of block 2 in the macroblock above the current macroblock, and predictor C is the motion vector of block 1 in the macroblock immediately to the left of the current macroblock). If any of the neighbors is a 1MV macroblock, the motion vector predictor shown in FIGS. 5A and 5B is taken to be the motion vector of the entire neighboring macroblock. As shown in fig. 6B, if the macroblock is the last macroblock in the row, predictor B comes from block 3 of the top-left macroblock rather than from block 2 of the top-right macroblock as it otherwise would.
Fig. 7A-10 show the positions of blocks of candidate motion vector predictors for each of the 4 luminance blocks in a 4MV macroblock considered as an interlaced B-field for mixed MVs. Fig. 7A and 7B are diagrams illustrating the positions of blocks that are considered as candidate motion vector predictors for a block at position 0; fig. 8A and 8B are diagrams illustrating the positions of blocks that are considered as candidate motion vector predictors for a block at position 1; fig. 9 is a diagram showing the positions of blocks that are regarded as candidate motion vector predictors for a block at position 2; and fig. 10 is a diagram showing the positions of blocks that are regarded as candidate motion vector predictors for a block at position 3. Again, if the neighbor is a 1MV macroblock, the motion vector predictor of that macroblock is used for each block of that macroblock.
For the case where the macroblock is the first macroblock in a row, predictor B of block 0 is processed differently from block 0 of the remaining macroblocks in the row (see fig. 7A and 7B). In this case, predictor B is taken from block 3 of the macroblock immediately above the current macroblock, and not from block 3 of the macroblock above and to the left of the current macroblock as in the other cases. Similarly, for the case where the macroblock is the last macroblock in a row, predictor B of block 1 is processed differently (see fig. 8A and 8B). In this case, the predictor is taken from block 2 of the macroblock immediately above the current macroblock, and not from block 2 of the macroblock to the top right of the current macroblock as in the other cases. In general, if the macroblock is in the first macroblock column, predictor C for blocks 0 and 2 is set equal to 0.
Again, for motion vector prediction, the intra-coded neighbors in position A, B or C are omitted.
After reconstructing the motion vectors of the 4MV macroblock (e.g., by adding motion vector difference information), the reconstructed actual motion vectors are placed into either a forward motion vector buffer or a backward motion vector buffer in a prediction direction that is appropriate for the reconstructed motion vectors. The corresponding empty slot of the motion vector buffer for the missing direction is filled by calculating motion vector predictors of the same and opposite polarities for the missing direction, selecting between motion vector predictors of different polarities, and storing the motion vector predictors in the empty slot.
Referring back to figs. 34A and 34B, for motion vector prediction, the encoder/decoder derives one field motion vector predictor from the other field motion vector predictor using the scaling operation shown in pseudo-code 3400. Two possible sets of values are shown: the case where the current field is the first field in the interlaced video frame is shown in table 3500 of fig. 35, and the case where the current field is the second field in the interlaced video frame is shown in table 3600 of fig. 36. In tables 3500 and 3600, SCALEOPP, SCALESAME1, SCALESAME2, SCALEZONE1_X, SCALEZONE1_Y, ZONE1OFFSET_X, and ZONE1OFFSET_Y depend on the reference frame distance.
In some implementations, fractional coding is used to calculate the reference frame distances for the forward and backward references of interlaced B-fields. The BFRACTION syntax element (which is used for macroblocks of interlaced B-fields in forward or backward prediction mode, not just for direct mode macroblocks as in interlaced B-frames) is used to derive the forward and backward reference picture distances, as shown in the following pseudo-code:
Forward Reference Frame Distance (FRFD) =
    NINT((BFRACTION numerator / BFRACTION denominator) * Reference Frame Distance) - 1
If (FRFD < 0) then FRFD = 0
Backward Reference Frame Distance (BRFD) =
    Reference Frame Distance - FRFD - 1
(where NINT is the nearest integer operator)
The BFRACTION numerator and denominator are decoded from the BFRACTION syntax element. The BFRACTION element can represent different fractions that may be sent in the bitstream (e.g., at the field level for interlaced B-fields). The fraction takes on a limited set of discrete values between 0 and 1 and denotes the relative temporal position of the B-picture within the interval formed by its anchors.
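The pseudo-code above amounts to simple fraction arithmetic. The following Python sketch (with hypothetical helper names; NINT implemented as round-half-up, which is adequate for the non-negative values arising here) illustrates the derivation:

```python
from fractions import Fraction

def reference_field_distances(bfraction_num, bfraction_den, ref_frame_dist):
    """Derive (FRFD, BRFD) from BFRACTION and the reference frame distance,
    following the pseudo-code above."""
    scaled = Fraction(bfraction_num, bfraction_den) * ref_frame_dist
    nint = int(scaled + Fraction(1, 2))  # nearest integer, .5 rounds up
    frfd = max(nint - 1, 0)
    brfd = ref_frame_dist - frfd - 1
    return frfd, brfd

# BFRACTION = 1/2 with reference frame distance 2 gives FRFD = 0, BRFD = 1.
```

Note that FRFD + BRFD + 1 equals the reference frame distance, so the two distances partition the anchor interval around the B-picture.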
For forward prediction, and for backward prediction of the second field of a frame with interlaced B-fields, the encoder/decoder performs motion vector scaling according to pseudo-code 3400 in figs. 34A and 34B. For backward motion vector prediction of the first field, however, in some implementations the encoder/decoder uses the functions scaleforopposite_x, scaleforopposite_y, scaleforsame_x, and scaleforsame_y as defined in pseudo-code 4800 shown in fig. 48. Values of SCALESAME, SCALEOPP1, SCALEOPP2, SCALEZONE1_X, SCALEZONE1_Y, ZONE1OFFSET_X, and ZONE1OFFSET_Y for the first interlaced B-field in one implementation are shown in table 4900 of FIG. 49. In table 4900, the relationship between the variable N and the motion vector range is the same as described above with reference to figs. 35 and 36 and table 1.
Alternatively, the reference frame distance is calculated in another way, or the scaling is performed according to a different algorithm. For example, scaling is performed independently of the value of N (i.e., N is taken to be 1).
X. Self-referencing frames with interlaced B-fields
A frame with interlaced B-fields is encoded as two independent (and to some extent independently encoded) fields. The top field consists of the even raster lines of the frame (starting from line 0), while the bottom field consists of the odd raster lines of the frame. Because the fields in a "field picture" are independently decodable, they do not need to be sent in any pre-set order. For example, the encoder may send the lower half frame first and then the upper half frame, or vice versa. In some implementations, the order of the two fields is represented by a "top field first" syntax element that is true or false depending on the exact temporal order in which the two fields of the frame are decoded.
Existing encoders and decoders use preceding and following anchor frames (e.g., I- or P-frames), or fields within those anchor frames, as "reference" pictures when performing motion compensation for a current B-picture. Existing encoders and decoders also prohibit B-pictures, or any portion of them, from being used as motion compensation references for any other picture. In some implementations of the techniques and tools described herein, however, one or more of these "rules" are relaxed.
For example, in some implementations, the first interlaced B-field references the first and second fields of the preceding and following anchor pictures. The second interlaced B-field references the first interlaced B-field of the current frame as its "opposite polarity" reference field and the same-polarity field of the immediately preceding anchor frame as its "same polarity" reference field, as well as the first and second fields of the next anchor picture.
FIG. 50B is a diagram showing reference fields for each of two interlaced B-fields in an interlaced video frame B2. In the example shown in FIG. 50B, the first B-field to be decoded (here, the top field) is allowed to reference two reference fields in the forward (temporally past) anchor P1 and two reference fields from the reverse (temporally future) anchor P3 for a total of 4 reference fields. The second interlaced B-field of B2 to be decoded is allowed to reference the first field from the same interlaced video frame (thus breaking the convention of not allowing parts of the B-picture to be used as reference) and one reference field from the previous anchor P1, as well as two fields from the future anchor P3. For comparison, FIG. 50A illustrates the convention followed for interlaced P-fields of an interlaced video frame.
Techniques and tools implementing these interlaced B-field reference rules can provide better compression. Field coding of interlaced video is most effective for high motion content, i.e., when there is considerable motion between the top and bottom fields. In that case, for example, the top (and first-coded) field of a frame will be a much better predictor for pixels in the bottom field of the same frame than a top field taken from a previous frame at a greater temporal distance. Temporally more distant fields provide much weaker prediction when motion is high, because of the larger temporal distance they span. Furthermore, the likelihood of occlusion grows for more temporally distant predictors, which results in more macroblocks being intra coded, and intra-coded macroblocks are expensive to encode. In particular, experiments have shown that allowing the temporally second interlaced B-field of a frame to reference the temporally first interlaced B-field of the same frame can yield significant compression gains.
XI. Bit-plane coding of forward mode in interlaced B-fields
As described in section X above, in some implementations the second coded interlaced B-field of a current frame can reference the first coded interlaced B-field of the same frame. This "self-referencing" technique is effective for frames with high motion, because the temporally closer B-field of the current frame is often a better predictor than the temporally farther anchor fields. When a frame with interlaced B-fields has high motion and the temporally second interlaced B-field favors the temporally first interlaced B-field as a prediction reference, the most efficient prediction mode for macroblocks in the second interlaced B-field will often be the "forward" mode.
Since forward mode prediction in interlaced B-fields is an effective tool for reducing the bit rate, especially in low bit rate scenarios, it is advantageous to reduce the signaling overhead to reduce the overall cost of signaling forward mode prediction. Thus, in some embodiments, the encoder uses a unified bit-plane coding technique to encode the forward mode prediction information. For example, the encoder encodes forward mode prediction information in a compressed bit plane, where each bit in the bit plane is associated with a macroblock and the value of each bit signals whether the macroblock is encoded in forward mode or in non-forward prediction mode.
The compressed bitplanes may be sent at the frame level, half-frame level, or at some other level. Compared to other prediction modes for interlaced B-fields, bit-plane coding techniques favor the use of the forward mode. For example, if most macroblocks on an interlaced B-field use forward prediction, the encoder can reduce the signaling overhead to less than one bit per macroblock by bit-plane coding forward/non-forward decisions.
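As a rough illustration of the per-macroblock signaling cost, the sketch below packs forward/non-forward decisions at one bit per macroblock (the uncompressed "raw" case); the compressed bit-plane modes discussed above would typically do better than this when the decisions are highly skewed toward forward mode. All names here are hypothetical, not from any actual codec implementation.

```python
def pack_forward_flags(flags):
    """Raw-mode sketch: one bit per macroblock, 1 = forward mode,
    0 = non-forward mode, packed most-significant bit first."""
    out = bytearray((len(flags) + 7) // 8)
    for i, f in enumerate(flags):
        if f:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)

# Eight macroblocks, first four forward: packs into the single byte 0xF0.
packed = pack_forward_flags([1, 1, 1, 1, 0, 0, 0, 0])
```

The raw case establishes the one-bit-per-macroblock baseline that a compressed bit-plane mode must beat to be worth selecting.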
Fig. 51 illustrates a technique 5100 for encoding forward/non-forward prediction mode decision information for macroblocks of an interlaced B-field in a video encoder having one or more bit-plane coding modes. Fig. 52 illustrates a corresponding technique 5200 for decoding forward/non-forward prediction mode decision information encoded by a video encoder having one or more bit-plane coding modes.
Referring to fig. 51, the encoder selects a bit-plane coding mode for coding forward/non-forward prediction mode decision information (5110). After selecting the encoding mode, the encoder encodes the forward/non-forward prediction mode decision information with the selected mode (5120). The encoder selects a bit-plane coding mode on a field-by-field basis. Alternatively, the encoder selects the bit-plane encoding mode on some other basis (e.g., at the sequence level). Alternatively, if only one bit-plane coding mode is used, no selection of the bit-plane coding mode is made. When the encoder finishes encoding the forward/non-forward prediction mode decision information (5130), the encoding of the forward/non-forward prediction mode decision information ends.
Referring to fig. 52, the decoder determines a bit-plane coding mode of the coded forward/non-forward prediction mode decision information used (and signaled) by the encoder (5210). The decoder then decodes the forward/non-forward prediction mode decision information with the selected mode. The decoder determines the bit-plane coding mode on a field-by-field basis. Alternatively, the decoder determines the bit-plane coding mode on some other basis (e.g., at the sequence level). Alternatively, if only one bit-plane coding mode is available, no selection of the bit-plane coding mode is made. When the decoder finishes decoding the forward/non-forward prediction mode decision information (5230), the decoding of the forward/non-forward prediction mode decision information ends.
For additional details regarding signaling and decoding various bit-plane coding modes according to several combinatorial implementations, see section XIV below. For more details regarding general bit-plane Coding, see U.S. patent application serial No. 10/321,415 entitled "Skip Macroblock Coding" and filed 12/16/2002, the disclosure of which is incorporated herein by reference. Alternatively, the bits representing the forward/non-forward mode information may be sent uncompressed and/or at some other level (e.g., macroblock level).
If non-forward prediction is indicated, the encoder specifies a non-forward prediction mode (e.g., a reverse mode, a direct mode, an interpolation mode, or an intra mode) for the macroblock. In some implementations, the encoder encodes the non-forward prediction mode at the macroblock level with reference to the VLC table, as shown in table 2 below.
BMVTYPE VLC    Motion prediction mode
0              Backward
10             Direct
11             Interpolated
Table 2: Motion prediction mode VLC table
In the example shown in table 2, the backward mode is the preferred non-forward prediction mode. The encoder uses a 1-bit signal to represent the inverse mode and a 2-bit signal to represent the direct and interpolated modes. Alternatively, the encoder uses different codes to indicate different prediction modes and/or to prefer a different non-forward prediction mode.
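A decoder for the Table 2 code can be sketched in a few lines. This is a hypothetical helper (not from any described implementation); bits are consumed most-significant first from an iterable of 0/1 values.

```python
def decode_bmvtype(bits):
    """Decode the Table 2 VLC: '0' -> backward, '10' -> direct,
    '11' -> interpolated."""
    it = iter(bits)
    if next(it) == 0:
        return "backward"
    return "direct" if next(it) == 0 else "interpolated"
```

Because the single-bit code is assigned to backward mode, a field dominated by backward-predicted macroblocks pays only one bit per mode decision.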
In some implementations, intra mode is signaled by a particular motion vector differential value; that motion vector differential is used to infer that the macroblock is intra coded. By convention, the encoder sets the prediction type to backward in that case, so that the prediction type is never undefined.
XII. Selecting co-located motion vectors for direct mode in interlaced B-fields
In some implementations, the motion vectors used for direct mode macroblocks in a field-coded B-picture are selected using special logic. For a current macroblock in an interlaced B-field, if the co-located macroblock of the corresponding field of the next anchor picture was encoded with 4 motion vectors, the logic is biased toward the dominant polarity (e.g., same or opposite) among the up to four motion vectors of the co-located macroblock. Once the motion vector for the current macroblock is selected, the encoder/decoder applies a scaling operation to produce the direct mode motion vectors.
In some implementations, for direct mode 1MV macroblocks of an interlaced B-field, the encoder/decoder computes motion vectors for direct mode scaling based on one or more motion vectors of co-located macroblocks in a reference field (e.g., the temporally next P-field) having the same polarity. If the co-located macroblock in the reference field is a 1MV macroblock, the encoder/decoder uses a single motion vector to derive the direct mode motion vector for the macroblock in the interlaced B-field. On the other hand, if the co-located macroblock in the reference field is a 4MV macroblock, the encoder/decoder considers the polarity (biased toward dominant polarity) of the 4 motion vectors in selecting the motion vectors used to derive the direct mode motion vectors for the macroblocks in the interlaced B-field. The encoder/decoder can apply this selection logic to the 4MV macroblocks in the reference field, if desired, during decoding of the interlaced B-field. Alternatively, the encoder/decoder may apply the selection logic after decoding the reference field and then buffer only the values to be used in the subsequent interlaced B-field decoding.
For example, for a co-located 4MV macroblock in the reference field, if the number of motion vectors (out of 4) from the same-polarity field exceeds the number from the opposite-polarity field, the encoder/decoder computes the motion vector used in direct mode interlaced B-field decoding as the median of four, the median of three, the median (average) of two, or the single value of the same-polarity field motion vectors, according to whether the number of same-polarity motion vectors is 4, 3, 2, or 1, respectively. Otherwise, if the number of motion vectors from the opposite-polarity field exceeds the number from the same-polarity field, the encoder/decoder uses a similar operation to derive a representative motion vector from the opposite-polarity field motion vectors. If more than two of the original set of four motion vectors of the co-located macroblock (regardless of polarity) are intra coded, the encoder/decoder can simply treat the co-located representative motion vector as intra (i.e., (0, 0)). In some implementations, however, all intra macroblocks in interlaced B-fields are coded as 1MV, so the case where more than two of the original 4 motion vectors are intra coded cannot in practice arise.
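The selection of a representative motion vector from a co-located 4MV macroblock can be sketched as below. This is an illustrative reading of the logic just described, with hypothetical names; the component-wise median of an even-sized set is taken as the integer average of the two middle values, so the two-vector case reduces to averaging.

```python
def representative_mv(mvs):
    """mvs: list of (polarity, (x, y)) pairs for up to four luma blocks of
    the co-located macroblock, polarity being 'same' or 'opp'.  Returns the
    component-wise median of the dominant-polarity vectors (ties favor
    'same')."""
    same = [v for p, v in mvs if p == "same"]
    opp = [v for p, v in mvs if p == "opp"]
    chosen = same if len(same) >= len(opp) else opp

    def median(vals):
        s = sorted(vals)
        n = len(s)
        return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) // 2

    return (median([x for x, _ in chosen]), median([y for _, y in chosen]))

# Three same-polarity vectors dominate one opposite-polarity vector here.
rep = representative_mv([("same", (1, 2)), ("same", (3, 4)),
                         ("same", (5, 6)), ("opp", (9, 9))])
```

The representative vector would then feed the scaling step that produces the forward and backward direct mode motion vectors.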
Pseudo code 5300 in fig. 53 illustrates the selection process for a motion vector that serves as a basis for a direct mode motion vector in an interlaced B-field. In some implementations, the selection process is a precursor to a scaling operation that produces forward and backward direction-indicative direct mode motion vectors.
XIII. Intra-coded B-fields in interlaced video frames
An interlaced BI-field (or "intra B-field") is a field coded independently of any reference picture. An interlaced BI-field differs from other intra fields (e.g., interlaced I-fields) in that it cannot be used as an anchor for predicting other pictures. There are no inter-picture dependencies on an interlaced BI-field, and its presence in the bitstream does not signal the start of an independently decodable segment or group of pictures. However, the first field of an interlaced video frame, if coded as a BI-field, can be used to predict the second field of the frame, which can be coded as an interlaced B-field. This also improves overall compression in many cases by allowing intra coding of only half of the frame (the first coded field), without coding the entire frame as an intra frame or coding both fields as intra fields. In some implementations, a frame can include two B-fields, two BI-fields, or one B-field and one BI-field (in either order).
There are reasons to use interlaced BI-fields instead of interlaced I-fields. One reason is to avoid sacrificing temporal scalability. For example, when a decoder playing back digital video needs to drop some pictures immediately to keep up with processing demands, it looks for fields in the sequence that it can drop. If the intra fields in the sequence are key fields, the decoder is forced to decode them for use as references for other fields and cannot discard them. If, however, the intra fields in the sequence are coded as BI-fields, the decoder retains the option of discarding them without compromising subsequent motion compensation.
Interlaced BI-fields differ from interlaced B-fields containing only intra macroblocks in that BI-fields signal the syntax elements for intra coding and decoding more efficiently, since the motion compensation-related elements (or the elements signaling their absence) can be eliminated within a BI-field. In other words, the reason to use an interlaced BI-field (rather than a regular B-field) arises when an interlaced B-field would be coded at a break in inter-field prediction in a video sequence (e.g., because of a scene change or complex motion). Most macroblocks in such a field would need to be coded as intra macroblocks, and in that case it is often cheaper, in terms of bit rate, to code the entire field as a BI-field than to transmit prediction mode information for each macroblock in the field. When good prediction or motion compensation is not possible for an interlaced B-field, it can be coded as a BI-field.
In some implementations, the encoder can signal the occurrence of a BI-field in the bitstream as one of the possible values of the picture type. Alternatively, the occurrence of the BI-field may be indicated in some other way.
XIV. Combined implementations
A detailed combined implementation of bitstream syntax, semantics and decoder will now be described, as well as another combined implementation with slight differences from the main combined implementation.
A. Bitstream syntax
In various combined implementations, data for interlaced B-pictures is presented in a bitstream having multiple layers (e.g., sequence, frame, field, macroblock, block, and/or sub-block layers).
For interlaced video frames with interlaced B-fields and/or BI-fields, the frame-level bitstream elements are as shown in fig. 54. The data for each frame includes a header followed by the data for the field layer (shown as the repeated "FieldPiccLayer" element for each field). The bitstream elements that make up the header of the fields of the interlaced B-field and BI-field are shown in fig. 55 and 56, respectively. The bitstream elements constituting the macroblock layers of interlaced B-fields (intra, 1MV, or 4MV macroblocks) and BI-fields are shown in fig. 57 and 58, respectively.
For interlaced B-frames, the frame-level bitstream elements are as shown in fig. 59. The data for each frame includes a header followed by macroblock-level data. The bitstream elements (intra or various inter type macroblocks) that make up the macroblock layer of an interlaced B-frame are shown in fig. 60.
The following sections describe selected bitstream elements in the frame, field and macroblock layers that are associated with a signal representation related to bi-directionally predicted interlaced pictures. Although selected bitstream elements are described in the context of a particular layer, some bitstream elements may be used in more than one layer.
1. Selecting frame layer elements
Fig. 54 is a diagram illustrating a frame-level bitstream syntax for a frame containing interlaced B-fields or BI-fields (or possibly other types of interlaced fields). Fig. 59 is a diagram illustrating a frame-level bitstream syntax of an interlaced B-frame. The specific bitstream elements are as follows.
Frame Coding Mode (FCM) (variable size)
FCM is a variable-length codeword ("VLC") used to indicate the picture coding type. FCM takes on values for the frame coding modes as shown in table 3 below.
TABLE 3: Frame Coding Mode VLC
FCM value    Frame coding mode
0            Progressive
10           Frame interlace
11           Field interlace
Field Picture Type (FPTYPE) (3 bits)
FPTYPE is a 3-bit syntax element provided in the header of frames including interlaced P-fields, interlaced I-fields, interlaced B-fields, and/or interlaced BI fields. FPTYPE has values for different combinations of field types in an interlaced video frame, as shown in table 4 below.
TABLE 4: Field Picture Type FLC
FPTYPE FLC    First field type    Second field type
000           I                   I
001           I                   P
010           P                   I
011           P                   P
100           B                   B
101           B                   BI
110           BI                  B
111           BI                  BI
Reference distance (REFDIST) (variable size)
REFDIST is a variable-size syntax element. This element indicates the number of frames between the current frame and the reference frame. Table 5 shows the VLCs used to encode the REFDIST values.
TABLE 5: REFDIST VLC Table
The last row in table 5 represents the codewords for reference frame distances greater than 2. These are coded as (binary) 11 followed by N-3 1s, where N is the reference frame distance; the last bit in the codeword is 0. For example:
n is 3, VLC code word is 110, VLC size is 3
N is 4, VLC code word is 1110, VLC size is 4
N is 5, VLC code word is 11110, VLC size is 5
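The rule above (binary 11, then N-3 1s, then a closing 0) can be sketched directly; the helper name is an assumption:

```python
# Illustrative sketch: build the REFDIST codeword for a reference frame
# distance N > 2, per the rule above: "11", then (N - 3) ones, then a 0.
def refdist_codeword(n):
    assert n > 2, "this rule applies only to distances greater than 2"
    return "11" + "1" * (n - 3) + "0"
```

This reproduces the examples above, e.g. N = 4 gives the 4-bit codeword 1110.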
Picture Type (PTYPE) (variable size)
PTYPE is a variable-size syntax element provided in the header of an interlaced B-frame (or other type of interlaced frame, such as an interlaced I-frame, or an interlaced P-frame). PTYPE has different frame type values as shown in table 6 below.
Table 6 picture type VLC
PTYPE VLC Picture type
110 I
0 P
10 B
1110 BI
1111 Skipped
If PTYPE indicates that the frame is skipped, the frame is treated as a P-frame that is identical to its reference frame. Reconstructing the skipped frame is conceptually equivalent to copying the reference frame. A skipped frame means that no further data is transmitted for the frame.
B-frame direct mode MB bit syntax element (DIRECTMB) (variable size)
The DIRECTMB syntax element uses bitplane coding to indicate the macroblocks in a B picture (here, an interlaced B-frame) that are coded in direct mode. The DIRECTMB syntax element may also signal that direct mode is coded in raw mode, in which case direct mode is signaled at the macroblock level for the macroblocks of the interlaced B-frame.
Extended MV Range Marker (MVRANGE) (variable size)
MVRANGE is a variable-size syntax element provided when the sequence-layer EXTENDED_MV bit is set to 1. The MVRANGE VLC represents the motion vector range.
Extended Difference MV Range Mark (DMVRANGE) (variable size)
DMVRANGE is a variable-size syntax element provided if the sequence-level syntax element EXTENDED_DMV is 1. The DMVRANGE VLC represents the motion vector differential range.
Macroblock mode table (MBMODETAB) (2 or 3 bits)
The MBMODETAB syntax element is a fixed-length field. For interlaced P-fields, MBMODETAB is a 3-bit value that indicates which of the 8 Huffman tables is used to decode the macroblock mode syntax element (MBMODE) in the macroblock layer.
Motion Vector Table (MVTAB) (2 or 3 bits)
The MVTAB syntax element is a 2- or 3-bit value. For interlaced P-fields with NUMREF = 1, MVTAB is a 3-bit syntax element that indicates which of the 8 Huffman tables is used to decode the motion vector data.
2MV Block mode Table (2MVBPTAB) (2 bits)
The 2MVBPTAB syntax element is a 2-bit value that indicates which of the 4 Huffman tables is used to decode the 2MV block mode (2MVBP) syntax element in 2MV field macroblocks.
4MV Block mode Table (4MVBPTAB) (2 bits)
The 4MVBPTAB syntax element is a 2-bit value that indicates which of the 4 Huffman tables is used to decode the 4MV block mode (4MVBP) syntax element in 4MV field macroblocks.
In another combined implementation, the picture type information is signaled at the beginning of a field level of an interlaced B-field, rather than at the frame level of an interlaced video frame that includes the interlaced B-field, and the reference distance may be omitted.
2. Selecting field layer elements
FIG. 55 is a diagram showing field level bitstream syntax for an interlaced B-field in a combined implementation. The specific bitstream elements are as follows.
Motion Vector Mode (MVMODE) (variable size or 1 bit)
The MVMODE syntax element signals one of four motion vector coding modes or the intensity compensation mode (with fewer possibilities for some picture types). Several subsequent elements provide additional motion vector mode and/or intensity compensation information.
B-field Forward mode MB bit syntax element (FORWARDMB) (variable size)
The FORWARDMB syntax element uses bitplane coding to indicate the macroblocks in the B-field that are coded in forward mode. The FORWARDMB syntax element may also signal that forward mode is coded in raw mode, in which case the forward/non-forward mode decision is signaled at the macroblock level.
Fig. 56 is a diagram showing the field-level bitstream syntax for an interlaced BI-field in the combined implementation. In this combined implementation, the field-level bitstream syntax of the interlaced BI-field uses the same syntax elements as the interlaced I-field.
3. Selecting macroblock layer elements
Fig. 57 is a diagram showing the macroblock-level bitstream syntax for macroblocks of interlaced B-fields in the combined implementation. Fig. 60 is a diagram illustrating the macroblock-level bitstream syntax for macroblocks of interlaced B-frames in the combined implementation. The specific bitstream elements are described below. The data for a macroblock consists of a macroblock header followed by the block-layer data.
Macroblock mode (MBMODE) (variable size)
The MBMODE syntax elements indicate the macroblock type (e.g., 1MV, 4MV, or intra for interlaced B-fields), as well as the presence of CBP flags and motion vector data.
Forward B-field coding mode (FORWARDBIT) (1 bit)
FORWARDBIT is a 1-bit syntax element provided in interlaced B-field macroblocks if the field-level syntax element FORWARDMB indicates that raw mode is used. If FORWARDBIT = 1, the macroblock uses forward mode coding.
B macroblock motion prediction type (BMVTYPE) (variable size)
BMVTYPE is a variable-size syntax element provided in interlaced B-frame macroblocks and interlaced B-field macroblocks that indicates whether the macroblock uses forward, backward, or interpolated prediction. As shown in table 7, the values of BFRACTION and BMVTYPE determine which type is used for macroblocks of an interlaced B-frame.
TABLE 7 BMVTYPE VLC
In interlaced B-fields, BMVTYPE is sent if the macroblock mode is not forward (as indicated by the FORWARDMB or FORWARDBIT syntax elements) and 4MV is not used. In this case, BMVTYPE signals whether the B macroblock is backward, direct, or interpolated. This is a simple VLC in which backward = 0, direct = 10, and interpolated = 11. In the case where the macroblock mode is not forward and 4MV is used, BMVTYPE is backward, since only the forward and backward modes are allowed for 4MV.
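The simple VLC just described (backward = 0, direct = 10, interpolated = 11) can be sketched as a two-bit prefix decoder; the function name and bit representation are assumptions for illustration:

```python
# Illustrative sketch: decode BMVTYPE for a non-forward, non-4MV macroblock
# in an interlaced B-field, using the simple VLC described above.
def decode_bmvtype(bits):
    it = iter(bits)
    if next(it) == 0:
        return "backward"           # codeword 0
    # First bit was 1: codeword 10 is direct, codeword 11 is interpolated.
    return "interpolated" if next(it) == 1 else "direct"
```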
Interpolated MV provides (INTERPMVP) (1 bit)
INTERPMVP is a 1-bit syntax element provided in B-field macroblocks if the BMVTYPE syntax element indicates that the macroblock type is interpolated. If INTERPMVP = 1, the interpolated MV (BMV2) is present; otherwise, it is absent.
B macroblock motion vector 1(BMV1) (variable size)
BMV1 is a variable-size syntax element that differentially encodes the first motion vector of the macroblock.
B macroblock motion vector 2(BMV2) (variable size)
BMV2 is a variable-size syntax element provided in interlaced B-frame macroblocks and interlaced B-field macroblocks if the interpolation mode is used. The syntax element differentially encodes a second motion vector of the macroblock.
4MV Block mode (4MVBP) (4 bits)
The 4MVBP syntax element indicates which of the 4 luma blocks contains a non-zero motion vector difference value, the use of which is described in detail below.
Block level motion vector data (BLKMVDATA) (variable size)
BLKMVDATA is a variable-size syntax element that contains motion information for the block and is provided in 4MV macroblocks.
Half frame transition Flag (FIELDTX) (1 bit)
FIELDTX is a 1-bit syntax element provided in intra-coded macroblocks of interlaced B-frames. This syntax element indicates whether the macroblock is frame coded or field coded (essentially, the internal organization of the macroblock). FIELDTX = 1 indicates that the macroblock is field coded; otherwise, the macroblock is frame coded. In inter-coded macroblocks, this syntax element can be inferred from MBMODE.
Direct B frame coding mode (DIRECTBBIT) (1 bit)
DIRECTBIT is a 1-bit syntax element provided in interlaced B-frame macroblocks if the frame-level syntax element DIRECTMB indicates that raw mode is used. If DIRECTBIT = 1, the macroblock uses direct mode coding.
B frame MV switching (MVSW) (1 bit)
MVSW is a 1-bit syntax element provided in interlaced B-frame macroblocks if the macroblock is in field MV mode and BMVTYPE is forward or backward. If MVSW = 1, the MV type and the prediction type change from forward to backward (or from backward to forward) when going from the top field to the bottom field.
Two motion vector Block mode (2MVBP) (variable size)
2MVBP is a variable-size syntax element provided in interlaced B-frame macroblocks. This syntax element is provided if the MBMODE syntax element indicates that the macroblock contains one motion vector and the macroblock is an interpolated macroblock. In this case, 2MVBP indicates which of the two motion vectors (the forward and backward motion vectors) are present.
Motion Vector Data (MVDATA) (variable size)
MVDATA is a variable size syntax element of the motion vector difference values of the coded macroblock, the decoding of which is described in detail below.
Fig. 58 is a diagram showing the macroblock-level bitstream syntax for macroblocks of interlaced BI-fields in the combined implementation. In the combined implementation, the macroblock-level bitstream syntax for interlaced BI-fields uses the same syntax elements as for interlaced I-fields.
B. Decoding interlaced B-fields
The following sections describe the process for decoding interlaced B-fields in the combined implementation.
1. Frame/field layer decoding
Interlaced B-fields can be one of two types: 1MV or mixed MV.
In a 1MV interlaced B-field, 0, 1, or 2 motion vectors represent the displacement of each macroblock, depending on its prediction type (BMVTYPE). When BMVTYPE equals DIRECT, the forward and backward motion vectors are implied, and no motion vectors are explicitly signaled. When BMVTYPE is INTERPOLATED, two motion vectors are decoded: forward and backward. In the forward and backward cases, only one motion vector is decoded. The 1MV mode is signaled by the MVMODE picture-layer syntax element.
In a mixed MV interlaced B-field, each macroblock can be coded as a 1MV or 4MV macroblock. In a 4MV macroblock, each of the 4 luminance blocks has a motion vector associated with it. Furthermore, in an interlaced B-field, a 4MV macroblock can only be associated with the forward or backward prediction type (BMVTYPE). The 1MV or 4MV mode for each macroblock is indicated by the MBMODE syntax element for that macroblock. The mixed MV mode is signaled by the MVMODE picture-layer syntax element.
2. Macroblock layer decoding
Macroblocks in interlaced B-fields can be one of three possible types: 1MV, 4MV, and intra. Further, a macroblock has one of four prediction types (BMVTYPE): forward, backward, direct, or interpolated. The macroblock type is signaled by the MBMODE syntax element in the macroblock layer. The prediction type is signaled by a combination of the picture-level bitplane FORWARDMB, which signals forward/non-forward for each macroblock, and the macroblock-level BMVTYPE syntax element, which is signaled when the prediction type is not forward.
The following sections describe the 1MV and 4MV types, and how they are signaled.
1MV macroblocks in interlaced B-fields
1MV macroblocks may occur in 1MV and mixed MV interlaced B-fields. In a 1MV macroblock, a single motion vector represents the displacement between the current and reference pictures for all 6 blocks in the macroblock. For a 1MV macroblock, the MBMODE syntax element in the macroblock layer indicates three things:
1) the macroblock type is 1MV
2) Whether CBPCY syntax element is present
3) Whether a BMV1 syntax element is present
If the MBMODE syntax element indicates that the BMV1 syntax element is present, the BMV1 syntax element is present in the macroblock layer at the corresponding position. The BMV1 syntax element encodes the motion vector difference value. The motion vector difference value is combined with the motion vector predictor to reconstruct the motion vector. If the MBMODE syntax element indicates that the BMV1 syntax element is not present, the motion vector difference value is taken to be 0 and thus the motion vector is equal to the motion vector predictor.
If the MBMODE syntax element indicates that the CBPCY syntax element is present, the CBPCY syntax element is present in the macroblock layer at the corresponding position. The CBPCY syntax element indicates which of the 6 blocks are coded at the block layer. If the MBMODE syntax element indicates that the CBPCY syntax element is not present, CBPCY is taken to be equal to 0 and no block data is present for any of the 6 blocks in the macroblock.
Also, if the macroblock type is 1MV and the prediction type of the macroblock is interpolated, the encoder uses the INTERPMVP syntax element to signal whether a second motion vector differential, BMV2, is present. If it is, the decoder decodes BMV2 immediately after BMV1. Otherwise, the motion vector differential for BMV2 is taken to be 0, and the second motion vector is equal to its motion vector predictor.
When the prediction type is interpolated, BMV1 corresponds to the forward motion vector and BMV2 corresponds to the backward motion vector.
4MV macroblocks in interlaced B-fields
4MV macroblocks can appear only in mixed MV B-field pictures and are limited to the forward and backward prediction types. In a 4MV macroblock, each of the 4 luminance blocks has an associated motion vector, and the displacement of the chrominance blocks is derived from the 4 luminance motion vectors.
For a 4MV macroblock, the MBMODE syntax element in the macroblock layer indicates three things:
1) the macroblock type is 4MV
2) Whether CBPCY syntax element is present
3) Whether a 4MVBP syntax element is present
The 4MVBP syntax element indicates which of the 4 luminance blocks contain non-zero motion vector differentials. The 4MVBP syntax element decodes to a value between 0 and 15. For each of the 4 bit positions in 4MVBP, a value of 0 indicates that no motion vector differential (BLKMVDATA) is present for that block and the motion vector differential is taken to be 0, while a value of 1 indicates that a motion vector differential (BLKMVDATA) is present for that block in the corresponding position. For example, if 4MVBP decodes to the value 1100 (binary), the bitstream contains BLKMVDATA for blocks 0 and 1, and no BLKMVDATA is present for blocks 2 and 3.
If the MBMODE syntax element indicates that the 4MVBP syntax element is not present, then it is assumed that motion vector difference data (BLKMVDATA) for all 4 luma blocks is present.
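The per-block presence signaling of 4MVBP can be sketched as below. The convention that the most significant of the 4 bit positions corresponds to luminance block 0 is an assumption made for illustration:

```python
# Illustrative sketch: expand a decoded 4MVBP value (0..15) into presence
# flags for BLKMVDATA, one per luminance block. Assumption: the most
# significant bit position corresponds to block 0.
def blkmvdata_present(four_mvbp):
    assert 0 <= four_mvbp <= 15
    return [bool(four_mvbp & (1 << (3 - blk))) for blk in range(4)]
```

Under this convention, the value 1100 (binary) yields motion vector differential data for blocks 0 and 1 only.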
Depending on whether the MVMODE syntax element indicates mixed MV or all-1MV mode, MBMODE signals information as follows. Table 8 below shows how the MBMODE element signals information about macroblocks in all-1MV pictures.
TABLE 8 macroblock modes in all-1MV pictures
Index Macroblock type CBP present MV present
0 Intra No -
1 Intra Yes -
2 1MV No No
3 1MV No Yes
4 1MV Yes No
5 1MV Yes Yes
Table 9 below shows how the MBMODE element signals information about macroblocks in mixed MV pictures.
TABLE 9 macroblock modes in mixed MV pictures
Index Macroblock type CBP present MV present
0 Intra No -
1 Intra Yes -
2 1MV No No
3 1MV No Yes
4 1MV Yes No
5 1MV Yes Yes
6 4MV No -
7 4MV Yes -
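Tables 8 and 9 can be modeled as simple lookup tables mapping the decoded MBMODE index to (macroblock type, CBP present, MV present). The representation below is an illustrative assumption, with None standing for the "-" entries:

```python
# Illustrative sketch: Tables 8 and 9 as lookup tables indexed by the decoded
# MBMODE value. Each entry is (macroblock type, CBP present, MV present).
MBMODE_ALL_1MV = [
    ("intra", False, None),
    ("intra", True,  None),
    ("1MV",   False, False),
    ("1MV",   False, True),
    ("1MV",   True,  False),
    ("1MV",   True,  True),
]
# The mixed MV table extends the all-1MV table with two 4MV entries.
MBMODE_MIXED_MV = MBMODE_ALL_1MV + [
    ("4MV", False, None),
    ("4MV", True,  None),
]
```

The mixed MV table is a superset of the all-1MV table, extended with the two 4MV entries.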
One of the 8 coding tables is used to signal MBMODE. The particular table used is signaled by the MBMODETAB syntax element.
The following sections describe prediction type decoding and decoding of direct mode motion vectors.
Predictive type decoding (BMVTYPE) in interlaced B-fields
The prediction type is decoded according to the following rules. If the picture-level bitplane FORWARDMB indicates that the macroblock is of the forward type, the prediction type of the macroblock is set to forward. If the FORWARDMB element is coded in raw mode, the encoder/decoder uses an additional macroblock-level bit, FORWARDBIT, to determine whether the prediction type is forward.
If the prediction type is not forward and the MBMODE syntax element signals that the macroblock uses 4MV (possible only in mixed MV B pictures), the decoder can directly infer that the prediction type is backward, since only the forward and backward types can be associated with the 4MV mode. Otherwise, the decoder explicitly decodes the BMVTYPE syntax element.
Decoding direct mode motion vectors in interlaced B-fields
To decode direct mode motion vectors in interlaced B-fields, the decoder first buffers the motion vectors from the previously decoded (i.e., temporally future) anchor (I or P) picture. The decoder then uses the buffered motion vectors corresponding to the top field as predictors when computing the direct mode motion vectors of the top B-field, and uses the motion vectors corresponding to the bottom field for the bottom B-field. For example, macroblock (x, y) in field z (z = top/bottom) references the motion vector buffered from macroblock (x, y) of the previously decoded I- or P-field z (i.e., the co-located macroblock in the anchor field with the same polarity as the current field).
If the buffered motion vector from the anchor picture is an intra motion vector (such as when the previously decoded field z is an I-field), or if the anchor picture is a P-field but the macroblock (x, y) is intra coded, the decoder treats the buffered motion vector as (0, 0). If the co-located macroblock is 1MV, the decoder uses the motion vector. If the co-located macroblock is 4MV, the decoder uses the logic described in pseudo code 5300 of FIG. 53 to calculate a motion vector predictor.
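The co-located motion vector selection just described can be sketched for the intra and 1MV cases; the 4MV case follows the predictor-selection logic of pseudo code 5300 in Fig. 53 and is not reproduced here. The dictionary representation is an assumption for illustration:

```python
# Illustrative sketch: select the buffered motion vector of the co-located
# anchor macroblock for direct mode (intra and 1MV cases only; the 4MV case
# is handled by the logic of Fig. 53, which is omitted here).
def colocated_mv(mb):
    if mb["type"] == "intra":
        return (0, 0)       # intra co-located macroblocks are treated as (0, 0)
    return mb["mv"]         # 1MV: use the single buffered motion vector
```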
In pseudo code 5300, SelectDirectMVFromColocatedMB derives the motion vector predictor used in the direct mode calculation. The decoder may buffer all motion vectors from the previously decoded anchor picture and apply the direct mode rules above while decoding the B-field, or it may apply those rules while decoding the anchor field and buffer the resulting motion vectors for use in the B-field.
Using the motion vectors obtained above, the decoder applies scaling logic (Scale_Direct_MV in fig. 19). Scale_Direct_MV obtains the forward- and backward-pointing motion vectors. Scale_Direct_MV may produce forward and backward motion vectors that point to either the top or the bottom field. This is valid because direct motion vectors are evaluated by the encoder and are selected only when they give a good prediction, and because interlaced B-fields use two reference fields in both the forward and backward directions.
In another implementation, any other process for generating the scaled direct mode motion vectors may be used, including processes that do not involve any buffering, which may be useful in memory-constrained devices (e.g., using a random number generator to model a zero-centered Laplacian distribution). Such a process still works because a good encoder will discard poor guesses for the direct mode motion vectors, leaving only the more accurate guesses in the bitstream.
3. Motion vector decoding process
The following sections describe the motion vector decoding process for blocks and macroblocks of an interlaced B-field in a combined implementation.
Filling forward and backward prediction contexts
The forward and backward motion vectors are buffered separately and are used to predict forward and backward motion vectors, respectively. The use of separate buffers for the forward and backward contexts is described, for example, in section X above. Techniques for selecting motion vector predictors are described in section III (background and detailed description) and other portions of the specification.
When filling the backward buffer with predicted motion vectors during decoding of forward motion vectors, or when filling the forward buffer during decoding of backward motion vectors (filling the buffer for the "missing direction"), two additional details apply. In general, an encoder/decoder may use motion vector type information (e.g., 1MV, etc.) and the polarity of previously decoded motion vectors to form a prediction. In the "hole filling" case, however, the encoder/decoder has neither motion vector type information nor polarity information (e.g., same polarity or opposite polarity), because it does not actually decode motion vectors for the missing direction. In this combined implementation, the encoder/decoder sets the motion vector type to 1MV and selects the dominant field motion vector as the predictor. Pseudo code 4700 in FIG. 47 describes the polarity selection process in this combined implementation.
For intra-coded macroblocks, "intra motion vectors" are used to fill the forward and backward motion prediction planes. Any consistent representation of an "intra motion vector" may be selected by a decoder implementation. For example, if the motion vectors are stored in a 2-byte short array, the "intra motion vectors" may be represented as a unique large constant that is padded into the motion vector array to indicate that the macroblock is encoded as an intra macroblock.
Forward motion vector prediction in B-fields
The forward reference frame distance is calculated from the BFRACTION and REFDIST syntax elements. Forward motion vector prediction then proceeds as described in section X above.
Backward motion vector prediction in B-fields
The backward reference frame distance is calculated from the BFRACTION and REFDIST syntax elements. Backward motion vector prediction proceeds as described in section X above.
Decoding motion vector differences
The BMV1, BMV2, or BLKMVDATA syntax elements encode the motion information for a macroblock or for blocks within a macroblock. A 1MV macroblock has a BMV1 and possibly a BMV2 syntax element, and a 4MV macroblock may have between 0 and 4 BLKMVDATA elements.
When the prediction type (BMVTYPE) is interpolated, BMV1 corresponds to the forward motion vector differential and BMV2 corresponds to the backward motion vector differential.
The following sections describe how motion vector difference values are calculated for the double reference case applied to B-pictures.
Motion vector difference in dual reference field pictures
Dual reference field pictures occur in the encoding of interlaced frames using field pictures. Each frame of the sequence is divided into two fields and each field is encoded using a substantially progressive code path.
In a field picture with two reference fields (such as a picture with interlaced B-fields), each MVDATA or BLKMVDATA syntax element in the macroblock layer jointly encodes three kinds of information: 1) horizontal motion vector difference component, 2) vertical motion vector difference component, 3) whether a dominant or non-dominant predictor is used, i.e. which of the two fields is referenced by the motion vector.
The MVDATA or BLKMVDATA syntax element is a variable length huffman codeword followed by a fixed length codeword. The value of the huffman codeword determines the size of the fixed length codeword. The MVTAB syntax element in the picture layer specifies the huffman table used to decode the variable size codeword. Pseudo code 6100 in fig. 61A shows how motion vector difference values and dominant/non-dominant predictor information are decoded.
The values predictor_flag, dmv_x, and dmv_y are calculated in pseudo code 6100 in fig. 61A. The values in pseudo code 6100 are defined as follows:
dmv_x: horizontal differential motion vector component,
dmv_y: vertical differential motion vector component,
k_x, k_y: fixed lengths for long motion vectors,
k_x and k_y depend on the motion vector range defined by the MVRANGE symbol.
TABLE 10 k_x and k_y specified by MVRANGE
MVRANGE k_x k_y range_x range_y
0 (Default) 9 8 256 128
10 10 9 512 256
110 12 10 2048 512
111 13 11 4096 1024
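Table 10 can be modeled as a lookup from the MVRANGE codeword to (k_x, k_y, range_x, range_y); the string-keyed representation is an illustrative assumption:

```python
# Illustrative sketch: Table 10 as a lookup from the MVRANGE VLC codeword
# to the tuple (k_x, k_y, range_x, range_y).
MVRANGE_TABLE = {
    "0":   (9, 8, 256, 128),      # default
    "10":  (10, 9, 512, 256),
    "110": (12, 10, 2048, 512),
    "111": (13, 11, 4096, 1024),
}
```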
extend_x: extended range for horizontal motion vector differentials,
extend_y: extended range for vertical motion vector differentials,
extend_x and extend_y are derived from the DMVRANGE picture-level syntax element. If DMVRANGE indicates that the extended range is used for the horizontal component, then extend_x = 1; otherwise extend_x = 0. Similarly, if DMVRANGE indicates that the extended range is used for the vertical component, then extend_y = 1; otherwise extend_y = 0.
The variable predictor_flag is a binary flag indicating whether the dominant or non-dominant motion vector predictor is used (0 = use the dominant predictor, 1 = use the non-dominant predictor). The offset_table and size_table arrays are defined as shown in fig. 61A.
The pseudo code 6110 in fig. 61B shows how the motion vector difference values of the dual reference field are decoded in another combined implementation. The pseudo code 6110 decodes the motion vector difference values in different ways. For example, the pseudo code 6110 omits the processing of the extended motion vector difference range.
Motion vector predictor
The motion vectors are calculated by adding the motion vector differentials computed in the previous section to the motion vector predictors. The following sections describe how the motion vector predictors for macroblocks in 1MV and mixed MV interlaced B-fields are calculated in this combined implementation.
Motion vector predictor in 1MV interlaced B-fields
Figs. 5A and 5B are diagrams showing the positions of the macroblocks considered for the candidate motion vector predictors of a 1MV macroblock. The candidate predictors are taken from the left, top, and top-right macroblocks, except when the macroblock is the last macroblock in the row; in that case, predictor B is taken from the top-left macroblock instead of the top-right. For the special case where the frame is one macroblock wide, the predictor is always predictor A (the top predictor). The special cases in which the current macroblock is in the top row (no A or B predictor, or no predictors at all) are addressed above with reference to figs. 33A-33F and below with reference to figs. 62A-62F.
Motion vector predictors in mixed MV interlaced B-fields
Figs. 6A-10 show the positions of the blocks or macroblocks whose motion vectors are considered as candidate motion vector predictors for 1MV or 4MV macroblocks in mixed MV interlaced B-fields.
Dominant and non-dominant MV predictors in interlaced B-fields
For each inter-coded macroblock, two motion vector predictors are derived: one from the dominant field and one from the non-dominant field. The dominant field is the field containing the majority of the actual-value candidate motion vector predictors in the neighborhood. In the case of a tie, the motion vector predictor from the opposite field is considered dominant (because it is temporally closer). Intra-coded macroblocks are not considered in the dominant/non-dominant calculation. If all candidate predictor macroblocks are intra-coded, the dominant and non-dominant motion vector predictors are both set to 0, and the dominant predictor is taken to be from the opposite field.
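The dominant-field decision can be sketched as a majority count over the non-intra candidate predictors. The assumption that a tie resolves to the opposite field (which is temporally closer) and the list-of-markers representation are illustrative:

```python
# Illustrative sketch: pick the dominant field polarity from the candidate
# motion vector predictors. Intra candidates are ignored; ties resolve to
# "opposite" (assumption: the opposite field is temporally closer).
def dominant_polarity(candidates):
    """candidates: list of 'same' / 'opposite' / 'intra' markers."""
    same = candidates.count("same")
    opposite = candidates.count("opposite")    # intra entries contribute nothing
    return "same" if same > opposite else "opposite"
```

Note that when every candidate is intra-coded, this sketch also returns "opposite", matching the all-intra rule described above.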
Computing motion vector predictors in interlaced B-fields
Two motion vector predictors are computed for each motion vector of a block or macroblock, one for each reference field. Pseudo code 6200 in figs. 62A-62F describes how the motion vector predictors are computed for the two-reference case in the combined implementation. (Pseudo code 3300 in figs. 33A-33F describes how motion vector predictors are computed for the two-reference case in another implementation.) In two-reference field pictures, the current field can reference the two most recent fields. One predictor is for the reference field with the same polarity, and the other is for the reference field with the opposite polarity.
Reconstructing motion vectors in interlaced B-fields
The following sections describe how the luma and chroma motion vectors of 1MV and 4MV macroblocks are reconstructed. After the motion vector is reconstructed, it can then be used as a neighborhood motion vector to predict the motion vectors of neighboring macroblocks. The motion vector will have an associated polarity of "same" or "opposite" and can be used to derive a motion vector predictor for the other field polarity for motion vector prediction.
Luminance motion vector reconstruction in interlaced B-fields
In all cases (1MV and 4MV macroblocks), the luma motion vector is reconstructed by adding the difference value to the prediction value as follows:
mv_x = (dmv_x + predictor_x) smod range_x
mv_y = (dmv_y + predictor_y) smod range_y
the modulo operation "smod" is a signed modulo defined as:
A smod b=((A+b)%(2*b))-b
This ensures that the reconstructed vector is valid: (A smod b) lies between -b and b-1. range_x and range_y depend on MVRANGE.
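The signed modulus and the luminance reconstruction above can be written directly; the helper names are assumptions:

```python
# Illustrative sketch: the signed modulus and one component of the luma
# motion vector reconstruction described above.
def smod(a, b):
    return ((a + b) % (2 * b)) - b

def reconstruct_component(dmv, predictor, mv_range):
    # mv_x = (dmv_x + predictor_x) smod range_x, and likewise for y
    return smod(dmv + predictor, mv_range)
```

As stated above, smod keeps every reconstructed component in the valid interval from -b to b-1.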
Since an interlaced B-field picture uses two reference pictures, the predictor_flag derived after decoding the motion vector differential is combined with the dominant predictor value derived from motion vector prediction to determine which field is used as the reference. Pseudo code 6300 in fig. 63 describes how the reference field is determined.
In a 1MV macroblock, there is a single motion vector for the 4 blocks that make up the luminance component of the macroblock. If the MBMODE syntax element indicates that no MV data is present in the macroblock layer, then dmv_x = 0 and dmv_y = 0 (mv_x = predictor_x and mv_y = predictor_y).
In a 4MV macroblock, each inter-coded luminance block in the macroblock has its own motion vector, so there are 0 to 4 luminance motion vectors per 4MV macroblock. If the 4MVBP syntax element indicates that no motion vector information is present for a block, then dmv_x = 0 and dmv_y = 0 for that block (mv_x = predictor_x and mv_y = predictor_y).
Chroma motion vector reconstruction
The chrominance motion vectors are derived from the luminance motion vectors. Furthermore, for 4MV macroblocks, the decision about whether to code a chrominance block as an inter block or an intra block is made based on the status of the luminance blocks.
C. Decoding interlaced P-frames
Before describing the process for decoding interlaced B-frames in the combined implementation, this section describes the process for decoding interlaced P-frames. The sections describing the process for decoding interlaced B-frames refer to various concepts discussed here.
1. Macroblock layer decoding for interlaced P-frames
In interlaced P-frames, each macroblock can be motion compensated in frame mode using one or four motion vectors, or in field mode using two or four motion vectors. An inter-coded macroblock does not contain any intra blocks. In addition, the residual after motion compensation can be coded in frame transform mode or field transform mode. More specifically, if the residual is coded in field transform mode, the luminance component of the residual is rearranged according to fields while the chrominance component remains unchanged; in frame transform mode, the residual is left unchanged. A macroblock can also be coded as an intra macroblock.
Motion compensation may be restricted to exclude the four motion vector (field/frame) modes, which is signaled by 4MVSWITCH. The type of motion compensation and the residual coding are jointly indicated for each macroblock by MBMODE and SKIPMB. MBMODE uses a different set of tables depending on 4MVSWITCH.
Each macroblock in an interlaced P-frame is classified into one of 5 types: 1MV, 2 field MV, 4 frame MV, 4 field MV, and intra. The first four types of macroblock are inter-coded, while the last type indicates that the macroblock is intra-coded. The macroblock type is signaled by the MBMODE syntax element in the macroblock layer along with the skip bit. MBMODE jointly encodes the macroblock type together with various pieces of information about the macroblock for the different macroblock types.
Signaling skipped macroblocks
The SKIPMB field indicates the skip condition of the macroblock. If the SKIPMB field is 1, the current macroblock is skipped and no other information is sent after the SKIPMB field. The skip condition implies that the current macroblock is 1MV with a zero differential motion vector (i.e., the macroblock is motion compensated using its 1MV motion vector predictor) and that there are no coded blocks (CBP = 0).
On the other hand, if the SKIPMB field is not 1, then the MBMODE field is decoded to indicate the type of macroblock and other information about the current macroblock, such as the information described in the following section.
Signalling macroblock modes
There are 15 possible events represented by MBMODE; the MBMODE collectively specifies the type of macroblock (1MV, 4 frame MV, 2 field MV, 4 field MV, or intra), the transform type of the inter-coded macroblock (i.e., field or frame or uncoded block), and whether there is a motion vector difference for the 1MV macroblock.
Let <MVP> denote a binary event that signals whether there is a non-zero 1MV motion vector differential. Let <Field/Frame transform> denote a ternary event that signals whether the residual of a macroblock is frame transform coded, field transform coded, or a zero coded block (i.e., CBP = 0). MBMODE jointly signals the following set of events:
MBMODE = {<1MV, MVP, field/frame transform>, <2 field MV, field/frame transform>, <4 frame MV, field/frame transform>, <4 field MV, field/frame transform>, <intra>}; except for the event <1MV, MVP = 0, CBP = 0>, which is signaled by the skip condition.
For inter-coded macroblocks, the CBPCY syntax element is not decoded when the field/frame transform event in MBMODE indicates no coded block. On the other hand, if the field/frame transform event in MBMODE indicates a field or frame transform, CBPCY is decoded. The decoded <field/frame transform> event is used to set the flag FIELDTX. If the event indicates that the macroblock is field transform coded, FIELDTX is set to 1. If the event indicates that the macroblock is frame transform coded, FIELDTX is set to 0. If the event indicates a zero coded block, FIELDTX is set according to the motion vector type, i.e., FIELDTX is set to 1 if it is FIELDMV and 0 if it is FRAMEMV.
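The FIELDTX derivation above can be sketched as follows; the event and mode names are illustrative labels, not normative syntax:

```python
def derive_fieldtx(transform_event, mv_mode):
    """Set FIELDTX from the decoded <field/frame transform> event.

    transform_event: 'field', 'frame', or 'none' (zero coded block, CBP = 0).
    mv_mode: motion vector type of the macroblock, 'FIELDMV' or 'FRAMEMV'.
    """
    if transform_event == 'field':
        return 1                      # field transform coded
    if transform_event == 'frame':
        return 0                      # frame transform coded
    # Zero coded block: FIELDTX follows the motion vector type.
    return 1 if mv_mode == 'FIELDMV' else 0
```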
For inter-coded macroblocks other than 1MV, another field is sent that indicates which motion vectors carry non-zero differentials. In the case of a 2 field MV macroblock, a 2MVBP field is sent indicating which of the two motion vectors contain non-zero motion vector differentials. Similarly, for 4 frame/field MV macroblocks, a 4MVBP field is sent indicating which of the four motion vectors contain non-zero motion vector differentials.
For intra-coded macroblocks, the field/frame transform and zero coded block information are coded in separate fields.
2. Motion vector decoding for interlaced P-frames
Motion vector predictor for interlaced P-frame
The process of calculating the motion vector predictor of the current macroblock includes two steps. First, three candidate motion vectors for a current macroblock are collected from its neighboring macroblocks. Second, a motion vector predictor of the current macroblock is computed from the set of candidate motion vectors. Fig. 40A-40B illustrate neighboring macroblocks from which candidate motion vectors are collected. The order of collection of candidate motion vectors is important. In this combined implementation, the collection order always starts from A, continues to B, and ends at C. Note that a candidate predictor is deemed to not exist if the respective block is outside a frame boundary or the respective block is part of a different slice. Thus, motion vector prediction is not performed across slice boundaries.
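The collection step can be sketched as follows; the neighbor positions A, B, and C follow figs. 40A-40B, and the dictionary interface is an illustrative assumption:

```python
def collect_candidates(neighbors):
    """Collect up to three candidate MVs in the fixed order A, B, C.

    neighbors maps a position name to a motion vector (x, y) tuple, or to
    None when the corresponding block lies outside the frame boundary or in
    a different slice, in which case the candidate is treated as absent.
    """
    candidates = []
    for pos in ('A', 'B', 'C'):       # collection order is significant
        mv = neighbors.get(pos)
        if mv is not None:            # absent across frame/slice boundaries
            candidates.append(mv)
    return candidates
```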
The following sections describe whether to collect candidate motion vectors for different types of macroblocks and how to calculate motion vector predictors.
1MV candidate motion vector
In this combined implementation, the pseudo code 6400 in fig. 64 is used to collect a maximum of three candidate motion vectors for the motion vector.
4 frame MV candidate motion vectors
For a 4-frame MV macroblock, candidate motion vectors from neighboring blocks are collected for each of the four frame block motion vectors in the current macroblock. In this combined implementation, the pseudo code 6500 in fig. 65 is used to collect up to three candidate motion vectors for the top left frame block motion vector. Pseudo code 6600 in fig. 66 is used to collect up to three candidate motion vectors for the top right frame block motion vector. The pseudo code 6700 in fig. 67 is used to collect a maximum of three candidate motion vectors for the lower left frame block motion vector. The pseudo code 6800 in fig. 68 is used to collect a maximum of three motion vectors for the lower right frame block motion vector.
Derivation of 2-field MV candidate motion vectors
For a 2 field MV macroblock, candidate motion vectors from neighboring blocks are collected for each of the two field motion vectors in the current macroblock. The pseudo code 6900 in fig. 69 is used to collect a maximum of three candidate motion vectors for the top field motion vector. The pseudo code 7000 in fig. 70 is used to collect a maximum of three candidate motion vectors for the bottom field motion vector.
Derivation of 4-field MV candidate motion vectors
For a 4 field MV macroblock, candidate motion vectors from neighboring blocks are collected for each of the four field blocks in the current macroblock. The pseudo code 7100 in fig. 71 is used to collect a maximum of three candidate motion vectors for the top left field block motion vector. Pseudo code 7200 in fig. 72 is used to collect a maximum of three candidate motion vectors for the top right field block motion vector. The pseudo code 7300 in fig. 73 is used to collect a maximum of three candidate motion vectors for the bottom left field block motion vector. The pseudo code 7400 in fig. 74 is used to collect a maximum of three candidate motion vectors for the bottom right field block motion vector.
Averaging field motion vectors
Given two field motion vectors (MVX1, MVY1) and (MVX2, MVY2), the averaging operation used to form a candidate motion vector (MVXA, MVYA) is:
MVXA = (MVX1 + MVX2 + 1) >> 1;
MVYA = (MVY1 + MVY2 + 1) >> 1;
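The averaging above can be expressed directly; the +1 before the arithmetic right shift provides rounding:

```python
def average_field_mvs(mv1, mv2):
    """Average two field motion vectors component-wise with rounding."""
    (x1, y1), (x2, y2) = mv1, mv2
    return ((x1 + x2 + 1) >> 1, (y1 + y2 + 1) >> 1)
```

Note that Python's `>>` on negative integers floors toward negative infinity, matching the arithmetic shift typically intended here.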
computing MV predictors for a frame from candidate motion vectors
This section describes: how to compute the motion vector predictor of the frame motion vector given a set of candidate motion vectors. In this combined implementation, the operation is the same for calculating the prediction value for each of the 4 frame block motion vectors in the 4 frame MV macroblock.
Pseudo code 7500 in FIG. 75 describes how to compute the motion vector predictor (PMVx, PMVy) for a frame motion vector. In pseudo code 7500, TotalValidMV denotes the total number of motion vectors in the candidate motion vector set (TotalValidMV = 0, 1, 2, or 3), and the ValidMV array denotes the motion vectors in the candidate motion vector set.
Computing MV predictors for fields from candidate motion vectors
This section describes: how to compute the motion vector predictor for a field motion vector given a set of candidate motion vectors. The operation is the same for calculating the prediction value for each of the 2 field motion vectors in a 2 field MV macroblock or each of the 4 field motion vectors in a 4 field MV macroblock.
First, the candidate motion vectors are divided into two sets, where one set contains only candidate motion vectors pointing to the same field as the current field, and the other set contains candidate motion vectors pointing to the opposite field. Assuming that the candidate motion vectors are represented in 1/4-pixel units, the encoder or decoder can check whether a candidate motion vector points to the same field by examining its y-component.
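One common way to perform this check, in 1/4-pixel units where a one-field-line vertical offset is 4 units, is to test bit 2 of the y-component. The `& 4` mask below is an assumption about this implementation, since the exact test is not reproduced in the text:

```python
def points_to_same_field(mv_y):
    """Polarity test for a candidate MV y-component in 1/4-pel units.

    A vertical displacement of one field line equals 4 quarter-pel units,
    so bit 2 of the y-component distinguishes same-field from
    opposite-field references (assumed convention).
    """
    return (mv_y & 4) == 0
```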
Pseudo code 7600 in FIG. 76 describes how to compute the motion vector predictor (PMVx, PMVy) for a field motion vector. In pseudo code 7600, SameFieldMV and OppFieldMV represent the two sets of candidate motion vectors, while NumSameFieldMV and NumOppFieldMV represent the number of candidate motion vectors belonging to each set. The order of the candidate motion vectors in each set starts with candidate A (if it exists), then candidate B (if it exists), and then candidate C (if it exists). For example, if the SameFieldMV candidate set contains only candidate B and candidate C, then SameFieldMV[0] is candidate B.
Decoding motion vector differences
The MVDATA syntax element contains motion vector differential information for the macroblock. Depending on the type of motion compensation and the motion vector block pattern signaled for each macroblock, there can be up to four MVDATA syntax elements per macroblock. More specifically:
For a 1MV macroblock, 0 or 1 MVDATA syntax elements may appear, depending on the MVP field in MBMODE.
For a 2 field MV macroblock, 0, 1, or 2 MVDATA syntax elements may appear, depending on 2MVBP.
For 4 frame/field MV macroblocks, 0, 1, 2, 3, or 4 MVDATA syntax elements may appear, depending on 4MVBP.
In this combined implementation, the motion vector differential is decoded in the same way as the single reference field motion vector differential for interlaced P-fields. (Pseudo code 7700 in FIG. 77A shows how motion vector differentials are decoded for a single reference field. Pseudo code 7710 in FIG. 77B shows how motion vector differentials are decoded for a single reference field in another combined implementation. Pseudo code 7710 decodes motion vector differentials in a different way; for example, pseudo code 7710 omits handling of the extended motion vector differential range.)
Reconstructing motion vectors
Given the motion vector difference value dmv, the luma motion vector is reconstructed by adding the difference value to the predictor as described in section xv.b.3 above. Given a luminance frame or field motion vector, a corresponding chrominance frame or field motion vector is derived to compensate some or all of the Cb/Cr block. The pseudo code 7800 in fig. 78 describes how the chrominance motion vector CMV is derived from the luminance motion vector LMV in an interlaced P-frame.
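As a rough sketch of the kind of derivation pseudo code 7800 performs, the chrominance vector is essentially the luminance vector halved with a small rounding table. The rounding table values and the simple halving below are illustrative assumptions; the actual derivation in the figure may also adjust field motion vectors differently:

```python
ROUND = [0, 0, 0, 1]  # illustrative rounding table, indexed by LMV & 3

def derive_chroma_mv(lmv_x, lmv_y):
    """Halve a luma MV (1/4-pel units) to a chroma MV, with rounding (sketch)."""
    cmv_x = (lmv_x + ROUND[lmv_x & 3]) >> 1
    cmv_y = (lmv_y + ROUND[lmv_y & 3]) >> 1
    return cmv_x, cmv_y
```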
D. Decoding interlaced B-frames
This section describes the procedure for decoding interlaced B-frames in a combined implementation with reference to the concepts discussed in the previous section.
1. Macroblock level decoding of interlaced B-frames
At the macroblock level, the interlaced B-frame syntax is similar to that of the interlaced P-frames described above. Macroblocks in interlaced B-frames are divided into three types: 1MV, 2 field MV, and intra. The 4 frame MV and 4 field MV modes are not allowed for interlaced B-frames in this combined implementation. These three modes are coded with the MBMODE syntax element as in interlaced P-frames. Each macroblock is also predicted as forward, backward, direct, or interpolated (using the DIRECTMB and BMVTYPE syntax elements). If a 1MV macroblock is forward or backward predicted, it uses a single motion vector. If it is 1MV but direct or interpolated, it uses two motion vectors. If it is of the 2 field MV type and is forward or backward predicted, it uses two motion vectors. If it is of the 2 field MV type and is direct or interpolated, it uses four motion vectors.
The following sections describe the characteristics of different inter-coded macroblock types in interlaced B-frames.
1MV macroblocks in interlaced B-frames
In 1MV macroblocks in interlaced B-frames, the displacement of a luminance block is represented by a single motion vector when the prediction type is forward or backward, and by two motion vectors when the type is direct or interpolated. In each case a corresponding chrominance vector is derived. In the case of interpolation and direct prediction, the motion compensated pixels from the forward and backward reference pictures are averaged to form the final prediction.
2-field MV macroblock in interlaced B-frame
In a 2-field MV macroblock in an interlaced B-frame, the displacement of each field of the luminance block is described by a different motion vector, as shown in fig. 37. Furthermore, the prediction type is allowed to switch from forward to backward or vice versa when going from the top field to the bottom field, so that the top field is motion compensated from one reference picture and the bottom field is motion compensated from the other reference picture, as described in section VII above.
Interpretation of 2MVBP, 4MVBP in interlaced B-frame and motion vector order
In 1MV macroblocks, the encoder uses a 2MVBP syntax element in the interpolation mode to indicate which of the two motion vectors is present. Bit 1 corresponds to a forward motion vector and bit 0 corresponds to a backward motion vector.
In a 2-field MV macroblock, the encoder uses the 2MVBP syntax element in forward and backward modes to indicate which of the motion vectors of the two fields is present. Bit 1 corresponds to the top field motion vector and bit 0 corresponds to the bottom field motion vector. The encoder uses the same top/bottom signaling when the MVSW syntax element is used to switch from forward prediction for the top field to backward prediction for the bottom field, or vice versa. The encoder uses the 4MVBP syntax element in interpolated mode to indicate which of the four motion vectors are present. Bit 3 corresponds to the top field forward motion vector, bit 2 corresponds to the top field backward motion vector, bit 1 corresponds to the bottom field forward motion vector, and bit 0 corresponds to the bottom field backward motion vector.
Bits set to '1' in 2MVBP and 4MVBP indicate that the corresponding motion vector differential is present, while bits set to '0' indicate that the corresponding motion vector is equal to its predicted motion vector, i.e., the corresponding motion vector differential is not present. The actual decoded motion vectors are sent in the same order as the bits in 2MVBP or 4MVBP. For example, in a 2-field MV macroblock using the interpolated mode, the first motion vector received by the decoder is the top field forward motion vector, and the last (i.e., fourth) motion vector received is the bottom field backward motion vector.
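Unpacking 4MVBP into per-motion-vector presence flags, in the bit order given above, can be sketched as:

```python
def unpack_4mvbp(mvbp):
    """Return presence flags in decode order for an interpolated 2-field MV MB.

    Bit 3: top field forward, bit 2: top field backward,
    bit 1: bottom field forward, bit 0: bottom field backward.
    A set bit means a motion vector differential is present; a clear bit
    means the corresponding motion vector equals its predictor.
    """
    order = ('top_fwd', 'top_bwd', 'bot_fwd', 'bot_bwd')
    return [(name, bool(mvbp & (1 << (3 - i)))) for i, name in enumerate(order)]
```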
Signaling skipped macroblocks
Skipped macroblocks are signaled in the same manner as in P-frames. However, skipped macroblocks in interlaced B-frames are limited to the 1MV frame type, i.e., field types are not allowed. The motion vector is reconstructed with a zero differential motion vector (i.e., the macroblock is motion compensated using its 1MV motion vector predictor) and there are no coded blocks (CBP = 0). If a macroblock is skipped, the encoder sends only the BMVTYPE information for the macroblock so that the motion vector can be correctly predicted as forward, backward, direct, or interpolated.
Signalling macroblock modes
Signaling that macroblock mode is performed in the same way as interlaced P-frames, as described in the xv.c. section above.
Prediction type decoding (BMVTYPE and MVSW)
The prediction type of interlaced B-frames is decoded according to the following rules. If the picture-level bitplane DIRECTMB indicates that the macroblock is of the direct type, the prediction type for that macroblock is set to direct. If the direct/non-direct decision is coded in the original (raw) mode, the encoder uses an additional bit, DIRECTBIT, at the macroblock level to indicate whether or not the prediction type is direct.
If the prediction type is not direct, the decoder decodes the BMVTYPE syntax element. If the macroblock mode is "2 field MV" and BMVTYPE is forward or backward, the decoder also decodes the MVSW bit to determine whether the prediction type changes from the top field to the bottom field of the macroblock (i.e., from forward to backward, or vice versa).
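The decision sequence above can be sketched as follows; the `read_*` callables stand in for bitstream reads and are assumptions of this sketch:

```python
def decode_prediction_type(directmb_says_direct, mb_is_2field_mv,
                           read_bmvtype, read_mvsw_bit):
    """Decode the prediction type of one interlaced B-frame macroblock (sketch).

    directmb_says_direct: value taken from the DIRECTMB bitplane (or the
    per-macroblock bit when the bitplane uses the original/raw mode).
    read_bmvtype / read_mvsw_bit: callables that consume syntax elements.
    Returns (prediction_type, mvsw_flag).
    """
    if directmb_says_direct:
        return 'direct', False
    bmvtype = read_bmvtype()          # forward, backward, or interpolated
    mvsw = False
    if mb_is_2field_mv and bmvtype in ('forward', 'backward'):
        mvsw = bool(read_mvsw_bit())  # switch prediction direction per field
    return bmvtype, mvsw
```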
Decoding direct mode motion vectors
To decode direct mode motion vectors, the decoder first buffers motion vectors from a previously decoded anchor frame. Specifically, for a previously decoded future P-frame, the decoder buffers half the maximum possible number of decoded luma motion vectors from the future P-frame (i.e., (2 x NumberOfMB) motion vectors). The method of selecting these motion vectors from the anchor frame to be buffered is described in section XIII above.
Using the motion vectors obtained above, the decoder applies scaling logic in Scale _ Direct _ MV, shown in pseudo code 1900 in fig. 19, to obtain forward and backward pointing motion vectors without pullback of the motion vectors.
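Direct-mode scaling derives forward- and backward-pointing vectors from a buffered anchor vector according to the B picture's temporal position between its anchors. The fixed-point form below, with a 256-based scale factor derived from the B-fraction, is a common formulation and is an assumption here, since the exact arithmetic lives in pseudo code 1900:

```python
def scale_direct_mv(mv_x, mv_y, scale_factor):
    """Derive forward/backward direct-mode MVs from a buffered anchor MV.

    scale_factor: fixed-point (0..256) temporal position of the B picture
    between its anchors, e.g. 128 for the midpoint (assumed convention).
    """
    fwd = ((scale_factor * mv_x) >> 8, (scale_factor * mv_y) >> 8)
    bwd = (((scale_factor - 256) * mv_x) >> 8,
           ((scale_factor - 256) * mv_y) >> 8)
    return fwd, bwd
```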
In this combined implementation, direct mode motion vectors are not calculated for macroblocks that do not use direct mode prediction, such as forward- and backward-predicted macroblocks. Instead, motion vectors for non-direct macroblocks are predicted based on the forward or backward motion vector buffer.
2. Motion vector decoding for interlaced B-frames
Motion vector predictor for interlaced B-frames
As with interlaced P-frames, the process of calculating a motion vector predictor for a current macroblock of an interlaced B-frame includes collecting its candidate motion vectors from neighboring macroblocks of the current macroblock and calculating the motion vector predictor for the current macroblock from the set of candidate motion vectors. Fig. 40A-40B illustrate neighboring macroblocks from which candidate motion vectors are collected. In this combined implementation, the motion vector predictor for an interlaced B-frame is selected from the candidate set according to the rules for interlaced P-frames described in the xv.c. section above.
Different prediction contexts are used for the forward and reverse mode motion vectors. The decoder predicts a forward motion vector using a forward prediction context and predicts a backward motion vector using a backward prediction context.
Filling forward and reverse prediction contexts in interlaced B-frames
The decoder separately buffers the forward and backward motion vectors and uses them to predict the forward and backward motion vectors, respectively. For interpolated macroblocks, the decoder uses a forward prediction buffer to predict forward motion vectors (first decoded MVDATA elements) and a backward buffer to predict backward motion vectors (second decoded MVDATA elements). When the macroblock is direct or interpolated, the decoder buffers the forward MV component in the forward buffer and the backward MV component in the backward buffer. The actual prediction logic used to select motion vector predictors from a candidate set in each case (e.g., 1MV macroblock, 2 field MV macroblock, etc.) is as described in the xv.c. section above.
The scheme for filling forward and backward motion vector buffers and predicting motion vectors from the motion vectors of these buffers is described in section x.c. above.
Decoding motion vector differences in interlaced B-frames
Motion vector difference values in interlaced B-frames are decoded according to pseudo-codes 7700 and 7710 in fig. 77A and 77B, as described in section xv.c.2 above.
Reconstructing motion vectors in interlaced B-frames
The motion vectors in the interlaced B-frame are decoded according to the pseudo code 7800 in fig. 78 and as described in the above sections xv.b.3 and xv.c.2.
E. Bit plane coding
Macroblock-specific binary information, such as (1) the forward/non-forward decision for macroblocks of an interlaced B-field (i.e., the FORWARDMB flag), and (2) the direct/non-direct decision for macroblocks of an interlaced B-field (i.e., the DIRECTMB flag), may be encoded with one binary symbol per macroblock. For example, whether a macroblock of an interlaced B-field is motion compensated in forward mode (as opposed to another mode such as backward, direct, or interpolated) may be signaled by 1 bit. In these cases, the state of all macroblocks in a field or frame may be encoded as a bitplane and transmitted in the field or frame header. An exception to this rule is the original bitplane coding mode, in which case the state of each macroblock is coded as 1 bit per symbol and transmitted at the macroblock level along with the other macroblock-level syntax elements.
Field/frame-level bitplane coding is used to encode two-dimensional binary arrays. The size of each array is rowMB x colMB, where rowMB and colMB are the number of macroblock rows and columns, respectively, in the field or frame in question. Within the bitstream, each array is encoded as a set of consecutive bits. One of seven modes is used to encode each array. The seven modes are:
1. original mode-information encoded as 1 bit per symbol and transmitted as part of the MB-level syntax;
2. normal-2 (Norm-2) mode-two symbols encoded together;
3. difference-2 (Diff-2) mode-difference coding of the bit-planes, followed by jointly coding the two residual symbols;
4. normal-6 (Norm-6) mode-six symbols that are jointly encoded;
5. difference-6 (Diff-6) mode-bit-plane difference coding, followed by jointly coding six residual symbols;
6. Row-skip mode - one-bit signaling to skip rows with no set bits; and
7. Column-skip mode - one-bit signaling to skip columns with no set bits.
At the field or frame level, the bitplane syntax elements appear in the following order: INVERT, IMODE, and DATABITS.
Inversion flag (INVERT)
The INVERT syntax element is a 1-bit value which, if set, indicates that the bitplane has more set bits than zero bits. Depending on INVERT and the mode, the decoder inverts the interpreted bitplane to recreate the original bitplane. Note that the value of this bit is ignored when the original mode is used. A description of how the INVERT value is used when decoding the bitplanes is provided below.
Coding mode (IMODE)
The IMODE syntax element is a variable length value indicating an encoding mode used to encode the bitplane. Table 11 shows a code table used to encode IMODE syntax elements. A description of how the IMODE values are used when decoding the bitplanes is provided below.
Table 11: IMODE VLC code table
IMODE VLC    Coding mode
10           Norm-2
11           Norm-6
010          Row-skip
011          Column-skip
001          Diff-2
0001         Diff-6
0000         Original
Bit plane coded bit (DATABITS)
The DATABITS syntax element is a variable size syntax element that encodes the bitstream of bitplanes. The method used to encode the bitplanes is determined based on the value of IMODE. Seven encoding modes are described in the following sections.
Original mode
In this mode, the bitplane is encoded as 1 bit per symbol, scanned in the raster scan order of macroblocks, and sent as part of the macroblock layer. Alternatively, the information may be coded in original mode at the field or frame level, in which case DATABITS is rowMB x colMB bits in length.
Norm-2 mode
If rowMB x colMB is odd, the first symbol is encoded as original. Subsequent symbols are encoded in pairs with a natural scan order. The binary VLC tables in table 12 are used to encode the symbol pairs.
Table 12: Norm-2/Diff-2 code table
Symbol 2n    Symbol 2n+1    Codeword
0            0              0
1            0              100
0            1              101
1            1              11
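Table 12 is a prefix code and can be transcribed directly as a reader; `read_bit` stands in for the bitstream:

```python
def decode_norm2_pair(read_bit):
    """Decode one symbol pair with the Norm-2/Diff-2 code of Table 12.

    Codewords: 0 -> (0,0), 11 -> (1,1), 100 -> (1,0), 101 -> (0,1).
    """
    if read_bit() == 0:
        return (0, 0)
    if read_bit() == 1:
        return (1, 1)
    return (1, 0) if read_bit() == 0 else (0, 1)
```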
Diff-2 mode
The Norm-2 mode is used to generate the bit plane as described above, and then the Diff-1 operation is applied to the bit plane as described below.
Norm-6 mode
In the Norm-6 and Diff-6 modes, the bitplane is encoded in groups of 6 pixels. These pixels are grouped into 2x3 or 3x2 blocks. The bitplane is tiled maximally using a set of rules, and the remaining pixels are encoded using variants of the row-skip and column-skip modes. 2x3 "vertical" blocks are used if and only if rowMB is a multiple of 3 and colMB is not. Otherwise, 3x2 "horizontal" blocks are used. Fig. 79A shows a simplified example of 2x3 "vertical" blocks. Figs. 79B and 79C show tilings with 3x2 "horizontal" blocks; the elongated dark rectangles, which are 1 pixel wide, are encoded using row-skip and column-skip coding. For a plane tiled as shown in fig. 79C, with linear blocks along the top and left edges of the picture, the coding order of the blocks follows this pattern: the 6-element blocks are encoded first, followed by the column-skip and row-skip encoded linear blocks. If the array size is a multiple of 2x3 or 3x2, the latter linear blocks do not exist and the bitplane is perfectly tiled.
The 6-element rectangular blocks are encoded using an incomplete Huffman code, i.e., a Huffman code that does not use all of its codewords. Let N be the number of set bits in the block, i.e., 0 <= N <= 6. For N < 3, the block is coded using a VLC. For N = 3, a fixed-length escape code is followed by a 5-bit fixed-length code, and for N > 3, a fixed-length escape code is followed by the complement of the block.
Each rectangular block contains 6 bits of information. Let k be the code associated with the block, where k = sum over i of (bi * 2^i), and bi is the binary value of the i-th bit in natural scan order within the block. Hence 0 <= k < 64. A combination of VLCs and escape codes plus fixed-length codes is used to signal k.
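The index k is simply the 6 bits of the tile interpreted in natural scan order:

```python
def block_index(bits):
    """Compute k = sum(b_i * 2**i) for the 6 bits of a Norm-6 tile."""
    assert len(bits) == 6
    return sum(b << i for i, b in enumerate(bits))
```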
Diff-6 mode
The Norm-6 mode is used to generate the bitplane as described above, and then the Diff-1 operation is applied to the bitplane as described below.
Skip mode
In the row-skip coding mode, all-zero rows are skipped with one bit of overhead. The syntax is as follows: for each row, a single ROWSKIP bit indicates whether the row is skipped; if the row is skipped, the ROWSKIP bit for the next row follows; otherwise (the row is not skipped), ROWBITS (a bit for each macroblock in the row) follow. Thus, if an entire row is zero, a zero bit is sent as the ROWSKIP symbol and ROWBITS is skipped. If there is a set bit in the row, ROWSKIP is set to 1 and the entire row is sent raw (as ROWBITS). Rows are scanned from the top to the bottom of the field or frame.
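The row-skip syntax can be sketched as a decoder loop; `read_bit` stands in for the bitstream:

```python
def decode_rowskip(read_bit, row_mb, col_mb):
    """Decode a ROWSKIP-coded bitplane of row_mb rows by col_mb columns."""
    plane = []
    for _ in range(row_mb):
        if read_bit() == 0:           # ROWSKIP = 0: all-zero row, no ROWBITS
            plane.append([0] * col_mb)
        else:                         # ROWSKIP = 1: row sent raw as ROWBITS
            plane.append([read_bit() for _ in range(col_mb)])
    return plane
```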
Skip column mode
Column skip is the transpose of row skip. Columns are scanned from the left to the right of the field or frame.
Diff-1: inverse difference decoding
When either difference mode (Diff-2 or Diff-6) is used, a bitplane of "difference bits" is first decoded using the corresponding normal mode (Norm-2 or Norm-6). The difference bits are then used to regenerate the original bitplane. The regeneration process is a 2-D DPCM on the binary alphabet. To regenerate the bit b(i, j) at location (i, j), a predicted value bp(i, j) is derived from previously decoded neighboring bits.
For the difference coding modes, the INVERT-based bitwise inversion is not performed. Instead, the INVERT flag is used in a different capacity: it determines the value of the symbol A used in deriving the predicted value. More specifically, A equals 0 if INVERT equals 0, and A equals 1 if INVERT equals 1. The actual value of each bitplane bit is obtained by XORing the predicted value with the decoded difference bit value. Here, b(i, j) is the bit at position (i, j) after final decoding (i.e., after applying Norm-2 or Norm-6 and then XORing the difference bit with its predicted value).
Having described and illustrated various principles of the invention with reference to various embodiments, it will be understood that the various embodiments may be modified in arrangement and detail without departing from such principles. It should be understood that the programs, processes, or methods described herein are not related or limited to any particular type of computing environment, unless indicated otherwise. Various types of general purpose or special purpose computing environments may be used or perform operations in accordance with the teachings described herein. Elements of various embodiments shown in software may be implemented in hardware and vice versa.
In view of the many possible embodiments to which the principles of this invention may be applied, we claim as our invention all such embodiments as may come within the scope and spirit of the following claims and equivalents thereto.

Claims (13)

1. In a computing device implementing a video decoder, the computing device including a processor and a memory, a method of decoding video, the method comprising:
decoding, with the computing device implementing a video decoder, a current direct mode macroblock in a current interlaced bi-directionally predicted field, comprising:
if a co-located macroblock in a temporally future interlaced prediction anchor field is coded using four motion vectors:
determining up to four motion vectors for the co-located macroblock; and
calculating a motion vector to be used in a direct mode scaling operation on the current direct mode macroblock, comprising biasing a dominant polarity of up to four motion vectors of the co-located macroblock in calculating the motion vector by:
determining whether more of the up to four motion vectors of the co-located macroblock reference same polarity reference fields or opposite polarity reference fields;
if more of the up to four motion vectors of the co-located macroblock reference the opposite polarity reference field, calculating motion vectors to be used in a direct mode scaling operation using those of the up to four motion vectors of the co-located macroblock that reference the opposite polarity reference field; and
otherwise, calculating motion vectors to be used in a direct mode scaling operation using those of up to four motion vectors of the co-located macroblock that reference the same polarity reference field; and
reconstructing, with the computing device implementing a video decoder, the current interlaced bi-directionally predicted field using the decoded current direct mode macroblock.
2. The method of claim 1 wherein the future interlaced prediction anchor field has the same polarity as the current interlaced bi-directional prediction field with the current direct mode macroblock.
3. The method of claim 1, wherein said computing a motion vector for use in a direct mode scaling operation further comprises:
the co-located macroblock is counted for a same polarity motion vector of up to four motion vectors and for an opposite polarity motion vector.
4. The method of claim 1, wherein said decoding the current direct mode macroblock further comprises:
using the calculated motion vectors to derive forward and backward motion vectors for use in motion compensation of the current direct mode macroblock; and
motion compensating the current direct mode macroblock using the forward motion vector and the backward motion vector.
5. In a computing device implementing a video encoder, the computing device including a processor and a memory, a method of encoding video, the method comprising:
encoding, with the computing device implementing a video encoder, a current interlaced B-field to produce encoded video information, wherein the encoding includes computing motion vectors to be used in direct mode scaling of a current macroblock in the current interlaced B-field, comprising:
if the collocated macroblock of the temporally future interlaced anchor P-field is a 4MV macroblock with up to four motion vectors:
calculating a dominant polarity motion vector from up to four motion vectors of the co-located macroblock as a motion vector used in the direct mode scaling:
determining whether more of the up to four motion vectors of the co-located macroblock reference same polarity reference fields or opposite polarity reference fields;
if more of the up to four motion vectors of the co-located macroblock reference the opposite polarity reference field, calculating the dominant polarity motion vector using those of the up to four motion vectors of the co-located macroblock that reference the opposite polarity reference field; and
otherwise, calculating the dominant polarity motion vector using those of up to four motion vectors of the co-located macroblock that reference the same polarity reference field; and
outputting, with the computing device implementing a video encoder, the encoded video information in a bitstream.
6. The method of claim 5, wherein the calculating a dominant polarity motion vector comprises:
counting same polarity motion vectors and opposite polarity motion vectors among the up to four motion vectors of the co-located macroblock.
7. The method of claim 5, wherein the calculating a dominant polarity motion vector comprises averaging two motion vectors.
8. The method of claim 5, wherein the calculating a dominant polarity motion vector comprises calculating a median of three or four motion vectors.
9. The method of claim 5, wherein the calculating a dominant polarity motion vector comprises selecting a single available motion vector among the up to four motion vectors.
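Claims 6 through 9 together describe how the dominant-polarity set is reduced to a single motion vector: take the single available vector, average two, or take a component-wise median of three or four. The median-of-four below drops the minimum and maximum and averages the middle two, which is one common codec convention; these claims do not spell out the four-input median, so that choice is an assumption.

```python
# A minimal sketch reducing 1-4 dominant-polarity motion vectors to one
# dominant polarity motion vector, per the alternatives in claims 7-9.

def combine_dominant(mvs):
    """Combine a list of 1-4 (x, y) motion vectors into one vector."""
    def median(values):
        values = sorted(values)
        if len(values) == 3:
            return values[1]
        # Four inputs: drop min and max, average the middle two
        # (an assumed convention; not specified in these claims).
        return (values[1] + values[2]) // 2

    xs = [x for x, _ in mvs]
    ys = [y for _, y in mvs]
    if len(mvs) == 1:                       # claim 9: single available vector
        return mvs[0]
    if len(mvs) == 2:                       # claim 7: average of two
        return ((xs[0] + xs[1]) // 2, (ys[0] + ys[1]) // 2)
    return (median(xs), median(ys))         # claim 8: median of three or four
```

The median is applied independently to the x and y components, which is how component-wise medians are typically computed in motion vector prediction.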
10. The method of claim 5, comprising performing the calculation of the dominant polarity motion vector during processing of a current macroblock of the current interlaced B-field.
11. The method of claim 5, comprising performing the calculation of the dominant polarity motion vector during processing of the co-located macroblock before processing of a current macroblock of the current interlaced B-field begins.
12. The method of claim 5, wherein the future interlaced anchor P-field has the same polarity as the current interlaced B-field.
13. The method of claim 5, wherein the encoding further comprises:
calculating a forward motion vector and a backward motion vector using the dominant polarity motion vector for use in motion compensation of the current macroblock; and
motion compensating the current macroblock using the forward motion vector and the backward motion vector.
HK11105843.5A 2003-09-07 2011-06-09 Video encoding and decoding method HK1152178B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US50108103P 2003-09-07 2003-09-07
US60/501,081 2003-09-07
US10/882,135 US8064520B2 (en) 2003-09-07 2004-06-29 Advanced bi-directional predictive coding of interlaced video
US10/882,135 2004-06-29

Publications (2)

Publication Number Publication Date
HK1152178A1 HK1152178A1 (en) 2012-02-17
HK1152178B true HK1152178B (en) 2013-07-12
