Video Compression Fundamentals: Pamela C. Cosman

VIDEO COMPRESSION
FUNDAMENTALS
Pamela C. Cosman
1
Compressing Digital Video
 Exploit spatial redundancy within frames (like JPEG:
transforming, quantizing, variable length coding)
 Exploit temporal redundancy between frames
 Only the sun has changed position between these 2 frames
Previous Frame Current Frame
2
Simplest Temporal Coding - DPCM
 Frame 0 (still image)
 Difference frame 1 = Frame 1
– Frame 0
 Difference frame 2 = Frame 2
– Frame 1
 If no movement in the scene,
all difference frames are 0.
Can be greatly compressed! 0 1 2 3
 If movement, can see it in the
difference images
3
Difference Frames
 Differences between two frames can be

caused by
 Camera motion: the outlines of background or
stationary objects can be seen in the Diff Image
 Object motion: the outlines of moving objects can
be seen in the Diff Image
 Illumination changes (sun rising, headlights, etc.)
 Scene Cuts: Lots of stuff in the Diff Image
 Noise
4
Difference Frames
 If the only difference between two frames is

noise (nothing moved), then you won’t
recognize anything in the Difference Image
 But, if you can see something in the Diff
Image and recognize it, there’s still
correlation in the difference image
 Goal: remove the correlation by
compensating for the motion
5
6
7
8
Types of Motion
Frame n
 Translation: simple
movement of typically
rigid objects Frame n+1 Frame n+2
(Rotation) (Zoom)
 Camera pans vs.
movement of objects  Rotation: spinning
about an axis
 Camera versus object
rotation
 Zooms –in/out
 Camera zoom vs. object
Frame n Frame n+1
zoom (movement in/out)
9
Describing Motion
 Translational
 Move (object) from (x,y) to (x+dx,y+dy)
 Rotational
 Rotate (object) by (r rads) (counter/clockwise)
 Zoom
 Move (in/out) from (object) to increase its size by
(t times)
Which is easiest? Which are we most likely to

encounter?
10
Motion Estimation
 Determining parameters for the motion

descriptions
 For some portion of the frame, estimate its
movement between 2 frames- the current
frame and the reference frame
 What is some portion?
 Individual pixels (all of them)?
 Lines/edges (have to find them first)
 Objects (must define them)
 Uniform regions (just chop up the frame)
11
General Idea
 For a region PC in the current frame, find a

region PR in the search window in reference
frame so that Error(PR,PC) is minimized
 Issues: Error measures, search techniques,
choice of search window, choice of reference
frame, choice of region PC
Current
Portion
Reference of
Search Frame
Frame interest
window
PC
12
Block-based Motion Estimation
 PC is a block of pixels (in the current frame)

 The search window is a rectangular segment
(in the reference frame)
T=1 (reference) T=2 (current)

13
Motion Vectors
 A motion vector (MV) describes the offset between
the location of the block being coded (in the current
frame) and the location of the best-match block in
the reference frame
T=1 (reference) T=2 (current)

14
Motion Compensation
The blocks being predicted are on a grid
1 3 1 2 3 4
2 4
5 6 7 8
6 7 8
5
9 10 11 12
10
9 12
11
14 13 14 15 16
13 15 16
The blocks used for prediction are NOT

15
Motion Vector Search
 1. Mean squared error  Given error measure,

 Select a block in the how to efficiently
reference frame to determine best-match
minimize block in search window?
Σ(b(Bref)-b(Bcurr))2  Full search: best results,
 2. Mean abs. error most computation
 Logarithmic search –
 Select block to
heuristic, faster
minimize
 Hierarchical motion
Σ|b(Bref)-b(Bcurr)|
estimation
16
Motion Vector Search
 Logarithmic Search: First examine
 Full search: Evaluate positions marked 1.
every position in the  Choose best of these (lowest error
search window measure) and examine positions
marked 2 surrounding it
 Choose the best of these, and
examine the positions marked 3
 Final result = best of these
17
Hierarchical Motion Estimation
 Use an averaging filter on the image, then

downsample by a factor of 2
 Conduct a search on the downsampled
image (only ¼ of the size)
 Given the results of the search on the
downsampled image, return to the full
resolution image and refine the search there
18
Motion Compensation
 The standards do not specify HOW the

encoder will find the motion vectors (MVs)
 The encoder can use exhaustive/fast search,
MSE /MAE/other error metric, etc.
 The standard DOES specify
 The allowable syntax for specifying the MVs
 What the decoder will do with them
 What the decoder does is to grab the
indicated block from reference frame, and
glue it in place
19
Standard specifies bit stream
 The video compression  This allows future encoders of
standards define syntax and better performance to remain
semantics for the bit stream compatible with existing
between encoder and decoder decoders.
bit stream
 Also allows for commercially
ENCODER DECODER
secret encoders to be
compatible with standard
decoders
not this Standard defines not this
this Today’s Ho-Hum Today’s
Encoder Decoder
 Encoder is not specified by
MPEG except that it produces Tomorrow’s Nifty
a compliant bit stream Encoder
 Compliant decoder must Today’s decoder
interpret all legal MPEG bit Very secret still works!
streams Encoder
20
Motion Compensation Example
Frame n-1 Frame n
(0,0) (-16,0) (5,0) (0,0)
(0,0) (16,7) (5,2) (0,0)
(20,-24) (0,0) (-20,-18) (0,0) MOTION COMPENSATED

Frame n
21
Objects versus Macroblocks
 Real moving objects will not coincide with
boundaries of macroblocks
background Prediction error

Background well
encoded (no
moving object Moving object well encoded motion vector) Prediction error
with motion vector
 If encoder sends MV=(MotX,MotY), object well

coded, but background poorly coded
 If encoder sends MV=(0,0), background well coded,
but moving object poorly coded
 Either approach is valid
22
Motion Compensation
 This glued together frame is called

the motion compensated frame
 The encoder can also form the difference between
the motion compensated frame and the actual
frame.
 This is called the motion compensated difference
frame
 This difference frame formed using MC should have
less correlation between pixels than the difference
frame formed without using MC
23
Motion Compensated Difference
Frames
 Suppose we are doing lossless coding
 Encoder has sequence of frames: …, F(n-2), F(n-1)
 Next: encode F(n)
 Past frames have been losslessly encoded, so the
decoder knows F(n-1) perfectly already
 Encoder sends the motion vectors for frame F(n)
relative to frame F(n-1), to form motion
compensated frame M(n)
 Encoder knows M(n), Decoder knows M(n)
24
Motion Compensation Example
F(n-1) F(n)
M(n)
(0,0) (-16,0) (5,0) (0,0)
(0,0) (16,7) (5,2) (0,0)
(20,-24) (0,0) (-20,-18) (0,0)

MOTION COMPENSATED
Frame
25
Encoding Difference Frames
 Encoder forms motion  With no motion compensation
compensated diff frame: encoder could do frame diff:
MCD(n) = F(n) – M(n) FD(n) = F(n) – F(n-1)
 Encoder losslessly  Encoder losslessly
encodes MCD(n) encodes FD(n)
 Decoder can then do  Decoder can then do
F(n) = MCD(n) + M(n) F(n) = FD(n) + F(n-1)
→ knows F(n) exactly → knows F(n) exactly
If successive frames are very similar:

fewer bits to send Motion Vectors + MCD(n) instead of FD(n)
fewer bits to send FD(n) instead of F(n)
26
Motion compensated difference frames
 Decoder knows F(n-1) and, once you send the
motion vectors, it knows M(n)
Send FD(n)
Reference Frame Original Frame Difference Image

F(n-1) F(n) FD(n)=F(n)-F(n-1)
Send Motion
Vectors Send MCD(n)
Motion compensated Motion compensated
frame M(n) difference image
MCD(n) =F(n) – M(n)
27
Motion Compensated Difference
Frames
 But we are NOT doing lossless coding
 Encoder has sequence of frames: …, F(n-2), F(n-1)
 Next: encode F(n)
 Past frames have been lossy encoded, so the
decoder has versions …, G(n-2), G(n-1)
 Encoder knows …, G(n-2), G(n-1) also
 Encoder sends the motion vectors for frame F(n)
relative to frame G(n-1), to form motion
compensated frame M(n)
28
Encoding Difference Frames
 Encoder forms motion compensated
difference frame: MCD(n) = F(n) – M(n)
 Encoder lossy encodes MCD(n)
 Call the decoder version MCD*(n)
 If the decoder received MCD(n) exactly,
could do: F(n) = MCD(n) + M(n)
 But with MCD*(n), decoder can do
G(n) = MCD*(n) + M(n)
→ knows F(n) approximately
29
Motion estimation philosophy
 Goal of motion estimation is NOT to provide a

careful analysis of the actual motion
 Goal is to achieve a given quality of representation
of the video while globally minimizing the bit rate
required to send
 The motion information
 The prediction error information
 Most of the time, for a given representation quality
 fewer bits to send MV+MCD(n) instead of sending FD(n)
 fewer bits to send FD(n) instead of sending F(n) itself.
30
Motion Compensation for Chrominance
 Luminance is highly correlated, more so than

chrominance
 The “best” motion vectors are available by
searching in the luminance plane
 Motion vectors for chrominance are not
computed separately, simply scaled as
needed
31
Motion Estimation/Compensation
Summary
 At the encoder:
 For each block in the frame being coded, examine
the search window(s) in the reference frame to
find the best match block (do this for luminance
only)
 Form the MC difference image = original image
minus motion compensated image
 Scale the motion vectors for the chrominance,
form the motion compensated chrominance
frames, and form chrominance difference image
32
Motion Estimation/Compensation
Summary
 At the decoder:
 Decode the reference frames (Y,Cr,Cb)
 For each block in a temporally coded Y frame, use
the motion vector to select a block from the
reference frame and glue it in place
 Add the Y difference image
 For each block in temporally coded Cr,Cb frames,
first scale the motion vector, then do the previous
2 steps with Cr and Cb data
33
Progress of Video Compression
34
35
36
Temporal Location of Reference
 The reference frame need not occur before

the temporally coded frames which use it
 Why? Scene changes, allow better matches
37
Flavors of Motion Estimation
 1. Forward predicted blocks: the best-match block

occurs in the reference frame before the block’s
frame
 2. Backward predicted blocks: the best-match block
occurs in the reference frame after the block’s frame
 3. Interpolatively predicted blocks: the best-match
block is the average of the best-match blocks from
reference frames before & after
 The motion compensation direction can be selected
independently for each block in a frame.
38
MPEG Frame Types
 Intra (I) pictures: coded by themselves, as

still images. No temporal coding. No motion
vectors.
39
MPEG Frame Types
 Forward Motion Compensated predicted (P)

pictures – forward motion compensated from
the previous I or P frame
40
MPEG Frame Types
 Motion Compensated interpolated (B) pictures –
forward, backward, and interpolatively motion
compensated from previous/next I/P frames
41
Motion Vector Coding
 How are the motion vectors actually encoded

for transmission to the decoder?
 Start by taking the difference between the current
motion vector and the most recent previous one of
the same type (forward/backward/interpolative)
 Encode the difference using variable length
coding
 Horizontal and vertical components coded
separately
42
MPEG Frame Structure Terminology
 A block contains 8x8 pixels

 The DCT unit
 A macroblock (MB) contains 4 blocks from
the luminance, plus the corresponding
chrominance blocks
 4 blocks from each of Cr/Cb if 4:4:4 format
 2 blocks from each of Cr/Cb if 4:2:2 format
 1 block from each of Cr/Cb if 4:1:1 or 4:2:0 format
 The motion compensation unit
43
MPEG Frame Structure Terminology
 A slice is a collection of macroblocks, tracing

in a raster scan from upper left to lower right
 The resynchronization unit
 A picture is a frame, either progressive (non-
interlaced) or interlaced
 The primary coding unit
 A Group of Pictures (GOP) contains ≥ 1
frame.
 The unit for random access into the sequence
44
MPEG GOP Structure
 A Group of Pictures (GOP) may contain

 All I pictures
 I & P pictures only
 I, P, & B Pictures
 A common GOP format for 30 frames/sec:
 I-picture spacing 15 frames (1/2 second)
 P-picture spacing 3 frames (1/10 second)
I B B P B B P B B P B B P B B I
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
45
Frame Ordering
 Display order (encoder input order):
B B I B B P B B P B B P B B P B B I
-1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
 But consider coding dependencies:

 Frame 2 (B) needs frame 4 (P) to be decoded first, etc.
 So better transmit frame 4 before frame 2
I B B P B B P B B P B B P B B I B B
1 -1 0 4 2 3 7 5 6 10 8 9 13 11 12 16 14 15
46
Types of Coding Modes
 What if the best-match block in the reference

frame is a great match?
 Then the motion vector is all you need to send
 What if it is a terrible match?
 Then don’t use the motion vector at all, just code
the block by itself, with something like JPEG
(called intra mode coding)
 What if it is a so-so match?
 Then you can send the MV, and also send the
frame difference information for that macroblock
47
Coding Mode I (Inter-Coding)
 Inter coding refers to coding with motion vectors
Macro
Block
Previous Current
Frame Frame
Motion Vector
48
Coding Mode II (Intra-Coding)
 INTRA coding refers to coding without motion vectors

 The MB is coded all by itself, in a manner similar to JPEG
Macro
Block
Previous Current
Frame Frame
49
I-Picture Coding
 Two possible coding modes for macroblocks

in I-frames
 Intra- code the 4 blocks with the current
quantization parameters
 Intra with modified quantization: scale the
quantization matrix before coding this MB
 All macroblocks in intra pictures are coded
 Quantized DC coefficients are losslessly
DPCM coded, then Huffman as in JPEG
50
I-Picture Coding
 Use of macroblocks modifies block-scan order:

8 8
 Quantized coefficients are zig-zag scanned and run-

length/Huffman coded as in JPEG
 Very similar to JPEG except (1) scaling Q matrix
separately for each MB, and (2) order of blocks
51
P-Picture Coding: many coding modes
 Motion compensated  MV = (0,0), just send

coding: Motion Vector only difference block
 Motion compensated  MV=(0,0), just send diff
coding: MV plus difference block, with modified
macroblock quantization scaling
 Intra: the MB is coded
Motion DCT with DCTs (no
Vector
difference is computed)
 Motion compensation: MV  Intra with modified
& difference MB with quantization scaling
modified quant. scaling
52
How to choose a coding mode?
 MPEG does not specify how to choose mode
 Full search = try everything…the different
possibilities will lead to different rate/distortion
outcomes for that macroblock
distortion
•
• MV, no difference
•
•
• MV, plus difference
• • Intra
•
rate
53
How to choose a coding mode?
 Tree search: use a decision tree

 For example:
 First find the best-match block in the search window. If
it’s a very good match, then use motion compensation.
Otherwise, don’t.
 If you decided to use motion compensation, then need
to decide whether or not to send the difference block as
well. Make decision based on how good a match it is.
 If you decided to send the difference block, then have to
decide whether or not to scale the quantization
parameter… check the current rate usage…
54
B-Picture Coding
 B pictures have even more possible modes:

 Forward prediction MV, no difference block
 Forward prediction MV, plus difference block
 Backward prediction MV, no difference block
 Backward prediction MV, plus difference block
 Interpolative prediction MV, no difference block
 Interpolative prediction MV, plus difference block
 Intra coding
 Some of above with modified Quant parameter
55
Group of Pictures
 IIIII…: Every picture is intra-coded.

 Fully decodable without reference to any other picture
 Editing is straightforward
 Requires about 2.5 more bit rate than bidirectional
 IBBPBBPB…: Forward and bidirectional
 Best compression factor
 Needs large decoder memory
 Hard to edit
 Most useful for final delivery of post-produced material
(e.g., broadcast) because no editing requirement
56
Group of Pictures
 IPPPPIPP…: Forward predicted only.

 Needs less decoder memory
 IBIBIB…: bidirectional compromise
 Some of the bit rate advantage of bidirectional coding
 Not nearly the full latency penalty of bidirectional
 Editable with moderate processing.
For example, if the video after a B picture needs to be
deleted, the B frame would not be decodable.
Solution is to decode the B frame first, re-encode it using
forward prediction only. Some quality loss.
57

Video Compression Fundamentals: Pamela C. Cosman

Uploaded by

Document Informationclick to expand document information

Copyright:

Available Formats

Video Compression Fundamentals: Pamela C. Cosman

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Video Compression Fundamentals: Pamela C. Cosman

Uploaded by

Copyright:

Available Formats

VIDEO COMPRESSION

Previous Frame Current Frame

 Differences between two frames can be

 If the only difference between two frames is

Which is easiest? Which are we most likely to

 Determining parameters for the motion

 For a region PC in the current frame, find a

 PC is a block of pixels (in the current frame)

T=1 (reference) T=2 (current)

T=1 (reference) T=2 (current)

The blocks being predicted are on a grid

The blocks used for prediction are NOT

 1. Mean squared error  Given error measure,

 Use an averaging filter on the image, then

 The standards do not specify HOW the

Frame n-1 Frame n

(0,0) (-16,0) (5,0) (0,0)

(0,0) (16,7) (5,2) (0,0)

(20,-24) (0,0) (-20,-18) (0,0) MOTION COMPENSATED

background Prediction error

 If encoder sends MV=(MotX,MotY), object well

 This glued together frame is called

(0,0) (-16,0) (5,0) (0,0)

(0,0) (16,7) (5,2) (0,0)

(20,-24) (0,0) (-20,-18) (0,0)

If successive frames are very similar:

Reference Frame Original Frame Difference Image

 Goal of motion estimation is NOT to provide a

 Luminance is highly correlated, more so than

 The reference frame need not occur before

 Why? Scene changes, allow better matches

 1. Forward predicted blocks: the best-match block

 Intra (I) pictures: coded by themselves, as

 Forward Motion Compensated predicted (P)

 How are the motion vectors actually encoded

 A block contains 8x8 pixels

 A slice is a collection of macroblocks, tracing

 A Group of Pictures (GOP) may contain

 Display order (encoder input order):

 But consider coding dependencies:

 What if the best-match block in the reference

 Inter coding refers to coding with motion vectors

 INTRA coding refers to coding without motion vectors

 Two possible coding modes for macroblocks

 Use of macroblocks modifies block-scan order:

 Quantized coefficients are zig-zag scanned and run-

 Motion compensated  MV = (0,0), just send

 Tree search: use a decision tree

 B pictures have even more possible modes:

 IIIII…: Every picture is intra-coded.

 IPPPPIPP…: Forward predicted only.

You might also like