[go: up one dir, main page]

0% found this document useful (0 votes)
63 views57 pages

Video Compression Fundamentals: Pamela C. Cosman

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1/ 57

VIDEO COMPRESSION

FUNDAMENTALS

Pamela C. Cosman

1
Compressing Digital Video
 Exploit spatial redundancy within frames (like JPEG:
transforming, quantizing, variable length coding)
 Exploit temporal redundancy between frames
 Only the sun has changed position between these 2 frames

Previous Frame Current Frame

2
Simplest Temporal Coding - DPCM
 Frame 0 (still image)
 Difference frame 1 = Frame 1
– Frame 0
 Difference frame 2 = Frame 2
– Frame 1
 If no movement in the scene,
all difference frames are 0.
Can be greatly compressed! 0 1 2 3
 If movement, can see it in the
difference images

3
Difference Frames

 Differences between two frames can be


caused by
 Camera motion: the outlines of background or
stationary objects can be seen in the Diff Image
 Object motion: the outlines of moving objects can
be seen in the Diff Image
 Illumination changes (sun rising, headlights, etc.)
 Scene Cuts: Lots of stuff in the Diff Image
 Noise

4
Difference Frames

 If the only difference between two frames is


noise (nothing moved), then you won’t
recognize anything in the Difference Image
 But, if you can see something in the Diff
Image and recognize it, there’s still
correlation in the difference image
 Goal: remove the correlation by
compensating for the motion

5
6
7
8
Types of Motion
Frame n

 Translation: simple
movement of typically
rigid objects Frame n+1 Frame n+2
(Rotation) (Zoom)
 Camera pans vs.
movement of objects  Rotation: spinning
about an axis
 Camera versus object
rotation
 Zooms –in/out
 Camera zoom vs. object
Frame n Frame n+1
zoom (movement in/out)

9
Describing Motion

 Translational
 Move (object) from (x,y) to (x+dx,y+dy)
 Rotational
 Rotate (object) by (r rads) (counter/clockwise)
 Zoom
 Move (in/out) from (object) to increase its size by
(t times)

Which is easiest? Which are we most likely to


encounter?
10
Motion Estimation

 Determining parameters for the motion


descriptions
 For some portion of the frame, estimate its
movement between 2 frames- the current
frame and the reference frame
 What is some portion?
 Individual pixels (all of them)?
 Lines/edges (have to find them first)
 Objects (must define them)
 Uniform regions (just chop up the frame)
11
General Idea

 For a region PC in the current frame, find a


region PR in the search window in reference
frame so that Error(PR,PC) is minimized
 Issues: Error measures, search techniques,
choice of search window, choice of reference
frame, choice of region PC
Current
Portion
Reference of
Search Frame
Frame interest
window
PC

12
Block-based Motion Estimation

 PC is a block of pixels (in the current frame)


 The search window is a rectangular segment
(in the reference frame)

T=1 (reference) T=2 (current)


13
Motion Vectors
 A motion vector (MV) describes the offset between
the location of the block being coded (in the current
frame) and the location of the best-match block in
the reference frame

T=1 (reference) T=2 (current)


14
Motion Compensation

The blocks being predicted are on a grid

1 3 1 2 3 4
2 4
5 6 7 8
6 7 8
5
9 10 11 12
10
9 12
11
14 13 14 15 16

13 15 16

The blocks used for prediction are NOT


15
Motion Vector Search

 1. Mean squared error  Given error measure,


 Select a block in the how to efficiently
reference frame to determine best-match
minimize block in search window?
Σ(b(Bref)-b(Bcurr))2  Full search: best results,
 2. Mean abs. error most computation
 Logarithmic search –
 Select block to
heuristic, faster
minimize
 Hierarchical motion
Σ|b(Bref)-b(Bcurr)|
estimation

16
Motion Vector Search
 Logarithmic Search: First examine
 Full search: Evaluate positions marked 1.
every position in the  Choose best of these (lowest error
search window measure) and examine positions
marked 2 surrounding it
 Choose the best of these, and
examine the positions marked 3
 Final result = best of these

17
Hierarchical Motion Estimation

 Use an averaging filter on the image, then


downsample by a factor of 2
 Conduct a search on the downsampled
image (only ¼ of the size)
 Given the results of the search on the
downsampled image, return to the full
resolution image and refine the search there

18
Motion Compensation

 The standards do not specify HOW the


encoder will find the motion vectors (MVs)
 The encoder can use exhaustive/fast search,
MSE /MAE/other error metric, etc.
 The standard DOES specify
 The allowable syntax for specifying the MVs
 What the decoder will do with them
 What the decoder does is to grab the
indicated block from reference frame, and
glue it in place
19
Standard specifies bit stream
 The video compression  This allows future encoders of
standards define syntax and better performance to remain
semantics for the bit stream compatible with existing
between encoder and decoder decoders.
bit stream
 Also allows for commercially
ENCODER DECODER
secret encoders to be
compatible with standard
decoders
not this Standard defines not this
this Today’s Ho-Hum Today’s
Encoder Decoder
 Encoder is not specified by
MPEG except that it produces Tomorrow’s Nifty
a compliant bit stream Encoder
 Compliant decoder must Today’s decoder
interpret all legal MPEG bit Very secret still works!
streams Encoder

20
Motion Compensation Example

Frame n-1 Frame n

(0,0) (-16,0) (5,0) (0,0)

(0,0) (16,7) (5,2) (0,0)

(20,-24) (0,0) (-20,-18) (0,0) MOTION COMPENSATED


Frame n

21
Objects versus Macroblocks
 Real moving objects will not coincide with
boundaries of macroblocks

background Prediction error


Background well
encoded (no
moving object Moving object well encoded motion vector) Prediction error
with motion vector

 If encoder sends MV=(MotX,MotY), object well


coded, but background poorly coded
 If encoder sends MV=(0,0), background well coded,
but moving object poorly coded
 Either approach is valid
22
Motion Compensation

 This glued together frame is called


the motion compensated frame
 The encoder can also form the difference between
the motion compensated frame and the actual
frame.
 This is called the motion compensated difference
frame
 This difference frame formed using MC should have
less correlation between pixels than the difference
frame formed without using MC

23
Motion Compensated Difference
Frames
 Suppose we are doing lossless coding
 Encoder has sequence of frames: …, F(n-2), F(n-1)
 Next: encode F(n)
 Past frames have been losslessly encoded, so the
decoder knows F(n-1) perfectly already
 Encoder sends the motion vectors for frame F(n)
relative to frame F(n-1), to form motion
compensated frame M(n)
 Encoder knows M(n), Decoder knows M(n)

24
Motion Compensation Example

F(n-1) F(n)
M(n)

(0,0) (-16,0) (5,0) (0,0)

(0,0) (16,7) (5,2) (0,0)

(20,-24) (0,0) (-20,-18) (0,0)


MOTION COMPENSATED
Frame

25
Encoding Difference Frames
 Encoder forms motion  With no motion compensation
compensated diff frame: encoder could do frame diff:
MCD(n) = F(n) – M(n) FD(n) = F(n) – F(n-1)
 Encoder losslessly  Encoder losslessly
encodes MCD(n) encodes FD(n)
 Decoder can then do  Decoder can then do
F(n) = MCD(n) + M(n) F(n) = FD(n) + F(n-1)
→ knows F(n) exactly → knows F(n) exactly

If successive frames are very similar:


fewer bits to send Motion Vectors + MCD(n) instead of FD(n)
fewer bits to send FD(n) instead of F(n)
26
Motion compensated difference frames
 Decoder knows F(n-1) and, once you send the
motion vectors, it knows M(n)
Send FD(n)

Reference Frame Original Frame Difference Image


F(n-1) F(n) FD(n)=F(n)-F(n-1)

Send Motion
Vectors Send MCD(n)
Motion compensated Motion compensated
frame M(n) difference image
MCD(n) =F(n) – M(n)

27
Motion Compensated Difference
Frames
 But we are NOT doing lossless coding
 Encoder has sequence of frames: …, F(n-2), F(n-1)
 Next: encode F(n)
 Past frames have been lossy encoded, so the
decoder has versions …, G(n-2), G(n-1)
 Encoder knows …, G(n-2), G(n-1) also
 Encoder sends the motion vectors for frame F(n)
relative to frame G(n-1), to form motion
compensated frame M(n)
28
Encoding Difference Frames
 Encoder forms motion compensated
difference frame: MCD(n) = F(n) – M(n)
 Encoder lossy encodes MCD(n)
 Call the decoder version MCD*(n)
 If the decoder received MCD(n) exactly,
could do: F(n) = MCD(n) + M(n)
 But with MCD*(n), decoder can do
G(n) = MCD*(n) + M(n)
→ knows F(n) approximately
29
Motion estimation philosophy

 Goal of motion estimation is NOT to provide a


careful analysis of the actual motion
 Goal is to achieve a given quality of representation
of the video while globally minimizing the bit rate
required to send
 The motion information
 The prediction error information
 Most of the time, for a given representation quality
 fewer bits to send MV+MCD(n) instead of sending FD(n)
 fewer bits to send FD(n) instead of sending F(n) itself.

30
Motion Compensation for Chrominance

 Luminance is highly correlated, more so than


chrominance
 The “best” motion vectors are available by
searching in the luminance plane
 Motion vectors for chrominance are not
computed separately, simply scaled as
needed

31
Motion Estimation/Compensation
Summary
 At the encoder:
 For each block in the frame being coded, examine
the search window(s) in the reference frame to
find the best match block (do this for luminance
only)
 Form the MC difference image = original image
minus motion compensated image
 Scale the motion vectors for the chrominance,
form the motion compensated chrominance
frames, and form chrominance difference image

32
Motion Estimation/Compensation
Summary
 At the decoder:
 Decode the reference frames (Y,Cr,Cb)
 For each block in a temporally coded Y frame, use
the motion vector to select a block from the
reference frame and glue it in place
 Add the Y difference image
 For each block in temporally coded Cr,Cb frames,
first scale the motion vector, then do the previous
2 steps with Cr and Cb data

33
Progress of Video Compression

34
Progress of Video Compression

35
Progress of Video Compression

36
Temporal Location of Reference

 The reference frame need not occur before


the temporally coded frames which use it

 Why? Scene changes, allow better matches

37
Flavors of Motion Estimation

 1. Forward predicted blocks: the best-match block


occurs in the reference frame before the block’s
frame
 2. Backward predicted blocks: the best-match block
occurs in the reference frame after the block’s frame
 3. Interpolatively predicted blocks: the best-match
block is the average of the best-match blocks from
reference frames before & after
 The motion compensation direction can be selected
independently for each block in a frame.

38
MPEG Frame Types

 Intra (I) pictures: coded by themselves, as


still images. No temporal coding. No motion
vectors.

39
MPEG Frame Types

 Forward Motion Compensated predicted (P)


pictures – forward motion compensated from
the previous I or P frame

40
MPEG Frame Types
 Motion Compensated interpolated (B) pictures –
forward, backward, and interpolatively motion
compensated from previous/next I/P frames

41
Motion Vector Coding

 How are the motion vectors actually encoded


for transmission to the decoder?
 Start by taking the difference between the current
motion vector and the most recent previous one of
the same type (forward/backward/interpolative)
 Encode the difference using variable length
coding
 Horizontal and vertical components coded
separately

42
MPEG Frame Structure Terminology

 A block contains 8x8 pixels


 The DCT unit
 A macroblock (MB) contains 4 blocks from
the luminance, plus the corresponding
chrominance blocks
 4 blocks from each of Cr/Cb if 4:4:4 format
 2 blocks from each of Cr/Cb if 4:2:2 format
 1 block from each of Cr/Cb if 4:1:1 or 4:2:0 format
 The motion compensation unit

43
MPEG Frame Structure Terminology

 A slice is a collection of macroblocks, tracing


in a raster scan from upper left to lower right
 The resynchronization unit
 A picture is a frame, either progressive (non-
interlaced) or interlaced
 The primary coding unit
 A Group of Pictures (GOP) contains ≥ 1
frame.
 The unit for random access into the sequence

44
MPEG GOP Structure

 A Group of Pictures (GOP) may contain


 All I pictures
 I & P pictures only
 I, P, & B Pictures
 A common GOP format for 30 frames/sec:
 I-picture spacing 15 frames (1/2 second)
 P-picture spacing 3 frames (1/10 second)
I B B P B B P B B P B B P B B I
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

45
Frame Ordering

 Display order (encoder input order):

B B I B B P B B P B B P B B P B B I
-1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

 But consider coding dependencies:


 Frame 2 (B) needs frame 4 (P) to be decoded first, etc.
 So better transmit frame 4 before frame 2

I B B P B B P B B P B B P B B I B B
1 -1 0 4 2 3 7 5 6 10 8 9 13 11 12 16 14 15

46
Types of Coding Modes

 What if the best-match block in the reference


frame is a great match?
 Then the motion vector is all you need to send
 What if it is a terrible match?
 Then don’t use the motion vector at all, just code
the block by itself, with something like JPEG
(called intra mode coding)
 What if it is a so-so match?
 Then you can send the MV, and also send the
frame difference information for that macroblock

47
Coding Mode I (Inter-Coding)

 Inter coding refers to coding with motion vectors

Macro
Block

Previous Current
Frame Frame
Motion Vector

48
Coding Mode II (Intra-Coding)

 INTRA coding refers to coding without motion vectors


 The MB is coded all by itself, in a manner similar to JPEG

Macro
Block

Previous Current
Frame Frame

49
I-Picture Coding

 Two possible coding modes for macroblocks


in I-frames
 Intra- code the 4 blocks with the current
quantization parameters
 Intra with modified quantization: scale the
quantization matrix before coding this MB
 All macroblocks in intra pictures are coded
 Quantized DC coefficients are losslessly
DPCM coded, then Huffman as in JPEG

50
I-Picture Coding

 Use of macroblocks modifies block-scan order:


8 8

 Quantized coefficients are zig-zag scanned and run-


length/Huffman coded as in JPEG
 Very similar to JPEG except (1) scaling Q matrix
separately for each MB, and (2) order of blocks

51
P-Picture Coding: many coding modes

 Motion compensated  MV = (0,0), just send


coding: Motion Vector only difference block
 Motion compensated  MV=(0,0), just send diff
coding: MV plus difference block, with modified
macroblock quantization scaling
 Intra: the MB is coded
Motion DCT with DCTs (no
Vector
difference is computed)
 Motion compensation: MV  Intra with modified
& difference MB with quantization scaling
modified quant. scaling

52
How to choose a coding mode?
 MPEG does not specify how to choose mode
 Full search = try everything…the different
possibilities will lead to different rate/distortion
outcomes for that macroblock

distortion

• MV, no difference


• MV, plus difference
• • Intra

rate

53
How to choose a coding mode?

 Tree search: use a decision tree


 For example:
 First find the best-match block in the search window. If
it’s a very good match, then use motion compensation.
Otherwise, don’t.
 If you decided to use motion compensation, then need
to decide whether or not to send the difference block as
well. Make decision based on how good a match it is.
 If you decided to send the difference block, then have to
decide whether or not to scale the quantization
parameter… check the current rate usage…

54
B-Picture Coding

 B pictures have even more possible modes:


 Forward prediction MV, no difference block
 Forward prediction MV, plus difference block
 Backward prediction MV, no difference block
 Backward prediction MV, plus difference block
 Interpolative prediction MV, no difference block
 Interpolative prediction MV, plus difference block
 Intra coding
 Some of above with modified Quant parameter

55
Group of Pictures

 IIIII…: Every picture is intra-coded.


 Fully decodable without reference to any other picture
 Editing is straightforward
 Requires about 2.5 more bit rate than bidirectional
 IBBPBBPB…: Forward and bidirectional
 Best compression factor
 Needs large decoder memory
 Hard to edit
 Most useful for final delivery of post-produced material
(e.g., broadcast) because no editing requirement

56
Group of Pictures

 IPPPPIPP…: Forward predicted only.


 Needs less decoder memory
 IBIBIB…: bidirectional compromise
 Some of the bit rate advantage of bidirectional coding
 Not nearly the full latency penalty of bidirectional
 Editable with moderate processing.
For example, if the video after a B picture needs to be
deleted, the B frame would not be decodable.
Solution is to decode the B frame first, re-encode it using
forward prediction only. Some quality loss.

57

You might also like