CS3343: Digital Media Production
Audio and Video
Compression
Brown
Lecture Topics
• Basic Compression
• Audio and hearing
• Audio compression
• Video compression
– Transform coding (DCT)
• Codecs
Outcome: After this lecture you should have a much better idea of audio and
video representation, and how it is compressed. We will also talk about
a few standards.
Basic Compression
Data Compression
• Two categories
– Information Preserving
• Error free compression
• Original data can be recovered completely
– Lossy
• Original data is approximated
• Less than perfect
• Generally allows much higher compression
Basics
• Data Compression
– Process of reducing the amount of data
required to represent a given quantity of
information
• Data vs. Information
– Data and Information are not the same thing
– Data
• the means by which information is conveyed
• various amounts of data can convey the same
information
– Information
• “A signal that contains no uncertainty”
Redundancy
• Redundancy
– “data” that provides no relevant information
– “data” that restates what is already known
• For example
– Consider that N1 and N2 denote the
number of “data units” in two sets that
represent the same information
– where Cr is the “Compression Ratio”
• Cr = N1 / N2
EXAMPLE
N1 = 10 and N2 = 1 data units can encode the same information
Compression Ratio
Cr = N1/N2 = 10 (or 10:1)
Implying 90% of the data in N1 is redundant
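The ratio and redundancy above are easy to compute; a minimal sketch (the function name is my own):

```python
def compression_stats(n1, n2):
    """Compression ratio Cr = N1/N2 and the fraction of N1 that is
    redundant, for two encodings of the same information."""
    cr = n1 / n2
    redundancy = 1 - 1 / cr
    return cr, redundancy

cr, r = compression_stats(10, 1)   # the example above: Cr = 10, 90% redundant
```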
Variable Length Coding (1)
Our binary system (called natural binary) is not always that
good at representing data from a compression point of view
Consider the following string of data:
abaaaaabbbcccccaaaaaaaabbbbdaaaaaaa
There are 4 different pieces of information (let’s say 4 symbols)
a, b, c, d
In natural binary we would need at least 2 bits to represent this,
assigning bits as follows:
a=00, b=01, c=10, d=11
There are 35 pieces of data, that is 35 * 2 bits = 70 bits
Variable Length Coding (2)
Now, consider the occurrence of each symbol: a,b,c,d
abaaaaabbbcccccaaaaaaaabbbbdaaaaaaa
a = 21/35 (60%)
b = 8/35 (23%)
c = 5/35 (14%)
d = 1/35 (3%)
Variable Length Coding (3)
Idea of variable length coding: assign fewer bits to
more frequent symbols, more bits to less frequent
symbols
Bit assignment using VLC:
a=1, b=01, c=001, d=0001
Now, compute the number of bits used to encode the same
data:
21*(1) + 8*(2) + 5*(3) + 1*(4) = 56 bits
So, we have a compression ratio of 70:56 or 1.25,
meaning 20% of the data using natural binary encoding
is redundant
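The totals can be checked directly, counting symbols in the string and summing the codeword lengths (variable names are my own):

```python
from collections import Counter

data = "abaaaaabbbcccccaaaaaaaabbbbdaaaaaaa"
code = {"a": "1", "b": "01", "c": "001", "d": "0001"}   # the VLC above

counts = Counter(data)                                   # a:21, b:8, c:5, d:1
vlc_bits = sum(counts[s] * len(code[s]) for s in counts)
fixed_bits = 2 * len(data)                               # natural binary: 2 bits/symbol
print(fixed_bits, vlc_bits)                              # 70 56
```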
Huffman encoding
• This is an example of error free coding, the information is
completely the same, the data is different
• This idea of variable length coding is used in many places
– Area codes for cities in large countries (China/India)
• Prof. David A. Huffman developed an algorithm to take a data set
and compute its “optimal” encoding (bit assignment)
– He developed this as a student at MIT
• This is very commonly applied to many compression techniques as a
final stage
• There are other VLC techniques, such as Arithmetic coding and
LZW coding (ZIP)
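A minimal sketch of Huffman's algorithm using a heap, repeatedly merging the two least-frequent nodes (function and variable names are my own):

```python
import heapq
from collections import Counter

def huffman_code(data):
    """Return a prefix-free code (symbol -> bitstring) built from
    the symbol frequencies in `data`."""
    heap = [[freq, i, {sym: ""}] for i, (sym, freq)
            in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)            # two least-frequent nodes
        hi = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo[2].items()}
        merged.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], lo[1], merged])
    return heap[0][2]

code = huffman_code("abaaaaabbbcccccaaaaaaaabbbbdaaaaaaa")
# The most frequent symbol ('a') gets the shortest codeword.
```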
Run-length-encoding (RLE)
• Consider the following
aaaaaaaabbbaaaaaaaabbbba
We could instead represent this by “runs”
8a,3b,8a,4b,1a
RLE is often more compact, especially when the data contains
lots of runs of the same symbol(s)
RLE is also lossless, you can always reconstruct the
original
Fax machines transmit RLE-encoded linescans of
black-and-white documents
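The encode/decode pair is tiny in Python (a sketch; real fax codecs use bit-level codes, and the function names here are my own):

```python
from itertools import groupby

def rle_encode(s):
    """Lossless run-length encoding: 'aaab' -> [(3, 'a'), (1, 'b')]."""
    return [(len(list(g)), ch) for ch, g in groupby(s)]

def rle_decode(runs):
    return "".join(n * ch for n, ch in runs)

runs = rle_encode("aaaaaaaabbbaaaaaaaabbbba")
print(runs)   # [(8, 'a'), (3, 'b'), (8, 'a'), (4, 'b'), (1, 'a')]
```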
Quantization (Lossy)
Another thing we can do is actually quantize the data such that it
cannot be recovered completely
Consider the following string of numbers:
ORIGINAL = {10, 12, 15, 18, 20, 3, 5, 2, 13}
(all the numbers are unique so Huffman coding won’t help)
Integer divide this by 5 (i.e. quantize it)
QUANTIZED = {2, 2, 3, 3, 4, 0, 1, 0, 2}
The natural binary range is smaller, and we could also use Huffman
encoding to get a bit more compression.
Of course, the values are not the same; on "reconstruction" (multiplying by
5) we get only an approximation of the original:
RECOVERED = {10, 10, 15, 15, 20, 0, 5, 0, 10}
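The quantize/dequantize round trip above, as a sketch (function names are my own):

```python
def quantize(values, step):
    """Integer-divide each value by `step` -- the lossy step."""
    return [v // step for v in values]

def dequantize(qvalues, step):
    """Reconstruction only approximates the original."""
    return [q * step for q in qvalues]

original = [10, 12, 15, 18, 20, 3, 5, 2, 13]
print(quantize(original, 5))                  # [2, 2, 3, 3, 4, 0, 1, 0, 2]
print(dequantize(quantize(original, 5), 5))   # [10, 10, 15, 15, 20, 0, 5, 0, 10]
```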
Lossy vs. Lossless
• For things like text documents and computer data files, lossy
compression doesn’t make sense
– An approximation of the original is no good!
• But for data like audio or visual, small errors are not easily
detectable by our senses
– An approximation is acceptable
• This is one reason we can get significant compression of images
and audio, vs. other types of data
– Lossless 10:1 is typically possible
– Lossy 300:1 is possible with no significant perceptual loss
• With lossy we can even talk about “quality”
• The more like the original the higher the quality
• The less like the original, the lower the quality
Audio
Audio
Sound is a compression wave made by moving particles in a medium.
Sound can be transmitted through gases, plasma, liquids, and even solids.
Sound waves are important, because
they are perceived/heard by us.
Sound Representation
• We can consider sound as amplitudes
over time
Sound Waves
• We can decompose a sound wave into frequencies and
their amplitude
[Figure: the original waveform, its decomposition into 10 frequencies, and the result of adding the frequencies back up]
Human Response
[Figure: human hearing response by frequency, from the cannot-hear threshold up to the pain threshold]
Human response is not uniform. We have "better" hearing at different frequencies.
Measure of sound
• Our range is huge, from ~0.0002 to 130
– This is measured in terms of sound pressure, in micropascals (µPa)
– 20 µPa is generally considered the lower limit of our hearing
• We use a log scale (decibels=dB) to represent this range
– dB is not a unit of pressure; the unit is still µPa
• X dB is expressed as a ratio, between a reference (R)
and another pressure (P2)
20 log(P2/R) = X dB
Examples in dB
Example:
A sound pressure, P2, at 86 dB means:
20 log(P2/R) = 86 dB // the factor 20 comes from the squared (power) ratio
log(P2/R) = 4.3 // divide both sides by 20
P2/R = 10^4.3 // raise 10 to both sides to remove the log
This means that P2 is 10^4.3 (~20,000) times stronger than the reference sound
A sound pressure, P2, at 10 dB means:
20 log(P2/R) = 10 dB
log(P2/R) = 0.5 // divide both sides by 20
P2/R = 10^0.5 // raise 10 to both sides to remove the log
This means that P2 is 10^0.5 (~3.2) times stronger than the reference sound
A sound pressure at 0 dB means:
20 log(P2/R) = 0 dB
log(P2/R) = 0
P2/R = 1
This means P2 is the same as the reference. So, 0 dB is the starting point.
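The dB conversions above can be sketched in a couple of lines (function names are my own):

```python
import math

def ratio_from_db(db):
    """Invert X = 20*log10(P2/R): pressure ratio P2/R for a given dB value."""
    return 10 ** (db / 20)

def db_from_ratio(ratio):
    return 20 * math.log10(ratio)

print(round(ratio_from_db(86)))   # ~19953, i.e. roughly 20,000x the reference
print(ratio_from_db(0))           # 1.0 -- same as the reference
```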
dB in different Frequencies
If we change the reference level for each frequency (because our response is different),
we have dB defined differently for each frequency. These are the dB curves based on human hearing
experiments from the 1930s, known as the Fletcher-Munson curves.
Sound Examples
Equalization
Not surprising that when we manipulate audio, we do it at varying frequencies.
We rarely manipulate the compression wave directly.
Digital Representation
• Wave-form coding
• Just sample the waveform in time
– We call this pulse-code modulation (PCM)
• Two values to control
• Number of samples to capture per second
• Quantization values (number of bits used to represent that sample)
[Figure: a sound waveform (amplitude over time) and its digitized sample points, uniform in time]
Base-Line High-Quality Sampling
• Humans can hear frequencies from about 16Hz to
20,000Hz
– That is 16 oscillations per second (low pitch - bass)
– To 20,000 oscillations per second (very high pitch – high treble)
• To capture these frequencies we need to sample at at least
twice the highest frequency
– This comes from sampling theory, known as the Nyquist rate
– So, 44,100 samples per second (44.1 kHz) is considered the baseline
• 16 bits per sample allows -32,768 to +32,767 discrete
quantities to be captured
– This seems to be sufficient bits to quantize the amplitude,
although you can read online that people will debate this
http://www.cs.columbia.edu/~hgs/audio/44.1.html
Uncompressed Audio
• CD audio uses:
– 44,100 samples per second
– 16 bits per sample
– 2 channels (stereo)
= 1,411,200 bit/s = 1,411.2 kbit/s
(Around 10 MB per minute of recording)
• This is often considered the “gold standard” for
audio encoding
– That is, we would say this is uncompressed
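The CD numbers above can be checked with simple arithmetic:

```python
# CD-quality PCM: compute the raw bitrate and storage per minute.
sample_rate = 44_100      # samples per second, per channel
bits_per_sample = 16
channels = 2              # stereo

bitrate = sample_rate * bits_per_sample * channels   # bits per second
mb_per_minute = bitrate * 60 / 8 / 1_000_000         # megabytes per minute
print(bitrate, round(mb_per_minute, 1))              # 1411200 10.6
```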
Cheap Compression (Lossy)
• Less sampling
– There will be frequencies you can’t reproduce in the sampled
signal
– Maybe OK for some sounds
• Less bits
– 8-bit, 4-bit quantization
• The resolution of the amplitudes will be coarser
• Maybe OK for some sounds?
Some Tricks for Compression
• aLaw and µLaw encoding
– Non-linear Amplitude Encoding
• Assign more bits to low amplitudes, fewer bits to higher
amplitudes
– Average bit is still 8-bits per sample, but quantization of low
amplitudes have more bits
– Can call these “log compressors”
• 8 kHz temporal sampling with aLaw or µLaw bit
assignment is often used for phone audio
– It is claimed to reduce perceivable noise in the quantization
– The quantization scheme is fixed (based on empirical studies)
• aLaw or µLaw are two different quantization standards, but the
same idea
http://en.wikipedia.org/wiki/Mu-law
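The continuous µ-law companding curve can be sketched as follows; real codecs use a fixed 8-bit table derived from it, and µ = 255 is the standard telephony value (function names are my own):

```python
import math

MU = 255  # standard µ for telephony

def mu_law_compress(x):
    """Compand a sample x in [-1, 1]: small amplitudes are expanded so
    they get relatively more of the quantizer's levels."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse of the companding curve."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

print(round(mu_law_compress(0.01), 2))  # 0.23 -- a 1% amplitude maps to ~23% of the range
```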
Adaptive Resolution PCM (ADPCM)
• Like log compressor
– But add an additional trick
• Encode the difference of the samples, not
just the sample
– We call this delta-pulse-code-modulation
Assume the two samples are values 13 and 14
PCM would record 13 and 14
Delta-PCM could record “1” (i.e. 14-13=1)
ADPCM claims to get almost 50% better
compression (with similar quality) than µLaw.
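The delta step above can be sketched as a lossless encode/decode pair (function names are my own):

```python
def delta_encode(samples):
    """Keep the first sample, then store successive differences."""
    return samples[:1] + [b - a for a, b in zip(samples, samples[1:])]

def delta_decode(deltas):
    out = deltas[:1]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

print(delta_encode([13, 14, 15, 13]))   # [13, 1, 1, -2] -- small values need fewer bits
```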
Variations on ADPCM
• Encoding the difference between two samples is a
simple form of "predictive coding", where you try to
predict what the next value is going to be, and encode
only the difference
• There are better predictors (like looking at a longer
history of previous samples instead of just one – i.e.
ADPCM)
• GSM 06.10 is a predictive coder standard for voice
Note that delta coders and predictive coders do not have
to be lossy. That is, if they encode the difference
exactly, no data is lost
Any of these look familiar?
Exploiting Psychoauditory Properties
• For even better compression we can
exploit properties of the human auditory
system
• Loudness
– Quiet sound around a relatively loud sound
won’t be perceived
• Frequency
– Low sound around a relatively high sound
won’t be perceived
Temporal Masking
• Masking can affect our perception of sound up to 200
milliseconds after a “masking event”
(mask=loud noise or high-frequency)
– Call this Post-Masking
• And even up to 20 milliseconds before a mask
– Call this Pre-Masking (amazing!)
– That’s right, if there is a loud sound at time T, it will affect your
perception at time T minus 20ms --- amazing!
Compressors can exploit this
[Figure: frequency bands after masking; other bands are set to 0]
MP3
• Actually part of the MPEG1 standard for audio
• Input data is at 44.1 kHz sampling and 16 bits
– (other sampling rates can be used, but 44.1 kHz is most common)
• Algorithm cuts the audio into “frames”
– Each lasts a fraction of a second
• Transforms the frame into 32 frequency bands
– Looks at these bands and determines how to distribute bits
– Bands with more activity need more bits
• Apply psychoauditory masking: throw away bands based
on human perception (see previous slide)
– This is where most of the compression comes in
– Lots of bands may become 0
• Then applies Huffman encoding to the output to squeeze
the bits a little further
http://www.mp3-converter.com/mp3codec/
MP3 Bitrate
• In MP3, we can set the bitrate
• This is considered a constant bitrate encoding,
versus a variable bitrate
• The encoder changes the amount of
quantization performed based on specified
bitrate
• This can be complex: since Huffman encoding is
also used, the encoder may need to try several
different quantization levels and run each
through Huffman encoding to see the result
– Padding with 0s is sometimes used
Other formats
• Advanced Audio Coding (AAC)
– Similar to MP3, but has a more flexible encoding standard
– Supports a larger range of temporal sampling rates
– Part of the MPEG2 and MPEG4 standards
• Ogg Vorbis
– Free, open-source codec that has become popular
– Again, similar to MP3 in its use of masking
Video
Images and Video
• Image: 2D array of pixels
• Video: Images/frames over time
Video cameras interlacing and progressive scan
• Interlaced
– Cameras sometimes
capture 2 fields to make a frame
– So, we capture 60 fields per second
• Progressive Scan
– Captures the entire frame at the same time
• 30 full frames per second
– No interlace effect
– But the motion may not look as smooth
Transform Coding
[Figure: an 8x8 image expressed as the sum of 64 single-pixel basis images]
In the pixel domain, you can consider each pixel intensity as a coefficient of a
basis image that occupies only that pixel (all other pixels are 0). For an 8x8 image, we
would have 64 basis images. Summing these basis images gives you the 8x8 image.
DCT Frequency Domain
In the frequency domain we can transform the 8x8 pixels into their corresponding frequency
representation (Discrete Cosine Transform). This also requires 64 coefficients, each one
corresponding to the amount of each basis to use. Sum all the bases up and you have the
original image.
Why Frequency Domain?
• It turns out we can remove many of the
frequencies and the viewer will not notice.
– Call this psychovisual redundancy
• Humans are not so sensitive to high-frequencies
• If we heavily quantize the high-frequency
coefficients and then transform back to the pixel
domain
– Pixel values are different, but visually we can’t tell
• This is a lossy compression and provides the
most significant gain in image compression
JPEG Approach
1) Take the original image and break it into 8x8 blocks
2) Level-shift each block: f(x,y) - 128 (normalize to the range -128 to 127)
3) Apply the forward DCT to each block
4) Quantize the DCT coefficients using a quantization "table" T(u,v):
C’(u,v) = round(C(u,v)/T(u,v))
5) Differentially code the DC component
6) Order the AC coefficients in "zig-zag" order and run-length encode (RLE) them
7) Huffman encode the result into the JPEG bitstream
JPEG applies almost every compression trick known:
1) transform coding, 2) psychovisual quantization (lossy), 3) RLE, 4) difference coding, and 5) Huffman.
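The transform-and-quantize steps can be sketched in plain Python on a toy 8x8 block. The flat quantization table here is hypothetical for illustration; real JPEG uses standard per-frequency tables:

```python
import math

N = 8

def dct2(block):
    """2-D DCT-II of an NxN block (the orthonormal form used by JPEG)."""
    def alpha(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    return [[alpha(u) * alpha(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def quantize(coeffs, table):
    """The lossy JPEG step: C'(u,v) = round(C(u,v) / T(u,v))."""
    return [[round(c / t) for c, t in zip(crow, trow)]
            for crow, trow in zip(coeffs, table)]

T = [[16] * N for _ in range(N)]                             # hypothetical flat table
block = [[2 * (x + y) for y in range(N)] for x in range(N)]  # smooth toy block
Cq = quantize(dct2(block), T)
# Most high-frequency coefficients quantize to 0 -- that is where the compression comes from.
```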
JPEG for Color Images
Convert the image from RGB to YIQ, where Y = brightness (luminance) and I, Q are chroma:
Y = 0.299 R + 0.587 G + 0.114 B
I = 0.596 R - 0.275 G - 0.321 B
Q = 0.212 R - 0.523 G + 0.311 B
The eye is less sensitive to chroma, so downsample I and Q by 2 (2:1:1 quantization of the color info).
This downsampling by 2 is a lossy compression.
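The conversion is a small matrix multiply; the coefficients below are the slide's, with the signs of the standard YIQ matrix (the function name is my own):

```python
def rgb_to_yiq(r, g, b):
    """RGB (each in [0, 1]) -> YIQ; Y is brightness, I and Q are chroma."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    i = 0.596 * r - 0.275 * g - 0.321 * b
    q = 0.212 * r - 0.523 * g + 0.311 * b
    return y, i, q

# A pure gray has no chroma: I and Q come out ~0, so they compress well.
y, i, q = rgb_to_yiq(0.5, 0.5, 0.5)
```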
JPEG Quality
• The amount of quantization applied to the
DCT coefficients amounts to a "quality"
setting
["Format Options" refer to different ways
to choose the quantization. Baseline uses
the original quantization tables specified
by JPEG. Optimize may generate tables
specific to your image.]
• JPEG at 100% quality means no
quantization
MPEG is based on JPEG
• JPEG is the foundation
– MPEG extends this to video
– Some encoders just use lots of JPEGs; we call this Motion JPEG
• MPEG incorporates motion compensation prediction to help exploit
similarity between frames
– This is done with 16x16 Macroblocks
• Two types of frames
– I frames = intra-coded (JPEG)
– P&B frames = inter-coded
• predicted frames (motion compensation + JPEG-like compression)
• P predicts itself from either I or P frames
• B predicts itself from I or P in either direction (front or back)
• Frames are organized into a series of I, P, and B frames
– Call this the Group of Pictures (GOP)
– This is the basic building block
GOP
• Group of Pictures
GOP of 12
GOP of 6
• DVD (MPEG2)
– GOP size is fixed at 18 for NTSC, 15 for PAL
Motion Compensation
Break the image into 16x16 macroblocks. Each macroblock is
broken into 4 8x8 blocks and encoded in JPEG style.
In a predictive frame, search a reference frame for a similar block.
Encode only the difference and the motion vector.
(Best case, you find exactly the same pixels, and the difference = 0.)
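The block search can be sketched as an exhaustive search with a sum-of-absolute-differences (SAD) cost, a common matching cost; the function names and the tiny frame are my own:

```python
def sad(a, b):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def block_at(frame, top, left, size):
    return [row[left:left + size] for row in frame[top:top + size]]

def motion_search(ref, target, size):
    """Find the (top, left) offset in `ref` whose block best matches
    `target`. Returns (top, left, cost); cost 0 is an exact match, so
    only the motion vector needs to be encoded."""
    best = None
    for top in range(len(ref) - size + 1):
        for left in range(len(ref[0]) - size + 1):
            cost = sad(block_at(ref, top, left, size), target)
            if best is None or cost < best[2]:
                best = (top, left, cost)
    return best

ref = [[r * 8 + c for c in range(8)] for r in range(8)]   # toy reference frame
target = block_at(ref, 1, 2, 4)                            # a block that "moved"
print(motion_search(ref, target, 4))                       # (1, 2, 0)
```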
I,P,B frames
• I frames are the largest in terms of bits
• P frames are significantly smaller
• B frames even smaller
• Why don’t we just have very large GOP?
– E.g. One I frame with 127 PB frames
• Consider random access
– How can you jump to a frame in the middle of a
GOP?
• Consider if we encounter errors
– What happens on the decoder if there is an error?
MPEG Quality
• Very similar to JPEG quality
• You are setting the amount of quantization
in the DCT components
Quality control in the encoder
Errors in MPEG
• If data becomes corrupted (e.g. during
transmission), the 8x8 and 16x16 blocks
used in the encoding scheme become
apparent
Errors in MPEG
• If motion vector data becomes corrupted,
we see very strange things
This is really bad, because if one frame gets corrupted, subsequent frames
use it for reconstruction (i.e. they predicted from the correct frame; now the
frame is bad, so their prediction is wrong).
Video should become good again when an I-frame is encountered. This is one
reason for short GOPs.
CODECs
• Codecs
– Means “Coder/Decoder”
– Often when we download a codec, it is the decoder
• Codec refers to the algorithms used to encode and
decode the audio/video file
• Note that while MPEG is a standard, its exact
implementation is not specified
– Some encoders might do a better job
– Better motion compensation, better use of DCT quantization, etc
– Almost all encoders are based on MPEG or MPEG-like encoding
• I.e. Motion compensation with GOP structure
• Also, note that encoding and decoding are asymmetric
– Encoding requires much more computation than decoding
AVI, MOV, and DV
• AVI (Windows)
– Audio Video Interleave (AVI)
– A wrapper (container) format that can hold data
encoded with other codecs
• I.e. you can have AVIs with different codecs
• MOV (Apple)
– Similar; it is a wrapper or container format
– Many different codecs are supported
• DV (Digital Video) format
– Uses intraframe-only compression
• (similar to Motion JPEG)
Interleaving Audio
• Technically Audio and Video are separate
• You can have the audio in a different file or logical region
in the video file
• But this might not be good for access, especially on a
hard drive (you’d get disk thrashing)
• We instead interleave the audio data with the video data
Constant Bitrate Video
• Some video encoders support constant
bitrate
– The video is compressed at several different
qualities
– The highest quality closest to the desired
bitrate is chosen
– This means quality may change based on the
image content
• More motion and image content, lower quality
• Less motion and image content, higher quality
Temporal Downsampling
• Another way to get compression is to reduce the
framerate
– That is, show fewer frames per second
• A rule of thumb is that humans feel comfortable at 24
frames per second
– Less than this can lead to temporal aliasing where we
feel uncomfortable and motion looks unnatural
• For animations it can be as low as 4-8 frames
per second
– This is acceptable as long as the animation has
limited motion
– Once motion is too large, we see temporal aliasing
again
Spatial Downsampling
• Another easier way to gain compression is
spatial down-sampling
– Make the image smaller
• This is acceptable for internet use
– Unacceptable for TV, film, broadcast
• Keep in mind that once you down-sample
you can never get back the original
– Resizing back to the larger size does not
restore the lost details
– In fact, it often looks worse
Summary
• Compression is necessary to save space
– Even with today’s storage, uncompressed
data is too large
• Several types of compression
– Redundancy coding, RLE, transform coding,
psycho coding
• Psycho-coding allows us to “throw away”
information (lossy coding)
• This gives us the greatest “gain”
• Also is often what “quality” is associated with in
compression
• Higher quality means less “thrown away”