Data Compression
Cleophas Mochoge
INTE 412
In computer science and information theory, data
compression, source coding, or bit-rate
reduction involves encoding information using
fewer bits than the original representation.
Compression can be either lossy or lossless.
Lossless compression reduces bits by identifying
and eliminating statistical redundancy. No
information is lost in lossless compression.
Lossy compression reduces bits by identifying
marginally important information and removing it.
The process of reducing the size of a data file is
popularly referred to as data compression,
although its formal name is source coding
(coding done at the source of the data, before it
is stored or transmitted). Compression is useful
because it helps reduce resource usage, such
as data storage space or transmission capacity.
Because compressed data must be decompressed
before it can be used, this extra processing
imposes computational or other costs.
Lossless
• Lossless data compression algorithms
usually exploit statistical redundancy to
represent data more concisely without
losing information. Lossless compression is
possible because most real-world data has
statistical redundancy.
For example, an image may have areas
of colour that do not change over several
pixels; instead of coding "red pixel, red
pixel, ..." the data may be encoded as
"279 red pixels". This is a simple example
of run-length encoding; there are many
schemes to reduce size by eliminating
redundancy.
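A minimal run-length encoding sketch in Python (the pixel string and function names are illustrative, not taken from any specific image format):

from itertools import groupby

def rle_encode(data):
    # Collapse each run of identical symbols into a (symbol, count) pair
    return [(symbol, len(list(run))) for symbol, run in groupby(data)]

def rle_decode(pairs):
    # Expand (symbol, count) pairs back into the original sequence
    return "".join(symbol * count for symbol, count in pairs)

pixels = "RRRRRRRRRRGGGBB"        # ten red, three green, two blue pixels
encoded = rle_encode(pixels)      # [('R', 10), ('G', 3), ('B', 2)]
assert rle_decode(encoded) == pixels

The encoded form grows with the number of runs rather than their lengths, which is why long uniform areas compress well.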
Lossy
Lossy data compression is contrasted
with lossless data compression. In these
schemes, some loss of information is
acceptable. Depending upon the
application, detail can be dropped from
the data to save storage space. Generally,
lossy data compression schemes are guided
by research on how people perceive the data
in question.
For example, the human eye is more sensitive
to subtle variations in luminance than it is to
variations in color. JPEG image compression
works in part by "rounding off" less-important
visual information. There is a corresponding
trade-off between information lost and the
size reduction. A number of popular
compression formats exploit these perceptual
differences, including those used in music
files, images, and video.
Introduction
Compression is used to reduce the volume
of information to be stored, or to reduce
the communication bandwidth required for
its transmission over a network.
Compression Principles
Entropy Encoding
Run-length encoding
Lossless & Independent of the type of source
information
Used when the source information comprises
long substrings of the same character or
binary digit
Output: (character or bit pattern, # of occurrences) pairs, as in FAX
e.g.) 000000011111111110000011……
→ (0,7) (1,10) (0,5) (1,2) …… or simply 7, 10, 5, 2 …… (see the sketch below)
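A short sketch of this bit-oriented variant in Python; it emits only the run lengths, so the receiver must know whether the line starts with a 0 or a 1 (an assumption of this sketch, as in the example above):

def run_lengths(bits):
    # Return the length of each run of identical bits in the string
    if not bits:
        return []
    runs, count = [], 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs

print(run_lengths("000000011111111110000011"))   # [7, 10, 5, 2]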
Compression Principles
Entropy Encoding
Statistical encoding
Based on the probability of occurrence of a
pattern
The more probable a pattern is, the shorter its codeword
“Prefix property”: a shorter codeword must not
form the start of a longer codeword (see the check sketched below)
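A small sketch of the prefix-property check in Python (the codewords are only examples):

def is_prefix_free(codewords):
    # The prefix property holds if no codeword is the start of another codeword
    return not any(a != b and b.startswith(a)
                   for a in codewords for b in codewords)

print(is_prefix_free(["1", "01", "001", "000"]))   # True: uniquely decodable
print(is_prefix_free(["01", "010"]))               # False: "01" starts "010"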
Huffman Coding
Huffman coding is one type of entropy
coding in which each character is encoded
according to its probability of
occurrence. The Huffman coding
algorithm determines the optimal code,
using the minimum number of bits. The
length (number of bits) of the codeword
differs from character to character.
To determine a Huffman code, it is useful to
construct a binary tree. The leaves of
the tree represent the characters that are to be
encoded. Every node contains a probability of
occurrence, and 0 and 1 are assigned to the
branches of the tree. Every character has an
associated weight equal to the number of times
the character occurs in the data stream.
Stream of characters
• P(A) = 0.16
• P(B) = 0.51
• P(C) = 0.09
• P(D) = 0.13
• P(E) = 0.11
E.g.) symbols A, B, C, D, E with probabilities
A(0.16), B(0.51), C(0.09), D(0.13), E(0.11)
H' = Σ(i=1..5) Ni Pi = 1(0.51) + 3(0.16) + 3(0.13) + 3(0.11) + 3(0.09)
= 1.98 bits/codeword
(codeword lengths from the Huffman tree: B gets 1 bit, the others 3 bits)
H = -Σ(i=1..5) Pi log2 Pi =
-((0.16 log2 0.16) + (0.51 log2 0.51) +
(0.09 log2 0.09) + (0.13 log2 0.13) +
(0.11 log2 0.11)) ≈ 1.96 bits/symbol
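These figures can be checked with a few lines of Python (the codeword lengths 1 and 3 are taken from the Huffman tree built for these probabilities; the variable names are just illustrative):

import math

probs   = {"A": 0.16, "B": 0.51, "C": 0.09, "D": 0.13, "E": 0.11}
lengths = {"A": 3,    "B": 1,    "C": 3,    "D": 3,    "E": 3}    # Ni, bits per codeword

H     = -sum(p * math.log2(p) for p in probs.values())   # entropy, ~1.96 bits/symbol
H_avg = sum(lengths[s] * probs[s] for s in probs)        # H', 1.98 bits/codeword
print(round(H, 2), round(H_avg, 2))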
Compression Principles
Huffman Encoding
Entropy, H: the theoretical minimum average # of bits required
to transmit a particular stream
H = -Σ(i=1..n) Pi log2 Pi
where n: # of symbols, Pi: probability of symbol i
Efficiency, E = H / H'
where H' = average # of bits per codeword = Σ(i=1..n) Ni Pi
Ni: # of bits in the codeword for symbol i
E.g.) symbols M(10), F(11), Y(010), N(011), 0(000),
1(001) with probabilities 0.25, 0.25, 0.125, 0.125,
0.125, 0.125
H' = Σ(i=1..6) Ni Pi = 2(2 × 0.25) + 4(3 × 0.125) = 2.5 bits/codeword
H = -Σ(i=1..6) Pi log2 Pi = -(2(0.25 log2 0.25) + 4(0.125 log2 0.125)) = 2.5
E = H / H' = 100%
(3 bits/codeword would be needed if fixed-length codewords were used for six symbols)
Huffman Algorithm
Method of construction for an encoding tree
• Full Binary Tree Representation
• Each edge of the tree has a value
(0 on the edge to the left child, 1 on the edge to the right child)
• Data is at the leaves, not internal nodes
• Result: encoding tree
• “Variable-Length Encoding”
Huffman Algorithm
• 1. Maintain a forest of trees
• 2. Weight of a tree = sum of the frequencies of its
leaves
• 3. Repeat until one tree remains (N - 1 merges for N symbols):
– Select the two smallest-weight trees
– Merge them to form a new tree
• Huffman coding
• a variable-length code: the more frequent a character,
the shorter its codeword
• must satisfy the prefix property to be uniquely
decodable
• two-pass algorithm
– the first pass accumulates the character frequencies
and generates the codebook
– the second pass does the compression using the
codebook
Huffman coding
• create codes by constructing a binary tree (see the sketch below)
1. consider all characters as free nodes
2. assign the two free nodes with the lowest frequencies to
a parent node whose weight equals the sum of
their frequencies
3. remove the two free nodes and add the newly
created parent node to the list of free nodes
4. repeat steps 2 and 3 until there is one free node
left; it becomes the root of the tree
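A compact sketch of this construction in Python, using a min-heap to pick the two lowest-frequency free nodes at each step (one common way to implement it; the exact 0/1 labels it assigns can differ from the table on the later slide, but the codeword lengths come out the same):

import heapq

def huffman_codes(freqs):
    # Build a Huffman tree from symbol frequencies and return {symbol: codeword}.
    # Heap entries are (weight, tie_breaker, tree); a tree is either a symbol
    # (leaf) or a (left, right) pair (branch node).
    heap = [(weight, i, symbol) for i, (symbol, weight) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)        # the two smallest-weight trees...
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (left, right)))   # ...merged into one
        next_id += 1
    codes = {}
    def assign(tree, prefix):
        if isinstance(tree, tuple):              # branch node: 0 to the left, 1 to the right
            assign(tree[0], prefix + "0")
            assign(tree[1], prefix + "1")
        else:                                    # leaf node: record the symbol's codeword
            codes[tree] = prefix or "0"
    assign(heap[0][2], "")
    return codes

# Colour counts from the 64-value example on the following slides
freqs = {"R": 19, "K": 17, "G": 14, "B": 7, "C": 4, "M": 2, "Y": 1}
for symbol, code in sorted(huffman_codes(freqs).items(), key=lambda kv: len(kv[1])):
    print(symbol, freqs[symbol], code)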
• Right branch of the binary tree: 1
• Left branch of the binary tree: 0
• Prefix violation (example)
– e: ”01”, b: “010”
– “01” is a prefix of “010”, so the bits “010” could be read
as “e” followed by “0” ==> not uniquely decodable
• when frequencies are equal, break ties consistently
(always the same side, left or right)
• Example (64 data values)
• R K K K K K K K
• K K K R R K K K
• K K R R R R G G
• K K B C C C R R
• G G G M C B R R
• B B B M Y B B R
• G G G G G G G R
• G R R R R G R R
• Color frequency Huffman code
• =================================
• R 19 00
• K 17 01
• G 14 10
• B 7 110
• C 4 1110
• M 2 11110
• Y 1 11111
Static Huffman Coding
Huffman (Code) Tree
Given: a number of symbols (or characters) and their relative
probabilities, known in advance
Must hold the “prefix property” among codes

Symbol   Occurrence
A        4/8
B        2/8
C        1/8
D        1/8

[Huffman code tree: the root node (weight 8) branches to leaf A (weight 4)
on the 1 side and a branch node (weight 4) on the 0 side; that branch node
splits into leaf B (weight 2) on its 1 side and a branch node (weight 2),
which splits into leaf C on its 1 side and leaf D on its 0 side.]

Symbol   Code
A        1
B        01
C        001
D        000

4(1) + 2(2) + 1(3) + 1(3) = 14 bits are required to transmit “AAAABBCD”
Prefix Property!
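A quick check of this count in Python, using the code table above as a dictionary (a sketch, not a full codec):

code = {"A": "1", "B": "01", "C": "001", "D": "000"}   # the code table above

message = "AAAABBCD"
bits = "".join(code[ch] for ch in message)
print(bits, len(bits))          # 11110101001000 -> 14 bits

# Because the code is prefix-free, greedy bit-by-bit decoding is unambiguous
decoded, buffer = [], ""
for bit in bits:
    buffer += bit
    if buffer in code.values():
        decoded.append(next(s for s, c in code.items() if c == buffer))
        buffer = ""
print("".join(decoded))         # AAAABBCD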
The end
Thank you