Tries and Huffman Coding


 
Introduction to tries 

Suppose we want to implement a word dictionary using a C++ program and perform the following operations:

● Adding a word
● Searching for a word
● Removing a word

For this purpose, a hashmap can be thought of as an efficient data structure, since the average-case time complexity of insertion, deletion, and retrieval is O(1) for integer, character, and floating-point keys.

Let us now discuss the time complexity in the case of strings.

Suppose we want to insert the string abc into our hashmap. To do so, we first need to calculate its hashcode, which requires traversing the whole string abc. Hence, the time taken is proportional to the length of the string, irrespective of the time taken by the hashmap operation itself. Thus, we can conclude that inserting a string takes O(string_length).

To search for a word in the hashmap, we again have to calculate the hashcode of the string to be searched, which also requires O(string_length) time.

Similarly, for removing a word from the hashmap, we would have to calculate the 
hashcode for that string. It would again require O(string_length) time. 
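For intuition, here is a minimal sketch of a polynomial string hash (an illustration only; stringHash is a name assumed here, and std::unordered_map uses its own internal hash). It touches every character once, which is why hashing a string costs O(string_length):

// Illustrative polynomial hash (requires #include <string>; assumes using namespace std).
// One pass over the whole string, so the cost is O(string_length).
size_t stringHash(const string &word) {
    size_t hash = 0;
    for (int i = 0; i < (int)word.size(); i++) {
        hash = hash * 31 + word[i];   // 31 is a commonly used multiplier
    }
    return hash;
}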

For the same purpose, we can use another data structure known as ​tries​.  

 

 

Below is a visual representation of the trie:

Here, the node at the top, named start, is the root node of our n-ary tree.

Suppose we want to insert the word ARE in the trie. We will begin from the root, search for the first letter A, then insert R as its child, and similarly insert the letter E. You can see it as the left-most path in the above trie.

Now, we want to insert the word AS in the trie. We will follow the same procedure as above. First of all, we find the letter A as the child of the root node, and then we search for S. If S were already present as a child of A, we would do nothing, as the given word is already present; otherwise, we insert S. You can see this in the above trie. Similarly, we add the other words to the trie.

This approach also takes O(word_length) time for insertion. 

Moving on to searching for a word in the trie: here too, we have to traverse the complete length of the string, which takes O(word_length) time.

 

 

Let us take an example. Suppose we want to search for the word DOT in the above trie. We will first search for the letter D as a direct child of the start node, then search for O and T, and then return true, as the word is present in the trie.

Consider another example, AR, which we want to search for in the given trie. Following the above approach, we would return true, as the path for the given word is present. Ideally, however, we should return false, as the actual word stored was ARE and not AR. To overcome this, observe that in the above trie some of the letters are marked with a bolder circle and others are not. This boldness represents the termination of a word starting from the root node. Hence, while searching for a word, we always check that its last letter is bolded, i.e., marked as the termination of a word.

Note: While inserting into a trie, the last letter of the word should be bolded as a mark of the word's termination.

In the above trie, the following words are stored:

● ARE, AS 
● DO, DOT 
● NEW, NEWS, NOT 
● ZEN 

Here, to remove a word, we simply unbold its terminating letter, as that makes the word invalid. For example, if we want to remove the string DO from the above trie while preserving the occurrence of the string DOT, we reach O and then unbold it. This way the word DO is removed, but at the same time the word DOT, which lies on the same path as DO, is still preserved in the trie structure.

 

 

For removal of a word from the trie, the time complexity is still O(word_length), as we traverse the complete length of the word to reach its last letter and unbold it.

When to use tries over hashmaps? 

It can be observed that using tries, we can improve the space complexity.  

For example: suppose we have 1000 words starting with the character A that we want to store. If we store these words in a hashmap, then for each word we have to store the character A separately. In a trie, however, we only need to store the character A once. In this way we can save a lot of space, and this space optimization leads us to prefer tries over hashmaps in such scenarios.

In the beginning, we thought of implementing a dictionary. Let's recall a feature of a dictionary, namely Auto-Search. While browsing the dictionary, we start by typing a character, and all the words beginning with that character appear in the search list. This functionality can't be achieved using hashmaps, since in a hashmap the stored entries are independent of each other, whereas in a trie the data is stored in a tree-like structure where words sharing a prefix share a path. Hence, here also, tries prove to be more efficient than hashmaps.
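As a rough sketch of how such an auto-search could work on top of a trie (assuming the TrieNode class defined in the next section; the helper names collectWords and autoComplete are ours, not part of the original notes):

// (requires #include <string> and #include <vector>; assumes using namespace std)

// Collect every word stored in the subtree below 'node'.
// 'prefix' is the string spelled out on the path from the root to 'node'.
void collectWords(TrieNode *node, string prefix, vector<string> &out) {
    if (node == NULL) {
        return;
    }
    if (node->isTerminal) {            // a complete word ends at this node
        out.push_back(prefix);
    }
    for (int i = 0; i < 26; i++) {
        if (node->children[i] != NULL) {
            collectWords(node->children[i], prefix + char('a' + i), out);
        }
    }
}

// Auto-search: walk down to the node matching 'prefix', then collect its whole subtree.
vector<string> autoComplete(TrieNode *root, string prefix) {
    TrieNode *node = root;
    for (char c : prefix) {
        int index = c - 'a';
        if (node->children[index] == NULL) {
            return {};                 // no stored word starts with this prefix
        }
        node = node->children[index];
    }
    vector<string> result;
    collectWords(node, prefix, result);
    return result;
}

Every returned word shares the typed prefix, which is exactly the auto-search behaviour described above.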

TrieNode class implementation 

Follow the code below (with comments):

class TrieNode {
public:
    char data;               // To store data (character value: 'a' - 'z')
    TrieNode **children;     // To store the address of each child
    bool isTerminal;         // TRUE if a word terminates at this character

    TrieNode(char data) {    // Constructor for value initialization
        this->data = data;
        children = new TrieNode*[26];
        for (int i = 0; i < 26; i++) {
            children[i] = NULL;
        }
        isTerminal = false;
    }
};

Insert function 

To insert a word in a trie, we will use recursion. Suppose we have the following trie: 

Now we want to insert the word BET into our trie.

Recursion says that we work on a smaller problem ourselves and let the recursive calls handle the rest. So, we will do the first step at the root node.

Note: Practically, the functionality of bolding the terminal character is achieved using the boolean isTerminal variable of that particular node. If this value is true, the node is the terminal character of a string; otherwise it is not.

Approach:  

 

 

We will first check whether the letter B is present as a child of the root node and then call recursion on it. If B is present as a child of the root node, as in our case it is, we simply recurse over it after shifting the string by one character. If the character B were not a direct child of the root node, we would create one and then call recursion on it. After the recursive call, B now plays the role of the root node, and the character E is the one we are looking for. We follow the same procedure as we did for character B against the root node, and then move on to the next character of the string, i.e., T, which happens to be the last character of our string. Now we check for character T as a child of character E. In our case, it is not a child of E, so we create it. As T is the last character of the string, we mark its isTerminal value as true.

Following are the three steps of the recursion:

● Small calculation: We check whether the root node has the first character of the string as one of its children. If not, we create one.
● Recursive call: We ask the recursion to insert the remaining string into the subtree rooted at that child.
● Base case: When the length of the string becomes zero, or in other words we reach the end of the word, we mark the isTerminal flag of the last character's node as true.

Follow the code below, along with the comments... 

class Trie {
    TrieNode *root;

public:
    Trie() {
        root = new TrieNode('\0');
    }

    void insertWord(TrieNode *root, string word) {
        // Base case
        if (word.size() == 0) {
            root->isTerminal = true;
            return;
        }

        // Small calculation
        int index = word[0] - 'a';   // 'a' maps to index 0, 'b' to index 1, and so on
        TrieNode *child;
        if (root->children[index] != NULL) {   // First character already present as a child of root
            child = root->children[index];
        }
        else {                                 // If not present as a child, create one
            child = new TrieNode(word[0]);
            root->children[index] = child;
        }

        // Recursive call
        insertWord(child, word.substr(1));
    }

    // For user
    void insertWord(string word) {
        insertWord(root, word);
    }
};

Search in tries 

Objective: Create a search function that takes a string as its argument and returns a boolean value: true if the string is present in the trie and false if not.

Approach: We use the same process as during insertion. We look for the first character among the children of the root node and call recursion over that child. If the first character is not present as one of the children of the root node, we simply return false; otherwise, we pass the remaining string to the recursion. When we reach the empty string, we check the last character's isTerminal value; if it is true, the word exists in our trie and we return true from there; otherwise, we return false.

Try to code it yourselves, and for the code, refer to the solution tab of the same. 
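For reference, one possible sketch of the search function following this approach (assuming the same TrieNode and Trie classes as in the insert code above):

// Returns true only if 'word' was inserted earlier, i.e. the full path exists
// and the last character's node has isTerminal set to true.
bool searchWord(TrieNode *root, string word) {
    // Base case: end of the word reached, check the terminal flag
    if (word.size() == 0) {
        return root->isTerminal;
    }

    // Small calculation: does the first character exist as a child?
    int index = word[0] - 'a';
    if (root->children[index] == NULL) {
        return false;                      // path breaks, word not present
    }

    // Recursive call on the remaining string
    return searchWord(root->children[index], word.substr(1));
}

// For user
bool searchWord(string word) {
    return searchWord(root, word);
}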

Tries implementation: Remove 

Objective: To delete the given word from our trie.

Approach: First of all, we need to search for the word in the trie. If it is found, we simply mark the isTerminal value of the word's last character as false, which denotes that the word no longer exists in our trie.

Check out the code below (nearly the same as that of insertion, with a few minor changes explained alongside):

void removeWord(TrieNode *root, string word) {
    // Base case
    if (word.size() == 0) {
        root->isTerminal = false;
        return;
    }

    // Small calculation
    TrieNode *child;
    int index = word[0] - 'a';
    if (root->children[index] != NULL) {
        child = root->children[index];
    }
    else {
        // Word not found
        return;
    }

    removeWord(child, word.substr(1));

    // If the child node is not the terminal of any word and has no children of its own,
    // it no longer belongs to any word in the trie, so we can delete it as well.

    // Removing the child node if it is useless
    if (child->isTerminal == false) {
        for (int i = 0; i < 26; i++) {
            if (child->children[i] != NULL) {
                return;
            }
        }
        delete child;
        root->children[index] = NULL;
    }
}

// For user
void removeWord(string word) {
    removeWord(root, word);
}
Types of tries 

There are two types of tries: 

● Compressed tries:
○ Majorly used for space optimization.
○ We club together the characters along chains of nodes that have only one child.
○ General rule: every internal node (other than the root) has at least two children.

Refer to the figure below: 

Suppose our regular trie looks like this- 

 

 

Figure - 1 

Its compressed trie version will look as follows: 

● Pattern matching: 
○ Used to match patterns in the trie.  

 
 

■ Example: In Figure 1 (shown above), suppose we want to search for the pattern ben, but only the word bend is present in the trie. Using the normal search function, we would return false, as the last character n's isTerminal property is false; for pattern matching, however, we should return true.
○ To overcome this problem of identifying the last character, we simply ignore the isTerminal property of the node while matching patterns.
○ In Figure 1, instead of searching for the pattern ben, suppose we now want to search for the pattern en. Our trie would return false, as en is not directly connected to the root. But since the pattern occurs inside the word ben, it should return true. To overcome this problem, we attach each suffix of the string to the root node so that every pattern can be reached, as sketched below.
■ For example: for the string ben, we would store ben, en, and n in the trie as paths starting directly from the root node.
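A rough sketch of that idea (a suffix-style construction built on the insertWord function from earlier; insertAllSuffixes is a name assumed here, not part of the original notes):

// Insert every suffix of 'word' so that any pattern occurring inside it
// becomes a path starting at the root. For "ben" this inserts "ben", "en" and "n".
void insertAllSuffixes(Trie &t, string word) {
    for (int i = 0; i < (int)word.size(); i++) {
        t.insertWord(word.substr(i));
    }
}

Pattern matching then reduces to checking whether the path for the pattern exists in the trie, without looking at isTerminal.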

Huffman Coding 

Huffman Coding is one approach used for text compression. Text compression means reducing the space required to store a particular text.

Huffman Coding is a lossless data compression algorithm, i.e., it compresses data without losing any information in the process. It is especially useful when some characters occur much more frequently than others.

Working of Huffman Algorithm: 

 
 

Suppose the given string is:

Here, each character of the string takes 8 bits of memory. Since there are a total of 15 characters in the string, the total memory consumption will be 15*8 = 120 bits. Let's try to compress its size using the Huffman Algorithm.

First of all, Huffman Coding creates a tree by calculating the frequency of each character of the string and then assigns each character a unique code so that we can retrieve the data back using these codes.

Follow the steps below: 

1. Begin with calculating the frequency of each character value in the given 
string. 

 
 

2. Sort the characters in ascending order of their frequency and store them in a priority queue, say Q.
3. Each character is considered as a separate leaf node.
4. Make a new internal node, say z. The node with the minimum frequency becomes the left child of z, and the node with the second-minimum frequency becomes its right child. The frequency of z is the sum of these two frequencies.

Here, * denotes an internal node.

5. Now, remove the two nodes with the lowest frequencies from the priority queue Q and insert their combined node z back into it.
6. Insert the node z into the tree.
7. Repeat steps 4 to 6 until only one node remains in the priority queue; that node is the root of the Huffman tree.

 
 

8. For every internal node, assign 0 to the edge towards its left child and 1 to the edge towards its right child; a character's code is the sequence of edge labels on the path from the root to its leaf.
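The steps above can be sketched in C++ roughly as follows (a minimal illustration using std::priority_queue; the names Node, Compare, buildCodes, and huffmanCodes are assumptions made here, not part of the original notes):

#include <map>
#include <queue>
#include <string>
#include <vector>
using namespace std;

struct Node {
    char ch;                  // '*' for internal nodes
    int freq;
    Node *left, *right;
    Node(char c, int f) : ch(c), freq(f), left(NULL), right(NULL) {}
};

// Makes the priority queue hand out the node with the smallest frequency first.
struct Compare {
    bool operator()(Node *a, Node *b) { return a->freq > b->freq; }
};

// Step 8: walk the finished tree, appending 0 for a left edge and 1 for a right edge.
void buildCodes(Node *node, string code, map<char, string> &codes) {
    if (node == NULL) return;
    if (node->left == NULL && node->right == NULL) {   // leaf: a real character
        codes[node->ch] = code;
        return;
    }
    buildCodes(node->left, code + "0", codes);
    buildCodes(node->right, code + "1", codes);
}

map<char, string> huffmanCodes(const string &text) {
    // Step 1: frequency of each character
    map<char, int> freq;
    for (char c : text) freq[c]++;

    // Steps 2-3: one leaf node per character, kept in a min-priority-queue
    priority_queue<Node*, vector<Node*>, Compare> q;
    for (auto &p : freq) q.push(new Node(p.first, p.second));

    map<char, string> codes;
    if (q.empty()) return codes;

    // Steps 4-7: repeatedly merge the two lowest-frequency nodes
    while (q.size() > 1) {
        Node *left = q.top(); q.pop();
        Node *right = q.top(); q.pop();
        Node *z = new Node('*', left->freq + right->freq);
        z->left = left;
        z->right = right;
        q.push(z);
    }

    // Assign the 0/1 codes by traversing the tree from the root
    buildCodes(q.top(), "", codes);
    return codes;
}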

The size table is given below:

Character       Frequency   Code   Size
A               5           11     5*2 = 10
B               1           100    1*3 = 3
C               6           0      6*1 = 6
D               3           101    3*3 = 9
4*8 = 32 bits   15 bits            28 bits

Size before encoding:​ 120 bits 

Size after encoding:​ 32 + 15 + 28 = 75 bits 

To decode the code, simply traverse through the tree (starting from the root) 
to find the character. Suppose we want to decode 101, then: 
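One way to sketch the decoding loop (assuming the Node tree from the sketch above): start at the root, follow 0 to the left and 1 to the right, and emit a character whenever a leaf is reached:

// Decode a bit string using the Huffman tree whose root is passed in.
string decode(Node *root, const string &bits) {
    string result;
    Node *node = root;
    for (char bit : bits) {
        node = (bit == '0') ? node->left : node->right;
        if (node->left == NULL && node->right == NULL) {   // reached a leaf
            result += node->ch;                            // emit its character
            node = root;                                   // restart for the next code
        }
    }
    return result;
}

With the codes in the table above, decoding 101 follows right, left, right and yields the character D.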

Time complexity: 

In case of encoding, inserting each character into the priority queue takes O(log n) time. Therefore, for the complete array, the time complexity becomes O(n log n).

Similarly, extracting an element from the priority queue takes O(log n) time. Hence, for the complete array, the achieved time complexity is O(n log n).

Applications of Huffman Coding: 

● They are used for transmitting faxes and text.
● They are used by conventional compression formats like PKZIP, GZIP, etc.

 