[go: up one dir, main page]

0% found this document useful (0 votes)
87 views22 pages

Daa Unit 5

The document discusses NP-hard and NP-complete problems. It defines NP, NP-hard, and NP-complete problems and explains the relationships between them. The Cook-Levin theorem, also known as Cook's theorem, states that the Boolean satisfiability problem (SAT) is NP-complete. It discusses how Stephen Cook and Leonid Levin independently proved in 1971 that SAT is NP-complete by showing that any problem in NP can be reduced to SAT in polynomial time. This established that SAT and many other problems are NP-complete.

Uploaded by

Kaarlet
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
87 views22 pages

Daa Unit 5

The document discusses NP-hard and NP-complete problems. It defines NP, NP-hard, and NP-complete problems and explains the relationships between them. The Cook-Levin theorem, also known as Cook's theorem, states that the Boolean satisfiability problem (SAT) is NP-complete. It discusses how Stephen Cook and Leonid Levin independently proved in 1971 that SAT is NP-complete by showing that any problem in NP can be reduced to SAT in polynomial time. This established that SAT and many other problems are NP-complete.

Uploaded by

Kaarlet
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 22

R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V

DESIGN AND ANALYSIS OF ALGORITHMS


UNIT V NP-Hard and NP-Complete problems: Basic concepts, Cook’s Theorem.
String Matching: Introduction, String Matching-Meaning and Application, NaÏve
String Matching Algorithm, Rabin-Karp Algorithm, Knuth-Morris-Pratt Automata,
Tries, Suffix Tree.

BASIC CONCEPTS
In this chapter we are concerned with the distinction between problems that can be solved by a
polynomial time algorithm and problems for which no polynomial time algorithm is known. It
is an unexplained phenomenon that for many of the problems we know and study, the best
algorithms for their solutions have computing times that cluster into two groups.The first group
consists of problems whose solution times are bounded by polynomials of small degree The
second group is made up of problems whose best-known algorithms are non polynomial In the
quest to develop efficient algorithms,no one has been able to develop a polynomial time
algorithm for any problem in the second group.
The theory of NP-complete doesn’t provide a method of obtaining polynomial time algorithms
for problems in the second group. Nor does it say that algorithms of this complexity do not
exist
Instead,what we do is show that many of the problems for which there are no known
polynomial time algorithms are computationally related.In fact, we establish two classes of
problems.These are given the names NP-hard and NP-complete.A problem that is NP-complete
has the property that it can be solved in polynomial time if and only if all other NP-complete
problems can also be solved in polynomial time. If an NP-hard problem can be solved in
polynomial time,then all NP-complete problems can be solved in polynomial time. All NP-
complete problems are NP-hard, but some NP-hard problems are not known to be NP-complete

Non deterministic Algorithm


In deterministic algorithm, for a given particular input, the computer will always produce
the same output going through the same states but in case of non-deterministic algorithm,
for the same input, the compiler may produce different output in different runs. In fact non-
deterministic algorithms can’t solve the problem in polynomial time and can’t determine what
is the next step. The non-deterministic algorithms can show different behaviors for the same
input on different execution and there is a degree of randomness to it.

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 1


R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V
To implement a non-deterministic algorithm, we have a couple of languages like Prolog but
these don’t have standard programming language operators and these operators are not a part of
any standard programming languages.

Some of the terms related to the non-deterministic algorithm are defined below:
choice(X) : chooses any value randomly from the set X.
failure() : denotes the unsuccessful solution.
success() : Solution is successful and current thread terminates.
Example :
Problem Statement : Search an element x on A[1:n] where n>=1, on successful search return j
if a[j] is equals to x otherwise return 0.
Non-deterministic Algorithm for this problem :
j= choice(a, n)
if(A[j]==x) then
{
write(j);
success();
}
write(0); failure();

The Classes NP-hard and NP-Complete


NP Problem:
The NP problems set of problems whose solutions are hard to find but easy to verify and are
solved by Non-Deterministic Machine in polynomial time.
NP-Hard Problem:
A Problem X is NP-Hard if there is an NP-Complete problem Y, such that Y is reducible to X
in polynomial time. NP-Hard problems are as hard as NP-Complete problems. NP-Hard
Problem need not be in NP class.
NP-Complete Problem:
A problem X is NP-Complete if there is an NP problem Y, such that Y is reducible to X in
polynomial time. NP-Complete problems are as hard as NP problems. A problem is NP-
Complete if it is a part of both NP and NP-Hard Problem. A non-deterministic Turing machine
can solve NP-Complete problem in polynomial time.
NP-hard NP-Complete

NP-Hard problems(say X) can be solved if and NP-Complete problems can be solved by


only if there is a NP-Complete problem(say Y) that a non-deterministic Algorithm/Turing
can be reducible into X in polynomial time. Machine in polynomial time.

To solve this problem, it do not have to be in NP . To solve this problem, it must be both NP
and NP-hard problems.

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 2


R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V

NP-hard NP-Complete

Do not have to be a Decision problem. It is exclusively a Decision problem.

Example: Halting problem, Vertex cover problem, Example: Determine whether a graph has
etc. a Hamiltonian cycle, Determine whether
a Boolean formula is satisfiable or not,
Circuit-satisfiability problem, etc.

COOK–LEVIN THEOREM OR COOK’S THEOREM


In computational complexity theory, the Cook–Levin theorem, also known as Cook’s theorem,
states that the Boolean satisfiability problem is NP-complete. That is, it is in NP, and any
problem in NP can be reduced in polynomial time by a deterministic Turing machine to the
Boolean satisfiability problem.
Stephen Arthur Cook and L.A. Levin in 1973 independently proved that the satisfiability
problem(SAT) is NP-complete. Stephen Cook, in 1971, published an important paper titled
‘The complexity of Theorem Proving Procedures’, in which he outlined the way of obtaining
the proof of an NP-complete problem by reducing it to SAT. He proved Circuit-
SAT and 3CNF-SAT problems are as hard as SAT. Similarly, Leonid Levin independently
worked on this problem in the then Soviet Union. The proof that SAT is NP-complete was
obtained due to the efforts of these two scientists. Later, Karp reduced 21 optimization
problems, such as Hamiltonian tour, vertex cover, and clique, to the SAT and proved that those
problems are NP-complete.
Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 3
R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V
Hence, an SAT is a significant problem and can be stated as follows:
Given a boolean expression F having n variables x1,x2,….,xn, and Boolean operators, is it
possible to have an assignment for variables true or false such that binary expression F is true?
This problem is also known as the formula – SAT. An SAT(formula-SAT or simply SAT) takes
a Boolean expression F and checks whether the given expression(or formula) is satisfiable. A
Boolean expression is said to be satisfactory for some valid assignments of variables if the
evaluation comes to be true. Prior to discussing the details of SAT, let us now discuss some
important terminologies of a Boolean expression.
Boolean variable: A variable, say x, that can have only two values, true or false, is called a
boolean variable
Literal: A literal can be a logical variable, say x, or the negation of it, that is x or x̄; x is called a
positive literal, and x̄ is called the negative literal
Clause: A sequence of variables(x1,x2,….,xn) that can be separated by a logical OR operator is
called a clause. For example, (x1 V x2 V x3) is a clause of three literals.
Expressions: One can combine all the preceding clauses using a Boolean operator to form an
expression.
CNF form: An expression is in CNF form(conjunctive normal form) if the set of clauses are
separated by an AND (^), operator, while the literals are connected by an OR (v) operator. The
following is an example of an expression in the CNF form:

3 – CNF: An expression is said to be in 3-CNF if it is in the conjunctive normal form, and


every clause has exact three literals.
Thus, an SAT is one of the toughest problems, as there is no known algorithm other than the
brute force approach. A brute force algorithm would be an exponential-time algorithm,
as 2n possible assignments need to be tried to check whether the given Boolean expression is
true or not. Stephen Cook and Leonid Levin proved that the SAT is NP-complete.
Cook demonstrated the reduction of other hard problems to SATs. Karp provided proof of 21
important problems, Such as Hamiltonian tour, vertex cover, and clique, by reducing it to SAT
using Karp reduction.
Let us briefly introduce the three types of SATs. They are as follows:
Circuit- SAT: A circuit-SAT can be stated as follows: given a Boolean circuit, which is a
collection of gates such as AND, OR, and NOT, and n inputs, is there any input assignments of
Boolean variables so that the output of the given circuit is true? Again, the difficulty with these
Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 4
R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V
problems is that for n inputs to the circuit. 2n possible outputs should be checked.Therefore,
this brute force algorithm is an exponential algorithm and hence this is a hard problem.
CNF-SAT: This problem is a restricted problem of SAT, where the expression should be in a
conjunctive normal form. An expression is said to be in a conjunction form if all the clauses are
connected by the Boolean AND operator. Like in case of a SAT, this is about assigning truth
values to n variables such that the output of the expression is true.
3-CNF-SAT(3-SAT): This problem is another variant where the additional restriction is that the
expression is in a conjunctive normal form and that every clause should contain exactly three
literals. This problem is also about assigning n assignments of truth values to n variables of the
Boolean expression such that the output of the expression is true. In simple words, given an
expression in 3-CNF, a 3-SAT problem is to check whether the given expression is satisfiable.

STRING MATCHING:
The string-matching is a very useful tool which is used in string matching algorithm. It
examines every character in the text exactly once and reports all the valid shifts in O (n) time.
The goal of string matching is to find the location of specific text pattern within the larger body
of text (a sentence, a paragraph, a book, etc.)
Let us look at a few string matching algorithms before proceeding to their applications in real
world. String Matching Algorithms can broadly be classified into two types of algorithms –
1.Exact String Matching Algorithms
2.Approximate String Matching Algorithms
Exact String Matching Algorithms:
Exact string matching algorithms is to find one, several, or all occurrences of a defined string
(pattern) in a large string (text or sequences) such that each matching is perfect. All alphabets
of patterns must be matched to corresponding matched subsequence. These are further
classified into four categories:
Algorithms based on character comparison:
Naive Algorithm: It slides the pattern over text one by one and check for a match. If a match is
found, then slides by 1 again to check for subsequent matches.
KMP (Knuth Morris Pratt) Algorithm: The idea is whenever a mismatch is detected, we
already know some of the characters in the text of the next window. So, we take advantage of
this information to avoid matching the characters that we know will anyway match.
Boyer Moore Algorithm: This algorithm uses best heurestics of Naive and KMP algorithm and
starts matching from the last character of the pattern.
Using the Trie data structure: It is used as an efficient information retrieval data structure. It
stores the keys in form of a balanced BST.
Deterministic Finite Automaton (DFA) method:
Automaton Matcher Algorithm: It starts from the first state of the automata and the first
character of the text. At every step, it considers next character of text, and look for the next
state in the built finite automata and move to a new state.
Algorithms based on Bit (parallelism method):
Aho-Corasick Algorithm: It finds all words in O(n + m + z) time where n is the length of text
and m be the total number characters in all words and z is total number of occurrences of words
in text. This algorithm forms the basis of the original Unix command fgrep.
Hashing-string matching algorithms:
Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 5
R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V
Rabin Karp Algorithm: It matches the hash value of the pattern with the hash value of current
substring of text, and if the hash values match then only it starts matching individual characters.
Approximate String Matching Algorithms:
Approximate String Matching Algorithms (also known as Fuzzy String Searching) searches for
substrings of the input string. More specifically, the approximate string matching approach is
stated as follows: Suppose that we are given two strings, text T[1…n] and pattern P[1…m].
The task is to find all the occurrences of patterns in the text whose edit distance to the pattern is
at most k. Some well known edit distances are – Levenshtein edit distance and Hamming edit
distance.
These techniques are used when the quality of the text is low, there are spelling errors in the
pattern or text, finding DNA subsequences after mutation, heterogeneous databases, etc. Some
approximate string matching algorithms are:
Naive Approach: It slides the pattern over text one by one and check for approximate matches.
If they are found, then slides by 1 again to check for subsequent approximate matches.
Sellers Algorithm (Dynamic Programming)
Shift or Algorithm (Bitmap Algorithm)
Applications of String Matching Algorithms:
Plagiarism Detection: The documents to be compared are decomposed into string tokens and
compared using string matching algorithms. Thus, these algorithms are used to detect
similarities between them and declare if the work is plagiarized or original.

Bioinformatics and DNA Sequencing: Bioinformatics involves applying information


technology and computer science to problems involving genetic sequences to find DNA
patterns. String matching algorithms and DNA analysis are both collectively used for finding
the occurrence of the pattern set.

Digital Forensics: String matching algorithms are used to locate specific text strings of interest
in the digital forensic text, which are useful for the investigation.
Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 6
R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V
Spelling Checker: Trie is built based on a predefined set of patterns. Then, this trie is used for
string matching. The text is taken as input, and if any such pattern occurs, it is shown by
reaching the acceptance state.

Spam filters: Spam filters use string matching to discard the spam. For example, to categorize
an email as spam or not, suspected spam keywords are searched in the content of the email by
string matching algorithms. Hence, the content is classified as spam or not.

Search engines or content search in large databases: To categorize and organize data efficiently,
string matching algorithms are used. Categorization is done based on the search keywords.
Thus, string matching algorithms make it easier for one to find the information they ar
searching for.

Intrusion Detection System: The data packets containing intrusion-related keywords are found
by applying string matching algorithms. All the malicious code is stored in the database, and
every incoming data is compared with stored data. If a match is found, then the alarm is
generated. It based on exact string matching algorithms where each intruded packet must be
detected.

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 7


R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V

THE NAIVE STRING MATCHING ALGORITHM


The naïve approach tests all the possible placement of Pattern P [1.......m] relative to text T
[1......n]. We try shift s = 0, 1.......n-m, successively and for each shift s. Compare T
[s+1.......s+m] to P [1......m].
The naïve algorithm finds all valid shifts using a loop that checks the condition P [1.......m] = T
[s+1.......s+m] for each of the n - m +1 possible value of s.
NAIVE-STRING-MATCHER (T, P)
1. n ← length [T]
2. m ← length [P]
3. for s ← 0 to n -m
4. do if P [1.....m] = T [s + 1....s + m]
5. then print "Pattern occurs with shift" s
Analysis: This for loop from 3 to 5 executes for n-m + 1(we need at least m characters at the
end) times and in iteration we are doing m comparisons. So the total complexity is O (n-m+1).
Example:
Suppose T = 1011101110
P = 111
Find all the Valid Shift
Solution:

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 8


R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V

Disadvantages of Naïve String Matching


There is only one disadvantage of the naïve string matching approach, which is that it is
inefficient. This is because when it has found a position, it does not use it again to find the
other position. It goes back to the starting point and looks for the pattern over again.

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 9


R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V
THE RABIN-KARP-ALGORITHM
The Rabin-Karp string matching algorithm calculates a hash value for the pattern, as well as for
each M-character subsequences of text to be compared. If the hash values are unequal, the
algorithm will determine the hash value for next M-character sequence. If the hash values are
equal, the algorithm will analyze the pattern and the M-character sequence. In this way, there is
only one comparison per text subsequence, and character matching is only required when the
hash values match.
RABIN-KARP-MATCHER (T, P, d, q)
1. n ← length [T]
2. m ← length [P]
3. h ← dm-1 mod q
4. p ← 0
5. t0 ← 0
6. for i ← 1 to m
7. do p ← (dp + P[i]) mod q
8. t0 ← (dt0+T [i]) mod q
9. for s ← 0 to n-m
10. do if p = ts
11. then if P [1.....m] = T [s+1.....s + m]
12. then "Pattern occurs with shift" s
13. If s < n-m
14. then ts+1 ← (d (ts-T [s+1]h)+T [s+m+1])mod q
Example: For string matching, working module q = 11, how many spurious hits does the
Rabin-Karp matcher encounters in Text T = 31415926535.......
T = 31415926535.......
P = 26
Here T.Length =11 so Q = 11
And P mod Q = 26 mod 11 = 4
Now find the exact match of P mod Q...
Solution:

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 10


R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V

Complexity:
The running time of RABIN-KARP-MATCHER in the worst case scenario O ((n-m+1) m but it
has a good average case running time. If the expected number of strong shifts is small O (1) and
Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 11
R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V
prime q is chosen to be quite large, then the Rabin-Karp algorithm can be expected to run in
time O (n+m) plus the time to require to process spurious hits.
Drawback:Just by applying the hash function based on the table created , then sometimes even
though Hash Code matched of both pattern and Text but the characters of the Pattern dont
matches to the Text.

THE KNUTH-MORRIS-PRATT (KMP)ALGORITHM


Knuth-Morris and Pratt introduce a linear time algorithm for the string matching problem. A
matching time of O (n) is achieved by avoiding comparison with an element of 'S' that have
previously been involved in comparison with some element of the pattern 'p' to be matched. i.e.,
backtracking on the string 'S' never occurs
Components of KMP Algorithm:
1. The Prefix Function (Π): The Prefix Function, Π for a pattern encapsulates knowledge about
how the pattern matches against the shift of itself. This information can be used to avoid a
useless shift of the pattern 'p.' In other words, this enables avoiding backtracking of the string
'S.'
2. The KMP Matcher: With string 'S,' pattern 'p' and prefix function 'Π' as inputs, find the
occurrence of 'p' in 'S' and returns the number of shifts of 'p' after which occurrences are found.
The Prefix Function (Π)
Following pseudo code compute the prefix function, Π:
COMPUTE- PREFIX- FUNCTION (P)
1. m ←length [P] //'p' pattern to be matched
2. Π [1] ← 0
3. k ← 0
4. for q ← 2 to m
5. do while k > 0 and P [k + 1] ≠ P [q]
6. do k ← Π [k]
7. If P [k + 1] = P [q]
8. then k← k + 1
9. Π [q] ← k
10. Return Π
Running Time Analysis:
In the above pseudo code for calculating the prefix function, the for loop from step 4 to step 10
runs 'm' times. Step1 to Step3 take constant time. Hence the running time of computing prefix
function is O (m).
Example: Compute Π for the pattern 'p' below:

Solution:
Initially: m = length [p] = 7
Π [1] = 0
k=0

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 12


R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V

After iteration 6 times, the prefix function computation is complete:


Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 13
R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V

The KMP Matcher:


The KMP Matcher with the pattern 'p,' the string 'S' and prefix function 'Π' as input, finds a
match of p in S. Following pseudo code compute the matching component of KMP algorithm:
KMP-MATCHER (T, P)
1. n ← length [T]
2. m ← length [P]
3. Π← COMPUTE-PREFIX-FUNCTION (P)
4. q ← 0 // numbers of characters matched
5. for i ← 1 to n // scan S from left to right
6. do while q > 0 and P [q + 1] ≠ T [i]
7. do q ← Π [q] // next character does not match
8. If P [q + 1] = T [i]
9. then q ← q + 1 // next character matches
10. If q = m // is all of p matched?
11. then print "Pattern occurs with shift" i - m
12. q ← Π [q] // look for the next match

Running Time Analysis:

The for loop beginning in step 5 runs 'n' times, i.e., as long as the length of the string 'S.'
Since step 1 to step 4 take constant times, the running time is dominated by this for the
loop. Thus running time of the matching function is O (n).

Example: Given a string 'T' and pattern 'P' as follows:

Let us execute the KMP Algorithm to find whether 'P' occurs in 'T.'

For 'p' the prefix function, ? was computed previously and is as follows:

Solution:
Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 14
R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V
Initially: n = size of T = 15
m = size of P = 7

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 15


R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 16


R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V

Pattern 'P' has been found to complexity occur in a string 'T.' The total number of shifts that
took place for the match to be found is i-m = 13 - 7 = 6 shifts.

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 17


R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V
TRIES
The word "Trie" is an excerpt from the word "retrieval". Trie is a sorted tree-based data-
structure that stores the set of strings. It has the number of pointers equal to the number of
characters of the alphabet in each node. It can search a word in the dictionary with the help of
the word's prefix. For example, if we assume that all strings are formed from the letters 'a' to 'z'
in the English alphabet, each trie node can have a maximum of 26 points.
Trie is also known as the digital tree or prefix tree. The position of a node in the Trie
determines the key with which that node is connected.
Properties of the Trie for a set of the string:
1.The root node of the trie always represents the null node.
2.Each child of nodes is sorted alphabetically.
3.Each node can have a maximum of 26 children (A to Z).
4.Each node (except the root) can store one letter of the alphabet.
The diagram below depicts a trie representation for the bell, bear, bore, bat, ball, stop, stock,
and stack.

Basic operations of Trie


There are three operations in the Trie:
1. Insertion of a node
2. Searching a node
3. Deletion of a node
Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 18
R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V
Insert of a node in the Trie
The first operation is to insert a new node into the trie. Before we start the implementation, it is
important to understand some points:
1. Every letter of the input key (word) is inserted as an individual in the Trie_node. Note that
children point to the next level of Trie nodes.
2. The key character array acts as an index of children.
3. If the present node already has a reference to the present letter, set the present node to that
referenced node. Otherwise, create a new node, set the letter to be equal to the present letter,
and even start the present node with this new node.
4. The character length determines the depth of the trie.
Searching a node in Trie
The second operation is to search for a node in a Trie. The searching operation is similar to the
insertion operation. The search operation is used to search a key in the trie.
Deletion of a node in the Trie
The Third operation is the deletion of a node in the Trie. Before we begin the implementation,
it is important to understand some points:
1.If the key is not found in the trie, the delete operation will stop and exit it.
2.If the key is found in the trie, delete it from the trie.
Applications of Trie
1. Spell Checker
Spell checking is a three-step process. First, look for that word in a dictionary, generate
possible suggestions, and then sort the suggestion words with the desired word at the top.
Trie is used to store the word in dictionaries. The spell checker can easily be applied in the
most efficient way by searching for words on a data structure. Using trie not only makes it easy
to see the word in the dictionary, but it is also simple to build an algorithm to include a
collection of relevant words or suggestions.
2. Auto-complete
Auto-complete functionality is widely used on text editors, mobile applications, and the
Internet. It provides a simple way to find an alternative word to complete the word for the
following reasons.
It provides an alphabetical filter of entries by the key of the node.
We trace pointers only to get the node that represents the string entered by the user.
As soon as you start typing, it tries to complete your input.
3. Browser history
It is also used to complete the URL in the browser. The browser keeps a history of the URLs of
the websites you've visited.
Advantages of Trie
It can be insert faster and search the string than hash tables and binary search trees.
It provides an alphabetical filter of entries by the key of the node.
Disadvantages of Trie
It requires more memory to store the strings.
It is slower than the hash table.

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 19


R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V
Compressed Tries
Tries with nodes of degree at least 2. It is accomplished by compressing the nodes of the
standard trie. It is also known as Radix Tries. It is used to achieve space optimization
Since the nodes are compressed. Let’s visually compare the structure of the Standard tree and
the compressed tree for a better approach. In terms of memory, a compressed trie tree uses very
few amounts of nodes which gives a huge memory advantage(especially for long) strings with
long common prefixes. In terms of speed, a regular trie tree would be slightly faster because its
operations don’t involve any string operations, they are simple loops.
In the below image, the left tree is a Standard trie, the right tree is a compressed trie.

SUFFIX TREES
Suffix tree is a compressed trie of all the suffixes of a given string. Suffix trees help in solving
a lot of string related problems like pattern matching, finding distinct substrings in a given
string, finding longest palindrome etc
Before going to suffix tree, let's first try to understand what a compressed trie is.
Consider the following set of strings: {"banana","nabd","bcdef","bcfeg","aaaaaa","aaabaa" }
A standard trie for the above set of strings will look like:

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 20


R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V
And a compressed trie for the given set of strings will look like:

As it might be clear from the images show above, in a compressed trie, edges that direct to a
node having single child are combined together to form a single edge and their edge labels are
concatenated. So this means that each internal node in a compressed trie has atleast two
children. Also it has atmost N leaves, where N is the number of strings inserted in the
compressed trie. Now both the facts: Each internal node having atleast two children, and that
there are N leaves, implies that there are atmost 2N−1 nodes in the trie. So the space
complexity of a compressed trie is O(N) as compared to the O(N2) of a normal trie.
So that is one reason why to use compressed tries over normal tries.
Before going to construction of suffix trees, there is one more thing that should be understood,
Implicit Suffix Tree. In Implicit suffix trees, there are atmost N leaves, while in normal one
there should be exactly N leaves. The reason for atmost N leaves is one suffix being prefix of
another suffix. Following example will make it clear. Consider the string "banana"
Implicit Suffix Tree for the above string is shown in image below:

To avoid getting an Implicit Suffix Tree we append a special character that is not equal to any
other character of the string. Suppose we append $ to the given string then, so the new string
is "banana$". Now its suffix tree will be
Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 21
R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V

Now let's go to the construction of the suffix trees.


Suffix tree as mentioned previously is a compressed trie of all the suffixes of a given string, so
the brute force approach will be to consider all the suffixes of the given string as separate
strings and insert them in the trie one by one. But time complexity of the brute force approach
is O(N2), and that is of no use for large values of N.

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 22

You might also like