0% found this document useful (0 votes)

87 views22 pages

Daa Unit 5

The document discusses NP-hard and NP-complete problems. It defines NP, NP-hard, and NP-complete problems and explains the relationships between them. The Cook-Levin theorem, also known as Cook's theorem, states that the Boolean satisfiability problem (SAT) is NP-complete. It discusses how Stephen Cook and Leonid Levin independently proved in 1971 that SAT is NP-complete by showing that any problem in NP can be reduced to SAT in polynomial time. This established that SAT and many other problems are NP-complete.

Uploaded by

Kaarlet

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

87 views22 pages

Daa Unit 5

Uploaded by

Kaarlet

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 22

R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V

DESIGN AND ANALYSIS OF ALGORITHMS

UNIT V NP-Hard and NP-Complete problems: Basic concepts, Cook’s Theorem.
String Matching: Introduction, String Matching-Meaning and Application, NaÏve
String Matching Algorithm, Rabin-Karp Algorithm, Knuth-Morris-Pratt Automata,
Tries, Suffix Tree.

BASIC CONCEPTS
In this chapter we are concerned with the distinction between problems that can be solved by a
polynomial time algorithm and problems for which no polynomial time algorithm is known. It
is an unexplained phenomenon that for many of the problems we know and study, the best
algorithms for their solutions have computing times that cluster into two groups.The first group
consists of problems whose solution times are bounded by polynomials of small degree The
second group is made up of problems whose best-known algorithms are non polynomial In the
quest to develop efficient algorithms,no one has been able to develop a polynomial time
algorithm for any problem in the second group.
The theory of NP-complete doesn’t provide a method of obtaining polynomial time algorithms
for problems in the second group. Nor does it say that algorithms of this complexity do not
exist
Instead,what we do is show that many of the problems for which there are no known
polynomial time algorithms are computationally related.In fact, we establish two classes of
problems.These are given the names NP-hard and NP-complete.A problem that is NP-complete
has the property that it can be solved in polynomial time if and only if all other NP-complete
problems can also be solved in polynomial time. If an NP-hard problem can be solved in
polynomial time,then all NP-complete problems can be solved in polynomial time. All NP-
complete problems are NP-hard, but some NP-hard problems are not known to be NP-complete

Non deterministic Algorithm

In deterministic algorithm, for a given particular input, the computer will always produce
the same output going through the same states but in case of non-deterministic algorithm,
for the same input, the compiler may produce different output in different runs. In fact non-
deterministic algorithms can’t solve the problem in polynomial time and can’t determine what
is the next step. The non-deterministic algorithms can show different behaviors for the same
input on different execution and there is a degree of randomness to it.

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 1

R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V
To implement a non-deterministic algorithm, we have a couple of languages like Prolog but
these don’t have standard programming language operators and these operators are not a part of
any standard programming languages.

Some of the terms related to the non-deterministic algorithm are defined below:
choice(X) : chooses any value randomly from the set X.
failure() : denotes the unsuccessful solution.
success() : Solution is successful and current thread terminates.
Example :
Problem Statement : Search an element x on A[1:n] where n>=1, on successful search return j
if a[j] is equals to x otherwise return 0.
Non-deterministic Algorithm for this problem :
j= choice(a, n)
if(A[j]==x) then
{
write(j);
success();
}
write(0); failure();

The Classes NP-hard and NP-Complete

NP Problem:
The NP problems set of problems whose solutions are hard to find but easy to verify and are
solved by Non-Deterministic Machine in polynomial time.
NP-Hard Problem:
A Problem X is NP-Hard if there is an NP-Complete problem Y, such that Y is reducible to X
in polynomial time. NP-Hard problems are as hard as NP-Complete problems. NP-Hard
Problem need not be in NP class.
NP-Complete Problem:
A problem X is NP-Complete if there is an NP problem Y, such that Y is reducible to X in
polynomial time. NP-Complete problems are as hard as NP problems. A problem is NP-
Complete if it is a part of both NP and NP-Hard Problem. A non-deterministic Turing machine
can solve NP-Complete problem in polynomial time.
NP-hard NP-Complete

NP-Hard problems(say X) can be solved if and NP-Complete problems can be solved by

only if there is a NP-Complete problem(say Y) that a non-deterministic Algorithm/Turing
can be reducible into X in polynomial time. Machine in polynomial time.

To solve this problem, it do not have to be in NP . To solve this problem, it must be both NP
and NP-hard problems.

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 2

R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V

NP-hard NP-Complete

Do not have to be a Decision problem. It is exclusively a Decision problem.

Example: Halting problem, Vertex cover problem, Example: Determine whether a graph has
etc. a Hamiltonian cycle, Determine whether
a Boolean formula is satisfiable or not,
Circuit-satisfiability problem, etc.

COOK–LEVIN THEOREM OR COOK’S THEOREM

In computational complexity theory, the Cook–Levin theorem, also known as Cook’s theorem,
states that the Boolean satisfiability problem is NP-complete. That is, it is in NP, and any
problem in NP can be reduced in polynomial time by a deterministic Turing machine to the
Boolean satisfiability problem.
Stephen Arthur Cook and L.A. Levin in 1973 independently proved that the satisfiability
problem(SAT) is NP-complete. Stephen Cook, in 1971, published an important paper titled
‘The complexity of Theorem Proving Procedures’, in which he outlined the way of obtaining
the proof of an NP-complete problem by reducing it to SAT. He proved Circuit-
SAT and 3CNF-SAT problems are as hard as SAT. Similarly, Leonid Levin independently
worked on this problem in the then Soviet Union. The proof that SAT is NP-complete was
obtained due to the efforts of these two scientists. Later, Karp reduced 21 optimization
problems, such as Hamiltonian tour, vertex cover, and clique, to the SAT and proved that those
problems are NP-complete.
Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 3
R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V
Hence, an SAT is a significant problem and can be stated as follows:
Given a boolean expression F having n variables x1,x2,….,xn, and Boolean operators, is it
possible to have an assignment for variables true or false such that binary expression F is true?
This problem is also known as the formula – SAT. An SAT(formula-SAT or simply SAT) takes
a Boolean expression F and checks whether the given expression(or formula) is satisfiable. A
Boolean expression is said to be satisfactory for some valid assignments of variables if the
evaluation comes to be true. Prior to discussing the details of SAT, let us now discuss some
important terminologies of a Boolean expression.
Boolean variable: A variable, say x, that can have only two values, true or false, is called a
boolean variable
Literal: A literal can be a logical variable, say x, or the negation of it, that is x or x̄; x is called a
positive literal, and x̄ is called the negative literal
Clause: A sequence of variables(x1,x2,….,xn) that can be separated by a logical OR operator is
called a clause. For example, (x1 V x2 V x3) is a clause of three literals.
Expressions: One can combine all the preceding clauses using a Boolean operator to form an
expression.
CNF form: An expression is in CNF form(conjunctive normal form) if the set of clauses are
separated by an AND (^), operator, while the literals are connected by an OR (v) operator. The
following is an example of an expression in the CNF form:

3 – CNF: An expression is said to be in 3-CNF if it is in the conjunctive normal form, and

every clause has exact three literals.
Thus, an SAT is one of the toughest problems, as there is no known algorithm other than the
brute force approach. A brute force algorithm would be an exponential-time algorithm,
as 2n possible assignments need to be tried to check whether the given Boolean expression is
true or not. Stephen Cook and Leonid Levin proved that the SAT is NP-complete.
Cook demonstrated the reduction of other hard problems to SATs. Karp provided proof of 21
important problems, Such as Hamiltonian tour, vertex cover, and clique, by reducing it to SAT
using Karp reduction.
Let us briefly introduce the three types of SATs. They are as follows:
Circuit- SAT: A circuit-SAT can be stated as follows: given a Boolean circuit, which is a
collection of gates such as AND, OR, and NOT, and n inputs, is there any input assignments of
Boolean variables so that the output of the given circuit is true? Again, the difficulty with these
Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 4
R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V
problems is that for n inputs to the circuit. 2n possible outputs should be checked.Therefore,
this brute force algorithm is an exponential algorithm and hence this is a hard problem.
CNF-SAT: This problem is a restricted problem of SAT, where the expression should be in a
conjunctive normal form. An expression is said to be in a conjunction form if all the clauses are
connected by the Boolean AND operator. Like in case of a SAT, this is about assigning truth
values to n variables such that the output of the expression is true.
3-CNF-SAT(3-SAT): This problem is another variant where the additional restriction is that the
expression is in a conjunctive normal form and that every clause should contain exactly three
literals. This problem is also about assigning n assignments of truth values to n variables of the
Boolean expression such that the output of the expression is true. In simple words, given an
expression in 3-CNF, a 3-SAT problem is to check whether the given expression is satisfiable.

STRING MATCHING:
The string-matching is a very useful tool which is used in string matching algorithm. It
examines every character in the text exactly once and reports all the valid shifts in O (n) time.
The goal of string matching is to find the location of specific text pattern within the larger body
of text (a sentence, a paragraph, a book, etc.)
Let us look at a few string matching algorithms before proceeding to their applications in real
world. String Matching Algorithms can broadly be classified into two types of algorithms –
1.Exact String Matching Algorithms
2.Approximate String Matching Algorithms
Exact String Matching Algorithms:
Exact string matching algorithms is to find one, several, or all occurrences of a defined string
(pattern) in a large string (text or sequences) such that each matching is perfect. All alphabets
of patterns must be matched to corresponding matched subsequence. These are further
classified into four categories:
Algorithms based on character comparison:
Naive Algorithm: It slides the pattern over text one by one and check for a match. If a match is
found, then slides by 1 again to check for subsequent matches.
KMP (Knuth Morris Pratt) Algorithm: The idea is whenever a mismatch is detected, we
already know some of the characters in the text of the next window. So, we take advantage of
this information to avoid matching the characters that we know will anyway match.
Boyer Moore Algorithm: This algorithm uses best heurestics of Naive and KMP algorithm and
starts matching from the last character of the pattern.
Using the Trie data structure: It is used as an efficient information retrieval data structure. It
stores the keys in form of a balanced BST.
Deterministic Finite Automaton (DFA) method:
Automaton Matcher Algorithm: It starts from the first state of the automata and the first
character of the text. At every step, it considers next character of text, and look for the next
state in the built finite automata and move to a new state.
Algorithms based on Bit (parallelism method):
Aho-Corasick Algorithm: It finds all words in O(n + m + z) time where n is the length of text
and m be the total number characters in all words and z is total number of occurrences of words
in text. This algorithm forms the basis of the original Unix command fgrep.
Hashing-string matching algorithms:
Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 5
R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V
Rabin Karp Algorithm: It matches the hash value of the pattern with the hash value of current
substring of text, and if the hash values match then only it starts matching individual characters.
Approximate String Matching Algorithms:
Approximate String Matching Algorithms (also known as Fuzzy String Searching) searches for
substrings of the input string. More specifically, the approximate string matching approach is
stated as follows: Suppose that we are given two strings, text T[1…n] and pattern P[1…m].
The task is to find all the occurrences of patterns in the text whose edit distance to the pattern is
at most k. Some well known edit distances are – Levenshtein edit distance and Hamming edit
distance.
These techniques are used when the quality of the text is low, there are spelling errors in the
pattern or text, finding DNA subsequences after mutation, heterogeneous databases, etc. Some
approximate string matching algorithms are:
Naive Approach: It slides the pattern over text one by one and check for approximate matches.
If they are found, then slides by 1 again to check for subsequent approximate matches.
Sellers Algorithm (Dynamic Programming)
Shift or Algorithm (Bitmap Algorithm)
Applications of String Matching Algorithms:
Plagiarism Detection: The documents to be compared are decomposed into string tokens and
compared using string matching algorithms. Thus, these algorithms are used to detect
similarities between them and declare if the work is plagiarized or original.

Bioinformatics and DNA Sequencing: Bioinformatics involves applying information

technology and computer science to problems involving genetic sequences to find DNA
patterns. String matching algorithms and DNA analysis are both collectively used for finding
the occurrence of the pattern set.

Digital Forensics: String matching algorithms are used to locate specific text strings of interest
in the digital forensic text, which are useful for the investigation.
Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 6
R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V
Spelling Checker: Trie is built based on a predefined set of patterns. Then, this trie is used for
string matching. The text is taken as input, and if any such pattern occurs, it is shown by
reaching the acceptance state.

Spam filters: Spam filters use string matching to discard the spam. For example, to categorize
an email as spam or not, suspected spam keywords are searched in the content of the email by
string matching algorithms. Hence, the content is classified as spam or not.

Search engines or content search in large databases: To categorize and organize data efficiently,
string matching algorithms are used. Categorization is done based on the search keywords.
Thus, string matching algorithms make it easier for one to find the information they ar
searching for.

Intrusion Detection System: The data packets containing intrusion-related keywords are found
by applying string matching algorithms. All the malicious code is stored in the database, and
every incoming data is compared with stored data. If a match is found, then the alarm is
generated. It based on exact string matching algorithms where each intruded packet must be
detected.

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 7

R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V

THE NAIVE STRING MATCHING ALGORITHM

The naïve approach tests all the possible placement of Pattern P [1.......m] relative to text T
[1......n]. We try shift s = 0, 1.......n-m, successively and for each shift s. Compare T
[s+1.......s+m] to P [1......m].
The naïve algorithm finds all valid shifts using a loop that checks the condition P [1.......m] = T
[s+1.......s+m] for each of the n - m +1 possible value of s.
NAIVE-STRING-MATCHER (T, P)
1. n ← length [T]
2. m ← length [P]
3. for s ← 0 to n -m
4. do if P [1.....m] = T [s + 1....s + m]
5. then print "Pattern occurs with shift" s
Analysis: This for loop from 3 to 5 executes for n-m + 1(we need at least m characters at the
end) times and in iteration we are doing m comparisons. So the total complexity is O (n-m+1).
Example:
Suppose T = 1011101110
P = 111
Find all the Valid Shift
Solution:

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 8

R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V

Disadvantages of Naïve String Matching

There is only one disadvantage of the naïve string matching approach, which is that it is
inefficient. This is because when it has found a position, it does not use it again to find the
other position. It goes back to the starting point and looks for the pattern over again.

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 9

R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V
THE RABIN-KARP-ALGORITHM
The Rabin-Karp string matching algorithm calculates a hash value for the pattern, as well as for
each M-character subsequences of text to be compared. If the hash values are unequal, the
algorithm will determine the hash value for next M-character sequence. If the hash values are
equal, the algorithm will analyze the pattern and the M-character sequence. In this way, there is
only one comparison per text subsequence, and character matching is only required when the
hash values match.
RABIN-KARP-MATCHER (T, P, d, q)
1. n ← length [T]
2. m ← length [P]
3. h ← dm-1 mod q
4. p ← 0
5. t0 ← 0
6. for i ← 1 to m
7. do p ← (dp + P[i]) mod q
8. t0 ← (dt0+T [i]) mod q
9. for s ← 0 to n-m
10. do if p = ts
11. then if P [1.....m] = T [s+1.....s + m]
12. then "Pattern occurs with shift" s
13. If s < n-m
14. then ts+1 ← (d (ts-T [s+1]h)+T [s+m+1])mod q
Example: For string matching, working module q = 11, how many spurious hits does the
Rabin-Karp matcher encounters in Text T = 31415926535.......
T = 31415926535.......
P = 26
Here T.Length =11 so Q = 11
And P mod Q = 26 mod 11 = 4
Now find the exact match of P mod Q...
Solution:

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 10

R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V

Complexity:
The running time of RABIN-KARP-MATCHER in the worst case scenario O ((n-m+1) m but it
has a good average case running time. If the expected number of strong shifts is small O (1) and
Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 11
R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V
prime q is chosen to be quite large, then the Rabin-Karp algorithm can be expected to run in
time O (n+m) plus the time to require to process spurious hits.
Drawback:Just by applying the hash function based on the table created , then sometimes even
though Hash Code matched of both pattern and Text but the characters of the Pattern dont
matches to the Text.

THE KNUTH-MORRIS-PRATT (KMP)ALGORITHM

Knuth-Morris and Pratt introduce a linear time algorithm for the string matching problem. A
matching time of O (n) is achieved by avoiding comparison with an element of 'S' that have
previously been involved in comparison with some element of the pattern 'p' to be matched. i.e.,
backtracking on the string 'S' never occurs
Components of KMP Algorithm:
1. The Prefix Function (Π): The Prefix Function, Π for a pattern encapsulates knowledge about
how the pattern matches against the shift of itself. This information can be used to avoid a
useless shift of the pattern 'p.' In other words, this enables avoiding backtracking of the string
'S.'
2. The KMP Matcher: With string 'S,' pattern 'p' and prefix function 'Π' as inputs, find the
occurrence of 'p' in 'S' and returns the number of shifts of 'p' after which occurrences are found.
The Prefix Function (Π)
Following pseudo code compute the prefix function, Π:
COMPUTE- PREFIX- FUNCTION (P)
1. m ←length [P] //'p' pattern to be matched
2. Π [1] ← 0
3. k ← 0
4. for q ← 2 to m
5. do while k > 0 and P [k + 1] ≠ P [q]
6. do k ← Π [k]
7. If P [k + 1] = P [q]
8. then k← k + 1
9. Π [q] ← k
10. Return Π
Running Time Analysis:
In the above pseudo code for calculating the prefix function, the for loop from step 4 to step 10
runs 'm' times. Step1 to Step3 take constant time. Hence the running time of computing prefix
function is O (m).
Example: Compute Π for the pattern 'p' below:

Solution:
Initially: m = length [p] = 7
Π [1] = 0
k=0

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 12

R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V

After iteration 6 times, the prefix function computation is complete:

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 13
R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V

The KMP Matcher:

The KMP Matcher with the pattern 'p,' the string 'S' and prefix function 'Π' as input, finds a
match of p in S. Following pseudo code compute the matching component of KMP algorithm:
KMP-MATCHER (T, P)
1. n ← length [T]
2. m ← length [P]
3. Π← COMPUTE-PREFIX-FUNCTION (P)
4. q ← 0 // numbers of characters matched
5. for i ← 1 to n // scan S from left to right
6. do while q > 0 and P [q + 1] ≠ T [i]
7. do q ← Π [q] // next character does not match
8. If P [q + 1] = T [i]
9. then q ← q + 1 // next character matches
10. If q = m // is all of p matched?
11. then print "Pattern occurs with shift" i - m
12. q ← Π [q] // look for the next match

Running Time Analysis:

The for loop beginning in step 5 runs 'n' times, i.e., as long as the length of the string 'S.'
Since step 1 to step 4 take constant times, the running time is dominated by this for the
loop. Thus running time of the matching function is O (n).

Example: Given a string 'T' and pattern 'P' as follows:

Let us execute the KMP Algorithm to find whether 'P' occurs in 'T.'

For 'p' the prefix function, ? was computed previously and is as follows:

Solution:
Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 14
R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V
Initially: n = size of T = 15
m = size of P = 7

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 15

R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 16

R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V

Pattern 'P' has been found to complexity occur in a string 'T.' The total number of shifts that
took place for the match to be found is i-m = 13 - 7 = 6 shifts.

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 17

R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V
TRIES
The word "Trie" is an excerpt from the word "retrieval". Trie is a sorted tree-based data-
structure that stores the set of strings. It has the number of pointers equal to the number of
characters of the alphabet in each node. It can search a word in the dictionary with the help of
the word's prefix. For example, if we assume that all strings are formed from the letters 'a' to 'z'
in the English alphabet, each trie node can have a maximum of 26 points.
Trie is also known as the digital tree or prefix tree. The position of a node in the Trie
determines the key with which that node is connected.
Properties of the Trie for a set of the string:
1.The root node of the trie always represents the null node.
2.Each child of nodes is sorted alphabetically.
3.Each node can have a maximum of 26 children (A to Z).
4.Each node (except the root) can store one letter of the alphabet.
The diagram below depicts a trie representation for the bell, bear, bore, bat, ball, stop, stock,
and stack.

Basic operations of Trie

There are three operations in the Trie:
1. Insertion of a node
2. Searching a node
3. Deletion of a node
Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 18
R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V
Insert of a node in the Trie
The first operation is to insert a new node into the trie. Before we start the implementation, it is
important to understand some points:
1. Every letter of the input key (word) is inserted as an individual in the Trie_node. Note that
children point to the next level of Trie nodes.
2. The key character array acts as an index of children.
3. If the present node already has a reference to the present letter, set the present node to that
referenced node. Otherwise, create a new node, set the letter to be equal to the present letter,
and even start the present node with this new node.
4. The character length determines the depth of the trie.
Searching a node in Trie
The second operation is to search for a node in a Trie. The searching operation is similar to the
insertion operation. The search operation is used to search a key in the trie.
Deletion of a node in the Trie
The Third operation is the deletion of a node in the Trie. Before we begin the implementation,
it is important to understand some points:
1.If the key is not found in the trie, the delete operation will stop and exit it.
2.If the key is found in the trie, delete it from the trie.
Applications of Trie
1. Spell Checker
Spell checking is a three-step process. First, look for that word in a dictionary, generate
possible suggestions, and then sort the suggestion words with the desired word at the top.
Trie is used to store the word in dictionaries. The spell checker can easily be applied in the
most efficient way by searching for words on a data structure. Using trie not only makes it easy
to see the word in the dictionary, but it is also simple to build an algorithm to include a
collection of relevant words or suggestions.
2. Auto-complete
Auto-complete functionality is widely used on text editors, mobile applications, and the
Internet. It provides a simple way to find an alternative word to complete the word for the
following reasons.
It provides an alphabetical filter of entries by the key of the node.
We trace pointers only to get the node that represents the string entered by the user.
As soon as you start typing, it tries to complete your input.
3. Browser history
It is also used to complete the URL in the browser. The browser keeps a history of the URLs of
the websites you've visited.
Advantages of Trie
It can be insert faster and search the string than hash tables and binary search trees.
It provides an alphabetical filter of entries by the key of the node.
Disadvantages of Trie
It requires more memory to store the strings.
It is slower than the hash table.

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 19

R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V
Compressed Tries
Tries with nodes of degree at least 2. It is accomplished by compressing the nodes of the
standard trie. It is also known as Radix Tries. It is used to achieve space optimization
Since the nodes are compressed. Let’s visually compare the structure of the Standard tree and
the compressed tree for a better approach. In terms of memory, a compressed trie tree uses very
few amounts of nodes which gives a huge memory advantage(especially for long) strings with
long common prefixes. In terms of speed, a regular trie tree would be slightly faster because its
operations don’t involve any string operations, they are simple loops.
In the below image, the left tree is a Standard trie, the right tree is a compressed trie.

SUFFIX TREES
Suffix tree is a compressed trie of all the suffixes of a given string. Suffix trees help in solving
a lot of string related problems like pattern matching, finding distinct substrings in a given
string, finding longest palindrome etc
Before going to suffix tree, let's first try to understand what a compressed trie is.
Consider the following set of strings: {"banana","nabd","bcdef","bcfeg","aaaaaa","aaabaa" }
A standard trie for the above set of strings will look like:

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 20

R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V
And a compressed trie for the given set of strings will look like:

As it might be clear from the images show above, in a compressed trie, edges that direct to a
node having single child are combined together to form a single edge and their edge labels are
concatenated. So this means that each internal node in a compressed trie has atleast two
children. Also it has atmost N leaves, where N is the number of strings inserted in the
compressed trie. Now both the facts: Each internal node having atleast two children, and that
there are N leaves, implies that there are atmost 2N−1 nodes in the trie. So the space
complexity of a compressed trie is O(N) as compared to the O(N2) of a normal trie.
So that is one reason why to use compressed tries over normal tries.
Before going to construction of suffix trees, there is one more thing that should be understood,
Implicit Suffix Tree. In Implicit suffix trees, there are atmost N leaves, while in normal one
there should be exactly N leaves. The reason for atmost N leaves is one suffix being prefix of
another suffix. Following example will make it clear. Consider the string "banana"
Implicit Suffix Tree for the above string is shown in image below:

To avoid getting an Implicit Suffix Tree we append a special character that is not equal to any
other character of the string. Suppose we append $ to the given string then, so the new string
is "banana$". Now its suffix tree will be
Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 21
R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V

Now let's go to the construction of the suffix trees.

Suffix tree as mentioned previously is a compressed trie of all the suffixes of a given string, so
the brute force approach will be to consider all the suffixes of the given string as separate
strings and insert them in the trie one by one. But time complexity of the brute force approach
is O(N2), and that is of no use for large values of N.

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 22

P, NP, NP - Complete, NP Hard
No ratings yet
P, NP, NP - Complete, NP Hard
19 pages
Unit-5-Np Hard and NP Complete Problems-1
100% (1)
Unit-5-Np Hard and NP Complete Problems-1
32 pages
NP Hard and NP Complete Problems Basic Concepts
No ratings yet
NP Hard and NP Complete Problems Basic Concepts
67 pages
DAA Unit 5
No ratings yet
DAA Unit 5
29 pages
System Dynamics Ogata 4th Solutions PDF
0% (1)
System Dynamics Ogata 4th Solutions PDF
5 pages
DAA Unit - 4
No ratings yet
DAA Unit - 4
45 pages
NP Completeness Lecture Notes
No ratings yet
NP Completeness Lecture Notes
7 pages
P, NP, Np-Complete and Np-Hard
100% (1)
P, NP, Np-Complete and Np-Hard
12 pages
P and NP Problems
No ratings yet
P and NP Problems
4 pages
Devops Unit-1
No ratings yet
Devops Unit-1
19 pages
Daa Unit-V
No ratings yet
Daa Unit-V
8 pages
DevOps UNIT3
No ratings yet
DevOps UNIT3
25 pages
NP Completeness
No ratings yet
NP Completeness
49 pages
Ahb Irs Kits Jntuk r19 Unit IV Complete
No ratings yet
Ahb Irs Kits Jntuk r19 Unit IV Complete
53 pages
NP-Completeness: Problems Proofs Approximations
No ratings yet
NP-Completeness: Problems Proofs Approximations
141 pages
NP Hard and NP Complete
No ratings yet
NP Hard and NP Complete
5 pages
Module 5
No ratings yet
Module 5
13 pages
Intro To NP Completeness Modified
No ratings yet
Intro To NP Completeness Modified
72 pages
Daa - Unit 5 Part 2
No ratings yet
Daa - Unit 5 Part 2
12 pages
NP-Completeness: Algorithms
No ratings yet
NP-Completeness: Algorithms
70 pages
Unit - 6 P, NP, and NP-Complete
No ratings yet
Unit - 6 P, NP, and NP-Complete
26 pages
Cook's Theorem
0% (1)
Cook's Theorem
18 pages
CAP538 Unit 6
No ratings yet
CAP538 Unit 6
31 pages
NP Completeness
No ratings yet
NP Completeness
20 pages
Daa PR10 123
No ratings yet
Daa PR10 123
19 pages
Plexity Theory
No ratings yet
Plexity Theory
23 pages
CS 580 Algorithm Design and Analysis: NP and NP-completeness
No ratings yet
CS 580 Algorithm Design and Analysis: NP and NP-completeness
35 pages
NP Np-Hard Np-Complete: Data Structure and Algorithm
0% (1)
NP Np-Hard Np-Complete: Data Structure and Algorithm
18 pages
7063.NP Hard
No ratings yet
7063.NP Hard
17 pages
CEng315 NP Complete 2008
No ratings yet
CEng315 NP Complete 2008
32 pages
Theory of NP Completeness
No ratings yet
Theory of NP Completeness
17 pages
Daa Unit5 Notes
No ratings yet
Daa Unit5 Notes
15 pages
AA Unit 5 Notes
No ratings yet
AA Unit 5 Notes
14 pages
NP NP-HARD
No ratings yet
NP NP-HARD
15 pages
Opus Terra - PUNQS3
No ratings yet
Opus Terra - PUNQS3
27 pages
Unit V
No ratings yet
Unit V
7 pages
Unit 5
No ratings yet
Unit 5
55 pages
Unit Iii Business Plan Preparation Entrepreneurship Development
100% (1)
Unit Iii Business Plan Preparation Entrepreneurship Development
10 pages
Hamiltonian Path Problems
No ratings yet
Hamiltonian Path Problems
12 pages
Aditya Engineering College (A) : Unit-V
No ratings yet
Aditya Engineering College (A) : Unit-V
11 pages
Daa R20 Unit 5
No ratings yet
Daa R20 Unit 5
13 pages
Unit 6 TOC
No ratings yet
Unit 6 TOC
38 pages
DAA Unit5 Part-1
No ratings yet
DAA Unit5 Part-1
7 pages
Lecture-9 P NP
No ratings yet
Lecture-9 P NP
5 pages
DAA Unit-V Material
No ratings yet
DAA Unit-V Material
3 pages
Daa 5
No ratings yet
Daa 5
12 pages
Daa 23
No ratings yet
Daa 23
4 pages
NP Completeness: Decision Problems
No ratings yet
NP Completeness: Decision Problems
6 pages
FSAN/ELEG815: Statistical Learning: Gonzalo R. Arce
No ratings yet
FSAN/ELEG815: Statistical Learning: Gonzalo R. Arce
50 pages
Lect1108 PDF
No ratings yet
Lect1108 PDF
5 pages
Introduction To LPP
No ratings yet
Introduction To LPP
10 pages
Vasvi Khullar Mca - Iv (B) 06417704417
No ratings yet
Vasvi Khullar Mca - Iv (B) 06417704417
5 pages
01 Simplex Method
No ratings yet
01 Simplex Method
42 pages
Correlation and Regression Analysis
No ratings yet
Correlation and Regression Analysis
34 pages
R19 It Iii Year 1 Semester Design and Analysis of Algorithms Unit 1
No ratings yet
R19 It Iii Year 1 Semester Design and Analysis of Algorithms Unit 1
39 pages
Unit V:: Design and Analysis of Algorithms
100% (1)
Unit V:: Design and Analysis of Algorithms
7 pages
Daa Unit2
No ratings yet
Daa Unit2
30 pages
Module 5 - NP
No ratings yet
Module 5 - NP
48 pages
Logit, Probit and Multinomial Logit Models in R: Oscar Torres-Reyna
No ratings yet
Logit, Probit and Multinomial Logit Models in R: Oscar Torres-Reyna
27 pages
Mod 5 (Aad)
No ratings yet
Mod 5 (Aad)
24 pages
Control Systems Engineering: January 2010
No ratings yet
Control Systems Engineering: January 2010
22 pages
9th Mathematics Home Work
No ratings yet
9th Mathematics Home Work
5 pages
Tractable Problem: A Problem That Is Solvable by A Polynomial-Time Algorithm - The Upper Bound Is Polynomial. Examples - Searching An
No ratings yet
Tractable Problem: A Problem That Is Solvable by A Polynomial-Time Algorithm - The Upper Bound Is Polynomial. Examples - Searching An
2 pages
Complexity in Design and Analysis of Algorithms
No ratings yet
Complexity in Design and Analysis of Algorithms
47 pages
Huffman Algorithm
No ratings yet
Huffman Algorithm
21 pages
R19 Devops Unit-3
No ratings yet
R19 Devops Unit-3
5 pages
PRAD A Bayesian Optimization-Based DSE Framework For Parameterized Reconfigurable Architecture Design
No ratings yet
PRAD A Bayesian Optimization-Based DSE Framework For Parameterized Reconfigurable Architecture Design
1 page
Cybersecurity Paper
No ratings yet
Cybersecurity Paper
16 pages
Unit 5 NP
No ratings yet
Unit 5 NP
13 pages
Mid 2 Daa
No ratings yet
Mid 2 Daa
17 pages
ST Stephen Girl College F3-Maths-17-18-my
No ratings yet
ST Stephen Girl College F3-Maths-17-18-my
12 pages
Np-Hard and Np-Complete Problems
No ratings yet
Np-Hard and Np-Complete Problems
36 pages
P, NP, NP-Hard and NP-Complete
No ratings yet
P, NP, NP-Hard and NP-Complete
30 pages
Lab Session 10
No ratings yet
Lab Session 10
9 pages
Np-Hard and Np-Complete Problems
No ratings yet
Np-Hard and Np-Complete Problems
26 pages
NP Complet Problems
No ratings yet
NP Complet Problems
11 pages
Devops UNIT5
No ratings yet
Devops UNIT5
8 pages
R-23 Adsnaa Unit-5 NP Hard and Complete
No ratings yet
R-23 Adsnaa Unit-5 NP Hard and Complete
7 pages
Winter 2022 3160714
No ratings yet
Winter 2022 3160714
2 pages
Tea Leaf Diseases Detection and Prevention by Using K-NN Algorithm
No ratings yet
Tea Leaf Diseases Detection and Prevention by Using K-NN Algorithm
10 pages
CSC321 Image Processing
No ratings yet
CSC321 Image Processing
5 pages
DAA - Unit - 5 - P & NP
No ratings yet
DAA - Unit - 5 - P & NP
11 pages
Inverse of Laplace (-S 4-14s 3+21s 2+9s+47) - ( (s+1) (s-2) 2 (S 2+3) ) - Symbolab
No ratings yet
Inverse of Laplace (-S 4-14s 3+21s 2+9s+47) - ( (s+1) (s-2) 2 (S 2+3) ) - Symbolab
9 pages
(OR1) Midterm Practice
No ratings yet
(OR1) Midterm Practice
6 pages
R19 Devops Unit-4
No ratings yet
R19 Devops Unit-4
5 pages
Aoa Unit 5
No ratings yet
Aoa Unit 5
5 pages
SAR ATR With Full-Angle Data Augmentation and Feat
No ratings yet
SAR ATR With Full-Angle Data Augmentation and Feat
5 pages
Trapezoidal Rule
No ratings yet
Trapezoidal Rule
4 pages
Dynamic Error
No ratings yet
Dynamic Error
7 pages
R19 Devops Unit-5
No ratings yet
R19 Devops Unit-5
8 pages
Interview Exp
No ratings yet
Interview Exp
2 pages
American International University-Bangladesh (AIUB) Faculty of Science & Technology Course Syllabus
No ratings yet
American International University-Bangladesh (AIUB) Faculty of Science & Technology Course Syllabus
2 pages
(AP CSP) (The Internet) Sequential vs. Parallel and Distributed (Student)
No ratings yet
(AP CSP) (The Internet) Sequential vs. Parallel and Distributed (Student)
2 pages
EEE-282N - S&S Quiz, U-1,2 - Solutions
No ratings yet
EEE-282N - S&S Quiz, U-1,2 - Solutions
2 pages
CCS 3101 Artificial Intelligence Course Outline
No ratings yet
CCS 3101 Artificial Intelligence Course Outline
2 pages
Chapter 8 Homework Assignment Ver 1.0
No ratings yet
Chapter 8 Homework Assignment Ver 1.0
2 pages
DAA R19 - Unit-5
No ratings yet
DAA R19 - Unit-5
12 pages
Toc PDF
No ratings yet
Toc PDF
1 page
Satplan: Fundamentals and Applications
From Everand
Satplan: Fundamentals and Applications
Fouad Sabry
No ratings yet
Nonlinear Transformations of Random Processes
From Everand
Nonlinear Transformations of Random Processes
Ralph Deutsch
No ratings yet

Daa Unit 5

Uploaded by

Daa Unit 5

Uploaded by

R19 IT III YEAR 1 SEMESTER DESIGN AND ANALYSIS OF ALGORITHMS UNIT V

DESIGN AND ANALYSIS OF ALGORITHMS

Non deterministic Algorithm

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 1

The Classes NP-hard and NP-Complete

NP-Hard problems(say X) can be solved if and NP-Complete problems can be solved by

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 2

Do not have to be a Decision problem. It is exclusively a Decision problem.

COOK–LEVIN THEOREM OR COOK’S THEOREM

3 – CNF: An expression is said to be in 3-CNF if it is in the conjunctive normal form, and

Bioinformatics and DNA Sequencing: Bioinformatics involves applying information

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 7

THE NAIVE STRING MATCHING ALGORITHM

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 8

Disadvantages of Naïve String Matching

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 9

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 10

THE KNUTH-MORRIS-PRATT (KMP)ALGORITHM

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 12

After iteration 6 times, the prefix function computation is complete:

The KMP Matcher:

Running Time Analysis:

Example: Given a string 'T' and pattern 'P' as follows:

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 15

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 16

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 17

Basic operations of Trie

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 19

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 20

Now let's go to the construction of the suffix trees.

Y.Jyothi , Assistant Professor, IT Department, Dhanekula Institute of Engineering & Technology 22

You might also like