[go: up one dir, main page]

0% found this document useful (0 votes)
146 views21 pages

Trie Tree

Trie trees are a specialized tree data structure used to store strings that can be broken down into sequences of substrings. They allow for efficient string searches by decomposing the search string and traversing the trie based on its components. Nodes in a trie store single characters, with child nodes representing subsequent characters. This allows strings to be stored by their internal paths. Variations include compressed tries, which remove redundant internal nodes, and suffix tries, which store all suffixes of a string. Tries are useful for fast string searches and are commonly used in web search engines to index words.

Uploaded by

Rajan Jaiprakash
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
146 views21 pages

Trie Tree

Trie trees are a specialized tree data structure used to store strings that can be broken down into sequences of substrings. They allow for efficient string searches by decomposing the search string and traversing the trie based on its components. Nodes in a trie store single characters, with child nodes representing subsequent characters. This allows strings to be stored by their internal paths. Variations include compressed tries, which remove redundant internal nodes, and suffix tries, which store all suffixes of a string. Tries are useful for fast string searches and are commonly used in web search engines to index words.

Uploaded by

Rajan Jaiprakash
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 21

Trie Tree

Indexed Search Tree (Trie)


Special case of tree
Applicable when
Key C can be decomposed into a sequence of subkeys
C
1
, C
2
, C
n
Redundancy exists between subkeys

Approach
Store subkey at each node
Path through trie yields full key
Example
Huffman tree
C
3
C
1
C
2
C
4
C
3
Tries
Useful for searching strings
String decomposes into sequence of letters
Example
ART A R T
Can be very fast
Less overhead than hashing
May reduce memory
Exploiting redundancy
May require more memory
Explicitly storing substrings
S
A
R
T E
ART
Types of Tries
Standard
Single character per node
Compressed
Eliminating chains of nodes
Compact
Stores indices into original string(s)
Suffix
Stores all suffixes of string
Standard Tries
Approach
Each node (except root) is labeled with a character
Children of node are ordered (alphabetically)
Paths from root to leaves yield all input strings
Trie for Morse
Code
Standard Trie Example
For strings
{ a, an, and, any, at }
Standard Trie Example
For strings
{ bear, bell, bid, bull, buy, sell, stock, stop }
a
e
b
r
l
l
s
u
l
l
y
e t
l
l
o
c
k
p
i
d
Standard Tries
Node structure
Value between 1m
Reference to m children
Array or linked list
Example
Class Node {
Letter value; // Letter V = { V
1
, V
2
, V
m
}
Node child[ m ];
}
Standard Tries
Efficiency
Uses O(n) space
Supports search / insert / delete in O(dm) time
For
n total size of strings indexed by trie
d length of the parameter string
m size of the alphabet
Word Matching Trie
Insert words
into trie
Each leaf
stores
occurrences of
word in the
text
s e e b e a r ? s e l l s t o c k !
s e e b u l l ? b u y s t o c k !
b i d s t o c k !
a
a
h e t h e b e l l ? s t o p !
b i d s t o c k !
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86
a r
87 88
a
e
b
l
s
u
l
e t
e
0, 24
o
c
i
l
r
6
l
78
d
47, 58
l
30
y
36
l
12
k
17, 40,
51, 62
p
84
h
e
r
69
a
Compressed Trie
Observation
Internal node v of T is redundant if v has one child and
is not the root
Approach
A chain of redundant nodes can be compressed
Replace chain with single node
Include concatenation of labels from chain
Result
Internal nodes have at least 2 children
Some nodes have multiple characters
Compressed Trie
e
b
ar ll
s
u
ll y
ell to
ck p
id
a
e
b
r
l
l
s
u
l
l
y
e t
l
l
o
c
k
p
i
d
Example
Compact Tries
Compact representation of a compressed trie
Approach
For an array of strings S = S*0+, S*s-1]
Store ranges of indices at each node
Instead of substring
Represent as a triplet of integers (i, j, k)
Such that X = s[i][j..k]
Example: S*0+ = abcd, (0,1,2) = bc
Properties
Uses O(s) space, where s = # of strings in the array
Serves as an auxiliary index structure
Compact Representation
Example
s e e
b e a r
s e l l
s t o c k
b u l l
b u y
b i d
h e
b e l l
s t o p
0 1 2 3 4
a r S[0] =
S[1] =
S[2] =
S[3] =
S[4] =
S[5] =
S[6] =
S[7] =
S[8] =
S[9] =
0 1 2 3 0 1 2 3
1, 1, 1
1, 0, 0 0, 0, 0
4, 1, 1
0, 2, 2
3, 1, 2
1, 2, 3 8, 2, 3
6, 1, 2
4, 2, 3 5, 2, 2 2, 2, 3 3, 3, 4 9, 3, 3
7, 0, 3
0, 1, 1
Suffix Trie
Compressed trie of all suffixes of text
Example: IPDPS
Suffixes
IPDPS
PDPS
DPS
PS
S
Useful for finding pattern in any part of text
Occurrence prefix of some suffix
Example: find PDP in IPDPS
D
P
S
P
I
P
D
S
P
S
D
P
S
S
Suffix Trie
Properties
For
String X with length n
Alphabet of size m
Pattern P with length d
Uses O(n) space
Can be constructed in O(n) time
Find pattern P in X in O(dm) time
Proportional to length of pattern, not text
Suffix Trie Example
e nimize
nimize ze
ze i mi
mize nimize ze
m i n i z e m i
0 1 2 3 4 5 6 7
7, 7 2, 7
2, 7 6, 7
6, 7
4, 7 2, 7 6, 7
1, 1 0, 1
Tries and Web Search Engines
Search engine index
Collection of all searchable words
Stored in compressed trie
Each leaf of trie
Associated with a word
List of pages (URLs) containing that word
Called occurrence list
Trie is kept in memory (fast)
Occurrence lists kept in external memory
Ranked by relevance
Trie Insertion
1. If the input string length is zero, then set the marker for the root
node to be true.

2. If the input string length is greater than zero, repeat steps 3 and 4
for each character

3. If the character is present in the child node of the current node,
set the current node point to the child node.

4. If the character is not present in the child node, then insert a new
node and set the current node to that newly inserted node.

5. Set the marker flag to true when the end character is reached.

Thank You

You might also like