[go: up one dir, main page]

0% found this document useful (0 votes)
12 views74 pages

Module-1 Finite Automata

The document provides an introduction to Finite Automata as part of a course on Theory of Computation at Hirasugar Institute of Technology. It covers key concepts such as deterministic and nondeterministic finite automata, their structural representations, and applications in software design and text search. Additionally, it discusses the importance of automata theory in understanding computation limits and introduces fundamental definitions related to alphabets, strings, and languages.

Uploaded by

shivahii501
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
12 views74 pages

Module-1 Finite Automata

The document provides an introduction to Finite Automata as part of a course on Theory of Computation at Hirasugar Institute of Technology. It covers key concepts such as deterministic and nondeterministic finite automata, their structural representations, and applications in software design and text search. Additionally, it discusses the importance of automata theory in understanding computation limits and introduces fundamental definitions related to alphabets, strings, and languages.

Uploaded by

shivahii501
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 74

S. J. P. N.

TRUST’S
HIRASUGAR INSTITUTE OF TECHNOLOGY, NIDASOSHI
Accredited at 'A' Grade by NAAC
Programmes Accredited by NBA: CSE, ECE, EEE & ME.

Department of Computer Science & Engineering


Course: Theory of Computation(BCS503)

Module 1: Introduction to Finite


Automata

Prof. A. A. Daptardar
Asst. Prof. , Dept. of Computer Science & Engg.,
Hirasugar Institute of Technology, Nidasoshi
Module - 1

• Introduction to Finite Automata, Structural


Representations, Automata and Complexity.
The Central Concepts of Automata Theory.
Deterministic Finite Automata,
Nondeterministic Finite Automata, An
Application: Text Search, Finite Automata with
Epsilon-Transitions.
• TEXT BOOK: Sections 1.1, 1.5, 2.2,2.3,2.4,2.5

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 2
HSIT, Nidasoshi
Introduction
• Automata theory is the study of
abstract computing devices or machines.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 3
HSIT, Nidasoshi
Why study Automata Theory?

• 1.1.1 - Introduction to Finite Automata


• 1.1.2 - Structural Representations
• 1.1.3 - Automata and Complexity

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 4
HSIT, Nidasoshi
1.1.1 Introduction to Finite Automata
F i n i t e A u t o m a t a are useful model for many important kinds of hardware and
Software

• Software for designing & c h e c k i n g t h e b e h a v i o r o f digital circuits.


• Lexical analyzer o f a t y p i c a l compiler i.e. the compiler component that

breaks the input text into logical units, such as identifiers, keywords and
punctuation.
• Software for scanning large bodies of text , such as collections of web pages, to find
occurrences of words, phrases or other patterns.
• Software for verifying systems of all types that have a finite number
of distinct states, such as communications protocols or protocols for
secure exchange of information.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 5
HSIT, Nidasoshi
• Finite automata are finite collections of states
with transition rules that take you from one state
to another.
• The purpose of a state is to remember the
relevant portion of the system's history.
• Since there are only a finite number of states, the
entire history generally cannot be remembered,
so the system must be designed carefully, to
remember what is important and forget what is
not.
• The advantage of having only a finite number of
states is that we can implement the system with a
fixed set of resources.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 6
HSIT, Nidasoshi
Example 1.1

• The simplest nontrivial finite automaton is an


on/off switch.
• The device remembers whether it is in the
"on" state or the "off" state, and it allows the
user to press a button whose effect is
different, depending on the state of the
switch.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 7
HSIT, Nidasoshi
Prof. A.A.Daptardar, Department of CSE,
10/1/2024 8
HSIT, Nidasoshi
Representing FA

• Simplest representation is often a graph.


– Nodes(Circles) = states.
– Arcs indicate state transitions.
– Labels on arcs tell what causes the transition(Arcs between
states are labeled by "inputs," which represent external
influences on the system)
– One of the states is designated the "start state," the state in
which the system is placed initially.
– It is often necessary to indicate one or more states as "final" or
"accepting“ states. Entering one of these states after a
sequence of inputs indicates that the input sequence is good in
some way(It is conventional to designate accepting states by a
double circle)
Prof. A.A.Daptardar, Department of CSE,
10/1/2024 9
HSIT, Nidasoshi
Example 1.2

• The part of a lexical analyzer. The job of this


automaton is to recognize the keyword then. It thus
needs five states, each of which represents a
different position in the word then that has been
reached so far. These positions correspond to the
prefixes of the word, ranging from the empty string
(i.e., nothing of the word has been seen so far) to the
complete word.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 10
HSIT, Nidasoshi
• In Fig. 1.2, the five states arc named by the prefix of
then seen so far.
• Inputs correspond to letters.
• We may imagine that the lexical analyzer examines
one character of the program that it is compiling at a
time, and the next character to be examined is the
input to the automaton.
• The start state corresponds to the empty string, and
each state has a transition on the next letter of then to
the state that corresponds to the next-larger prefix.
• The state named then is entered when the input has
spelled the word then. Since it is the job of this
automaton to recognize when then has been seen, we
could consider that state the lone accepting state.
Prof. A.A.Daptardar, Department of CSE,
10/1/2024 11
HSIT, Nidasoshi
1.1.2 Structural Representations
• There are two important notations that are not
automaton-like, but play an important role in the
study of automata and their applications.
– Grammars are useful models when designing software that
processes data with a recursive structure. The best-known
example is a "parser," the component of a compiler that deals
with the recursively nested features of the typical
programming language, such as expressions - arithmetic,
conditional, and so on. For example, a grammatical rule like E
=> E + E states that an expression can be formed by taking any
two expressions and connecting them by a plus sign
– Regular Expressions also denote the structure of data,
especially text strings. The UNIX-style regular expression ‘[A-Z]
[a-z] * [ ] [A-Z] [A-Z]’ represents capitalized words followed by a
space and two capital letters.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 12
HSIT, Nidasoshi
1.1.3 Automata and Complexity

• Automata are essential for the study of the limits


of computation. There are two important issues:
– 1. What can a computer do at all? This study is called
"decidability," and the problems that can be solved by
computer are called "decidable.“
– 2. What can a computer do efficiently? This study is
called "intractabil­ity," and the problems that can be
solved by a computer using no more time than some
slowly growing function of the size of the input are
called "tractable."

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 13
HSIT, Nidasoshi
1.5 The Central Concepts of Automata
Theory

• In this section we shall introduce the most


important definitions of terms that pervade
the theory of automata.
• These concepts include the "alphabet" (a set
of symbols), "strings" (a list of symbols from
an alphabet), and "language" (a set of
strings from the same alphabet).

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 14
HSIT, Nidasoshi
1.5.1 Alphabets
• An alphabet is a finite, nonempty set of symbols.
Conventionally, we use the symbol Ʃ for an alphabet.
• Common alphabets include:
1. Ʃ = {0,1}, the binary alphabet.
2. Ʃ = {a, b, ... , z}, the set of all lower-case letters.
3.The set of all ASCII characters, or the set of all
printable ASCII characters

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 15
HSIT, Nidasoshi
1.5.2 Strings
A string (or sometimes word) is a finite sequence of
symbols chosen from some alphabet.
For example, 01101 is a string from the binary
alphabet Ʃ = {0, 1}.
• The Empty String : The empty string is the string
with zero occurrences of symbols. This string,
denoted ϵ, is a string that may be chosen from
any alphabet whatsoever.
• Length of a String : It is the number of positions
for symbols in the string. For instance, 01101 has
length 5. The standard notation for the length of a
string w is lwl. For example, 10111 = 3 and l ϵ l =
0.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 16
HSIT, Nidasoshi
• Powers of an Alphabet : If Ʃ is an alphabet,
we can express the set of all strings of a
certain length from that alphabet by using an
exponential notation. We define Ʃk to be the
set of strings of length k, each of whose
symbols is in Ʃ.
• If Ʃ = {0, 1}, then Ʃ1 = {0, 1},
• Ʃ2 = {00,01, 10, 11},
• Ʃ3 = {000,001,010,011,100,101,110,111}
• Note: Ʃ0 = {ϵ} regardless of what alphabet Ʃ is.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 17
HSIT, Nidasoshi
• The set of all strings over an alphabet Ʃ is
conventionally denoted Ʃ*.
• For instance, {0, 1}* = {ϵ, 0, 1, 00, 01, 10, 11, 000,
••• }.
• Put another way,
• Ʃ* = Ʃ0 U Ʃ1 U Ʃ2 U · · ·
• Sometimes, we wish to exclude the empty string
from the set of strings. The set of nonempty
strings from alphabet Ʃ is denoted Ʃ+. Thus, two
appropriate equivalences are:
• Ʃ+ = Ʃ1 U Ʃ2 U Ʃ3 U · · ·
• Ʃ* = Ʃ+ U {ϵ}.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 18
HSIT, Nidasoshi
• Concatenation of Strings : Let x and y be strings.
Then xy denotes the concatenation of x and y, that
is, the string formed by making a copy of x and
following it by a copy of y. More precisely, if x is
the string composed of i symbols x = a1 a2…..ai and
y is the string composed of j symbols y = b1 b2
.…bj, then xy is the string of length
i + j : xy = a1 a2…..aib1 b2 .…bj
Example: Let x = 01101 and y = 110.
Then xy = 01101110 and yx = 11001101.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 19
HSIT, Nidasoshi
• 1.5.3 Languages
• A set of strings all of which are chosen from some
Ʃ* where Ʃ is a particular alphabet, is called a
language. If Ʃ is an alphabet, and L Ʃ*, then L is a
language over Ʃ.
• An example is English, where the collection of
legal English words is a set of strings over the
alphabet that consists of all the letters.
• Another example is C, or any other programming
language, where the legal programs are a subset
of the possible strings that can be formed from
the alphabet of the language.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 20
HSIT, Nidasoshi
• There are also many other languages that appear
when we study automata. Some are abstract
examples, such as:
– 1. The language of all strings consisting of n 0's followed by
n 1's, for some n ≥ 0: {ϵ, 01, 0011, 000111, ... }.
– 2. The set of strings of 0's and 1's with an equal number of
each:
{ϵ, 01, 10, 0011, 0101, 1001,, .. }
– 3. The set of binary numbers whose value is a prime:
– {10, 11, 101, 111, 1011, .. }
– 4. Ʃ* is a language for any alphabet Ʃ.
– 5. ᴓ, the empty language, is a language over any alphabet.
– 6. {ϵ}, the language consisting of only the empty string, is
also a language over any alphabet. Notice that ᴓ ≠ {ϵ}; the
former has no strings and the latter has one string.
Prof. A.A.Daptardar, Department of CSE,
10/1/2024 21
HSIT, Nidasoshi
Prof. A.A.Daptardar, Department of CSE,
10/1/2024 22
HSIT, Nidasoshi
• 1.5.4 Problems
• In automata theory, a problem is the question of
deciding whether a given string is a member of
some particular language.
• More precisely, if Ʃ is an alphabet, and L is a
language over Ʃ, then the problem L is:
• Given a string w in Ʃ*, decide whether or not w is
in L.
• Example: The problem of testing primality can be
expressed by the language LP consisting of all
binary strings whose value as a binary number is a
prime. That is, given a string of 0's and 1's, say
"yes" if the string is the binary representation of a
prime and say "no" if not.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 23
HSIT, Nidasoshi
2.2 Deterministic Finite Automata

• The term "deterministic" refers to the fact that


on each input there is one and only one state
to which the automaton can transition from its
current state. In contrast, "nondeterministic"
finite automata, the subject can be in several
states at once.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 24
HSIT, Nidasoshi
Definition of Deterministic Finite Automata
(DFA)
• A Deterministic Finite Automaton consists of:
1. A finite set of states, often denoted Q.
2. A finite set of input symbols, often denoted Ʃ.
3. A transition function that takes as arguments a state and an
input symbol and returns a state. The transition function
will commonly be denoted . In our informal graph
representation of automata, was represented by arcs
between states and the labels on the arcs. If q is a state,
and a is an input symbol, then (q, a) is that state p such
that there is an arc labeled a from q to p.
4. A start state, one of the states in Q.
5. A set of final or accepting states F. The set F is a subset of
Q.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 25
HSIT, Nidasoshi
• A deterministic finite automaton will often be
referred to by its acronym: DFA. The most
succinct representation of a DFA is a listing of
the five components above. In proofs we often
talk about a DFA in "five-tuple" notation:
A= (Q, Ʃ, ,q0,F)
• where A is the name of the DFA, Q is its set of
states, Ʃ its input symbols, its transition
function, q0 its start state, and F its set of
accepting states.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 26
HSIT, Nidasoshi
2.2.2 How a DFA Processes Strings
• Let us understand about a DFA is how the DFA decides
whether or not to "accept" a sequence of input symbols.
• The "language" of the DFA is the set of all strings that the
DFA accepts.
• Suppose a1a2 ···an is a sequence of input symbols.
• We start out with the DFA in its start state, q0.
• We consult the transition function , say (q0, a1) = q1 to
find the state that the DFA A enters after processing the first
input symbol a1.
• We process the next input symbol, a2, by evaluating (q1,
a2); let us suppose this state is q2.
• We continue in this manner, finding states q3,q4,.....qn such
that (qi-1,ai) = qi for each i.
• If qn is a member of F, then the input a1a2 ···an is accepted,
and if not then it is "rejected."
Prof. A.A.Daptardar, Department of CSE,
10/1/2024 27
HSIT, Nidasoshi
2.2.3 Simpler Notations for DFA's
• Specifying a DFA as a five-tuple with a detailed
description of the transition function is both
tedious and hard to read.
• There are two preferred notations for describing
automata:
1. A Transition Diagram which is represented
using circles, arrows, arcs with labels, double
circles etc.
2. A Transition Table, which is a tabular listing of
the function, which by implication tells us the
set of states and the input alphabet.
Prof. A.A.Daptardar, Department of CSE,
10/1/2024 28
HSIT, Nidasoshi
Transition Diagrams
• A transition diagram for a DFA A= (Q, Ʃ, , q0,F) is a
graph defined as follows:
a) For each state in Q there is a node.
b) For each state q in Q and each input symbol a in Ʃ, let
(q,a) = p. Then the transition diagram has an arc from node
q to node p, labeled
a. If there are several input symbols that cause transitions from q
to p, then the transition diagram can have one arc, labeled by the
list of these symbols.
c) There is an arrow into the start state q0, labeled Start. This
arrow does not originate at any node.
d) Nodes corresponding to accepting states (those in F) are
marked by a double circle. States not in F have a single
circle.
Prof. A.A.Daptardar, Department of CSE,
10/1/2024 29
HSIT, Nidasoshi
Transition Tables

• A transition table is a conventional, tabular


representation of a function like that takes
two arguments and returns a value.
• The rows of the table correspond to the
states, and the columns correspond to the
inputs.
• The entry for the row corresponding to state q
and the column corresponding to input a is
the state (q, a).
Prof. A.A.Daptardar, Department of CSE,
10/1/2024 30
HSIT, Nidasoshi
• Construct DFA which accepts any number of
a’s followed by a string ba and followed by
string of a’s and b’s.
• A= (Q, Ʃ, ,q0,F)
• Q = {q0,q1,q2}
• Ʃ = {a, b}
• q0 is the start state
• F = {q2} – final state

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 31
HSIT, Nidasoshi
Transition Diagram
a,b
a
b a
q0 q1 q2

Transition Table
States a b
q0 q0 q1
q1 q2 -
*q2 q2 q2
Prof. A.A.Daptardar, Department of CSE,
10/1/2024 32
HSIT, Nidasoshi
• Example 2.4 - Design a DFA to accept the
language
L = { w I w has both an even number of 0's and
an even number of 1's}
• This DFA is to count both the number of 0's
and the number of 1's, but count them
modulo 2. That is, the state is used to
remember whether the number of 0's seen so
far is even or odd, and also to remember
whether the number of 1 's seen so far is even
or odd. There are thus four states, which can
be given the following interpretations:

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 33
HSIT, Nidasoshi
• q0: Both the number of 0's seen so far and the number
of 1's seen so far are even
• q1: The number of 0's seen so far is even, but the
number of 1 's seen so far is odd.
• q2: The number of 1's seen so far is even, but the
number of 0's seen so far is odd.
• q3: Both the number of 0's seen so far and the number
of 1's seen so far are odd.
• State q0 is both the start state and the lone accepting
state. It is the start state, because before reading any
inputs, the numbers of 0's and 1 's seen so far are both
zero, and zero is even. It is the only accepting state,
because it describes exactly the condition for a
sequence of 0's and 1 's to be in language L.
Prof. A.A.Daptardar, Department of CSE,
10/1/2024 34
HSIT, Nidasoshi
Prof. A.A.Daptardar, Department of CSE,
10/1/2024 35
HSIT, Nidasoshi
Prof. A.A.Daptardar, Department of CSE,
10/1/2024 36
HSIT, Nidasoshi
2.2.4 Extending the Transition Function to
Strings

• we need to make the notion of the language of a DFA


precise.
• To do so, we define an extended transition function
that describes what happens when we start in any
state and follow any sequence of inputs.
• If δ is our transition function then the extended
transition function constructed from δ will be called ..
• The extended transition function is a function that
takes a state q and a string w and returns a state p -
the state that the automaton reaches when starting in
state q and processing the sequence of inputs w.
Prof. A.A.Daptardar, Department of CSE,
10/1/2024 37
HSIT, Nidasoshi
• We define by induction on the length of the input string,
as follows:
• BASIS: (q, ϵ) = q. That is, if we are in state q and read no
inputs, then we are still in the state q.
• INDUCTION: Suppose w is a string of the form xa; that is, a is
the last symbol of w, and x is the string consisting of all but
the last symbol.
• For example, w = 1101 is broken into x = 110 and a= 1. Then
• (q,w) = (δ(q,x),a)
• To compute (q, w), first compute (q, x), the state that the
automaton is in after processing all but the last symbol of w.
• Suppose this state is p; that is, (q, x) = p.
• Then (q,w) is what we get by making a transition from
state p on input a, the last symbol of w.
• That is, (q,w} = δ(p,a).

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 38
HSIT, Nidasoshi
• Let us design a DFA to accept the language
• L = { w I w has both an even number of 0's and an
even number of 1's}
• Suppose the input is 110101. Since this string has
even numbers of 0's and 1's both, we expect it is
in the language. Thus, we expect that (q0,
110101) = q0, since q0 is the only accepting state.
Let us now verify that claim.
• The check involves computing (q0 ,w) for each
prefix w of 110101, starting at ϵ and going in
increasing size. The summary of this calculation is:

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 39
HSIT, Nidasoshi
Prof. A.A.Daptardar, Department of CSE,
10/1/2024 40
HSIT, Nidasoshi
2.3 Nondeterministic Finite Automata

• A ''nondeterministic” finite automaton (NFA)


has the power to be in several states at once.
• This ability is often expressed as an ability to
"guess" something about its input.
• Before examining applications, we need to
define nondeterministic finite automata and
show that each one accepts a language that is
also accepted by some DFA.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 41
HSIT, Nidasoshi
Definition of NFA
• A Nondeterministic Finite Automaton consists of:
1. A finite set of states, often denoted Q.
2. A finite set of input symbols, often denoted Ʃ.
3. A δ is the transition function that takes as arguments a
state and an input symbol and returns a subset of
states which maps from Q X Ʃ into 2Q which is a power
set of Q.
4. A start state q0, one of the states in Q.
5.A set of final or accepting states F. The set F is a subset
of Q.
• An NFA is represented essentially like a DFA:
A= (Q, Ʃ , δ, q0, F)

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 42
HSIT, Nidasoshi
• Example 2.6: Figure 2.9 shows a
nondeterministic finite automaton, whose job
is to accept all and only the strings of 0's and
1's that end in 01.

• The NFA can represented as


10/1/2024 43
Processing inputs by NFA

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 44
HSIT, Nidasoshi
• It starts in only its start state, q0. When the first 0 is
read, the NFA may go to either state q0 or state q1, so
it does both.
• Then, the second 0 is read. State q0 may again go to
both q0 and q1. However, state q1 has no transition on
0, so it "dies”.
• When the third input, a 1, occurs, we must consider
transitions from both q0 and q1. We find that q0 goes
only to q0 on 1, while q1 goes only to q2.
• Thus, after reading 001, the NFA is in states q0 and q2.
Since q2 is an accepting state, the NFA accepts 001.
• However, the input is not finished. The fourth input, a
0, causes q2's thread to die, while q0 goes to both q0
and q1.
• The last input, a 1, sends q0 to q0 and q1 to q2. Since
we are again in an accepting state, 00101 is accepted.
Prof. A.A.Daptardar, Department of CSE,
10/1/2024 45
HSIT, Nidasoshi
The Extended Transition Function

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 46
HSIT, Nidasoshi
• Let us use to describe the processing of
input 00101 by the NFA of Fig. 2.9. A summary
of the steps is:

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 47
HSIT, Nidasoshi
2.3.4 The Language of an NFA

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 48
HSIT, Nidasoshi
2.3.5 Equivalence of Deterministic and
Nondeterministic Finite Automata
• Although there are many languages for which an NFA
is easier to construct than a DFA such as the language
(Example 2.6) of strings that end in 01.
• It is a surprising fact that every language that can be
described by some NFA can also be described by some
DFA.
• Moreover, the DFA in practice has about as many
states as the NFA, although it often has more
transitions.
• In the worst case, however, the smallest DFA can have
2n states while the smallest NFA for the same language
has only n states. Prof. A.A.Daptardar, Department of CSE,
10/1/2024 49
HSIT, Nidasoshi
• The proof that DFA's can do whatever NFA's can
do involves an important "construction" called the
subset construction because it involves
constructing all subsets of the set of states of the
NFA.
• The subset construction starts from an NFA N =
(QN, Ʃ, δN, q0,FN). Its goal is the description of a
DFA D = (QD, Ʃ, δD, {q0}, FD)such that L(D) = L(N).
• Notice that the input alphabets of the two
automata are the same, and the start state of D is
the set containing only the start state of N.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 50
HSIT, Nidasoshi
• The other components of D are constructed as
follows.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 51
HSIT, Nidasoshi
Examples

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 52
HSIT, Nidasoshi
2.4 An Application: Text Search

• In this section, we shall see that the abstract


study of the previous section, where we
considered the "problem" of deciding whether
a sequence of bits ends in 01, is actually an
excellent model for several real problems that
appear in applications such as Web search and
extraction of information from text.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 53
HSIT, Nidasoshi
2.4.1 Finding Strings in Text
• A common problem in the age of the Web and other
online text repositories is the following.
• Given a set of words, find all documents that contain
one (or all) of those words.
• A search engine is a popular example of this process.
• The search engine uses a particular technology, called
inverted indexes, where for each word appearing on
the Web (there are 100,000,000 different words), a list
of all the places where that word occurs is stored.
• Machines with very large amounts of main memory
keep the most common of these lists available,
allowing many people to search for documents at
once.
Prof. A.A.Daptardar, Department of CSE,
10/1/2024 54
HSIT, Nidasoshi
• Inverted-index techniques do not make use of
finite automata, but they also take very large
amounts of time for crawlers to copy the Web
and set up the indexes.
• There are a number of related applications
that are unsuited for inverted indexes, but are
good applications for automaton-based
techniques.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 55
HSIT, Nidasoshi
• The characteristics that make an application suitable for
searches that use automata are:
• 1. The repository on which the search is conducted is
rapidly changing. For example:
– (a)Every day, news analysts want to search the day's online news
articles for relevant topics. For example, a financial analyst might
search for certain stock ticker symbols or names of companies.
– (b) A "shopping robot" wants to search for the current prices
charged for the items that its clients request. The robot will
retrieve current catalog pages from the Web and then search
those pages for words that suggest a price for a particular item.
• 2. The documents to be searched cannot be cataloged. For
example,
– Amazon.com does not make it easy for crawlers to find all the
pages for all the books that the company sells. Rather, these
pages are generated "on the fly" in response to queries.
However, we could send a query for books on a certain topic, say
"finite automata," and then search the pages retrieved for
certain words, e.g., "excellent" in a review portion.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 56
HSIT, Nidasoshi
2.4.2 Nondeterministic Finite
Automata for Text Search
• Suppose we are given a set of words, which we
shall call the keywords, and we want to find
occurrences of any of these words.
• In applications such as these, a useful way to
proceed is to design a nondeterministic finite
automaton, which signals, by entering an
accepting state, that it has seen one of the
keywords.
• The text of a document is fed, one character at a
time to this NFA, which then recognizes
occurrences of the keywords in this text.
Prof. A.A.Daptardar, Department of CSE,
10/1/2024 57
HSIT, Nidasoshi
• There is a simple form to an NFA that recognizes a
set of keywords.
– 1.There is a start state with a transition to itself on
every input symbol, e. g. every printable ASCII
character if we are examining text. Intuitively, the start
state represents a "guess" that we have not yet begun
to see one of the keywords, even if we have seen
some letters of one of these words.
– 2.For each keyword a1a2 · · · ak, there are k states, say
q1, q2, ........, qk, There is a transition from the start
state to q1 on symbol a1, a transition from q1 to q2 on
symbol a2, and so on. The state qk is an accepting
state and indicates that the keyword a1,a2,....., ak has
been found.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 58
HSIT, Nidasoshi
Prof. A.A.Daptardar, Department of CSE,
10/1/2024 59
HSIT, Nidasoshi
2.4.3 A DFA to Recognize a Set of
Keywords
• We can apply the subset construction to any NFA.
However, when we apply that construction to an NFA
that was designed from a set of keywords, we find that
the number of states of the DFA is never greater than
the number of states of the NFA.
• Since in the worst case the number of states
exponentiates as we go to the DFA, this observation is
good news and explains why the method of designing
an NFA for keywords and then constructing a DFA from
it is used frequently.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 60
HSIT, Nidasoshi
• The rules for constructing the set of DFA states
is as follows.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 61
HSIT, Nidasoshi
Prof. A.A.Daptardar, Department of CSE,
10/1/2024 62
HSIT, Nidasoshi
2.5 Finite Automata With Epsilon-
Transitions
• We shall now introduce another extension of
the finite automaton.
• The new "feature" is that we allow a transition
on E, the empty string.
• In effect, an NFA is allowed to make a
transition spontaneously, without receiving an
input symbol.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 63
HSIT, Nidasoshi
2.5.1 Uses of ϵ -Transitions

• We shall begin with an informal treatment of ϵ -


NFA's, using transition diagrams with ϵ allowed as
a label.
• In the examples to follow, think of the automaton
as accepting those sequences of labels along
paths from the start state to an accepting state.
• However, each ϵ along a path is "invisible"; i.e., it
contributes nothing to the string along the path.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 64
HSIT, Nidasoshi
Example 2.16: In Fig. 2.18 is an ϵ - NFA that
accepts decimal numbers consisting of:
1. An optional + or - sign,
2. A string of digits,
3. A decimal point, and
4. Another string of digits. Either this string of digits, or
the string (2) can be empty, but at least one of the
two strings of digits must be nonempty.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 65
HSIT, Nidasoshi
2.5.2 The Formal Notation for an ϵ -
NFA

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 66
HSIT, Nidasoshi
Prof. A.A.Daptardar, Department of CSE,
10/1/2024 67
HSIT, Nidasoshi
2.5.3 Epsilon-Closures

• Definition: ECLOSE(q) is read as ϵ - CLOSURE of


q and is the set of all states which are
reachable from q on ϵ-transitions only.
• The recursive definition of ECLOSE(q) is:
– State q is in ECLOSE(q) i.e. ECLOSE(q) = q for each
q
– If state p is in ECLOSE(q), and there is a transition
from state p to state r labeled ϵ, then r is in
ECLOSE(q) i.e. if ECLOSE(q) = {p, r}
Prof. A.A.Daptardar, Department of CSE,
10/1/2024 68
HSIT, Nidasoshi
Acceptance of Language
• A string of symbols obtained from Ʃ. The set of all
strings recognized by an automaton is called
language.
• The Language L accepted by an ϵ - NFA E =(Q, Ʃ, δ,
q0, A) where Q is the set of finite states, Ʃ is the
set of input alphabets, δ is transition from Q X {Ʃ
U ϵ) to 2Q, q0 is the start state and A is the final or
accepting state.
• The string w accepted by an NFA can be defined in
formal notation as :
• L(E) = { w | w ϵ Ʃ* and
Prof. A.A.Daptardar, Department of CSE,
10/1/2024 69
HSIT, Nidasoshi
Acceptance of Language
• A string of symbols obtained from Ʃ. The set of all
strings recognized by an automaton is called
language.
• The Language L accepted by an ϵ - NFA E =(Q, Ʃ, δ,
q0, A) where Q is the set of finite states, Ʃ is the
set of input alphabets, δ is transition from Q X {Ʃ
U ϵ) to 2Q, q0 is the start state and A is the final or
accepting state.
• The string w accepted by an NFA can be defined in
formal notation as :
• L(E) = { w | w ϵ Ʃ* and
Prof. A.A.Daptardar, Department of CSE,
10/1/2024 70
HSIT, Nidasoshi
Acceptance of Language
• A string of symbols obtained from Ʃ. The set of all
strings recognized by an automaton is called language.
• The Language L accepted by an ϵ - NFA E =(Q, Ʃ, δ, q0,
F) where Q is the set of finite states, Ʃ is the set of
input alphabets, δ is transition from Q X {Ʃ U ϵ) to 2Q,
q0 is the start state and F is the final or accepting state.
• The string w accepted by an NFA can be defined in
formal notation as :
• L(E) = { w | w ϵ Ʃ* and (q0, w) = P with at least one
component of P in F}

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 71
HSIT, Nidasoshi
• Obtain an ϵ – NFA which accepts strings
consisting of zero or more a’s followed by zero
or more b’s followed by zero or more c’s.

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 72
HSIT, Nidasoshi
Conversion of ϵ-NFA to DFA

• Given any ϵ-NFA E, we can find a DFA D that


accepts the same language as E. The construction
we use is very close to the subset construction, as
the states of D are subsets of the states of E.
• Let E =(QE, Ʃ, δE, q0, FE). Then the equivalent DFA
D=(QD, Ʃ, δD, q0, FD) is defined as follows:
• QD is the set of subsets of QE · More precisely, we
shall find that all accessible states of D are ϵ -
closed subsets of QE, that is, sets S QE such that
S = ECLOSE(S).

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 73
HSIT, Nidasoshi
• qD = ECLOSE(q0); that is, we get the start state
of D by closing the set consisting of only the
start state of E.
• FD is those sets of states that contain at least
one accepting state of E. That is, FD = {S l S is in
QD and S ∩ FE = ᴓ }.
• δD(S, a) is computed, for all a in Ʃ; and sets S in
QD by:

Prof. A.A.Daptardar, Department of CSE,


10/1/2024 74
HSIT, Nidasoshi

You might also like