Chapter One
Chapter One
If it involves a computer, a program running on a computer and numbers going in and out
then computation is likely happening.
Theory of computation:
Computability Theory: -
‐ What can be computed?
-Are there problems that no program can solve?
Complexity Theory: -
Automata Theory: -
‐ Study of abstract machine and their properties, providing a mathematical notion of ―computer‖
‐ Automata are abstract mathematical models of machines that perform computations on an
input by moving through a series of states or configurations. If the computation of an
automaton reaches an accepting configuration it accepts that input.
Study of Automata
‐ For software designing and checking behavior of digital circuits.
‐ For designing software for checking large body of text as a collection of web pages, to
find occurrence of words, phrases, patters (i.e. pattern recognition, string matching, …)
‐ Designing ―lexical analyzer‖ of a compiler, that breaks input text into logical units called ―tokens
Abstract Model
Brief History:
Before 1930‘s, no any computer were there and Alen Turing introduced an abstract machine
that had all the capabilities of today‘s computers. This conclusion applies to today‘s real
machines.
Later in 1940‘s and 1950‘s, simple kinds of machines called finite automata were introduced by
a number of researchers.
In late 1950‘s the linguist N. Chomsky begun the study of formal grammar which are closely
related to abstract automata.
In 1969 S. Cook extended Turing‘s study of what could and what couldn‘t be computed and
classified the problem as:
-Decidable
-Tractable/intractable
The basic concepts of Languages
―alphabets,
―strings,
―languages‖, etc.
Alphabet is a finite non-empty set of symbols. The symbols can be the letters such as {a, b, c},
bits {0, 1}, digits {0, 1, 2, 3… 9}. Common characters like $, #, etc.
String is a finite sequence of symbols taken from some alphabet. E.g. 0110 is a string
from binary alphabet, ―automata‖ is a string over alphabet {a, b, c … z}.
Empty String: -
It is a string with zero occurrences of symbols. It is denoted by ‗ε‘ (epsilon).
Length of String
Power of alphabet
The set of all strings of certain length k from an alphabet is the kth power of that alphabet.
Σ0 = {ε}
Σ1 = {0, 1}
Kleen Closure
The set of all the strings over an alphabet Σ is called kleen closure of Σ & is denoted by
Σ*. Thus, kleen closure is set of all the strings over alphabet Σ with length 0 or more.
∴ Σ* = Σ0 ∪ Σ1 ∪ Σ2 ∪ Σ3 ∪ ……………
E.g. A = {0}
A* = {0n/ n = 0, 1, 2, …}.
Positive Closure: -
The set of all the strings over an alphabet Σ, except the empty string is called positive closure
and is denoted by Σ+.
∴ Σ+ = Σ1 ∪ Σ2 ∪ Σ3 ∪……………
Language:
A language L over an alphabet Σ is subset of all the strings that can be formed out of Σ; i.e. a
language is subset of kleen closure over an alphabet Σ; L ⊆ Σ*. (Set of strings chosen from Σ*
defines language). For example;
◻ Set of all strings over Σ = {0, 1} with equal number of 0‘s & 1‘s.
L = {ε, 01, 0011, 000111, ………}
◻ φ is an empty language & is a language over any alphabet.
Concatenation of Strings
Let x & y be strings then xy denotes concatenation of x & y, i.e. the string formed
by making a copy of x & following it by a copy of y.
More precisely, if x is the string of i symbols as x = a1a2a3…ai & y is the string of j symbols as y =
b1b2b3…bj then xy is the string of i + j symbols as xy = a1a2a3…aib1b2b3…bj.
For example;
x = 000
y = 111
xy = 000111 &
yx = 111000
Suffix of a string
A string s is called a suffix of a string w if it is obtained by removing 0 or more leading
symbols in w. For example;
w = abcd
s = bcd is suffix of w.
here’s is proper suffix if s ≠ w.
Prefix of a string
Substring
Problem
1. Let A be a set with n distinct elements. How many different binary relations on A
are there?
a. ∑1, ∑ 2, ∑3.
Consider a man watching a TV in his room. The TV is in ‖on‖ state. When it is switched off, the
TV goes to ―off‖ state. When it is switched on, it again goes to ―on‖ state. This can be
represented by following picture.
Switched off
TV “on” TV “off”
Switched on
Finite Automata
A finite automaton is a mathematical (model) abstract machine that has a set of ―states‖ and its
―control‖ moves from state to state in response to external ―inputs‖. The control may be either
―deterministic‖ meaning that the automation can‘t be in more than one state at any one time,
or ―non deterministic‖, meaning that it may be in several states at once. This distinguishes the
class of automata as DFA or NFA.
‐ The DFA, i.e. Deterministic Finite Automata can‘t be in more than one state at any time.
‐ The NFA, i.e. Non-Deterministic Finite Automata can be in more than one state at a time.
Applications:
The finite state machines are used in applications in computer science and data networking. For example,
finite-state machines are basis for programs for spell checking, indexing, grammar checking, searching large
bodies of text, recognizing speech, transforming text using markup languages such as XML & HTML, and
network protocols that specify how computers communicate.
Definition
A deterministic finite automaton is defined by a quintuple (5-tuple) as (Q, Σ, δ, q0, F).
Where,
Q = Finite set of states,
Σ = Finite set of input symbols,
δ = A transition function that maps Q × Σ -> Q
q0 = A start state; q0 ∈ Q
F = Set of final states; F ⊆ Q.
A transistion function δ that takes as arguments a state and an input symbol and returns a
state. In our diagram, δ is represented by arcs between states and the labels on the arcs.
For example
If s is a state and a is an input symbol then δ(p,a) is that state q such that there are arcs
labled ‗a‘ from p to q.
There are two preferred notations for describing this class of automata;
‐ Transition Table
‐ Transition Diagram
a) Transition Table: -
Transition table is a conventional, tabular representation of the transition function δ that takes
the arguments from Q × Σ & returns a value which is one of the states of the automation. The
row of the table corresponds to the states while column corresponds to the input symbol. The
starting state in the table is represented by -> followed by the state i.e. ->q, for q being start
state, whereas final state as
*q, for q being final state.
The entry for a row corresponding to state q and the column corresponding to input a, is the
state δ (q, a).
For example:
I. Consider a DFA;
Q = {q0, q1, q2, q3}
Σ = {0, 1}
q0 = q0
F = {q0}
δ = Q × Σ -> Q
δ 0 1
* -> q0 q2 q1
q1 q3 q0
q2 q0 q3
q3 q1 q2
This DFA accepts strings having both an even number of 0‘s & even number of 1‘s.
b) Transition Diagram:
‐ For each state q in Q and each input a in Σ, if δ (q, a) = p then there is an arc from node q to p
labeled a in the transition diagram. If more than one input symbol cause the transition from
state q to p then arc from q to p is labeled by a list of those symbols.
‐ The start state is labeled by an arrow written with ―start‖ on the node.
‐ For the example I considered previously, the corresponding transition diagram is:
How a DFA process strings?
The first thing we need to understand about a DFA is how DFA decides whether or not to
―accept‖ a sequence of input symbols. The ―language‖ of the DFA is the set of all symbols that
the DFA accepts. Suppose a1, a2, …… an is a sequence of input symbols. We start out with the
DFA in its start state, q0. We consult the transition function δ also for this purpose. Say δ (q0,
a1) = q1 to find the state that the DFA enters after processing the first input symbol a1. We
then process the next input symbol a2, by evaluating δ (q1, a2); suppose this state be q2. We
continue in this manner, finding states q3, q4,
…, qn. such that δ (qi-1, ai) = qi for each i. if qn is a member of F, then input a1, a2, --- an is
accepted & if not then it is rejected.
The extended transition function of DFA, denoted by 𝛿̂ is a transition function that takes two
arguments as input, one is the state q of Q and another is a string w∈ Σ*, and generates a state
p ∈ Q. This state p is that the automaton reaches when starting in state q & processing the
sequence of inputs w.
i.e. 𝛿̂ (q, w) = p
Basis step: 𝛿̂ (q,є) =q. i.e. from state q, reading no input symbol stays at the same state.
Induction: Let w be a string from Σ * such that w = xa, where x is substring of w without last
symbol and a is the last symbol of w, then 𝛿̂ (q, w) = δ (𝛿̂ (q, x), a).
Thus, to compute 𝛿̂ (q, w), we first compute 𝛿̂ (q, x), the state the automaton is in after
processing all but last symbol of w. let this state is p, i.e. 𝛿̂ (q, x) = p.
Then, 𝛿̂ (q, w) is what we get by making a transition from state p on input a, the last symbol of w.
For Example
Now compute 𝛿̂ (q0,1001)
The language of DFA M = (Q, Σ, δ, q0, F) denoted by L(M) is a set of strings over Σ* that
are accepted by M.
i. e; L(M) = {w/ 𝛿̂ (q0, w) = p ∈ F}
That is; the language of a DFA is the set of all strings w that take DFA starting from start state to
one of the accepting states. The language of DFA is called regular language.
2. DFA accepting all string over Σ = {0, 1} ending with 3 consecutive 0‟s.
3. DFA accepting zero or more consecutive 1‟s.
i.e. L (M) = {1n / n = 0, 1, 2, ……}
4. DFA over {0, 1} accepting {1, 01}
Thus, NFA can also be interpreted by a quintuple; (Q, Σ, δ, q0, F) where δ is Q × Σ =2Q. Unlike
DFA, a transition function in NFA takes the NFA from one state to several states just with a
single input.
Example:
Here, from state q1, there is no any arc for input symbol 0 & no any arc out of q2 for 0 & 1. So,
we can conclude in a NFA, there may be zero no. of arcs out of each state for each input
symbol. While in DFA, it has exactly one arc out of each state for each input symbol.
δ, the transition function is a function that takes a state in Q and an input symbol in ∑ as
arguments and returns a subset of Q The only difference between an NFA and DFA is in type of
value that δ returns. In NFA, δ returns a set of states and in case of DFA it returns a single state.
For input sequence w = 00101, the NFA can be in the states during the processing of the input are as:
2. NFA over {0, 1} accepting strings {0, 01, 11}.
The Extended transition function of NFA
As for DFA‘s, we need to define the extended transition function 𝛿̂ that takes a state q and a
string of input symbol w and returns the set of states that is in if it starts in state q and
processes the string w.
Definition by Induction:
Basis Step: 𝛿̂ (q, ε) = {q} i.e. reading no input symbol remains into the same state.
Induction: Let w be a string from Σ* such that w = xa, where x is a substring of without last
symbol a.
Also let,
𝛿̂ (q, x) = {p1, p2, p3, …pk}
and
Thus, to compute 𝛿̂ (q, w) we first compute 𝛿̂ (q, x) & then following any transition from
each of these states with input a.
Consider, a NFA,
Now, computing for 𝛿̂ (q0, 01101)
Since the result of above computation returns the set of state {q0,q2) which include the
accepting state q2 of NFA so the string 01101 is accepted by above NFA.
Although there are many languages for which NFA is easier to construct than DFA, it can be
proved that every language that can be described by some NFA can also be described by some
DFA.
The DFA has more transition than NFA and in worst case the smallest DFA can have 2 n state
while the smallest NFA for the same language has only n states.
We now show that DFAs & NFAs accept exactly the same set of languages. That is non-
determinism does not make a finite automaton more powerful.
To show that NFAs and DFAs accept the same class of language, we show;
Any language accepted by a NFA can also be accepted by some DFA. For this we describe an
algorithm that takes any NFA and converts it into a DFA that accepts the same language. The
algorithm is called ―subset construction algorithm‖.
The key idea behind the algorithm is that; the equivalent DFA simulates the NFA by keeping
track of the possible states it could be in. Each state of DFA corresponds to a subset of the set
of states of the NFA, hence the name of the algorithm. If NFA has n-states, the DFA can have 2 n
states (at most), although it usually has many less.
To convert a NFA, N = (QN, Σ, δN, q0, FN) into an equivalent DFA D = (QD, Σ, δD, q0, FD), we have
following steps.
1. The start state of D is the set of start states of N i.e. if q0 is start state of N then D has start
state as {q0}.
2. QD is set of subsets of Q N i.e. QD = 2QN. So, QD is power set of QN. So if QN has n states then QD
will have 2n states. However, all of these states may not be accessible from start state of Q D so
they can be eliminated. So QD will have less than 2n states.
3. FD is set of subsets S of QN such that S ∩ FN ≠ φ i.e. FD is all sets of N‘sstates that include at
least one final state of N.
i.e. for any state {q0, q1, q2, … qk} of the DFA & any input a, the next state of the DFA is the set
of all states of the NFA that can result as next states if the NFA is in any of the state‘s q0, q1, q2,
… qk when it reads a.
Example 1
The same table can be represented with renaming the state on table entry as
Or
The other state is removed because they are not reachable from start state.
Example 2:
Example of DFA minimization: Construct a minimum state automata equivalent to given automata?
Solution:
Transition table for above automata.
State Input = a Input = b
q1 q2 q4
q2 q1 q1
q3 q2 q4
q4 Final state q4 q4
Step 01: Remove steps which are unreachable from initial states.
A0 = {q4}
A1 = {q0,q1,q2,q3}
π0 = {q4}, {q0,q1,q2,q3}
A0 cannot be partition further.
In A1,
B0 = {q0, q2}
B1 = {q1, q3}
π1 = {q4}, {q0,q2}, {q1,q3}
Now, B0 and B1 cannot be partitioned further.
{q4},
{q0,q2},
{q1,q3}
Minimized DFA:
A Regular Expression (RegEx) is a sequence of characters that defines a search pattern, mainly used for
pattern matching in strings. It is a formal way to represent regular languages using symbols and operators.
1. Email Validation:
css
CopyEdit
^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$
ruby
CopyEdit
^\(\d{3}\) \d{3}-\d{4}$
markdown
CopyEdit
(1*01*01*)*
3. Regular Grammar
A Regular Grammar is a set of rules (productions) that define a regular language. These grammars can be:
nginx
CopyEdit
S → 0S | 1S | 1
less
CopyEdit
S → 1S | 0A
A → 1A | 0S | ε
less
CopyEdit
S → aA
A → aA | bA | ε
The Pumping Lemma is a property used to prove that a language is not regular. It states that for any regular
language, there exists a constant p such that any string w with length ≥ p can be divided into three parts xyz,
where:
1. |xy| ≤ p
2. |y| > 0
3. xyⁿz is in L for all n ≥ 0
1. L = {aⁿbⁿ | n ≥ 0}
o Strings like "ab", "aabb", "aaabbb" are in L.
o Cannot be recognized by a finite automaton since it requires counting.
3. L = {aⁿbⁿcⁿ | n ≥ 0}
o Strings like "abc", "aabbcc" are in L.
o Requires three-way matching, which finite automata cannot do.
Summary
4o
You said:
more detail for each
ChatGPT said:
In this chapter, we will cover regular expressions, their connection to regular languages, regular
grammars, and the Pumping Lemma, which helps identify non-regular languages. Each concept will be
explained in detail with three examples.
1. Regular Expressions
A Regular Expression (RegEx) is a formal notation used to define patterns in strings. Regular expressions
describe a set of strings that belong to a regular language, which can be recognized by finite automata.
1. Email Validation
css
CopyEdit
^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$
ruby
CopyEdit
^\(\d{3}\) \d{3}-\d{4}$
o This ensures that each "0" appears in pairs, making it valid for strings like "00", "1100", and "1010".
A Regular Language is a language that can be described using a regular expression and can be recognized by
a finite state machine (DFA or NFA).
3. Regular Grammar
A Regular Grammar is a formal grammar that generates a regular language. It consists of production rules
that replace non-terminal symbols with terminal symbols in a specific manner.
less
CopyEdit
S → 1S | 0A
A → 1A | 0S | ε
less
CopyEdit
S → aA
A → aA | bA | ε
The Pumping Lemma is a property used to prove that a language is not regular.
For any regular language L, there exists a constant p (pumping length) such that any string w in L with |w| ≥
p can be divided into three parts xyz, where:
1. |xy| ≤ p
2. |y| > 0
3. xyⁿz is in L for all n ≥ 0
1. L = {aⁿbⁿ | n ≥ 0}
o Example Strings: "ab", "aabb", "aaabbb".
o Why Not Regular?: The finite automaton cannot remember how many "a"s were read before reading
"b"s.
3. L = {aⁿbⁿcⁿ | n ≥ 0}
o Example Strings: "abc", "aabbcc".
o Why Not Regular?: The finite automaton cannot keep track of three different counts simultaneously.
Summary
Concept Explanation
Regular Expressions Define patterns that match regular languages.
Regular Languages Can be expressed using regular expressions and recognized by finite automata.
Regular Grammar Generates regular languages using right or left linear rules.
Pumping Lemma Helps prove that certain languages are not regular.
Would you like a detailed DFA or NFA construction for any of these examples? 😊
4o
You said:
regular language and non regular language different
ChatGPT said:
Regular and non-regular languages are classifications of formal languages based on whether they can be
recognized by a finite automaton.
These languages cannot be recognized by a finite automaton because they require counting or tracking long-
distance dependencies.
1. L = { aⁿbⁿ | n ≥ 0 }
o Example Strings: "ab", "aabb", "aaabbb".
o Why Not Regular?: The finite automaton cannot count the number of "a"s and match them with the
same number of "b"s.
3. L = { aⁿbⁿcⁿ | n ≥ 0 }
o Example Strings: "abc", "aabbcc", "aaabbbccc".
o Why Not Regular?: Requires matching three different counts, which a finite automaton cannot track.
Summary
Aspect Regular Language Non-Regular Language
Recognized By DFA/NFA (Finite Automaton) Pushdown Automaton / Turing Machine
Memory Needed? No memory (limited states) Needs memory (stack/tape)
Example Languages `{ w w ends in "01" }, { w
Proof Technique Construct DFA/NFA Use Pumping Lemma to show it’s not regular