Chapter 3:
Automata Theory, Regular Language
and Grammar
Prepared by:
Er.Sharad Neupane
Associate Professor,KEC
Kalimati,Kathmandu
Alphabet, String and Language:
Alphabet: An alphabet is a finite,non empty set of symbols denoted by Σ.For example
Σ={0,1} , Σ={a,b,c} etc.
String: A string or word over an alphabet Σ is a finite sequence of symbols taken from Σ.
Generally it is denoted by w.For example 0110,11,001 are three strings over an alphabet
Σ={0,1} and aab,accb,b,cc are four strings over an alphabet Σ={a,b,c}.
Empty string:String with zero occurrence of symbols is called an empty string and is
denoted by ‘ε’ or ‘λ’ or ‘e’.
Length of the string:
The length of the string is the number of symbols present in that string. Example, let
w=0100 then|w|=|0100|=4 and |ε|=0.
Kleene-star(Kleene closure) of an alphabet[Σ*]:
The Kleene-star over an alphabet Σ is denoted by Σ* which is the set of all
strings over an Σ whose length is zero or more.
Let, Σ={0,1} then Σ*={ε,0,1,01,10,001,…}
Similarly {0}* ={ε,0,00,000,…}
{1}* ={ε,1,11,111,…}
Positive closure of an alphabet [ Σ+ ]:
The positive closure over an alphabet Σ is denoted by Σ+ which is the set of
all the strings over an alphabet Σ except an empty string. Let, Σ={0,1} then
Σ+ ={0,1,01,10,001,…}.
Similarly {0}+ ={0,00,000,…}
{1}+ ={1,11,111,…}
Hence we can conclude that
Σ*= Σ+ ∪ {ε}
Power of an alphabet:[ Σk ]
Power of an alphabet is denoted by Σk which is the set of all strings w whose length
is k from an alphabet.
Example:
Let, Σ={0,1}
then Σ0 ={є}
Σ1 ={0,1}
Σ2 ={00,01,10,11}
Σ3 ={000,001,010,011,100,101,110,111}
.
.
.
Concatenation of string:
The two strings over the same alphabet can be combined to form an another
string by the operation of concatenation. The concatenation of two strings x
and y is written as or x.y or xy .For example if x=010 and y=101 then
xy=010101
Suffix and prefix of the string:
If w=xy for some x, then y is a suffix of w.
If w=xy for some y, then x is a prefix of w.
Substring: Any string is said to be the substring of a string ‘w’ if it contains
some or all the symbols from w. Any string is said to be the substring of itself.
Let,w=aaaabbaba be the given string then abb is the substring from w.
Reversal of a string:
Let w by any string then its reversal is denoted by wR
If w=baabb then wR=bbaab
Language:
A set of all strings which are taken from some Σ* where Σ is the particular alphabet
is called a language L.
L ⊆ Σ*.
For example, L={w є{a, b}*:w has odd numbers of a’s}
={a,ab,abaa,bba….}
L={w є{0,1}*:w has an even numbers of zeros}
={є ,1 ,00,11,01101…}
Finite State Machine(FSM):
A finite state machine is a computation model. It can be used to simulate sequential
logic and some computer programs.
Formal Definition: It is a six tuple machine with M=(Q,∑,δ,q0,R,O).
Where,
Q=Finite set of states
∑=Set of input symbols
δ=Transition function
q0=Initial state
R=Output relation function
O=Set of output symbols
FSM can be represented by state/transition diagram as:
The transition diagram is a digraph. The vertices represent the states and the states
are represented by a circle. The initial state is indicated by an arrow. The directed
edges indicates the change in state or the transition and the output that is obtained
when the input is given.
FSM also can be represented by the transition/state table. The transition/state table
of the above state diagram is:
Q)Design a FSM that performs serial addition(Imp):
Let M=(Q,∑,δ,q0,R,O) be the required FSM.
Here, ∑={00, 01, 10, 11}
O={0,1}
Given inputs are x, y
i)Either add x and y or ii)Add x, y and 1 Depending on whether the carry bit was 0
or 1. So, we have two states C(carry) and NC (no carry). The initial state is NC.
Q={NC, C}
Let NC->σ0 0
C-> σ1
Finite State Automata:(FSA)
It is a special kind of FSM in which it has no output and no output function but has
accepting or final state. Automata is a mathematical model or machine that is used
to check if the string is the part of language or not.
Formal Definition:It is a five tuple machine M= (Q,∑,δ,q0,F).
Where,
Q=Finite set of states
∑=Set of input symbols
δ=Transition function
q0= Initial state
F=Set of final states
Let us consider M= (Q,∑,δ,q0,F).
Where,Q={q0, q1, q2}
∑={0,1}
δ consists of:
δ(q0, 0)-> q0
δ(q0, 1)-> q1
δ(q1, 0)-> q2
δ(q1, 1)-> q1
δ(q2, 0)-> q0
δ(q2, 1)-> q2
The initial state , q0=q0
The final state F= q2 [Note: Final state is represented by double circle in state
diagram of FSA and by * in state table]
The state/transition diagram and state/transition table for the above FSA is shown
below:
Note: Any string is said to be accepted or recognized by a FSA, M=(Q,∑,δ,q0,F) if
we can reach to the final or accepting state starting from initial state on that string.
The string 010011 is accepted by the following FSA because we can reach to final
state starting from initial state on the inputs from the given string.
Types of Finite State Automata(FSA):
There are two types of FSA and they are:
1)Non-Deterministic Finite State Automata(NDFA or NFA).
2)Deterministic Finite State Automata(DFA).
In DFA, next-state function (transition function) takes us to a uniquely defined state,
whereas in NFA,the next-state function(transition function) takes us to a set of
states.
In NFA, transition function is defined as Q× ∑-> 2Q because from a state, transition
can occur to any combination of Q states. If A and B are two states, then the
possible number of states are A, B, AB and ᶲ.
In DFA,transition function is defined as Q×∑->Q i.e. next state is unique but NFA
has multiple next states. Design of NFA is simple than design of DFA.
Representation of DFA and language of DFA:
DFA is a five tuple machine given by M= (Q,∑,δ,q0,F).
Where,
Q=Finite set of states
∑=Set of input symbols
δ=Transition function and is defined as: Q×∑->Q
q0= Initial state
F=Set of final states
-DFA can be represented either by State Diagram/Transition Diagram or
Transition Table/State Table.
Language of DFA
Language of DFA,M= (Q,∑,δ,q0,F) is denoted by L(M) which is the set of all
strings over Σ* that are accepted by M.
Q1)Design a finite state automata that accepts only those set of strings over {a,b}
which starts with baa.Precisely only those strings which begin with baa should be
accepted and other strings over{a, b} should be rejected. Your design should include
the proper definition of FSA,transition table and transition diagram.
Let the required FSA be M= (Q,∑,δ,q0,F).
State Diagram Transition Table
Here, q4 is the dead
state
Here, Q={q0, q1, q2, q3, q4 }
∑={a,b}
q0 = q0
F= {q3}
Transition function δ is defined as:
δ(q0,a)= q4
δ(q0,b)= q1
δ(q1,a)= q2
δ(q1,b)= q4
δ(q2,a)= q3
δ(q2,b)= q4
δ(q3,a)= q3
δ(q3,b)= q3
δ(q4,a)= q4
δ(q4,b)= q4
Let us check for the string “baaba”
(q0,baaba)|-(q1,aaba)
|-(q2,aba)
|-(q3,ba)
|-(q3, a)
|-(q3, є)
Accepted
Let us check for the string “baba”
(q0,baba)|-(q1,aba)
|-(q2,ba)
|-(q4,a)
|-(q4, є)
Rejected
Q2)Design a FSA over {a,b} that contains at least three b’s.
Q3)Design a FSA over {a,b} that contain an odd number of b’s
Q4)Design a FSA ending with 100
Q5)Design a FSA that contains three consecutive zeros
Q6)Design a FSA that does not contain three consecutive zeros.
Q7)Design a FSA that contains set of all the strings starting with zero.
Q8)Design a FSA that accepts all strings over {a,b} that contains the
string aabb in it.
Q9)Design a FSA that accept sets of all strings over {0,1} of length 2
Design a DFA to accept those set of strings starting with “ab” and ending in “ba”
over an alphabet {a,b}.
Non-Deterministic Finite Automata(NFA/NDFA):
It is a five tuple machine M= (Q,∑,δ,q0,F).
Where,
Q=Finite set of states
∑=Set of input symbols
δ=Transition function and is defined as: Q× ∑-> 2Q
q0= Initial state
F=Set of final states
Equivalence of DFA and NFA:
For each NFA there is an equivalence DFA i.e. If there is a NFA accepting the
language L then there exists DFA for given NFA that also accepts the same language
L i.e.L(NFA)=L(DFA)
Conversion from NFA to DFA:(Imp)
This is NFA because state q0 has no outgoing edge level 1 and state q1 has no
outgoing edge level 0.Also there is no unique transition on input 0 in state q0 i.e.
when input 0 is given to the state it changes to {q0, q1} which is not unique.
Let δ’ be the transition function for required DFA and let δ is the transition function of
given NFA.The initial state of given NFA is q0 hence the initial state of given DFA also
will be q0.The transition function δ’ for required DFA can be defined as:
δ’(q0,0)=δ(q0,0)={q0, q1}[New state]
δ’(q0,1)=δ(q0,1)={ø} [Dead State (This is also a new state)]
δ’({q0, q1},0)=δ(q0,0) U δ(q1,0)={q0, q1}U{ø}={q0, q1}
δ’({q0, q1},1)=δ(q0,1) U δ(q1,1)= {ø}U{q1}={q1} [New state]
δ’(q1,0)=δ(q1,0)={ø}
δ’(q1,1)=δ(q1,1)={q1}
Note: Any non-final state is said to be a dead state if it transits in itself for every input
symbols.
Regular expression and its characteristics:
Regular expression can be described as the algebraic expression formed of symbols and the
operators. Symbols include alphabet, epsilon and operator includes concatenation(.), union (+ or
U or | ), kleene star (*) and positive closure(+).
Regular expression is a way to represent the regular language i.e. language accepted or recognized
by FSA(NFA or DFA)
Operators used in regular expressions:
1.Kleen Closure (*)
It is the repetition of the symbol infinite times including an empty string, a* = λ, a, aa, aaa….
2. Positive Closure ( + )
It is the repetition of the symbol infinite times without an empty string, a+= a, aa, aaa….
3. Concatenation (.)
It is the combination of the symbol, abc.cdc =abccdc
4. Union ( + or U or | )
It is the choice of the symbol, either this or that, a+b= a or b
Precedence or priority of the operators in the regular expression
1. Parenthesis or bracket ( )
2. * ( Kleene Closure)
3. + (Positive Closure)
4. . (Concatenation )
5. +(Union)
Some examples of representation in regular expressions:
1. Language L(R) = {0, 1, 2}
Regular expression R = 0 + 1 + 2
2. L(R) = {abb, a, b, bba}
R = abb + a + b + bba
3. L(R) = {λ, 0, 00, 000….}
R = 0*
4. L(R) = {1, 11, 111,1111….}
R = 1+
Regular language:
Regular languages can be characterized as languages defined by regular
expressions. A language is regular iff it is accepted by FSA.
These operators are used in regular language
1. Union: The union of two languages L and M denoted by L+M is the set of
strings that are in either L or M or both if L={001,10,111} and M={∈,001}, then
L+M= {∈, 10, 001, 111}
2. Concatenation: The concatenation of language L and M is the set of strings that
can be formed by taking any string in L and concatenation it with any string in
M and is denoted by L.M
If L = {“ab”, “c”} and M = {“d”, “ef”} then
L.M = {“abd”, “abef”, “cd”, “cef”}
If L = {001, 10, 111} and M={∈, 001}, then
L.M = {001, 10, 111, 001001, 10001, 111001}
3.The closure(iteration or star or Kleene star) of language.
The closure represents the set of those strings including empty string that can be
formed from the given language.
Let L={0,1}
L*={0,1}*= {˄,0,1,01,10,11,111,…}
Properties of Regular Language:
1]The union of two regular languages is also a regular language.
2]The concatenation of two regular languages is also a regular language.
3]Kleene star of a regular language is also a regular language.
4]The complementation of a regular language is also a regular language.
5]The intersection of two regular languages is also a regular language.
Regular expression for the string specified(Imp):
1. Start with ab
ab (a + b)*
2. Start with bba
bba(a + b )*
3. End with abb
(a+b)*abb
4. Contains abb
(a + b)*abb(a + b)*
5. Start and end with a
a + a(a+b)*a
6.Start and end with the same symbol
a+b+a(a+b)*a+b(a+b)*b
7.Start and end with different symbol.
a(a+b)*b+b(a+b)*a
8.String with exactly two a’s.
b*ab*ab*
9.String with at least two a’s.
(a+b)*a(a+b)*a(a+b)*
10.String with at most two a’s.
b*+b*ab*+b*ab*ab*.
11.Third symbol from left end is b.
(a+b)(a+b)b(a+b)*
12.Strings of length at most 2.
Є+a+b+aa+ab+ba+bb
13.Strings containing 15th symbol from the left end as a
(a+b)14 a (a+b)*
Context Free Grammar(CFG) and Context Free Language(CFL)
Language and grammar:
Grammar is the language generator and FSA is the language acceptor. The set of rules which are used for
generating strings is known as grammar.
Formal Definition of Grammar: A grammar is a quadruple G = (N, T, P ,S).
Where,N=set of non terminals or variables.[Rules are defined for non-terminals symbols]
T=set of terminals.[Rules are not defined for terminal symbols]
P=set of rules/productions.
S=start symbol.
Let G=(N,T,P,S) be a grammar. The language generated by G denoted by L(G) is the set of all strings of
terminals that are derivable from the start symbol S.
Let N={S,A}
T={a,b}
P={S->aA,S->b,A->aa} and S is the start symbol.
Determine the language generated by this grammar.
Here, S->aA
->aaa [.:A->aa]
S->b
Thus, the language generated by given grammar i.e. L(G)={aaa,b}.
Context Free Grammar(CFG):
Among different types of grammar CFG is also a type of grammar.
Context Free Grammar is the quadruple given by G={V,∑,R,S}
Where,
V=Set of terminals and non-terminal symbols.
∑=Set of terminals symbols.
R=Set of rules/Productions.
R has the production of the form:
A→ γ
Here left hand side of the production rule, A contains single non-terminal and right hand
side of the production rule, γ ∈ (N+T)*
S=Start symbol.
Context Free Language:(CFL)
The language of a context free grammar is the set of strings that are generated from the
set of rules of given CFG starting from the start symbol. The language generated by the
CFG is denoted as L(G) which is the CFL.
Q.Write down the CFG to generate palindrome strings for binary numbers.(Imp)
Solution:
Let G={V,∑,R,S} be the required CFG.
R can be defined as:
R={S->0S0,S->1S1,S->1,S->0,S-> ∈}
Here,
V={S,0,1}
∑={0,1, ∈}
S={S}
Let us generate the string”010010”
S->0S0
->01S10 [:. S->1S1]
->010S010 [:. S->0S0]
->010010 [:.S-> ∈]
Types of grammar:(Imp)
On the basis of production rules there are four types of grammar:
1)Unrestricted or Recursive enumerable grammar(Type 0):
If there is no restriction in the every production of the grammar, it is called unrestricted grammar.
For example:
S → aA|∈
A → abcdA
aAb → bB
Language generated by this grammar is accepted or recoginzed by Turing Machine.
2)Context Sensitive Grammar(Type 1):
Type-1 or context sensitive grammar has the production of the form:
α→β
Here, length of α ≤ length of β.
For example:
aAb → bbb
aA → bbb
aAb → bb (Not allowed)
Language generated by this grammar is accepted or recognized by Linearly Bounded Automata.
3)Context Free Grammar(Type 2):
If the production is of the form:
A→ γ
Here left side of the production rule, A contains single non-terminal and right hand side of
the production rule, γ ∈ (N+T)*
For example:
N={S,A,B} T={a, b} P={S → aA, S → ∈ ,A → aAB, B → b, A → a}.Start symbol S.
The language generated by this grammar is recognized by Push down Automata.
4)Regular Grammar (Type 3):
If every production is of the form A → a or A → aB Where, A,B ∈ N
a∈T
Left side of production = single non terminal.
Right side of production = either single terminal or terminal followed by a non terminal.
For example:
N={S,A} T= {a, b} P= {S → bS, S → aA, A → aS, A → bA, A → a ,S → b}. Starting
symbol S
The language generated by this grammar is recognized by Finite State Automata.
Imp
Grammar
Imp
Imp.