S. J. P. N.
TRUST’S
HIRASUGAR INSTITUTE OF TECHNOLOGY, NIDASOSHI
Accredited at 'A' Grade by NAAC
Programmes Accredited by NBA: CSE, ECE, EEE & ME.
Department of Computer Science & Engineering
.IN
Course: Theory of Computation(BCS503)
C
Module 2: Regular Expressions and
N
Languages SY
U
VT
Prof. A. A. Daptardar
Asst. Prof. , Dept. of Computer Science & Engg.,
Hirasugar Institute of Technology, Nidasoshi
Regular Expressions
.IN
C
N
SY
U
VT
2
.IN
C
REGULAR EXPRESSIONS
N
SY
U
What is a RE?
VT
3
VT
U
SY
N
C
.IN
4
VT
U
SY
N
C
.IN
5
VT
U
SY
N
C
.IN
6
VT
U
SY
N
C
.IN
7
VT
U
SY
N
C
.IN
8
VT
U
SY
N
C
.IN
9
VT
U
SY
N
C
.IN
10
VT
U
SY
N
C
.IN
11
VT
U
SY
N
C
.IN
12
VT
U
SY
N
C
.IN
13
VT
U
SY
N
C
.IN
14
VT
U
SY
N
C
.IN
15
VT
U
SY
N
C
.IN
16
VT
U
SY
N
C
.IN
17
VT
U
SY
N
C
.IN
18
VT
U
SY
N
C
.IN
19
VT
U
SY
N
C
.IN
20
VT
U
SY
N
C
.IN
21
VT
U
SY
N
C
.IN
22
.IN
FINITE AUTOMATA &
C
N
REGULAR EXPRESSIONS
SY
U
RE E NFA
VT
DFA RE
23
Relation between DFA, NFA,
ENFA & RE
.IN
C
N
SY
U
VT
24
.IN
C
OBTAIN ENFA FROM RE
N
SY
U
RE E NFA
VT
25
VT
U
SY
N
C
.IN
26
VT
U
SY
N
C
.IN
27
VT
U
SY
N
C
.IN
28
VT
U
SY
N
C
.IN
29
VT
U
SY
N
C
.IN
30
VT
U
SY
N
C
.IN
31
VT
U
SY
N
C
.IN
32
VT
U
SY
N
C
.IN
33
VT
U
SY
N
C
.IN
34
VT
U
SY
N
C
.IN
35
VT
U
SY
N
C
.IN
36
VT
U
SY
N
C
.IN
37
.IN
C
OBTAIN RE FROM FA
N
SY
U
DFA RE (Kleene’s Theorem)
VT
DFA RE (By Eliminating States)
38
Identity Rules for Regular
Expressions
• Two Regular Expressions p and q are
.IN
equivalent if p and q represent the same
C
set of strings.
N
SY
• We now give the identities for regular
expressions, these are useful for
U
simplifying the regular expressions.
VT
39
VT
U
SY
N
C
.IN
40
Kleene’s Theorem
.IN
C
N
SY
U
VT
41
VT
U
SY
N
C
.IN
42
VT
U
SY
N
C
.IN
43
VT
U
SY
N
C
.IN
44
VT
U
SY
N
C
.IN
45
VT
U
SY
N
C
.IN
46
VT
U
SY
N
C
.IN
47
VT
U
SY
N
C
.IN
48
Another Approach is:
.IN
C
N
SY
U
VT
49
VT
U
SY
N
C
.IN
50
VT
U
SY
N
C
.IN
51
.IN
C
OBTAIN RE FROM FA
N
SY
U
DFA RE (By Eliminating States)
VT
52
VT
U
SY
N
C
.IN
53
VT
U
SY
N
C
.IN
54
VT
U
SY
N
C
.IN
55
VT
U
SY
N
C
.IN
56
VT
U
SY
N
C
.IN
57
VT
U
SY
N
C
.IN
58
VT
U
SY
N
C
.IN
59
Applications of RE
• Regular Expressions in UNIX
.IN
– Eg. grep (Global search for RE and Print)
C
• Pattern Matching
N
SY
• Lexical Analysis
U
– First phase of CD
VT
60
.IN
Properties of Regular
C
Languages
N
SY
U
VT
61
4.1 Proving Languages not to be Regular
.IN
4.2 Closure Properties of Regular
C
Languages
N
SY
4.4 Equivalence and Minimization of
Automata
U
VT
62
4.2 Closure Properties of
Regular Languages
• In this section, we shall prove several theorems of the
.IN
form “if certain languages are regular, and a language L
is formed from them by certain operations (e.g., L is the
C
union of two regular languages), then L is also regular”.
N
• These theorems are often called closure properties of
SY
the regular languages, since they show that the class of
U
regular languages is dosed under the operation
mentioned.
VT
• Closure properties express the idea that when one (or
several) languages are regular, then certain related
languages arc also regular.
63
• Here is a summary of the principal closure
properties for regular languages:
1. The union of two regular languages is regular.
.IN
2. The intersection of two regular languages is regular·.
3. The complement of a regular language is regular.
C
4. The difference of two regular languages is regular.
N
5. The reversal of a regular language is regular.
SY
6. The closure (star) of a regular language is regular.
U
7. The concatenation of regular languages is regular.
VT
8. A homomorphism (substitution of strings for symbols) of
a regular language is regular.
9. The inverse homomorphism of a regular language is
regular.
64
• 4.2.1 Closure of Regular Languages
Under Boolean Operations
• Our first closure properties are the three
.IN
boolean operations: union, intersection,
and complementation:
C
N
SY
U
VT
65
.IN
C
N
• Closure Under Complementation
SY
• Starting with a regular expression, we could find
a regular expression for its complement as
U
follows:
VT
1. Convert the regular expression to an e-NFA.
2. Convert that ENFA to a DFA by the subset construction.
3. Complement the accepting states of that DFA.
4. Turn the complement DFA back into a regular expression using
the construction 66
VT
U
SY
N
C
.IN
67
VT
U
SY
N
C
.IN
68
VT
U
SY
N
C
.IN
69
VT
U
SY
N
C
.IN
70
VT
U
SY
N
C
.IN
71
VT
U
SY
N
C
.IN
72
.IN
C
REGULAR EXPRESSIONS
N
SY
U
VT
73
VT
U
SY
N
C
.IN
74
VT
U
SY
N
C
.IN
75
VT
U
SY
N
C
.IN
76
VT
U
SY
N
C
.IN
77
VT
U
SY
N
C
.IN
78
VT
U
SY
N
C
.IN
79
4.4 Equivalence and
Minimization of Automata
• In this section we discuss how to test whether two
.IN
descriptors for regular languages are equivalent, in the
sense that they define the same language.
C
N
• An important consequence of this test is that there is a
way to minimize a DFA. That is, we can take any DFA
SY
and find an equivalent DFA that has the minimum
U
number of states.
VT
• In fact, this DFA is essentially unique: given any two
minimum-state DFA 's that are equivalent, we can
always find a way to rename the states so that the two
DFA's become the same.
80
4.4.1 Testing Equivalence of
States
• The language generated by a DFA is unique. But, there
.IN
can exist ·many DFA' s that accept the same language.
C
• In such cases, the DFA's are said to be equivalent.
N
• During computations, it is desirable to represent the DFA
SY
with fewer states since the space is proportional to the
number of states of DFA.
U
• For storage efficiency, it is required to reduce the
VT
number of states and there by minimize the DFA.
• This can be achieved first by finding the distinguishable
and indistinguishable states.
81
VT
U
SY
N
C
.IN
82
VT
U
SY
N
C
.IN
83
VT
U
SY
N
C
.IN
84
VT
U
SY
N
C
.IN
85
• Minimize the following DFA.
.IN
C
N
SY
U
VT
86
4.1 Proving Languages not to
be Regular
• Not every language is a regular language.
.IN
In this section, we shall introduce a
C
powerful technique, known as the
N
"pumping lemma," for showing certain
SY
languages not to be regular.
U
– 4.1.1 The Pumping Lemma for Regular
VT
Languages
– 4.1.2 Applications of the Pumping Lemma
87
4.1.1 The Pumping Lemma for Regular
Languages
.IN
C
N
SY
U
VT
88
VT
U
SY
N
C
.IN
89
VT
U
SY
N
C
.IN
90
4.1.2 Applications of Pumping
Lemma
• The general strategy used to prove that
.IN
certain language is not regular is shown
C
below:
N
SY
– Assume that the language L is regular and the
number of states in FA be n.
U
– Select the string say w and break it into the substrings
VT
x, y and z such that w = xyz with the constraints |w| ≥
n, |xy| ≤ n and |y| ≥ 1.
– Find any i such that xyiz Ɇ L. According to pumping
lemma, xyiz ϵ L. So, the result is contradiction to the
assumption that the language is regular. Therefore,
91
the given language L is not regular.
• Example: Show that L = {wwR | w ϵ (0+1)*} is not regular.
• Step1: Let L is regular and n be the number of states in
FA. Consider the string
.IN
C
N
SY
U
VT
92
VT
U
SY
N
C
.IN
93
and so number of a’s followed by equal
number of b’s does not exist. But,
according to pumping lemma, n number of
.IN
a’s should be followed by n number of b’s
which is a contradiction to the assumption
C
that the language is regular.
N
SY
• So, the language L = {anbn | n ≥ 0} is not
regular.
U
VT
94
VT
U
SY
N
C
.IN
95