[go: up one dir, main page]

0% found this document useful (0 votes)
38 views33 pages

ToC Module 2 Handouts Sept19

The document discusses deterministic finite automata (DFAs) and their role in recognizing regular languages, particularly focusing on membership problems. It provides examples of DFAs designed to recognize strings based on specific conditions, such as having an odd number of 0's or having 0's at even positions. The document also defines key concepts related to DFAs, including configurations, acceptance, and transition functions.

Uploaded by

Vijeta Chaudhary
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
38 views33 pages

ToC Module 2 Handouts Sept19

The document discusses deterministic finite automata (DFAs) and their role in recognizing regular languages, particularly focusing on membership problems. It provides examples of DFAs designed to recognize strings based on specific conditions, such as having an odd number of 0's or having 0's at even positions. The document also defines key concepts related to DFAs, including configurations, acceptance, and transition functions.

Uploaded by

Vijeta Chaudhary
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 33

Finite State Automata and Regular Languages

Tapas Pandit

1 Deterministic Finite Automata


Unless otherwise specified, we consider Σ to be {0, 1} throughout our discussion. As mentioned earlier,
we are interested in solving membership problems or decisional problems. In this section, we solve these
problems using a simple model of computation, called finite state automaton/machine. Let us start with
the following question. Given a string 𝜔 ∈ Σ∗ , design a simple machine from scratch to recognize1 whether
the number of 0’s in 𝜔 is odd. The rules of the machine are as follows: The input will be given on a read-
only tape. The the machine has to read its input symbols one-by-one from the LHS and has to answer
when all its input symbols have been read. Of-course, the machine will have very limited memory. Note
that there is no restriction on the size of the input. Also keep in mind that machine does not know when
the input string comes to an end. Therefore, while reading its input one-by-one symbol, the machine has
to keep its answer ready at any instance. One simple idea is to keep a counter which gets incremented
each time when a 0 is encountered. Based on the value that is stored in the counter, the machine can
recognize the string. This approach requires a memory which depends on the size of the input string and
the machine has to implement addition operation.
Why not keep two states, ‘even’ and ‘odd’ which will represent the even number of 0’s and the odd
number of 0’s respectively. Initially, the machine will be in the even state without consuming any input
symbol. When the machine encounters a 0, it switches its state. When it encounters the symbol 1, remains
in the same state. See the transition diagram below. When all its input symbols have been scanned, the
machine checks it current state, and accepts the string, if the current state is odd (called final/accepting
state, generally marked by double circles/rectangles). Also note that a single arrow points to the even
state to indicate start/initial state.

1 1
0

𝐸𝑣𝑒𝑛 𝑂𝑑𝑑

For simplicity, let us replace even and odd states by 𝑞 0 and 𝑞 1 respectively. As we can see from
Figure 1, each time the machine takes a pair of inputs, a state 𝑞 ∈ {𝑞 0 , 𝑞 1 } and an input symbol 𝜎 ∈ Σ,
and determines its next state. That is, the machine works based on a function 𝛿 : 𝑄 × Σ −→ 𝑄, where
𝑄 = {𝑞 0 , 𝑞 1 } is called the set of finite states. This function is called transition function which is described
in the transition table below.
1 Accept and recognize will be used interchangeably throughout our discussion.

1
1 1
0
𝑞0 𝑞1

Figure 1: Machine for recognizing strings having odd number of 0’s.

input state (𝑞) input symbol (𝜎) next state (𝛿(𝑞, 𝜎))
𝑞0 0 𝑞1
𝑞0 1 𝑞0
𝑞1 0 𝑞0
𝑞1 1 𝑞1

It can be easily checked that the strings like 0, 000, 101, 1110 are all accepted by the machine in Figure
1. Actually, the machine recognizes the language 𝐿 = {𝜔 ∈ Σ∗ : #0’s in 𝜔 is odd}, that is, all the strings
of 𝐿 are accepted by the machine.
Let us consider another example. Design a machine that accepts those strings whose entry at each
even position is 0, such as 10, 00, 101, 1010, etc. We keep three states: ‘OP’ (for odd position), ‘EP’ (for
even position), and ‘R’ (for rejection). When the machine is in the OP state, it simply moves to the EP
state for any input symbol, as the machine is not interested in the symbol at the odd position. When the
machine is in the EP state and the input symbol is 0, the machine moves to the OP state. If the input
symbol is 1, the machine moves to the R state and remains there for all subsequent input symbols. The
machine description is given in Figure 2. Note that here both states OP and EP are final/accepting states.
The transition function is described in the table below:
0,1

0, 1 1
𝑂𝑃 𝐸𝑃 𝑅

Figure 2: Machine for recognizing strings with 0 at all even positions.

input state (𝑞) input symbol (𝜎) next state (𝛿(𝑞, 𝜎))
OP 0 EP
OP 1 EP
EP 0 OP
EP 1 R
R 0 R
R 1 R

It is easy to see that the machine recognizes the language

𝐿 = {𝜔 ∈ Σ∗ : if ℓ = |𝜔| > 1, then 𝜔2𝑖 = 0 ∀𝑖 ∈ [⌊ℓ/2⌋]},

where for 𝑡 ∈ N, [𝑡] = {1, 2, . . . , 𝑡}.

2
0, 1

0, 1 0, 1 1
𝑞0 𝑞1 𝑞2 𝑞3

𝑞4 0, 1

Figure 3: Machine for recognizing strings having 1 at the 3rd position from the LHS.

Before formally introducing the finite state machine, let us consider one more example. Design a
machine that accepts those strings whose 3rd bit from the LHS is 1. The machine is described in Figure
3, where it has five states 𝑞 0 , 𝑞 1 , . . . , 𝑞 4 . Regardless of the first two input symbols, the machine moves to
the state 𝑞 2 through 𝑞 1 . Then, if the 3rd bit from the LHS of the input string is 1, the machine enters the
state 𝑞 3 (the accepting state), otherwise, it enters the state 𝑞 4 . The machine remains in 𝑞 3 (resp. 𝑞 4 ) for
all subsequent input symbols.
All the machines discussed so far are called finite state machines or finite state automata, because the
number of states is finite. We now formally define finite state automaton. You might have noticed that at
each transition, the machines have only one option for their next state. For this reason, the machines are
called deterministic finite automata (DFAs).
Definition 1.1. A deterministic finite automaton (DFA) is a quintuple M = (𝑄, Σ, 𝛿, 𝑠, 𝐹), where

1. 𝑄 is the finite set of states,

2. Σ is the finite alphabet,

3. 𝑠 ∈ 𝑄 is the initial state,

4. 𝐹 ⊆ 𝑄 is the set of final states, and

5. 𝛿 : 𝑄 × Σ −→ 𝑄 is the transition function.


Example 1.1. The machine described in Figure 1 represents a deterministic finite automaton M =
(𝑄, Σ, 𝛿, 𝑠, 𝐹), where 𝑄 = {𝑞 0 , 𝑞 1 }, Σ = {0, 1}, 𝑠 = 𝑞 0 , 𝐹 = {𝑞 1 } and the transition function 𝛿 : 𝑄 × Σ −→ 𝑄
is given by 𝛿(𝑞 0 , 1) = 𝑞 0 , 𝛿(𝑞 0 , 0) = 𝑞 1 , 𝛿(𝑞 1 , 1) = 𝑞 1 and 𝛿(𝑞 1 , 0) = 𝑞 0 .
Example 1.2. The machine described in Figure 2 represents a deterministic finite automaton M =
(𝑄, Σ, 𝛿, 𝑠, 𝐹), where 𝑄 = {𝑞 0 , 𝑞 1 , 𝑞 2 }, Σ = {0, 1}, 𝑠 = 𝑞 0 , 𝐹 = {𝑞 0 , 𝑞 1 } and the transition function
𝛿 : 𝑄 × Σ −→ 𝑄 is given by 𝛿(𝑞 0 , 0) = 𝑞 1 , 𝛿(𝑞 0 , 1) = 𝑞 1 , 𝛿(𝑞 1 , 0) = 𝑞 0 , 𝛿(𝑞 1 , 1) = 𝑞 2 , 𝛿(𝑞 2 , 0) = 𝑞 2 and
𝛿(𝑞 2 , 1) = 𝑞 2 .
Definition 1.2 (Configuration). A configuration of a deterministic finite automaton M = (𝑄, Σ, 𝛿, 𝑠, 𝐹) is
any element of 𝑄 × Σ∗ , where the first component is the current state of the machine M and the second is
the portion of the input yet to be read. For example, (𝑞 2 , 𝑎𝑏𝑎𝑏𝑎𝑏) is a configuration of the DFA described
in Figure 4.
Remark 1.3. If 𝜔 be the input string and 𝑠 be the initial state, then (𝑠, 𝜔) is called initial configuration.
A configuration of the form (𝑞, 𝜖) is called final or accepting configuration, if 𝑞 ∈ 𝐹. A configuration of the
form (𝑞, 𝜖) is called non-final or rejecting configuration if 𝑞 ∉ 𝐹.

3
Input Tape (Read only)
a b a b a b a b

Reading Head

𝑞3 ..
.

𝑞2 𝑞𝑛

𝑞1 𝑞0

Finite Control

Figure 4: A snapshot of a deterministic finite automaton

Definition 1.3 (Binary relation over configurations). A binary relation ⊢M holds between two configura-
tions of M if and only if the machine M can pass from one to the other as a result of a single move.

Thus, if (𝑞, 𝜔) and (𝑞 ′, 𝜔 ′) are two configurations of M, then (𝑞, 𝜔) ⊢M (𝑞 ′, 𝜔 ′) if and only if 𝜔 = 𝑎𝜔 ′
for some symbol 𝑎 ∈ Σ and 𝛿(𝑞, 𝑎) = 𝑞 ′. In this case, we say that (𝑞, 𝜔) yields (𝑞 ′, 𝜔 ′) in one step. For
example, for the DFA defined in Figure 1, we can write (𝑞 0 , 0100) ⊢M (𝑞 1 , 100).
Remark 1.4. As we see the binary relation ⊢M is a function from 𝑄 × Σ+ to 𝑄 × Σ∗ , that is, for every
configuration except those of the form (𝑞, 𝜖) there is a uniquely determined next configuration. A config-
uration of the form (𝑞, 𝜖) signifies that M has consumed all its input symbols, and hence, its operation
ceases at this point.
We denote the reflexive, transitive closure of ⊢M by ⊢∗M which is nothing but the application of zero
step or a finite number of steps of the binary relation ⊢M . For example, for the DFA defined in Figure 1,
we can write (𝑞 0 , 0100) ⊢∗M (𝑞 0 , 0) as (𝑞 0 , 0100) ⊢M (𝑞 1 , 100) ⊢M (𝑞 1 , 00) ⊢M (𝑞 0 , 0).
Definition 1.4 (Acceptance). A string 𝜔 ∈ Σ∗ is said to be accepted by M if and only if there exists a
state 𝑞 ∈ 𝐹 such that (𝑠, 𝜔) ⊢∗M (𝑞, 𝜖). Finally, the language accepted by M, denoted by L (M), is the set of
all strings accepted by M.
Remark 1.5. The transition diagram of a DFA can always be viewed as a special type of directed graph,
where the states and transition arrows represent nodes and directed edges of the underlying graph respec-
tively. We say the DFA accepts a string 𝜔 = 𝜔1 · · · 𝜔 𝑛 if there is a directed walk from the initial node to a
node corresponding to a final state, where the directed edges are labeled with 𝜔1 , 𝜔2 , . . . , 𝜔 𝑛 respectively.
Example 1.6. Let us first write down the DFA M in Figure 3 as per definition. M = (𝑄, Σ, 𝛿, 𝑠, 𝐹), where
𝑄 = {𝑞 0 , . . . , 𝑞 4 }, Σ = {0, 1}, 𝑠 = 𝑞 0 , 𝐹 = {𝑞 3 } and the transition function 𝛿 : 𝑄 × Σ −→ 𝑄 is given
by 𝛿(𝑞 0 , 0) = 𝑞 1 , 𝛿(𝑞 0 , 1) = 𝑞 1 , 𝛿(𝑞 1 , 0) = 𝑞 2 , 𝛿(𝑞 1 , 1) = 𝑞 2 , 𝛿(𝑞 2 , 1) = 𝑞 3 , 𝛿(𝑞 2 , 0) = 𝑞 4 , 𝛿(𝑞 3 , 0) = 𝑞 3 ,
𝛿(𝑞 3 , 1) = 𝑞 3 , 𝛿(𝑞 4 , 0) = 𝑞 4 and 𝛿(𝑞 4 , 1) = 𝑞 4 . The above DFA M accepts/recognizes the language

L (M) = {𝑥 ∈ {0, 1}∗ : |𝑥| ≥ 3 and 3rd bit of 𝑥 from the LHS is 1}. (1)

4
For example, M accepts the string 101011. In fact, using the notation ⊢M we have,

(𝑞 0 , 101011) ⊢M (𝑞 1 , 01011)
⊢M (𝑞 2 , 1011)
⊢M (𝑞 3 , 011)
⊢M (𝑞 3 , 11)
⊢M (𝑞 3 , 1)
⊢M (𝑞 3 , 𝜖).

So, we can write (𝑞 0 , 101011) ⊢∗M (𝑞 3 , 𝜖) and hence, M accepts 101011.


Now, we see a formal proof of Equation 1. Let 𝐿 = {𝑥 ∈ {0, 1}∗ : |𝑥| ≥ 3 and 3rd bit of x from the LHS is 1}.
We have to show that 𝐿 = L (M), that is, 𝐿 ⊆ L (M) and 𝐿 ⊇ L (M). We first show that 𝐿 ⊆ L (M). Let
𝑥 ∈ 𝐿 and ℓ = |𝑥|. Then, we write 𝑥 = 𝑥1 𝑥2 𝑥3 · · · 𝑥ℓ , where each 𝑥𝑖 ∈ Σ. Since 𝑥3 = 1, after reading the first
three symbols of 𝑥, the machine enters 𝑞 3 (final state) and remains there for all remaining input symbols
(according to the transition function). Therefore, 𝑥 ∈ L (M) and thus, 𝐿 ⊆ L (M). For the other direction,
assume that 𝑥 ∈ L (M). This means the machine must enter state 𝑞 3 after reading the third symbol 𝑥3 ,
which only occurs when 𝑥3 = 1. Therefore, 𝑥 ∈ 𝐿, and thus, L (M) ⊆ 𝐿. This proof might seem obvious;
however, this is not the case for other languages, and some exercises are given in Exercise 1.14.
Remark 1.7. The acceptance of a string by a DFA can also be defined as follows (thanks to [Sip14]).
A deterministic finite automaton M accepts 𝜔 if there exists a sequence of states 𝑞 0 , 𝑞 1 , . . . , 𝑞 𝑛 in 𝑄 with
three conditions.

1. 𝑞 0 = 𝑠,

2. 𝛿(𝑞 𝑖−1 , 𝜔𝑖 ) = 𝑞 𝑖 for 𝑖 ∈ [𝑛], and

3. 𝑞 𝑛 ∈ 𝐹.

Example 1.8. The DFA given in Figure 5 accepts the language {𝜔 ∈ {0, 1}∗ : #1’s is even and #0’s is even}.

𝐸𝐸 𝐸𝑂

0
1 1 1 1
0

𝑂𝐸 𝑂𝑂

Figure 5: Machine for recognizing strings having even number of 1’s and even number of 0’s.

Example 1.9. The DFA given in Figure 6 accepts the language

{𝜔 ∈ {0, 1}∗ : 𝜔 contains 101 as substring}.

Example 1.10. The DFA given in Figure 7 accepts the language

{𝜔 ∈ {0, 1}∗ : decimal value of 𝜔 is congruent to 2 modulo 3}.

5
0 1 0,1

𝑞0 1 𝑞1 0 𝑞2 1 𝑞3

Figure 6: Machine recognizes those strings that contain 101 as a substring.

1%3
1
0
1
0
0%3 2%3

0 1

Figure 7: Machine accepts those strings whose decimal value is congruent to 2 modulo 3.

Example 1.11. Let us design a deterministic finite automaton M that accepts the language

L (M) = {𝜔 ∈ {𝑎, 𝑏}∗ : 𝜔 does not contain three consecutive b’s}.

We let M = (𝑄, Σ, 𝛿, 𝑠, 𝐹), where 𝑄 = {𝑞 0 , 𝑞 1 , 𝑞 2 , 𝑞 3 }, Σ = {𝑎, 𝑏}, 𝑠 = 𝑞 0 , 𝐹 = {𝑞 0 , 𝑞 1 , 𝑞 2 } and 𝛿 is given in


the following table:

𝑞 𝜎 𝛿(𝑞, 𝜎)
𝑞0 𝑎 𝑞0
𝑞0 𝑏 𝑞1
𝑞1 𝑎 𝑞0
𝑞1 𝑏 𝑞2
𝑞2 𝑎 𝑞0
𝑞2 𝑏 𝑞3
𝑞3 𝑎 𝑞3
𝑞3 𝑏 𝑞3

The transition diagram of the machine is given in Figure 8.

𝑎 𝑎 𝑎, 𝑏

𝑞0 𝑏 𝑞1 𝑏 𝑞2 𝑏 𝑞3

Figure 8: Machine recognizes those strings that do not three consecutive b’s.

Example 1.12. The DFA given in Figure 9 accepts the language

{𝜔 ∈ {0, 1}∗ : sum of binary digits in 𝜔 is congruent to 2 modulo 3}.

6
1%3
1 1

0
0%3 2%3
1

0 0

Figure 9: Machine accepts those strings whose sum of binary digits is congruent to 2 modulo 3.

Exercise 1.13. Design DFAs for the following languages:

1. {𝜔 ∈ {0, 1}∗ : decimal value of 𝜔 is congruent to 0 modulo 3}.

2. {𝜔 ∈ {0, 1}∗ : sum of binary digits in 𝜔 is congruent to 0 modulo 3}.


Exercise 1.14. Establish the following results.

1. L (M) = {𝜔 ∈ Σ∗ : #0’s is odd}, where the DFA M is defined in Example 1.1.

2. L (M) = {𝜔 ∈ Σ∗ : if ℓ = |𝜔| > 1, then 𝜔2𝑖 = 0 ∀𝑖 ∈ [⌊ℓ/2⌋]} where the DFA M is defined in Example
1.2.

3. L (M) = {𝜔 ∈ {0, 1}∗ : #1’s is even and #0’s is even} where the DFA is given in Example 1.8.

4. L (M) = {𝜔 ∈ {0, 1}∗ : 𝜔 contains 101 as substring} where the DFA is defined in Example 1.9.

5. L (M) = {𝜔 ∈ {0, 1}∗ : decimal value of 𝜔 is congruent to 2 modulo 3} where the DFA is defined in
Example 1.10.

6. L (M) = {𝜔 ∈ {𝑎, 𝑏}∗ : 𝜔 does not contain three consecutive b’s} where the DFA is defined in Exam-
ple 1.11.

7. L (M) = {𝜔 ∈ {0, 1}∗ : sum of binary digits in 𝜔 is congruent to 2 modulo 3} where the DFA is
defined in Example 1.12.

While we have constructed several deterministic finite automata, there are still specific languages for
which DFAs need to be designed. Can we design DFAs for the languages Σ∗ , ∅, {𝜖 }, {𝜔} for any 𝜔 ∈ Σ∗ ?
Yes, the DFAs are presented below.

1. For 𝐿 = Σ∗ : The machine M is given by M = (𝑄, Σ, 𝛿, 𝑠, 𝐹), where 𝑄 = {𝑞 0 }, Σ = {𝑎, 𝑏}, 𝑠 = 𝑞 0 ,


𝐹 = {𝑞 0 } and 𝛿 : 𝑄 × Σ −→ 𝑄 is given by 𝛿(𝑞, 𝜎) = 𝑞 for all 𝑞 ∈ 𝑄 and 𝜎 ∈ Σ.

𝑎, 𝑏

𝑞0

2. For 𝐿 = ∅: The machine2 M is given by M = (𝑄, Σ, 𝛿, 𝑠, 𝐹), where 𝑄 = {𝑞 0 , 𝑞 1 }, Σ = {𝑎, 𝑏}, 𝑠 = 𝑞 0 ,


𝐹 = {𝑞 1 } and 𝛿 : 𝑄 × Σ −→ 𝑄 is given by 𝛿(𝑞, 𝜎) = 𝑞 for all 𝑞 ∈ 𝑄 and 𝜎 ∈ Σ.
2 Can you design a DFA with only one state for ∅?

7
𝑎, 𝑏 𝑎, 𝑏

𝑞0 𝑞1

3. For 𝐿 = {𝜖 }: The machine M is given by M = (𝑄, Σ, 𝛿, 𝑠, 𝐹), where 𝑄 = {𝑞 0 , 𝑞 1 }, Σ = {𝑎, 𝑏}, 𝑠 = 𝑞 0 ,


𝐹 = {𝑞 0 } and 𝛿 : 𝑄 × Σ −→ 𝑄 is given by 𝛿(𝑞 0 , 𝜎) = 𝑞 1 and 𝛿(𝑞 1 , 𝜎) = 𝑞 1 for all 𝜎 ∈ Σ.

𝑎, 𝑏

𝑎, 𝑏
𝑞0 𝑞1

4. For 𝐿 = {𝜔}, where 𝜔 ∈ {𝑎, 𝑏}∗ is a fixed string. Let 𝜔 = 𝜔1 · · · 𝜔 𝑛 . The machine M is given by
M = (𝑄, Σ, 𝛿, 𝑠, 𝐹), where 𝑄 = {𝑞 0 , 𝑞 1 , . . . , 𝑞 𝑛 , 𝑞 𝑛+1 }, Σ = {𝑎, 𝑏}, 𝑠 = 𝑞 0 , 𝐹 = {𝑞 𝑛 } and 𝛿 : 𝑄 × Σ −→ 𝑄
is given as follows.

• 𝛿(𝑞 𝑖−1 , 𝜔𝑖 ) = 𝑞 𝑖 for 1 ≤ 𝑖 ≤ 𝑛,


¯ 𝑖 ) = 𝑞 𝑛+1 for 1 ≤ 𝑖 ≤ 𝑛, where 𝜔
• 𝛿(𝑞 𝑖−1 , 𝜔 ¯ 𝑖 is any symbol of Σ other than 𝜔𝑖 , and
• 𝛿(𝑞, 𝜎) = 𝑞 𝑛+1 for all 𝑞 ∈ {𝑞 𝑛 , 𝑞 𝑛+1 } and 𝜎 ∈ Σ.

For example, a machine for 𝐿 = {𝑎𝑎𝑏𝑏} is given below:

𝑞0 𝑎 𝑞1 𝑎 𝑞2 𝑏 𝑞3 𝑏 𝑞4

𝑏 𝑏
𝑎
𝑎 𝑎, 𝑏

𝑞5

𝑎, 𝑏

Exercise 1.15. Design a DFA that accepts Σ+ .


Exercise 1.16. Do you think the number of states involved in the DFAs for Σ∗ , ∅, {𝜖 }, and {𝜔}, as defined
above, can be minimized? Justify your answer.

Definition 1.5 (Regular). A language 𝐿 ⊆ Σ∗ is said to be regular, if there exists a DFA M such that
L (M) = 𝐿.

So, the languages accepted by all the DFAs discussed so far are regular. In particular, the languages
Σ∗ , ∅, {𝜖 }, {𝜔} for any 𝜔 ∈ Σ∗ are regular.

Theorem 1.1. Regular languages are closed under union, intersection and complement.

Proof. Let 𝐿 1 and 𝐿 2 be two regular languages over the same alphabet Σ. So, there exist two DFAs M1
and M2 such that L (M1 ) = 𝐿 1 and L (M2 ) = 𝐿 2 . Let M1 = (𝑄 1 , Σ, 𝛿1 , 𝑠1 , 𝐹1 ) and M2 = (𝑄 2 , Σ, 𝛿2 , 𝑠2 , 𝐹2 ).

8
1. Union. We show that there exists a DFA M such that L (M) = 𝐿 1 ∪ 𝐿 2 . In fact, the DFA M
is defined using M1 and M2 as follows. M = (𝑄, Σ, 𝛿, 𝑠, 𝐹), where 𝑄 = 𝑄 1 × 𝑄 2 , 𝑠 = (𝑠1 , 𝑠2 ), 𝐹 =
(𝐹1 × 𝑄 2 ) ∪ (𝑄 1 × 𝐹2 ) and 𝛿 : 𝑄 × Σ −→ 𝑄 is given by 𝛿((𝑞 1 , 𝑞 2 ), 𝜎) = (𝛿1 (𝑞 1 , 𝜎), 𝛿2 (𝑞 2 , 𝜎)) for any
(𝑞 1 , 𝑞 2 ) ∈ 𝑄 and 𝜎 ∈ Σ. We claim that L (M) = L (M1 ) ∪ L (M2 ): exercise.

2. Intersection. We have to show that there exists a DFA M such that L (M) = 𝐿 1 ∩ 𝐿 2 . The description
of the DFA M is exactly the same as above, except 𝐹 = 𝐹1 ×𝐹2 . We claim that L (M) = L (M1 )∩L (M2 ):
exercise.

3. Complement. We have to show that 𝐿 1 is regular. It is easy to see that the DFA M1′ = (𝑄 1 , Σ, 𝛿1 , 𝑠1 , 𝐹1′ =
𝑄 1 \ 𝐹1 ) recognizes 𝐿 1 .

Remark 1.17. Cannot the case of intersection be handled using complement and union? Yes, in fact,
𝐿 1 ∩ 𝐿 2 = Σ∗ \ ((Σ∗ \ 𝐿 1 ) ∪ (Σ∗ \ 𝐿 2 )). A similar expression can be defined for 𝐿 1 ∪ 𝐿 2 .
Remark 1.18. The approach we used for union and intersection can be viewed as running both machines
in parallel on the same input string and then determining acceptance or rejection based on the state of
each machine.

Corollary 1.2. All finite languages over Σ are regular.

Proof. Hint: Induction and for each 𝜔 ∈ Σ∗ , {𝜔} is regular. 

Exercise 1.19. Let 𝐿 1 and 𝐿 2 be two regular languages. Show that 𝐿 1 \ 𝐿 2 (set difference) and 𝐿 1 Δ𝐿 2
(symmetric difference) are regular.

Remark 1.20. Note that the constructions used in the proof of Theorem 1.1 serve as a tool for designing
DFAs for complex regular languages. We illustrate this with some examples. Suppose we are interested in
designing DFAs for the following two languages:

• {𝜔 ∈ {0, 1}∗ : either 𝜔1 = 0 or 𝜔 contains at least one 1}.

• {𝜔 ∈ {0, 1}∗ : 𝜔1 = 0 and 𝜔 contains at least one 1}.

The above languages can be written as 𝐿 1 ∪ 𝐿 2 and 𝐿 1 ∩ 𝐿 2 respectively, where

• 𝐿 1 = {𝜔 ∈ {0, 1}∗ : 𝜔 contains at least one 1}, and

• 𝐿 2 = {𝜔 ∈ {0, 1}∗ : 𝜔1 = 0}.

So, if we can design DFAs for 𝐿 1 and 𝐿 2 , then using the construction rules in the proof of Theorem 1.1,
we can design DFAs for 𝐿 1 ∪ 𝐿 2 and 𝐿 1 ∩ 𝐿 2 . The details are given in Figure 10.
Exercise 1.21. Design DFAs for the following languages.

1. {𝑎𝑖 𝑏 𝑗 𝑐 𝑘 : 𝑖, 𝑗, 𝑘 ≥ 0}.

2. {𝜔 ∈ {0}∗ : 𝜔 = 02𝑘+1 for 𝑘 ≥ 0}.

3. {𝜔 ∈ {0}∗ : 𝜔 = 03𝑘 for 𝑘 ≥ 0}.

4. {𝜔 ∈ {𝑎, 𝑏}∗ : 𝜔 has at least three a’s and at least two b’s}.

9
Regular
Deterministic Finite Automata Comments
Languages
0 0, 1
𝐿1 Simple DFA M1
𝑝0 1 𝑝1

𝑞0 0 𝑞1 0, 1
𝐿2 1 Simple DFA M2
𝑞2 0, 1

0 0, 1
DFA for
𝐿1
𝑝0 1 𝑝1 Complement

( 𝑝0, 𝑞0 ) ( 𝑝1, 𝑞1 ) ( 𝑝 0, 𝑞 1 ) 0
1
𝐿1 ∪ 𝐿2 0, 1 DFA for Union
1 0
0, 1
1
0 ( 𝑝0, 𝑞2 ) ( 𝑝1, 𝑞2 ) ( 𝑝 1, 𝑞 0 )
1
0

( 𝑝0, 𝑞0 ) ( 𝑝1, 𝑞1 ) ( 𝑝 0, 𝑞 1 ) 0
1
DFA for
𝐿1 ∩ 𝐿2 0, 1
1 0 Intersection
0, 1
1
0 ( 𝑝0, 𝑞2 ) ( 𝑝1, 𝑞2 ) ( 𝑝 1, 𝑞 0 )
1

Figure 10: Illustrations of machines recognizing the union, intersection, and complement of given languages.

5. {𝜔 ∈ {0, 1}∗ : 𝜔 contains either even number of 0’s or even number of 1’s}.

6. {𝜔 ∈ {𝑎, 𝑏}∗ : 𝜔 does not contain the substring ab}.

7. {𝜔 ∈ {𝑎, 𝑏}∗ : 𝜔 contains neither ab nor ba}.

8. {𝜔 ∈ {𝑎, 𝑏}∗ : 𝜔 doesn’t contain exactly two a’s}.

9. {𝜔 ∈ {0, 1}∗ : 𝜔 begins with 1 and ends with 0}.

10. {𝜔 ∈ {𝑎, 𝑏}∗ : 𝜔 contains at least three a’s}.

11. {𝜔 ∈ {0, 1}∗ : decimal value of 𝜔 is congruent to 1 modulo 4}.

12. {𝜔 ∈ {0, 1}∗ : number of 1’s is multiple of 5}.

13. {𝜔 ∈ {0, 1}∗ : sum of binary digits of 𝜔 is congruent to 2 modulo 4}.


Í 

14. {𝜔 ∈ {0, 1}∗ : if ℓ = |𝜔| ≥ 1, then the value of 𝑖=1 𝑖 %2 is 1}.
𝜔

10
Exercise 1.22. Suppose a DFA has 𝑛 many states and 𝑚 many symbols in its alphabet. How many bits
are required to store the transition function of the DFA?
Remark 1.23. In the context of DFA, there is no difference between a decider and a recognizer. So, given
any regular language 𝐿, we can find a DFA M that decides 𝐿. Further, given any string 𝜔, the machine M
decides 𝜔 in linear time in |𝜔|.

2 Nondeterministic Finite Automata


Before introducing somewhat powerful machines, nondeterministic finite automata (NFAs), let us see some
difficulties in deterministic finite automata’s description. The following DFA accepts those strings whose
2nd bit from the LHS is 1.

0,1

0, 1 1
𝑞0 𝑞1 𝑞2

𝑞3 0,1

The illustration of this machine is simple. After scanning the 2nd input symbol, the machine can decide3
whether the input string is accepted or not. But, if we want to design a DFA for accepting those strings
whose 2nd bit from the RHS is 1, the same mechanism will not work. While executing on a valid4 input
string, the machine should be in a state such that whenever the last symbol appears, it must move to
the final/accepting state(s). Basically, we consider four states labeled with 00, 01, 10 and 11. The state
labeled with 00 means that the last two scanned symbols are 0. Similarly, the state labeled with 10 means
the last two scanned symbols are 1 and 0 respectively. In other words, the states keep track of the last
two symbols scanned by the machine. Now look at the machine given below. At any given time, if the
machine is at state 𝑎𝑏, where 𝑎, 𝑏 ∈ {0, 1}, then the last two symbols scanned were a and b, respectively.
Therefore, when the DFA finishes scanning and is in state 10 or 11, the input string is accepted.

1
00 01
1

0 1
0

10 11 1
0

Claim. Let 𝐿 = {𝜔 ∈ {0, 1} : 2nd bit of 𝜔 is 1 from the RHS}. Any DFA that accepts the language 𝐿
must have at least 4 states. Exercise.

3 But, the decision will be made after reading all the input symbols.
4 It means the 2nd bit is 1 from the RHS.

11
Exercise 2.1. Design a DFA that accepts those strings whose 3rd bit from the RHS is 1. Show that your
DFA will have at least 8 states.

Similarly, we can argue that the DFAs that accept the language

{𝜔 ∈ {0, 1} : n-th bit of 𝜔 is 1 from the RHS}

will have at least 2𝑛 states.


We now propose a special type of finite state machine N (given below) which can recognize the language:

{𝜔 ∈ {0, 1} : 3-th bit of 𝜔 from the RHS is 1}

using only four states instead of 8 states. Of-course, we have to define what is the sense of accepting a
string by this special machine.

0,1

1 0, 1 0, 1
𝑞0 𝑞1 𝑞2 𝑞3

Notice that this machine differs from DFAs exactly in state transitions. When the machine at 𝑞 0 and
input symbol is 1, the machine has two options for its next state, either 𝑞 0 or 𝑞 1 . Note that in case of
DFA, given any string, whether it is valid or invalid, the DFA always follows a unique transition path.
But for this special machine, transition path may not be unique. For this reason, this type of machine is
called nondeterministic finite automaton (NFA). For example, suppose the input string is 00100. Then the
machine could follow any of the two different paths:

1. (𝑞 0 , 00100) ⊢N (𝑞 0 , 0100) ⊢N (𝑞 0 , 100) ⊢N (𝑞 1 , 00) ⊢N (𝑞 2 , 0) ⊢N (𝑞 3 , 𝜖) (acceptance path).

2. (𝑞 0 , 00100) ⊢N (𝑞 0 , 0100) ⊢N (𝑞 0 , 100) ⊢N (𝑞 0 , 00) ⊢N (𝑞 0 , 0) ⊢N (𝑞 0 , 𝜖) (rejection path).

So, for the same input string the machine N sometimes reaches a final state (𝑞 3 ) and sometimes a non-final
state (𝑞 0 ). Although 00100 is a valid string, sometimes N accepts it and sometimes rejects it. But, for a
valid string, the machine always has at least on path from the initial state to final state (called acceptance
path). We say a machine of above kind accepts a strings if and only if there exists at least one acceptance
path. Therefore, the above machine accepts the string 00100. On the other hand, for any invalid input
string, the machine never reaches the final state 𝑞 3 after scanning all its input symbols. You might wonder,
what will be the situation if the machine operates on an invalid string, say, 1011? In one case, the machine
always stays at state 𝑞 0 after successfully scanning all the symbols. In this case, we say the machine halts.
In other case, the machine reaches out to 𝑞 3 after scanning 101 and hangs out at there, because transition
is not defined when the machine is at state 𝑞 3 . So, in this case, machine does not halt. Hence, in either
situation, the machine does not accept 1011 as required.
Before going to formal definition of NFA, let us look at another example. Consider a machine for the
language 𝐿 = {𝑎𝑏𝑎, 𝑎𝑏}∗ . It can be shown that the following DFA accepts 𝐿.

12
𝑎
𝑞0 𝑎 𝑞1 𝑏 𝑞2 𝑎 𝑞3

𝑏
𝑏
𝑎
𝑏

𝑞4

𝑎, 𝑏

The above DFA looks a bit complicated. But, its nondeterministic counterpart (see Figure 11) has simple
description.

𝑏
𝑞0 𝑞1
𝑎

𝑎 𝑏
𝑞2

Figure 11: NFA accepting 𝐿 = {𝑎𝑏𝑎, 𝑎𝑏}∗

Although we have not introduced the formal definition of NFA, one can easily see that a DFA can be
treated as an NFA. Let us now pose a question. How much more powerful are NFAs than their deterministic
counterparts? Do some NFAs recognize some languages that are not recognized by any DFAs? The answer
is negative. We see the answer concretely as we progress.

Definition 2.1 (NFA). A nondeterministic finite automaton (NFA) is a quintuple N = (𝑄, Σ, 𝛿, 𝑠, 𝐹), where

1. 𝑄 is the finite set of states,

2. Σ is the finite alphabet,

3. 𝑠 ∈ 𝑄 is the initial state,

4. 𝐹 ⊆ 𝑄 is the set of final states, and

5. 𝛿 is the transition function5 from 𝑄 × (Σ ∪ {𝜖 }) to P (𝑄), where P (𝑄) denotes the power set of 𝑄.

Remark 2.2. Unlike DFAs, for a given input symbol, NFAs can have more than one next state and that’s
why the name “nondeterminism” is attached to this kind of machine. Essentially, the next states are
represented by a subset of 𝑄. It may also happen that machine does not move at all, which is captured by
∅. For example, see (below) the transition table of the NFA in Figure 11.

a b
𝑞0 {𝑞 1 } ∅
𝑞1 ∅ {𝑞 0 , 𝑞 2 }
𝑞2 {𝑞 0 } ∅
5 In Lewis and Papadimitriou [LP06], a transition relation, i.e., a subset of 𝑄 × (Σ ∪ {𝜖 }) × 𝑄 is considered.

13
Further, under a special type of transition, called 𝜖-transition, a machine can move to some state(s)
without consuming any symbol. When a machine has at least one 𝜖-transition, the machine is referred to
as 𝜖-NFA. The transition table of the 𝜖-NFA in Figure 12 is given below. What language does this 𝜖-NFA
accept?
0 1 𝜖
𝑞0 {𝑞 0 } {𝑞 0 , 𝑞 1 } ∅
𝑞1 {𝑞 2 } ∅ {𝑞 2 }
𝑞2 ∅ {𝑞 3 } ∅
𝑞3 {𝑞 3 } {𝑞 3 } ∅

0,1 0,1

1 0, 𝜖 1
𝑞0 𝑞1 𝑞2 𝑞3

Figure 12: NFA with an 𝜖-transition.

𝑞0

𝑞0

1
1 1

𝜖
𝑞0 𝑞1 𝑞2

0 0

𝑞0 𝑞2

1 1 1
1

𝜖
𝑞0 𝑞1 𝑞2 𝑞3

1
1 1 1 1

𝜖
𝑞0 𝑞1 𝑞2 𝑞3 𝑞3

0 0 0 0

𝑞0 𝑞2 𝑞3 𝑞3

Figure 13: Different computation paths of the NFA defined in Figure 12 on input 010110.

As mentioned earlier, an NFA may follow different paths on the same input, in contrast to DFAs.
When the machine reaches out to a final state after scanning all its input symbols, we say that the NFA
accepts the input string. It may be possible that machine enters a final state following different paths of
computation. For example, Figure 13 shows the computation of the NFA (defined in Figure 12) on the
input string 010110. For each leaf node, there is exactly one computation path from the initial state. Some
of the paths are incomplete as the next state is not defined. After scanning the input 010110, the NFA
reaches out to the final state 𝑞 3 following two different paths (shown by bold arrow).

14
Exercise 2.3. Let N be an NFA and 𝜔 be an input string of length 𝑛. How many computation paths by
the NFA N on 𝜔 are possible?

Definition 2.2 (Configuration of NFA). A configuration of N is an element of 𝑄 × Σ∗ . The relation ⊢N


between two configurations (yields in one step) is defined as follows: (𝑞, 𝜔) ⊢N (𝑞 ′, 𝜔 ′) if and only if
there is a 𝑢 ∈ Σ ∪ {𝜖 } such that 𝑤 = 𝑢𝑤 ′ and 𝑞 ′ ∈ 𝛿(𝑞, 𝑢). Note that in this case, ⊢N is not a function! As
before, ⊢∗N is the reflexive transitive closure of ⊢N . A string 𝜔 ∈ Σ∗ is accepted by N if and only if there is
a state 𝑞 ∈ 𝐹 such that (𝑠, 𝜔) ⊢∗N (𝑞, 𝜖). Similar to DFA, the language accepted by an NAF N, denoted by
L (N), is defined as L (N) = {𝜔 ∈ Σ∗ : (𝑠, 𝜔) ⊢∗N (𝑞, 𝜖) for some 𝑞 ∈ 𝐹}.

Remark 2.4. We have seen that a DFA always halts after reading all the symbols of the input. Does an
NFA always halt on an input? Answer is no. We already have seen some cases where NFA does not halt,
for example, the NFA in Figure 11. Indeed, the head of the machine will stuck at state 𝑞 1 indefinitely if the
current scan symbol is 𝑎. Can you modify a given NFA to make it halt on any input? Asnwer: Exercise.
Exercise 2.5. Solve the following problems, where Σ = {0, 1}.

1. Let 𝐿 = {𝜖, 0}.

(a) Can you design a DFA M with a single final state such that L (M) = 𝐿?
(b) Can you design a DFA M using only two states such that L (M) = 𝐿?
(c) Design an NFA N such that L (N) = 𝐿.
(d) Can you design an NFA N with a single final state such that L (N) = 𝐿?

2. Let 𝐿 = {0, 00}.

(a) Can you design a DFA M with a single final state such that L (M) = 𝐿?
(b) Can you design a DFA M using two or three states such that L (M) = 𝐿?
(c) Can you design an NFA N with a single final state such that L (N) = 𝐿?
(d) Can you design an NFA N using two states such that L (N) = 𝐿?

Definition 2.3 (Equivalent). Two finite automata N1 and N2 (deterministic or nondeterministic) are
equivalent if and only if L (N1 ) = L (N2 ).

Theorem 2.1. For each nondeterministic finite automaton, there is an equivalent deterministic finite
automaton.

Proof Idea. The key idea is to view a nondeterministic finite automaton at any moment, not a single
state but a set of states: namely, all the states that can be reached from the initial state by means of
the input consumed so far. We discuss the construction idea in two cases, NFA without 𝜖-transition and
𝜖-NFA.

1. Without 𝜖-transition. We illustrate this case using the NFA N in Figure 11 whose formal description
is given as follows. N = (𝑄, Σ, 𝛿, 𝑠, 𝐹), where 𝑄 = {𝑞 1 , 𝑞 1 , 𝑞 2 }, Σ = {𝑎, 𝑏}, 𝑠 = 𝑞 0 , 𝐹 = {𝑞 0 } and the
transition function 𝛿 : 𝑄 × Σ −→ P (𝑄) is given by 𝛿(𝑞 0 , 𝑎) = {𝑞 1 }, 𝛿(𝑞 0 , 𝑏) = ∅, 𝛿(𝑞 1 , 𝑎) = ∅,
𝛿(𝑞 1 , 𝑏) = {𝑞 1 , 𝑞 2 }, 𝛿(𝑞 2 , 𝑎) = {𝑞 0 }, and 𝛿(𝑞 2 , 𝑎) = ∅. Recall that a string 𝜔 is accepted by an NFA
if there exists at least one acceptance path from the initial state to an final state. Let us look at
different paths for the string 𝑎𝑏𝑎. Basically, there are two paths.

1. (𝑞 0 , 𝑎𝑏𝑎) ⊢N (𝑞 1 , 𝑏𝑎) ⊢N (𝑞 0 , 𝑎) ⊢N (𝑞 1 , 𝜖) (rejection path).

15
2. (𝑞 0 , 𝑎𝑏𝑎) ⊢N (𝑞 1 , 𝑏𝑎) ⊢N (𝑞 2 , 𝑎) ⊢N (𝑞 0 , 𝜖) (acceptance path).

We have to make sure the corresponding DFA M will have a unique path and that if a string is
valid, then one of the final states must be reachable from the initial state after consuming all input
symbols. Note that if the NFA starts from any state, then machine may have many reachable states
after scanning the input string. For example, the above NFA reaches 𝑞 0 and 𝑞 1 after scanning
the strings “aba”. If we consider the state of the underlying DFA M to be a subset of 𝑄, then
we see that the machine actually reaches the state {𝑞 0 , 𝑞 1 } from {𝑞 0 } (initial state). If the new
machine M has to accept the string “aba”, then {𝑞 0 , 𝑞 1 } must have to be one of the final states.
In other words, considering the general case, a state 𝐴 ⊆ 𝑄 is a final state of M if and only if
𝐴 contains at least one final state of the given NFA N. Therefore, the target DFA is given by
M = (𝑄 ′ = P (𝑄), Σ, 𝛿 ′, 𝑠 ′ = {𝑞 0 }, 𝐹 ′), where 𝐹 ′ = {𝐴 ⊆ 𝑄 : 𝐴 ∩ 𝐹 ≠ ∅} and the transition function
𝛿 ′ : 𝑄 ′ × Σ −→ 𝑄 ′ is defined by 𝛿 ′ (𝑃, 𝜎) = ∪ 𝛿( 𝑝, 𝜎) for all 𝑃 ∈ 𝑄 ′ and 𝜎 ∈ Σ. The transition
𝑝 ∈𝑃
diagram and transition table of the DFA M are given below. Note that the states {𝑞 2 }, {𝑞 1 , 𝑞 2 } and
{𝑞 0 , 𝑞 1 , 𝑞 2 } are redundant.

a b
∅ ∅ ∅
{𝑞 0 } {𝑞 1 } ∅
{𝑞 1 } ∅ {𝑞 0 , 𝑞 2 }
{𝑞 2 } {𝑞 0 } ∅
{𝑞 0 , 𝑞 1 } {𝑞 1 } {𝑞 0 , 𝑞 2 }
{𝑞 0 , 𝑞 2 } {𝑞 0 , 𝑞 1 } ∅
{𝑞 1 , 𝑞 2 } {𝑞 0 } {𝑞 0 , 𝑞 2 }
{𝑞 0 , 𝑞 1 , 𝑞 2 } {𝑞 0 , 𝑞 1 } {𝑞 0 , 𝑞 2 }

𝑎 𝑏
𝑎
{𝑞 0 } {𝑞 1 } {𝑞 2 } {𝑞 0 , 𝑞 1 } {𝑞 0 , 𝑞 2 }
𝑎

𝑏 𝑎
𝑎 𝑎 𝑏
𝑏 𝑏 𝑏

𝑎, 𝑏 ∅ {𝑞 0 , 𝑞 1 , 𝑞 2 } {𝑞 1 , 𝑞 2 }

2. With 𝜖-transition. Here, additionally, we have to handle 𝜖-transition. Recall that without consuming
any symbol, the machine may move from one state to another (or multiple states). It may also
happen that at the beginning the machine could apply 𝜖-transition. To tackle 𝜖-transition, we define

16
an 𝜖-closure of a state 𝑞, denoted by 𝐸 (𝑞), as follows. 𝐸 (𝑞) = {𝑝 ∈ 𝑄 : (𝑞, 𝜖) ⊢∗N ( 𝑝, 𝜖)}. Note
that for any 𝑞 ∈ 𝑄, 𝑞 ∈ 𝐸 (𝑞). The initial state for the corresponding DFA M will be 𝐸 (𝑞 0 ). The
set of states of M is 𝑄 ′ = P (𝑄) as above. For the NFA in Figure 12, it can be checked that
𝐸 (𝑞 0 ) = {𝑞 0 }, 𝐸 (𝑞 1 ) = {𝑞 1 , 𝑞 2 }, 𝐸 (𝑞 2 ) = {𝑞 2 } and 𝐸 (𝑞 3 ) = {𝑞 3 }. The machine may apply 𝜖-transition
after scanning any symbol. For example, the NFA in Figure 12, 𝛿(𝑞 0 , 1) = {𝑞 0 , 𝑞 1 , 𝑞 2 }, where 𝑞 2 is
reached out by applying 𝜖-transition. Therefore, the state transition of the corresponding DFA has
to take care the 𝜖-transition. In fact, the transition rule for M is defined as follows. For each 𝑃 ⊆ 𝑄
and each symbol 𝜎 ∈ Σ, we define

𝛿 ′ (𝑃, 𝜎) = ∪ {𝑟 : 𝑟 ∈ 𝐸 (𝑞) and 𝑞 ∈ 𝛿( 𝑝, 𝜎)}.


𝑝 ∈𝑃

The transition table of the DFA M corresponding to the NFA in Figure 12 is described below.

0 1
∅ ∅ ∅
{𝑞 0 } {𝑞 0 } {𝑞 0 , 𝑞 1 , 𝑞 2 }
{𝑞 1 } {𝑞 2 } ∅
{𝑞 2 } ∅ {𝑞 3 }
{𝑞 3 } {𝑞 3 } {𝑞 3 }
{𝑞 0 , 𝑞 1 } {𝑞 0 , 𝑞 2 } {𝑞 0 , 𝑞 1 , 𝑞 2 }
{𝑞 0 , 𝑞 2 } {𝑞 0 } {𝑞 0 , 𝑞 1 , 𝑞 2 , 𝑞 3 }
{𝑞 0 , 𝑞 3 } {𝑞 0 , 𝑞 3 } {𝑞 0 , 𝑞 1 , 𝑞 2 , 𝑞 3 }
{𝑞 1 , 𝑞 2 } {𝑞 2 } {𝑞 3 }
{𝑞 1 , 𝑞 3 } {𝑞 2 , 𝑞 3 } {𝑞 3 }
{𝑞 2 , 𝑞 3 } {𝑞 3 } {𝑞 3 }
{𝑞 0 , 𝑞 1 , 𝑞 2 } {𝑞 0 , 𝑞 2 } {𝑞 0 , 𝑞 1 , 𝑞 2 , 𝑞 3 }
{𝑞 0 , 𝑞 1 , 𝑞 3 } {𝑞 0 , 𝑞 2 , 𝑞 3 } {𝑞 0 , 𝑞 1 , 𝑞 2 , 𝑞 3 }
{𝑞 0 , 𝑞 2 , 𝑞 3 } {𝑞 0 , 𝑞 3 } {𝑞 0 , 𝑞 1 , 𝑞 2 , 𝑞 3 }
{𝑞 1 , 𝑞 2 , 𝑞 3 } {𝑞 2 , 𝑞 3 } {𝑞 3 }
{𝑞 0 , 𝑞 1 , 𝑞 2 , 𝑞 3 } {𝑞 0 , 𝑞 2 , 𝑞 3 } {𝑞 0 , 𝑞 1 , 𝑞 2 , 𝑞 3 }

Proof. We are now ready to give a formal proof of Theorem 2.1. Let N = (𝑄, Σ, 𝛿, 𝑠, 𝐹) be an NFA. We will
construct a DFA M = (𝑄 ′, Σ, 𝛿 ′, 𝑠 ′, 𝐹 ′) that is equivalent to N.
Construction. For any state 𝑞 ∈ 𝑄, let 𝐸 (𝑞) be the set of all states of N that are reachable from
state 𝑞 without reading any input symbol. That is, 𝐸 (𝑞) = {𝑝 ∈ 𝑄 : (𝑞, 𝜖) ⊢∗N ( 𝑝, 𝜖)}. The above set is
nonempty as 𝑞 ∈ 𝐸 (𝑞). The corresponding DFA M = (𝑄 ′, Σ, 𝛿 ′, 𝑠 ′ , 𝐹 ′) is given by: 𝑄 ′ = 2𝑄 , 𝑠 ′ = 𝐸 (𝑠),
𝐹 ′ = {𝐴 ⊆ 𝑄 : 𝐴 ∩ 𝐹 ≠ ∅}, and for each 𝑃 ⊆ 𝑄 and each symbol 𝜎 ∈ Σ, define

𝛿 ′ (𝑃, 𝜎) = ∪ {𝑟 : 𝑟 ∈ 𝐸 (𝑞) and 𝑞 ∈ 𝛿( 𝑝, 𝜎)}.


𝑝 ∈𝑃

Equivalent of N and M will immediately follow from the following claim.


Claim. For any string 𝜔 ∈ Σ∗ and any states 𝑝, 𝑞 ∈ 𝑄,

(𝑞, 𝜔) ⊢∗N ( 𝑝, 𝜖) if and only if (𝐸 (𝑞), 𝜔) ⊢∗M (𝑃, 𝜖)

for some set 𝑃 ⊆ 𝑄 containing 𝑝.


Proof of the claim. We prove the claim by induction on |𝜔|.

17
Base Step. For |𝜔| = 0, that is, for 𝜔 = 𝜖 we must show that (𝑞, 𝜖) ⊢∗N ( 𝑝, 𝜖) if and only if (𝐸 (𝑞), 𝜖) ⊢∗M
(𝑃, 𝜖) for some set 𝑃 containing 𝑝. The first statement is equivalent to saying that 𝑝 ∈ 𝐸 (𝑞). Since M is
deterministic, the second statement is equivalent to saying that 𝑃 = 𝐸 (𝑞) and 𝑃 contains 𝑝. This completes
the proof of the base step.

Induction Hypothesis. Suppose that the claim is true for all strings 𝜔 of length 𝑘 or less for some
𝑘 ≥ 0.

Induction Step. We prove the claim for any string 𝜔 of length 𝑘 + 1. Let 𝜔 = 𝑣𝑎, where 𝑎 ∈ Σ and
𝑣 ∈ Σ∗ .
(=⇒) Suppose that (𝑞, 𝜔) ⊢∗N ( 𝑝, 𝜖). Then there are states 𝑟 1 and 𝑟 2 such that

(𝑞, 𝜔) ⊢∗N (𝑟 1 , 𝑎) ⊢N (𝑟 2 , 𝜖) ⊢∗N ( 𝑝, 𝜖).

That is, N reaches state 𝑟 1 from state 𝑞 by some number of moves during which input 𝑣 is read, followed
by one move during which input 𝑎 is read, followed by some number of moves during which no input is
read. Now (𝑞, 𝑣𝑎) ⊢∗N (𝑟 1 , 𝑎) is equivalent to (𝑞, 𝑣) ⊢∗N (𝑟 1 , 𝜖). Since |𝑣| = 𝑘, by the induction hypothesis
(𝐸 (𝑞), 𝑣) ⊢∗M (𝑅1 , 𝜖) for some set 𝑅1 containing 𝑟 1 . Since (𝑟 1 , 𝑎) ⊢N (𝑟 2 , 𝜖), we have 𝛿(𝑟 1 , 𝑎) = 𝑟 2 , and
hence by the construction of M, 𝐸 (𝑟 2 ) ⊆ 𝛿 ′ (𝑅1 , 𝑎). But since (𝑟 2 , 𝜖) ⊢∗N ( 𝑝, 𝜖), it follows that 𝑝 ∈ 𝐸 (𝑟 2 ),
and therefore 𝑝 ∈ 𝛿 ′ (𝑅1 , 𝑎). Therefore, (𝑅1 , 𝑎) ⊢M (𝑃, 𝜖) for some 𝑃 (= 𝛿 ′ (𝑅1 , 𝑎)) containing 𝑝, and thus
(𝐸 (𝑞), 𝑣𝑎) ⊢∗M (𝑅1 , 𝑎) ⊢M (𝑃, 𝜖).
(⇐=) To prove the other direction, suppose that (𝐸 (𝑞), 𝑣𝑎) ⊢∗M (𝑅1 , 𝑎) ⊢M (𝑃, 𝜖) for some 𝑃 containing 𝑝 and
some 𝑅1 such that 𝛿 ′ (𝑅1 , 𝑎) = 𝑃. Since 𝑝 ∈ 𝑃 = 𝛿 ′ (𝑅1 , 𝑎), there is 𝑟 1 ∈ 𝑅1 and a transition 𝛿(𝑟 1 , 𝑎) = 𝑟 2 of
N such that 𝑝 ∈ 𝐸 (𝑟 2 ). Then (𝑟 2 , 𝜖) ⊢∗N ( 𝑝, 𝜖) by the definition of 𝐸 (𝑟 2 ). Also, by the induction hypothesis,
(𝑞, 𝑣) ⊢∗N (𝑟 1 , 𝜖) and therefore (𝑞, 𝑣𝑎) ⊢∗N (𝑟 1 , 𝑎) ⊢N (𝑟 2 , 𝜖) ⊢∗N ( 𝑝, 𝜖). 

Corollary 2.2. A language is regular if and only if some NFA accepts it.

Exercise 2.6. Convert the following NFAs to the corresponding DFAs using the construction involved in
the proof of Theorem 2.1.

𝑞1 0
𝜖
𝑞0 0,1
𝜖
1 0, 1 0, 1
𝑞2 1 𝑞0 𝑞1 𝑞2 𝑞3

0, 1 𝑎 𝑏 𝑐

1, 𝜖 0 𝜖 𝜖
𝑞0 𝑞1 𝑞2 𝑞0 𝑞1 𝑞2

Exercise 2.7. Given a finite state automaton M, construct a finite state machine N with only a single
final/accepting state such that L (N) = L (M). Hint: 𝜖-transition.

Theorem 2.3. Regular languages are closed under union, concatenation, intersection, Kleene star, reversal
and complement.

18
Proof. Let 𝐿 1 and 𝐿 2 be two regular languages. Let N1 = (𝑄 1 , Σ, 𝛿1 , 𝑠1 , 𝐹1 ) and N2 = (𝑄 2 , Σ, 𝛿2 , 𝑠2 , 𝐹2 )
be two NFAs6 such that L (N1 ) = 𝐿 1 and L (N2 ) = 𝐿 2 . Without loss of generality, we can assume that
𝑄 1 ∩ 𝑄 2 = ∅.

1. Union. We shall construct a nondeterministic finite automaton N such that L (N) = 𝐿 1 ∪ 𝐿 2 . The
NFA N is given by N = (𝑄, Σ, 𝛿, 𝑠, 𝐹), where 𝑠 is a new state (i.e., 𝑠 ∉ 𝑄 1 ∪ 𝑄 2 ), 𝑄 = 𝑄 1 ∪ 𝑄 2 ∪ {𝑠},
𝐹 = 𝐹1 ∪ 𝐹2 , and for any 𝑞 ∈ 𝑄 and 𝜎 ∈ Σ ∪ {𝜖 }, the transition function is defined by

 𝛿1 (𝑞, 𝜎) if 𝑞 ∈ 𝑄1




 𝛿 (𝑞, 𝜎)
 2 if 𝑞 ∈ 𝑄2
𝛿(𝑞, 𝜎) =

 {𝑠1 , 𝑠2 } if 𝑞 = 𝑠 and 𝜎 = 𝜖



∅ if 𝑞 = 𝑠 and 𝜎 ≠ 𝜖 .

A pictorial representation is given below.

𝑠1
N1 𝑠1 𝜖

N 𝑠

𝜖 𝑠2
N2 𝑠2

2. Concatenation. We construct a nondeterministic finite automaton N such that L (N) = 𝐿 1 𝐿 2 . In


fact, the NFA is defined as N = (𝑄, Σ, 𝛿, 𝑠, 𝐹), where 𝑄 = 𝑄 1 ∪ 𝑄 2 , 𝐹 = 𝐹2 , 𝑠 = 𝑠1 , and for any 𝑞 ∈ 𝑄
and 𝜎 ∈ Σ ∪ {𝜖 }, the transition function is given by:

 𝛿1 (𝑞, 𝜎) if 𝑞 ∈ 𝑄 1 and 𝑞 ∉ 𝐹1




 𝛿 (𝑞, 𝜎)
 1 if 𝑞 ∈ 𝐹1 and 𝜎 ≠ 𝜖
𝛿(𝑞, 𝜎) =

 𝛿1 (𝑞, 𝜎) ∪ {𝑠2 } if 𝑞 ∈ 𝐹1 and 𝜎 = 𝜖



 𝛿2 (𝑞, 𝜎) if 𝑞 ∈ 𝑄2 .

For easy illustration, see the diagram below.
6 The proof also works if 𝐿 1 and 𝐿 2 are represented by DFAs. For handling complement and intersection, the languages are
represented by DFAs.

19
N1 N2

𝜖
N
𝜖

3. Kleene star. We construct a nondeterministic finite automaton N such that L (N) = (𝐿 1 ) ∗ . Indeed,
N is defined as follows: 𝑄 = 𝑄 1 ∪ {𝑠}, 𝑠 is the new start state, 𝐹 = 𝐹1 ∪ {𝑠} and for any 𝑞 ∈ 𝑄 and
𝜎 ∈ Σ ∪ {𝜖 }, the transition function is given by:

 𝛿1 (𝑞, 𝜎) if 𝑞 ∈ 𝑄 1 and 𝑞 ∉ 𝐹1





 𝛿 (𝑞, 𝜎) if 𝑞 ∈ 𝐹1 and 𝜎 ≠ 𝜖
 1

𝛿(𝑞, 𝜎) = 𝛿1 (𝑞, 𝜎) ∪ {𝑠1 } if 𝑞 ∈ 𝐹1 and 𝜎 = 𝜖



 {𝑠1 } if 𝑞 = 𝑠 and 𝜎 = 𝜖



∅
 if 𝑞 = 𝑠 and 𝜎 ≠ 𝜖 .

For easy illustration, see the diagram below.

𝜖
𝜖
N1 𝑠1 N 𝑠 𝑠1
𝜖

4. Complement. Follows Theorem 1.1.

5. Intersection. Follows Theorem 1.1.

6. Reversal. For simplicity, let N1 = (𝑄 1 , Σ, 𝛿1 , 𝑠1 , 𝐹1 ) be a DFA such that L (N) = 𝐿 1 . We will construct
an NFA N such that L (N) = (𝐿 1 ) 𝑟 (recall that (𝐿 1 ) 𝑟 denotes the reversal language of 𝐿 1 ). In fact,
the NFA is designed as follows. N = (𝑄, Σ, 𝛿, 𝑠, 𝐹), where 𝑠 is a new state, 𝑄 = 𝑄 1 ∪ {𝑠}, 𝐹 = {𝑠1 } and
for any 𝑞 ∈ 𝑄 and 𝜎 ∈ Σ ∪ {𝜖 }, the transition function is given by
(
𝛿−1 (𝑞, 𝜎) if 𝑞 ∈ 𝑄 1
𝛿(𝑞, 𝜎) = 1
∅ if 𝑞 = 𝑠,

where 𝛿1−1 (𝑞, 𝜎) = {𝑝 ∈ 𝑄 1 : 𝛿1 ( 𝑝, 𝜎) = 𝑞}.


See a pictorial illustration below:

20
𝑎 𝑎 𝜖
N1 𝑠1 N 𝑠1 𝑠

𝑏 𝑏 𝜖

Exercise 2.8. For simplicity of reversal treatment, we describe the input language using a DFA. Now,
consider the language description using an NFA and illustrate the transition function of the target NFA.
Remark 2.9. The constructions involved in the proof of Theorem 2.3 can be used to design NFAs for
complicated regular languages. Here we illustrate some examples. Suppose we are interested in designing
NFAs for the following two languages:

• {𝜔 ∈ {0, 1}∗ : 𝜔 starts with 11 or ends with 11}

• {𝜔 ∈ {0, 1}∗ : 𝜔 contains 1111}.

Although we know how to design NFAs directly for the above languages, we use the different construction
ideas to design the underlying NFAs. The above languages can be written as 𝐿 1 ∪ 𝐿 2 and 𝐿 2 𝐿 1 respectively,
where

• 𝐿 1 = {𝜔 ∈ {0, 1}∗ : 𝜔 starts with 11} and

• 𝐿 2 = {𝜔 ∈ {0, 1}∗ : 𝜔 ends with 11}.

So, if we can design DFAs for 𝐿 1 and 𝐿 2 , then using the construction involved in above proof, we can
design NFAs for 𝐿 1 ∪ 𝐿 2 and 𝐿 2 𝐿 1 . Also, we show how to design an NFA for (𝐿 3 ) ∗ , where 𝐿 3 = {00, 11}.
The details are given in Figure 14.

Exercise 2.10. Design NFAs for the following languages.

1. {𝜔 ∈ {𝑎, 𝑏}∗ : 𝜔 has at least three a’s and at least two b’s}.

2. {𝜔 ∈ {0, 1}∗ : 𝜔 starts with 0 and ends with 11}.

3. {𝜔 ∈ {0, 1}∗ : 𝜔 contains either even number of 0’s or even number of 1’s}.

4. {𝜔 ∈ {𝑎, 𝑏}∗ : 𝜔 contains neither ab nor ba}.

5. {𝜔 ∈ {0, 1}∗ : 𝜔 begins with 1 and ends with 0}.

6. {𝜔 ∈ {0, 1}∗ : decimal value of 𝜔 is congruent to either 1 modulo 4 or 2 modulo 3}.

7. {𝜔 ∈ {0, 1}∗ : number of 1’s is either multiple of 5 or multiple of 2}.

8. {𝜔 ∈ {0, 1}∗ : sum of binary digits of 𝜔 is congruent to either 2 modulo 4 and 2 modulo 3}.

9. {𝑎 𝑛 𝜔𝑏 𝑛 : 𝜔 ∈ {𝑎, 𝑏}∗ and 1 ≤ 𝑛 ≤ 𝑘}, where 𝑘 ∈ N is a fixed integer.

10. {𝑎 𝑛 𝜔𝑏 𝑛 : 𝜔 ∈ {𝑎, 𝑏}∗ and 𝑛 ≥ 1}.

21
Regular
Nondeterministic Finite Automata Comments
Languages
0, 1
𝐿1 Simple NFA N1
𝑝0 1 𝑝1 1 𝑝2

𝑞0 1 𝑞1 1 𝑞2 Simple NFA N2 (This


can be thought of
𝐿2
reversal of N1 , i.e.,
0, 1 𝐿 2 = (𝐿 1 ) 𝑟 )

𝑞1 0 𝑞2
0
𝐿3 𝑞0 Simple NFA N3
1
𝑞3 1 𝑞4

0, 1

𝑝0 1 𝑝1 1 𝑝2
𝜖
𝐿1 ∪ 𝐿2 𝑟0 𝜖 NFA for Union
𝑞0 1 𝑞1 1 𝑞2

0, 1
0, 1

𝑞0 1 𝑞1 1 𝑞2 𝜖 𝑝0 1 𝑝1 1 𝑝2 NFA for
𝐿2 𝐿1
Concatenation

0, 1

𝑞1 0 𝑞2
0
𝜖 𝑞0
(𝐿 3 ) ∗ 𝑟0
1 NFA for Kleene star
𝑞3 1 𝑞4

Figure 14: Illustration of union, concatenation, Kleene star and reversal.

11. {𝑎 𝑛 𝜔𝑎 𝑛 : 𝜔 ∈ {𝑎, 𝑏}∗ and 𝑛 ≥ 1}.

12. {𝑎 𝑚 𝑏 𝑛 : 𝑚, 𝑛 ≥ 0 and 5|(𝑚 + 𝑛)}.

13. {𝑢𝑣𝑢𝑟 : 𝑢 ∈ {𝑎, 𝑏}+ and 𝑣 ∈ {𝑎, 𝑏}∗ } .

22
14. {𝑢𝑣𝑢𝑟 : 𝑢, 𝑣 ∈ {𝑎, 𝑏}+ } .

Remark 2.11. Consider the following three languages:

• 𝐿 1 = {𝑢𝑣𝑢𝑟 : 𝑢 ∈ {𝑎, 𝑏}+ and 𝑣 ∈ {𝑎, 𝑏}∗ }.

• 𝐿 2 = {𝑢𝑣𝑢𝑟 : 𝑢, 𝑣 ∈ {𝑎, 𝑏}+ }.

• 𝐿 3 = {𝑢𝑢𝑟 : 𝑢 ∈ {𝑎, 𝑏}+ }.

Here 𝐿 2 ⊂ 𝐿 1 and 𝐿 3 ⊂ 𝐿 1 , but 𝐿 3 ⊄ 𝐿 2 . Now, the question is 𝐿 1 \ 𝐿 2 = 𝐿 3 ? No. In fact, 𝐿 1 \ 𝐿 2 = {𝑎𝑎, 𝑏𝑏}.
We can write 𝐿 1 as

𝐿 1 = {𝑤 ∈ {𝑎, 𝑏}∗ : |𝜔| ≥ 2, and 𝜔 starts and ends with same symbol}.

On the other hand, we can write 𝐿 2 as

𝐿 2 = {𝑤 ∈ {𝑎, 𝑏}∗ : |𝜔| ≥ 3, and 𝜔 starts and ends with same symbol}.

If 𝐿 4 = {𝑢𝑣𝑢𝑟 : 𝑢, 𝑣 ∈ {𝑎, 𝑏}∗ }, then 𝐿 4 = {𝑎, 𝑏}∗ .


Exercise 2.12. Let 𝐿 be a regular language over Σ = {0, 1}. Let 𝐿 ′ = {𝜔 ∈ Σ∗ : ∃𝑥 ∈ 𝐿 s.t |𝜔| =
|𝑥| and #1’s in 𝜔 ⊕ 𝑥 is 1}, where 𝜔 ⊕ 𝑥 denotes the bitwise xor operation. Show that 𝐿 ′ is regular.

3 Regular Expressions
All of you have experience in handling various arithmetic expressions, like

(5 + 3) ∗ 4,
5 + 3 ∗ 4,
5%3 ∗ 4

which are defined over the alphabet {0, 1, . . . , 9, (, ), +, −, /, ∗, %}. Each expression gives a real value once
we evaluate it. Similarly, we can use regular expressions to describe regular languages. For example, the
regular expression (0 + 1)0∗ describes a language consisting of all strings over {0, 1} starting with a 0 or
1 followed by any number of 0’s. Here the symbols 0 and 1 stand for the sets {0} and {1} respectively,
and the connector, +, stands7 for the union of the underlying sets. So, (0 + 1) represents {0} ∪ {1}, that
is, {0, 1}. The part, 0∗ , essentially means {0}∗ , which is the language consisting of all strings having any
number of 0’s.
Regular expressions play an important role in several areas of computer science. For example, they are
invaluable in applications that involve searching for patterns in text. Many of you have likely used regular
expressions to solve problems in programming languages. Tools like awk and grep in Unix, programming
languages such as Perl, and some text editors all utilize regular expressions for pattern description.
Consider another example of regular expressions: (0 + 1) ∗ . It means applying Kleene star on {0} ∪ {1} =
{0, 1}. So, (0 + 1) ∗ describes the language consisting of all strings over {0, 1}. If Σ = {0, 1}, then Σ is the
shorthand for the regular expression (0 + 1). The regular expression (0Σ∗ ) + (Σ∗ 1) represents the language
consisting of all strings over {0, 1} either start with 0 or end with 1. Like arithmetic expressions, regular
expressions have some legitimate forms which are formally defined next.
7 Some authors, e.g., [Sip14, LP06] use ∪ in place of +.

23
Definition 3.1 (Regular expression). The regular expressions over an alphabet Σ are defined inductively
as follows:

1. ∅ is a regular expression.

2. 𝜖 is a regular expression.

3. For each 𝜎 ∈ Σ, 𝜎 is a regular expression.

4. If R1 and R2 are regular expressions, then so is (R1 + R2 ).

5. If R1 and R2 are regular expressions, then so is (R1 R2 ).

6. If R is a regular expressions, then so is R∗ .

7. Nothing is a regular expression unless it follows from (1) through (6).

Every regular expression R over8 Σ also describes a language, denoted by L (R), which is defined
inductively as follows.

1. If R = ∅, then L (R) = ∅ (empty language).

2. If R = 𝜖, then L (R) = {𝜖 } (language consisting of only one string 𝜖).

3. If R = 𝜎, then L (R) = {𝜎} (language consisting of only one symbol 𝜎).

4. If R = R1 + R2 , where R1 and R2 are regular expressions, then L (R) = L (R1 ) ∪ L (R2 ) (through union).

5. If R = R1 R2 , where R1 and R2 are regular expressions, then L (R) = L (R1 )L (R2 ) (through concate-
nation).

6. If R is a regular expression, then L (R∗) = (L (R)) ∗ (through Kleene star).

Similar to the precedence of operators involved in arithmetic expressions, the connectors of regular expres-
sions have some priorities and they are given from highest precedence to lowest precedence as follows:

1. (), 2. ∗, 3. concatenation, 4. +.

For any three expressions R1 , R2 and R3 , it is easy to check that (R1 + R2 ) + R3 and R1 + (R2 + R3 ) describe
the same language. This essentially says that parentheses do not play any role here. So, we write,
(R1 + R2 ) + R3 = R1 + R2 + R3 = R1 + (R2 + R3 ).
Remark 3.1. Let 𝐿 be a language over Σ and R be a regular expression. We say that 𝐿 is described by the
regular expression R if 𝐿 = L (R). Let 𝐿 be any subset Σ, then it can be described by a regular expression.
In fact, let 𝐿 = {𝜎1 , . . . , 𝜎𝑚 } ⊆ Σ, then R = 𝜎1 + · · · + 𝜎𝑚 describes 𝐿 as L (R) = ∪ L (𝜎𝑖 ) = 𝐿.
𝑖 ∈ [𝑚]

Let Σ = {0, 1}. Some regular expressions and their corresponding languages are given below.

1. R = 0 describes the language {0}.

2. R = 1 describes the language {1}.


8 Although we say regular expressions are defined over Σ, actually they are defined over Σ ∪ {(, ), +, ∅, 𝜖, ∗}. To avoid any

kind of notational conflict, we consider underline version of notations, like 𝜖, ∅, 𝜎 and Σ.

24
3. R = 0+1 describes the language {0}∪{1} = {0, 1}. Sometimes, we would refer to the regular expression
(0 + 1) as Σ.

4. R = 1∗ describes the language {𝜖, 1, 11, 111, . . .}.

5. R = 11∗ describes the language {1}+ = {1, 11, 111 . . .}.

6. R = 1 + 𝜖 describes the language {𝜖, 1}.

7. R = 10 + 01 + 11 describes the language {10, 01, 11}.

8. R = (0 + 1) ∗ (0 + 1) describes the language {0, 1}+ .

9. R = ∅ + (0 + 1) + 𝜖 describes the language ∅ ∪ {0, 1} ∪ {𝜖 } = {𝜖, 0, 1}.

10. R = 1∗ 01∗ describes the language {𝜔 ∈ Σ∗ : 𝜔 contains a single 0}.

11. R = (0 + 1) ∗ 0(0 + 1) ∗ describes the language {𝜔 ∈ Σ∗ : 𝜔 contains at least one 0}.

12. R = (0 + 1) ∗ 010(0 + 1) ∗ describes the language {𝜔 ∈ Σ∗ : 𝜔 contains 010 as a substring}.

13. R = 1(0 + 1) ∗ 1 describes the language {𝜔 ∈ Σ∗ : 𝜔 starts and ends with 1, and |𝜔| ≥ 2}.

14. R = 1Σ∗ 1 + 0Σ∗ 0 + 1 + 0 describes the language {𝜔 ∈ Σ∗ : 𝜔 starts and ends with the same symbol}.

15. R = (0 + 𝜖)1∗ describes the language {0, 01, 011, . . .} ∪ {𝜖, 1, 11, . . .}.

16. R = (1 + 𝜖) ∗ , then L (R) = L (1∗ ) = {1}∗ .

17. R = (0 + 𝜖)(1 + 𝜖), then L (R) = {01, 1, 0, 𝜖 }.

Exercise 3.2. Describe languages for the following regular expressions.

1. R = Σ∗ 1Σ∗ 1Σ∗ .

2. R = (ΣΣ) ∗ .

3. R = 1∗ + (1∗ 01∗ 01∗ 0) ∗

4. R = (1(01 + 0) ∗ + (01 + 0) ∗ )1∗ .

We are now ready to see a connection between regular expressions and regular languages over an
alphabet Σ. Essentially, we show that a language is regular if and only if it is described by a regular
expression.

3.1 Regular Expressions to Finite State Automata

In this section, we show that a regular expression describes a regular language.

Theorem 3.1. If R is a regular expression, then L (R) is a regular language.

Proof. It suffices to show that L (R) is accepted by an NFA. Note that any regular expression follows one
of six forms in Definition 3.1. We show that the language corresponding to each form is accepted by an
NFA.

25
1. R = ∅. The language L (R) = ∅ is accepted by the NFA N = ({𝑞 0 }, Σ, 𝛿, 𝑞 0 , ∅), where 𝛿(𝑞 0 , 𝜎) = ∅ for
symbol 𝜎 ∈ Σ. A pictorial representation of N is given below.

2. R = 𝜖. The language L (R) = {𝜖 } is accepted by the NFA N = ({𝑞 0 }, Σ, 𝛿, 𝑞 0 , {𝑞 0 }), where 𝛿(𝑞 0 , 𝜎) = ∅
for any symbol 𝜎 ∈ Σ. A pictorial illustration of N is given below.

3. R = 𝜎. Formally, an NFA for L (R) = {𝜎} is defined as follows. N = ({𝑞 0 , 𝑞 1 }, Σ, 𝛿, 𝑞 0 , {𝑞 1 }), where
𝛿(𝑞 0 , 𝜎) = 𝑞 1 and 𝛿(𝑞, 𝑎) = ∅ for any 𝑞 ≠ 𝑞 1 or 𝑎 ≠ 𝜎 and a diagram is shown below.

4. R = R1 + R2 . The NFA used in proof of Theorem 2.3 for union accepts L (R) = L (R1 ) ∪ L (R2 ).

5. R = R1 R2 . The NFA used in proof of Theorem 2.3 for concatenation accepts L (R) = L (R1 )L (R2 ).

6. R = R∗1 . The NFA used in proof of Theorem 2.3 for Kleene star accepts L (R) = (L (R1)) ∗ .

Remark 3.3. You might have noticed that the above proof uses (strong) induction. In fact, the first three
cases9 are base steps, the remaining cases are handled using induction step.

Note that Theorem 3.1 essentially illustrates how to construct a finite state automaton from a given
regular expression. Some examples are given in Figure 15.
Exercise 3.4. Let Σ = {0, 1} be the alphabet. Then, construct finite state automata for the following
regular expressions.

1. R = 0.

2. R = 1.

3. R = 0 + 1.

4. R = 1∗ .

5. R = 11∗ .

6. R = 1 + 𝜖

7. R = 10 + 01 + 11.

8. R = (0 + 1) ∗ (0 + 1).
9 They are already handled using DFAs in Section 1.

26
Regular
Sl No. Finite State Automata Comments
Expressions
𝑎
1 𝑎 NFA

2 𝑏 𝑏 NFA

3 𝑎𝑏 𝑎 𝜖 𝑏 Using (1) and (2)

4 𝑎𝑏𝑎 𝑎 𝜖 𝑏 𝜖 𝑎 Using (1) and (3)


𝑎
𝜖
5 𝑎+𝑏 Using (1) and (2)
𝑏
𝜖
𝜖
𝑎
𝜖
𝜖
6 (𝑎 + 𝑏) ∗ 𝑏 Using (5)
𝜖
𝜖
𝑎 𝜖 𝑏
𝜖
7 𝑎𝑏 + 𝑎 Using (1) and (3)
𝑎
𝜖

𝜖
𝑎 𝜖 𝑏
𝜖
8 (𝑎𝑏 + 𝑎) ∗ 𝜖 Using (7)
𝑎
𝜖
𝜖

𝜖
𝑎 𝜖 𝑏
𝜖
𝜖
9 (𝑎𝑏 + 𝑎) ∗ 𝑎𝑏 𝜖
𝑎
𝜖 Using (3) and (8)

𝜖 𝜖
𝜖

𝑎 𝜖 𝑏

Figure 15: Construction of finite state automata from regular expressions.

9. R = ∅ + (0 + 1) + 𝜖.

10. R = 1∗ 01∗ .

27
11. R = (0 + 1) ∗ 0(0 + 1) ∗ .

12. R = (0 + 1) ∗ 010(0 + 1) ∗ .

13. R = 1(0 + 1) ∗ 1.

14. R = 1Σ∗ 1 + 0Σ∗ 0 + 1 + 0.

15. R = Σ∗ 1Σ∗ 1Σ∗

16. R = (ΣΣ) ∗ .

17. R = 1∗ + (1∗ 01∗ 01∗ 0) ∗ .

18. R = (1(01 + 0) ∗ + (01 + 0) ∗ )1∗ .

19. R = (0 + 𝜖)1∗ .

20. R = (1 + 𝜖) ∗ .

21. R = (0 + 𝜖)(1 + 𝜖)

3.2 Equivalent Regular Expressions

Many regular expressions can describe the same language. In this case, we say that they are same or
equivalent. Let R, R1 , R2 and R3 be any four regular expressions over Σ. Then we have the following
identities10 .

1. (Commutative.) R1 + R2 = R2 + R1 .

2. (Associative.) (R1 + R2 ) + R3 = R1 + (R2 + R3 ).

3. (Associative.) (R1 R2 )R3 = R1 (R2 R3 ).

4. ∅ + R = R + ∅ = R.

5. 𝜖R = R𝜖 = R.

6. ∅R = R∅ = ∅

7. (Left distributive.) R1 (R2 + R3 ) = R1 R2 + R1 R3 .

8. (Right distributive.) (R2 + R3 )R1 = R2 R1 + R3 R1 .

9. (Idempotent.) R + R = R.

10. (R∗) ∗ = R∗ .

11. 𝜖 + (R∗ )R = R∗ .

12. (𝜖 + R) ∗ = R∗ .

13. ∅∗ = 𝜖.
10 These equations will help in simplifying complicated regular expressions.

28
14. 𝜖 ∗ = 𝜖.
15. For any 𝜎 ∈ Σ, (𝜖 + 𝜎) ∗ (𝜖 + 𝜎) = (𝜖 + 𝜎) ∗ = 𝜎 ∗ and (𝜖 + 𝜎)𝜎 ∗ = 𝜎 ∗
16. If 𝑎, 𝑏 ∈ Σ, (𝜖 + 𝑎 + 𝑏) ∗ (𝜖 + 𝑎 + 𝑏) = (𝑎 + 𝑏) ∗ .

3.3 Regular Languages to Regular Expressions

The following theorem essentially illustrates how to construct a regular expression from a regular language
or deterministic finite automaton.
Theorem 3.2. If a language 𝐿 is regular, then there exists a regular expression R such that L (R) = 𝐿.

Proof. Let 𝐿 be any regular language. We will construct a regular expression R such that 𝐿 = L (R). Since
𝐿 is regular, there exists a deterministic finite automaton M = (𝑄, Σ, 𝛿, 𝑠, 𝐹) such that L (M) = 𝐿. Let
𝑄 = {𝑞 1 , . . . , 𝑞 𝑛 } and 𝑠 = 𝑞 1 . For 𝑖, 𝑗 = 1, . . . , 𝑛 and 𝑘 = 0, 1, . . . , 𝑛, we define 𝐿 𝑖𝑘𝑗 as the set of all strings in
Σ∗ that may drive M from state 𝑞 𝑖 to state 𝑞 𝑗 without passing through any intermediate state numbered
𝑘 + 1 or greater; the endpoints 𝑞 𝑖 and 𝑞 𝑗 are allowed to be numbered higher than 𝑘, i.e.,
𝐿 𝑖𝑘𝑗 = {𝑥 ∈ Σ∗ : (𝑞 𝑖 , 𝑥) ⊢M (𝑞 𝑗 , 𝜖) or ((𝑞 𝑖 , 𝑥) ⊢+M (𝑞 ℓ , 𝑦) ⊢+M (𝑞 𝑗 , 𝜖) and ℓ ≤ 𝑘)}.11

For 𝑘 = 𝑛, it follows that 


𝐿 𝑖𝑛𝑗 = 𝑥 ∈ Σ∗ : (𝑞 𝑖 , 𝑥) ⊢∗M (𝑞 𝑗 , 𝜖) .
Therefore, 𝐿 = L (M) = ∪ 𝐿 1𝑛 𝑗 .
𝑞 𝑗 ∈𝐹

If we show that for each language 𝐿 𝑖𝑘𝑗 , there is a regular expression R𝑖𝑘𝑗 such that L (R𝑖𝑘𝑗 ) = 𝐿 𝑖𝑘𝑗 , then
R = R1𝑘 𝑗1 + · · · + R1𝑘 𝑗𝑚 , where 𝐹 = {𝑞 𝑗1 , . . . , 𝑞 𝑗𝑚 }, will be a regular expression for the given regular language 𝐿.
So, it suffices to show that each language 𝐿 𝑖𝑘𝑗 is described by a regular expression. Again, we use induction
on 𝑘.

Base Step. For 𝑘 = 0, (


{𝜎 ∈ Σ : 𝛿(𝑞 𝑖 , 𝜎) = 𝑞 𝑗 } if 𝑖 ≠ 𝑗,
𝐿 0𝑖 𝑗 =
{𝜖 } ∪ {𝜎 ∈ Σ : 𝛿(𝑞 𝑖 , 𝜎) = 𝑞 𝑗 } if 𝑖 = 𝑗 .
Note that {𝜎 ∈ Σ : 𝛿(𝑞 𝑖 , 𝜎) = 𝑞 𝑗 } is a subset of Σ. So, following Remark 3.1, 𝐿 0𝑖 𝑗 can be described by a
regular expression, say, R0𝑖 𝑗 .

Induction Hypothesis. Assume that for all 𝑖, 𝑗, 𝐿 𝑖𝑘−1


𝑗 can be described by a regular expression, say,
R𝑖𝑘−1
𝑗 .

Induction Step. We show that 𝐿 𝑖𝑘𝑗 can be described by a regular expression. We can express 𝐿 𝑖𝑘𝑗 as
follows:
𝑘−1 ∗ 𝑘−1
𝐿 𝑖𝑘𝑗 = 𝐿 𝑖𝑘−1 𝑘−1
𝑗 ∪ 𝐿 𝑖𝑘 (𝐿 𝑘 𝑘 ) 𝐿 𝑘 𝑗 .

By induction hypothesis, 𝐿 𝑖𝑘−1 𝑘−1 𝐿 𝑘−1 and 𝐿 𝑘−1 have regular expressions R 𝑘−1 , R 𝑘−1 R 𝑘−1 and R 𝑘−1
𝑗 , 𝐿 𝑖𝑘 𝑘𝑘 𝑘𝑗 𝑖𝑗
 ∗ 𝑖𝑘 𝑘𝑘 𝑘𝑗
𝑘 𝑘 𝑘−1 𝑘−1 𝑘−1
respectively. Then, 𝐿 𝑖 𝑗 is described by the regular expression R𝑖 𝑗 = R𝑖 𝑗 +R𝑖𝑘 R 𝑘 𝑘 R 𝑘−1𝑘𝑗 . This completes
the proof. 
11 Also, check this alternative definition: 𝐿 𝑖𝑘𝑗 = {𝑥 ∈ Σ∗ : (𝑞 𝑖 , 𝑥) ⊢∗M (𝑞 𝑗 , 𝜖) and if (𝑞 𝑖 , 𝑥) ⊢+M (𝑞 ℓ , 𝑦) ⊢+M (𝑞 𝑗 , 𝜖), then ℓ ≤ 𝑘 }

29
Exercise 3.5. Let 𝐿 be a regular language. Then, how many regular expressions of the form R𝑖𝑘𝑗 need to
be computed (in worst case) to obtain a regular expression for 𝐿?
Example 3.6. Let Σ = {𝑎, 𝑏} and 𝐿 = {𝜔 ∈ Σ∗ : number of b’s is odd}. Find a regular expression R such
that L (R) = 𝐿. The following DFA M = (𝑄, Σ, 𝛿, 𝑠, 𝐹) recognizes L.

𝑎 𝑎

𝑞1 𝑏 𝑞2
𝑏

So, here 𝑄 = {𝑞 1 , 𝑞 2 }, 𝑠 = 𝑞 1 and 𝐹 = {𝑞 2 }. Note that 𝐿 = 𝐿 212 . We inductively compute 𝐿 𝑖𝑘𝑗 and the
corresponding regular expression12 R𝑖𝑘𝑗 for 𝑖, 𝑗 ∈ [2] and 𝑘 ∈ {0, 1, 2} as follows.

1. 𝐿 011 = {𝜖 } ∪ {𝜎 ∈ Σ : 𝛿(𝑞 1 , 𝜎) = 𝑞 1 } = {𝜖, 𝑎}. So, R011 = 𝜖 + 𝑎.

2. 𝐿 012 = {𝜎 ∈ Σ : 𝛿(𝑞 1 , 𝜎) = 𝑞 2 } = {𝑏}. So, R012 = 𝑏.

3. 𝐿 021 = {𝜎 ∈ Σ : 𝛿(𝑞 2 , 𝜎) = 𝑞 1 } = {𝑏}. So, R021 = 𝑏.

4. 𝐿 022 = {𝜖 } ∪ {𝜎 ∈ Σ : 𝛿(𝑞 2 , 𝜎) = 𝑞 2 } = {𝜖, 𝑎}. So, R022 = 𝜖 + 𝑎.

5. 𝐿 112 = 𝐿 012 ∪ 𝐿 011 (𝐿 011 ) ∗ 𝐿 012 . So, R112 = R012 + R011 (R011 ) ∗ R012 = 𝑏 + (𝜖 + 𝑎)(𝜖 + 𝑎) ∗ 𝑏 = 𝑏 + 𝑎 ∗ 𝑏 = (𝜖 + 𝑎 ∗ )𝑏 = 𝑎 ∗ 𝑏.

6. 𝐿 122 = 𝐿 022 ∪ 𝐿 021 (𝐿 011 ) ∗ 𝐿 012 . So, R122 = R022 + R021 (R011 ) ∗ R012 = (𝜖 + 𝑎) + 𝑏(𝜖 + 𝑎) ∗ 𝑏 = 𝜖 + 𝑎 + 𝑏𝑎 ∗ 𝑏.

7. 𝐿 212 = 𝐿 112 ∪ 𝐿 112 (𝐿 122 ) ∗ 𝐿 122 . So, we have

R212 = R112 + R112 (R122 ) ∗ R122


= 𝑎 ∗ 𝑏 + 𝑎 ∗ 𝑏(𝜖 + 𝑎 + 𝑏𝑎 ∗ 𝑏) ∗ (𝜖 + 𝑎 + 𝑏𝑎 ∗ 𝑏)
= 𝑎 ∗ 𝑏(𝜖 + (𝜖 + 𝑎 + 𝑏𝑎 ∗ 𝑏) ∗ (𝜖 + 𝑎 + 𝑏𝑎 ∗ 𝑏))
= 𝑎 ∗ 𝑏(𝜖 + 𝑎 + 𝑏𝑎 ∗ 𝑏) ∗
= 𝑎 ∗ 𝑏(𝑎 + 𝑏𝑎 ∗ 𝑏) ∗ .

Example 3.7. Let Σ = {0, 1} and 𝐿 = {𝜔 ∈ Σ∗ : 𝜔 contains at least two 1’s}. Find a regular expression R
such that L (R) = 𝐿. The following DFA M = (𝑄, Σ, 𝛿, 𝑠, 𝐹) recognizes L.

0 0 0, 1

𝑞1 1 𝑞2 1 𝑞3

So, here 𝑄 = {𝑞 1 , 𝑞 2 , 𝑞 3 }, 𝑠 = 𝑞 1 and 𝐹 = {𝑞 3 }. Note that 𝐿 = 𝐿 312 . We inductively compute 𝐿 𝑖𝑘𝑗 and the
corresponding regular expression R𝑖𝑘𝑗 for 𝑖, 𝑗 ∈ [3] and 𝑘 ∈ {0, 1, 2, 3} as follows.

1. 𝐿 011 = {𝜖 } ∪ {𝜎 ∈ Σ : 𝛿(𝑞 1 , 𝜎) = 𝑞 1 } = {𝜖, 0}. So, R011 = 𝜖 + 0.

2. 𝐿 012 = {𝜎 ∈ Σ : 𝛿(𝑞 1 , 𝜎) = 𝑞 2 } = {1}. So, R012 = 1.


12 Some R𝑖𝑘𝑗 ’s are skipped as they are not required for R212

30
3. 𝐿 013 = {𝜎 ∈ Σ : 𝛿(𝑞 1 , 𝜎) = 𝑞 3 } = ∅. So, R013 = ∅.

4. 𝐿 021 = {𝜎 ∈ Σ : 𝛿(𝑞 2 , 𝜎) = 𝑞 1 } = ∅. So, R021 = ∅.

5. 𝐿 022 = {𝜖 } ∪ {𝜎 ∈ Σ : 𝛿(𝑞 2 , 𝜎) = 𝑞 2 } = {𝜖, 0}. So, R022 = 𝜖 + 0.

6. 𝐿 023 = {𝜎 ∈ Σ : 𝛿(𝑞 2 , 𝜎) = 𝑞 2 } = {1}. So, R023 = 1.

7. 𝐿 031 = {𝜎 ∈ Σ : 𝛿(𝑞 3 , 𝜎) = 𝑞 1 } = ∅. So, R031 = ∅.

8. 𝐿 032 = {𝜎 ∈ Σ : 𝛿(𝑞 3 , 𝜎) = 𝑞 2 } = ∅. So, R032 = ∅.

9. 𝐿 033 = {𝜖 } ∪ {𝜎 ∈ Σ : 𝛿(𝑞 3 , 𝜎) = 𝑞 3 } = {𝜖, 0, 1}. So, R033 = 𝜖 + 0 + 1.

10. 𝐿 112 = 𝐿 012 ∪ 𝐿 011 (𝐿 011 ) ∗ 𝐿 012 . So, R112 = R012 + R011 (R011 ) ∗ R012 = 1 + (𝜖 + 0)(𝜖 + 0) ∗ 1 = 0∗ 1.

11. 𝐿 113 = 𝐿 013 ∪ 𝐿 011 (𝐿 011 ) ∗ 𝐿 013 . So, R113 = R013 + R011 (R011 ) ∗ R013 = ∅ + (𝜖 + 0)(𝜖 + 0) ∗ ∅ = ∅.

12. 𝐿 122 = 𝐿 022 ∪ 𝐿 021 (𝐿 011 ) ∗ 𝐿 012 . So, R122 = R022 + R021 (R011 ) ∗ R012 = (𝜖 + 0) + ∅(𝜖 + 0) ∗ 1 = 𝜖 + 0.

13. 𝐿 123 = 𝐿 023 ∪ 𝐿 021 (𝐿 011 ) ∗ 𝐿 013 . So, R123 = R023 + R021 (R011 ) ∗ R013 = 1 + (𝜖 + 0)(𝜖 + 0) ∗ ∅ = 1.

14. 𝐿 132 = 𝐿 032 ∪ 𝐿 031 (𝐿 011 ) ∗ 𝐿 012 . So, R132 = R032 + R031 (R011 ) ∗ R012 = ∅ + ∅(𝜖 + 0) ∗ 1 = ∅.

15. 𝐿 133 = 𝐿 033 ∪ 𝐿 033 (𝐿 011 ) ∗ 𝐿 013 . So, R133 = R033 + R031 (R011 ) ∗ R013 = (𝜖 + 0 + 1) + ∅(𝜖 + 0) ∗ ∅ = 𝜖 + 0 + 1.

16. 𝐿 213 = 𝐿 113 ∪ 𝐿 112 (𝐿 122 ) ∗ 𝐿 123 . So, R113 = R113 + R112 (R122 ) ∗ R123 = ∅ + 0∗ 1(𝜖 + 0) ∗ 1 = 0∗ 10∗ 1.

17. 𝐿 233 = 𝐿 133 ∪ 𝐿 132 (𝐿 122 ) ∗ 𝐿 123 . So, R133 = R133 + R132 (R122 ) ∗ R123 = (𝜖 + 0 + 1) + ∅(𝜖 + 0) ∗ 1 = 𝜖 + 0 + 1.

18. 𝐿 313 = 𝐿 213 ∪ 𝐿 213 (𝐿 233 ) ∗ 𝐿 233 . So, we have

R313 = R213 + R213 (R333 ) ∗ R233


= 0∗ 10∗ 1 + 0∗ 10∗ 1(𝜖 + 0 + 1) ∗ (𝜖 + 0 + 1)
= 0∗ 10∗ 1 + 0∗ 10∗ 1(0 + 1) ∗
= 0∗ 10∗ 1(𝜖 + (0 + 1) ∗ )
= 0∗ 10∗ 1(0 + 1) ∗ .

Exercise 3.8. Find regular expressions for the following languages using the construction mechanism
involved in the proof of Theorem 3.2.

1. {0, 1}+ .

2. {𝜔 ∈ {0}∗ : 𝜔 = 03𝑘 for 𝑘 ≥ 0}.


Í 

3. {𝜔 ∈ {0, 1}∗ : if ℓ = |𝜔| ≥ 1, then the value of 𝑖=1 𝜔 𝑖 %2 is 1}.

4. {𝜔 ∈ {0, 1}∗ : decimal value of 𝜔 is congruent to 1 modulo 3}.

5. {𝜔 ∈ {0, 1}∗ : sum of binary digits of 𝜔 is congruent to 2 modulo 3}.

6. {𝑎𝑖 𝑏 𝑗 : 𝑖, 𝑗 ≥ 0}.

7. {0}+ ∪ {1}+ .

31
𝑦

𝑥
𝑞𝑖 = 𝑞 𝑗
𝑧

Figure 16: Illustration of Pumping lemma.

4 Pumping Lemma and Nonregular Languages


The model, the finite state automaton that we have studied cannot recognize all the languages over a
given alphabet. We will see there are languages which are not accepted by any finite state automaton.
For showing the limitation of finite state automata, we will use a well known tool, called Pumping lemma.
Essentially, using this lemma we show that some languages are not regular.
Theorem 4.1 (Pumping lemma). Let 𝐿 be a regular language. There is a positive integer 𝑛 (which depends
on 𝐿) such that any string 𝜔 ∈ 𝐿 with |𝜔| ≥ 𝑛 can be rewritten as 𝜔 = 𝑥𝑦𝑧 such that:

1. 𝑦 ≠ 𝜖,
2. |𝑥𝑦| ≤ 𝑛, and
3. 𝑥𝑦 𝑘 𝑧 ∈ 𝐿 for each 𝑘 ≥ 0.

Proof. Since 𝐿 is regular, it is accepted by a deterministic finite automaton M. Suppose that 𝑛 is the
number of states of M. Let 𝜔 be a string with |𝜔| ≥ 𝑛. Consider the first 𝑛 steps of the computation of M
on 𝜔:
( 𝑝 0 , 𝜔1 𝜔2 . . . 𝜔 𝑛 ) ⊢M ( 𝑝 1 , 𝜔2 . . . 𝜔 𝑛 ) ⊢M . . . ⊢M ( 𝑝 𝑛 , 𝜖),
where 𝑝 0 is the initial state of M, and 𝜔1 . . . 𝜔 𝑛 are the 𝑛 first symbols of 𝜔. Since M has only 𝑛 states,
and there are 𝑛 + 1 configurations ( 𝑝 𝑖 , 𝜔𝑖+1 . . . , 𝜔 𝑛 ) appearing in the computation above, by the pigeonhole
principle there exist 𝑖 and 𝑗, 0 ≤ 𝑖 < 𝑗 ≤ 𝑛, such that 𝑝 𝑖 = 𝑝 𝑗 . That is, the string 𝑦 = 𝜔𝑖+1 . . . 𝜔 𝑗 drives M
from state 𝑝 𝑖 back to state 𝑝 𝑖 , and this string is nonempty since 𝑖 < 𝑗. See the pictorial representation in
Figure 16.
Note that the string 𝑦 could be removed from 𝜔 or repeated any number of times in 𝜔 just after the
𝑖 𝑡 ℎ symbol of 𝜔, and M would still accept this string. That is, M accepts 𝑥𝑦 𝑘 𝑧 ∈ 𝐿 for each 𝑘 ≥ 0, where
𝑥 = 𝜔1 . . . 𝜔𝑖 , 𝑦 = 𝜔𝑖+1 . . . 𝜔 𝑗 and 𝑧 = 𝜔 𝑗+1 . . . 𝜔 𝑚 , where 𝑚 = |𝜔|. Furthermore, the length of 𝑥𝑦, the number
we called 𝑗 above, is by definition at most 𝑛, as required. 

Example 4.1. The language 𝐿 = {𝑎𝑖 𝑏𝑖 : 𝑖 ≥ 0} is not regular. If it were, then Theorem 4.1 would apply
for some integer 𝑛. Consider then the string 𝜔 = 𝑎 𝑛 𝑏 𝑛 ∈ 𝐿. By the theorem, it can be rewritten as 𝜔 = 𝑥𝑦𝑧
such that |𝑥𝑦| ≤ 𝑛 and 𝑦 ≠ 𝜖, that is, 𝑦 = 𝑎𝑖 for some 𝑖 > 0. But then 𝑥𝑧 = 𝑎 𝑛−𝑖 𝑏 𝑛 ∉ 𝐿, contradicting
Theorem 4.1.
Example 4.2. The language 𝐿 = {𝑎 𝑚 : 𝑚 is prime} is not regular. For suppose it were, and let 𝑥, 𝑦, and
𝑧 be as specified in Theorem 4.1. Then 𝑥 = 𝑎 𝑃 , 𝑦 = 𝑎 𝑞 , and 𝑧 = 𝑎𝑟 , where 𝑝, 𝑟 ≥ 0 and 𝑞 > 0. By the
theorem, 𝑥𝑦 𝑘 𝑧 ∈ 𝐿 for each 𝑘 ≥ 0, that is, 𝑝 + 𝑘𝑞 + 𝑟 is prime for each 𝑘 ≥ 0. But this is impossible, for let
𝑘 = 𝑝 + 2𝑞 + 𝑟 + 2, then 𝑝 + 𝑘𝑞 + 𝑟 = (𝑞 + 1) · ( 𝑝 + 2𝑞 + 𝑟), which is a product of two natural numbers, each
greater than 1.

32
Example 4.3. Show the following language is not regular:

𝐿 = {𝜔 ∈ {𝑎, 𝑏}∗ : 𝜔 has equal number of a’s and b’s}.

First note that 𝐿 1 = {𝑎 𝑚 𝑏 𝑛 : 𝑚, 𝑛 ≥ 0} is regular. Suppose 𝐿 is regular, that would imply 𝐿 ∩ 𝐿 1 is regular.
But, 𝐿 ∩ 𝐿 1 = {𝑎 𝑛 𝑏 𝑛 : 𝑛 ≥ 0} is not regular, a contradiction.
Example 4.4. Show that the following language is not regular:

𝐿 = {𝜔 ∈ {𝑎, 𝑏}∗ : 𝜔 = 𝜔𝑟 }.

Suppose 𝐿 is regular. Then by Pumping lemma, there exists an integer 𝑛. Let 𝜔 = 𝑎 𝑛 𝑏𝑏𝑎 𝑛 ∈ 𝐿. Then we
can write 𝜔 = 𝑥𝑦𝑧 with |𝑥𝑦| ≤ 𝑛, 𝑦 ≠ 𝜖 and 𝑥𝑦 𝑖 𝑧 ∈ 𝐿 for all 𝑖 ≥ 0. That is, 𝑦 = 𝑎 𝑚 for some 𝑚 > 0. Thus,
𝑥𝑦 𝑖 𝑧 = 𝑎 𝑛+𝑚(𝑖−1) 𝑏𝑏𝑎 𝑛 ∉ 𝐿 for 𝑖 ≠ 1, a contradiction.
Exercise 4.5. Show that the following languages are not regular.
2
1. {𝜔 ∈ {1}∗ : 𝜔 = 1𝑚 for 𝑚 ≥ 1}.

2. {𝜔 ∈ {𝑎, 𝑏}∗ : 𝜔 = 𝑎𝑖 𝑏 𝑗 and 𝑖, 𝑗 ≥ 0 and 𝑖 ≠ 𝑗 }.

3. {𝑎 𝑛 𝑏 𝑛 𝑐𝑛 : 𝑛 ≥ 0}.

4. {𝑎 𝑛 𝑏 𝑚 𝑎 𝑛 : 𝑚, 𝑛 ≥ 0}.

5. {𝑎 𝑚 𝑏 𝑚 𝑎 𝑛 : 𝑚, 𝑛 ≥ 0}.

6. {𝜔 ∈ {𝑎, 𝑏}∗ : 𝜔 = 𝜔𝑟 }.

7. {𝜔 ∈ {0, 1}∗ : #0’s ≠ #1’s}.

8. {𝑎 𝑛 𝑏𝑥𝑏 𝑛 : 𝑥 ∈ {𝑎, 𝑏}∗ and 𝑛 ≥ 1}.

9. {𝑎 𝑛 𝑏𝑥𝑎 𝑛 : 𝑥 ∈ {𝑎, 𝑏}∗ and 𝑛 ≥ 1}.

10. {𝜔 ∈ {𝑎, 𝑏}∗ : 𝜔 is not a palindrome}


𝑚
11. {𝜔 ∈ {𝑎}∗ : 𝜔 = 𝑎2 for 𝑚 ≥ 0}.

12. {𝜔𝜔𝑟 : 𝜔 ∈ {𝑎, 𝑏}∗ }.

e : 𝜔 ∈ {𝑎, 𝑏}∗ }, where 𝜔


13. {𝜔𝜔 e is obtained from 𝜔 by replacing all a’s and b’s by b and a respectively.

14. {𝜔𝜔 : 𝜔 ∈ {𝑎, 𝑏}∗ }.

15. {𝜔𝑡𝜔 : 𝜔, 𝑡 ∈ {𝑎, 𝑏}+ }.

References
[LP06] Harry R Lewis and Christos H. Papadimitriou. Elements of Theory of Computation. Pearson
Education, 2006.

[Sip14] Michael Sipser. Introduction to the Theory of Computation. Cengage India Private Limited, 2014.

33

You might also like