Life only avails, not the having lived. Power ceases in the instant of repose;
it resides in the moment of transition from a past to a new state,
in the shooting of the gulf, in the darting to an aim.
— Ralph Waldo Emerson, “Self Reliance”, Essays, First Series (1841)
3 Finite-State Machines
3.1 Intuition
Suppose we want to determine whether a given string w[1 .. n] of bits represents a multiple of 5
in binary. After a bit of thought, you might realize that you can read the bits in w one at a time,
from left to right, keeping track of the value modulo 5 of the prefix you have read so far.
MultipleOf5(w[1 .. n]):
  rem ← 0
  for i ← 1 to n
    rem ← (2 · rem + w[i]) mod 5
  if rem = 0
    return True
  else
    return False
Aside from the loop index i, which we need just to read the entire input string, this algorithm
has a single local variable rem, which has only five different values: 0, 1, 2, 3, or 4.
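The pseudocode above translates directly into Python (a sketch; the function name is my own):

```python
def multiple_of_5(w: str) -> bool:
    """Return True if the bit string w represents a multiple of 5 in binary."""
    rem = 0
    for bit in w:
        # Reading one more bit doubles the number read so far and adds the new bit.
        rem = (2 * rem + int(bit)) % 5
    return rem == 0
```

For example, multiple_of_5("00101110110") returns False, because 00101110110 represents 374 and 374 mod 5 = 4.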
This algorithm already runs in O(n) time, which is the best we can hope for—after all, we
have to read every bit in the input—but we can speed up the algorithm in practice. Let’s define a
transition function δ : {0, 1, 2, 3, 4} × {0, 1} → {0, 1, 2, 3, 4} as follows:
δ(q, a) := (2 · q + a) mod 5
(Here I’m implicitly converting the symbols 0 and 1 to the corresponding integers 0 and 1.) Since
we already know all values of the transition function, we can store them in a precomputed table,
and then replace the computation in the main loop of MultipleOf5 with a simple array lookup.
We can also modify the return condition to check for different values modulo 5. To be
completely general, we replace the final if-then-else lines with another array lookup, using an
array A[0 .. 4] of booleans describing which final mod-5 values are “acceptable”.
After both of these modifications, our algorithm looks like one of the following, depending on
whether we want something iterative or recursive (with q = 0 in the initial call):
DoSomethingCool(w[1 .. n]):
  q ← 0
  for i ← 1 to n
    q ← δ[q, w[i]]
  return A[q]

DoSomethingCool(q, w):
  if w = ε
    return A[q]
  else
    decompose w = a · x
    return DoSomethingCool(δ(q, a), x)
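Both versions can be sketched in Python with the transition table precomputed as a nested list (the names and table layout are my own):

```python
# Precomputed transition table: DELTA[q][a] = (2*q + a) mod 5
DELTA = [[(2 * q + a) % 5 for a in (0, 1)] for q in range(5)]
ACCEPT = [True, False, False, False, False]  # A[q] is True only for remainder 0

def do_something_cool(w: str) -> bool:
    """Iterative version: the arithmetic is replaced by array lookups."""
    q = 0
    for c in w:
        q = DELTA[q][int(c)]
    return ACCEPT[q]

def do_something_cool_rec(q: int, w: str) -> bool:
    """Recursive version, called with q = 0 initially."""
    if not w:
        return ACCEPT[q]
    a, x = int(w[0]), w[1:]  # decompose w = a · x
    return do_something_cool_rec(DELTA[q][a], x)
```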
Models of Computation Lecture 3: Finite-State Machines [Sp’18]
We can also visualize the behavior of DoSomethingCool by drawing a directed graph, whose
vertices represent possible values of the variable q—the possible states of the algorithm—and
whose edges are labeled with input symbols to represent transitions between states. Specifically,
a
the graph includes the labeled directed edge p−→q if and only if δ(p, a) = q. To indicate the
proper return value, we draw the “acceptable” final states using doubled circles. Here is the
resulting graph for MultipleOf5:
[State-transition graph for MultipleOf5: states 0–4, with edges δ(q, a) = (2q + a) mod 5 and the accepting state 0 drawn as a doubled circle.]
If we run the MultipleOf5 algorithm on the string 00101110110 (representing the number
374 in binary), the algorithm performs the following sequence of transitions:
0 −0→ 0 −0→ 0 −1→ 1 −0→ 2 −1→ 0 −1→ 1 −1→ 3 −0→ 1 −1→ 3 −1→ 2 −0→ 4
Because the final state is not the “acceptable” state 0, the algorithm correctly returns False.
We can also think of this sequence of transitions as a walk in the graph, which is completely
determined by the start state 0 and the sequence of edge labels; the algorithm returns True if
and only if this walk ends at an “acceptable” state.
Finally, a finite-state machine accepts a string w if and only if δ∗ (s, w) ∈ A, and rejects w
otherwise. (Compare this definition with the recursive formulation of DoSomethingCool!)
For example, our final MultipleOf5 algorithm is a DFA with the following components:
Σ = {0, 1},  Q = {0, 1, 2, 3, 4},  s = 0,  A = {0},  and  δ(q, a) = (2q + a) mod 5.
¹It’s unclear why we use the letter Q to refer to the state set, and lower-case q to refer to a generic state, but that
is now the firmly-established notational standard. Although the formal study of finite-state automata began much
earlier, its modern formulation was established in a 1959 paper by Michael Rabin and Dana Scott, for which they won
the Turing award. Rabin and Scott called the set of states S, used lower-case s for a generic state, and called the start
state s0 . On the other hand, in the 1936 paper for which the Turing award was named, Alan Turing used q1 , q2 , . . . , qR
to refer to states (or “m-configurations”) of a generic Turing machine. Turing may have been mirroring the standard
notation Q for configuration spaces in classical mechanics, also of uncertain origin.
We have already seen a more graphical representation of this entire sequence of transitions:
0 −0→ 0 −0→ 0 −1→ 1 −0→ 2 −1→ 0 −1→ 1 −1→ 3 −0→ 1 −1→ 3 −1→ 2 −0→ 4
The arrow notation is easier to read and write for specific examples, but surprisingly, most people
actually find the more formal functional notation easier to use in formal proofs. Try them both!
We can equivalently define a DFA as a directed graph whose vertices are the states Q, whose
edges are labeled with symbols from Σ, such that every vertex has exactly one outgoing edge
with each label. In our drawings of finite state machines, the start state s is always indicated
by an incoming arrow, and the accepting states A are always indicated by doubled circles. By
induction, for any string w ∈ Σ∗ , this graph contains a unique walk that starts at s and whose
edges are labeled with the symbols in w in order. The machine accepts w if this walk ends at an
accepting state. This graphical formulation of DFAs is incredibly useful for developing intuition
and even designing DFAs. For proofs, it’s largely a matter of taste whether to write in terms of
extended transition functions or labeled graphs, but (as much as I wish otherwise) I actually find
it easier to write correct proofs using the functional formulation.
[A simple finite-state machine, with two states s (the start state) and t (the accepting state), where δ(s, 0) = s, δ(s, 1) = t, δ(t, 0) = t, and δ(t, 1) = s.]
For example, the two-state machine M above accepts the string 00101110100 after the
following sequence of transitions:
s −0→ s −0→ s −1→ t −0→ t −1→ s −1→ t −1→ s −0→ s −1→ t −0→ t −0→ t.
The same machine M rejects the string 11101101 after the following sequence of transitions:
s −1→ t −1→ s −1→ t −0→ t −1→ s −1→ t −0→ t −1→ s.
Finally, M rejects the empty string, because the start state s is not an accepting state.
From these examples and others, it is easy to conjecture that the language of M is the set of
all strings of 0s and 1s with an odd number of 1s. So let’s prove it!
Proof (tedious case analysis): Let #(a, w) denote the number of times symbol a appears in
string w. We will prove the following stronger claims by induction, for any string w.
δ∗(s, w) = s if #(1, w) is even, and δ∗(s, w) = t if #(1, w) is odd;
δ∗(t, w) = t if #(1, w) is even, and δ∗(t, w) = s if #(1, w) is odd.
Let’s begin. Let w be an arbitrary string. Assume that for any string x that is shorter than w,
we have δ∗ (s, x) = s and δ∗ (t, x) = t if x has an even number of 1s, and δ∗ (s, x) = t and
δ∗ (t, x) = s if x has an odd number of 1s. There are five cases to consider.
Since the remaining cases are similar, I’ll omit the line-by-line justification.
• If w = 1 · x and #(1, w) is odd, then #(1, x) is even, so the inductive hypothesis implies
δ∗(s, x) = s and δ∗(t, x) = t; it follows that δ∗(s, w) = δ∗(t, x) = t and δ∗(t, w) = δ∗(s, x) = s.
• If w = 0 · x and #(1, w) is even, then #(1, x) is even, so the inductive hypothesis implies
δ∗(s, x) = s and δ∗(t, x) = t; it follows that δ∗(s, w) = δ∗(s, x) = s and δ∗(t, w) = δ∗(t, x) = t.
• Finally, if w = 0 · x and #(1, w) is odd, then #(1, x) is odd, so the inductive hypothesis
implies δ∗(s, x) = t and δ∗(t, x) = s; it follows that δ∗(s, w) = δ∗(s, x) = t and δ∗(t, w) = δ∗(t, x) = s.
Notice that this proof contains |Q|² · |Σ| + |Q| separate inductive arguments. For every pair of
states p and q, we must argue about the language of all strings w such that δ∗(p, w) = q, and
we must consider every possible first symbol in w. We must also argue about δ∗(p, ε) for every
state p. Each of those arguments is typically straightforward, but it’s easy to get lost in the deluge
of cases.
For this particular proof, however, we can reduce the number of cases by switching from tail
recursion to head recursion. The following identity holds for all strings x ∈ Σ∗ and symbols
a ∈ Σ:
δ∗ (q, x a) = δ(δ∗ (q, x), a)
We leave the inductive proof of this identity as a straightforward exercise (hint, hint).
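Both recursions can be spelled out for the two-state parity machine from this example, and the identity checked directly (a Python sketch; the helper names are mine):

```python
def delta(q: int, a: int) -> int:
    # Transition function of the parity machine, with states renamed 0 and 1
    return (q + a) % 2

def delta_star_tail(q: int, w: str) -> int:
    """Tail recursion: delta*(q, a.x) = delta*(delta(q, a), x)."""
    if not w:
        return q
    return delta_star_tail(delta(q, int(w[0])), w[1:])

def delta_star_head(q: int, w: str) -> int:
    """Head recursion: delta*(q, x.a) = delta(delta*(q, x), a)."""
    if not w:
        return q
    return delta(delta_star_head(q, w[:-1]), int(w[-1]))
```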
Proof (clever renaming, head induction): Let’s rename the states with the integers 0 and 1
instead of s and t. Then the transition function can be described concisely as δ(q, a) =
(q + a) mod 2. We claim that for every string w, we have δ∗(0, w) = #(1, w) mod 2.
Let w be an arbitrary string, and assume for any string x shorter than w that
δ∗(0, x) = #(1, x) mod 2. There are only two cases to consider: either w is empty or it isn’t.
Hmmm. This “clever” proof is certainly shorter than the earlier brute-force proof, but is it
actually better? Simpler? More intuitive? Easier to understand? I’m skeptical. Sometimes brute
force really is more effective.
This clock doesn’t quite match our abstraction, because there’s no “start” state or “accepting”
states, unless perhaps you consider the “accepting” state to be the time when your train arrives.
A more playful example of a finite-state machine is the Rubik’s cube, a well-known mechanical
puzzle invented independently by Ernő Rubik in Hungary and Terutoshi Ishigi in Japan in the mid-
1970s. This puzzle has precisely 519,024,039,293,878,272,000 distinct configurations. In the unique
solved configuration, each of the six faces of the cube shows exactly one color. We can change the
configuration of the cube by rotating one of the six faces of the cube by 90 degrees, either clockwise
or counterclockwise. The cube has six faces (front, back, left, right, up, and down), so there are
exactly twelve possible turns, typically represented by the symbols R, L, F, B, U, D, R̄, L̄, F̄, B̄, Ū, D̄,
where the letter indicates which face to turn and the presence or absence of a bar over the letter
²A second hand was added to the Swiss Railway clocks in the mid-1950s, which sweeps continuously around the
clock in approximately 58½ seconds and then pauses at 12:00 until the next minute signal “to bring calm in the last
moment and ease punctual train departure”. Let’s ignore that.
The basic approach is to try to construct an algorithm that looks like MultipleOf5: A simple
for-loop through the symbols, using a constant number of variables, where each variable (except
the loop index) has only a constant number of possible values. Here, “constant” means an actual
number that is not a function of the input size n. You should be able to compute the number of
possible values for each variable at compile time.
For example, the following algorithm determines whether a given string in Σ = {0, 1}
contains the substring 11.
Contains11(w[1 .. n]):
  found ← False
  for i ← 1 to n
    if i = 1
      last2 ← w[1]
    else
      last2 ← w[i − 1] · w[i]
    if last2 = 11
      found ← True
  return found
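A direct Python transcription of this pseudocode (a sketch; the function name is my own):

```python
def contains_11(w: str) -> bool:
    """Return True if the bit string w contains the substring 11."""
    found = False
    last2 = ""
    for i, c in enumerate(w):
        # last2 holds the last (up to) two symbols read so far
        last2 = c if i == 0 else w[i - 1] + c
        if last2 == "11":
            found = True
    return found
```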
Aside from the loop index, this algorithm has exactly two variables.
• A boolean flag found indicating whether we have seen the substring 11. This variable has
exactly two possible values: True and False.
• A string last2 containing the last (up to) two symbols we have read so far. This variable
has exactly 7 possible values: ε, 0, 1, 00, 01, 10, and 11.
Thus, altogether, the algorithm can be in at most 2 × 7 = 14 possible states, one for each possible
pair (found, last2). Thus, we can encode the behavior of Contains11 as a DFA with fourteen
states, where the start state is (False, ε) and the accepting states are all seven states of the form
(True, ∗). The transition function is described in the following table (split into two parts to save
space):
For example, given the input string 1001011100, this DFA performs the following sequence of
transitions and then accepts.
(False, ε) −1→ (False, 1) −0→ (False, 10) −0→ (False, 00) −1→
(False, 01) −0→ (False, 10) −1→ (False, 01) −1→
(True, 11) −1→ (True, 11) −0→ (True, 10) −0→ (True, 00)
You can probably guess that the brute-force DFA we just constructed has considerably more states
than necessary, especially after seeing its transition graph:
[Transition graph of the brute-force Contains11 DFA, with fourteen states (found, last2).]
For example, the state (False, 11) has no incoming transitions, so we can just delete it. (This
state would indicate that we’ve never read 11, but the last two symbols we read were 11, which
is impossible!) More significantly, we don’t actually need to remember both of the last two
symbols, but only the penultimate symbol, because the last symbol is the one we’re currently
reading. This observation allows us to reduce the number of states from fourteen to only six.
[A less brute-force DFA for strings containing the substring 11, with six states (found, penultimate symbol).]
But even this DFA has more states than necessary. Once the flag part of the state is set to
True, we know the machine will eventually accept, so we might as well merge all the accepting
states together. More subtly, because both transitions out of (False, 0) and (False, ") lead to the
same states, we can merge those two states together as well. After all these optimizations, we
obtain the following DFA with just three states:
• The start state, which indicates that the machine has not read the substring 11 and did
not just read the symbol 1.
• An intermediate state, which indicates that the machine has not read the substring 11 but
just read the symbol 1.
• A unique accept state, which indicates that the machine has read the substring 11.
This is the smallest possible DFA for this language.
[A minimal DFA for superstrings of 11, with three states: the start state, the intermediate state, and the accepting state.]
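The three-state DFA can be written out directly (a Python sketch; the state names start, one, and accept are my own):

```python
# Transition function of the minimal DFA for superstrings of 11:
#   start:  no 11 seen, did not just read a 1
#   one:    no 11 seen, just read a 1
#   accept: 11 has been read (absorbing state)
DELTA = {
    ("start", "0"): "start", ("start", "1"): "one",
    ("one", "0"): "start", ("one", "1"): "accept",
    ("accept", "0"): "accept", ("accept", "1"): "accept",
}

def accepts_11(w: str) -> bool:
    q = "start"
    for c in w:
        q = DELTA[q, c]
    return q == "accept"
```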
While it is important not to use an excessive number of states when we design DFAs—too
many states makes a DFA hard to understand—there is really no point in trying to reduce DFAs
by hand to the absolute minimum number of states. Clarity is much more important than brevity
(especially in this class), and DFAs with too few states can also be hard to understand. At the end
of this note, I’ll describe an efficient algorithm that automatically transforms any given DFA into
an equivalent DFA with the fewest possible states.
• The states of the new DFA are all ordered pairs (p, q), where p is a state in M00 and q is a
state in M11 .
• The start state of the new DFA is the pair (s, s′), where s is the start state of M00 and s′ is
the start state of M11 .
• The new DFA includes the transition (p, q) −a→ (p′, q′) if and only if M00 contains the
transition p −a→ p′ and M11 contains the transition q −a→ q′.
• Finally, (p, q) is an accepting state of the new DFA if and only if p is an accepting state in
M00 and q is an accepting state in M11 .
The resulting nine-state DFA is shown on the next page, with the two factor DFAs M00 and
M11 shown in gray for reference. (The state (a, a) can be removed, because it has no incoming
transition, but let’s not worry about that now.)
[Building a DFA for the language of strings containing both 00 and 11: the product of the two factor DFAs M00 and M11, with nine states (p, q).]
More generally, let M1 = (Σ, Q 1 , δ1 , s1 , A1 ) be an arbitrary DFA that accepts some language L1 ,
and let M2 = (Σ, Q 2 , δ2 , s2 , A2 ) be an arbitrary DFA that accepts some language L2 (over the
same alphabet Σ). We can construct a third DFA M = (Σ, Q, δ, s, A) that accepts the intersection
language L1 ∩ L2 as follows.
Q := Q1 × Q2 = { (p, q) | p ∈ Q1 and q ∈ Q2 }
δ((p, q), a) := (δ1(p, a), δ2(q, a))
s := (s1 , s2 )
A := A1 × A2 = { (p, q) | p ∈ A1 and q ∈ A2 }
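These four definitions translate almost verbatim into Python (a sketch under my own conventions: a DFA is a tuple (states, delta, start, accept), where delta maps (state, symbol) pairs to states):

```python
def product_dfa(M1, M2, combine):
    """Product construction. `combine` turns the two component acceptance
    flags into one, e.g. (lambda x, y: x and y) for intersection."""
    Q1, d1, s1, A1 = M1
    Q2, d2, s2, A2 = M2
    symbols = {a for (_, a) in d1}
    Q = {(p, q) for p in Q1 for q in Q2}
    delta = {((p, q), a): (d1[p, a], d2[q, a]) for (p, q) in Q for a in symbols}
    A = {(p, q) for (p, q) in Q if combine(p in A1, q in A2)}
    return Q, delta, (s1, s2), A

def run(M, w):
    """Run DFA M on string w and report acceptance."""
    _, delta, q, accept = M
    for a in w:
        q = delta[q, a]
    return q in accept
```

Swapping a different boolean function into `combine` gives the other product constructions (union, difference, symmetric difference) without touching anything else.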
To convince ourselves that this product construction is actually correct, let’s consider the
extended transition function δ∗ : (Q1 × Q2 ) × Σ∗ → (Q1 × Q2 ), which acts on strings instead of
individual symbols. Recall that this function is defined recursively as follows:
δ∗((p, q), w) :=  (p, q)  if w = ε,
δ∗((p, q), w) :=  δ∗(δ((p, q), a), x)  if w = a · x.
This function behaves exactly as we should expect:
Lemma 3.1. δ∗((p, q), w) = (δ1∗(p, w), δ2∗(q, w)) for any string w.
Proof: Let w be an arbitrary string. Assume δ∗((p, q), x) = (δ1∗(p, x), δ2∗(q, x)) for every string x
shorter than w. If w = ε, then δ∗((p, q), ε) = (p, q) = (δ1∗(p, ε), δ2∗(q, ε)). Otherwise, w = a · x
for some symbol a and string x, so δ∗((p, q), w) = δ∗(δ((p, q), a), x) = δ∗((δ1(p, a), δ2(q, a)), x) =
(δ1∗(δ1(p, a), x), δ2∗(δ2(q, a), x)) = (δ1∗(p, w), δ2∗(q, w)).
In both cases, we conclude that δ∗((p, q), w) = (δ1∗(p, w), δ2∗(q, w)).
An immediate consequence of this lemma is that for every string w, we have δ∗(s, w) ∈ A if
and only if both δ1∗(s1 , w) ∈ A1 and δ2∗(s2 , w) ∈ A2 . In other words, M accepts w if and only if
both M1 accepts w and M2 accepts w, as required.
As usual, this construction technique does not necessarily yield minimal DFAs. For example,
in our first example of a product DFA, illustrated above, the central state (a, a) cannot be reached
from any other state and is therefore redundant. Whatever.
Similar product constructions can be used to build DFAs that accept any other boolean
combination of languages; in fact, the only part of the construction that changes is the choice of
accepting states. For example:
• To accept the union L1 ∪ L2 , define A = { (p, q) | p ∈ A1 or q ∈ A2 }.
• To accept the difference L1 \ L2 , define A = { (p, q) | p ∈ A1 but q ∉ A2 }.
• To accept the symmetric difference L1 ⊕ L2 , define A = { (p, q) | p ∈ A1 xor q ∈ A2 }.
Examples of these constructions are shown on the next page.
Moreover, by cascading this product construction, we can construct DFAs that accept arbitrary
boolean combinations of arbitrary finite collections of regular languages.
We call a language automatic if it is the language of some finite state machine. Our product
construction examples let us prove that the set of automatic languages is closed under simple
boolean operations.
[Three product DFAs for (a) strings that contain 00 or 11, (b) strings that contain either 00 or 11 but not both, and (c) strings that contain 11 if they contain 00. These DFAs are identical except for their choices of accepting states.]
Theorem 3.2. Let L and L 0 be arbitrary automatic languages over an arbitrary alphabet Σ.
• The complement L̄ = Σ∗ \ L is automatic.
• L ∪ L 0 is automatic.
• L ∩ L 0 is automatic.
• L \ L 0 is automatic.
• L ⊕ L 0 is automatic.
Eager students may have noticed that a Google search for the phrase “automatic language”
turns up no results that are relevant for this class, except perhaps this lecture note. That’s
because “automatic” is just a synonym for “regular”! This equivalence was first observed by
Stephen Kleene (the inventor of regular expressions) in 1956.
Theorem 3.3 (Kleene). For any regular expression R, there is a DFA M such that L(R) = L(M ).
For any DFA M , there is a regular expression R such that L(M ) = L(R).
Unfortunately, we don’t yet have all the tools we need to prove Kleene’s theorem; we’ll
return to the proof in the next lecture note, after we have introduced nondeterministic finite-state
machines. The proof is actually constructive—there are explicit algorithms that transform
arbitrary DFAs into equivalent regular expressions and vice versa.3
This equivalence between regular and automatic languages implies that the set of regular
languages is also closed under simple boolean operations. The union of two regular languages
is regular by definition, but it’s much less obvious that every boolean combination of regular
languages can also be described by regular expressions.
Corollary 3.4. Let L and L 0 be arbitrary regular languages over an arbitrary alphabet Σ.
• The complement L̄ = Σ∗ \ L is regular.
• L ∩ L 0 is regular.
• L \ L 0 is regular.
• L ⊕ L 0 is regular.
Conversely, because concatenations and Kleene closures of regular languages are regular by
definition, we can immediately conclude that concatenations and Kleene closures of automatic
languages are automatic.
³These conversion algorithms run in exponential time in the worst case, but that’s unavoidable. There are regular
languages whose smallest accepting DFA is exponentially larger than their smallest regular expression, and there are
regular languages whose smallest regular expression is exponentially larger than their smallest accepting DFA.
These results give us several options to prove that a given language is regular or automatic.
We can either (1) build a regular expression that describes the language, (2) build a DFA that
accepts the language, or (3) build the language from simpler pieces from other regular/automatic
languages. (Later we’ll see a fourth option, and possibly even a fifth.)
Perhaps the single most important feature of DFAs is that they have no memory other than the
current state. Once a DFA enters a particular state, all future transitions depend only on that
state and future input symbols; past input symbols are simply forgotten.
For example, consider our very first DFA, which accepts the binary representations of integers
divisible by 5.
[DFA accepting binary multiples of 5: states 0–4 as before, with accepting state 0.]
The strings 0010 and 11011 both lead this DFA to state 2, although they follow different
transitions to get there. Thus, for any string z, the strings 0010z and 11011z also lead to the
same state in this DFA. In particular, 0010z leads to the accepting state if and only if 11011z
leads to the accepting state. It follows that 0010z is divisible by 5 if and only if 11011z is
divisible by 5.
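This claim is easy to check mechanically (a Python sketch; state computes δ∗(0, w) for the mod-5 machine):

```python
def state(w: str) -> int:
    """Extended transition function delta*(0, w) for the mod-5 DFA."""
    q = 0
    for c in w:
        q = (2 * q + int(c)) % 5
    return q
```

Both 0010 and 11011 land in state 2, and appending the same suffix to both keeps them in the same state.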
More generally, any DFA M = (Σ, Q, s, A, δ) defines an equivalence relation over Σ∗ , where
two strings x and y are equivalent if and only if they lead to the same state, or more formally, if
δ∗ (s, x) = δ∗ (s, y). If x and y are equivalent strings, then for any string z, the strings xz and
yz are also equivalent. In particular, M accepts xz if and only if M accepts yz. Thus, if L is
the language accepted by M , then xz ∈ L if and only if yz ∈ L. In short, if the machine can’t
distinguish between x and y, then the language can’t distinguish between xz and yz for any
suffix z.
Now let’s turn the previous argument on its head. Let L be an arbitrary language, and let x
and y be arbitrary strings. A distinguishing suffix for x and y (with respect to L) is a third
string z such that exactly one of the strings xz and yz is in L. If x and y have a distinguishing
suffix z, then in any DFA that accepts L, the strings xz and yz must lead to different states, and
therefore the strings x and y must lead to different states!
For example, let L5 denote the set of all strings over {0, 1} that represent multiples of 5
in binary. Then the strings x = 01 and y = 0011 are distinguished by the suffix z = 01:
xz = 0101 represents the multiple 5, so xz ∈ L5 , but yz = 001101 represents 13, so yz ∉ L5 .
It follows that in every DFA that accepts L5 , the strings 01 and 0011 lead to different states.
Moreover, since neither 01 nor 0011 belong to L5 , every DFA that accepts L5 must have at least
two non-accepting states, and therefore at least three states overall.
A fooling set for a language L is a set F of strings such that every pair of strings in F has a
distinguishing suffix. For example, F = {0, 1, 10, 11, 100} is a fooling set for the language L5 of
binary multiples of 5, because each pair of strings in F has a distinguishing suffix:
• 0 distinguishes 0 and 1;
• 0 distinguishes 0 and 10;
• 0 distinguishes 0 and 11;
• 0 distinguishes 0 and 100;
• 1 distinguishes 1 and 10;
• 01 distinguishes 1 and 11;
• 01 distinguishes 1 and 100;
• 1 distinguishes 10 and 11;
• 1 distinguishes 10 and 100;
• 11 distinguishes 11 and 100.
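All ten checks can be verified mechanically (a Python sketch; in_L5 tests membership in L5 by parsing the string as binary with int(w, 2)):

```python
def in_L5(w: str) -> bool:
    # w is in L5 iff it represents a multiple of 5 in binary
    return int(w, 2) % 5 == 0

def distinguishes(z: str, x: str, y: str) -> bool:
    # z distinguishes x and y iff exactly one of xz, yz lies in L5
    return in_L5(x + z) != in_L5(y + z)

# (suffix, string, string) triples from the list above
PAIRS = [("0", "0", "1"), ("0", "0", "10"), ("0", "0", "11"),
         ("0", "0", "100"), ("1", "1", "10"), ("01", "1", "11"),
         ("01", "1", "100"), ("1", "10", "11"), ("1", "10", "100"),
         ("11", "11", "100")]
```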
Each of these five strings leads to a different state, for any DFA M that accepts L5 . Thus,
every DFA that accepts the language L5 has at least five states. And hey, look, we already have a
DFA for L5 with five states, so that’s the best we can do!
More generally, for any language L, and any fooling set F for L, every DFA that accepts L must
have at least |F | states. In particular, if the fooling set F is infinite, then every DFA that accepts L
must have an infinite number of states. But there’s no such thing as a finite-state machine with
an infinite number of states!
This is arguably both the simplest and most powerful method for proving that a language is
non-regular. Here are a few canonical examples of the fooling-set technique in action.
Proof: Let F denote the set 0∗1, and let x and y be arbitrary distinct strings in F. Then we
must have x = 0^i 1 and y = 0^j 1 for some integers i ≠ j. The suffix z = 10^i distinguishes x
and y, because xz = 0^i 1 1 0^i ∈ L, but yz = 0^j 1 1 0^i ∉ L. We conclude that F is a fooling set for L.
Because F is infinite, L cannot be regular.
Lemma 3.8. The language L = { 0^(2^n) | n ≥ 0 } is not regular.
Proof (F = L): Let x and y be arbitrary distinct strings in L. Then we must have x = 0^(2^i)
and y = 0^(2^j) for some integers i ≠ j. The suffix z = 0^(2^i) distinguishes x and y, because
xz = 0^(2^i + 2^i) = 0^(2^(i+1)) ∈ L, but yz = 0^(2^j + 2^i) ∉ L. We conclude that L itself is a fooling set for L.
Proof (F = 0∗): Let x and y be arbitrary distinct strings in 0∗. Then we must have x = 0^i and
y = 0^j for some integers i ≠ j; without loss of generality, assume i < j. Let k be any positive
integer such that 2^k > j. Consider the suffix z = 0^(2^k − i). We have xz = 0^(i + (2^k − i)) = 0^(2^k) ∈ L, but
yz = 0^(j + (2^k − i)) = 0^(2^k − i + j) ∉ L, because 2^k < 2^k − i + j < 2^(k+1).
Thus, z is a distinguishing suffix for x and y. We conclude that 0∗ is a fooling set for L. Because
0∗ is infinite, L cannot be regular.
Proof (F = 0∗ again): Let x and y be arbitrary distinct strings in 0∗. Then we must have x = 0^i
and y = 0^j for some integers i ≠ j; without loss of generality, assume i < j. Let k be any positive
integer such that 2^(k−1) > j. Consider the suffix z = 0^(2^k − j). We have xz = 0^(i + (2^k − j)) = 0^(2^k − j + i) ∉ L,
because
2^(k−1) < 2^k − j + i < 2^k (recall that i < j < 2^(k−1)).
On the other hand, yz = 0^(j + (2^k − j)) = 0^(2^k) ∈ L. Thus, z is a distinguishing suffix for x and y. We
conclude that 0∗ is a fooling set for L. Because 0∗ is infinite, L cannot be regular.
The previous examples show the flexibility of this proof technique; a single non-regular
language can have many different infinite fooling sets,⁴ and each pair of strings in any fooling
set can have many different distinguishing suffixes. Fortunately, we only have to find one infinite
set F and one distinguishing suffix for each pair of strings in F .
Proof (F = 0∗): Again, we use 0∗ as our fooling set, but the actual argument is somewhat
more complicated than in our earlier examples.
Let x and y be arbitrary distinct strings in 0∗. Then we must have x = 0^i and y = 0^j for
some integers i ≠ j; without loss of generality, assume that i < j. Let p be any prime number
larger than i. Because p + 0(j − i) is prime and p + p(j − i) > p is not, there must be a positive
integer k ≤ p such that p + (k − 1)(j − i) is prime but p + k(j − i) is not. Then I claim that the
suffix z = 0^(p + (k−1)j − ki) distinguishes x and y:
xz = 0^(i + p + (k−1)j − ki) = 0^(p + (k−1)(j−i)) ∈ L, but yz = 0^(j + p + (k−1)j − ki) = 0^(p + k(j−i)) ∉ L.
(Because i < j and i < p, the suffix 0^(p + (k−1)j − ki) = 0^((p−i) + (k−1)(j−i)) has positive length and
therefore actually exists!) We conclude that 0∗ is indeed a fooling set for L, which implies that L
is not regular.
Proof (F = L): Let x and y be arbitrary distinct strings in L. Then we must have x = 0^p and
y = 0^q for some primes p ≠ q; without loss of generality, assume p < q.
Now consider strings of the form 0^(p + k(q−p)). Because p + 0(q − p) is prime and p + p(q − p) > p
is not prime, there must be a non-negative integer k < p such that p + k(q − p) is prime but
p + (k + 1)(q − p) is not prime. I claim that the suffix z = 0^(k(q−p)) distinguishes x and y:
xz = 0^(p + k(q−p)) ∈ L, but yz = 0^(q + k(q−p)) = 0^(p + (k+1)(q−p)) ∉ L.
We conclude that L is a fooling set for itself!! Because L is infinite, L cannot be regular!
Obviously the most difficult part of this technique is coming up with an appropriate fooling
set. Fortunately, most languages L—in particular, almost all languages that students are asked to
prove non-regular on homeworks or exams—fall into one of two categories:
• Some simple regular language like 0∗ or 10∗1 or (01)∗ is a fooling set for L. In particular,
the fooling set is a regular language with one Kleene star and no +.
• The language L itself is a fooling set for L.
The most important point to remember is that you choose the fooling set F , and you can use that
fooling set to effectively impose additional structure on the language L.
ÆÆÆ
I’m not sure yet how to express this effectively, but here is some more intuition about
choosing fooling sets and distinguishing suffixes.
As a sanity check, try to write an algorithm to recognize strings in L, as described at the
start of this note, where the only variable that can take on an unbounded number of values
is the loop index i. (I should probably rewrite that template as a while-loop or tail recursion,
but anyway. . . .) If you succeed, the language is regular. But if you fail, it’s probably because
there are counters or string variables that you can’t get rid of. One of those unavoidable
counters is the basis for your fooling set.
For example, any algorithm that recognizes the language { 0^n 1^n 2^n | n ≥ 0 } “obviously”
has to count 0s and 1s in the input string. (We can avoid counting 2s by decrementing the 0
counter.) Because the 0s come first in the string, this intuition suggests using strings of the
form 0^n as our fooling set and matching strings of the form 1^n 2^n as distinguishing suffixes.
(This is a rare example of an “obvious” fact that is actually true.)
It’s also important to remember that when you choose the fooling set, you can effectively
impose additional structure that isn’t present in the language already. For example, to prove
that the language L = { w ∈ (0 + 1)∗ | #(0, w) = #(1, w) } is not regular, we can use strings of
the form 0^n as our fooling set and matching strings of the form 1^n as distinguishing suffixes,
exactly as we did for { 0^n 1^n | n ≥ 0 }. The fact that L contains strings that start with 1 is
irrelevant. There may be more equivalence classes that our proof doesn’t find, but since we
found an infinite set of equivalence classes, we don’t care.
At some level, this fooling set proof is implicitly considering the simpler language L ∩ 0∗1∗ =
{ 0^n 1^n | n ≥ 0 }. If L were regular, then L ∩ 0∗1∗ would also be regular, because regular
languages are closed under intersection; but we just proved that { 0^n 1^n | n ≥ 0 } is not regular.
⋆3.9 The Myhill-Nerode Theorem
The fooling set technique implies a necessary condition for a language to be accepted by a
DFA—the language must have no infinite fooling sets. In fact, this condition is also sufficient.
The following powerful theorem was first proved by Anil Nerode in 1958, strengthening a 1957
result of John Myhill.⁵ We write x ≡L y if xz ∈ L ⇐⇒ yz ∈ L for all strings z.
The Myhill-Nerode Theorem. For any language L, the following are equal:
(a) the minimum number of states in a DFA that accepts L,
(b) the maximum size of a fooling set for L, and
(c) the number of equivalence classes of ≡L .
In particular, L is accepted by a DFA if and only if every fooling set for L is finite.
For each string w ∈ Σ∗, let [w] denote its equivalence class. We define a DFA M≡ = (Σ, Q, s, A, δ) as follows:
Q := { [w] | w ∈ Σ∗ }
s := [ε]
A := { [w] | w ∈ L }
δ([w], a) := [w • a]
We claim that this DFA accepts the language L; this claim completes the proof of the theorem.
But before we can prove anything about this DFA, we first need to verify that it is actually
well-defined. Let x and y be two strings such that [x] = [y]. By definition of L-equivalence,
for any string z, we have xz ∈ L if and only if yz ∈ L. It immediately follows that for any
symbol a ∈ Σ and any string z′, we have xaz′ ∈ L if and only if yaz′ ∈ L. Thus, by definition of
L-equivalence, we have [xa] = [ya] for every symbol a ∈ Σ. We conclude that the function δ is
indeed well-defined.
An easy inductive proof implies that δ∗([ε], x) = [x] for every string x. Thus, M≡ accepts
string x if and only if [x] = [w] for some string w ∈ L. But if [x] = [w], then by definition
(setting z = ε), we have x ∈ L if and only if w ∈ L. So M≡ accepts x if and only if x ∈ L. In other
words, M≡ accepts L, as claimed, so the proof is complete.
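The construction of M≡ can also be imitated mechanically, at least as a heuristic: approximate the class [w] by w’s acceptance behavior on all suffixes of length at most some cutoff m, and search over class representatives, exactly mirroring the definitions of Q, s, A, and δ above. The following Python sketch is my own (the cutoff m and all helper names are assumptions, and for a non-regular language, or for m chosen too small, the approximation can wrongly merge distinct classes); it recovers the five-state DFA for binary multiples of 5 from Section 3.1:

```python
from itertools import product

def suffixes(alpha, m):
    # All strings over alpha of length at most m, shortest first.
    for k in range(m + 1):
        yield from (''.join(p) for p in product(alpha, repeat=k))

def signature(w, in_lang, alpha, m):
    # Approximate the class [w] by w's behavior on short suffixes.
    return tuple(in_lang(w + z) for z in suffixes(alpha, m))

def build_dfa(in_lang, alpha='01', m=4):
    # Mirror the proof: states are classes [w], s = [eps],
    # A = {[w] : w in L}, and delta([w], a) = [w.a].
    sig = lambda w: signature(w, in_lang, alpha, m)
    reps = {sig(''): ''}            # class signature -> representative
    delta, frontier = {}, ['']
    while frontier:
        w = frontier.pop()
        for a in alpha:
            s = sig(w + a)
            if s not in reps:
                reps[s] = w + a
                frontier.append(w + a)
            delta[w, a] = reps[s]
    states = set(reps.values())
    accept = {w for w in states if in_lang(w)}
    return '', states, accept, delta

# Binary multiples of 5; the empty string has value 0, as in MultipleOf5.
mult5 = lambda w: (int(w, 2) if w else 0) % 5 == 0
start, states, accept, delta = build_dfa(mult5)
```

With m = 4 the signatures separate all five residues mod 5, so the search finds exactly five classes, one per residue.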
⋆3.10 Minimal Automata
Given a DFA M = (Σ, Q, s, A, δ), suppose we want to find another DFA M′ = (Σ, Q′, s′, A′, δ′) with
the fewest possible states that accepts the same language. In this final section, we describe
an efficient algorithm to minimize DFAs, first described (in slightly different form) by Edward
Moore in 1956. We analyze the running time of Moore’s algorithm in terms of two parameters: n = |Q| and
σ = |Σ|.
In the preprocessing phase, we find and remove any states that cannot be reached from the
start state s; this filtering can be performed in O(nσ) time using any graph traversal algorithm.
So from now on we assume that all states are reachable from s.
Now we recursively define two states p and q in the remaining DFA to be distinguishable,
written p ≁ q, if at least one of the following conditions holds:
• p ∈ A and q ∉ A,
• p ∉ A and q ∈ A, or
• δ(p, a) ≁ δ(q, a) for some a ∈ Σ.
Equivalently, p ≁ q if and only if there is a string z such that exactly one of the states δ∗(p, z)
and δ∗(q, z) is accepting. (Sound familiar?) Intuitively, the main algorithm assumes that all
states are equivalent until proven otherwise, and then repeatedly looks for state pairs that can be
proved distinguishable.
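The equivalent characterization suggests a direct (if slower) test: breadth-first search over pairs of states for a string z on which exactly one of δ∗(p, z) and δ∗(q, z) accepts. Here is a minimal Python sketch of that search (the function name is mine; delta is assumed to be a dictionary mapping (state, symbol) pairs to states):

```python
from collections import deque

def distinguishing_string(delta, accept, p, q, alpha='01'):
    # BFS over pairs of states: returns a shortest string z such that
    # exactly one of d*(p, z), d*(q, z) is accepting, or None if p ~ q.
    seen = {(p, q)}
    queue = deque([(p, q, '')])
    while queue:
        u, v, z = queue.popleft()
        if (u in accept) != (v in accept):
            return z
        for a in alpha:
            pair = (delta[u, a], delta[v, a])
            if pair not in seen:
                seen.add(pair)
                queue.append((pair[0], pair[1], z + a))
    return None

# The three-state DFA for "contains the substring 11":
delta = {(0, '0'): 0, (0, '1'): 1,
         (1, '0'): 0, (1, '1'): 2,
         (2, '0'): 2, (2, '1'): 2}
accept = {2}
```

In this example, states 0 and 1 are distinguished by the single-symbol string 1, and the empty string already distinguishes any accepting state from any non-accepting state.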
The main algorithm maintains a two-dimensional table, indexed by pairs of states, where
Dist[p, q] = True indicates that we have proved states p and q are distinguishable. Initially, for all
states p and q, we set Dist[p, q] ← True if p ∈ A and q ∉ A or vice versa, and Dist[p, q] ← False
otherwise. Then we repeatedly consider each pair of states and each symbol to find more
distinguishable pairs, until we make a complete pass through the table without modifying it. The
table-filling algorithm can be summarized as follows:
MinDFATable(Σ, Q, s, A, δ):
for all p ∈ Q
for all q ∈ Q
if (p ∈ A and q ∉ A) or (p ∉ A and q ∈ A)
Dist[p, q] ← True
else
Dist[p, q] ← False
notdone ← True
while notdone
notdone ← False
for all p ∈ Q
for all q ∈ Q
if Dist[p, q] = False
for all a ∈ Σ
if Dist[δ(p, a), δ(q, a)]
Dist[p, q] ← True
notdone ← True
return Dist
The algorithm must eventually halt, because there are only a finite number of entries in the
table that can be marked. In fact, the main loop is guaranteed to terminate after at most n
iterations, which implies that the entire algorithm runs in O(σn³) time. Once the table is filled,⁶
any two states p and q such that Dist[p, q] = False are equivalent and can be merged into a
single state. The remaining details of constructing the minimized DFA are straightforward.
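Putting the pieces together, here is one way the whole procedure might look in Python. This is a sketch following the pseudocode above, not the author’s code; it assumes every state is reachable from the start state and that states are orderable, so each equivalence class can be named by its smallest member:

```python
def minimize(states, alpha, delta, start, accept):
    """Moore's table-filling minimization.

    Assumes all states are reachable from `start` and are orderable.
    `delta` maps (state, symbol) pairs to states."""
    # Initialize: mark exactly the (accepting, non-accepting) pairs.
    dist = {(p, q): (p in accept) != (q in accept)
            for p in states for q in states}
    # Repeat until a full pass makes no change.
    changed = True
    while changed:
        changed = False
        for p in states:
            for q in states:
                if not dist[p, q] and any(
                        dist[delta[p, a], delta[q, a]] for a in alpha):
                    dist[p, q] = True
                    changed = True
    # Merge unmarked pairs: each state joins the class of its smallest
    # equivalent state.
    rep = {p: min(q for q in states if not dist[p, q]) for p in states}
    new_states = set(rep.values())
    new_delta = {(p, a): rep[delta[p, a]]
                 for p in new_states for a in alpha}
    new_accept = {rep[p] for p in accept}
    return new_states, new_delta, rep[start], new_accept
```

For example, feeding in a four-state DFA for “contains the substring 11” in which one state is a redundant clone of another collapses it to the familiar three-state DFA.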
ÆÆÆ Need to prove that the main loop terminates in at most n iterations.
With more care, Moore’s minimization algorithm can be modified to run in O(σn²) time. A
faster DFA minimization algorithm, due to John Hopcroft, runs in O(σn log n) time.
Example
To get a better idea how this algorithm works, let’s visualize its execution on our earlier brute-force
DFA for strings containing the substring 11. This DFA has four unreachable states: (False, 11),
(True, ε), (True, 0), and (True, 1). We remove these states, and relabel the remaining states for
easier reference. (In an actual implementation, the states would almost certainly be represented
by indices into an array anyway, not by mnemonic labels.)
The main algorithm initializes (the bottom half of) a 10×10 table as follows. (In the following
figures, cells marked × have value True and blank cells have value False.)
⁶More experienced readers should be enraged by the mere suggestion that any algorithm merely fills in a table, as
opposed to evaluating a recurrence. This algorithm is no exception. Consider the boolean function Dist(p, q, k), which
equals True if and only if p and q can be distinguished by some string of length at most k. This function obeys the
following recurrence:
Dist(p, q, k) = (p ∈ A) ⊕ (q ∈ A)   if k = 0
Dist(p, q, k) = Dist(p, q, k − 1) ∨ ⋁a∈Σ Dist(δ(p, a), δ(q, a), k − 1)   otherwise.
Moore’s “table-filling” algorithm is just a space-efficient dynamic programming algorithm to evaluate this recurrence.
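Read as a recurrence, this is easy to transcribe directly. A memoized Python sketch (the wrapper name make_dist is mine, not from the text):

```python
from functools import lru_cache

def make_dist(delta, accept, alpha='01'):
    # delta maps (state, symbol) pairs to states.
    @lru_cache(maxsize=None)
    def dist(p, q, k):
        # True iff some string of length at most k distinguishes p from q.
        if k == 0:
            return (p in accept) != (q in accept)
        return dist(p, q, k - 1) or any(
            dist(delta[p, a], delta[q, a], k - 1) for a in alpha)
    return dist
```

Evaluating dist(p, q, k) for large enough k then agrees with the final contents of Moore’s table.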
[Figure: Our brute-force DFA for strings containing the substring 11, after removing all four unreachable states.]
    0  1  2  3  4  5  6  7  8
1
2
3
4
5
6   ×  ×  ×  ×  ×  ×
7   ×  ×  ×  ×  ×  ×
8   ×  ×  ×  ×  ×  ×
9   ×  ×  ×  ×  ×  ×
In the first iteration of the main loop, the algorithm discovers several distinguishable pairs
of states. For example, the algorithm sets Dist[0, 2] ← True because Dist[δ(0, 1), δ(2, 1)] =
Dist[2, 9] = True. After the iteration ends, the table looks like this:
    0  1  2  3  4  5  6  7  8
1
2   ×  ×
3         ×
4   ×  ×     ×
5         ×     ×
6   ×  ×  ×  ×  ×  ×
7   ×  ×  ×  ×  ×  ×
8   ×  ×  ×  ×  ×  ×
9   ×  ×  ×  ×  ×  ×
The second iteration of the while loop makes no further changes to the table (we got lucky!), so
the algorithm terminates.
The final table implies that the 10 states of our DFA fall into exactly three equivalence classes:
{0, 1, 3, 5}, {2, 4}, and {6, 7, 8, 9}. Replacing each equivalence class with a single state gives us
the three-state DFA that we already discovered.
Exercises
1. For each of the following languages in {0, 1}∗ , describe a deterministic finite-state machine
that accepts that language. There are infinitely many correct answers for each language.
“Describe” does not necessarily mean “draw”.
[Figure: Equivalence classes of states in our DFA, and the resulting minimal equivalent DFA.]
⋆(t) Strings w such that F#(10,w) mod 10 = 4, where #(10, w) denotes the number of
times 10 appears as a substring of w, and as usual Fn is the nth Fibonacci number:
    Fn = 0               if n = 0
    Fn = 1               if n = 1
    Fn = Fn−1 + Fn−2     otherwise
Æ(u) Strings w such that F#(1···0,w) mod 10 = 4, where #(1 · · · 0, w) denotes the number of
times 10 appears as a subsequence of w, and as usual Fn is the nth Fibonacci number:
    Fn = 0               if n = 0
    Fn = 1               if n = 1
    Fn = Fn−1 + Fn−2     otherwise
(v) Strings of the form w₁#w₂# · · · #wₙ for some n ≥ 2, where wᵢ ∈ {0, 1}∗ for every
index i, and wᵢ = wⱼ for some indices i ≠ j.
(w) The set of all palindromes in (0 + 1)∗ whose length is divisible by 7.
(x) {w ∈ (0 + 1)∗ | w is the binary representation of a perfect square}
Æ(y) {w ∈ (0 + 1)∗ | w is the binary representation of a prime number}
4. For each of the following languages over the alphabet Σ = {0, 1}, either prove that the
language is regular (by constructing an appropriate DFA or regular expression) or prove
that the language is not regular (using fooling sets). Recall that Σ+ denotes the set of all
nonempty strings over Σ. [Hint: Believe it or not, most of these languages are actually
regular.]
(a) {0ⁿw1ⁿ | w ∈ Σ∗ and n ≥ 0}
(b) {0ⁿ1ⁿw | w ∈ Σ∗ and n ≥ 0}
(c) {w0ⁿ1ⁿx | w, x ∈ Σ∗ and n ≥ 0}
(d) {0ⁿw1ⁿx | w, x ∈ Σ∗ and n ≥ 0}
(e) {0ⁿw1x0ⁿ | w, x ∈ Σ∗ and n ≥ 0}
(i) {wxw | w, x ∈ Σ⁺}
(j) {wxwᴿ | w, x ∈ Σ⁺}
(k) {wwx | w, x ∈ Σ⁺}
(l) {wwᴿx | w, x ∈ Σ⁺}
(m) {wxwy | w, x, y ∈ Σ⁺}
(n) {wxwᴿy | w, x, y ∈ Σ⁺}
(o) {xwwy | w, x, y ∈ Σ⁺}
(p) {xwwᴿy | w, x, y ∈ Σ⁺}
(q) {wxxw | w, x ∈ Σ⁺}
⋆(r) {wxwᴿx | w, x ∈ Σ⁺}
(a) Suppose for any two distinct strings x, y ∈ F, there is a string w ∈ Σ∗ such that
wx ∈ L and wy ∉ L. (We can reasonably call w a distinguishing prefix for x and y.)
Prove that L cannot be regular. [Hint: The reversal of a regular language is regular.]
⋆(b) Suppose for any two distinct strings x, y ∈ F, there are two (possibly equal) strings
w, z ∈ Σ∗ such that wxz ∈ L and wyz ∉ L. Prove that L cannot be regular.