Book
CS, UIUC
Department of Computer Science; University of Illinois; 201 N. Goodwin Avenue; Urbana, IL, 61801, USA;
sariel@uiuc.edu; http://www.uiuc.edu/~sariel/.
Contents
Contents 2
Preface 12
Preface 14
I Lectures 15
1 Lecture 1: Overview and Administrivia 17
1.1 Course overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.2 Necessary Administrivia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4 Lecture 4: Regular Expressions and Product Construction 34
4.1 Product Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.1.1 Product Construction: Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.2 Product Construction: Formal construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.3 Operations on languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.4 Regular Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.4.1 More interesting examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
9.3.5 A note on finite languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
9.4 Irregularity via closure properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
9.4.1 Being careful in using closure arguments . . . . . . . . . . . . . . . . . . . . . . . . . . 70
15 Leftover: CFG to PDA, and Alternative proof of CNF effectiveness 101
15.1 PDA– Pushing multiple symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
15.2 CFG to PDA conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
15.3 Alternative proof of CNF effectiveness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
22 Lecture 18: More on Turing Machines 136
22.1 A Turing machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
22.2 Turing machine configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
22.3 The languages recognized by Turing machines . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
22.4 Variations on Turing Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
22.4.1 Doubly infinite tape . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
22.4.2 Allow the head to stay in the same place . . . . . . . . . . . . . . . . . . . . . . . . . 139
22.4.3 Non-determinism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
22.4.4 Multi-tape . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
22.5 Multiple tapes do not add any power . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
24 Lecture 20: More decidable problems, and simulating TM and “real” computers 147
24.1 Review: decidability facts for regular languages . . . . . . . . . . . . . . . . . . . . . . . . . . 147
24.2 Problems involving context-free languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
24.2.1 Context-free languages are TM decidable . . . . . . . . . . . . . . . . . . . . . . . . . 148
24.2.2 Is a word in a CFG? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
24.2.3 Is a CFG empty? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
24.2.4 Undecidable problems for CFGs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
24.3 Simulating a real computer with a Turing machine . . . . . . . . . . . . . . . . . . . . . . . . 149
24.4 Turing machine simulating a Turing machine . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
24.4.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
24.4.2 The universal Turing machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
27 Lecture 23: Rice Theorem and Turing machine behavior properties 163
27.1 Outline & Previous lecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
27.1.1 Forward outline of lectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
27.1.2 Recap of previous class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
27.2 Rice’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
27.2.1 Another Example - The language L3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
27.2.2 Rice’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
27.3 TM decidability by behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
27.3.1 TM behavior properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
27.3.2 A decidable behavior property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
27.3.3 An undecidable behavior property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
27.4 More examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
27.4.1 The language LUIUC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
27.4.2 The language Halt_Empty_TM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
27.4.3 The language L111 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
29 Lecture 25: Linear Bounded Automata and Undecidability for CFGs 173
29.1 Linear bounded automatas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
29.1.1 LBA halting is decidable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
29.1.2 LBAs with empty language are undecidable . . . . . . . . . . . . . . . . . . . . . . . . 174
29.2 On undecidable problems for context free grammars . . . . . . . . . . . . . . . . . . . . . . . 177
29.2.1 TM consecutive configuration pairs is a CFG . . . . . . . . . . . . . . . . . . . . . . . . 177
29.2.2 The language of a context-free grammar generates all strings is undecidable . . . . . . 178
29.2.3 CFG equivalence is undecidable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
29.3 Avoiding PDAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
32 Review of topics covered 198
32.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
32.2 The players . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
32.3 Regular languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
32.4 Context-free Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
32.5 Turing machines and computability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
32.5.1 Turing machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
32.5.2 Reductions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
32.5.3 Other undecidability problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
32.6 Summary of closure properties and decision problems . . . . . . . . . . . . . . . . . . . . . . . 204
32.6.1 Closure Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
32.6.2 Decision problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
II Discussions 206
37 Discussion 5: More on non-deterministic finite automatas 225
37.1 Non-regular languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
37.1.1 L(0^n 1^n) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
37.1.2 L(#a + #b = #c) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
37.1.3 Not too many a's please . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
37.1.4 A Trick Example (Optional) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
47 Discussion 15: Review 246
47.1 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
IV Homeworks 299
50 Spring 2009 301
50.1 Homework 1: Problem Set 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
50.2 Homework 2: DFAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
50.3 Homework 3: DFAs II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
50.4 Homework 4: NFAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
50.5 Homework 5: On non-regularity. . . . . . . . . . . . . . . . . . . . . . . . 309
50.6 Homework 6: Context-free grammars. . . . . . . . . . . . . . . . . . . 311
50.7 Homework 7: Context-free grammars II . . . . . . . . . . . . . . . . 313
50.8 Homework 8: Recursive Automatas . . . . . . . . . . . . . . . . . . . . 315
50.9 Homework 9: Turing Machines . . . . . . . . . . . . . . . . . . . . . . . . 316
50.10 Homework 10: Turing Machines II . . . . . . . . . . . . . . . . . . . . . 316
50.11 Homework 11: Enumerators and Diagonalization . . . . . . . . 318
50.12 Homework 12: Preparation for Final . . . . . . . . . . . . . . . . . . . 320
51.10 Homework 10: Turing Machines . . . . . . . . . . . . . . . . . . . . . . . 334
51.11 Homework 11: Turing Machines . . . . . . . . . . . . . . . . . . . . . . . 335
51.12 Homework 12: Enumerators . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
51.13 Homework 13: Enumerators . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
51.14 Homework 14: Dovetailing, etc. . . . . . . . . . . . . . . . . . . . . . . . . 339
Bibliography 341
Index 342
Preface – Spring 2009
Finally: It was stated at the outset, that this system would not be here, and at once, perfected. You cannot but
plainly see that I have kept my word. But I now leave my cetological System standing thus unfinished, even as
the great Cathedral of Cologne was left, with the crane still standing upon the top of the uncompleted tower.
For small erections may be finished by their first architects; grand ones, true ones, ever leave the copestone to
posterity. God keep me from ever completing anything. This whole book is but a draft - nay, but the draft of a
draft. Oh, Time, Strength, Cash, and Patience!
– Moby Dick, Herman Melville
This manuscript is a collection of class notes used in teaching CS 373 (Theory of Computation), in the
spring of 2009, in the Computer Science department at UIUC. The instructors were Sariel Har-Peled and
Madhusudan Parthasarathy. The notes are based on older class notes; see the second preface for details.
These class notes diverge from previous semesters on two main points:
(A) Regular languages pumping lemma. Although we still taught the pumping lemma for regular
languages, we did not expect the students to use it to prove that languages are not regular. Instead, we
provided direct proofs showing that any automaton for these languages would require an infinite number
of states. This leads to much simpler proofs than using the pumping lemma, and it seems the students
find them easier to understand. Naturally, we are not the first to come up with this idea; it is sometimes
referred to as the "technique of many states".
The main problem with the pumping lemma is the large number of quantifiers involved in stating it.
They seem to make it harder for the students to use it.
(B) Recursive automata. Instead of teaching PDAs, we used an alternative machine model of recursive
automata (RA) for context-free languages. RAs are PDAs that do not manipulate the stack directly,
but only through the calling stack. For a discussion of this issue, see Chapter 20 (page 127).
This led to various changes later in the course. In particular, the fact that the intersection of a context-free
language and a regular language is still context-free is proven directly on the grammar. Similarly, the proof
that deciding if a grammar generates all words is undecidable now follows by a simpler but different proof;
see the relevant portion for details.
In particular, the alternative proof uses the fact that given two configurations of a TM written on top
of each other, a DFA can verify that the top configuration yields the bottom configuration. This
is a cute observation that seems to be worthy of describing in class, and it leads naturally into the
proof of the Cook-Levin theorem.
What remains to be done. Students suggested that more examples would be useful to have in the class
notes. In future instances of the class it would be a good idea to allocate two lectures towards the end to
teach the Cook-Levin Theorem properly. A more algorithmic emphasis might be a good preparation for
later courses.
And of course, no class notes are perfect. These class notes can definitely be further improved.
Format. Every chapter corresponds to material covered in one lecture in the course. Every week we also
had a discussion section run by the TAs. The TAs also wrote (most of) the notes included for the discussion
section.
Acknowledgements
The chapters on recursive automata and the review chapter were written by Madhusudan Parthasarathy.
The TAs in the class (Reza Zamani, Aparna Sundar, and Micha Hadosh) provided considerable help in
writing the exercises, and their solutions, and we thank them for their valuable work.
For further acknowledgements, see the older preface.
Copyright
This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 License. To view a
copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/ or send a letter to Creative
Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
Sariel Har-Peled
May 2009, Urbana, IL. USA.
Preface - Spring 2008
This manuscript is a collection of class notes used in teaching CS 273 (Introduction to the Theory of
Computation), in the spring of 2008, in the Computer Science department at UIUC. The instructors were
Margaret Fleck and Sariel Har-Peled.
These class notes are an initial effort to have class notes covering the material taught in this class, and
they are largely based on handwritten class notes from previous semesters, and the book used in the class
(Sipser [Sip05]).
Quality. We do not consider these class notes to be perfect; in fact, they are far from it. However, it is our
hope that people will improve these class notes in succeeding semesters, and bring them to acceptable
quality. From previous experience, it takes 3–4 iterations before class notes reach acceptable quality.
Even getting the class notes to their current form required a non-trivial amount of work.
Format. Every chapter corresponds to material covered in one lecture in the course. Every week we also
had a discussion section run by the TAs. The TAs also wrote (most of) the notes included for the discussion
section.
Why? We have no complaints about the book, but rather we prefer the form of class notes over the form
of a book. Writing class notes is also an effective (if somewhat time consuming) way to prepare for lectures.
And, as usual, at some points we preferred to present some material in our own way.
Ultimately, we hope that after several semesters of polishing these class notes they will be good enough
to replace the required textbook in the class.
Acknowledgements
We had the benefit of interacting with several people on the work on these class notes. Other instructors
that taught this class and contributed (directly or indirectly) to the material covered in it are
Chandra Chekuri, Madhusudan Parthasarathy, Lenny Pitt, and Mahesh Viswanathan.
In addition, the TAs in the class (Reza Zamani, James Lai, and Raman Sharykin) provided considerable
help in writing the exercises, and their solutions, and we thank them for their valuable work.
We would also like to thank the students in the class for their input, which helped in discovering numerous
typos and errors in the manuscript.
Part I
Lectures
Chapter 1

Lecture 1: Overview and Administrivia

1.1 Course overview
1. Theory of Computation.
• Build formal mathematical models of computation.
• Analyze the inherent capabilities and limitations of these models.
2. Course goals:
• Simple practical tools you can use in later courses, projects, etc. The course will provide you with
tools to model complicated systems and analyze them.
• Inherent limits of computers: problems that no computer can solve.
• Better fluency with formal mathematics (closely related to skill at debugging programs).
3. What is computable?
(a) check if a number n is prime
(b) compute the product of two numbers
(c) sort a list of numbers
(d) find the maximum number from a list
4. Computability, complexity, automata.
5. Example:
input n;
assume n>1;
while (n !=1) {
if (n is even)
n := n/2;
else
n := 3*n+1;
}
Does this program always stop? Not known.
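This is the famous 3n + 1 process. A direct transcription into Python, for experimenting (a sketch; the function name and the test range are ours):

    def collatz_stops(n):
        # Run the 3n+1 process; returns True if it reaches 1.
        assert n > 1
        while n != 1:
            if n % 2 == 0:
                n = n // 2
            else:
                n = 3 * n + 1
        return True  # reached only if the loop terminates

    # Conjectured (the Collatz conjecture), but not known, to hold for all n > 1:
    print(all(collatz_stops(n) for n in range(2, 1000)))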
Regular languages
9. Difference in scope
Illustrate with your favorite example from programming languages or natural language.
Illustrate with your favorite simple state machine, e.g. a vending machine.
11. Decidability:
• David Hilbert (1920’s) tries to formalize all of math and prove it correct
• Kurt Gödel (1931) shows that one can not prove consistency of a mathematical formalism having
non-trivial power.
14. It is mysterious and “cool” that some simple-looking problems are undecidable.
15. The proofs of undecidability are a bit abstract. Earlier parts of the course will help prepare you, so you
can understand the last part.
http://www.cs.uiuc.edu/class/sp09/cs373/.
• Prerequisites: CS 125, CS 173, CS 225 (or equivalents). Other experience can sometimes substitute
(e.g. advanced math). Speak to us if you are not sure.
• Vital to join the class newsgroup (details on web page). Carries important announcements, e.g. exam
times, hints and corrections on homeworks.
Especially see the Lectures page for schedule of topics, readings, quiz/exam dates.
• Homework 1 should be available on the class website. Due next Thursday. (Normally they will be
due Thursdays at 12:30, but next Monday is a holiday.) Browse chapter 0 and read section 1.1.
Normally, homeworks and readings will not be announced in class and you must watch the website and
newsgroups.
• Read and follow the homework format guidelines on the web page. Especially: each problem on a
separate sheet, your name on each problem, your section time (e.g. 10) in the upper-right corner. This
makes a big difference when grading and sorting graded homeworks.
• Course staff.
• Discussion sections. Office hours will be posted in the near future. Email and the newsgroup are always
an option. Please do not be shy about contacting us.
• Problem sets, exams, etc are common to all sections. It may be easier to start with your lecture and
discussion section instructors, but feel free to also talk to the rest of us.
• Sipser textbook: get a copy. We follow the textbook fairly closely. Our lecture notes only outline what
was covered and don’t duplicate the text. Used copies, international or first editions, etc are available
cheap through Amazon.
• Graded work:
(d) 25%: Homeworks and self-evaluations.
The worst homework will be dropped.
Self-evaluations will be online quizzes on the web.
(e) 5%: Attending discussion section.
• Late homeworks are not accepted, except in rare cases where you have a major excuse (e.g. serious
illness, family emergency, weather unsafe for travel).
• Homeworks can be done in groups of ≤ 3 students. Write their names under your own on your
homework. Also document any other major help you may have gotten. Each person turns in their own
write-up IN THEIR OWN WORDS.
• Doing homeworks is vital preparation for success on the exams. Getting help from your partners is
good, but don’t copy their solutions blindly. Make sure you understand the solutions.
• See the web pages for details of our cheating policy. First offense → zero on the exam or assignment
involved. Second offense or cheating on the final ⇒ fail the course. Please do not cheat.
• If you are not sure what is allowed, talk to us and/or document clearly what you did. That is enough
to ensure it is not “cheating” (though you might lose points).
• Bugs happen, on homeworks and even in the textbook and on exams. If you think you see a bug,
please bring it to our attention.
• Please tell us if you have any disabilities or other special circumstances that we should be aware of.
Chapter 2
This lecture covers material on strings and languages from Sipser chapter 0. Also chapter 1 up to (but
not including) the formal definition of computation (i.e. pages 31–40).
2.1.2 Strings
This section should be recapping stuff already seen in discussion section 1.
A string over an alphabet Σ is a finite sequence of characters from Σ.
Some sample strings with alphabet (say) Σ = {a, b, c} are abc, baba, and aaaabbbbccc.
The length of a string x is the number of characters in x, and it is denoted by |x|. Thus, the length of
the string w = abcdef is |w| = 6.
The empty string is denoted by ε, and it (of course) has length 0. The empty string is the string
containing zero characters in it.
The concatenation of two strings x and w is denoted by xw, and it is the string formed by the string
x followed by the string w. As a concrete example, consider x = cat, w = nip and the concatenated strings
xw = catnip and wx = nipcat.
Naturally, concatenating with the empty string results in no change in the string. Formally, for any string
x, we have that εx = xε = x. As such, εε = ε.
For a string w, the string x is a substring of w if the string x appears contiguously in w. As such, for
w = abcdef, we have that bcd is a substring of w, but ace is not a substring of w.
A string x is a suffix of w if it is a substring of w appearing at the end of w. Similarly, y is a prefix of
w if y is a substring of w appearing at the beginning of w.
Definition 2.1.1 The string x is a prefix of a string w, if there exists a string z, such that w = xz.
Similarly, x is a substring of w if there exist strings y and z such that w = yxz.
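As a quick sanity check of these definitions, strings over Σ behave exactly like Python strings (an illustrative sketch; the sample words are ours):

    w = "abcdef"
    print(len(w))                # |w| = 6
    print("cat" + "nip")         # concatenation: catnip
    print("bcd" in w)            # substring test: True
    print(w.startswith("abc"))   # prefix test: True, since abcdef = abc . def
    print(w.endswith("def"))     # suffix test: True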
2.1.3 Languages
A language is a set of strings. One special language is Σ∗ , which is the set of all possible strings generated
over the alphabet Σ. For example, if Σ = {0, 1}, then
Σ∗ = {ε, 0, 1, 00, 01, 10, 11, 000, 001, . . .} .
Namely, Σ∗ is the "full" language made of characters of Σ. Naturally, any language over Σ is going to be
a subset of Σ∗ .
Lexicographic ordering of a set of strings is an ordering that puts shorter strings first, and sorts the
strings alphabetically within each length. Naturally, we assume that we have an order on the given
alphabet.
Thus, for Σ = {a, b}, the lexicographic ordering of Σ∗ is
ε, a, b, aa, ab, ba, bb, aaa, aab, . . . .
Consider the language L1 = { w ∈ {a, b}∗ | |w| is even } . In words, L1 is the language of all strings made
out of a, b that have even length.
Next, consider the following set:
L2 = { x | there is a w such that xw = illinois } .
So L2 is the language made out of all prefixes of illinois. We can write L2 explicitly, but it is tedious.
Indeed,
L2 = {ε, i, il, ill, illi, illin, illino, illinoi, illinois} .
Why should we care about languages?
Consider the language Lprimes that contains all strings over Σ = {0, 1, . . . , 9} which are prime numbers. If
we can build a fast computer program (or an automaton) that can tell us whether a string s (i.e., a number)
is in Lprimes , then we can decide if a number is prime or not. And this is a very useful program to have, since
most encryption schemes currently used by computers (e.g., RSA) rely on the ability to find very large prime
numbers.
Let us state it explicitly: The ability to decide if a word is in a specific language (like Lprimes ) is
equivalent to performing a computational task (which might be extremely non-trivial). You can think
about this schematically, as a program that gets as input a number (i.e., a string made out of digits), and
decides if it is prime or not. If the input is a prime number, it outputs Yes, and otherwise it outputs No.
See the figure on the right.
[Figure: a program deciding if the input is a prime number, with outputs Yes and No.]
[State diagram: a three-state DFA with states q0 , q1 , and qrej ; here ∗ represents any possible character.]
Notice key pieces of this machine: three states, q0 is the start state (arrow coming in), q1 is the final
state (double circle), transition arcs.
To run the machine, we start at the start state. On each input character, we follow the corresponding
arc. When we run out of input characters, we answer “yes” or “no”, depending on whether we are in the final
state.
The language of a machine M is the set of strings it accepts, written L(M ). In this case L(M ) =
{a, aa, ab, aaa, . . .}.
1 Here, we are considering simple programs that just read some input, and print out output, without fancy windows and stuff
like that.
2.2.2 Another automata
(This section is optional and can be skipped in the lecture.)
Here is a simple state machine (i.e., finite automaton) M that accepts all ASCII strings ending with
ing.
[State diagram: q0 →i q1 →n q2 →g q3 , with a self-loop on q0 for any character.]
Notice key pieces of this machine: four states, q0 is the start state (arrow coming in), q3 is the final state
(double circle), transition arcs.
To run the machine, we start at the start state. On each input character, we follow the corresponding
arc. When we run out of input characters, we answer “yes” or “no”, depending on whether we are in the final
state.
The language of a machine M is the set of strings it accepts, written L(M ). In this case L(M ) =
{walking, flying, ing, . . .}.
[State diagram: a legal fragment, where q0 has a transition on a to q1 and a transition on b to q2 .]
Both of the following are bad, where q1 ≠ q2 and the right-hand machine has no outgoing transition for
the input character b.
[State diagrams: on the left, q0 has two outgoing transitions on a (to q1 and to q2 ); on the right, q0 has
an outgoing transition on a only.]
[State diagrams: a DFA accepting strings ending in ing, with failure transitions labelled "not i",
"not i or n", and "not i or g"; a two-state DFA (q0 , q1 ) accepting strings of 0's of even length; a
three-state DFA (q0 , q1 , q2 ) accepting strings of 0's whose length is divisible by 3; and a six-state DFA
(q0 , . . . , q5 ) accepting strings of 0's whose length is divisible by 6.]
This example is especially interesting, because we can achieve the same purpose, by observing that
n mod 6 = 0 if and only if n mod 2 = 0 and n mod 3 = 0 (i.e., to be divisible by 6, a number has to be
divisible by 2 and divisible by 3 [a generalization of this idea is known as the Chinese remainder theorem]).
So, we could run the two automata of Section 2.3.1 and Section 2.3.2 in parallel (feeding each input
character to each of the two automata), and accept only if both automata are in an accept state.
This idea will become more useful later in the course, as it provides a building operation for constructing
complicated automata from simple automata.
2.3.4 Number of ones is even
Input is a string over Σ = {0, 1}.
Accept: all strings in which the number of ones is even.
[State diagram: states q0 (accepting) and q1 , with transitions on 1 between them and self-loops on 0.]
2.3.5 Number of zero and ones is always within two of each other
Input is a string over Σ = {0, 1}.
Accept: all strings in which the difference between the number of ones and zeros in any prefix of the
string is in the range −2, . . . , 2. For example, the language contains ε, 0, 001, and 1101. The language can
even contain an extended run of one character, e.g. 001111, but it depends on what preceded it. So 111100
is not in the language.
[State diagram: states q−2 , q−1 , q0 , q1 , q2 , with 1-transitions moving right and 0-transitions moving left,
and a reject state qrej entered when the difference leaves the range −2, . . . , 2.]
Notice that the names of the states reflect their role in the computation. When you come to analyze
these machines formally, good names for states often makes your life much easier. BTW, the language of
this DFA is
L(M ) = { w | w ∈ {0, 1}∗ and for every x that is a prefix of w, |#1(x) − #0(x)| ≤ 2 } .
[State diagram: a similar DFA with states named A, B, C, D, and a reject state qrej .]
You can name states anything you want. Names of the form qX are often convenient, because they remind
you that the name denotes a state. And people often make the initial state q0 . But this is not obligatory.
2.4 The pieces of a DFA
To specify a DFA (deterministic finite automaton), we need to describe
– a (finite) alphabet
– a (finite) set of states
Chapter 3
The DFA that accepts nothing is just a single (non-accepting) state S with a self-loop on a, b.
(v) what is the transition from each state, on each input character?
Formally, a deterministic finite automaton is a 5-tuple (Q, Σ, δ, q0 , F ) where
For example, let Σ = {a, b} and consider the following DFA M , whose language L(M ) contains strings
consisting of one or more a’s followed by one or more b’s.
[State diagram: q0 →a q1 →b q2 , with a self-loop on a at q1 and on b at q2 ; q2 is accepting, and all other
transitions go to qrej , which loops on a, b.]
Then M = (Q, Σ, δ, q0 , F ), Q = {q0 , q1 , q2 , qrej }, and F = {q2 }. The transition function δ is defined by
δ a b
q0 q1 qrej
q1 q1 q2
q2 qrej q2
qrej qrej qrej
Alternatively, the transition function can be specified by formulas:
δ(q0 , a) = q1
δ(q1 , a) = q1
δ(q1 , b) = q2
δ(q2 , b) = q2
δ(q, t) = qrej for all other values of q and t.
Tables and state diagrams are most useful for small automata. Formulas are helpful for summarizing a
group of transitions that fit a common pattern. They are also helpful for describing algorithms that modify
automata.
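For instance, here is a minimal sketch, in Python, of the example DFA above, with the transition table stored as a dictionary (the encoding and names are ours):

    delta = {
        ("q0", "a"): "q1",     ("q0", "b"): "qrej",
        ("q1", "a"): "q1",     ("q1", "b"): "q2",
        ("q2", "a"): "qrej",   ("q2", "b"): "q2",
        ("qrej", "a"): "qrej", ("qrej", "b"): "qrej",
    }
    accepting = {"q2"}

    def accepts(w):
        state = "q0"
        for c in w:                  # follow one transition per input character
            state = delta[(state, c)]
        return state in accepting

    print(accepts("aabb"))   # True: one or more a's followed by one or more b's
    print(accepts("ba"))     # False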
The DFA M accepts a word w = w1 w2 . . . wk if there is a sequence of states r0 , r1 , . . . , rk such that
1. r0 = q0 ,
2. ri = δ(ri−1 , wi ), for i = 1, . . . , k, and
3. rk ∈ F .
The language recognized by M , denoted by L(M ), is the set { w | M accepts w } .
For example, when our automaton above accepts the string aabb, it uses the state sequence q0 q1 q1 q2 q2 .
(Draw a picture of the transitions.) That is r0 = q0 , r1 = q1 , r2 = q1 , r3 = q2 , and r4 = q2 .
Note that the states do not have to occur in numerical order in this sequence, e.g. the following DFA
accepts aaa using the state sequence q0 q1 q0 q1 .
[State diagram: states q0 and q1 (accepting), with a-transitions in both directions.]
A language (i.e. set of strings) is regular if it is recognized by some DFA.
Consider the set of odd integers. If we multiply two odd integers, the answer is always odd. So the set of
odd integers is said to be closed under multiplication. But it is not closed under addition. For example,
3 + 5 = 8 which is not odd.
To talk about closure, you need two sets: a larger universe U and a smaller set X ⊆ U . The universe
is often supposed to be understood from context. Suppose you have a function F that maps values in U to
values in U . Then X is closed under F if F applied to values from X always produces an output value
that is also in X.
For automata theory, U is usually the set of all languages and X contains languages recognized by some
specific sort of machine, e.g. regular languages.
Here we are interested in the question of whether the regular languages are closed under set complement.
(The complement language keeps the same alphabet.) That is, if we have a DFA M = (Q, Σ, δ, q0 , F )
accepting some language L, can we construct a new DFA M ′ accepting the complement language L̄ = Σ∗ \ L?
Consider the automaton M from above, where L is the set of all strings of at least one a followed by at
least one b.
[State diagram: the DFA M from above, with accepting state q2 .]
The complement language L̄ contains the empty string, strings in which some b's precede some a's, and
strings that contain only a's or only b's.
Our new DFA M 0 should accept exactly those strings that M rejects. So we can make M 0 by swapping
final/non-final markings on the states:
[State diagram: the same DFA with the final/non-final markings swapped: q0 , q1 , and qrej are now
accepting, and q2 is not.]
Formally, M ′ = (Q, Σ, δ, q0 , Q \ F ).
[State diagrams: M1 , with states q0 , q1 , and M2 , with states q0 , q1 , q2 ; all transitions are on a.]
Assume that we would like to build an automaton that accepts the intersection of the languages of both
automata. That is, we would like to accept the language L(M1 ) ∩ L(M2 ). How do we build an automaton
for that?
The idea is to build the product automaton of the two automata. See the following example.
[State diagram: the product automaton, with states (qi , pj ) for qi ∈ {q0 , q1 } and pj ∈ {p0 , p1 , p2 }, and
transitions on a.]
Given two automata M = (Q, Σ, δ, q0 , F ) and M ′ = (Q′ , Σ′ , δ ′ , q0′ , F ′ ), their product automaton is the
automaton formed by the product of the states. Thus, a state in the resulting automaton N = M × M ′ is a
pair (q, q ′ ), where q ∈ Q and q ′ ∈ Q′ .
The key invariant of the product automaton is that after reading a word w, it is in the state (q, q ′ ), where
q is the state that M is in after reading w, and q ′ is the state that M ′ is in after reading w.
As such, the intersection language L(M ) ∩ L(M ′ ) is recognized by the product automaton, where we set
a pair (q, q ′ ) ∈ Q(N ) to be an accepting state for N if q ∈ F and q ′ ∈ F ′ .
Similarly, the automaton accepting the union L(M ) ∪ L(M ′ ) is created from the product automaton by
setting the accepting states to be all pairs (q, q ′ ) such that either q ∈ F or q ′ ∈ F ′ .
As such, the automaton accepting the union language L(M1 ) ∪ L(M2 ) is the following.
[State diagram: the same product automaton, with every pair containing an accepting state of M1 or of
M2 marked as accepting.]
So, formally, an FST (finite state transducer) is a 5-tuple (Q, Σ, Γ, δ, q0 ), where
Notation: Γε = Γ ∪ {ε} .
The transition table for our example FST might look like the following.
δ     0          1
q0    (q1 , ε)   (q2 , ε)
q1    (q0 , 0)   (q0 , 1)
q2    (q0 , 2)   (q0 , 3)
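A minimal sketch of running this transducer in Python, using the table above (the encoding is ours; ε is represented by the empty string):

    def run_fst(delta, q0, w):
        # Returns the concatenation of the outputs along the run.
        state, out = q0, []
        for c in w:
            state, o = delta[(state, c)]
            out.append(o)
        return "".join(out)

    d = {("q0", "0"): ("q1", ""),  ("q0", "1"): ("q2", ""),
         ("q1", "0"): ("q0", "0"), ("q1", "1"): ("q0", "1"),
         ("q2", "0"): ("q0", "2"), ("q2", "1"): ("q0", "3")}
    print(run_fst(d, "q0", "0110"))   # pairs of bits become base-4 digits: "12"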
Chapter 4

Lecture 4: Regular Expressions and Product Construction
This lecture finishes section 1.1 of Sipser and also covers the start of 1.3.
[State diagrams: a DFA with states q0 , q1 accepting L1 , and a DFA with states r0 , r1 and a drain state
accepting L2 .]
We can run these two DFAs together, by creating states that remember the states of both machines.
[State diagram: the combined machine, with states (q0 , r0 ), (q1 , r0 ), (drain, r0 ), and so on.]
State of a DFA after reading a word w. In the following, given a DFA M = (Q, Σ, δ, q0 , F ) , we will
be interested in what state the DFA M is in, after reading the characters of a string w = w1 w2 . . . wk ∈ Σ∗ .
As in the definition of acceptance, we can just define the sequence of states that M would go through as it
reads w. Formally, r0 = q0 , and
ri = δ(ri−1 , wi ) , for i = 1, . . . , k.
As such, rk is the state M would be in after reading the string w. We will denote this state by δ(q0 , w).
Note that, by definition,
δ(q0 , w) = δ(δ(q0 , w1 . . . wk−1 ), wk ) .
In general, if the DFA is in a state q, and we want to know in what state it would be after reading a string
w, we will denote it by δ(q, w).
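The recursive identity above translates directly into code. A sketch, assuming the dictionary encoding of δ used earlier:

    def delta_star(delta, q, w):
        # delta_star(q, w) = delta(delta_star(q, w[:-1]), w[-1]);
        # for the empty word, the DFA stays in q.
        if w == "":
            return q
        return delta[(delta_star(delta, q, w[:-1]), w[-1])]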
The set FN ⊆ Q × Q′ of accepting states is free to be whatever we need it to be, depending on what we want
N to recognize. For example, if we would like N to accept the intersection L(M ) ∩ L(M ′ ), then we set
FN = F × F ′ . If we want N to recognize the union language L(M ) ∪ L(M ′ ), then FN = (F × Q′ ) ∪ (Q × F ′ ).
Lemma 4.2.1 For any input word w ∈ Σ∗ , the product automaton N of the DFAs M = (Q, Σ, δ, q0 , F ) and
M ′ = (Q′ , Σ, δ ′ , q0′ , F ′ ) is in state (q, q ′ ) after reading w, if and only if (i) M is in the state q after reading w,
and (ii) M ′ is in the state q ′ after reading w.
Proof: The proof is by induction on the length of the word w.
If w = ε is the empty word, then N is initially in the state (q0 , q0′ ) by construction, where q0 (resp. q0′ )
is the initial state of M (resp. M ′ ). As such, the claim holds in this case.
Otherwise, assume w = w1 w2 . . . wk−1 wk , and that the claim is true by induction for all input words of
length strictly smaller than k.
Let (qk−1 , q′k−1 ) be the state that N is in after reading the string ŵ = w1 . . . wk−1 . By induction, as
|ŵ| = k − 1, we know that M is in the state qk−1 after reading ŵ, and M ′ is in the state q′k−1 after reading
ŵ.
Let
qk = δ(qk−1 , wk ) = δ(δ(q0 , ŵ), wk ) = δ(q0 , w)   and   q′k = δ ′ (q′k−1 , wk ) = δ ′ (δ ′ (q0′ , ŵ), wk ) = δ ′ (q0′ , w) .
As such, by definition, M (resp. M ′ ) would be in the state qk (resp. q′k ) after reading w.
Also, by the definition of its transition function, after reading w the DFA N would be in the state
δN ((q0 , q0′ ), w) = δN (δN ((q0 , q0′ ), ŵ), wk ) = δN ((qk−1 , q′k−1 ), wk ) = (δ(qk−1 , wk ), δ ′ (q′k−1 , wk )) = (qk , q′k ) ,
as claimed.
Lemma 4.2.2 Let M = (Q, Σ, δ, q0 , F ) and M ′ = (Q′ , Σ, δ ′ , q0′ , F ′ ) be two given DFAs. Let N be their
product automaton, where its set of accepting states is F × F ′ . Then L(N ) = L(M ) ∩ L(M ′ ).
Proof: If w ∈ L(M ) ∩ L(M ′ ), then qw = δ(q0 , w) ∈ F and q′w = δ ′ (q0′ , w) ∈ F ′ . By Lemma 4.2.1, this
implies that δN ((q0 , q0′ ), w) = (qw , q′w ) ∈ F × F ′ . Namely, N accepts the word w, implying that w ∈ L(N ),
and as such L(M ) ∩ L(M ′ ) ⊆ L(N ).
Similarly, if w ∈ L(N ), then (pw , p′w ) = δN ((q0 , q0′ ), w) must be an accepting state of N . But the set
of accepting states of N is F × F ′ . That is, (pw , p′w ) ∈ F × F ′ , implying that pw ∈ F and p′w ∈ F ′ .
Now, by Lemma 4.2.1, we know that δ(q0 , w) = pw ∈ F and δ ′ (q0′ , w) = p′w ∈ F ′ . Thus, M and M ′ both
accept w, which implies that w ∈ L(M ) and w ∈ L(M ′ ). Namely, w ∈ L(M ) ∩ L(M ′ ), implying that
L(N ) ⊆ L(M ) ∩ L(M ′ ).
Putting the above together implies the claim.
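The product construction is short enough to write out in full. A sketch in Python (the encoding and names are ours); switching between F × F ′ and (F × Q′ ) ∪ (Q × F ′ ) switches between intersection and union:

    from itertools import product

    def product_dfa(d1, s1, F1, Q1, d2, s2, F2, Q2, Sigma, union=False):
        # States of N = M x M' are pairs (q, p); a transition moves
        # both coordinates simultaneously.
        Q = set(product(Q1, Q2))
        delta = {((q, p), c): (d1[(q, c)], d2[(p, c)])
                 for (q, p) in Q for c in Sigma}
        if union:
            F = {(q, p) for (q, p) in Q if q in F1 or p in F2}
        else:
            F = {(q, p) for (q, p) in Q if q in F1 and p in F2}
        return Q, delta, (s1, s2), F

Feeding in the mod-2 and mod-3 automata of Section 2.3 yields, for instance, the six-state divisibility-by-6 automaton mentioned in Chapter 2.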
We (hopefully) all understand what union does. The other two have some subtleties. Let
L = {under, over} and K = {ground, water, work} .
Then
LK = {underground, underwater, underwork, overground, overwater, overwork} .
Similarly,
K∗ = {ε, ground, water, work, groundground, groundwater, groundwork, workground, waterworkwork, . . .} .
For the star operator, note that the resulting set always contains the empty string (because n can be zero).
Also, each of the substrings is chosen independently from the base set, and you can repeat. E.g.,
waterworkwork is in K∗ .
Regular languages are closed under many operations, including the three “regular operations” listed above,
set intersection, set complement, string reversal, “homomorphism” (formal version of shifting alphabets). We
have seen (last class) why regular languages are closed under set complement. We will prove the rest of these
bit by bit over the next few lectures.
In particular, for a regular expression ⟨exp⟩, we will use the notation L(⟨exp⟩) to denote the language
associated with this regular expression. Thus,
1. Rε = εR = R.
2. R∅ = ∅ = ∅R.
This is a bit confusing, so let us see why this is true. Recall that
R∅ = { xy | x ∈ R and y ∈ ∅ } .
But the empty set (∅) does not contain any element, and as such, no concatenated string can be created.
Namely, it is the empty language.
4. R ∪ ε = ε ∪ R.
This expression cannot always be simplified, since ε might not be in the language L(R).
5. ∅∗ = {ε}, since the empty word is always contained in the language generated by the star operator.
6. ε∗ = {ε}.
1 From Through the Looking Glass, by Lewis Carroll:
‘And only one for birthday presents, you know. There’s glory for you!’
‘I don’t know what you mean by “glory”,’ Alice said.
Humpty Dumpty smiled contemptuously. ‘Of course you don’t – till I tell you. I meant “there’s a nice knock-
down argument for you!” ’
‘But “glory” doesn’t mean “a nice knock-down argument”,’ Alice objected.
‘When I use a word,’ Humpty Dumpty said, in rather a scornful tone, ‘it means just what I choose it to mean
– neither more nor less.’
‘The question is,’ said Alice, ‘whether you can make words mean so many different things.’
‘The question is,’ said Humpty Dumpty, ‘which is to be master – that’s all.’
4.4.1 More interesting examples
Suppose Σ = {a, b, c}.
3. aΣ∗ a + bΣ∗ b + cΣ∗ c is all strings that start and end with the same character.
For instance, consider a regular expression for decimal numbers, where D = (0 ∪ 1 ∪ · · · ∪ 9) denotes a
digit. A first attempt is
(− ∪ ε) D∗ (ε ∪ .) D∗ .
But this does not force the number to contain any digits, which is probably wrong. As such, the correct
expression is
(− ∪ ε)(D+ (ε ∪ .)D∗ ∪ D∗ (ε ∪ .)D+ ).
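This expression can be checked directly with Python's re module, writing D as \d and each (ε ∪ ·) alternative as an optional item (the anchors and the test strings are ours):

    import re

    # (- U eps)(D+ (eps U .) D*   U   D* (eps U .) D+):
    decimal = re.compile(r"^-?(?:\d+\.?\d*|\d*\.?\d+)$")

    for s in ["3.14", "-7", ".5", "42.", "-", "."]:
        print(s, bool(decimal.match(s)))
    # 3.14, -7, .5, and 42. match; "-" and "." do not, since a
    # number is now forced to contain at least one digit.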
Notice that a^n is not a regular expression. Some things written with non-star exponents are regular
and some are not. It depends on what conditions you put on n. E.g., { a^{2n} | n ≥ 0 } is regular (even-length
strings of a's). But { a^n b^n | n ≥ 0 } is not regular.
However, a^3 (or any other fixed power) is regular, as it is just a shorthand for aaa. Similarly, if R is a
regular expression, then R^3 is regular since it is a shorthand for RRR.
Chapter 5
Lecture 5: Nondeterministic Automata
February 3, 2009
This lecture covers the first part of section 1.2 of Sipser, through p 54.
An NFA accepts a string if there is some path through the state diagram that consumes the whole input
string and ends in an accept state.
Here are two possible ways to think about this:
(i) the NFA magically guesses the right path which will lead to an accept state.
(ii) the NFA searches all paths through the state diagram to find such a path.
The first view is often the best for mathematical analysis. The second view is one reasonable approach to
implementing NFAs.
Example. Consider the DFA that accepts all the strings over {a, b} that start with aab. Here is the
resulting DFA.
[State diagram: q0 →a q1 →a q2 →b q3 , where q3 is accepting and loops on a, b; all other transitions go
to a sink state snk, which loops on a, b.]
The NFA for the same language is even simpler, since we can omit the missing transitions and the sink
state. In particular, the NFA for the above language is the following.
[State diagram: q0 →a q1 →a q2 →b q3 , where q3 is accepting and loops on a, b.]
As another example, the automaton below accepts strings containing the substring abab.
[State diagram (N1): 1 →a 2 →b 3 →a 4 →b 5, with self-loops on a, b at states 1 and 5; state 5 is
accepting.]
The respective DFA, shown below, needs a lot more transitions and is somewhat harder to read.
[State diagram: the equivalent DFA, with all the failure transitions filled in.]
Consider the language L of strings of the form w#c, where w is a string of digits and c is a single digit
that appears somewhere in w. For example, the word 314159#5 is in L, and so is 314159#3. But the word
314159#7 is not in L.
Here is the NFA M that recognizes this language.
[State diagram: the start state qs has a self-loop on [0, 9]; on reading a digit c, the NFA may move to a
state qc that remembers c (for c ∈ {0, . . . , 9}); each qc has a self-loop on [0, 9] and moves on # to qc′ ;
from qc′ , reading the digit c leads to the accepting state qf .]
The NFA M scans the input string until it “guesses” that it is at the character c in w that will be at the
end of the input string. When it makes this guess, M transitions into a state qc that “remembers” the value
c. The rest of the transitions then confirm that the rest of the input string matches this guess.
A DFA for this problem is considerably more taxing. We would need a state to remember the set of digits
encountered in the string read so far. Since there are 2^10 = 1024 different subsets of the digits, we will
require an automaton with at least 1024 states! The NFA above requires only 22 states, and is much easier
to draw and understand.
Formally, the transition function of an NFA is
δ : Q × Σε → P(Q),
where Σε = Σ ∪ {ε} and P(Q) is the power set of Q (i.e., all possible subsets of Q). As such, the input
character for δ(·) can be either a real input character or ε (in this case the NFA does not eat [or drink] any
input character when using this transition). The output value of δ is a set of states (unlike a DFA).
[State diagram: A →a B →ε C →b D, with a self-loop on A and the transitions listed below.]
Here
δ(A, a) = {A, B}
δ(B, a) = ∅ (NB: not {∅})
δ(B, ε) = {C} (NB: not just C)
δ(B, b) = {C} (NB: just follows one transition arc).
The trace for recognizing the input abab:
t = 0: state = A, remaining input abab.
t = 1: state = A, remaining input bab.
t = 2: state = A, remaining input ab.
t = 3: state = B, remaining input b.
t = 4: state = C, remaining input b (ε-transition used, and no input eaten).
t = 5: state = D, remaining input ε.
Is every DFA an NFA? Technically, no (why?1 ). However, it is easy to convert any DFA into an NFA.
If δ is the transition function of the DFA, then the corresponding transition of the NFA is going to be
δ 0 (q, t) = {δ(q, t)}.
(ii) r0 = q0 .
(The NFA starts from the start state.)
(iii) rn ∈ F .
(The final state in the trace is an accepting state.)
Chapter 6
This lecture covers the last part of section 1.2 of Sipser (pp. 58–63), part of 1.3 (pp. 66–69), and also
closure under string reversal and homomorphism.
6.1 Overview
We defined a language to be regular if it is recognized by some DFA. The agenda for the next few lectures
is to show that three different ways of defining languages, namely DFAs, NFAs, and regexes, are in fact
all equivalent; that is, they all define exactly the regular languages. We will show this equivalence as
follows.
[Diagram: conversions between DFA, NFA, and regular expressions, with arrows labelled "today",
"next lecture", and "next next lecture".]
One of the main properties of languages we are interested in are closure properties, and the fact that
regular languages are closed under union, intersection, complement, concatenation, and star (and also under
homomorphism).
However, closure operations are easier to show in one model than in another. For example, for DFAs,
showing closure under union, intersection, and complement is easy, but showing closure of DFAs under
concatenation and ∗ is hard.
Here is a table that lists each closure property (intersection ∩, union ∪, complement, concatenation ◦,
and star ∗) and how hard it is to show in the various models of regular languages.
[Table: the properties ∩, ∪, complement, ◦, ∗ against the models.]
Recall what it means for regular languages to be closed under an operation op. If L1 and L2 are regular,
then L1 op L2 is regular. That is, if we have an NFA recognizing L1 and an NFA recognizing L2 , we can
construct an NFA recognizing L1 op L2 .
The extra power of NFAs makes it easy to prove closure properties for NFAs. When we know all DFAs,
NFAs, and regexes are equivalent, these closure results then apply to all three representations. Namely, they
would imply that regular languages have these closure properties.
We would like to claim that if L is regular, then so is LR . Formally, we need to be a little bit more
careful, since we still did not show that a language being regular implies that it is recognized by an NFA.
where qS is the only accepting state for M . Note that δ ′ is identical to δ, except that
∀q ∈ F, δ ′ (q, ε) = {qS } . (6.1)
Now, we need to prove formally that if w ∈ L(M ) then wR ∈ L(N ), but this is an easy induction, and we
omit it.
Note that this will not work for a DFA. First, we cannot force a DFA to have a single final state. Second, a
state may have two incoming transitions on the same character, resulting in non-determinism when reversed.
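One way to realize the reversal construction in code, assuming δ maps a pair (state, character) to a set of states and ε is encoded as the empty string (all names here are ours):

    def reverse_nfa(Q, delta, q0, F):
        # Reverse every transition, make the old start state the accepting
        # state, and add a fresh start state with eps-moves into the old
        # accepting states ("qs" is assumed not to clash with Q).
        qs = "qs"
        d = {}
        for (q, c), targets in delta.items():
            for p in targets:
                d.setdefault((p, c), set()).add(q)
        d[(qs, "")] = set(F)     # "" plays the role of eps
        return Q | {qs}, d, qs, {q0}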
6.3 Closure of NFAs under regular operations
We consider the regular operations to be union, concatenation, and the star operator.
Advice to instructor: Do the following constructions via pictures, and give detailed tuple notation for only one of them.
for only one of them.
[Figure: the NFAs N (start state q0 , accepting states f1 , . . . , fm ) and N ′ (start state q0′ , accepting states
f1′ , . . . , fm′ ), and the new NFA M obtained by adding a new start state qs with ε-transitions to q0 and q0′ .]
Formally, we are given two NFAs N = (Q, Σ, δ, q0 , F ) and N ′ = (Q′ , Σ, δ ′ , q0′ , F ′ ), where Q ∩ Q′ = ∅ and
the new state qs is not in Q or Q′ . The new NFA M is
M = (Q ∪ Q′ ∪ {qs } , Σ, δM , qs , F ∪ F ′ ) ,
where
δM (q, c) =
  δ(q, c)       if q ∈ Q and c ∈ Σε ,
  δ ′ (q, c)    if q ∈ Q′ and c ∈ Σε ,
  {q0 , q0′ }   if q = qs and c = ε,
  ∅             if q = qs and c ≠ ε.
We thus showed the following.
We thus showed the following.
Lemma 6.3.1 Given two NFAs N and N ′ , one can construct an NFA M such that L(M ) = L(N ) ∪ L(N ′ ).
Given a word w ∈ L(N )L(N ′ ), the new automaton needs to guess how to break it into two strings x ∈ L(N )
and y ∈ L(N ′ ), so that w = xy. Now, there exists an execution trace for N accepting x; then we can jump
into the starting state of N ′ and use the execution trace accepting y to reach an accepting state of the new
NFA M . Here is how the resulting automaton looks, visually.
[Figure: the NFAs N and N ′ , and the new NFA M in which every accepting state of N gets an ε-transition
to q0′ ; the accepting states of M are those of N ′ .]
Formally, we are given two NFAs N = (Q, Σ, δ, q0 , F ) and N ′ = (Q′ , Σ, δ ′ , q0′ , F ′ ), where Q ∩ Q′ = ∅. The
new automaton is
M = (Q ∪ Q′ , Σ, δM , q0 , F ′ ) ,
where
δM (q, c) =
  δ(q, ε) ∪ {q0′ }   if q ∈ F and c = ε,
  δ(q, c)            if q ∈ F and c ≠ ε,
  δ(q, c)            if q ∈ Q \ F and c ∈ Σε ,
  δ ′ (q, c)         if q ∈ Q′ and c ∈ Σε .
Lemma 6.3.2 Given two NFAs N and N ′ , one can construct an NFA M such that L(M ) = L(N ) ◦ L(N ′ ) =
L(N )L(N ′ ).
Proof: The construction is described above, and the proof of the correctness (of the construction) is easy
and sketched above, so we skip it. You might want to verify that you know how to fill in the details for this
proof (wink, wink).
The idea is to connect the final states of N back to the initial state using ε-transitions, so that it can
loop back after recognizing a word of L(N ). As such, in the ith loop of the execution, the new NFA M
recognizes the word wi . Naturally, the NFA needs to guess when to jump back to the start state of N . One
minor technicality is that ε ∈ (L(N ))∗ , but ε might not be in L(N ). To overcome this, we introduce a new
start state qs (which is accepting), and it is connected by (you guessed it) an ε-transition to the initial state
of N . This way, ε ∈ L(M ), and as such M recognizes the required language. Visually, the transformation
looks as follows.
[Figure: the NFA N , and the NFA M obtained by adding a new accepting start state qs with an
ε-transition to q0 , and ε-transitions from the accepting states of N back to q0 .]
Formally, we are given the NFA N = (Q, Σ, δ, q0 , F ), where qs ∉ Q. The new NFA is
M = (Q ∪ {qs } , Σ, δM , qs , F ∪ {qs }) ,
where
δM (q, c) =
  δ(q, ε) ∪ {q0 }   if q ∈ F and c = ε,
  δ(q, c)           if q ∈ F and c ≠ ε,
  δ(q, c)           if q ∈ Q \ F ,
  {q0 }             if q = qs and c = ε,
  ∅                 if q = qs and c ≠ ε.
Why the extra state? The construction for star needs some explanation. We add arcs from the final states
back to the initial state to do the loop. But then we need to ensure that ε is accepted. It is tempting to just
make the initial state final, but this does not work for examples like the following. So we need to add a new
initial state to handle ε.
[State diagram: q0 →a q1 , with a b-transition from q1 back to q0 .]
Notice that it also works to send the loopback arcs to the new initial state rather than to the old initial
state.
Lemma 6.3.3 Given an NFA N , one can construct an NFA M that accepts the language (L(N ))∗ .
We can now show that every regular expression R can be converted into an NFA recognizing L(R).
Proof: The proof is by induction on the structure of R (it can be interpreted as induction over the number
of operators in R).
The base of the induction is when R contains no operator (i.e., the number of operators in R is zero);
then R must be one of the following:
(i) If R = c, where c ∈ Σ, then the corresponding NFA is q0 →c q1 .
For the induction step, assume that we have proved the claim for all expressions having at most k − 1
operators, and that R has k operators in it. We consider whether R can be written in any of the following
forms:
(i) R = R1 + R2 . By the induction hypothesis, there exist two NFAs N1 and N2 such that L(N1 ) = L(R1 )
and L(N2 ) = L(R2 ). By Lemma 6.3.1, there exists an NFA M that recognizes the union; that is,
L(M ) = L(N1 ) ∪ L(N2 ) = L(R1 ) ∪ L(R2 ) = L(R).
(ii) R = R1 ◦ R2 ≡ R1 R2 . By the induction hypothesis, there exist two NFAs N1 and N2 such that
L(N1 ) = L(R1 ) and L(N2 ) = L(R2 ). By Lemma 6.3.2, there exists an NFA M that recognizes the
concatenated language; that is, L(M ) = L(N1 ) ◦ L(N2 ) = L(R1 ) ◦ L(R2 ) = L(R).
(iii) R = (R1 )∗ . By the induction hypothesis, there exists an NFA N1 such that L(N1 ) = L(R1 ). By
Lemma 6.3.3, there exists an NFA M that recognizes the star language; that is, L(M ) = (L(N1 ))∗ =
(L(R1 ))∗ = L(R).
This completes the proof of the lemma, since we have shown how to build an NFA for every possible regular
expression with k operators.
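The induction above is, in effect, a recursive program over the structure of R. A compact sketch, where a regular expression is a small tuple-tree and ε is encoded as the empty string (the representation and all names are ours):

    def regex_to_nfa(r, fresh):
        # r is ("char", c), ("union", r1, r2), ("concat", r1, r2), or ("star", r1);
        # ("char", "") stands for the expression eps.  fresh() returns a new
        # state name.  Returns (delta, start, finals), where delta maps
        # (state, char-or-"") to a set of states.
        kind = r[0]
        if kind == "char":                    # base case
            s, f = fresh(), fresh()
            return {(s, r[1]): {f}}, s, {f}
        if kind == "union":                   # Lemma 6.3.1
            d1, s1, F1 = regex_to_nfa(r[1], fresh)
            d2, s2, F2 = regex_to_nfa(r[2], fresh)
            s = fresh()
            return {**d1, **d2, (s, ""): {s1, s2}}, s, F1 | F2
        if kind == "concat":                  # Lemma 6.3.2
            d1, s1, F1 = regex_to_nfa(r[1], fresh)
            d2, s2, F2 = regex_to_nfa(r[2], fresh)
            d = {**d1, **d2}
            for f in F1:
                d.setdefault((f, ""), set()).add(s2)
            return d, s1, F2
        if kind == "star":                    # Lemma 6.3.3
            d1, s1, F1 = regex_to_nfa(r[1], fresh)
            s = fresh()
            d = {**d1, (s, ""): {s1}}
            for f in F1:                      # loop back to the old start state
                d.setdefault((f, ""), set()).add(s1)
            return d, s, F1 | {s}

    # Example: the NFA for (a + b)*.
    c = iter(range(1000)); fresh = lambda: "q%d" % next(c)
    nfa = regex_to_nfa(("star", ("union", ("char", "a"), ("char", "b"))), fresh)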
Consider the regular expression R = (a + ε)(aa + ba)∗ . We have that R = R1 ◦ R2 , where R1 = a + ε and
R2 = (aa + ba)∗ .
Let us first build an NFA for R1 = a + ε. The NFA for ε is the single accepting state q2 , and the NFA
for a is q0 →a q1 ; by Lemma 6.3.1, the NFA for R1 adds a new start state qs with ε-transitions to q2 and
to q0 .
Next, let R3 = aa + ba. Its NFA is built, in the same way, from the NFA for aa (states q4 , . . . , q7 ) and
the NFA for ba (states q8 , . . . , q11 ), joined by a new start state q12 with ε-transitions to q4 and q8 .
[Figure: the NFA for R3 = aa + ba.]
By Lemma 6.3.3, the NFA for R2 = (R3 )∗ adds a new accepting start state q13 , with an ε-transition to
q12 , and ε-transitions from the accepting states back to q12 .
[Figure: the NFA for R2 = (aa + ba)∗ .]
Now, R = R1 R2 = R1 ◦ R2 , and by Lemma 6.3.2, the NFA for R is depicted in Figure 6.1.
[Figure 6.1: The NFA constructed for the regular expression R = (a + ε)(aa + ba)∗ .]
Note that the resulting NFA is by no means the simplest or most elegant NFA for this language (far from
it), but rather the NFA we get by following our construction carefully.
Chapter 7
For a set of states X ⊆ Q and a word w ∈ Σ∗ , let ∆N (X, w) denote the set of all the states N might be
in, if it starts from a state of X and it handles the input w.
The proof of the following lemma is by an easy induction on the length of w.
Lemma 7.1.1 Let N = (Q, Σ, δ, q0 , F ) be a given NFA with no ε-transitions. For any word w ∈ Σ∗ , we have
that q ∈ ∆N ({q0 } , w) if and only if there is a way for N to be in q after reading w (when starting from the
start state q0 ).
More details. We include the proof for the sake of completeness, but the reader should by now be able to fill in such
a proof on their own.
Proof: The proof is by induction on the length of w = w1 w2 . . . wk .
If k = 0 then w is the empty word, and N stays in q0 . Also, by definition, we have ∆N ({q0 } , w) = {q0 }, and
the claim holds in this case.
Assume that the claim holds for all words of length at most n, and let k = n + 1 be the length of w. Consider a
state qn+1 that N reaches after reading w1 w2 . . . wn wn+1 , and let qn be the state N was in before handling the
character wn+1 and reaching qn+1 . By induction, we know that qn ∈ ∆N ({q0 } , w1 w2 . . . wn ). Furthermore,
we know that qn+1 ∈ δ(qn , wn+1 ). As such, we have that
qn+1 ∈ δ(qn , wn+1 ) ⊆ ∪q∈∆N ({q0 },w1 w2 ...wn ) δ(q, wn+1 ) = ∆N ({q0 } , w) ,
which establishes the claim.
7.1.2 Simulating NFAs with DFAs
One possible way of thinking about simulating NFAs is to consider each state to be a “light” that can be
either on or off. In the beginning, only the initial state is on. At any point in time, all the states that the
NFA might be in are turned on. As a new input character arrives, we need to update the states that are on.
As a concrete example, consider the automaton below (which you have seen before), which accepts strings
containing the substring abab.
[State diagram (N1): A →a B →b C →a D →b E, with self-loops on a, b at A and at E.]
Let us run an explicit search for the above NFA (N1) on the input string ababa.
t = 0: states on: {A}; remaining input: ababa.
t = 1: states on: {A, B}; remaining input: baba.
t = 2: states on: {A, C}; remaining input: aba.
t = 3: states on: {A, B, D}; remaining input: ba.
t = 4: states on: {A, C, E}; remaining input: a.
t = 5: states on: {A, B, D, E}; remaining input: ε.
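The "lights" view of this search is one line of code per input character: the new on-set is the union of the successors of the old one. A sketch reproducing the trace above (the encoding is ours):

    def on_states(delta, start, w):
        # delta maps (state, char) -> set of states (no eps-transitions).
        X = {start}
        trace = [X]
        for c in w:
            X = set().union(*(delta.get((q, c), set()) for q in X))
            trace.append(X)
        return trace

    d = {("A", "a"): {"A", "B"}, ("A", "b"): {"A"},
         ("B", "b"): {"C"}, ("C", "a"): {"D"}, ("D", "b"): {"E"},
         ("E", "a"): {"E"}, ("E", "b"): {"E"}}
    for X in on_states(d, "A", "ababa"):
        print(sorted(X))
    # {A}, {A,B}, {A,C}, {A,B,D}, {A,C,E}, {A,B,D,E}: E is on, so accept.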
[Figure 7.1: The resulting DFA.]
Note that (N1) accepted ababa because when it is done reading the input, the accepting state is on.
This provides us with a scheme to simulate this NFA with a DFA: (i) generate all possible configurations of
states that might be turned on, and (ii) decide, for each configuration and each input character, what the
next configuration is. In our case, in all configurations the first state is turned on. The initial configuration
is when only state A is turned on. If this sounds familiar, it should, because what you get is just a big nasty,
hairy DFA, as shown on the last page of these class notes. The same DFA with the unreachable states removed
is shown in Figure 7.1.
Every state in the DFA of Figure 7.1 can be identified by the subset of the original states that is turned
on (namely, the original automaton might be in any of these states).
[State diagram: a more conventional drawing of this DFA, with states {A}, {A, B}, {A, C}, {A, B, D},
{A, C, E}, {A, B, E}, {A, E}, and {A, B, D, E}.]
Thus, to convert an NFA N with a set of states Q into a DFA, we consider all the subsets of Q that N
might be realized as. Namely, every subset of Q (i.e., a member of P(Q), the power set of Q) is going to
be a state in the new automaton. Now, consider a subset X ⊆ Q, and for every input character c ∈ Σ, let
us figure out in what states the original NFA N might be if it is in one of the states of X and it handles
the character c. Let Y be the resulting set of such states.
Clearly, we have just computed the transition function of the new (equivalent) DFA, showing that if the
NFA is in one of the states of X, and we receive c, then the NFA now might be in one of the states of Y .
Now, if the initial state of the NFA N is q0 , then the new DFA MDFA would start with the state (i.e.,
configuration) {q0 } (since the original NFA might be only in q0 at this point in time).
It is important that our simulation is faithful: at any point in time, if we are in state X in MDFA then
there is a path in the original NFA N , with the given input, to reach each state of Q that is in X (and
similarly, X includes all the states that are reachable with such an input).
When does MDFA accept? Well, if it is in state X (here X ⊆ Q), then it accepts only if X includes one
of the accepting states of the original NFA N .
Clearly, the resulting DFA MDFA is equivalent to the original NFA.
Formally, given an NFA N = (Q, Σ, δ, q0 , F ) without ε-transitions, the equivalent DFA is
MDFA = (P(Q), Σ, δ̂, q̂0 , F̂ ) ,
where P(Q) is the power set of Q, and δ̂ (the transition function), q̂0 (the initial state), and the set of accepting
states F̂ are to be specified shortly. Note that the states of MDFA are subsets of Q (which is slightly confusing),
and as such the starting state of MDFA is q̂0 = {q0 } (and not just q0 ).
We need to specify the transition function, so consider X ∈ P(Q) (i.e., X ⊆ Q) and a character c. For a
state s ∈ X, the NFA might go into any state in δ(s, c) after reading c. As such, the set of all possible states
the NFA might be in, if it started from a state in X and received c, is the set
Y = ∪s∈X δ(s, c).
As such, the transition of MDFA from X receiving c is the state of MDFA defined by Y . Formally,
δ̂(X, c) = Y = ∪s∈X δ(s, c). (7.1)
As for the accepting states, consider a state X ∈ P(Q) of MDFA . Clearly, if there is a state of F in X,
then X is an accepting state; namely, F ∩ X ≠ ∅. Thus,
F̂ = { X | X ∈ P(Q), X ∩ F ≠ ∅ } .
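A sketch of this subset construction in Python, building only the subsets that are actually reachable (so it produces the trimmed DFA of Figure 7.1 rather than all of P(Q); the encoding is ours):

    def determinize(delta, q0, F, Sigma):
        # NFA without eps-transitions: delta maps (state, char) -> set of states.
        start = frozenset({q0})
        d_hat, seen, todo = {}, {start}, [start]
        while todo:
            X = todo.pop()
            for c in Sigma:
                Y = frozenset(set().union(*(delta.get((q, c), set()) for q in X)))
                d_hat[(X, c)] = Y
                if Y not in seen:
                    seen.add(Y)
                    todo.append(Y)
        F_hat = {X for X in seen if X & F}   # accept iff X contains a state of F
        return seen, d_hat, start, F_hat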
Proof of correctness
Claim 7.1.2 For any w ∈ Σ∗ , the set of states reached by the NFA N on w is precisely the state reached by
MDFA on w. That is, ∆N ({q0 } , w) = δ̂({q0 } , w).
Similarly, by the definition of MDFA , we have that from the state X, after reading wk+1 , the DFA MDFA is in
the state
Y = δ̂(X, wk+1 ) = ∪s∈X δ(s, wk+1 ) .
Lemma 7.1.3 Any NFA N , without ε-transitions, can be converted into a DFA MDFA , such that MDFA
accepts the same language as N .
The DFA MDFA is in the state δ̂({q0 } , w) after reading w. Claim 7.1.2 implies that Y = δ̂({q0 } , w) =
∆N ({q0 } , w). By construction, MDFA accepts at this state if and only if Y ∈ F̂ , which is equivalent to
Y containing a final state of N ; that is, Y ∩ F ≠ ∅. Namely, MDFA accepts w if and only if
δ̂({q0 } , w) ∩ F ≠ ∅ ⇐⇒ ∆N ({q0 } , w) ∩ F ≠ ∅.
Handling ε-transitions
Now, we would like to handle a general NFA that might have ε-transitions. The problem is demonstrated by
the following NFA in its initial configuration:
[State diagram (N2): A →a,ε B →b C →a,ε D →b,ε E, with self-loops on a, b at A and at E; only A is
marked as on.]
Clearly, the initial configuration here is {A, B} (and not the one drawn above), since the automata can
immediately jump to B if the NFA is already in A. So, the configuration {A} should not be considered at
all. As such, the true initial configuration for this automata is
[Figure (N2): the same NFA, with the configuration {A, B} marked.]
Next, consider the following more interesting configuration.
[Figure: the same NFA, with a configuration containing A and C marked.]
But here, not only can we jump from A to B, but we can also jump from C to D, and from D to E. As such, this configuration is in fact the following configuration
[Figure (N3): the same NFA, with the extended configuration (including B, D and E) marked.]
In fact, this automaton can only be in these two configurations, because of the ε-transitions.
So, let us formalize the above idea: whenever the NFA N might be in a state s, we need to extend the configuration to all the states of the NFA reachable by ε-transitions from s. Let R_ε(s) denote the set of all states of N that are reachable by a sequence of ε-transitions from s (naturally, s ∈ R_ε(s), since we can reach s without moving anywhere).
Thus, if N might be in any state of X ⊆ Q, then it might be in any state of
E(X) = ⋃_{s∈X} R_ε(s).
As such, whenever we consider a set of states X ⊆ Q, we in fact need to consider the extended set of states E(X). As such, for the above automaton, we have E({A}) = {A, B}.
Theorem 7.1.4 Any NFA N (with or without ε-transitions) can be converted into a DFA M_DFA, such that M_DFA accepts the same language as N.
Proof: We modify the construction of Lemma 7.1.3, so that the new transition function is
δ'(X, c) = E(δ̂(X, c)),
where δ̂ is the old transition function from the proof of Lemma 7.1.3; namely, we always extend the new set of states to include all the states we can reach by ε-transitions. Similarly, the initial state is now
q_S = E({q0}).
It is now straightforward to verify that the new DFA is indeed equivalent to the original NFA, using the argumentation of Lemma 7.1.3.
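To make the construction concrete, here is a small sketch of the whole conversion in Python (a sketch under our own naming, not part of the original notes): eps_closure computes E(X), and nfa_to_dfa builds only the subset-states actually reachable from E({q0}), rather than all of P(Q).

def eps_closure(states, eps):
    # E(X): all states reachable from X by zero or more eps-transitions.
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def nfa_to_dfa(Sigma, delta, eps, q0, F):
    # delta[(s, c)] is the set of NFA states reachable from s on character c;
    # eps[s] is the set of states reachable from s by one eps-transition.
    start = eps_closure({q0}, eps)
    trans, seen, todo = {}, {start}, [start]
    while todo:
        X = todo.pop()
        for c in Sigma:
            # Y = E( union of delta(s, c) over s in X ), as in Eq. (7.1).
            Y = eps_closure(set().union(*[delta.get((s, c), set()) for s in X]), eps)
            trans[(X, c)] = Y
            if Y not in seen:
                seen.add(Y)
                todo.append(Y)
    accepting = {X for X in seen if X & F}   # the analogue of F-hat
    return seen, trans, start, accepting

# Toy NFA with an eps-jump from A to B:
delta = {('A', 'a'): {'A'}, ('A', 'b'): {'A'}, ('B', 'b'): {'C'}}
eps = {'A': {'B'}}
states, trans, start, acc = nfa_to_dfa('ab', delta, eps, 'A', {'C'})
print(sorted(start))   # ['A', 'B'], the extended initial configuration

Running it on this toy NFA reproduces the extended initial configuration {A, B} discussed above.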
Chapter 8
In this lecture, we will show that any DFA can be converted into a regular expression. Our construction works by allowing regular expressions to be written on the edges of the DFA, and then showing how one can remove states from this generalized automaton (getting a new equivalent automaton with fewer states). At the end of this state-removal process, we will remain with a generalized automaton with a single initial state and a single accepting state, and it will then be easy to convert it into a single regular expression.
A generalized nondeterministic finite automaton (GNFA) is an automaton whose transitions are labelled by regular expressions, and which satisfies the following conditions.
(C1) There are transitions going from the initial state to all other states, and there are no transitions into the initial state.
(C2) There is a single accept state that has only transitions coming into it (and no outgoing transitions).
(C4) Except for the initial and accepting states, all other states are connected to all other states via a
transition. In particular, each state has a transition to itself.
When you cannot actually go between two states, the GNFA has a transition labelled with ∅, which will not match any string of input characters. We do not have to draw these transitions explicitly in the state diagrams.
8.1.2 Top-level outline of conversion
We will convert a DFA to a regular expression as follows:
(A) Convert DFA to a GNFA, adding new initial and final states.
(B) Remove all states one-by-one, until we have only the initial and final states.
(C) Output regex is the label on the (single) transition left in the GNFA. (The word regex is just a shortcut
for regular expression.)
Lemma 8.1.1 Any DFA M can be converted into an equivalent GNFA G.
Proof: We can consider M to be an NFA. Next, we add a special initial state qinit that is connected to the old initial state via an ε-transition. Next, we add a special final state qfinal, such that all the final states of M are connected to qfinal via an ε-transition. The modified NFA M′ has a single initial state and a single final state, such that no transition enters the initial state and no transition leaves the final state; thus M′ complies with conditions (C1–C3) above. Next, we consider all pairs of states x, y ∈ Q(M′), and if there is no transition between them, we introduce the transition x → y labelled ∅. The resulting GNFA G is now compliant also with condition (C4).
It is easy now to verify that G is equivalent to the original DFA M.
We will remove all the intermediate states from the GNFA, leaving a GNFA with only initial and final
states, connected by one transition with a (typically complex) label on it. The equivalent regular expression
is obvious: the label on the transition.
Lemma 8.1.2 Given a GNFA N with k = 2 states, one can generate an equivalent regular expression.
Proof: A GNFA with only two states (that complies with conditions (C1)–(C4)) has the following form.
[Figure: the two-state GNFA, with a single transition from qS to qF labelled by some regex R.]
The GNFA has a single transition from the initial state to the accepting state, and this transition has the regular expression R associated with it. Since the initial state and the accepting state do not have self loops, we conclude that N accepts exactly the words that match the regular expression R. Namely, L(N) = L(R).
Lemma 8.1.3 Given a GNFA N with k > 2 states, one can construct an equivalent GNFA with k − 1 states.
We first describe the construction. Since k > 2, there is at least one state in N which is neither initial nor accepting; let qrip denote this state. We will “rip” this state out of N and fix the GNFA, so that we get a GNFA with one less state.
Transition paths going through qrip might come from any of a variety of states q1, q2, etc. They might go from qrip to any of another set of states r1, r2, etc.
[Figure: states q1, q2, q3 with transitions into qrip, and transitions from qrip out to states r1, r2, r3.]
For each pair of states qi and ri, we need to convert the transition through qrip into a direct transition from qi to ri.
59
Reworking connections for a specific triple of states
To understand how this works, let us focus on the connections between qrip and two other specific states qin and qout. Notice that qin and qout might be the same state, but they both have to be different from qrip.
The state qrip has a self loop with the regular expression Rrip associated with it. So, consider a fragment of an accepting trace that goes through qrip. It transitions into qrip from a state qin on an edge with the regular expression Rin, and travels out of qrip into the state qout on an edge with the associated regular expression Rout. This trace corresponds to the regular expression Rin, followed by zero or more traversals of the self loop (Rrip is used each time we traverse the loop), and then a transition out to qout using the regular expression Rout. As such, we can introduce a direct transition from qin to qout with the regular expression
R = Rin (Rrip)∗ Rout.
Clearly, any fragment of a trace traveling qin → qrip → qout can be replaced by the direct transition qin → qout labelled R. So, let us do this replacement for any two such states: we connect them directly via a new transition labelled Rin (Rrip)∗ Rout, so that traces no longer need to travel through qrip.
[Figure: the path qin → qrip → qout, with the self loop Rrip on qrip, replaced by a direct edge from qin to qout labelled Rin (Rrip)∗ Rout.]
Clearly, if we do that for all such pairs, the new automaton accepts the same language, but no longer needs to use qrip. As such, we can just remove qrip from the resulting automaton; let M′ denote the resulting automaton.
The automaton M′ is not quite legal yet. Indeed, we will now have parallel transitions because of the above process (we might even have parallel self loops). But this is easy to fix: we replace two such parallel transitions from qi to qj, labelled R1 and R2 respectively, by a single transition from qi to qj labelled R1 + R2.
As such, for the triple qin, qrip, qout, if the label on the original direct transition from qin to qout was Rdir, then the label for the new transition (that skips qrip) will be
Rdir + Rin (Rrip)∗ Rout.    (8.1)
Clearly, the new transition is equivalent to the two transitions it replaces. If we repeat this process for all the parallel transitions, we get a new GNFA M which has k − 1 states, and furthermore it accepts exactly the same language as N.
Proof: Since k > 2, N contains at least one state which is neither initial nor accepting; let qrip denote this state. We will “rip” this state out of N and fix the GNFA, so that we get a GNFA with one less state.
For every pair of states qin and qout, both distinct from qrip, we replace the transitions that go through qrip with direct transitions from qin to qout, as described in the previous section.
Correctness. Consider an accepting trace T of N for a word w. If T does not use the state qrip, then the exact same trace is an accepting trace of M. So, assume that it uses qrip; in particular, the trace looks like
T = . . . qi → qrip → qrip → · · · → qrip → qj . . . ,
where the transition into qrip reads Si, the self loop of qrip is traversed zero or more times reading Si+1, . . . , Sj−1, and the transition out to qj reads Sj; here Si Si+1 · · · Sj is a substring of w. Clearly, Si ∈ Rin, where Rin is the regular expression associated with the transition qi → qrip. Similarly, Sj ∈ Rout, where Rout is the regular expression associated with the transition qrip → qj. Finally, Si+1 Si+2 · · · Sj−1 ∈ (Rrip)∗, where Rrip is the regular expression associated with the self loop of qrip.
Now, clearly, the string Si Si+1 · · · Sj matches the regular expression Rin (Rrip)∗ Rout. In particular, we can replace this portion of the trace T by
T′ = . . . qi → qj . . . ,
where the direct transition reads Si Si+1 . . . Sj.
This step uses the new transition between qi and qj introduced by our construction. Repeating this replacement process on T until all the appearances of qrip are removed results in an accepting trace T̂ of M. Namely, we proved that any string accepted by N is also accepted by M.
We also need to prove the other direction; namely, that given an accepting trace of M, we can rewrite it into an equivalent accepting trace of N. This is done in a similar way to the above. Indeed, if a portion of the trace uses a new transition of M (that does not appear in N), we can replace it by a fragment of transitions going through qrip. In light of the above proof, this is easy, and we omit the straightforward but tedious details.
Theorem 8.1.4 Any DFA can be translated into an equivalent regular expression.
Proof: Indeed, convert the DFA into a GNFA N . As long as N has more than two states, reduce its
number of states by removing one of its states using Lemma 8.1.3. Repeat this process till N has only two
states. Now, we convert this GNFA into an equivalent regular expression using Lemma 8.1.2.
So, if the original DFA has n states, then the algorithm will do the inner step O(n³) times (which is not too bad). Worse, each time we remove a state, we replace the regex on each remaining transition with a regex that is potentially four times as large. (That is, we replace the regular expression Rdir associated with a transition by the regular expression Rdir + Rin (Rrip)∗ Rout, see Eq. (8.1).)
So, every time we rip a state out of the GNFA, the length of the regular expressions associated with the edges of the GNFA can grow by a factor of (at most) four. We repeat this n times, so the length of the final output regex is O(4ⁿ), and the actual running time of the algorithm is O(n³ 4ⁿ).
Typically, output sizes and running times are not quite that bad. We really only need to consider triples of states that are connected by arcs with labels other than ∅. Many transitions are labelled with ε or ∅, so the regular expression size often increases by less than a factor of 4. However, actual running times are still unpleasant for anything but very small examples.
Interestingly, while this algorithm is not very efficient, it is not the algorithm's “fault”. Indeed, it is known that regular expressions for automata can be exponentially large: there is a lower bound of 2ⁿ on the length of regular expressions describing an automaton of size n, see [EZ74] for details.
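As a concrete illustration of the ripping step, here is a small Python sketch (our own naming and a string-based regex representation, not from the notes) that performs one application of Lemma 8.1.3 on a GNFA whose edge labels are regex strings, with None standing for the ∅ label:

def rip_state(edges, states, q_rip):
    # One elimination step: edges maps a pair (p, q) to a regex string,
    # or to None when the label is the empty set.  Returns the edge map
    # of the smaller GNFA, using Eq. (8.1).
    loop = edges.get((q_rip, q_rip))
    star = f'({loop})*' if loop is not None else ''      # (R_rip)*
    new = {}
    for p in states:
        for q in states:
            if q_rip in (p, q):
                continue
            r_in = edges.get((p, q_rip))
            r_out = edges.get((q_rip, q))
            r_dir = edges.get((p, q))
            if r_in is None or r_out is None:            # no detour through q_rip
                new[(p, q)] = r_dir
            else:
                bypass = f'({r_in}){star}({r_out})'      # R_in (R_rip)* R_out
                new[(p, q)] = bypass if r_dir is None else f'({r_dir})+{bypass}'
    return new

Repeating rip_state until only the initial and final states remain leaves a single edge whose label is the output regular expression. The sketch parenthesizes aggressively, which is exactly why labels can grow by a factor of about four per ripped state, as discussed above.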
8.2 Examples
8.2.1 Example: From GNFA to regex in 8 easy figures
[Figure sequence: 1: the original NFA (states A, B, C over {a, b}). 2: Normalizing it (adding a new initial state init and a new accepting state AC, connected by ε-transitions). 3: Removing state A. 4: Redrawn without the old edges. 5: Removing B (introducing labels such as ab∗a and ab∗a + b). 6: Redrawn. 7: Removing C, leaving a single transition from init to AC labelled by the final regular expression.]
8.3 Closure under homomorphism
Suppose that Σ and Γ are two alphabets (possibly the same, but maybe different). A homomorphism h
is a function from Σ∗ to Γ∗ such that h(xy) = h(x)h(y) for any strings x and y. Equivalently, if we divide w
into a sequence of individual characters w = c1 c2 . . . ck , then h(w) = h(c1 )h(c2 ) . . . h(ck ). (It’s a nice exercise
to prove that the two definitions are equivalent.)
Example 8.3.1 Let Σ = {a, b, c} and Γ = {0, 1}, and let h be the mapping h : Σ → Γ∗, such that h(a) = 01, h(b) = 00, and h(c) = ε. Clearly, h is a homomorphism.
So, suppose that we have a regular language L. If L is represented by a regular expression R, then it is
easy to build a regular expression for h(L). Just replace every character c in R by its image h(c).
Example 8.3.2 The regular expression R = (ac + b)∗ over Σ becomes h(R) = (01 + 00)∗ .
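This character-replacement recipe is easy to mechanize; here is a tiny Python sketch (names ours, not from the notes) that applies a homomorphism directly to a regex string, assuming single-character terminals and only the operators (, ), +, ∗:

def apply_homomorphism(regex, h):
    # Replace every alphabet character by its image under h; operators pass through.
    out = []
    for ch in regex:
        if ch in '()+*':
            out.append(ch)
        else:
            img = h.get(ch, ch)
            if len(img) > 1:
                img = '(' + img + ')'   # keep * and + binding correctly
            out.append(img)             # an empty image simply deletes ch (h(ch) = eps)
    return ''.join(out)

print(apply_homomorphism('(ac+b)*', {'a': '01', 'b': '00', 'c': ''}))
# prints ((01)+00)*, equivalent to (01+00)* from Example 8.3.2

Deleting a character works inside concatenations, as here; a character that forms an entire union branch by itself would need to be replaced by an explicit ε instead.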
Lemma 8.3.3 Let L be a regular language over Σ, and let h : Σ → Γ∗ be a homomorphism. Then the language h(L) is regular.
Proof: (Informal.) Let R be a regular expression for L. Replace any character c ∈ Σ appearing in R by the string h(c). Clearly, the resulting regular expression R′ recognizes all the words in h(L).
Proof: (More formal.) Let D be an NFA for L with a single accept state qfinal and an initial state qinit, so that the only transition out of qinit is an ε-transition, and there are no outgoing transitions from qfinal and only ε-transitions into it.
Now, replace every transition of D from q to q′ that reads c by a transition from q to q′ that reads h(c). Clearly, the resulting automaton is a GNFA C that accepts the language h(L). We showed in the previous lecture that a GNFA can be converted into an equivalent regular expression R, such that L(C) = L(R). As such, we have that h(L) = L(C) = L(R). Namely, h(L) is a regular language, as claimed.
Note that in the above proof, instead of creating a GNFA, we can also create an NFA, by introducing temporary states. Thus, if we have a transition from q to q′ reading c in D, and h(c) = w1 w2 . . . wk, then we introduce new temporary states s1, . . . , s_{k−1}, and replace the transition by the chain of transitions
q → s1 (reading w1), s1 → s2 (reading w2), . . . , s_{k−2} → s_{k−1} (reading w_{k−1}), s_{k−1} → q′ (reading wk).
Note that when you have several equivalent representations, do your proofs in the one that makes the proof easiest. So we did set complement using DFAs, concatenation using NFAs, and homomorphism using regular expressions. Now we just have to finish the remaining bits of the proof that the three representations are equivalent.
An interesting point is that if a language L is not regular, then h(L) might be regular or not.
Example 8.3.4 Consider the language L = { a^n b^n | n ≥ 0 }. The language L is not regular. Now, consider the homomorphism h(a) = a and h(b) = a. Clearly, h(L) = { a^n a^n = a^{2n} | n ≥ 0 }, which is definitely regular. However, the identity homomorphism I(a) = a and I(b) = b maps L to itself, I(L) = L, and as such I(L) is not regular.
Intuitively, a homomorphism cannot make a language “harder” than it is (if it is regular, it remains regular under homomorphism). However, if it is not regular, its image might or might not remain non-regular.
Chapter 9
In this lecture, we will see how to prove that a language is not regular.
We will see two methods for showing that a language is not regular. The “pumping lemma” shows that
certain key “seed” languages are not regular. From these seed languages, we can show that many similar
languages are also not regular, using closure properties.
[Figure: a DFA with states q0, q1, q2, q3 over the alphabet {0, 1}.]
You can do any of the following operations:
The rule of the game is that when the DFA is in a final state, you would know it.
So, the question is how to decide what the initial state of the above DFA is.
Here is one possible solution.
Definition 9.1.1 For a DFA M = (Q, Σ, δ, q0, F), p ∈ Q and x ∈ Σ∗, let M(p, x) be true if setting the DFA to be in the state p, and then reading the input x, causes M to arrive at an accepting state. Formally, M(p, x) is true if and only if δ(p, x) ∈ F, and false otherwise.
The moral of this story. So, we can differentiate between two states p and q of a DFA M by finding a string x such that M(p, x) accepts but M(q, x) rejects, or vice versa.
Definition 9.1.2 Two states p and q of a DFA M disagree with each other if there exists a string x such that M(p, x) ≠ M(q, x) (that is, M(p, x) accepts but M(q, x) rejects, or vice versa).
Lemma 9.1.4 Let M be a DFA, and let q1, . . . , qn be states of M such that qi and qj disagree with each other, for all i ≠ j. Then M has at least n different states.
Proof: For i ≠ j, since qi and qj disagree with each other, they cannot possibly be the same state of M, since if they were the same state then they would agree with each other on all possible strings. We conclude that q1, . . . , qn are all different states of M; namely, M has at least n different states.
A Motivating Example
Consider the language L = { a^n b^n | n ≥ 0 }. Intuitively, L cannot be regular, because we have to remember how many a's we have seen before reading the b's, and this cannot be done with a finite number of states.
Claim 9.1.5 The language L = { a^n b^n | n ≥ 0 } is not regular.
Proof: Assume, for the sake of contradiction, that L is regular, and let M = (Q, Σ, δ, q0, F) be a DFA for it.
Let qi denote the state M is in after reading the string a^i, for i = 0, 1, 2, . . .. We claim that qi disagrees with qj if i ≠ j. Indeed, observe that M(qi, b^i) accepts but M(qj, b^i) rejects, since a^i b^i ∈ L and a^j b^i ∉ L.
As such, by Lemma 9.1.4, M has an infinite number of states, which is impossible.
9.2 Irregularity via differentiation
Definition 9.2.1 Two strings x, y ∈ Σ∗ are distinguishable by L ⊆ Σ∗ , if there exists a word w ∈ Σ∗ ,
such that exactly one of the strings xw and yw is in L.
Lemma 9.2.2 Let M = (Q, Σ, δ, q0, F) be a given DFA, and let x, y ∈ Σ∗ be two strings distinguishable by L(M). Then qx ≠ qy, where qx = δ(q0, x) (i.e., the state M is in after reading x) and qy = δ(q0, y) is the state that M is in after reading y.
Proof: Indeed, let w be the string causing x and y to be distinguished by L(M), and assume that xw ∈ L(M) and yw ∉ L(M) (the other case is symmetric). Clearly, if qx = qy, then M(q0, xw) = M(qx, w) = M(qy, w) = M(q0, yw). But it is given to us that M(q0, xw) ≠ M(q0, yw), since exactly one of the words xw and yw is in L(M).
Lemma 9.2.3 Let L be a language, and let W = {w1 , w2 , w3 , . . .} be an infinite set of strings, such that
every pair of them is distinguishable by L. Then L is not a regular language.
Proof: Assume, for the sake of contradiction, that L is regular, and let M = (Q, Σ, δ, q0, F) be a DFA for it. Let us set qi = δ(q0, wi). For i ≠ j, the strings wi and wj are distinguishable by L, and this implies, by Lemma 9.2.2, that qi ≠ qj. This implies that M has an infinite number of states, which is of course impossible.
9.2.1 Examples
Example
Lemma 9.2.4 The language
L = { 1^k y | y ∈ {0, 1}∗, and y contains at most k ones }
is not regular.
Proof: Let wi = 1^i, for i ≥ 0. Observe that for j > i we have that wi 0 1^j = 1^i 0 1^j ∉ L, but wj 0 1^j = 1^j 0 1^j ∈ L. As such, wi and wj are distinguishable by L, for any i ≠ j. We conclude, by Lemma 9.2.3, that L is not regular.
Similarly, consider the language L = { 0^n 1 0^n 1 | n ≥ 0 }, and let wi = 0^i, for i ≥ 0. For i ≠ j, we have
wi 1 0^j 1 = 0^i 1 0^j 1 ∉ L,  but  wj 1 0^j 1 = 0^j 1 0^j 1 ∈ L,
and this implies that wi and wj are distinguishable by L, using the string xj = 1 0^j 1. As such, by Lemma 9.2.3, we have that L is not regular.
Consider again the language L = { a^n b^n | n ≥ 0 }, and assume, for the sake of contradiction, that it is regular. Let M = (Q, Σ, δ, q0, F) be a DFA for it, and suppose that M has p states.
Consider the string a^p b^p. It is accepted using a sequence of states s0 s1 . . . s_{2p}. Right after we read the last a, the machine is in state s_p.
In the sub-sequence s0 s1 . . . s_p, there are p + 1 states. Since M has only p distinct states, two states in this sequence must be the same (by the pigeonhole principle). Let us call the pair of repeated states s_i and s_j, with i < j. This means that the path through M's state diagram looks like the following, where a^p = x y z1.
[Figure: the run of M: from s0, the prefix x leads to s_i = s_j, the substring y loops back to it, z1 leads to s_p, and b^p leads to s_{2p}.]
But this DFA will accept all strings of the form x y^j z1 b^p, for j ≥ 0. Indeed, for j = 0, this is just the string x z1 b^p, which this DFA accepts, but which is not in the language, since it has fewer a's than b's. That is, if |y| = m, the DFA accepts all strings of the form a^{p−m+jm} b^p, for any j ≥ 0. For any value of j other than 1, such strings are not in L.
So our DFA M accepts some strings that are not in L. This is a contradiction, because M was supposed to accept exactly L. Therefore, we must have been wrong in our assumption that L was regular.
Theorem 9.3.1 (Pumping Lemma.) Let L be a regular language. Then there exists an integer p (the
“pumping length”) such that for any string w ∈ L with |w| ≥ p, w can be written as xyz with the following
properties:
• |xy| ≤ p.
• |y| ≥ 1 (i.e. y is not the empty string).
• x y^k z ∈ L for every k ≥ 0.
Proof: The proof is written out in full detail in Sipser; here we just outline it.
Let M be a DFA accepting L, and let p be the number of states of M . Let w = c1 c2 . . . cn be a string of
length n ≥ p, and let the accepting state sequence (i.e., trace) for w be s0 s1 . . . sn .
There must be a repeat within the sequence from s0 to sp , since M has only p states, and as such, the
situation looks like the following.
[Figure: the run of M on w: from s0, x leads to s_i = s_j, y loops back to it, z1 leads to s_p, and z2 leads to s_n.]
So if we set z = z1 z2 , we now have x, y, and z satisfying the conditions of the lemma.
• |xy| ≤ p, because the repeat is within the first p + 1 states.
• |y| ≥ 1, because i and j are distinct.
• x y^k z ∈ L for every k ≥ 0, because a loop in the state diagram can be repeated as many or as few times as you want.
Formally, for any k, the word x y^k z goes through the following sequence of states:
s0 −x→ s_i −y→ s_i −y→ · · · −y→ s_i = s_j −z→ s_n,
where the loop on y is taken k times, and s_n is an accepting state. Namely, M accepts x y^k z, and as such x y^k z ∈ L.
This completes the proof of the theorem.
Notice that we do not know exactly where the repeat occurs, so we have very little control over the lengths of x and z.
Proving that a language is not regular
• By the Pumping Lemma, we know there exist x, y, z such that w = xyz, |xy| ≤ p, and |y| ≥ 1.
Notice that our adversary picks p. We get to pick w whose length depends on p. But then our adversary
gets to pick the specific division of w into x, y, and z.
9.3.4 Examples
The language L = { a^n b^n | n ≥ 0 } is not regular
Proof: For any p ≥ 0, consider the word w = a^p b^p, and consider any breakup of w into three parts, such that w = xyz, |y| ≥ 1, and |xy| ≤ p. Clearly, xy is a prefix of w made out of only a's. As such, the word xyyz has more a's in it than b's, and as such, it is not in L.
But then, by the Pumping Lemma (Theorem 9.3.2), L is not regular.
The language L = { 0^n 1 0^n 1 | n ≥ 0 } is not regular
Proof: For any p ≥ 0, consider the word w = 0^p 1 0^p 1, and consider any breakup of w into three parts, such that w = xyz, |y| ≥ 1, and |xy| ≤ p. Clearly, xy is a prefix of w made out of only 0's. As such, the word xyyz has more 0's in its first block than in its second block; as such, xyyz is not in L.
But then, by the Pumping Lemma (Theorem 9.3.2), L is not regular.
Some observations about the choice of w:
• These strings are a subset of L, chosen to exemplify what is not regular about L.
• The 1 in the middle serves as a barrier to separate the two groups of 0's. (Think about why the proof would fail if it was not there.)
• The 1 at the end of w does not matter to the proof, but we need it so that w ∈ L.
9.3.5 A note on finite languages
A language L is finite if it has a bounded number of words in it. Clearly, a finite language is regular (since you can always write a finite regular expression that matches all the words in the language).
It is natural to ask why we cannot apply the pumping lemma (Theorem 9.3.1) to such an L. The reason is that we can always choose the threshold p to be larger than the length of the longest word in L. Then there is no word in L of length at least p, and the claim of the Pumping Lemma holds vacuously: there is simply no word in L that can be pumped. So the pumping lemma makes sense even for finite languages!
Claim 9.4.1 The language L0 = { 0^n 1^n | n ≥ 0 } is not regular.
Proof: Assume, for the sake of contradiction, that L0 is regular. Let h be the homomorphism that maps 0 to a and 1 to b. Then h(L0) must be regular (closure under homomorphism). But h(L0) is the language
L = { a^n b^n | n ≥ 0 },    (9.1)
which is not regular (by Claim 9.1.5). A contradiction. As such, L0 is not regular.
Claim 9.4.2 The language L2 = { w ∈ {a, b}∗ | w contains an equal number of a's and b's } is not regular.
Proof: Suppose L2 were regular. Consider L2 ∩ a∗b∗. This must be regular, because L2 and a∗b∗ are both regular and regular languages are closed under intersection. But L2 ∩ a∗b∗ is just the language L from Eq. (9.1), which is not regular (by Claim 9.1.5). A contradiction. As such, L2 is not regular.
Claim 9.4.3 The language L3 = { a^n b^n | n ≥ 1 } is not regular.
Proof: Assume, for the sake of contradiction, that L3 is regular. Consider L3 ∪ {ε}. This must be regular, because L3 and {ε} are both regular and regular languages are closed under union. But L3 ∪ {ε} is just L from Eq. (9.1), which is not regular (by Claim 9.1.5).
A contradiction. As such, L3 is not regular.
Also, be sure to apply only closure properties that we know to be true. In particular, regular languages are not closed under the subset and superset relations. Indeed, consider L1 = {001, 00}, which is regular, but which is a subset of the non-regular language LB. Similarly, L2 = (0 + 1)∗ is regular, and it is a superset of L (from Eq. (9.1) in the proof of Claim 9.4.1). But you cannot deduce that L is therefore regular; we know it is not.
So regular languages can be subsets of non-regular ones and vice versa.
Chapter 10
In this lecture, we will see that every language has a unique minimal DFA. We will see this fact from two perspectives: first, we will see a practical algorithm for minimizing a DFA, and then we will provide a theoretical analysis of the situation.
For a language L ⊆ Σ∗ and a string x ∈ Σ∗, the suffix language of L with respect to x is ⟦L/x⟧ = { y | xy ∈ L }; that is, the set of all words that complete x into a word of L. Let C(L) denote the set of all suffix languages of L.
Example 10.1.2 For example, if L = 0∗1∗, then:
• ⟦L/ε⟧ = 0∗1∗ = L
• ⟦L/0⟧ = 0∗1∗ = L
• ⟦L/0^i⟧ = 0∗1∗ = L, for any i ∈ N
• ⟦L/1⟧ = 1∗
• ⟦L/1^i⟧ = 1∗, for any i ≥ 1
• ⟦L/10⟧ = { y | 10y ∈ L } = ∅.
Hence there are only three suffix languages for L: 0∗1∗, 1∗ and ∅. So C(L) = { 0∗1∗, 1∗, ∅ }.
As the above example demonstrates, if there is a word x such that any word that has x as a prefix is not in L, then ⟦L/x⟧ = ∅, which implies that ∅ is one of the suffix languages of L.
Example 10.1.3 The above suggests the following automaton for the language L = 0∗1∗ of Example 10.1.2.
[Figure: a three-state DFA whose states are labelled 0∗1∗, 1∗ and ∅; the state 0∗1∗ loops on 0 and moves to 1∗ on 1; the state 1∗ loops on 1 and moves to ∅ on 0; the state ∅ loops on both 0 and 1.]
And clearly, this is the automaton with the smallest number of states that accepts this language.
Lemma 10.1.4 For a regular language L, the number of different suffix languages it has is bounded; that is, |C(L)| is bounded by a constant (that depends on L).
Proof: Consider a DFA M = (Q, Σ, δ, q0, F) that accepts L, and for a state q ∈ Q, let L_q denote the language accepted by M when starting from the state q. For any string x, the suffix language ⟦L/x⟧ is just the language L_q, where q = δ(q0, x) is the state M is in after reading x.
Indeed, the suffix language ⟦L/x⟧ is the set of strings w such that xw ∈ L. Since the DFA reaches q on x, it is clear that the suffix language of x is precisely the language accepted by M starting from the state q, which is L_q. Hence, for every x ∈ Σ∗, we have ⟦L/x⟧ = L_{δ(q0,x)}.
As such, any suffix language of L is realizable as the language of a state of M. Since the number of states of M is some constant k, it follows that the number of suffix languages of L is bounded by k.
Lemma 10.1.5 If a language L has an infinite number of suffix languages, then L is not regular. (This is just the contrapositive of Lemma 10.1.4.)
The suffix languages of a non-regular language
Consider the language L = { a^n b^n | n ∈ N }. The suffix language of L for a^i is
⟦L/a^i⟧ = { a^{n−i} b^n | n ≥ i }.
Note that b^i ∈ ⟦L/a^i⟧, but this is the only string made out of only b's that is in this language. As such, for any i and j, where i and j are different, the suffix language of L with respect to a^i is different from that of L with respect to a^j (i.e., ⟦L/a^i⟧ ≠ ⟦L/a^j⟧). Hence L has infinitely many suffix languages, and hence is not regular, by Lemma 10.1.5.
• If two states are associated with the same suffix language, then we can merge them into a single state.
• At least one non-regular language, { a^n b^n | n ∈ N }, has an infinite number of suffix languages.
It is thus natural to conjecture that the number of suffix languages of a language is a good indicator of how many states an automaton for this language would require. And this is indeed true, as the following section testifies.
Lemma 10.2.1 For any language L ⊆ Σ∗ and any string x ∈ Σ∗, we have x ∈ L if and only if ε ∈ ⟦L/x⟧. (Indeed, ε ∈ ⟦L/x⟧ if and only if xε = x ∈ L, by definition.)
Lemma 10.2.2 Let L be a language over an alphabet Σ. For all x, y ∈ Σ∗, if ⟦L/x⟧ = ⟦L/y⟧, then for all a ∈ Σ we have ⟦L/xa⟧ = ⟦L/ya⟧.
Proof: If w ∈ ⟦L/xa⟧, then (by definition) xaw ∈ L. But then, aw ∈ ⟦L/x⟧. Since ⟦L/x⟧ = ⟦L/y⟧, this implies that aw ∈ ⟦L/y⟧, which implies that yaw ∈ L, which implies that w ∈ ⟦L/ya⟧. This implies that ⟦L/xa⟧ ⊆ ⟦L/ya⟧; a symmetric argument implies that ⟦L/ya⟧ ⊆ ⟦L/xa⟧. We conclude that ⟦L/xa⟧ = ⟦L/ya⟧.
Theorem 10.2.3 (Myhill-Nerode theorem.) A language L ⊆ Σ∗ is regular if and only if the number of suffix languages of L is finite (i.e., C(L) is finite).
Moreover, if C(L) contains exactly k languages, we can build a DFA for L that has k states; also, any DFA accepting L must have at least k states.
Proof: If L is regular, then C(L) is a finite set, by Lemma 10.1.4.
Second, let us show that if C(L) is finite, then L is regular. Let the suffix languages of L be
C(L) = { ⟦L/x1⟧, ⟦L/x2⟧, . . . , ⟦L/xk⟧ }.    (10.1)
Note that for any y ∈ Σ∗, we have ⟦L/y⟧ = ⟦L/xj⟧, for some j ∈ {1, . . . , k}.
We will construct a DFA whose states are the various suffix languages of L; hence we will have k states in the DFA. Moreover, the DFA will be designed such that after reading y, the DFA will end up in the state ⟦L/y⟧.
The DFA is M = (Q, Σ, δ, q0, F), where
• Q = { ⟦L/x1⟧, ⟦L/x2⟧, . . . , ⟦L/xk⟧ },
• q0 = ⟦L/ε⟧,
• F = { ⟦L/x⟧ | ε ∈ ⟦L/x⟧ }. Note that, by Lemma 10.2.1, if ε ∈ ⟦L/x⟧ then x ∈ L.
• δ(⟦L/x⟧, a) = ⟦L/xa⟧, for every a ∈ Σ. (This is well defined, by Lemma 10.2.2.)
Remark 10.2.4 The full Myhill-Nerode theorem also shows that all minimal DFAs for L are isomorphic,
i.e. have identical transitions as well as the same number of states, but we will not show that part.
This is done by arguing that any DFA for L that has k states must be identical to the DFA we created
above. This is a bit more involved notationally, and is proved by showing a 1 − 1 correspondence between
the two DFAs and arguing they must be connected the same way. We omit this part of the theorem and
proof.
10.2.3 Examples
Let us explain the theorem we just proved using examples.
The suffix language of x ∈ Σ∗ , where x has an even number of a’s is:
z n o
r
L/x = w w has an odd number of a’s = L.
Hence there are only two distinct suffix languages for L. By the theorem, we know L must be regular
and the minimal DFA for L has two states. Going with r the r ofzthe DFA mentioned in the proof
z construction
of the theorem, we see that we have two states, q0 = L/ and q1 = L/a . The transitions are as follows:
r z r z
• From q0 = L/ , on a we go to L/a , which is the state q1 .
r z r z r z
• From q0 = L/ , on b we go to L/b , which is same as L/ , i.e. the state q0 .
r z r z r z
• From q1 = L/a , on a we go to L/aa , which is same as L/ , i.e. the state q0 .
r z r z r z
• From q1 = L/a , on b we go to L/ab , which is same as L/a , i.e. the state q1 .
r z r z
The initial state is L/ which is the state q0 , and the final states are those states L/x that have
in them, which is the set {q1 }.
We hence have a DFA for L, and in fact this is the minimal automaton accepting L.
Recall that two states p and q of a DFA M are distinct because of a word w = c1 c2 . . . cm if M(p, w) accepts and M(q, w) rejects, or vice versa.
In particular, this implies that if p and q are distinct because of a word w of length m, then δ(p, c1) and δ(q, c1) are distinct because of the word w′ = c2 . . . cm of length m − 1.
Thus, it is easy to compute the pairs of states distinct because of the empty word, and once we have computed all the pairs of states distinct because of words of length m − 1, we can “propagate” this information to the pairs of states distinct because of words of length m.
10.3.2 The algorithm
The algorithm for marking distinct states follows the above (recursive) definition. Create a table Distinct with an entry for each pair of states; table cells are initially blank.
(1) For every pair of states (p, q) such that exactly one of p and q is accepting, set Distinct(p, q) to be ε (the empty word already distinguishes them).
(2) Repeat until the table no longer changes: for each pair of states (p, q) and each character a in the alphabet, if Distinct(p, q) is empty and Distinct(δ(p, a), δ(q, a)) is not empty, set Distinct(p, q) to be a.
(3) Two states p and q are distinct iff Distinct(p, q) is not empty.
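A minimal Python sketch of this table-filling procedure (our own naming; for brevity it records only which pairs are distinct, not the distinguishing character; delta must be a total transition function):

def distinct_pairs(Q, Sigma, delta, F):
    # Step (1): the empty word distinguishes accepting from non-accepting states.
    distinct = {frozenset((p, q)) for p in Q for q in Q
                if p != q and (p in F) != (q in F)}
    changed = True
    while changed:                       # step (2): propagate
        changed = False
        for p in Q:
            for q in Q:
                pair = frozenset((p, q))
                if p == q or pair in distinct:
                    continue
                if any(frozenset((delta[(p, a)], delta[(q, a)])) in distinct
                       for a in Sigma):
                    distinct.add(pair)
                    changed = True
    return distinct                      # step (3): unmarked pairs can be merged

Pairs that never get marked are equivalent; merging each such group of states yields the minimal DFA, exactly as in the example below.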
[Figure 10.1: (a) a DFA with states q0, . . . , q7 over {0, 1}; (b) the equivalent smaller DFA obtained by merging the states q0/q1, q2/q3 and q5/q6.]
The following is the execution of the algorithm on the DFA of Figure 10.1.
[Table: the Distinct table after step (1), with a row and a column for each of q0, . . . , q7; the pairs where exactly one state is accepting are marked with ε.]
(Note that for a pair of states (qi, qj) we need only a single entry, since (qj, qi) is equivalent, and we do not need to consider pairs on the diagonal of the form (qi, qi).)
[Tables: the Distinct table after one iteration of step (2), and after the second iteration; the new entries record the distinguishing character (0 or 1) for pairs such as (q2, q0), (q3, q0) and (q4, q0).]
The third iteration of step (2) makes no changes to the table, so we halt. The cells (q0, q1), (q2, q3) and (q5, q6) are still empty, so these pairs of states are not distinct. Merging them produces the simpler DFA of Figure 10.1(b), which recognizes the same language.
Chapter 11
This lecture introduces context-free grammars, covering section 2.1 from Sipser.
Our purpose is to come up with a way to describe a language like L = { a^n b^n | n ≥ 0 } in a compact way. It turns out that context-free grammars are one possible way to capture such languages.
Here is a diagram demonstrating the classes of languages we will encounter in this class. Currently, we have only seen the weakest class, the regular languages. Next, we will see context-free grammars.
[Figure: nested language classes, from the inside out: regular ⊂ context-free ⊂ Turing decidable ⊂ Turing recognizable ⊂ not Turing recognizable (territory of the fire-breathing dragons).]
A compiler, or a natural language understanding program, uses these languages as follows:
• It uses regular languages to convert character strings to tokens (e.g. words, variable names, function names).
• It uses context-free languages to parse token sequences into functions, programs, and sentences.
Just as for regular languages, context-free languages have a procedural and a declarative representation, which we will show to be equivalent.
    procedural                      declarative
    NFAs/DFAs                       regular expressions
    pushdown automata (PDAs)        context-free grammars
[Figure: the parse tree for the word aaabbb: the root S derives a S b, each nested S again derives a S b, and the innermost S derives ε.]
This tree is known as the parse tree of the grammar S → aSb | ε of Eq. (11.1) for the word aaabbb.
Deriving the context-free grammars by constructing sentence structure
A context-free grammar defines the syntax of a program or sentence. The structure is easiest to see in a
parse tree.
[Figure: a parse tree for an English sentence: the root S has children NP and VP; the NP expands as D N, and the VP as V NP; the leaves spell out a sentence such as “the Groke ate my homework”.]
(i) S → NP VP
(ii) NP → D N
(iii) VP → V NP
(v) D → the | my . . .
If projection is working, show a sample computer-language grammar from the net. (See pointers on web
page.)
Synthetic examples
In practical applications, the terminals are often whole words, as in the example above. In synthetic examples
(and often in the homework problems), the terminals will be single letters.
Consider L = { 0^n 1^n | n ≥ 0 }. We can capture this language with a grammar that has start symbol S and the rule
S → 0S1 | ε.
For example, we can derive the string 000111 as follows:
S → 0S1 → 00S11 → 000S111 → 000111.
Or, consider the language of palindromes L = { w ∈ {a, b}∗ | w = w^R }. Here is a grammar with start symbol P for this language:
P → aPa | bPb | ε | a | b.
A possible derivation of the string abbba is
P → aPa → abPba → abbba.
11.2 Derivations
Consider our Groke example again. It has only one parse tree, but multiple derivations: After we apply the
first rule, we have two variables in our string. So we have two choices about which to expand first:
S → NP VP → . . .
If we expand the leftmost variable first, we get this derivation:
S → NP VP → D N VP → the N VP → the Groke VP → the Groke V NP → . . .
If we expand the rightmost variable first, we get this derivation:
S → NP VP → NP V NP → NP V D N → NP V D homework
→ NP V my homework . . .
The first is called the leftmost derivation. The second is called the rightmost derivation. There
are also many other possible derivations. Each parse tree has many derivations, but exactly one rightmost
derivation, and exactly one leftmost derivation.
Definition 11.2.2 (CFG yields.) Suppose x, y, and w are strings in (V ∪ Σ)∗ and B is a variable. Then xBy yields xwy, written as
xBy ⇒ xwy,
if there is a rule in R of the form B → w.
Definition 11.2.3 (CFG derives.) A string u ∈ (V ∪ Σ)∗ derives a string v ∈ (V ∪ Σ)∗, written u ⇒* v, if u = v, or if there is a sequence u ⇒ u1 ⇒ u2 ⇒ · · · ⇒ v. Notice that x ⇒* x, for any x and any set of rules.
Definition 11.2.4 If G = (V, Σ, R, S) is a grammar, then L(G) (the language of G) is the set
L(G) = { w ∈ Σ∗ | S ⇒* w }.
That is, L(G) is all the strings containing only terminals which can be derived from the start symbol of G.
11.2.2 Ambiguity
Consider the following grammar G = (V, Σ, R, S). Here
V = {S, N, NP, ADJ} and Σ = {and, eggs, ham, pencil, green, cold, tasty, . . .},
and the set R contains rules such as S → NP and NP and NP → ADJ NP | N, together with rules deriving the individual words.
Here are two possible parse trees for the string green eggs and ham: one groups the words as (green eggs) and (ham), the other as green (eggs and ham).
[Figure: the two parse trees.]
Removing ambiguity
There are several ways to remove ambiguity:
(A) Fix the grammar so it is not ambiguous. (Not always possible or reasonable.)
(B) Add grouping/precedence rules.
(C) Use semantics: choose parse that makes the most sense.
Grouping/precedence rules are the most common approach in programming language applications. E.g.
“else” goes with the closest “if”, * binds more tightly than +.
Invoking semantics is more common in natural language applications. For example, “The policeman killed the burglar with the knife.” Did the burglar have the knife, or the policeman? The previous context from the news story or the mystery novel may have made this clear, e.g. perhaps we have already been told that the burglar had a knife and the policeman had a gun.
Fixing the grammar is less often useful in practice, but neat when you can do it. Here’s an ambiguous
grammar with start symbol E. N stands for “number” and E stands for “expression”.
E → E × E | E+E | N
N → 0N | 1N | 0 | 1
An expression like 0110 × 110 + 01111 has two parse trees and, therefore, we do not know which operation
to do first when we evaluate it.
We can remove this ambiguity as follows, by rewriting the grammar as
E→E+T|T
T→N×T|N
N → 0N | 1N | 0 | 1
Now, the expression 0110 × 110 + 01111 must be parsed with the + as the topmost operation.
Chapter 12
In this lecture, we are interested in transforming a given grammar into a cleaner form. We start by
describing how to clean up a grammar. Then, we show how to transform a cleaned up grammar into a
grammar in Chomsky Normal Form.
Note, that some of the cleanup steps are not necessary if one just wants to transform a grammar into
Chomsky Normal Form. In particular, Section 12.1.2 and Section 12.1.3 are not necessary for the CNF
conversion. Note, however, that the algorithm of Section 12.1.3 gives us an immediate way to decide if the language of a grammar is empty or not; see Theorem 12.1.2.
Consider, for example, the following grammar:
⇒ S0 → S | X | Z
   S → A
   A → B
   B → C            (G1)
   C → Aa
   X → C
   Y → aY | a
   Z → ε.
(i) The variable Y can never be derived from the start symbol S0. It is a useless variable.
(ii) The rule S → A is redundant. We can replace any appearance of S by A, reducing the number of variables by one. A rule of the form S → A is called a unit production (or unit rule).
(iii) The variable A is also useless, since we cannot derive any word in Σ∗ from A (because once we start deriving from A we get into an infinite loop).
(iv) We also do not like Z, since one can generate ε from it (that is, Z ⇒* ε). Such a variable is called nullable. We would like to have the property that only the start variable can derive ε.
We are going to present a sequence of algorithms that transform the grammar to not have these drawbacks.
Clearly, this algorithm returns all the variables that are derivable from the start symbol S. As such, setting V′ = compReachableVars(G), we can set our new grammar to be G′ = (V′, Σ, R′, S), where R′ is the set of rules of R having only variables in V′.
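compReachableVars is the straightforward reachability search over the rules; a minimal sketch in Python (our own naming and rule representation, with single-character symbols on the right-hand sides for simplicity):

def comp_reachable_vars(V, R, S):
    # R is a list of rules (head, rhs), where rhs is a sequence of
    # variables and terminals.  Returns the variables reachable from S.
    reachable, todo = {S}, [S]
    while todo:
        X = todo.pop()
        for head, rhs in R:
            if head != X:
                continue
            for sym in rhs:
                if sym in V and sym not in reachable:
                    reachable.add(sym)
                    todo.append(sym)
    return reachable

# For the grammar (G1) above (writing S0 as 'T'): Y is not reachable.
R = [('T', 'S'), ('T', 'X'), ('T', 'Z'), ('S', 'A'), ('A', 'B'),
     ('B', 'C'), ('C', 'Aa'), ('X', 'C'), ('Y', 'aY'), ('Y', 'a'), ('Z', '')]
print(comp_reachable_vars({'T', 'S', 'A', 'B', 'C', 'X', 'Y', 'Z'}, R, 'T'))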
Lemma 12.1.1 Given a context-free grammar (CFG) G = (V, Σ, R, S) one can compute an equivalent CFG
G 0 such that any variable of G 0 can derive some string in Σ∗ .
Note, that if a grammar G has an empty language, then the equivalent grammar generated by Lemma 12.1.1
will have no variables in it. Namely, given a grammar we have an algorithm to decide if the language it
generates is empty or not.
86
Theorem 12.1.2 (CFG emptiness.) Given a CFG G, there is an algorithm that decides if the language of
G is empty or not.
Applying the algorithm of Section 12.1.2 together with the algorithm of Lemma 12.1.1 results in a CFG
without any useless variables.
Lemma 12.1.3 Given a CFG one can compute an equivalent CFG without any useless variables.
For example, if the grammar contains the rule X → AC, and the variable A is nullable, then we also add the rule X → C. In general, every rule X → X1 X2 · · · Xk is replaced by all the rules of the form X → α1 α2 · · · αk, where:
(i) If Xi is not nullable, then αi is Xi.
(ii) If Xi is nullable, then αi is either Xi or ε.
Let G′′ = (V, Σ, R′, S0) be the resulting grammar. Clearly, no variable is nullable, except maybe the start variable, and there are no ε-production rules (except, again, for the possible special rule for the start variable).
Note, that we might need to feed G 00 into our procedures to remove useless variables. Since this process
does not introduce new rules or variables, we have to do it only once.
Theorem 12.3.1 Given an arbitrary CFG, one can compute an equivalent grammar G′, such that G′ has no unit rules, no ε-productions (except maybe a single ε-production for the start variable), and no useless variables.
88
12.4 Chomsky Normal Form
Chomsky Normal Form requires that each rule in the grammar is either
(C1) of the form A → BC, where A, B, C are all variables and neither B nor C is the start variable (that is, the rule has exactly two variables on its right side);
(C2) of the form A → a, where a is a terminal; or
(C3) of the form S0 → ε, where S0 is the start variable.
Note that rules of the form A → B, A → BCD or A → aC are all illegal in a CNF.
Why should we care for CNF? Well, its an effective grammar, in the sense that every variable that being
expanded (being a node in a parse tree), is guaranteed to generate a letter in the final string. As such, a
word w of length n, must be generated by a parse tree that has O(n) nodes. This is of course not necessarily
true with general grammars that might have huge trees, with little strings generated by them.
(i) Create a new start symbol S0, with the new rule S0 → S mapping it to the old start symbol (i.e., S).
(ii) Remove nullable variables (i.e., variables that can generate the empty string).
(iii) Remove unit rules (i.e., variables that can generate each other).
(iv) Restructure the remaining rules into the required form.
The only step we did not describe yet is the last one.
Removing characters from the right side of rules. As a first step, we introduce a variable Vc for every character c ∈ Σ, and add it to V. Next, we add the rules Vc → c to the grammar, for every c ∈ Σ.
Now, for any string w ∈ (V ∪ Σ)∗, let ŵ denote the string in which every appearance of a character c ∈ Σ in w is replaced by Vc. We now replace every rule X → w, such that |w| > 1, by the rule X → ŵ.
Clearly, (C2) and (C3) hold for the resulting grammar, and furthermore, any rule having variables on the right side is made only of variables.
Making rules with only two variables on the right side. The only remaining problem is that the current grammar might have rules that are too long, with a long string of variables on the right side. For example, we might have a rule in the grammar of the form
X → B1 B2 · · · Bk.
To turn this into binary rules (with only two variables on the right side), we remove this rule from the grammar, and replace it by the following set of rules:
X → B1 Z1
Z1 → B2 Z2
Z2 → B3 Z3
. . .
Z_{k−3} → B_{k−2} Z_{k−2}
Z_{k−2} → B_{k−1} Bk,
where Z1, . . . , Z_{k−2} are new variables.
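A small Python sketch of this binarization step (our own naming; fresh supplies the new variables Z1, Z2, . . .):

from itertools import count

def binarize(head, body, fresh):
    # Replace head -> body (a sequence of >= 2 variables) by binary rules.
    rules = []
    while len(body) > 2:
        z = next(fresh)
        rules.append((head, [body[0], z]))   # X -> B1 Z1, then Z1 -> B2 Z2, ...
        head, body = z, body[1:]
    rules.append((head, list(body)))         # finally Z_{k-2} -> B_{k-1} Bk
    return rules

fresh = (f'Z{i}' for i in count(1))
print(binarize('X', ['B1', 'B2', 'B3', 'B4'], fresh))
# [('X', ['B1', 'Z1']), ('Z1', ['B2', 'Z2']), ('Z2', ['B3', 'B4'])]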
Theorem 12.4.1 (CFG → CNF.) Any context-free grammar can be converted into Chomsky normal form.
⇒ S → ASA | aB
   A → B | S          (G0)
   B → b | ε
After adding the new start symbol S0 , we get the following grammar.
⇒ S0 → S
   S → ASA | aB       (G1)
   A → B | S
   B → b | ε
Removing nullable variables. In the above grammar, both A and B are nullable variables. We have
the rule S → ASA. Since A is nullable, we need to add S → SA and S → AS and S → S (which is of course a
silly rule, so we will not waste our time putting it in). We also have S → aB. Since B is nullable, we need
to add S → a. The resulting grammar is the following.
⇒ S0 → S
S → ASA | aB | a | SA | AS
(G2)
A→B|S
B→b
Removing unit rules. The unit pairs for this grammar are {A → B, A → S, S0 → S}. We need to copy the productions for S up to S0, copy the productions for S down to A, and copy the production B → b to A.
⇒ S0 → ASA | aB | a | SA | AS
S → ASA | aB | a | SA | AS
(G3)
A → b | ASA | aB | a | SA | AS
B→b
90
Final restructuring. Now, we can directly patch any places where our grammar rules have the wrong form for CNF. First, if a rule has at least two symbols on its right-hand side but some of them are terminals, we introduce new variables which expand into these terminals. For our example, the offending rules are S0 → aB, S → aB, and A → aB. We can fix these by replacing the a's with a new variable U, and adding the rule U → a.
⇒ S0 → ASA | UB | a | SA | AS
S → ASA | UB | a | SA | AS
(G4) A → b | ASA | UB | a | SA | AS
B→b
U→a
Finally, the rules with ASA on the right-hand side still need to be binarized, as described above: replace ASA by A Z, and add the rule Z → SA. We are done!
Chapter 13
This lecture introduces pushdown automata, i.e. about the first half of section 2.2 from Sipser.
PDAs are the procedural counterpart to context-free grammars. A pushdown automaton (PDA) is simply
an NFA with a stack. In a couple lectures, we’ll prove that they generate the same languages.
[Figure: a PDA for balanced brackets, with states q0, q1, q2: the transition q0 → q1 pushes $, self loops on q1 push [ and ( and pop them on ] and ), and the transition q1 → q2 pops $.]
Do a short trace of the state sequence and the sequence of stack contents as this machine recognizes the string [()()].
The formal notation for the labels on PDA transition arcs is
c, s → t,
where c is the character to be read from the input stream, s is the character to be popped from the top of the stack, and t is the character to be pushed back onto the stack. All of these can be ε.
For example, to pop something from the stack, we use a label like c, s → ε. To push something onto the stack: c, ε → t. A transition like c, s → t pops s from the stack and substitutes t in its place.
So a properly drawn version of our state diagram would look like:
[Figure: the properly drawn bracket PDA: q0 →(ε, ε → $) q1; self loops on q1 labelled [, ε → [ and (, ε → ( and ], [ → ε and ), ( → ε; and q1 →(ε, $ → ε) q2.]
[Figure: a PDA for { w w^R | w ∈ {a, b}∗ }: q0 →(ε, ε → $) q1; self loops on q1 labelled a, ε → a and b, ε → b; q1 →(ε, ε → ε) q2 (guessing the midpoint); self loops on q2 labelled a, a → ε and b, b → ε; and q2 →(ε, $ → ε) q3.]
• You can take a transition if the input character and the top-of-stack character both match what is on the transition, or are ε.
• It is non-deterministic (like an NFA): more than one transition might match, and you need to pick the “right” one, e.g. guessing when you have reached the midpoint of the string and need to take the transition from q1 to q2.
[Figure: a PDA with states q0, q1, q2, q2a, q3: q0 →(ε, ε → $) q1, q1 →(ε, ε → ε) q2, q2 →(ε, $ → ε) q3, and a 2-step loop q2 →(b, X → ε) q2a →(b, X → ε) q2.]
Things to notice:
• The stack alphabet is different from the input alphabet. You can use some of the same characters but
you don’t have to.
• The 2-step loop involving q2 and q2a checks that the number of b’s is even.
13.4 The language a^n b^{2n}
Another example: L = { a^n b^{2n} | n ≥ 1 }. Notice that elements of L have twice as many b's as a's. Also, the empty string isn't in the language; the shortest string in L is abb.
[Figure: a PDA for L with states q0, q1, q2, q2a, q3: q0 →(ε, ε → $) q1; a self loop on q1 labelled a, ε → X; q1 →(a, ε → X) q2; a 2-step loop q2 →(b, X → ε) q2a →(b, ε → ε) q2; and q2 →(ε, $ → ε) q3.]
Here is another one of the possible PDAs that recognizes this language.
[Figure: a PDA with states q0, q1, q1a, q2, q3: q0 →(ε, ε → $) q1; a 2-step loop q1 →(a, ε → X) q1a →(ε, ε → X) q1; q1 →(b, X → ε) q2; a self loop on q2 labelled b, X → ε; and q2 →(ε, $ → ε) q3.]
Things to notice:
• The 2-step loop involving q1 and q1a reads one a off the input and pushes two X’s onto the stack.
• The transition from q1 to q2 explicitly reads an input character, so the empty string cannot make it through to the final state.
Formally, a PDA is a tuple M = (Q, Σ, Γ, δ, q0, F), where Γ is the stack alphabet and the transition function has the form δ : Q × Σε × Γε → P(Q × Γε), with Σε = Σ ∪ {ε} and Γε = Γ ∪ {ε}. That is, the input to the transition function is a state, an input symbol, and a stack symbol. The input and stack symbols can be real characters from Σ and Γ, or they can be ε. The output is a set of pairs (q, t), where q is a state and t is a stack symbol or ε.
A PDA M = (Q, Σ, Γ, δ, q0, F) accepts a string w if there are
(a) a sequence of states s0, s1, . . . , sk,
(b) a sequence of characters and ε's c1, c2, . . . , ck, and
(c) a sequence of strings t0, t1, . . . , tk (the stack snapshots),
such that
• w = c1 c2 . . . ck,
• s0 = q0 and t0 = ε (we start at the start state with an empty stack),
• sk ∈ F (we end at an accepting state), and
• for each i = 1, . . . , k, there are characters (or ε's) a and b and a string x such that t_{i−1} = ax, t_i = bx, and (s_i, b) ∈ δ(s_{i−1}, c_i, a). (Each change to the machine's state follows what is allowed by δ.)
The last condition will need some discussion and probably a picture or two.
[Figure: a nondeterministic PDA: after pushing $, it guesses one of two branches; one branch (through q2a, q3a) loops on a, ε → a before popping $, while the other branch (through q2b, q3b, q4b) loops on b, ε → ε and c, a → ε before popping $.]
It turns out that recognizing this language requires non-determinism. There is a deterministic version
of a PDA, but it does not recognize as many languages as a normal (non-deterministic) PDA.
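To make the acceptance definition concrete, here is a sketch of a simulator for a nondeterministic PDA in Python (our own encoding of δ, not from the notes). It explores all reachable (state, stack, position) configurations by breadth-first search; a PDA that can keep pushing on ε-moves forever would make the search run forever, so this is only a sketch for well-behaved examples.

from collections import deque

def pda_accepts(w, delta, q0, final):
    # delta maps (state, read, pop) -> set of (state, push) pairs, where
    # read, pop and push are single characters or '' (standing for eps).
    # A stack is a string whose first character is the top.
    start = (q0, '', 0)                     # (state, stack, input position)
    seen, todo = {start}, deque([start])
    while todo:
        q, stack, i = todo.popleft()
        if q in final and i == len(w):
            return True
        reads = [''] + ([w[i]] if i < len(w) else [])
        pops = [''] + ([stack[0]] if stack else [])
        for c in reads:
            for s in pops:
                for (q2, t) in delta.get((q, c, s), ()):
                    cfg = (q2, t + stack[len(s):], i + len(c))
                    if cfg not in seen:
                        seen.add(cfg)
                        todo.append(cfg)
    return False

# The PDA for { w w^R } from earlier in this chapter: push in q1,
# guess the midpoint, match in q2.
delta = {
    ('q0', '', ''):   {('q1', '$')},
    ('q1', 'a', ''):  {('q1', 'a')},  ('q1', 'b', ''): {('q1', 'b')},
    ('q1', '', ''):   {('q2', '')},
    ('q2', 'a', 'a'): {('q2', '')},   ('q2', 'b', 'b'): {('q2', '')},
    ('q2', '', '$'):  {('q3', '')},
}
print(pda_accepts('abba', delta, 'q0', {'q3'}))   # True
print(pda_accepts('ab',   delta, 'q0', {'q3'}))   # False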
Chapter 14
Observation 14.1.1 Consider a grammar G which is CNF, and a variable X of G which is not the start
variable. Then, any string derived from X must be of length at least one.
Claim 14.1.2 Let G = (V, Σ, R, S) be a context-free grammar in Chomsky normal form, and let w be a string of length at least one. Furthermore, assume that there is an X ∈ V such that X ⇒* w (i.e., w can be derived from X), and let T be the corresponding parse tree for w. Then the tree T has exactly 2|w| − 1 internal nodes.
Proof: A full binary tree is a tree where every node other than the leaves has two children. It is easy to verify, by an easy induction, that such a tree with m leaves has m − 1 internal nodes.
Now, the given tree T is not quite a full binary tree. Indeed, the kth leaf (from the left) of T, denoted by ℓk, is the kth character of w, and its parent must be a node labeled by a variable Xk, for k = 1, . . . , n, where n = |w|. Furthermore, the parent of ℓk has only this single child. As such, if we remove the n leaves of T, we remain with a tree T′ with n leaves (every parent of a leaf ℓk became a leaf). This tree is a full binary tree, because any remaining internal node must correspond to a non-terminal derivation of a CNF grammar, and any such derivation has the form X → YZ; that is, the derivation corresponds to an internal node with two children in T. By the aforementioned fact about full binary trees, T′ has n − 1 internal nodes. Every node of T′ (its n − 1 internal nodes and its n leaves) is an internal node of T. We conclude that T has (n − 1) + n = 2n − 1 internal nodes.
(The claim can also be proven by induction: if the root of T corresponds to a rule X → X1 X2, let w1 and w2 denote the substrings of w derived from X1 and X2, respectively, and let T1 and T2 denote the corresponding subtrees of T. Clearly w = w1 w2, with |w1| > 0 and |w2| > 0, by the above observation. Clearly, T1 (resp. T2) is a tree that derives w1 (resp. w2) from X1 (resp. X2). By induction, T1 (resp. T2) has 2|w1| − 1 (resp. 2|w2| − 1) internal nodes. As such, the number of internal nodes of T is
N = 1 + (2|w1| − 1) + (2|w2| − 1) = 2(|w1| + |w2|) − 1 = 2|w| − 1,
as claimed.)
The following is the same claim, restated as a claim on the number of derivation steps used.
Claim 14.1.3 If G is a context-free grammar in Chomsky normal form, and w is a string of length n ≥ 1, then any derivation of w from any variable X contains exactly 2n − 1 steps.
Theorem 14.1.4 Given a context-free grammar G and a word w, one can decide if w ∈ L(G) by an algorithm that always stops.
Proof: Convert G into Chomsky normal form, and let G′ be the resulting grammar. Let n = |w|. Observe that if w ∈ L(G), then w has a parse tree using G′ with 2n − 1 internal nodes, by Claim 14.1.2. Enumerate all possible parse trees with 2n − 1 internal nodes (their number is large, but finite), and check for each of them whether it is (i) a legal parse tree for G′, and (ii) derives the word w. If we find such a legal tree deriving w, then w ∈ L(G). Otherwise, w cannot be generated by G′, which implies that w ∉ L(G).
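The enumeration in this proof is finite but astronomically slow. In practice, one decides membership for a grammar in CNF by dynamic programming, using the CYK algorithm (which is not covered in these notes). Here is a minimal Python sketch, under our own rule representation:

def cyk(w, unit, binary, start):
    # Decides w in L(G) for a CNF grammar G (ignoring the S -> eps rule,
    # which must be checked separately when w is empty).
    # unit:   set of pairs (X, c)     for the rules X -> c
    # binary: set of triples (X, Y, Z) for the rules X -> YZ
    n = len(w)
    if n == 0:
        return False
    # table[i][j] holds the variables deriving the substring w[i : i+j+1].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, c in enumerate(w):
        table[i][0] = {X for (X, a) in unit if a == c}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                for (X, Y, Z) in binary:
                    if (Y in table[i][split - 1]
                            and Z in table[i + split][length - split - 1]):
                        table[i][length - 1].add(X)
    return start in table[0][n - 1]

# The grammar S -> AB, A -> a, B -> b accepts exactly 'ab':
print(cyk('ab', {('A', 'a'), ('B', 'b')}, {('S', 'A', 'B')}, 'S'))   # True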
CFGs are not closed under intersection
Consider the two languages L1 = { a^i b^n c^n | i, n ≥ 0 } and L2 = { a^n b^n c^i | i, n ≥ 0 }. The grammar
S → Astar X
Astar → aAstar | ε
X → bXc | ε,
with the start symbol being S, is a CFG for L1 (a similar grammar works for L2). As such, both L1 and L2 are context-free, but their intersection L = L1 ∩ L2 = { a^n b^n c^n | n ≥ 0 } is not context-free. Thus, context-free languages are not closed under intersection.
CFGs are closed under union
Suppose we have grammars for two languages, with start symbols S and T , respectively. Rename variables
(in the two grammars) as needed to ensure that the two grammars do not share any variables. Then construct
a grammar for the union of the languages, with start symbol Z, by taking all the rules from both grammars
together and adding a new rule Z → S | T .
Concatenation.
Suppose we have grammars for two languages, with start symbols S and T. Rename variables as needed to ensure that the two grammars do not share any variable. Then construct a grammar for the concatenation of the languages, with start symbol Z, by taking all the rules from both grammars and adding a new rule Z → S T.
Star operator
Suppose that we have a grammar for the language L, with start symbol S. The grammar for L∗, with start symbol T, contains all the rules from the original grammar plus the rule T → T S | ε.
String reversal
Reverse the character string on the righthand side of every rule in the grammar.
Homomorphism
Suppose that we have a grammar G for language L and a homomorphism h. To construct a grammar for
h(L), modify the righthand side of every rule in G to replace each terminal symbol t with its image h(t)
under the homomorphism.
CFGs are closed under intersection with a regular language
Here, the variable X_{q,q′} represents all the strings that can be derived from the variable X of G such that, furthermore, if we feed such a string to D starting at the state q, then we reach the state q′. So, consider a rule of the form
X → YZ
in G. For every possible starting state q and ending state q′, we want to generate a rule for the variable X_{q,q′}. Say we derive a substring w from Y. Feeding D the string w, starting at q, leads to some state s. As such, the string generated from Y moves D from q to s, and the string generated from Z must then move D from s to q′. That is, this rule can be rewritten as
X_{q,q′} → Y_{q,s} Z_{s,q′}, for all q, q′, s ∈ Q.
¹Here, and in a lot of other places, we abuse notation: when we write a^n b^n, what we really mean is the language { a^n b^n | n ≥ 0 }.
If we have a rule of the form X → c in G, then we create the rule X_{q,q′} → c if there is a transition in D from q to q′ that reads the character c, where c ∈ Σ.
Finally, we create a new start variable S0, and we introduce the rules S0 → S_{qinit,q′}, where qinit is the initial state of D, and q′ ∈ F is an accept state of D.
We claim that the resulting grammar generates exactly the words in the language L(G) ∩ L(D).
Formal description
We have a CFG G = (V, Σ, R, S) and a DFA D = (Q, Σ, δ, qinit, F). We now build a new grammar for the language L(G) ∩ L(D). The set of new variables is
V′ = {S0} ∪ { X_{q,q′} | X ∈ V, q, q′ ∈ Q },
and the set of new rules is
R′ = { X_{q,q′} → Y_{q,s} Z_{s,q′} | q, q′, s ∈ Q, (X → YZ) ∈ R }    (14.1)
  ∪ { S0 → S_{qinit,q′} | q′ ∈ F }    (14.2)
  ∪ { X_{q,q′} → c | (X → c) ∈ R and δ(q, c) = q′ }.    (14.3)
If S → ε ∈ R and qinit ∈ F (i.e., ε is in the intersection language), then we also add the rule S0 → ε to R′.
The new grammar is G∩D = (V′, Σ, R′, S0).
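Generating the rule set R′ is entirely mechanical; here is a small Python sketch (our own rule encoding, not from the notes) that emits the rules of Eq. (14.1)-(14.3):

def intersect_rules(R, Q, delta, q_init, F, S):
    # R: CNF rules, ('bin', X, Y, Z) for X -> YZ and ('unit', X, c) for X -> c.
    # Variables of the new grammar are triples (X, q, q2).
    rules = []
    for rule in R:
        if rule[0] == 'bin':
            _, X, Y, Z = rule
            rules += [((X, q, q2), ((Y, q, s), (Z, s, q2)))
                      for q in Q for q2 in Q for s in Q]          # Eq. (14.1)
        else:
            _, X, c = rule
            rules += [((X, q, delta[(q, c)]), (c,)) for q in Q]   # Eq. (14.3)
    rules += [('S0', ((S, q_init, qf),)) for qf in F]             # Eq. (14.2)
    return rules

The S0 → ε rule, when S → ε ∈ R and qinit ∈ F, is added separately, exactly as in the text.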
Observation 14.2.2 The new grammar G∩D is “almost” in CNF. That is, if we ignore the rules involving the start symbol S0 of G∩D, then it is in CNF.
Correctness
Lemma 14.2.3 Let G be a context-free grammar in Chomsky normal form, and let D be a DFA. Then the grammar G∩D = (V′, Σ, R′, S0) constructed above satisfies the following: for any word w ∈ Σ∗ \ {ε}, we have that X ⇒* w and δ(q, w) = q′ if and only if X_{q,q′} ⇒* w.
Proof: The construction is described above; the proof is by induction on the length of w.
[|w| = 1]: If |w| = 1, then w = c, where c ∈ Σ.
Thus, if X ⇒* w and δ(q, w) = q′, then X → c is in R, which implies that we introduced the rule X_{q,q′} → c into R′, which implies that X_{q,q′} ⇒* w.
Similarly, if X_{q,q′} ⇒* w, then since G∩D is (almost) in CNF and |w| = 1, there must be a derivation using the rule X_{q,q′} → c. But this implies, by construction, that X → c is a rule of G and δ(q, c) = q′, as required.
[|w| > 1]: Assume that, by induction, the claim holds for all words strictly shorter than w.
– If X ⇒* w and δ(q, w) = q′, then consider the parse tree T of G deriving w from X. Since G is in CNF, the root of T corresponds to a rule of the form X → YZ. Let wY and wZ be the two sub-words derived by the two subtrees of T. Clearly, w = wY wZ, and since G is in CNF, we have |wY|, |wZ| > 0 (since any symbol except the root in a CNF derives a word of length at least 1). As such, |wY|, |wZ| < |w|. Now, let q′′ = δ(q, wY). We have that
Y ⇒* wY, 0 < |wY| < |w|, and q′′ = δ(q, wY).
As such, by induction, it must be that Y_{q,q′′} ⇒* wY. Similarly, since δ(q′′, wZ) = q′, by the same argument we have that Z_{q′′,q′} ⇒* wZ. Now, by Eq. (14.1), we have the rule X_{q,q′} → Y_{q,q′′} Z_{q′′,q′} in R′. Namely,
X_{q,q′} → Y_{q,q′′} Z_{q′′,q′} ⇒* wY wZ = w,
as required.
– If X_{q,q′} ⇒* w, consider the corresponding parse tree T′ of G∩D, and let
X_{q,q′} → Y_{q,q′′} Z_{q′′,q′}
be the rule used at the root of T′, and let wY, wZ be the two substrings of w generated by these two subtrees; that is, w = wY wZ. By induction, we have that
Y ⇒* wY, δ(q, wY) = q′′, and Z ⇒* wZ, δ(q′′, wZ) = q′.
Now, by construction, the rule X → YZ must be in R. As such, X → YZ ⇒* wY wZ = w, and δ(q, w) = q′, as required.
Theorem 14.2.4 Let L be a context-free language and L' be a regular language. Then, L ∩ L' is a context-free language.

Proof: Let G = (V, Σ, R, S) be a CNF grammar for L, and let D = (Q, Σ, δ, q_init, F) be a DFA for L'. We apply the above construction to compute a grammar $G_{\cap D} = (V', \Sigma, R', S_0)$ for the intersection.
• $w \in L \cap L' \implies w \in L(G_{\cap D})$.
If w = ε, then the rule $S_0 \to \epsilon$ is in $G_{\cap D}$, and we have that $\epsilon \in L(G_{\cap D})$.
For any other word, if $w \in L \cap L'$, then $S \overset{*}{\Rightarrow} w$ and $q' = \delta(q_{init}, w) \in F$. By Lemma 14.2.3, we have that
$$S_{q_{init}\,q'} \overset{*}{\Rightarrow} w.$$
Furthermore, by construction, we have the rule
$$S_0 \to S_{q_{init}\,q'}.$$
As such, $S_0 \overset{*}{\Rightarrow} w$, and $w \in L(G_{\cap D})$.
• $w \in L(G_{\cap D}) \implies w \in L \cap L'$.
Similar to the above proof, and we omit it.
Chapter 15
Here we start proving a set of results similar to those we proved for regular languages. That is, we will
show that context-free grammars and pushdown automata generate the same languages. We will show that
context-free languages are closed under a variety of operations, though not as many as regular languages.
We will also prove a version of the pumping lemma, which we can use to show that certain languages (e.g., $a^nb^nc^n$) are not context-free.
[Figure: a transition $q \xrightarrow{\;a,\, s \to cde\;} r$ pushing several symbols is replaced by a chain of transitions that push one symbol at a time: $q \xrightarrow{\;a,\, s \to e\;} X \xrightarrow{\;\epsilon,\, \epsilon \to d\;} Y \xrightarrow{\;\epsilon,\, \epsilon \to c\;} r$, where X and Y are new intermediate states.]
Idea: The PDA needs to verify that there is a derivation for the input string w. Here’s an algorithm that
almost works:
• Use the grammar rules from G to expand variables on the stack. The PDA guesses which rule to apply
at each step.
• When the stack contains only terminals, compare this terminal string against the input w. Accept iff
they are the same.
Consider the example grammar G with start symbol S:
S → XB
X → aXb | ε
B → cB | ε
This generates the language $L = \left\{ a^nb^nc^i \;\middle|\; i \geq 0,\ n \geq 0 \right\}$.
Here is a derivation for the string aabbc:
$$S \Rightarrow XB \Rightarrow aXbB \Rightarrow aaXbbB \Rightarrow aabbB \Rightarrow aabbcB \Rightarrow aabbc.$$
If we could do this on the stack, we would get the following sequence of snapshots for the stack of the PDA (top of the stack on the left, with $ marking the bottom):
$$S\$ \;\Rightarrow\; XB\$ \;\Rightarrow\; aXbB\$ \;\Rightarrow\; aaXbbB\$ \;\Rightarrow\; aabbB\$ \;\Rightarrow\; aabbcB\$ \;\Rightarrow\; aabbc\$.$$
At the end of this process, the PDA would guess that it finished applying all the grammar rules, and it would read each character of the input, verify that it is indeed equal to the top of the stack, pop the stack, and go on to the next letter. Then, the PDA would verify that we had reached the bottom of the stack (by popping the character $ out of it), and move into the accept state.
Let us try to do this on the stack. We get two steps in, and the stack looks like the following. At this stage, we cannot expand X because it is not on the top of the stack. There is a terminal (i.e., a) blocking our access to it.
a ⇐ top of the stack
X
b
B
$ ⇐ bottom of the stack
So, the “real” PDA has to interleave these two steps: expanding variables by guessing grammar rules and
matching the input against the terminals at the top of the stack. We also need to mark the bottom of the
stack, so we can make sure we have completely emptied it after we’ve finished reading the input string.
So the final PDA looks like the following: it has three states, $q_0 \xrightarrow{\;\epsilon,\,\epsilon \to S\$\;} q_1 \xrightarrow{\;\epsilon,\,\$ \to \epsilon\;} q_2$, where $q_2$ is the accept state, and $q_1$ carries a self-loop for each of the loop rules described next.
The rules on the loop in this PDA are of two types: expansion rules of the form ε, X → w (pop a variable X off the stack and push the right-hand side w of one of its grammar rules), and matching rules of the form c, c → ε for each terminal c (read c from the input and pop a matching c off the stack).
For our example grammar, the loop rules are:
• ε, S → XB
• ε, X → aXb
• ε, X → ε
• ε, B → cB
• ε, B → ε
• a, a → ε
• b, b → ε
• c, c → ε
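The following Python fragment sketches how these loop transitions are generated from an arbitrary grammar. The triple encoding (read, pop, push) of a PDA move, with "" standing for ε, is an assumption of this sketch, not a standard API.

    # Build the loop rules of the CFG-to-PDA construction.
    def loop_rules(grammar_rules, terminals):
        # grammar_rules: list of (variable, right-hand side), e.g. ("S", "XB").
        moves = []
        for (var, rhs) in grammar_rules:
            moves.append(("", var, rhs))   # expansion: pop the variable, push the rhs
        for c in terminals:
            moves.append((c, c, ""))       # matching: read c and pop c
        return moves

    example = loop_rules([("S", "XB"), ("X", "aXb"), ("X", ""),
                          ("B", "cB"), ("B", "")], "abc")
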
• We remove the first step in the derivation, not (for example) the final step. In general, results about grammars require removing the first step of a derivation or the top node in a parse tree.
• It almost never works right to start with a derivation or tree of size k and try to expand it into a derivation or parse tree of size k + 1. Please avoid this common mistake.
As an example of the inherent hopelessness of arguing in this wrong direction, consider the following grammar:
(G6)
S → A | B
A → aaA | ε
B → bbB | b
All the even-length strings in the language are generated from A and consist entirely of a's, and all the odd-length strings are generated from B and consist entirely of b's. As such, there is no way to prove this claim by induction by taking the parse tree for an even-length string and grafting on a couple of nodes to make a parse tree for the next longer length.
Chapter 16
(G5)
A → b | AZ | UB | a | SA | AS
B → b
U → a
Z → SA
[Figure: two parse trees over the grammar (G5), both rooted at $S_0$; each contains a subtree rooted at the variable S.]
What if we wanted to make the second word longer without thinking too much? Well, both parse trees have subtrees whose roots are labeled by the variable S. As such, we could just cut the subtree of the first word rooted at the variable S, and replace the subtree of S in the second word by this subtree, which would look like the following:
[Figure: the parse tree resulting from grafting the S-subtree of the first word's tree into the second word's tree.]
Even more interestingly, we can do this cut and paste on the original tree:
[Figure: the parse trees obtained by pumping once and pumping twice.]
Naturally, we can repeat this pumping operation (cutting and pasting a subtree) as many times as we want; see for example Figure 16.1. In particular, we get that the word $abb^iab^ia$, for any i, is in the language of the grammar (G5). Notice that, unlike the pumping lemma for regular languages, here the repetition happens in two places in the string. We claim that such a repetition (in two places) must happen for any sufficiently long word in any context-free language.
Figure 16.1: Naturally, one can pump the string as many times as one wants, to get a longer and longer string. [Parse trees omitted.]
16.2.1 If a variable repeats
So, assume we have a parse tree T for a word s (where the underlying grammar is in CNF), and there is a path in T from the root to a leaf on which some variable repeats. So, say nodes α and β have the same variable (say S) stored in them:
[Figure: the parse tree T; the subtree rooted at α generates the substring yzv, the subtree rooted at β generates z, and s = xyzvw.]
Now, if we take the subtree rooted at α and copy it over β, we get a new parse tree:
[Figure: the new parse tree; the subtree at β has been replaced by a copy of the subtree rooted at α.]
The new tree is much bigger, and the new string it represents is $s' = xyyzvvw$. In general, if we do this cut & paste operation $i - 1$ times, we get the string
$$xy^izv^iw.$$
We will refer to a parse tree generated from a context-free grammar in CNF form as a CNF tree. A CNF tree has the special property that the parent of a leaf has a single child, which is a terminal. The height of a tree is the maximum number of edges on a path from the root of the tree to a leaf. Thus, the tree depicted on the right has height 3.
[Figure: a small CNF tree of height 3: the root S has children A and Z; A derives b; Z has children S and A, which in turn derive a and b.]
The grammar G has m variables. As such, if the parse tree T has a path π from the root of length k, and k > m (i.e., the path has k edges), then π must contain at least m + 1 variables (the last edge is between a variable and a terminal). As such, by the pigeonhole principle, there must be a repeated variable along π. In particular, a parse tree that does not have a repeated variable has height at most m.
Since G is in CNF, its parse tree is a binary tree, and a variable either has two children, or a single child which is a leaf (and that leaf contains a single character of the input). As such, a tree of height at most m contains at most $2^m$ leaves¹, and as such represents a string of length at most $2^m$.
We restate the above observation formally for the record.
Observation 16.2.1 If a CNF parse tree (or a subtree of such a tree) has height h, then the string it generates is of length at most $2^h$.
Lemma 16.2.2 Let G be a grammar with m variables given in Chomsky Normal Form (CNF), and consider a word $s \in L(G)$, such that $\ell = |s|$ is strictly larger than $2^m$ (i.e., $\ell > 2^m$). Then, any parse tree T for s (generated by G) must have a path from the root to some leaf with a repeated variable on it.
¹In fact, a CNF tree of height m can have at most $2^{m-1}$ leaves (figure out why), but that is a subtlety we will ignore.
Proof: Assume, for the sake of contradiction, that T has no repeated variable on any path from the root. Then the height of T is at most m. But a CNF parse tree of height m can generate a string of length at most $2^m$, a contradiction, since $\ell = |s| > 2^m$.
Lemma 16.2.3 (CNF is effective.) In a CNF parse tree T, if u and v are two nodes, both storing variables in them, and u is an ancestor of v, then the string $S_u$ generated by the subtree of u is strictly longer than the substring $S_v$ generated by the subtree of v. Namely, $|S_u| > |S_v|$. (Of course, $S_v$ is a substring of $S_u$.)
Proof: Assume that the node u stores a variable X, and that we used the rule X → BC to generate its two children $u_L$ and $u_R$. Furthermore, assume that $u_L$ and $u_R$ generated the strings $S_L$ and $S_R$, respectively. The string generated by v must be a substring of either $S_L$ or $S_R$. However, CNF has the property that no variable² can generate the empty word ε. As such, $|S_R| > 0$ and $|S_L| > 0$.
In particular, assume, without loss of generality, that v is in the left subtree of u, and as such $S_v$ is a substring of $S_L$. We have that
$$|S_v| \leq |S_L| < |S_L| + |S_R| = |S_u|.$$
Lemma 16.2.4 Let T be a tree, and let π be the longest path in the tree, realizing the height h of T. Fix k ≥ 0, and let u be the kth node from the end of π (i.e., u is at distance h − k from the root of T). Then the tree rooted at u has height at most k.
Proof: Let r be the root of T , and assume, for the sake of contradiction, that Tu (i.e., the subtree rooted
at u) has height larger than k, and let σ be the path from u to the leaf γ of Tu realizing this height (i.e., the
length of σ is > k). Next, consider the path formed by concatenating the path in T from r to u with the
path σ. Clearly, this is a new path of length h − k + |σ| > h that leads from the root of T into a leaf of T .
As such, the height of T is larger than h, which is a contradiction.
Lemma 16.2.5 (Pumping lemma for Chomsky Normal Form (CNF).) Let G be a CNF context-free grammar with m variables in it. Then, given any word S in L(G) of length $> 2^m$, one can break S into 5 substrings S = xyzvw, such that for any i ≥ 0, we have that $xy^izv^iw$ is a word in L(G). In addition, the following holds:
1. The strings y and v are not both empty (i.e., the pumping is getting us new words).
2. $|yzv| \leq 2^m$.
Proof: Let T be a CNF parse tree for S (generated by G). Since $\ell = |S| > 2^m$, by Lemma 16.2.2 there is a path in T from its root to a leaf which has a repeated variable on it (and its length is longer than m). In fact, let π be the longest path in T from the root to a leaf (i.e., π is the path realizing the height of the tree T). We know that π has at least m + 1 variables on it, and as such it has a repetition.
We need to be a bit careful in picking the two nodes α and β on π to apply the pumping to. In particular, let α be the last node on π such that the symbol stored in α has a repeated appearance later in the path. Clearly, the subpath τ of π starting at α and going to the end of π has at most m symbols on it (because otherwise, there would be another repetition on π). Let β be the node of τ ⊆ π which has the repetition of the symbol stored in α.
By Lemma 16.2.4, the subtree $T_\alpha$ (i.e., the subtree of T rooted at α) has height at most m. As above, $T_\alpha$ and $T_\beta$ generate two strings $S_\alpha$ and $S_\beta$, respectively. By Observation 16.2.1, we have that $|S_\alpha| \leq 2^m$.
²Except the start variable, but that is not relevant here.
By Lemma 16.2.3, we have that $|S_\alpha| > |S_\beta|$. As such, the two substrings $S_\alpha$ and $S_\beta$ break S into 5 substrings S = xyzvw, where $S_\alpha = yzv$ and $S_\beta = z$:
$$S = x \;\overbrace{y \underbrace{z}_{=S_\beta} v}^{=S_\alpha}\; w.$$
As such, we know that $|yv| = |S_\alpha| - |S_\beta| > 0$. Namely, the strings y and v are not both empty. Furthermore, $|yzv| = |S_\alpha| \leq 2^m$.
The remaining task is to show the pumping. Indeed, if we replace $T_\beta$ by the tree $T_\alpha$, we get a parse tree generating the string $xy^2zv^2w$. If we repeat this process $i - 1$ times, we get the word
$$xy^izv^iw \in L(G),$$
for any i, establishing the lemma.
Lemma 16.2.6 (Pumping lemma for context-free languages.) If L is a context-free language, then there is a number p (the pumping length) where, if S is any string in L of length at least p, then S may be divided into five pieces S = xyzvw satisfying the conditions:
1. for any i ≥ 0, we have $xy^izv^iw \in L$,
2. |yv| > 0,
3. and |yzv| ≤ p.

Proof: Since L is context-free, it has a CNF grammar G that generates it. Now, if m is the number of variables in G, then for $p = 2^m + 1$ the lemma follows by Lemma 16.2.5.
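The conclusion of the lemma is easy to test numerically. The following small sketch picks, by hand, a decomposition of a word in $\{a^nb^n\}$ (the decomposition below is our own choice, not computed by any algorithm) and verifies that all the pumped variants stay in the language.

    # Check that x y^i z v^i w stays in { a^n b^n } for a hand-picked split.
    def in_anbn(s):
        n = len(s) // 2
        return s == "a" * n + "b" * n

    x, y, z, v, w = "aaa", "a", "", "b", "bbb"   # the word s = a^4 b^4
    assert all(in_anbn(x + y * i + z + v * i + w) for i in range(6))
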
Lemma 16.3.1 The language $L = \left\{ a^nb^nc^n \;\middle|\; n \geq 0 \right\}$ is not context-free.

Proof: Assume, for the sake of contradiction, that L is context-free, and apply the pumping lemma to it (Lemma 16.2.6). As such, there exists p > 0 such that any word in L longer than p can be pumped. So, consider the word $S = a^{p+1}b^{p+1}c^{p+1}$. By the pumping lemma, it can be written as $a^{p+1}b^{p+1}c^{p+1} = xyzvw$, where $|yzv| \leq p$.
We claim that yzv is made out of at most two of the three characters. Indeed, if yzv contained both a's and c's, it would have to contain the string $b^{p+1}$ as a substring (as $b^{p+1}$ separates all the appearances of a from all the appearances of c in S). This would require that |yzv| > p, but we know that |yzv| ≤ p.
In particular, let $i_a$, $i_b$ and $i_c$ be the number of a's, b's and c's in the string yv, respectively. All we know is that $i_a + i_b + i_c = |yv| > 0$ and that $i_a = 0$ or $i_c = 0$. Namely, $i_a \neq i_b$ or $i_b \neq i_c$ (the case $i_a \neq i_c$ implies one of these two cases). In particular, by the pumping lemma, the word
$$S_2 = xy^2zv^2w \in L.$$
We have the following:

character | how many times it appears in $S_2$
a | $p + 1 + i_a$
b | $p + 1 + i_b$
c | $p + 1 + i_c$

If $i_a \neq i_b$, then $S_2$, by the above table, does not have the same number of a's and b's, and as such it is not in L.
If $i_b \neq i_c$, then $S_2$, by the above table, does not have the same number of b's and c's, and as such it is not in L.
In either case, we get that $S_2 \notin L$, which is a contradiction. Namely, our assumption that L is context-free is false.
16.4 Closure properties
16.4.1 Context-free languages are not closed under intersection
We know that the languages
$$L_1 = \left\{ a^*b^nc^n \;\middle|\; n \geq 0 \right\} \quad \text{and} \quad L_2 = \left\{ a^nb^nc^* \;\middle|\; n \geq 0 \right\}$$
are both context-free. However, their intersection $L_1 \cap L_2 = \left\{ a^nb^nc^n \;\middle|\; n \geq 0 \right\}$ is not context-free by Lemma 16.3.1. We conclude that the intersection of two context-free languages is not necessarily context-free.
16.4.2 Context-free languages are not closed under complement
Consider the language $L_3 = \left\{ a^ib^jc^k \;\middle|\; i \neq j \text{ or } j \neq k \right\}$. The language $L_3$ is clearly context-free (why?). Consider its complement language $\overline{L_3}$. If $L_3$ is context-free, and context-free languages are closed under complement (which we are assuming here for the sake of contradiction), then $\overline{L_3}$ is context-free.
Now, $\overline{L_3}$ contains many strings we are not interested in (for example, cccccba). So, let us intersect it with the regular language $a^*b^*c^*$. We proved (in a previous lecture) that the intersection of a context-free language and a regular language is still context-free. As such,
$$\widehat{L} = \overline{L_3} \cap a^*b^*c^*$$
is context-free. But this language contains exactly the strings $a^ib^jc^k$ where i = j and j = k. That is, it is the language of Lemma 16.3.1, which we know is not context-free. A contradiction.
Here is an alternative argument. Observe that
$$L = a^nb^nc^* \cap a^*b^nc^n = \overline{\;\overline{a^nb^nc^*} \cup \overline{a^*b^nc^n}\;}. \tag{16.1}$$
We assume, for the sake of contradiction, that context-free languages are closed under complement, and we already know that they are closed under union. However, the languages
$$a^nb^nc^* \quad \text{and} \quad a^*b^nc^n$$
are context-free. As such, by closure properties and Eq. (16.1), the language L is context-free, which is a contradiction to Lemma 16.3.1.
Chapter 17
This lecture covers the construction for converting a PDA to an equivalent CFG (Section 2.2 of Sipser).
We also cover the Chomsky Normal Form for context-free grammars and an example of grammar-based
induction.
If it looks like this lecture is too long, we can push the grammar-based induction part into lecture 15.
$$L_i \to xL_j.$$
We also add, for any accepting state $q_i$, the rule $L_i \to \epsilon$. (Note that x can be ε. Thus, an ε-transition moving from $q_i$ to $q_j$ is translated into the rule $L_i \to L_j$.)
As a concrete example, consider the NFA on the right. We introduce a CFG with the variables $L_1, L_2, L_3$, where $L_1$ is the initial symbol. We have the following rules:
$$L_1 \to aL_2 \mid bL_3$$
$$L_2 \to bL_1 \mid bL_2 \mid aL_3$$
$$L_3 \to bL_2 \mid cL_1 \mid aL_3 \mid bL_3 \mid \epsilon.$$
[Figure: a three-state NFA over {a, b, c}, with initial state $q_1$ and accepting state $q_3$.]
Interestingly, the state $q_3$ is an accept state, and as such we add the rule $L_3 \to \epsilon$ to the rules.
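Here is a minimal Python sketch of this NFA-to-grammar translation, run on the example above. The dictionary encoding of δ (mapping a state and a character, with "" for ε, to a set of successors) is our own choice.

    # Translate an NFA into a right-linear grammar: L_q -> c L_q' and L_q -> eps.
    def nfa_to_cfg(delta, accepting):
        rules = []
        for (q, c), succs in delta.items():
            for qp in succs:
                rules.append((f"L{q}", c + f"L{qp}"))   # L_q -> c L_q' (c may be "")
        for q in accepting:
            rules.append((f"L{q}", ""))                 # L_q -> epsilon
        return rules

    # The example NFA from the text:
    delta = {(1, "a"): {2}, (1, "b"): {3},
             (2, "b"): {1, 2}, (2, "a"): {3},
             (3, "b"): {2, 3}, (3, "c"): {1}, (3, "a"): {3}}
    rules = nfa_to_cfg(delta, accepting=[3])
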
17.2.1 From PDA to a normalized PDA
Given a PDA N', we would like to convert it into an equivalent PDA N that has the following three properties:
(A) It has a single accept state.
(B) It empties its stack before accepting.
(C) Each transition either pushes a symbol onto the stack or pops one off it, but not both.
Transforming a given PDA into an equivalent PDA with these properties might seem like a tall order initially, but in fact it can be easily done with some care.
(C) Each transition either pushes a symbol onto the stack or pops one off it, but not both.
One bad case for us is a transition that both pushes and pops from the stack. For example, say we have a transition
$$q_i \xrightarrow{\;x,\, b \to c\;} q_j.$$
(Read the character x from the input, pop b from the stack, and push c instead of it.) To remove such transitions, we will introduce a special state $q'_{temp}$, and introduce two transitions: one doing the pop, the other doing the push. Formally, for the above transition, we will introduce the transitions
$$q_i \xrightarrow{\;x,\, b \to \epsilon\;} q'_{temp} \qquad \text{and} \qquad q'_{temp} \xrightarrow{\;\epsilon,\, \epsilon \to c\;} q_j.$$
Similarly, if we have a transition that neither pushes nor pops anything, we replace it with a sequence
of two transitions that push and then immediately pop some newly-created dummy stack symbol.
At the end of this normalization process, we end up with an equivalent PDA N that complies with our requirements.
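A sketch of this splitting step, assuming an ad-hoc five-tuple encoding of PDA transitions (with "" standing for ε):

    # Split every transition that both pops and pushes into two transitions.
    def split_push_pop(transitions):
        out, fresh = [], 0
        for (q, x, pop, push, qj) in transitions:
            if pop and push:                      # the bad case: pops and pushes
                fresh += 1
                temp = f"q_temp_{fresh}"          # new intermediate state
                out.append((q, x, pop, "", temp))     # first do the pop
                out.append((temp, "", "", push, qj))  # then do the push
            else:
                out.append((q, x, pop, push, qj))
        return out
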
Stack is empty in the middle. If during this execution, the stack ever becomes empty at some inter-
mediate state r, then a word of Lp,q can be formed by concatenating a word of Lp,r (that got N from state
p into state r with an empty stack), and a word of Lr,q (that got N from r to q).
Stack never empty in the middle. The other possibility is that the stack is never empty in the middle of the execution as N transits from p to q, for the input $w \in L_{p,q}$. But then, it must be that the first transition (from p into, say, $p_1$) was a push, and the last transition into q (from, say, $q_1$) was a pop. Furthermore, this pop transition popped exactly the character pushed onto the stack by the first transition (from p to $p_1$). Thus, if the PDA read the character x (from the input) as it moved from p to $p_1$, and read the character y (from the input) as it moved from $q_1$ to q, then
$$w = xw'y,$$
where w' is an input that causes the PDA N to start from $p_1$ with an empty stack, and end up in $q_1$ with an empty stack. Namely, $w' \in L_{p_1,q_1}$.
Formally, if there is a push transition (pushing z onto the stack) from p to $p_1$ (reading x), and a pop transition from $q_1$ to q (popping the same z from the stack and reading y), then a word in $L_{p,q}$ can be constructed from the expression
$$x\, L_{p_1,q_1}\, y.$$
Notice that x and/or y could be ε, if one of the two transitions did not read anything from the input.
The construction
We now explicitly state the construction. First, for every state p, we introduce the rule
$$S_{p,p} \to \epsilon.$$
The case that the stack is empty in the middle of transitioning from p to q is captured by introducing, for any states p, q, r of N, the following rule in our CFG:
$$S_{p,q} \to S_{p,r} S_{r,q}.$$
As for the other case, where the stack is never empty in the middle: for any given states $p, p_1, q_1, q$ of N, such that there is a push transition from p to $p_1$ and a pop transition from $q_1$ to q (that push and pop the same letter), we introduce an appropriate rule. Formally, for any $p, p_1, q_1, q$, if there are transitions in N of the form
$$\underbrace{p \xrightarrow{\;x,\, \epsilon \to z\;} p_1}_{\text{push } z} \qquad \text{and} \qquad \underbrace{q_1 \xrightarrow{\;y,\, z \to \epsilon\;} q}_{\text{pop } z},$$
then we introduce the rule
$$S_{p,q} \to x\, S_{p_1,q_1}\, y. \tag{17.1}$$
Remark 17.2.1 At the start of our construction, we got rid of all the transitions that do not touch the stack at all. Another option would have been to handle them with a variation of our second type of context-free rule. That is, say we have a transition from a state p to $p_1$ that does not touch the stack (and reads the character x from the input). A small extension of the above construction would give us, for every state q, the rule
$$S_{p,q} \to x\, S_{p_1,q}.$$
17.2.3 Proof of correctness
Here, we prove that the language generated by $S_{q_{init},q_{acc}}$ is the language recognized by the PDA N.

Claim 17.2.2 If the string w can be generated by $S_{p,q}$, then there is an execution of N starting at p (with an empty stack) and ending at q (with an empty stack) that reads w.

Proof: The proof is by induction on the number n of steps used in the derivation generating w from $S_{p,q}$.
For the base of the induction, consider n = 1. The only rules in C whose right-hand side contains no variables are of the form
$$S_{p,p} \to \epsilon,$$
which implies the claim trivially (w = ε, and the empty execution starts and ends at p).
Thus, consider the case where n > 1, and assume that we have proved that any word generated by at most n derivation steps (in the CFG grammar C) can be realized by an execution of the PDA N. We would like to prove the inductive step for n + 1. So, assume that w is generated from $S_{p,q}$ using n + 1 derivation steps. There are two possibilities for the first derivation rule used. The first possibility is that we used the rule
$$\underbrace{S_{p,q}}_{w} \to \underbrace{S_{p,r}}_{w_1} \underbrace{S_{r,q}}_{w_2},$$
where $w = w_1w_2$. As such, $w_1$ is generated from $S_{p,r}$ in at most (n + 1) − 1 = n steps, and $w_2$ is generated from $S_{r,q}$ in at most (n + 1) − 1 = n steps. As such, by induction, there is an execution of N starting at p and ending at r (with an empty stack at the beginning and the end) and, similarly, there is an execution of N starting at r and ending at q (with an empty stack at the beginning and the end). By performing these two executions one after the other, we end up with an execution starting at p and ending at q, with an empty stack on both ends, such that the PDA N reads the input w during this execution. This establishes the claim in this case.
The other possibility is that w was derived by first applying a rule of the form
$$S_{p,q} \to x\, S_{p_1,q_1}\, y,$$
see Eq. (17.1). Namely, $w = xw'y$, and there are a push transition from p to $p_1$ and a matching pop transition from $q_1$ to q that generated this rule. Furthermore, by induction, the word w' was generated from $S_{p_1,q_1}$ using n derivation steps, and as such there exists a compliant execution X from $p_1$ to $q_1$ reading w'. Thus, if we start at p, apply the first transition of Eq. (17.1), then perform the execution X, and then apply the second transition of Eq. (17.1), we end up with a compliant execution of N that starts at p, ends at q (with an empty stack on both ends), and reads the string w, which establishes the claim in this case.
Claim 17.2.3 If there is an execution of N (with an empty stack at both ends) starting at a state p and ending at a state q, that reads the string w, then w can be generated by $S_{p,q}$.

Proof: The proof is somewhat similar to the previous proof. Consider the execution for w, and assume that it takes n steps. We will prove the claim by induction on n.
For n = 0, the execution is empty, and starts at p and ends at q = p. But then, w is ε, and it can be derived from $S_{p,p}$, since the CFG C has the rule $S_{p,p} \to \epsilon$.
Otherwise, for n > 0, assume by induction that we have proved the claim for all executions of length at most n, and consider an execution of length n + 1.
If the first transition in the execution is a push of a character z onto the stack, and z is being popped by the last transition in the execution, then the first and last transitions are of the form
$$\underbrace{p \xrightarrow{\;x,\, \epsilon \to z\;} p_1}_{\text{push } z} \qquad \text{and} \qquad \underbrace{q_1 \xrightarrow{\;y,\, z \to \epsilon\;} q}_{\text{pop } z},$$
respectively, and furthermore $w = xw'y$. As such, we have an execution from $p_1$ to $q_1$, of length (n + 1) − 2 ≤ n, that reads w', and by induction, w' can be generated by the symbol $S_{p_1,q_1}$. But then, by construction, the rule $S_{p,q} \to x\, S_{p_1,q_1}\, y$ is in C, and as such $S_{p,q} \overset{*}{\Rightarrow} xw'y = w$.
The other possibility is that the stack becomes empty at some intermediate state r of the execution. Then $w = w_1w_2$, where $w_1$ is read while going from p to r and $w_2$ while going from r to q, both executions with empty stacks at both ends and of length at most n. By induction, $S_{p,r} \overset{*}{\Rightarrow} w_1$ and $S_{r,q} \overset{*}{\Rightarrow} w_2$, and the rule $S_{p,q} \to S_{p,r}S_{r,q}$ implies that $S_{p,q} \overset{*}{\Rightarrow} w$.
Together with the results from earlier lectures, we can conclude the following.
Theorem 17.2.5 A language L is context-free if and only if there is a PDA that recognizes it.
Chapter 18
S → N VP
N → N N
VP → V N
N → students | Jeff | geometry | trains
V → trains
¹In real life, long sentences in news reports often exhibit versions of this problem.
²Draw an n by n + 1 rectangle and fill in the lower half.
³It still takes exponential time to extract all parse trees from the table, but we are usually interested in only one of these trees.
Given a string w of length n, we build a triangular table with n rows and n columns. Conceptually, we write w below the bottom row of the table. The ith column corresponds to the ith word. The cell at the ith column and the jth row (from the bottom) of the table corresponds to the substring of length j starting at the ith word. The following is the table, and the substrings each entry corresponds to:

length 4 | Jeff trains geometry students
length 3 | Jeff trains geometry | trains geometry students
length 2 | Jeff trains | trains geometry | geometry students
length 1 | Jeff | trains | geometry | students
         (columns: first word in substring)
CYK builds a table containing a cell for each substring. The cell for a substring x contains a list of
variables V from which we can derive x (in one or more steps).
[The table: rows are indexed by the substring length (1 to 4), columns by the first word of the substring; initially all cells are empty.]
The bottom row contains the variables that can derive each substring of length 1. This is easy to fill in:

length 1 | N | N,V | N | N
         | Jeff | trains | geometry | students
Now we fill the table row-by-row, moving upwards. To fill in the cell for a 2-word substring x, we look
at the labels in the cells for its two constituent words and see what rules could derive this pair of labels. In
this case, we use the rules N → N N and VP → V N to produce:
length 2 | N | N,VP | N
length 1 | N | N,V | N | N
         | Jeff | trains | geometry | students
For each longer substring x, we have to consider all the ways to divide x into two shorter substrings. For example, suppose x is the substring of length 3 starting with "trains". This can be divided into (a) "trains geometry" plus "students", or (b) "trains" plus "geometry students."
Consider option (a). Looking at the lower rows of the table, “students” has label N. One label for “trains
geometry” is VP , but we don’t have any rule whose righthand side contains VP followed by N. The other
label for “trains geometry” is N. In this case, we find the rule N → N N. So one label for x is N. (That is, x
is one big long compound noun.)
Now consider option (b). Again, we have the possibility that both parts have label N. But we also find
that “trains” could have the label V. We can then apply the rule VP → V N to add the label VP to the cell
for x.
CYK(G, w)
    G = (V, Σ, R, S), Σ ∪ V = {X₁, ..., X_r}, w = w₁w₂...w_n.
begin
    Initialize the 3d array B[1...n, 1...n, 1...r] to FALSE
    for i = 1 to n do
        for (X_j → x) ∈ R do
            if x = w_i then B[i, i, j] ← TRUE.
    for i = 2 to n do                  /* length of span */
        for L = 1 to n − i + 1 do      /* start of span */
            R = L + i − 1              /* current span s = w_L w_{L+1} ... w_R */
            for M = L + 1 to R do      /* partition of span */
                /* x = w_L ... w_{M−1}, y = w_M ... w_R, and s = xy */
                for (X_α → X_β X_γ) ∈ R do
                    /* can we match X_β to x and X_γ to y? */
                    if B[L, M − 1, β] and B[M, R, γ] then
                        B[L, R, α] ← TRUE   /* if so, then X_α can generate s */
    for i = 1 to r do
        if B[1, n, i] then return TRUE
    return FALSE

Figure 18.1: The CYK algorithm.
length 3 |   | N,VP
length 2 | N | N,VP | N
length 1 | N | N,V | N | N
         | Jeff | trains | geometry | students
length 4 | N,S
length 3 | N,S | N,VP
length 2 | N | N,VP | N
length 1 | N | N,V | N | N
         | Jeff | trains | geometry | students
Remember that a string is in the language if it can be derived from the start symbol S. The top cell in
the table contains the variables from which we can derive the entire input string. Since S is in that top cell,
we know that our string is in the language.
By adding some simple annotations to these tables as we fill them in, we can make it easy to read out
an entire parse tree by tracing downwards from the top cell. In this case, the tree:
(S (N Jeff) (VP (V trains) (N (N geometry) (N students))))
We have $O(n^2)$ cells in the table. For each cell, we have to consider n ways to divide its substring into two smaller substrings. So the table-filling procedure takes only $O(n^3)$ time.
Theorem 18.2.1 Let G = (V, Σ, R, S) be a grammar in CNF with r = |Σ| + |V| variables and terminals, and t = |R| rules. Let $w \in \Sigma^*$ be a word of length n. Then, one can compute a parse tree for w using G, if $w \in L(G)$. The running time of the algorithm is $O(n^3t)$.

The result just follows from the CYK algorithm depicted in Figure 18.1. Note that our pseudo-code just decides whether a word can be generated by a grammar; with slight modifications, one can also generate the parse tree.
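For completeness, here is a direct Python rendering of the pseudo-code of Figure 18.1 (using sets of variables per cell instead of the boolean array B), run on the example grammar from this lecture.

    # CYK: table[i][j] holds the variables deriving the span from word i to word j.
    def cyk(binary_rules, terminal_rules, start, words):
        n = len(words)
        table = [[set() for _ in range(n)] for _ in range(n)]
        for i, w in enumerate(words):
            table[i][i] = {A for (A, x) in terminal_rules if x == w}
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span - 1
                for m in range(i + 1, j + 1):        # split into i..m-1 and m..j
                    for (A, B, C) in binary_rules:   # rule A -> B C
                        if B in table[i][m - 1] and C in table[m][j]:
                            table[i][j].add(A)
        return start in table[0][n - 1]

    binary = [("S", "N", "VP"), ("N", "N", "N"), ("VP", "V", "N")]
    terminal = [("N", "students"), ("N", "Jeff"), ("N", "geometry"),
                ("N", "trains"), ("V", "trains")]
    print(cyk(binary, terminal, "S", "Jeff trains geometry students".split()))  # True
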
Chapter 19
where
• for any $m, m' \in M$, $m \neq m'$, we have $Q_m \cap Q_{m'} = \emptyset$ (the sets of states of different modules are disjoint).
Intuitively, we view a recursive automaton as a set of procedures/modules, where the execution starts with the main module, and the automaton processes the word by calling modules recursively.
[Figure: the module main, with states $q_0, q_1, q_2, q_3$: a transition from $q_0$ to $q_1$ reading 0, a call to main from $q_1$ to $q_2$, a transition from $q_2$ to $q_3$ reading 1, and an ε-transition from $q_0$ to $q_3$.]
Why? The recursive automaton consists of a single module, which is also the main module. The module either accepts ε, or reads 0, calls itself, and after returning from the call, reads 1 and reaches a final state (at which point it can return if it was called). In order to accept, we require the run to return from all calls and reach the final state of the module main.
For example, the recursive automaton accepts 01 because of the following execution:
$$q_0 \xrightarrow{\;0\;} q_1 \xrightarrow{\;\text{call main}\;} q_0 \xrightarrow{\;\epsilon\;} q_3 \xrightarrow{\;\text{return}\;} q_2 \xrightarrow{\;1\;} q_3.$$
19.2 CFGs and recursive automata
We will now show that context-free grammars and recursive automata accept precisely the same class of
languages.
S → aSb | aBb
B → c.
Each variable in the CFG corresponds to a language; this language is recursively defined using other
variables. We hence look upon each variable as a module; and define modules that accept words by calling
other modules recursively.
For example, the recursive automaton for the above grammar is:
S: S
q3 q4
a b
q0 q5
a b
q1 B q2
B: c
q6 q7′
Formal construction. Let G = (V, Σ, R, S) be the given context-free grammar. Let
$$D_G = \left( M, S, \left\{ (Q_m, \Sigma \cup M, \delta_m, q_0^m, F_m) \;\middle|\; m \in M \right\} \right),$$
where M = V, and the main module is S. Furthermore, for each $X \in M$, let $D_X = (Q_X, \Sigma \cup M, \delta_X, q_0^X, F_X)$ be an NFA that accepts the (finite, and hence regular) language $L_X = \left\{ w \;\middle|\; (X \to w) \in R \right\}$.
Let us elaborate on the construction of $D_X$. We create two special states $q_{init}^X$ and $q_{final}^X$. Here, $q_{init}^X$ is the initial state of $D_X$ and $q_{final}^X$ is the accepting state of $D_X$. Now, consider a rule $(X \to w) \in R$. We will introduce a path of length |w| in $D_X$ (corresponding to w) leading from $q_{init}^X$ to $q_{final}^X$. Creating this path requires introducing new "dummy" states in the middle of the path, if |w| > 1. The ith transition along this path reads the ith character of w. Naturally, if this ith character is a variable, then this edge corresponds to a recursive call to the corresponding module. As such, if the variable X has k rules in the grammar G, then $D_X$ contains k disjoint paths from $q_{init}^X$ to $q_{final}^X$, corresponding to each such rule. For example, if we have the rule $(X \to \epsilon) \in R$, then we have an ε-transition from $q_{init}^X$ to $q_{final}^X$.
• Regular transitions. For any $m \in M$, $q, q' \in Q_m$, $c \in \Sigma \cup \{\epsilon\}$, if $q' \in \delta_m(q, c)$, then the rule $X_q \to cX_{q'}$ is added to R.
Intuitively, a transition within a module is simulated by generating the letter on the transition and generating a variable that stands for the language generated from the next state.
• Recursive call transitions. For all $m, m' \in M$ and $q, q' \in Q_m$, if $q' \in \delta_m(q, m')$, then the rule $X_q \to X_{q_{init}^{m'}} X_{q'}$ is in R.
Intuitively, if $q' \in \delta_m(q, m')$, then $X_q$ can generate a word of the form xy, where x is accepted using a call to module m' and y is accepted from the state q'.
• Acceptance/return rules. For any $q \in \bigcup_{m \in M} F_m$, we add $X_q \to \epsilon$ to R.
When arriving at a final state, we can stop generating letters and return from the recursive call.
We have a CFG, and it is not too hard to see intuitively that the language generated by this grammar is equal to the language of the recursive automaton. We will not prove this formally here, but we state the result for the sake of completeness.
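A Python sketch of generating these rules, under our own ad-hoc encoding of a recursive automaton (transitions are triples (q, label, q'), where a label is a letter, "" for ε, or ("call", m) for a recursive call):

    # Generate the CFG rules corresponding to a recursive automaton.
    def ra_to_cfg(trans, module_init, final_states):
        rules = []
        for (q, label, qp) in trans:
            if isinstance(label, tuple) and label[0] == "call":
                m = label[1]                      # recursive call transition
                rules.append((f"X{q}", (f"X{module_init[m]}", f"X{qp}")))
            else:                                 # regular transition (label may be "")
                rules.append((f"X{q}", (label, f"X{qp}")))
        for q in final_states:
            rules.append((f"X{q}", ()))           # acceptance/return rule X_q -> eps
        return rules
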
[Figure: a recursive automaton with modules main (states $p_1, \ldots, p_4$), m1 (states $p_5, \ldots, p_8$), and m2 (states $p_9, \ldots, p_{12}$).] The resulting grammar is:
$$X_{p_1} \to X_{p_5}X_{p_2} \mid X_{p_3} \qquad X_{p_2} \to cX_{p_2} \mid \epsilon \qquad X_{p_3} \to aX_{p_3} \mid X_{p_9}X_{p_4} \qquad X_{p_4} \to \epsilon$$
$$X_{p_5} \to aX_{p_6} \mid X_{p_8} \qquad X_{p_6} \to X_{p_5}X_{p_7} \qquad X_{p_7} \to bX_{p_8} \qquad X_{p_8} \to \epsilon$$
$$X_{p_9} \to bX_{p_{10}} \mid X_{p_{12}} \qquad X_{p_{10}} \to X_{p_9}X_{p_{11}} \qquad X_{p_{11}} \to cX_{p_{12}} \qquad X_{p_{12}} \to \epsilon$$
The start variable is $X_{p_1}$.
19.3 More examples
19.3.1 Example 1: RA for the language $a^nb^{2n}$
Let us design a recursive automaton for the language $L = \left\{ a^nb^{2n} \;\middle|\; n \in \mathbb{N} \right\}$. We would like to generate this recursively. How do we generate $a^{n+1}b^{2n+2}$ using a procedure to generate $a^nb^{2n}$? We read a, followed by a call to generate $a^nb^{2n}$, and follow that by generating two b's. The "base case" of this recursion is when n = 0, when we must accept ε. This leads us to the following automaton:
[Figure: the module main: $p_1 \xrightarrow{a} p_2 \xrightarrow{\text{call main}} p_3 \xrightarrow{b} p_4 \xrightarrow{b} p_5$, together with an ε-move from $p_1$ to the final state for the base case.]
19.3.2 Example 2: Palindromes
Thinking recursively, the smallest palindromes are ε, a, b, c, and we can construct a longer palindrome by generating awa, bwb, or cwc, where w is a smaller palindrome. This gives us the following recursive automaton:
[Figure: the module main: from $p_1$, three branches read a, b, or c, call main, and then read the same letter again, all ending at the final state $p_8$; in addition, $p_1$ can move directly to $p_8$ reading a, b, c, or ε (the base cases).]
19.3.3 Example 3: #a = #b
Let us design a recursive automaton for the language L containing all strings $w \in \{a, b\}^*$ that have an equal number of a's and b's.
Let w be a string, of length at least one, with an equal number of a's and b's.
Let w be a string, of length at least one, with equal number of a’s and b’s.
Case 1: w starts with a. As we read longer and longer prefixes of w, we have the number of a’s seen is
more than the number of b’s seen. This situation can continue, but we must reach a place when
the number of a’s seen is precisely the number of b’s seen (at worst at the end of the word). Let us
consider some prefix longer than a where this happens. Then we have that w = aw1 bw2 , where the
number of a’s and b’s in aw1 b is the same, i.e. the number of a’s and b’s in w1 are the same. Hence
the number of a’s and b’s in w2 are also the same.
Case 2: If w starts with b, then by a similar argument as above, w = bw1 aw2 for some (smaller) words w1
and w2 in L.
125
Hence any word w in L of length at least one is of the form $aw_1bw_2$ or $bw_1aw_2$, where $w_1, w_2 \in L$, and they are strictly shorter than w. Also, note that ε is in L. So this gives us the following recursive automaton.
[Figure: the module main: from $p_1$, one branch reads a, calls main, reads b, and calls main again; the other branch reads b, calls main, reads a, and calls main again; both branches end at the final state $p_8$, which $p_1$ can also reach directly on ε.]
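The recursive automaton is really just a nondeterministic recursive program. The following Python sketch mirrors it directly; the memoization is an implementation convenience, not part of the model.

    # w is in L iff w = "" or w = a w1 b w2 or w = b w1 a w2 with w1, w2 in L.
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def in_L(w):
        if w == "":
            return True
        other = "b" if w[0] == "a" else "a"
        # nondeterministically guess the position of the matching letter
        return any(w[i] == other and in_L(w[1:i]) and in_L(w[i + 1:])
                   for i in range(1, len(w)))

    assert in_L("abba") and in_L("") and not in_L("aab")
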
Chapter 20
S → aSbS | bSaS | ε.
The PDA we come up with for this language explicitly thinks of using the stack to do counting: the excess
number of letters of one kind as opposed to the other is stored in the stack. In other words, we are using the
stack as a “counter”. This example shows clearly the difference between PDA-thinking and RA-thinking...
we believe that thinking recursively of the language is a lot more useful and typical in computer science.
Anyway, if we use recursive automata, we probably are not teaching the idea behind using the stack to
count— but the point is “who cares?”
In summary, going away from using an explicit pushdown stack, and relying instead on a recursion idiom
is natural, beautiful, and pleasing!
Finally, I think this model keeps things easy and natural to remember: regular languages are accepted by
programs with finite memory; CFLs are accepted by recursive programs with finite memory; and decidable
languages are accepted by Turing machines with infinite memory.
CFGs to RA and back: The conversion from CFGs to RAs is very simple (and visual!). In contrast, converting CFGs to PDAs is mildly harder and way less visual (we usually define an extension of PDAs that can push words in reverse onto a stack, and have the PDA guess a left-most derivation).
The conversion from RAs to CFGs is a lot easier than from PDAs to CFGs. To convert PDAs to CFGs, one
must convert the PDA to a PDA that accepts with empty stack, and then argue that any run of the PDA
can be “summarized” using what happens between a push and a pop of the same symbol, and build a CFG
that captures this idea. When going from a recursive automaton to a CFG, it is more natural to divide a run
into the part that is read by a module and the suffix read after returning from the module.
Closure properties: Turning to closure properties: given two programs, clearly one can create the au-
tomaton for the union by building a new initial module that calls either of them nondeterministically. Closure
under concatenation and Kleene-* are also easy. We can quickly sketch these closure properties in about
10 minutes (we will do them for CFGs anyway). The intuition behind writing programs will make this very
accessible to students.
Closure under intersection does not work as we cannot take the product of the two recursive automata—
for if one wants to call a module and the other does not, then there is no way to simulate both of them.
An aside: in visibly pushdown languages, the recursive automata synchronize on calls and returns, and
hence the class is closed under intersection. In fact, it’s closed under complement as well, and determinizable.
Turning to closure on intersection with a regular language, this is easy. We just do the product con-
struction... and at a call, guessing the return values of the finite automaton. We will do this using CFGs as
well.
Non-context-freeness: Once we have shown that $\{a^nb^nc^n \mid n \in \mathbb{N}\}$ is not context-free (using the pumping lemma for CFLs), we think it is nice to go back and say that this means there is no finite-memory recursive program to generate $a^nb^nc^n$, which is interesting and perhaps more appreciable.
CFLs are decidable: The negative aspect, that recursive automata cannot obviously be simulated by (deterministic) Turing machines, exists just as it did with pushdown automata. The way out is still to show that membership in CFGs is decidable, using a conversion to CNF or by showing the CYK algorithm. This has little to do with either automaton model.
Applications of CFLs: There are three main applications of CFLs we see: the first is parsing languages
(like programming languages), the second is parsing natural languages, and the third is static program
analysis.
Parsing programming languages and the like is done using an abstraction of pushdown automata, with
a million variants and lookahead distances and what not. The recursive automaton definition does yield a
pushdown automaton definition which we will talk about, and hence the students will be able to understand
these algorithms. However, they will be a tad unfamiliar with them.
In parsing natural languages, Julia provided us with papers where they build automata for grammars
almost exactly the way we do. In particular, there are many models with NFAs calling each other. And they
argue this is a more natural representation of a machine for a grammar (and often use it to characterize
languages that admit a faster membership problem).
Turning to program analysis, statically analyzing a program (for data-flows, control-flows, etc.) often in-
volves abstracting the variables into a finite set of data-facts and looking at the recursive program accurately.
The natural model one obtains (for flow-sensitive context-sensitive analysis) is a recursive automaton! In
fact, the recursive automata we have here are motivated by the class of “recursive state machines” studied
in the verification literature and used to model programs with finite memory.
Foreseeable objections:
model anyway. They would not have seen a formal conversion from PDAs to CFGs, which, arguably, they would not remember even if they had seen it.
Instead, now, they would know that CFGs are simply languages accepted by recursive programs with
finite memory. Which is a natural characterization worthy of knowledge.
• PDAs being far away from CFGs is a good thing.
We agree; showing that a model further away from CFGs is equivalent to CFGs is a more interesting result. However, the argument for teaching PDAs holds only if it is a natural or useful model in its own right. There is very little evidence to claim PDAs are better than recursive automata.
• Reasoning with an explicit stack data structure is mental weightlifting.
We don’t think the course is well-served by introducing weight-lifting exercises. The students, frankly,
lift too much weight already, given their level of maturity. Let’s teach them beautiful and natural
things; not geeky, useless, hard things.
• Don’t the upper level classes needs PDAs?
We have asked most people (Elsa, Sam, Margaret, etc.). The unanimous opinion we hear is that while
seeing some machine model for CFLs is good, pushdown automata are not crucial.
• We must carefully consider making changes to such a fundamental topic:
We think we have paid careful attention, and by seeking comments, we would have covered most
opinions. The problem is that this is not a small change. It’s a big change that can happen only if we
do it now... if we postpone it, it will never happen!
Chapter 21
The Electric Monk was a labor-saving device, like a dishwasher or a video recorder. Dishwashers washed tedious
dishes for you, thus saving you the bother of washing them yourself, video recorders watched tedious television
for you, thus saving you the bother of looking at it yourself; Electric Monks believed things for you, thus saving
you what was becoming an increasingly onerous task, that of believing all the things the world expected you to
believe.
– Dirk Gently’s Holistic Detective Agency, Douglas Adams.
21.1 Computability
For the alphabet
$$\Sigma = \{0, 1, \ldots, 9, +, =\},$$
consider the language
$$L = \left\{ a_na_{n-1}\ldots a_0 + b_mb_{m-1}\ldots b_0 = c_rc_{r-1}\ldots c_0 \;\middle|\; a_i, b_j, c_k \in [0, 9] \text{ and } \langle a_n\ldots a_0\rangle + \langle b_m\ldots b_0\rangle = \langle c_r\ldots c_0\rangle \right\},$$
where $\langle a_na_{n-1}\ldots a_0\rangle = \sum_{i=0}^{n} a_i \cdot 10^i$ is the number represented in base ten by the string $a_na_{n-1}\ldots a_0$. We are interested in the question of whether or not a given string belongs to this language. This is an example of a decision problem (where the output is either yes or no), which is easy in this specific case, but clearly too hard for a PDA to solve.¹
Usually, we are interested in algorithms that compute something from their input and output the result. For example, given the strings $a_na_{n-1}\ldots a_0$ and $b_mb_{m-1}\ldots b_0$ (i.e., two numbers), we want to compute the string representing their sum.
Here is another example of such a computational problem: given a quadratic equation $ax^2 + bx + c = 0$, we would like to find the roots of this equation. Namely, two numbers $r_1, r_2$ such that $ax^2 + bx + c = a(x - r_1)(x - r_2) = 0$. Thus, given numbers a, b and c, the algorithm should output the numbers $r_1$ and $r_2$.
To see how subtle this innocent question can be, consider the question of computing the roots of a
polynomial of degree 5. That is, given an equation
$$x^5 + a_4x^4 + a_3x^3 + a_2x^2 + a_1x + a_0 = 0,$$
can we compute the values of x for which this equation holds? Interestingly, if we limit our algorithm to using only the standard operators on numbers $+, -, *, /, \sqrt{\;}, \sqrt[k]{\;}$, then no such algorithm exists.²
In the final part of this course, we will look at the question of what (formally) is a computation? Or, in
other words, what is (what we consider to be) a computer or an algorithm? A precise model for computation
will allow us to prove that computers can solve certain problems but not others.
21.1.1 History
Early in the twentieth century, mathematicians (e.g., David Hilbert) thought that it might be possible to build formal algorithms that could decide whether any mathematical statement was true or false. For obvious reasons, there was great interest in whether this could really be done. In particular, Hilbert took upon himself the project of trying to formalize the mathematics known at the time. Gödel showed in 1931 that this project (of explicitly and completely describing all of mathematics) is hopeless: there is no finite, complete and consistent axiomatization of mathematics.
In 1936, Alonzo Church and Alan Turing independently showed that this goal was impossible. In his
paper, Alan Turing introduced the Turing machine (described below). Alonzo Church introduced the λ-
calculus, which formed the starting point for the development of a number of functional programming
languages and also formal models of meaning in natural languages. Since then, these two models and some
others (e.g. recursion theory) have been shown to be equivalent.
This has led to the Church-Turing Hypothesis: any reasonable model of computation is equivalent in power to a Turing machine.
This is not something you could actually prove is true (what is "reasonable" in the above statement, for example?). It could be proved false if someone found another model of computation that could solve more problems than a Turing machine, but no one has done this yet. Notice that we are ignoring how fast the computation can be done: it is certainly possible to improve on the speed of a Turing machine (in fact, every Turing machine can be sped up by making it more complicated). We are only interested in what problems the machines can or cannot solve.
Both types of machines read their input left-to-right. They halt exactly when the input is exhausted. Turing
machines are like a RA/PDA, in that they have a finite control and an unbounded one dimensional memory
tape (i.e., stack). However, a Turing machine is different in the following ways.
(A) The input is delivered on the memory tape (not in a separate stream).
(B) The machine head can move freely back and forth, reading and writing on the tape in any pattern.
(C) The machine halts immediately when it enters an accept or reject state.
Notice condition (C) in particular. A Turing machine can read through its input several times, or it might
halt without reading the whole input (e.g. the language of all strings that start with ab can be recognized
by just reading two letters).
²This is the main result of Évariste Galois, who died at the age of 20(!) in a duel. Niels Henrik Abel (who also died relatively young) proved this slightly before Galois, but Galois' work led to a more general theory.
Figure 21.1: Comic by Geoff Draper.
Moving back and forth along the tape allows a Turing machine to (somewhat slowly) simulate random
access to memory. Surprisingly, this very simple machine can simulate all the features of “regular” computers.
Here equivalent is meant only in the sense that whatever a regular computer can compute, so can a Turing
machine compute. Of course, Turing machines do not have graphics/sound cards, internet connection and
they are generally considered to be an inferior platform for computer games. Nevertheless, computationally,
TMs can compute whatever a “regular” computer can compute.
[Figure: a tape containing the word shalom followed by blanks; the arrow marks the read/write head, positioned on the first cell.]
Each step of the Turing machine first reads the symbol on the cell of the tape under the head. Depending on this symbol and the current state of the controller, it then writes a new symbol to this cell, moves the head left or right, and moves the controller to a new state.
For example, the following transition is taken if the controller is in state q and the symbol under the read
head is b. It replaces the b with the character c and then moves right, switching the controller to the state
r.
$$q \xrightarrow{\;b \to c,\ R\;} r$$
Note, that Turing machines are deterministic. That is, once you know the state of the controller and
which symbol is under the read/write head, there is exactly one choice for what the machine can (and must)
do.
The controller has two special states qacc and qrej . When the machine enters one of these states, it halts.
It either accepts or rejects, depending on which of the two it entered.
Note 21.2.1 If the Turing machine is at the start of the tape and tries to move left, it simply stays put on
the start position. This is not the only reasonable way to handle this case.
Note 21.2.2 Nothing guarantees that a Turing machine will eventually halt (i.e., stop). Like your favorite
Java program, it can get stuck in an infinite loop3 . This will have important consequences later, when we
show that deciding if a program halts or not is in fact a task that computers can not solve.
Remark 21.2.3 Some authors define Turing machines to have a doubly-infinite tape. This does not change
what the Turing machine can compute. There are many small variations on Turing machines which do not
change the power of the machine. Later, we will see a few sample variations and how to prove they are
equivalent to our basic model. The robustness of this model to minor changes in features is yet another
reason computer scientists believe the Church-Turing hypothesis.
Consider the language $L = \left\{ w\$w \;\middle|\; w \in \{a, b\}^* \right\}$, which is not context-free. So, let us describe a TM that accepts this language.
One algorithm for recognizing L works as follows:
1. Cross off the first character a or b in the input (i.e., replace it with x, where x is some special character) and remember what it was (by encoding the character in the current state). Let u denote this character.
2. Move right until we see a $.
3. Read across any x's.
4. Read the character (not x) on the tape. If this character is different from u, then immediately reject.
5. Cross off this character, replacing it by x.
6. Move left past the $, and then keep going until we see an x on the tape.
7. Move one position right and go back to the first step.
We repeat this until the first step cannot find any more a's and b's to cross off.
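Here is a direct Python simulation of this marking algorithm, on a list standing in for the tape (the representation is ours, and it assumes the language $\{w\$w\}$ as reconstructed above):

    # Simulate the crossing-off algorithm; "x" is the crossing-off mark.
    def accepts(s):
        tape = list(s)
        if tape.count("$") != 1:
            return False
        sep = tape.index("$")
        i, j = 0, sep + 1                 # next uncrossed character on each side
        while i < sep:                    # step 1: cross off on the left...
            u, tape[i] = tape[i], "x"
            if j >= len(tape) or tape[j] != u:   # steps 2-4: match on the right
                return False
            tape[j] = "x"                 # step 5: cross it off
            i, j = i + 1, j + 1
        return j == len(tape)             # everything after $ was matched

    assert accepts("ab$ab") and not accepts("ab$ba") and accepts("$")
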
Figure 21.2 depicts the resulting TM. Observe that, for the sake of simplicity of exposition, we did not include the state $q_{rej}$ in the diagram. In particular, all missing transitions in the diagram are transitions that go into the reject state.
Note 21.2.4 For most algorithms, the Turing machine code is complicated and tedious to write out explicitly. In particular, it is not reasonable to write it out as a state diagram or a transition function. This only works for relatively simple examples, like the ones shown here. In particular, it is important to be able to describe a TM at a high level in pseudo-code, and yet be able to translate it into the nitty-gritty details if necessary.
3 Or just get stuck inside of Mobile with the Memphis blues again...
[Figure 21.2: the state diagram of the TM for $\{w\$w \mid w \in \{a,b\}^*\}$, with states $q_0, \ldots, q_7$ and $q_{acc}$; the reject state and the transitions into it are omitted.]
• $q_{acc} \in Q$ is the accepting/final state.
• $q_{rej} \in Q$ is the rejecting state.
This definition assumes that we have already defined a special blank character. In Sipser, the blank is written ⊔ or ␣. A popular alternative is B. (If you use any other symbol for blank, you should write a note explaining what it is.)
The special blank character (i.e., ␣) is in the tape alphabet, but it is not in the input alphabet.
Example
For the TM of Figure 21.2, we have the following M = (Q, Σ, Γ, δ, q0 , qacc , qrej ), where
(i) Q = {q0 , q1 , q2 , q3 , q4 , q5 , q6 , q7 , qacc , qrej }.
Chapter 22
This lecture covers the formal definition of a Turing machine and related concepts such as configuration
and Turing decidable. It surveys a range of variant forms of Turing machines and shows for one of them
(multi-tape) why it is equivalent to the basic model.
Example 22.1.1 Here we describe a TM that takes its input on the tape, shifts it to the right by one character, and puts a $ in the leftmost position on the tape.
So, let Σ = {a, b} (but the machine we describe would work for any alphabet). Let
$$Q = \{q_0, q_{acc}, q_{rej}\} \cup \left\{ q_c \;\middle|\; c \in \Sigma \right\}.$$
The transition function is
$$\forall s \in \Sigma \quad \delta(q_0, s) = (q_s, \$, R)$$
$$\forall s, t \in \Sigma \quad \delta(q_s, t) = (q_t, s, R)$$
$$\forall s \in \Sigma \quad \delta(q_s, ␣) = (q_{acc}, s, R)$$
$$\delta(q_0, ␣) = (q_{acc}, \$, R).$$
Figure 22.1: A TM that shifts its input right by one position, and inserts $ at the beginning of the tape. [State diagram with states $q_0$, $q_a$, $q_b$, $q_{acc}$, $q_{rej}$ omitted.]
The resulting machine is depicted in Figure 22.1, and here is its pseudo-code:
Shift_Tape_Right
At first tape position,
remember character and write $
At later positions,
remember character on tape,
and write previously remembered character.
On blank, write remembered character and halt accepting.
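To make the shifting step concrete, here is a minimal Python sketch of the same idea, with the tape modeled as a list of characters (the single remembered character plays the role of the state memory qc):

BLANK = " "

def shift_tape_right(tape):
    remembered = "$"                  # the symbol written into the first cell
    i = 0
    while remembered != BLANK:
        tape.append(BLANK)            # the tape is padded with blanks
        tape[i], remembered = remembered, tape[i]
        i += 1
    return tape[:i] + [BLANK]         # trim the extra padding

print(shift_tape_right(list("ab")))   # ['$', 'a', 'b', ' ']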
Consider a tape whose content is α b β (followed by blanks ␣ ␣ ␣ . . .), where the read/write head is located on the cell containing b,
and the current control state of the TM is qi. In this case, it would be convenient to write the TM configuration as
α qi b β.
Namely, imagine that the head is just to the left of the cell it is reading/writing, and bβ is the string to the right of the head.
As such, the start configuration with a word w is q0 w; that is, the tape contains w followed by blanks, and the read/write head is located on the first cell of the tape.
We can now describe a transition of the TM using this configuration notation. Indeed, imagine the given
TM is in a configuration αqi aβ and its transition is
δ(qi , a) = (qj , c, R) ,
then the resulting configuration is αcqj β. We will write the resulting transition as
αqi aβ ⇒ αcqj β.
Similarly, consider a configuration of the form
γ d qk e τ,
where γ and τ are two strings, and d, e ∈ Γ. Assume the TM transition in this case is
δ(qk, e) = (qm, f, L).
Then
γ d qk e τ ⇒ γ qm d f τ.
Denoting the first configuration by c and the second by c′, we say that c yields c′, and we use the notation c ↦ c′.
As we have seen before, the two ends of the tape are special, as follows:
• You cannot move off the tape on the left side. If the head is instructed to move to the left, it just stays where it is.
• The tape is padded on the right side with blanks (i.e., ␣). Namely, you can think of the tape as initially being full of blanks (spaced out?), except for the input that is written at the beginning of the tape.
Definition 22.3.3 A TM that halts on all inputs is called a decider.
As such, a language L is Turing decidable if there is a decider TM M , such that L(M ) = L.
on the usual tape. Clearly, the doubly infinite tape now becomes the usual one-sided infinite tape, and we can easily simulate the original machine on this new machine. Indeed, as long as we are far from the folding point on the tape, all we need to do is move in jumps of two (i.e., a move L is mapped into the move LL). Now, if we reach the beginning of the tape, we need to switch between the odd locations and the even locations, but that is also easy to do with a bit of care. We omit the easy but tedious details.
Another approach would be to keep the working part of the doubly-infinite tape in its original order. When the machine tries to move off the left-hand end, push everything to the right to make more space.
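As a quick sanity check of the folding idea, here is a small Python sketch of the index mapping (the specific interleaving used here is one possible choice, not the only one):

def fold(i):
    # cell i of the doubly-infinite tape is stored at this index of the
    # one-sided tape: even indices hold the right half, odd the left half
    return 2 * i if i >= 0 else -2 * i - 1

for i in [-3, -2, -1, 0, 1, 2, 3]:
    print(i, "->", fold(i))   # -3 -> 5, -2 -> 3, -1 -> 1, 0 -> 0, 1 -> 2, ...

A move by one cell on the original tape becomes a move by two cells on the folded tape, except near the folding point fold(0), where the simulation switches between the even and odd halves.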
22.4.3 Non-determinism
This does not buy you anything, but the details are not trivial, and we will delay the discussion of this issue
to later.
22.4.4 Multi-tape
Consider a TM N that has k tapes, where k > 1 is some finite integer constant. Here each tape has its own read/write head, but there is only one finite control. The transition function of this machine is a function
δ : Q × Γ^k → Q × Γ^k × {L, R, S}^k,
and the initial input is placed on the first tape.
To simulate N on a single-tape TM M, we store the contents of all k tapes on the single tape of M, one after the other, separated by $’s. The string between the ith and (i + 1)th $ in this string is going to be the content of the ith tape. We also need to keep track, for each of these tapes, of where its head is supposed to be. To this end, for each character a ∈ Γ we create a dotted version ȧ, and the character under each virtual head is kept dotted. Thus, if the initial input is w = xw′, where x is a character, the newly rewritten tape would look like:
$ẋw′$␣̇$␣̇ . . . $␣̇$
(with k − 1 sections of the form ␣̇, one for each of the initially empty tapes). This way, we can keep track of the head location in each one of the tapes.
For each move of N, we go back on M to the beginning of the tape and scan the tape from left to right, reading all the dotted characters and storing them (encoding them in the current state). Once we have done that, we know which transition of N needs to be executed:
q⟨c1 ,...,ck⟩ → q′⟨d1 ,D1 ,d2 ,D2 ,...,dk ,Dk⟩ ,
where Di ∈ {L, R, S} is the instruction for where the ith head must move. To implement this transition, we scan the tape from left to right (first moving the head to the start of the tape), and when we encounter the ith dotted character ċi, we replace it by (the undotted) di, and we move the virtual head as instructed by Di, by rewriting the relevant character (immediately near the head location) by its dotted version. After doing that, we continue the scan to the right, to perform the operation for the remaining i + 1, . . . , k tapes.
After completing this process, we might have a dotted $̇ on the tape (i.e., the relevant head is located at the end of the space allocated to its tape). We use the Shift_Tape_Right algorithm described above to create space to the left of such a dotted dollar, and write in the newly created spot a dotted space. Thus, if the tape locally looked like
. . . a b $̇ c . . .
then after the shifting right and dotting the space, the new tape would look like
. . . a b ␣̇ $ c . . .
By doing this shift-right operation to all the dotted $’s, we end up with a new tape that is guaranteed to have enough space if we decide to write new characters to any of the k tapes of N.
It is now easy to verify that we can simulate N on this Turing machine M, which uses a single tape. In particular, any language that N recognizes is also recognized by M, which is a standard TM, establishing the claim.
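Here is a small Python sketch of the tape encoding used in this simulation; the dotting is represented, purely for illustration, by appending a '.' to the character under each virtual head:

def encode(tapes, heads):
    # tapes: list of k strings; heads: list of k head positions
    parts = []
    for content, h in zip(tapes, heads):
        cells = list(content) or [" "]
        cells[h] = cells[h] + "."      # dot the character under the head
        parts.append("".join(cells))
    return "$" + "$".join(parts) + "$"

print(encode(["abb", " "], [1, 0]))    # $ab.b$ .$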
Chapter 23
For the alphabet
Σ = {0, 1, . . . , 9, +, −} ,
consider the language
L = { an an−1 . . . a0 + bm bm−1 . . . b0 = cr cr−1 . . . c0 | ai, bj, ck ∈ [0, 9] and ⟨an an−1 . . . a0⟩ + ⟨bm bm−1 . . . b0⟩ = ⟨cr cr−1 . . . c0⟩ },
where ⟨an an−1 . . . a0⟩ = ∑_{i=0}^{n} ai · 10^i is the number represented in base ten by the string an an−1 . . . a0.
We then ask whether we can build a TM which decides the language L.
Reversing a tape
Given the content of tape 1, we can reverse it easily in two steps using a temporary tape. First, we put a marker onto the temporary tape. Moving the heads on both tapes to the right, we copy the contents of tape 1 onto the temporary tape. We then rewind the tape 1 head to the start of its tape, but the temporary tape head remains at the end of this tape. We copy the material back onto tape 1, moving the temporary head left (until it reaches the marker) while the tape 1 head moves right, which writes the string in reverse.
Now, let us assemble the addition algorithm. We will use five tapes: the input (tape 1), three tapes to hold numbers (tapes 2, 3, and 4), and a scratch tape used for the reversal operation.
The TM will first scan the input tape (i.e., tape 1), copying the digits before the + onto tape 2 and the digits between the + and the = onto tape 3. We then reverse tapes 2 and 3, so that their least-significant digits come first. Next, we rewind the heads of tapes 2 and 3 to the beginning of the tapes, and we start moving them together, computing the sum of the digits under the two heads (together with the carry, which is remembered in the state), writing the output to tape 4.
If one of the heads of tapes 2 or 3 reaches the end of the tape, then we continue moving it, interpreting ␣ as a 0. We halt when the heads on both tapes see ␣.
Next, we move the head of tape 4 back to the beginning of the tape, and do ReverseTape(4). Finally, we compare the content of tape 4 with the number written on tape 1 after the = character. If they are equal, the TM accepts, otherwise it rejects.
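The digit-by-digit phase of this algorithm is ordinary long addition performed least-significant-digit first, which is exactly why tapes 2 and 3 are reversed beforehand. A short Python rendition of that phase (with the carry, which the TM keeps in its state, held in a variable):

def add_reversed(a, b):
    # a, b: digit strings with the least-significant digit first
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        da = int(a[i]) if i < len(a) else 0    # running off the end reads 0
        db = int(b[i]) if i < len(b) else 0
        carry, digit = divmod(da + db + carry, 10)
        out.append(str(digit))
    if carry:
        out.append(str(carry))
    return "".join(out)

# 473 + 98: reverse the inputs, add, and reverse the output back.
print(add_reversed("374", "89")[::-1])         # 571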
Graph encoding
Figure 23.1: A graph encoded as text. The string encoding the graph is in fact “5⟨NL⟩7⟨NL⟩(1,2)⟨NL⟩(2,3)⟨NL⟩(3,5)⟨NL⟩(5,1)⟨NL⟩(3,4)⟨NL⟩(4,3)⟨NL⟩(4,2)”. Here ⟨NL⟩ denotes the special new-line character.
We are given a directed graph G = (V, E), and two vertices s, t ∈ V , and we would like to decide if there
is a way to reach t from s.
All sorts of encodings are possible. But it is easiest to understand if we use encodings that look like
standard ASCII file, of the sort you might use as input to your Java or C++ program. ASCII files look like
they are two-dimensional, but remember that they are actually one-dimensional strings inside the computer. Line breaks display in a special way, but underneath they are just a special separator character (<NL> on a unix system), very similar to the $ or # that we’ve used to subdivide items in our string examples.
To make things easy, we will number the vertices of V from 1 to n = |V |. To specify that there is an edge
between two vertices u and v, we then specify the two indices of u and v. We will use the notation (u, v).
Thus, to specify a graph as a text file, we could use the following format, where n is the number of vertices
and m is the number of edges in the graph.
n
m
(n1, n′1)
(n2, n′2)
...
(nm, n′m)
Namely, the first line of the file contains the number of vertices n (written explicitly in ASCII), and the second line is the number of edges of G (i.e., m). Then, every following line specifies one edge of the graph, by giving the two numbers that are its endpoint vertices. As a concrete example, consider the following graph.
The number of edges is a bit redundant, because we could just stop reading at the end of the file. But
it is convenient for algorithm design.
See Figure 23.1 for an example of a graph and its encoding using this scheme.
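Since the encoding is just an ASCII string, parsing it is routine. A Python sketch of a parser for this format (the helper name is made up):

def parse_graph(text):
    lines = text.strip().split("\n")           # the <NL>-separated records
    n, m = int(lines[0]), int(lines[1])
    edges = []
    for line in lines[2:2 + m]:
        u, v = line.strip("()").split(",")
        edges.append((int(u), int(v)))
    return n, edges

enc = "5\n7\n(1,2)\n(2,3)\n(3,5)\n(5,1)\n(3,4)\n(4,3)\n(4,2)"
print(parse_graph(enc))   # (5, [(1, 2), (2, 3), (3, 5), (5, 1), (3, 4), (4, 3), (4, 2)])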
To solve this problem, we will need to search the graph, starting with node s. The TM accepts iff this
search finds the node t. We will store information on four TM tapes, in addition to the input tape. The TM
would have the following tapes:
Tape 1: the input.
Tape 2: the target node t.
Tape 3: the edge list.
Tape 4: the done list: the list of nodes that we’ve finished processing.
Tape 5: the to-do list: the list of nodes whose outgoing edges have not been followed.
Given the graph, the TM reads the graph (checking that the input is in the right format). It puts the list of edges onto tape 3, puts t onto its own tape (i.e., tape 2), and puts the node s onto the to-do list tape (i.e., tape 5).
Next, the TM loops. In each iteration, it removes the first node x from the to-do list. If x = t, the TM halts and accepts. Otherwise, x is added to the done list (i.e., tape 4). Then the TM searches the edge list for all edges going outwards from x. Suppose an outgoing edge goes from x to y. Then, if y is not already on the done list or the to-do list, y is added to the to-do list.
If there is nothing left on the to-do list, the TM halts and rejects.
This algorithm is a graph search algorithm. It is breadth-first search if the new nodes are added at the end of the to-do list, and depth-first search if they are added at the start of the list. (Or, said another way, the to-do list operates as either a queue or a stack.)
The separate done list is necessary to prevent the algorithm from going into an infinite loop if the graph contains cycles.
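A Python sketch of this search, with the to-do list as a queue (so it behaves as BFS; popping from the end instead would give DFS):

from collections import deque

def reachable(edges, s, t):
    done, todo = set(), deque([s])     # tapes 4 and 5, respectively
    while todo:
        x = todo.popleft()             # remove the first node from the to-do list
        if x == t:
            return True                # halt and accept
        done.add(x)
        for (u, v) in edges:           # scan the edge list (tape 3)
            if u == x and v not in done and v not in todo:
                todo.append(v)
    return False                       # to-do list is empty: halt and reject

edges = [(1, 2), (2, 3), (3, 5), (5, 1), (3, 4), (4, 3), (4, 2)]
print(reachable(edges, 1, 4), reachable(edges, 1, 6))   # True False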
Emptiness for DFAs. Consider the language EDFA = { ⟨D⟩ | D is a DFA and L(D) = ∅ }. This language is decidable. Namely, given an instance ⟨D⟩, there is a TM that reads ⟨D⟩, always stops, and accepts if and only if L(D) is empty. Indeed, do a graph search on the DFA (as above) starting at the start state of D, and check whether any of the final states is reachable. If so, then L(D) ≠ ∅ and the TM rejects; otherwise it accepts.
Emptiness for NFAs. The language ENFA = { ⟨D⟩ | D is an NFA and L(D) = ∅ } is also decidable. Indeed, convert the given NFA into a DFA (as done in class, a long time ago) and then call the code for EDFA on the encoded DFA. Notice that the first step in this algorithm takes the encoded version of D and writes the encoding for the corresponding DFA. You can imagine this as taking a state diagram as input and producing a new state diagram as output.
Equal languages for DFAs. Consider the language
EQDFA = { ⟨D, C⟩ | D and C are DFAs, and L(D) = L(C) }.
This language is also decidable. Remember that the symmetric difference of two sets X and Y is X ⊕ Y = (X ∩ Y̅) ∪ (Y ∩ X̅). The set X ⊕ Y is empty if and only if the two sets are equal. But, given a DFA, we know how to make a DFA recognizing the complement of its language. And we also know how to take two DFA’s and make a DFA recognizing the union or intersection of their languages.
So, given the encodings for D and C, our TM will construct the encoding of a DFA ⟨B⟩ recognizing the symmetric difference of their languages. Then it would call the code for deciding if ⟨B⟩ ∈ EDFA.
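A Python sketch of this decision procedure. Instead of explicitly constructing the DFA B, it searches the product automaton of D and C for a reachable pair of states on which exactly one of the two DFAs accepts; such a pair exists if and only if the symmetric difference of the languages is nonempty. (The tuple format for DFAs here is made up for illustration.)

def equivalent(D, C):
    (deltaD, startD, finD), (deltaC, startC, finC) = D, C
    alphabet = set(a for (_, a) in deltaD)       # the common alphabet
    seen, stack = set(), [(startD, startC)]
    while stack:
        p, q = stack.pop()
        if (p, q) in seen:
            continue
        seen.add((p, q))
        if (p in finD) != (q in finC):           # a word separating D and C
            return False
        for a in alphabet:
            stack.append((deltaD[(p, a)], deltaC[(q, a)]))
    return True                                  # symmetric difference is empty

# Two DFAs over {0,1} for "even number of 1s"; C has a redundant state.
D = ({("e", "0"): "e", ("e", "1"): "o", ("o", "0"): "o", ("o", "1"): "e"}, "e", {"e"})
C = ({(0, "0"): 1, (0, "1"): 2, (1, "0"): 0, (1, "1"): 2, (2, "0"): 2, (2, "1"): 0},
     0, {0, 1})
print(equivalent(D, C))                          # True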
Informally, problems involving regular languages are always decidable, because they are so easy to manip-
ulate. Problems involving context-free languages are sometimes decidable. And only the simplest problems
involving Turing machines are decidable.
As before, the notation ⟨D, w⟩ is the encoding of the DFA D and the word w; that is, it is the pair ⟨D⟩ and ⟨w⟩. For example, if ⟨w⟩ is just w (it’s already a string), then ⟨D, w⟩ might be ⟨D⟩#w where # is some separator character. Or it might be (⟨D⟩, w). Or anything similar that encodes the input well. We will just assume that it is in some such reasonable encoding of a pair and that the low-level code for our TM (which we will not spell out in detail) knows what it is.
A Turing machine deciding ADFA needs to be able to take the code for some arbitrary DFA, plus some
arbitrary string, and decide if that DFA accepts that string. So it will need to contain a general-purpose
DFA simulator. This is called the acceptance problem for DFA’s.
It’s useful to contrast this with a similar-sounding claim. If D is any DFA, then L(D) is Turing-decidable.
Indeed, to build a TM that accepts L(D), we simply move the TM head to the right over the input, using the
TM’s controller to simulate the controller of the DFA directly.
In this case, we are given a specific fixed DFA D and we only need to cook up a TM that recognizes strings
from this one particular language. This is much easier than ADFA .
To decide ADFA, our TM will use five tapes:
Tape 1: the input ⟨D, w⟩,
Tape 2: the current state,
Tape 3: the final states,
Tape 4: the transition triples,
Tape 5: the input string.
The TM works as follows.
(1) Check the format of the input. Copy the start state to tape 2. Copy the final states and transition triples of the input machine ⟨D⟩ to tapes 3 and 4.
(2) Copy the input string w to tape 5, and move all the heads back to the beginnings of their tapes.
(3) If the tape 5 head sees ␣ (the input string is exhausted), halt: accept if the state written on tape 2 appears on tape 3, and reject otherwise. Otherwise, find the transition triple (p, c, q) on tape 4 in which p matches the current state (written on tape 2) and c matches the character under the tape 5 head.
(4) Change the current state of the simulated DFA from p to q. Specifically, copy the state q (written in the triple we just found on tape 4) to tape 2.
(5) Move the tape 5 head to the right (i.e., the simulation handled this input character).
(6) Goto step (3).
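The heart of this machine is just a DFA simulator, which is a few lines in ordinary code. A Python sketch, with tape 2 as the variable state, tape 3 as finals, tape 4 as the transition dictionary, and tape 5 as the input string:

def a_dfa(delta, start, finals, w):
    state = start                     # step (1): copy the start state
    for c in w:                       # steps (3)-(6): the main loop
        state = delta[(state, c)]     # find the matching transition triple
    return state in finals            # step (3)'s halting check

delta = {("e", "0"): "e", ("e", "1"): "o", ("o", "0"): "o", ("o", "1"): "e"}
print(a_dfa(delta, "e", {"e"}, "1011"))   # False: odd number of 1s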
Chapter 24
This lecture presents more examples of languages that are Turing decidable, from Sipser Section 4.1.
(B) EQDFA: the language of all pairs of DFAs that have the same language.
(C) ADFA = { ⟨D, w⟩ | D is a DFA, w is a word, and D accepts w }. Here ⟨D, w⟩ is in the language if and only if the DFA D accepts w.
(D) ANFA = { ⟨D, w⟩ | D is an NFA accepting w }.
(E) EQDFA = { ⟨D, C⟩ | D, C are DFA’s and L(D) = L(C) }.
(F) Aregex = { ⟨R, w⟩ | R is a regular expression generating w }. To decide this language, the TM can convert R into a DFA D, and then check if ⟨D, w⟩ ∈ ADFA.
24.2.1 Context-free languages are TM decidable
Given a PDA P, we are interested in the question of whether we can build a TM decider that accepts L(P). Observe that we can turn P into an equivalent CFG, and this CFG can be turned into an equivalent CNF grammar G. With G it is now easy to decide if an input word w is in L(G). Indeed, we can either use the CYK algorithm to decide if a word is in the grammar, or, alternatively, enumerate all possible parse trees for the given CNF grammar that might generate the given word w. That is, if n = |w|, then we need to generate all possible parse trees with 2n − 1 internal nodes (since this is the size of a parse tree deriving such a word in CNF), and see if any of them generates w. In either case, we have the following.
Lemma 24.2.1 Given a PDA P, there is a TM T which is a decider, and L(P) = L(T). Namely, for every PDA there exists an equivalent TM.
Proof: We build a TM TCFG for ACFG. The input for it is the pair ⟨G, w⟩. As a first step, we convert G to be in CNF (we saw the algorithm for doing this in detail in class). Let G′ denote the resulting grammar. Next, we use CYK to decide if w ∈ L(G′). If it is, the TM TCFG accepts, otherwise it rejects.
Given a TM decider TCFG for ACFG, building a TM decider whose language is equal to a specific given G is easy. Specifically, given G, we would like to build a TM decider T0 such that L(T0) = L(G).
So, modify the given TM to encode G. As a first step, the new TM T0 would write G on the input tape (next to the input word w). Next, it would run the TM TCFG on this given input to decide if ⟨G, w⟩ ∈ ACFG.
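For completeness, here is a compact Python sketch of CYK, assuming the CNF grammar is given as two rule lists, units (A → a) and pairs (A → BC); this representation is made up for illustration:

def cyk(units, pairs, start, w):
    n = len(w)
    if n == 0:
        return False      # the empty word is handled separately in CNF
    # table[i][j] = set of variables deriving w[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, c in enumerate(w):
        table[i][0] = {A for (A, a) in units if a == c}
    for length in range(2, n + 1):             # substring length
        for i in range(n - length + 1):        # substring start
            for split in range(1, length):     # split point
                for (A, B, C) in pairs:
                    if B in table[i][split - 1] and \
                       C in table[i + split][length - split - 1]:
                        table[i][length - 1].add(A)
    return start in table[0][n - 1]

# A CNF grammar for { a^n b^n | n >= 1 }:  S -> AT | AB, T -> SB, A -> a, B -> b.
units = [("A", "a"), ("B", "b")]
pairs = [("S", "A", "T"), ("S", "A", "B"), ("T", "S", "B")]
print(cyk(units, pairs, "S", "aabb"), cyk(units, pairs, "S", "abab"))   # True False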
Remark 24.2.3 (Encoding instances inside a TM.) The above demonstrates that given a more general algorithm, we can use it to solve the problem for specific instances. This is done by encoding the given specific instance into the constructed TM.
If you have trouble imagining encoding the whole CFG into the TM, as done above, think about storing a short string like UIUC in the TM state diagram, to be written out on (say) tape 2. The first state transition in the TM would write U onto tape 2, the next three transitions would write I, then U, then C. Finally, it would move the tape 2 head back to the beginning and transition into the first state that does the actual computation.
Note that doing this encoding of a specific instance inside the TM does not necessarily yield the most efficient TM for the problem. For example, in the above, we could first convert the given instance into CNF before encoding it into the TM.
We could also hard-code the string w into our TM but leave the grammar as a variable input. We omit
the proof of the following easy lemma.
Lemma 24.2.4 Let w be a specific string. The language ACFG,w = { ⟨G⟩ | G is a CFG and G generates w } is decidable.
Emptiness for CFGs. By a similar marking argument, one can also decide whether a given CFG G generates any string at all.
Proof: We already saw this argument in the conversion algorithm of a CFG into CNF (it was one of the initial steps of this conversion). We briefly re-sketch the algorithm.
To this end, the TM marks all the variables in G that can generate (in one step) a string of terminals (or ε, of course). We will refer to such a variable as being useful. Now, the TM iterates repeatedly over the rules of G. For a rule X → w, where w is a string of terminals and variables, the variable X is useful if all the variables of w are useful, and in such a case we mark X as useful. The loop halts when the TM has made a full pass through the rules of G without marking anything new as useful.
This TM accepts the input grammar if the initial variable of G is useful, and otherwise it rejects.
In every iteration over all the rules of G, the TM must designate at least one new variable as useful in order to repeat this process again. It follows that the number of outer iterations performed by this algorithm is bounded by the number of variables in the grammar G, implying that this algorithm always terminates.
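The marking loop translates almost verbatim into ordinary code. A Python sketch, where a grammar is a list of rules (X, body) — a made-up representation for illustration — and terminals are simply the symbols that never appear on a left-hand side:

def generates_something(rules, start):
    lhs = {X for (X, _) in rules}
    useful, changed = set(), True
    while changed:                              # the outer iterations
        changed = False
        for (X, body) in rules:
            if X not in useful and all(s in useful or s not in lhs
                                       for s in body):
                useful.add(X)                   # mark X as useful
                changed = True
    return start in useful

rules = [("S", ["A", "B"]), ("A", ["a"]), ("B", ["B", "b"])]
print(generates_something(rules, "S"))          # False: B never terminates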
• Numbers & arithmetic: We already saw in a previous lecture how some basic integer operations can be handled. It is not too hard to extend these to negative integers and to perform all required numerical operations if we allow a TM with multiple tapes. As such, we can assume that we can implement any standard numerical operation.
Of course, one can also do floating-point operations on a TM. The details are overwhelming but they are quite doable. In fact, until about 20 years ago, many computers implemented floating-point operations using integer arithmetic. Hardware implementation of floating-point operations became mainstream when Intel introduced the i486 in 1989, which had an FPU (floating-point unit). You will probably see how floating-point arithmetic works in computer architecture courses.
• Stored constant strings: The program we are trying to translate into a TM might have strings and constants in it. For example, it might check if the input contains the (all-important) string UIUC. As we saw above, we can encode such strings in the states. Initially, on power-up, the TM starts by writing out such strings onto a special tape that we use for this purpose.
• Random-access memory: We will use an associative memory. Here, consider each memory cell as having a unique label to identify it (i.e., its address), and content. Thus, if cell 17 contains the value abc, we will consider it as storing the pair (17, abc). We can store the memory on a tape as a list of such pairs. Thus, the tape might look like:
(17, abc)(1, samuel) . . .
Here, address 17 stores the string abc, address 1 stores the string samuel, and so on.
Reading the value of address x from the tape is easy. Suppose x is written on tape i, and we would like to find the value associated with x on the memory tape and write it onto tape j. To do this, the TM scans the memory tape mem (i.e., the tape we use to simulate the associative memory) from the beginning, until the TM encounters a pair in mem having x as its first element. It then copies the second part of the pair to the output tape j.
Storing a new value (x, y) in memory is almost as easy. If a pair having x as its first element exists, you delete it (by writing a special cross-out character over it), and then you write the new pair (x, y) at the end of the tape mem.
If you wanted to use memory more efficiently, the new value could be written into the original location,
whenever the original location had enough room. You could also write new pairs into crossed-out
regions, if they have enough room. Implementations of C malloc/free and Java garbage collection use
slightly more sophisticated versions of these ideas. However, TM designers rarely care about efficiency.
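A Python sketch of this associative memory, with the tape as a list of pairs and None as the cross-out mark:

CROSSED = None

def mem_read(tape, x):
    for pair in tape:                     # scan left to right
        if pair is not CROSSED and pair[0] == x:
            return pair[1]
    return None                           # address x was never written

def mem_write(tape, x, y):
    for i, pair in enumerate(tape):
        if pair is not CROSSED and pair[0] == x:
            tape[i] = CROSSED             # cross out the old pair
    tape.append((x, y))                   # write the new pair at the end

tape = []
mem_write(tape, 17, "abc"); mem_write(tape, 1, "samuel")
mem_write(tape, 17, "xyz")
print(mem_read(tape, 17))                 # xyz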
• Subroutine calls: To simulate a real program, we need to be able to do calls (and recursive calls).
The standard way to implement such things is by having a stack. It is clear how to implement a stack
on its own TM tape.
We need to store three pieces of information for each procedure call:
(i) private working space,
(ii) the return value,
(iii) and the name of the state to return to after the call is done.
The private working space needs to be implemented with a stack, because a set of nested procedure
calls might be active all at once, including several recursive calls to the same procedure.
The return value can be handled by just putting it onto a designated register tape, say tape 24.
Right before we give control over to a procedure, we need to store the name of the state it should
return to when it is done. This allows us to call a single fixed piece of code from several different places
in our TM. Again, these return points need to be put on a stack, to handle nested procedure calls.
After it returns from a procedure, the TM reads the state name to return to. A special set of TM
states handle reading a state name and transitioning to the corresponding TM state.
These are just the most essential features for a very simple general-purpose computer. In some computer
architecture class, you will see how to implement fancier program features (e.g. garbage collection, objects)
on top of this simple model.
For example, in a perfect world (which we are not living in, naturally), we would like to give a formal specification of a program (say, a TM that decides if a number is prime), and have another program
2 Things of course are way more complicated in practice, since Java virtual machines nowadays usually compile frequently-run portions of the code to achieve faster performance (i.e., just-in-time compilation [JIT]), but still, you can safely think about a JVM as an interpreter.
that would swallow this description and spit out the program performing this computation (i.e., have a computer that writes our programs for us).
A more realistic example is a compiler, which translates (say) Java code into assembly code. It takes code as input and produces code in a different language as output. We could also build an optimizer that reads Java code and produces new code, also in Java but more efficient. Or a cheating-helper program that reads Java code and writes out a new version with different variable names and modified comments.
We emphasize that UTM is not a decider. Namely, it stops only if T accepts w, but it might run forever if T does not accept w.
To simplify our discussion, we assume that T is a single-tape machine with some fixed alphabet (say ΣT = {0, 1} and the tape alphabet ΓT = {0, 1, ␣}). To simplify the discussion further, the TM for ATM is going to be a multi-tape machine. Naturally, one can convert this TM into a single-tape TM.
So, the input for UTM is an encoding ⟨T, w⟩. As a first step, the UTM would verify that the input is in the right format (such a reasonable encoding for a TM was given as an exercise in the homework). The UTM would copy the different components of the input onto different tapes:
Tape 1: the transition function δ of T. It is going to be a sequence (separated by $) of transitions. A transition (q, c) → (q′, t, L) would be encoded as a string of the form
(#q, c) − (#q′, t, L),
where #q is the index of the state q (in T) and #q′ is the index of q′. More specifically, you can think about the states of T as being numbered between 1 and m, and #q is just the binary representation of the index of the state q.
Tape 2: #q0 – the initial state of T.
Tape 3: #qacc – the accept state of T.
Tape 4: #qrej – the reject state of T.
Tape 5: $w – the input tape to be handled.
Once done copying the input, the UTM would move the head of tape 5 to the beginning of the tape. It then performs the following loop:
(I) Loop:
(i) Scan tape 1 to find the transition matching the state on tape 2 and the character under the head of tape 5.
(ii) Update the state on tape 2.
(iii) Update the character and the head position on tape 5.
We repeat this until the state on tape 2 is equal to the state written on either tape 3 (qacc) or tape 4 (qrej).
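The loop above is an ordinary interpreter loop. A Python sketch of it, with delta playing the role of tape 1, state of tape 2, and (cells, head) of tape 5:

def utm(delta, q0, q_acc, q_rej, w):
    state, cells, head = q0, list(w) or [" "], 0
    while state not in (q_acc, q_rej):
        # (i) find the transition matching the state and the scanned character
        state, c, move = delta[(state, cells[head])]
        cells[head] = c                       # (iii) update the character...
        head = max(head + (1 if move == "R" else -1), 0)  # ...and the head
        if head == len(cells):
            cells.append(" ")                 # pad with blanks on the right
    return state == q_acc

# The shift-right machine from Chapter 22, over {a, b}:
delta = {("q0", "a"): ("qa", "$", "R"), ("q0", "b"): ("qb", "$", "R"),
         ("q0", " "): ("acc", "$", "R"),
         ("qa", "a"): ("qa", "a", "R"), ("qa", "b"): ("qb", "a", "R"),
         ("qa", " "): ("acc", "a", "R"),
         ("qb", "a"): ("qa", "b", "R"), ("qb", "b"): ("qb", "b", "R"),
         ("qb", " "): ("acc", "b", "R")}
print(utm(delta, "q0", "acc", "rej", "ab"))   # True; the tape now holds $ab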
Chapter 25
‘There must be some mistake,’ he said, ‘are you not a greater computer than the Milliard Gargantubrain at
Maximegalon which can count all the atoms in a star in a millisecond?’
‘The Milliard Gargantubrain?’ said Deep Thought with unconcealed contempt. ‘A mere abacus - mention it not.’
– The Hitch Hiker’s Guide to the Galaxy, by Douglas Adams.
In this lecture we will discuss the halting problem and diagonalization. This covers most of Sipser
section 4.2.
“We will play a game to decide which way you will die,” said the man. “You may say one thing,
and one thing only. If what you say is true, I will strangle you with my bare hands. If what you
say is false, I will cut off your head.”
After some soul-searching, Lief replies “My head will be cut off.” At this point, there’s no way for the
giant to make good on his threat, so the spell he’s under melts away, he changes back to his original bird
form, and Lief gets to cross the bridge.
The key problem for the giant is that, if he strangles Lief, then Lief’s statement will have been false.
But he said he would strangle him only if his statement was true. So that does not work. And cutting off
his head does not work any better. So the giant’s algorithm sounded good, but it turned out not to work
properly for certain inputs.
A key property of this paradox is that the input (Lief’s reply) duplicates material used in the algorithm.
We’ve fed part of the algorithm back into itself.
We saw in the previous lecture that one can build a universal Turing machine UTM that can simulate any Turing machine on any input. As such, using UTM, we have the following TM recognizing ATM:
Recognize-ATM(⟨M, w⟩)
    Simulate M on w using UTM until it halts
    if M halts and accepts then
        accept
    else
        reject
25.2.1 Implications
So, let us suppose that the halting problem (i.e., deciding if a word is in ATM) were decidable. Namely, that there is an algorithm that solves it (for any input). This seems somewhat hard to believe, since even humans can not solve this problem (and we still live under the delusion that we are smarter than computers).
If we could decide the Halting problem, then we could build compilers that would automatically prevent
programs from going into infinite loops and other very useful debugging tools. We could also solve a variety
of hard mathematical problems. For example, consider the following program.
Percolate(n)
    for p ≤ q < n do
        if p is prime and q is prime, and p + q = n then
            return
    halt   // no pair found: n is a counterexample
Main:
    n ← 4
    while true do
        Percolate(n)
        n ← n + 2
Does this program stop? We do not know. If it does stop, then the Strong Goldbach conjecture is false.
Conjecture 25.2.1 (Strong Goldbach conjecture.) Every even integer greater than 2 can be written as
a sum of two primes.
This conjecture is still open, and it is considered to be one of the major open problems in mathematics. It was stated in a letter on 7 June 1742, and it is still open. It seems unlikely that a computer program would be able to solve this, and a large number of other mathematical conjectures. If ATM were decidable, then we could write a program that would try to generate all possible proofs of a conjecture and verify each proof. Now, if we can decide whether programs stop, then we can discover whether or not a mathematical conjecture is true, and this seems extremely unlikely (that a computer would be able to solve all problems in mathematics).
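To make the discussion concrete, here is a runnable Python rendition of the Percolate program (with an artificial bound so that it terminates here; the TM version would run unboundedly):

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def percolate(n):
    # is n a sum of two primes p + q with p <= q?
    return any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))

n = 4
while n < 10 ** 4:                      # the real program: while true
    if not percolate(n):
        print(n, "violates Goldbach")   # never reached (so far!)
        break
    n += 2
else:
    print("no counterexample below", n)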
I hope that this informal argument convinces you that it is extremely unlikely that ATM is TM decidable. Fortunately, we can prove this fact formally.
Theorem 25.4.1 (The halting theorem.) The language ATM is not TM decidable.
Proof: Assume ATM is TM decidable, and let Halt be this TM deciding ATM. That is, Halt is a TM that always halts, and works as follows:
Halt(⟨M, w⟩) = accept if M accepts w, and reject if M does not accept w.
We will now build a new TM Flipper, such that on the input ⟨M⟩, it runs Halt on the input ⟨M, M⟩. If Halt(⟨M, M⟩) accepts then Flipper rejects, and if Halt(⟨M, M⟩) rejects then Flipper accepts. Formally:
Flipper(⟨M⟩)
    res ← Halt(⟨M, M⟩)
    if res is accept then
        reject
    else
        accept
The key observation is that Flipper always stops. Indeed, it uses Halt as a subroutine, and Halt, by our assumption, always halts. In particular, we have the following:
Flipper(⟨M⟩) = reject if M accepts ⟨M⟩, and accept if M does not accept ⟨M⟩.
Flipper is a TM (duh!), and as such it has an encoding ⟨Flipper⟩. Now, consider running Flipper on itself. We get the following:
Flipper(⟨Flipper⟩) = reject if Flipper accepts ⟨Flipper⟩, and accept if Flipper does not accept ⟨Flipper⟩.
This is absurd. Ridiculous, even! Indeed, if Flipper accepts ⟨Flipper⟩, then it rejects it (by the above definition), which is impossible. And if Flipper rejects ⟨Flipper⟩ (note that Flipper always stops!), then by the above definition it must accept ⟨Flipper⟩, which is also impossible.
Thus, it must be that our assumption that Halt exists is false. We conclude that ATM is not TM decidable.
Theorem 25.5.1 There is no C program that reads a C program P and an input w, and decides if P “accepts” w.
The proof of the above theorem is identical to that of the halting theorem – we just perform our rewriting on the C program.
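In modern terms, the rewriting is a short program-manipulating program. Here is a Python rendition of the same self-reference trick, assuming (for contradiction) a function accepts(prog, inp) that always halts and decides acceptance; all names here are hypothetical:

def accepts(prog, inp):
    # the assumed decider for the acceptance problem;
    # the point of the theorem is that it cannot exist
    raise NotImplementedError

import sys

def flipper(prog_src):
    # do the opposite of what prog_src does when fed its own source
    if accepts(prog_src, prog_src):
        print("reject"); sys.exit(1)
    else:
        print("accept"); sys.exit(0)

# Running flipper on flipper's own source file yields the contradiction:
# flipper(open("flipper.py").read())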
Also, notice that being able to recognize a language and its complement implies that the language is
decidable, as the following theorem testifies.
Theorem 25.5.2 A language is TM decidable iff it is TM recognizable and its complement is also TM
recognizable.
Proof: It is obvious that decidability implies that the language and its complement are recognizable. To prove the other direction, assume that L and its complement L̅ are both recognizable. Let M and N be Turing machines recognizing them, respectively. Then we can build a decider for L by running M and N in parallel.
Specifically, suppose that w is the string input to M. Simulate both M and N using UTM, but single-step the simulations.
the simulations. Advance each simulation by one step, alternating between the two simulations. Halt when
either of the simulations halts, returning the appropriate answer.
If w is in L, then the simulation of M must eventually halt. If w is not in L, then the simulation of N
must eventually halt. So our combined simulation must eventually halt and, therefore, it is a decider for L.
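A Python sketch of this parallel simulation, with each recognizer modeled as a generator that yields True when (and if) it accepts; a run that halts without accepting simply ends. These conventions are made up for illustration:

def decide(M, N, w):
    # M recognizes L, N recognizes the complement of L
    runs = {"M": M(w), "N": N(w)}       # single-step both simulations
    while True:
        for name in list(runs):
            try:
                if next(runs[name]) is True:
                    return name == "M"  # M accepted: w is in L
            except StopIteration:
                del runs[name]          # halted without accepting

def even_len(w):                        # toy recognizer: |w| is even
    yield len(w) % 2 == 0

def odd_len(w):                         # recognizes the complement
    yield len(w) % 2 == 1

print(decide(even_len, odd_len, "ab"))  # True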
Chapter 26
Meta definition: Problem A reduces to problem B if a solution to B implies a solution to A. Namely, if we can solve B then we can solve A. We will denote this by A =⇒ B.
An oracle ORAC for a language L is a function that receives as input a word w, and returns true if and only if w ∈ L. An oracle can be thought of as a black box that can solve membership in a language without requiring us to consider the question of whether L is computable or not. Alternatively, you can think of an oracle as a provided library function that computes whatever it is required to compute, and always returns (i.e., it never goes into an infinite loop).
Intuitively, a TM decider for a language L is the ultimate oracle. Not only can it decide if a word is in L, but furthermore, it can be implemented as a TM that always stops.
In the context of showing languages are undecidable, the following more specific definition would be
useful.
Definition 26.1.1 A language X reduces to a language Y , if one can construct a TM decider for X using
a given oracle ORACY for Y .
We will denote this fact by X =⇒ Y .
In particular, if X reduces to Y, then given a decider for the language Y (i.e., an oracle for Y), there is a program that can decide X. So Y must be at least as “hard” as X. In particular, if X is undecidable, then it must be that Y is also undecidable.
Warning. It is easy to get confused about which of the two problems “reduces” to the other. Do not get
hung up on this. Instead, concentrate on getting the right outline for your proofs (proving them in the right
direction, of course).
Reduction proof technique. Formally, consider a problem B that we would like to prove is undecidable. We will prove this via reduction; that is, a proof by contradiction, similar in outline to the ones we have seen for regular and context-free languages. You assume that your new language L (i.e., the language of B) is decided by some TM M. Then you use M as a component to create a decider for some language known to be undecidable (typically ATM). This would imply that we have a decider for A (i.e., ATM). But this is a contradiction, since A (i.e., ATM) is not decidable. As such, we must have been wrong in assuming that L was decidable.
We will concentrate on using reductions to show that problems are undecidable. However, the technique
is actually very general. Similar methods can be used to show problems to be not TM recognizable. We have
used similar proofs to show languages to be not regular or not context-free. And reductions will be used in
CS 473 to show that certain problems are “NP complete”, i.e. these problems (probably) require exponential
time to solve.
Lemma 26.1.2 Let X and Y be two languages such that X reduces to Y. If Y is TM decidable, then X is also TM decidable.
Proof: Let T be the TM decider for Y. Since X reduces to Y, it follows that there is a procedure TX|Y (i.e., a TM decider) for X that uses an oracle for Y as a subroutine. We replace the calls to this oracle in TX|Y by calls to T. The resulting TM TX is a TM decider and its language is X. Thus X is TM decidable.
Lemma 26.1.3 Let X and Y be two languages, and assume that X =⇒ Y . If X is TM undecidable then
Y is TM undecidable.
26.2 Halting
We remind the reader that ATM is the language
ATM = { ⟨M, w⟩ | M is a TM and M accepts w }.
This is the problem that we showed (last class) to be undecidable (via diagonalization). Right now, it is
the only problem we officially know to be undecidable.
Consider the following slight modification, consisting of all the pairs ⟨M, w⟩ such that M halts on w. Formally,
AHalt = { ⟨M, w⟩ | M is a TM and M stops on w }.
Intuitively, this is very similar to ATM . The big obstacle to building a decider for ATM was deciding
whether a simulation would ever halt or not.
To show formally that AHalt is undecidable, we show that we can use an oracle for AHalt to build a decider for ATM. This construction looks like the following.
Lemma 26.2.1 The language ATM reduces to AHalt . Namely, given an oracle for AHalt one can build a
decider (that uses this oracle) for ATM .
Proof: Let ORACHalt be the given oracle for AHalt. We build the following decider for ATM.
Decider-ATM(⟨M, w⟩)
    res ← ORACHalt(⟨M, w⟩)
    // if M does not halt on w then reject.
    if res = reject then
        halt and reject.
    res2 ← result of simulating M on w using UTM
           // this simulation terminates, since M halts on w
    return res2.
Clearly, this procedure always returns, and as such it is a decider for ATM.
We will usually be less formal in our presentation. We will just show that a TM decider for AHalt implies that we can build a decider for ATM. This would imply that AHalt is undecidable.
Thus, given a black box (i.e., a decider) TMHalt that can decide membership in AHalt, we build a decider for ATM exactly as in the procedure Decider-ATM described above.
This would imply that if AHalt is decidable, then we can decide ATM , which is of course impossible.
26.3 Emptiness
Now, consider the language
ETM = { ⟨M⟩ | M is a TM and L(M) = ∅ }.
Again, we assume that we have a decider for ETM . Let us call it TMETM . We need to use the component
TMETM to build a decider for ATM .
A decider for ATM is given M and w and must decide whether M accepts w. We need to restructure this
question into a question about some Turing machine having an empty language. Notice that the decider for
ETM takes only one input: a Turing machine. So we have to somehow make the second input (w) disappear.
The key trick here is to hard-code w into M , creating a TM Mw which runs M on the fixed string w.
Specifically the code for Mw might look like:
TM Mw :
1. Input = x (which will be ignored)
2. Simulate M on w.
3. If the simulation accepts, accept. If the simulation rejects, reject.
It is important to understand what is going on. The input is ⟨M⟩ and w; namely, a string encoding M and the string w. The above shows that we can write a procedure (i.e., a TM) that accepts these two strings as input, and outputs the string ⟨Mw⟩ which encodes Mw. We will refer to this procedure as EmbedString. The algorithm EmbedString(⟨M, w⟩), as such, is a procedure reading its input, which is just two strings, and outputting a string that encodes the TM Mw.
It is natural to ask what the language of the machine encoded by the string ⟨Mw⟩ is; that is, what is L(Mw)?
Because Mw ignores its input x, the language of Mw is either Σ∗ or ∅. It is Σ∗ if M accepts w, and it is ∅ if M does not accept w.
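In modern terms, EmbedString is a program that writes a program. A toy Python analogue, assuming a hypothetical interpreter run(src, inp); note that we only build the source string of Mw here and never execute it:

def embed_string(M_src, w):
    return ("def M_w(x):                        # the input x is ignored\n"
            f"    return run({M_src!r}, {w!r})  # simulate M on the fixed w\n")

print(embed_string("<code of M>", "uiuc"))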
We are now ready to prove the following theorem.
Theorem 26.3.1 The language ETM is undecidable.
Proof: We assume, for the sake of contradiction, that ETM is decidable, and let TMETM be its decider.
Next, we build our decider AnotherDecider-ATM for ATM , using the EmbedString procedure described
above.
AnotherDecider-ATM(⟨M, w⟩)
    ⟨Mw⟩ ← EmbedString(⟨M, w⟩)
    r ← TMETM(⟨Mw⟩)
    if r = accept then
        reject
    return accept
Observe, that AnotherDecider-ATM never actually runs the code for Mw . It hands the code to a
function TMETM which analyzes what the code would do if we ever did choose to run it. But we never run
it. So it does not matter that Mw might go into an infinite loop.
Also notice that we have two input strings floating around our code: w (one input to the decider for
ATM ) and x (input to Mw ). Be careful to keep track of which strings are input to which functions. Also be
careful about how many inputs, and what types of inputs, each function expects.
26.4 Equality
An easy corollary of the undecidability of ETM is the undecidability of the language
EQTM = { ⟨M, N⟩ | M and N are TM’s and L(M) = L(N) }.
Proof: Suppose that we had a decider DeciderEqual for EQTM. Then we can build a decider for ETM as follows:
TM R:
1. Input = ⟨M⟩
2. Include the (constant) code for a TM T that rejects all its inputs. We denote the string encoding T by ⟨T⟩.
3. Run DeciderEqual on ⟨M, T⟩.
4. If DeciderEqual accepts, then accept.
5. If DeciderEqual rejects, then reject.
Since the decider for ETM (i.e., TMETM) takes one input but the decider for EQTM (i.e., DeciderEqual) requires two inputs, we are tying one of DeciderEqual’s inputs to a constant value (i.e., ⟨T⟩).
There are many Turing machines that reject all their input and could be used as T. Building code for R just requires writing code for one such TM.
26.5 Regularity
It turns out that almost any property defining a TM language induces a language which is undecidable, and
the proofs all have the same basic pattern. Let us do a slightly more complex example and study the outline
in more detail.
Let
RegularTM = { ⟨M⟩ | M is a TM and L(M) is regular }.
Suppose that we have a TM DeciderRegL that decides RegularTM. In this case, doing the reduction from halting would require turning the problem of deciding whether a TM M accepts w (i.e., whether ⟨M, w⟩ ∈ ATM) into a problem about whether some TM accepts a regular set of strings.
Given M and w, consider the following TM M′w:
TM M′w:
(i) Input = x
(ii) If x has the form aⁿbⁿ, halt and accept.
(iii) Otherwise, simulate M on w.
(iv) If the simulation accepts, then accept.
(v) If the simulation rejects, then reject.
Again, we are not going to execute M′w directly ourselves. Rather, we will feed its description ⟨M′w⟩ (which is just a string) into DeciderRegL. Let EmbedRegularString denote the algorithm which accepts as input ⟨M⟩ and w, and outputs ⟨M′w⟩, the encoding of the machine M′w.
If M accepts w, then every input x will eventually be accepted by the machine M′w. Some are accepted right away in step (ii), and some are accepted in step (iv). So if M accepts w, then the language of M′w is Σ∗.
If M does not accept w, then the strings x of the form aⁿbⁿ will be accepted in step (ii) of M′w. However, for all other strings, either step (iii) will never halt or step (v) will reject. So the rest of the strings (those in the set Σ∗ \ { aⁿbⁿ | n ≥ 0 }) will not be accepted. So the language of M′w is { aⁿbⁿ | n ≥ 0 } in this case.
Since { aⁿbⁿ | n ≥ 0 } is not regular, we can use our decider DeciderRegL on ⟨M′w⟩ to distinguish these two cases.
Notice that the test in step (ii) was cooked up specifically to match the capabilities of our given decider DeciderRegL. If DeciderRegL had instead been testing whether our language contained the string “uiuc”, step (ii) would compare x to see if it is equal to “uiuc”. This test can be anything that a TM can compute without the danger of going into an infinite loop.
Specifically, we can build a decider for ATM as follows.
YetAnotherDecider-ATM(⟨M, w⟩)
    ⟨M′w⟩ ← EmbedRegularString(⟨M, w⟩)
    r ← DeciderRegL(⟨M′w⟩)
    return r
— If DeciderRegL accepts, then L(M′w) is regular. So it must be Σ∗. This implies that M accepts w. So YetAnotherDecider-ATM should accept ⟨M, w⟩.
— If DeciderRegL rejects, then L(M′w) is not regular. So it must be { aⁿbⁿ | n ≥ 0 }. This implies that M does not accept w. So YetAnotherDecider-ATM should reject ⟨M, w⟩.
26.6 Windup
Notice that the code in Section 26.5 is almost exactly the same as the code for the ETM example in Section 26.3. The details of Mw and M′w were different. And one example passed on the return values from the inner decider directly, whereas the other example negated them. This similarity is not accidental, as many examples can be done with very similar proofs.
Next class, we will see Rice’s Theorem, which uses this common proof template to show a very general
result. Namely, almost any nontrivial property of a TM’s language is undecidable.
Chapter 27
To do this, we assumed that RegularTM was decided by some TM S. We then used this to build a decider for ATM (which cannot exist), using the following TM M′w:
(i) Input = x
(ii) If x has the form aⁿbⁿ, halt and accept.
(iii) Otherwise, simulate M on w.
(iv) If the simulation accepts, then accept.
(v) If the simulation rejects, then reject.
Consider the language L3 = { ⟨M⟩ | M is a TM and L(M) contains exactly three strings }. That is, L3 contains all Turing machines whose languages contain exactly three strings.
Proof: By reduction from ATM. Assume, for the sake of contradiction, that L3 is decidable, and let deciderL3 be a TM deciding it. We use deciderL3 to construct a Turing machine decider9-ATM deciding ATM. The decider decider9-ATM is constructed as follows:
decider9-ATM(⟨M, w⟩)
    Construct a new Turing machine Mw:
        Mw(x): // x: the input
            res ← Run M on w
            if (res = reject) then
                reject
            if x = UIUC or x = Iowa or x = Michigan then
                accept
            reject
    return deciderL3(⟨Mw⟩)
(We emphasize here again that constructing Mw involves taking the encoding ⟨M⟩ and w, and generating the encoding ⟨Mw⟩.)
Notice that the language of Mw has only two possible values. If M loops on or rejects w, then L(Mw) = ∅. If M accepts w, then the language of Mw contains exactly three strings: “UIUC”, “Iowa”, and “Michigan”.
So decider9-ATM(⟨M, w⟩) accepts exactly when M accepts w. Thus, decider9-ATM is a decider for ATM. But we know that ATM is undecidable. A contradiction. As such, our assumption that L3 is decidable is false.
Theorem 27.2.2 (Rice’s Theorem.) Suppose that L is a language of Turing machines; that is, each word in L encodes a TM. Furthermore, assume that the following two properties hold.
(a) Membership in L depends only on the Turing machine’s language; i.e., if L(M) = L(N) then ⟨M⟩ ∈ L ⇔ ⟨N⟩ ∈ L.
(b) The set L is “non-trivial,” i.e., L ≠ ∅ and L does not contain all Turing machines.
Then L is undecidable.
Proof: Assume, for the sake of contradiction, that L is decided by a TM deciderForL. We will construct a TM Decider4-ATM that decides ATM. Since Decider4-ATM cannot exist, we will have a contradiction, implying that deciderForL does not exist.
Remember from last class that TM∅ is a TM (pick your favorite) which rejects all input strings. Assume, for the time being, that ⟨TM∅⟩ ∉ L. This assumption will be removed shortly.
Since L is non-trivial, we can also choose some other TM Z with ⟨Z⟩ ∈ L. Now, given ⟨M, w⟩, Decider4-ATM will construct the encoding of the following TM Mw.
TM Mw:
(1) Input = x.
(2) Simulate M on w.
(3) If the simulation rejects, halt and reject.
(4) If the simulation accepts, simulate Z on x, and accept if and only if Z halts and accepts.
If M loops on or rejects w, then Mw will get stuck on line (2) or stop at line (3). So L(Mw) is ∅. Because membership in L depends only on a Turing machine’s language and ⟨TM∅⟩ is not in L, this means that ⟨Mw⟩ is not in L. So ⟨Mw⟩ will be rejected by deciderForL.
If M accepts w, then Mw will proceed to line (4), where it simulates the behavior of Z. So L(Mw) will be L(Z). Because membership in L depends only on a Turing machine’s language and ⟨Z⟩ is in L, this means that ⟨Mw⟩ is in L. So ⟨Mw⟩ will be accepted by deciderForL.
As usual, our decider for ATM looks like:
Decider4-ATM(⟨M, w⟩)
    Construct ⟨Mw⟩ from ⟨M, w⟩
    return deciderForL(⟨Mw⟩)
So Decider4-ATM(⟨M, w⟩) will accept ⟨M, w⟩ iff deciderForL accepts ⟨Mw⟩. But we saw above that deciderForL accepts ⟨Mw⟩ iff M accepts w. So Decider4-ATM is a decider for ATM. Since such a decider cannot exist, we must have been wrong in our assumption that there was a decider for L.
Now, let us remove the assumption that TM∅ ∉ L. The above proof showed that L is undecidable, assuming that ⟨TM∅⟩ was not in L. If TM∅ ∈ L, then we run the above proof using the complement language L̅ in place of L (and now ⟨TM∅⟩ ∉ L̅). At the end, we note that L is decidable iff L̅ is decidable.
27.3.2 A decidable behavior property
For example, consider the following set of Turing machines:
LR = { ⟨M⟩ | M never moves left when run on the input UIUC }.
Surprisingly, the language LR is decidable, because never moving left (equivalently: always moving right) destroys the Turing machine’s ability to do random access into its tape. It is effectively turned into a DFA.
Specifically, if a Turing machine M never moves left, it reads through the whole input, and then starts looking at blank tape cells. Once it is on the blank part of the tape, it can cycle through its set of states. But after |Q| moves, it has run out of distinct states and must be in a loop. So, if you watch M for four moves (the length of the string "UIUC") plus |Q| + 1 moves, it has either halted or it is in an infinite loop.
Therefore, to decide LR, you simulate the input Turing machine for |Q| + 5 moves. After that many moves, it has either halted, moved left at some point, or entered a loop in which it keeps moving right forever; in each case we know whether ⟨M⟩ ∈ LR.
This algorithm is a decider (not just a recognizer) for LR, because it definitely halts on any input Turing machine M.
This language Lx is undecidable. The reason is that a Turing machine with this restriction (no writing
x’s) can simulate a Turing machine without the restriction.
Proof: Suppose that Lx were decidable. Let R be a Turing machine deciding Lx . We will now construct
a Turing machine S that decides ATM .
S is constructed as follows:
• Input is ⟨M, w⟩, where M is the code for a Turing Machine and w is a string.
Appendix - more examples of
undecidable languages
• Input is ⟨M, w⟩, where M is the code for a Turing Machine and w is a string.
• Construct code for a new Turing machine Mw as follows:
– Input is a string x.
– Ignore the value of x.
– Simulate M on w.
• Feed ⟨Mw⟩ to R. If R accepts, then accept. If R rejects, then reject.
If M accepts w, the language of Mw contains all strings and, thus, in particular the empty string. If M does not accept w, the language of Mw is the empty set and, thus, does not contain the empty string. So R(⟨Mw⟩) accepts exactly when M accepts w. Thus, S decides ATM.
But we know that ATM is undecidable. So S can not exist. Therefore we have a contradiction. So Halt_Empty_TM must have been undecidable.
– every use of the character 1 is replaced by a new character 1′ which M does not use.
– when M would accept, M′ first prints 111 and then accepts.
• Similarly, create a string w′ in which every character 1 has been replaced by 1′.
• Create a second new Turing machine M′w which simulates M′ on the hard-coded string w′.
Chapter 28
This lecture covers dovetailing, a method for running a gradually expanding set of simulations in parallel.
We use it to demonstrate that non-deterministic TMs can be simulated by deterministic TMs.
28.1 Dovetailing
28.1.1 Interleaving
We have seen that you can run two Turing machines in parallel, to compute some function of their outputs,
e.g. recognize the union of their languages.
Suppose that we had Turing machines M1, . . . , Mk recognizing languages L1, . . . , Lk, respectively. Then, we can build a TM M which recognizes ∪_{i=1}^{k} Li. To do this, we assume that M has k simulation tapes, plus input and working tapes. The TM M cycles through the k simulations in turn, advancing each one by a single step. If any of the simulations halts and accepts, then M halts and accepts.
We could use this same method to run a single TM M on a set of k input strings w1 , . . . , wk ; that is,
accept the input list if M accepts any of the strings w1 , . . . , wk .
The limitation of this approach is that the number of tapes is finite and fixed for any particular Turing
machine.
The language L̂ is recognizable, but we have to be careful how we construct its recognizer M̂. Because M is not necessarily a decider, we can not process the input strings one after another, because one of them might get stuck in an infinite loop. Instead, we need to run all k simulations in parallel. But k is different for different inputs to M̂, so we can not just give k tapes to M̂.
Instead, we can store all the simulations on a single tape T. Divide up T into k sections, one for each simulation. If a simulation runs out of space in its section, push everything over to the right to make more room.
28.1.3 Dovetailing
algBuggyRecog(⟨M⟩)
    x ← ε
    while True do
        simulate M on x (using UTM)
        if M accepts then
            halt and accept
        x ← next string in lexicographic order
Unfortunately, if M never halts on one of the strings, this process will get stuck before it even reaches a string that M does accept. So we need to run our simulations in parallel. Since we can not start up an infinite number of simulations all at once, we use the following idea.
Dovetailing is the idea of running k simulations in parallel, while dynamically increasing k. So, for our example, suppose that we store all our simulations on tape T, and x lives on some other tape. Then our code might look like:
algDovetailingRecog(⟨M⟩)
    x ← ε
    while True do
        On T, start up the simulation of M on x
        Advance all the simulations on T by one step.
        if any simulation on T accepted then
            halt and accept
        x ← Next(x)
Alternatively, one can describe the dovetailing loop as follows: for i = 0, 1, . . .
(1) Let x0, . . . , xi be the first i + 1 strings of Σ∗ in lexicographic order.
(2) Make sure a simulation of M is running on each of x0, . . . , xi.
(3) Run the set of simulations for i steps.
(4) If any simulation has accepted, halt and accept.
(5) Otherwise, increment i and repeat the loop.
Each iteration of the loop does only a finite amount of work: i steps for each of i simulations. However,
because i increases without bound, the loop will eventually consider every string in Σ∗ and will run each
simulation for more and more steps. So if there is some string w which is accepted by M , our procedure will
eventually simulate M on w for enough steps to see it halt.
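A Python sketch of the dovetailing loop, mirroring algDovetailingRecog above: machines are modeled as generators that yield while running and yield True if they ever accept (these conventions are made up for illustration):

from itertools import count

def dovetail(make_run, strings):
    sims, strings = [], iter(strings)
    while True:
        w = next(strings, None)
        if w is not None:
            sims.append((w, make_run(w)))    # start one more simulation
        for (word, run) in sims:
            if next(run, False) is True:     # advance this run by one step
                return word

def make_run(w):
    # a toy machine: accepts exactly "aaa", loops forever on anything else
    if w == "aaa":
        yield True
    while True:
        yield

print(dovetail(make_run, ("a" * n for n in count(0))))   # aaa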
An NTM M accepts an input w if there is some possible run of M on w which reaches the accept state.
Otherwise, M does not accept w.
This works just like non-determinism for the simpler automata. That is, you can either imagine searching
through all possible runs, or you can imagine that the NTM magically makes the right guess for what option
to take in each transition.
For regular languages, the deterministic and non-deterministic machines do the same thing. For context-free languages, they do different things. We claim that non-deterministic Turing machines can recognize the same languages as ordinary Turing machines.
28.2.2 Halting and deciders
Like a regular TM, an NTM can cleanly reject an input string w or it can implicitly reject it by never halting.
An NTM halts if all possible runs eventually halt. Once it halts, the NTM accepts the input if some run
ended in the accept state, and rejects the input if all runs ended in the reject state.
An NTM is a decider if it halts on all possible inputs on all branches. Formally, if you think about all possible configurations that the NTM might generate for a specific input as a tree (i.e., a branch represents a non-deterministic choice), then an NTM is a decider if and only if this tree is finite, for all inputs.
The simulation we used above has the property that it halts exactly when the NTM would have halted. So we have also shown that a deterministic TM can simulate an NTM decider while itself remaining a decider.
28.2.3 Enumerators
A language can be enumerated if there exists a TM with an output tape (in addition to its working tapes) such that the TM prints out on this tape all the words in the language (assume that between two printed words we place a special separator character). Note that the output tape is a write-only tape.
Definition 28.2.3 (Lexicographical ordering.) For two strings s1 and s2, we have s1 < s2 in lexicographical ordering if |s1| < |s2|, or |s1| = |s2| and s1 appears before s2 in the dictionary ordering.
(That is, lexicographical ordering is the dictionary ordering for strings of the same length, with shorter strings appearing before longer strings.)
Proof: Let T be the TM recognizer for L; we need to build an enumerator for this language. Using dovetailing, we “run” T on all the strings in Σ∗ = {w1, w2, . . .} (say, in lexicographical ordering). Whenever one of these executions stops and accepts a string wi, we print this string wi to the enumerator output tape. Clearly, all the words of L would sooner or later be printed by this enumerator. As such, this language can be enumerated.
As for the other direction, assume that we are given an enumerator Tenum for L. Given a word x ∈ Σ∗, we can recognize if it is in L by running the enumerator and reading the strings it prints out one by one. If one of these strings is x, then we stop and accept. Otherwise, this TM would continue running. Clearly, if x ∈ L then sooner or later the enumerator would output x and our TM would stop and accept it.
Chapter 29
“It is a damn poor mind indeed which can’t think of at least two ways to spell any word.”
– Andrew Jackson
This lecture covers linear bounded automata (LBAs), an interesting compromise in power between Turing machines and the simpler automata (DFAs, NFAs, PDAs). We will use LBAs to show that two CFG grammar problems (equality and generating all strings) are undecidable.
In some of the descriptions we use PDAs. However, the last part of these notes shows how these PDAs can be avoided, resulting in an arguably simpler and slightly more elegant argument.
Consider an LBA T with q states, whose tape alphabet has k characters, running on an input of length n. Then T can be in at most
α(n) = k^n · n · q (29.1)
configurations.
Here is the shocker: if an LBA runs for more than α(n) steps, then it must be looping. As such, given an LBA T and a word w (with n characters), we can simulate it for α(n) steps. If it does not terminate by then, then it must be looping, and as such it will never stop on this input. Thus, an LBA that stops on input w must stop within α(|w|) steps.
This implies that
ALBA = { ⟨T, w⟩ | T is an LBA and T accepts w }
is decidable. Similarly, the language
HaltLBA = { ⟨T, w⟩ | T is an LBA and T stops on w }
is decidable.
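A Python sketch of the ALBA decider: simulate for α(n) steps and declare a loop beyond that. The simulation interface (a generator yielding None per step and finally True/False when the LBA halts) is an assumption made for illustration:

from itertools import repeat

def decide_A_LBA(run, q, k, w):
    n = max(len(w), 1)
    alpha = (k ** n) * n * q          # the configuration bound (29.1)
    for _, verdict in zip(range(alpha + 1), run):
        if verdict is not None:
            return verdict            # the LBA halted: accept/reject
    return False                      # ran past alpha(n) steps: it is looping

# a toy "LBA" that loops forever on every input:
print(decide_A_LBA(repeat(None), q=2, k=3, w="ab"))   # False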
We formally prove one of the above claims. The other one follows by a similar argumentation.
The idea. Assume we are given a general TM T (which we emphasize is not an LBA) and a word w. We
would like to decide if T accepts w (which is of course undecidable).
If T does accept w, we can demonstrate that it does by providing a trace of the execution of T on w. This
trace (defined formally below) is just a string. We can easily build a TM that verifies that a supposed trace
is legal (e.g. uses the correct transitions for T), and indeed shows that T accepts w.
Crucially, this trace verification can be done by a TM VrfT,w that just uses the space provided by the
string itself. That is, the verifier VrfT,w is an LBA. The language of VrfT,w is empty if T does not accept w
1 For the very careful reader: Sipser handles this case slightly differently. His encoding of T would specify that the machine is supposed to be an LBA. Attempts to move off the input region would cause the read head to stay put.
(because then there is no accepting trace). If T does accept w, then the language of VrfT,w contains a single word: the trace showing that T accepts w. So, if we have a decider that decides whether ⟨VrfT,w⟩ ∈ ELBA, then we can decide if T accepts w.
Observe that we assumed nothing about T or w. The only required property is that VrfT,w is an LBA.
The pair of sharp signs marks the end of the trace, so the algorithm knows when the trace ends.
Such a trace is an accepting trace if the configuration Ck is an accepting configuration (i.e., the accept
state qacc of T is the state of T encoded in Ck ).2
Initial checks. So, we are given ⟨T⟩ and w, and we want to build a verifier VrfT,w that checks, given a trace t as input, that this trace is indeed an accepting trace for T on w. As a first step, VrfT,w verifies that C1 (the first configuration written in t) is indeed q0 w. Next, it needs to verify that Ck (the last configuration in t) is an accepting configuration, which is also easy (i.e., just verify that qacc is the state written in it). Finally, the verifier needs to make sure that the ith configuration implies the (i + 1)th configuration in the trace t, for all i.
Verifying two consecutive configurations. So, consider the ith configuration in t, that is
Ci = αaqbβ,
where α and β are two strings. Naturally, Ci+1 is the next configuration in the input trace t. Since VrfT,w
has the code of T inside it (as a built-in constant), it knows what δT (q, b), the transition function of T, is.
Say it knows that δT(q, b) = (q′, c, R). If our input is a valid trace, then Ci+1 is supposed to be
Ci+1 = αacq′β.
To verify that Ci and Ci+1 do match up in this way, the TM VrfT,w goes back and forth on the tape, comparing the parts of Ci and Ci+1 that must be identical. We can not simply erase these symbols as we check them: we will need to keep Ci+1 around so we can check it against Ci+2. So instead we translate each checked letter a into a special marked version â of this character.3
After we have marked all the identical characters, we've verified this pair of configurations except for the middle two to three letters (depending on whether this was a left or right move). So the tape at this stage looks like
. . . # αaqbβ # α̂acq′β̂ # . . .
(the first block is Ci, the second is Ci+1). We have verified that the prefix of Ci (i.e., α) is equal to α̂, and the suffix of Ci (i.e., β) is equal to the suffix of Ci+1 (i.e., β̂). The only thing that remains to be verified is the middle part, which can easily be done since we know T's transition function.
After that, the verifier removes the hats from the characters in Ci+1 and moves right to match Ci+1
against Ci+2 . If it gets to the end of the trace and all these checks were successful, the verifier VrfT,w accepts
the input trace t.
2 It should also be the case that no previous configuration in this trace is either accepting or rejecting. This is implied by the fact that TMs don't have transitions out of the accept and reject states.
3 We have omitted some details about how to handle moves near the right and left ends of the non-blank tape area. These details are tedious but easy to fill in, and the reader should verify that they know how to fill in the missing details.
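The heart of the verifier is the check that one configuration yields the next. Here is a minimal Python sketch of that check, assuming configurations are given as lists of symbols with exactly one state symbol embedded in them, and that delta maps (state, symbol) to (new_state, written_symbol, direction). Like the verifier above, it ignores the corner cases at the two ends of the tape.

def step_config(config, delta, states):
    # Locate the state symbol; the head scans the symbol to its right.
    i = next(j for j, s in enumerate(config) if s in states)
    q, b = config[i], config[i + 1]
    q2, c, move = delta[(q, b)]
    nxt = list(config)
    if move == 'R':      # ...a q b...  becomes  ...a c q' ...
        nxt[i], nxt[i + 1] = c, q2
    else:                # ...a q b...  becomes  ...q' a c...
        nxt[i - 1], nxt[i], nxt[i + 1] = q2, config[i - 1], c
    return nxt

def consecutive_ok(c_i, c_next, delta, states):
    # The check Vrf performs on each pair C_i, C_{i+1} of the trace.
    return step_config(c_i, delta, states) == c_next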
Lemma 29.1.2 Given a (general) TM T and a string w, one can build a verifier VrfT,w such that, given an accepting trace t, the verifier accepts t, and it accepts no other string. Note that VrfT,w is a decider that always stops. Moreover, VrfT,w is an LBA.
Theorem 29.1.3 The language ELBA = { ⟨T⟩ | T is an LBA and L(T) = ∅ } is undecidable.
Proof: The proof is by reduction from ATM. Assume for the sake of contradiction that ELBA is decidable, and let ELBA-Decider be the TM that decides it. We will build a decider decider5-ATM for ATM.
decider5-ATM(⟨T, w⟩)
    Check that ⟨T⟩ is syntactically correct TM code.
    Compute ⟨VrfT,w⟩ from ⟨T, w⟩.
    res ← ELBA-Decider(⟨VrfT,w⟩).
    if res == accept then
        reject
    else
        accept
Since we can compute ⟨VrfT,w⟩ from ⟨T, w⟩, it follows that this algorithm is a decider. Furthermore, given ⟨T, w⟩ such that T accepts w, there exists an accepting trace t for T accepting w, and as such L(VrfT,w) ≠ ∅. As such, ELBA-Decider(⟨VrfT,w⟩) rejects its input, which implies that decider5-ATM accepts ⟨T, w⟩.
Similarly, if T does not accept w, then L(VrfT,w) = ∅. As such, ELBA-Decider(⟨VrfT,w⟩) accepts its input, which implies that decider5-ATM rejects ⟨T, w⟩.
Thus decider5 -ATM is indeed a decider for ATM , but this is impossible, and we thus conclude that our
assumption, that ELBA is decidable, was false, implying the claim.
We provide a direct proof of Theorem 29.1.3 because it is shorter and simpler. The benefit of the previous proof is that it introduces the idea of verifying accepting traces, which we will revisit shortly.
Alternative direct proof of Theorem 29.1.3: We are given ⟨T, w⟩, where T is a TM and w is an input for it. We will assume that the tape alphabet of T is Γ, its input alphabet is Σ, and that z and $ are not in Γ. We build a new machine Zw from T and w that gets as input a word of the form z^k$. The machine Zw first writes w at the beginning of the tape, moves the head back to the beginning of the tape, and then just runs T on this input, with the modification that the new machine treats z as a space. However, if the new machine ever reaches the $ character on the input (in any state), it immediately stops and rejects.
Clearly, Zw is an LBA (by definition). Furthermore, if T accepts w, then Zw accepts the word z^k$ for all sufficiently large k (it is enough that z^k$ is long enough to contain the whole computation of T on w). Similarly, if z^j$ is accepted by Zw, then T accepts w. We thus conclude that L(Zw) is not empty if and only if w ∈ L(T).
Going back to the proof: given ⟨T⟩ and w, the construction of ⟨Zw⟩ is easy. As such, assume for the sake of contradiction that ELBA is decidable, and that we are given a decider for ELBA. We can feed it ⟨Zw⟩; if this decider accepts (i.e., L(Zw) = ∅), then we know that T does not accept w. Similarly, if ⟨Zw⟩ is rejected by the decider, then L(Zw) ≠ ∅, which implies that T accepts w. Namely, we just constructed a decider for ATM, which is undecidable. A contradiction.
29.2 On undecidable problems for context free grammars
We would like to prove that some languages involving context-free grammars are undecidable. To this end, to reduce ATM to a question involving CFGs, we somehow need to map properties of TMs to CFGs.
Lemma 29.2.1 Given a TM T, the language { x#y^R | x and y are configurations of T, and x 7→ y } is a CFG (i.e., it is generated by a context-free grammar).
Proof: Let Γ be the tape alphabet of T, and Q be the set of states of T. Let δ be the transition function of
T. We have the following rewriting rules depending on δ:
∀α, β ∈ Γ∗, ∀b, c, d ∈ Γ, ∀q ∈ Q:
if δ(q, c) = (q′, d, R) then αqcβ 7→ αdq′β, or equivalently αqcβ 7→^R β^R q′dα^R;
if δ(q, c) = (q′, d, L) then αbqcβ 7→ αq′bdβ, or equivalently αbqcβ 7→^R β^R dbq′α^R.
Intuitively, x 7→ y says that the string x can be very locally edited to generate y. In the above, we need to copy the α and β portions, and then do the rewriting, which involves at most 3 letters. As such, the grammar
S1 → C
C → xCx   (∀x ∈ Γ)
C → T
T → qcZq′d   (∀c, d ∈ Γ, ∀q ∈ Q such that δ(q, c) = (q′, d, R))
T → bqcZdbq′   (∀b, c, d ∈ Γ, ∀q ∈ Q such that δ(q, c) = (q′, d, L))
Z → xZx   (∀x ∈ Γ)
Z → #
generates exactly the strings x#y^R with x 7→ y.
Theorem 29.2.3 The language { ⟨G, G′⟩ | L(G) ∩ L(G′) ≠ ∅ } is undecidable. Namely, given two context-free grammars, there is no decider that can decide whether there is a word that they both generate.
Proof: If this were decidable, then given ⟨T, w⟩, we could decide whether the language LT,w,trace of Lemma 29.2.2 is empty or not, since it is the intersection of the languages of two context-free grammars that can be computed from ⟨T, w⟩. But this language is not empty if and only if T accepts w. Namely, we would get a decider for ATM, which is a contradiction.
The idea
The idea is, given T and w, to build a verifier for accepting traces of T on w. Here the verifier is going to be a CFG. The problem, if you think about it, is that there is no way a CFG can verify a trace, as the checks that need to be performed are too complicated to be performed by a CFG.
Luckily, we can generate a CFG VrfGT,w that generates all the strings that are not accepting traces for T on w. Indeed, we will build several CFGs, each one “checking” one condition, and their union would be the required grammar. As such, L(VrfGT,w) is the set of all strings if and only if T does not have an accepting trace for w.
The alphabet of our grammar is going to be
Σ = Γ ∪ Q ∪ {#} ,
where Γ is the tape alphabet of T, Q is the set of states of T, and # is the special separator character.
(Or, almost. There is a small issue that needs to be fixed, but we will get to that in a second.)
or, if there are an even number of configurations in the trace, the trace would be written as
#C1#C2^R#C3#C4^R# · · · #C_{k−1}#C_k^R#
(that is, every second configuration in the trace is written reversed).
Our basic plan is still valid. Indeed, there will be an accepting trace in this modified format if and only
if T accepts w.
Verifying two consecutive configurations. Let us build a pushdown automaton that reads two configurations X#Y^R# and decides whether the configuration X does not imply Y. To make things easier, let us first build a PDA that checks that the configuration X does imply the configuration Y.
The PDA P would scan X and push it, as it goes, onto the stack. As it reads X, and in particular the state written in X, it can push onto the stack what the implied configuration should look like (there is a small issue with having to write the state on the stack; this can be done by a few pushes and pops, and is tedious but manageable). Thus, by the time we are done reading X (when P encounters #), the stack already contains the implied (reversed) configuration of X; let us denote it by Z^R. Now, P just reads the rest of the input (Y^R) and matches it against the stack contents. It accepts if and only if the configuration X implies the configuration Y.
Interestingly, the PDA P is deterministic, and as such we can complement it (this is not true for a general PDA, because of the nondeterminism). Alternatively, just observe that P has a reject state that is reached after the comparison fails; in the complement PDA, we just turn this “hell” state into an accept state. Thus, we have a PDA P that accepts X#Y^R# iff the configuration X does not imply the configuration Y.
Now, it is easy to modify this PDA so that it accepts the language
L1 = { u#X#Y^R#v | u, v ∈ Σ∗, and configuration X does not imply configuration Y },
which clearly contains only invalid traces. Similarly, we can build a PDA that accepts the language
L2 = { u#X^R#Y#v | u, v ∈ Σ∗, and configuration X does not imply configuration Y }.
Putting these two PDAs together yields a PDA that accepts all strings containing two consecutive configurations such that the first one does not imply the second one.
Now, since we have a PDA for this language, we clearly can build a CFG GM that generates all such strings.
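As a sanity check of this stack discipline, here is a small Python sketch of what P computes on a single block X#Y^R#. It reuses the hypothetical step_config and the configuration conventions from the earlier sketch, and works at the level of whole strings rather than simulating the PDA transition by transition.

def x_does_not_imply_y(block, delta, states):
    # block has the form X#Yrev#; split off the two configurations.
    parts = block.split('#')
    x, yrev = parts[0], parts[1]
    # Reading X, the PDA pushes the successor configuration Z of X,
    # so the stack (read top first) holds Z reversed.
    stack_top_first = step_config(list(x), delta, states)[::-1]
    # Matching the input Y^R against the pops succeeds exactly when Y = Z.
    return list(yrev) != stack_top_first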
Strings with invalid initial configurations. Consider all traces having an invalid initial configuration. Clearly, they are generated by strings of the form
(Σ \ {#, q0})∗ # Σ∗.
Let GI be a grammar generating this (regular) language.
Strings with invalid final configurations. Consider all traces having an invalid final configuration. Clearly, they are generated by strings of the form
Σ∗ # (Σ \ {#, qacc})∗.
Let GF be a grammar generating this language.
Putting things together. Clearly, all invalid (i.e., non-accepting) traces of T on w are generated by the grammars GI, GM, GF. Thus, consider the context-free grammar GM,w formed by the union of GI, GM and GF. When T does not accept w, there is no accepting trace for T on w, so L(GM,w) (the strings that are not accepting traces) is Σ∗. When T accepts w, there is an accepting trace for T on w, so L(GM,w) (the strings that are not accepting traces) is not equal to Σ∗.
Theorem 29.2.4 The language ALLCFG = { ⟨G⟩ | G is a CFG and L(G) = Σ∗ } is undecidable.
Proof: Let us assume, for the sake of contradiction, that the language ALLCFG is decidable, and let deciderAllCFG be its decider. We will now reduce ATM to it, by building a decider for ATM as follows.
decider6-ATM(⟨M, w⟩)
    Check that ⟨M⟩ is syntactically correct TM code.
    Compute ⟨GM,w⟩ from ⟨M, w⟩, as described above.
    res ← deciderAllCFG(⟨GM,w⟩).
    if res == accept then
        reject
    else
        accept
Clearly, this is a decider. Indeed, if M accepts w, then there exists an accepting trace t showing it. As such, L(GM,w) = Σ∗ \ {t} ≠ Σ∗. Thus, deciderAllCFG rejects ⟨GM,w⟩, and thus decider6-ATM accepts ⟨M, w⟩.
Similarly, if M does not accept w, then L(GM,w) = Σ∗, and as such deciderAllCFG accepts ⟨GM,w⟩, implying that decider6-ATM rejects ⟨M, w⟩.
Thus, decider6-ATM is a decider for ATM, which is impossible. We conclude that our assumption, that ALLCFG is decidable, is false, implying the claim.
Theorem 29.2.5 The language EQCFG = { ⟨G, G′⟩ | G and G′ are CFGs and L(G) = L(G′) } is undecidable. The proof is almost identical to the reduction of ETM to EQTM that we saw in lecture 21.
Proof: By contradiction. Suppose that EQCFG is decidable, and let deciderEqCFG be a TM that decides it.
Given an alphabet Σ, it is not hard to construct a grammar FΣ that generates all strings in Σ∗. E.g., if Σ = {c1, c2, . . . , ck}, then we could use the rules:
S → XS | ε
X → c1 | c2 | . . . | ck
Now, given a CFG G, we can decide whether L(G) = Σ∗ by feeding ⟨G, FΣ⟩ to deciderEqCFG; call the resulting TM deciderAllCFG. It is easy to verify that deciderAllCFG is indeed a decider. However, we have already shown that ALLCFG is undecidable, so this decider deciderAllCFG can not exist. As such, our assumption that EQCFG is decidable is false. As such, EQCFG is undecidable.
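The reduction is short enough to write out as code. A minimal sketch, where decider_eq_cfg and make_full_grammar are hypothetical stand-ins for the assumed EQCFG decider and the construction of FΣ:

def decider_all_cfg(G, sigma, decider_eq_cfg, make_full_grammar):
    # F_sigma generates Sigma* via  S -> XS | eps,  X -> c1 | ... | ck.
    F_sigma = make_full_grammar(sigma)
    # L(G) = Sigma*  iff  L(G) = L(F_sigma).
    return decider_eq_cfg(G, F_sigma)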
29.3 Avoiding PDAs
The proofs we used above are simpler when one uses PDAs. However, the same argument can be carried out without PDAs by slightly changing the rules. The basic idea is to interleave two configurations together. This is best imagined by thinking of each character as a tile of two characters. Thus, the two rows
b d x a b q c e b d x
b d x a q′ b d e b d x
describe the two configurations x = bdxabqcebdx and y = bdxaq′bdebdx. If we are given a TM T with tape alphabet Γ and set of states Q, then the alphabet of the tiles is
Σ̂ = { [x/y] | x, y ∈ Γ ∪ Q },
where [x/y] denotes the tile with top character x and bottom character y.
Note that in the above example x yields y, which implies that, except for a region of three columns, the two strings are identical:
b d x a (b q c) e b d x
b d x a (q′ b d) e b d x
Thus, a single step of a TM is no more than a local rewrite of the configuration string.
Given two configurations x, y of T, we will refer to the string over Σ̂ resulting from writing them together interleaved, as described above, as their pairing, denoted by [x/y]. Note that if one of the configurations is shorter than the other, we pad the shorter configuration with blank characters (i.e., ␣) so that they are of the same length.
Lemma 29.3.1 Given a TM T, one can construct an NFA D, such that D accepts a pairing [x/y] if and only if x and y are two valid configurations of T, and x 7→ y.
Proof: First, making sure that x and y are valid configurations when reading the string s = [x/y] is easy using a DFA (verify that x contains exactly one state character, and that the rest of the characters of x are from the tape alphabet of T; one also has to do the same check for y). Let us refer to the DFAs verifying the x and y parts of s as Dx and Dy, respectively. Note that Dx (resp. Dy) reads the string s but ignores the bottom (resp. top) part of each character of s.
As such, we just need to verify that x yields y. To this end, observe that x yields y if and only if they are identical except for three positions where the transition happens. We build an NFA that verifies that the top and bottom parts are equal, until it guesses that it has reached the 3-tile region that gets rewritten. It then guesses which tiles need to be written there (note that the transition function of T specifies all valid such tiles), verifies that this is indeed what appears in the next three characters of the input, and then compares the rest of the input. Let this NFA be D=.
Now, we construct an automaton that accepts the language L(Dx) ∩ L(Dy) ∩ L(D=) (via the product construction, converting D= to a DFA first if desired). Clearly, this automaton accepts the required language.
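The intersection step here is the standard product construction. Here is a minimal Python sketch for intersecting two DFAs; the dict representation of a DFA is an assumption of the sketch, and intersecting three automata is just two applications of it (after determinizing the NFA D=).

from itertools import product

def intersect_dfas(d1, d2):
    # Run both machines in lockstep; accept iff both accept.
    states = set(product(d1['states'], d2['states']))
    delta = {((p, q), a): (d1['delta'][(p, a)], d2['delta'][(q, a)])
             for (p, q) in states for a in d1['alphabet']}
    return {'states': states,
            'alphabet': d1['alphabet'],
            'delta': delta,
            'start': (d1['start'], d2['start']),
            'accept': {(p, q) for (p, q) in states
                       if p in d1['accept'] and q in d2['accept']}}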
Similarly, it is easy to build a DFA that verifies that the pairing [x^R/y^R] is valid and x yields y (according to T). Now, consider an extended execution trace, written as a sequence of pairings separated by dollar tiles:
[C1/C2] [$/$] [C2^R/C3^R] [$/$] [C3/C4] [$/$] · · ·
We would like to verify that this encodes a valid accepting trace for T on the input string w. This requires verifying that the following conditions are met.
(i) The trace has the right format of pairings separated by dollar tiles. This can easily be done by a DFA. Let L1 be the language that this DFA accepts.
(ii) The first configuration satisfies C1 = q0w. This can be done with a DFA. Let L2 be the language that this DFA accepts.
(iii) The last configuration Ck is an accepting configuration. Easily done by a DFA. Let L3 be the language that this DFA accepts.
(iv) The pairings [C2i−1/C2i] and [C2i^R/C2i+1^R] are valid pairings, such that C2i−1 7→ C2i and C2i 7→ C2i+1, for all i (again, according to T). This can be done by a DFA, by Lemma 29.3.1. Let L4 be the language that this DFA accepts.
(v) Finally, we need to verify that the configurations are copied correctly from the bottom of one pairing to the top of the next pairing. Let L5 be the language of all strings whose copying is valid.
Clearly, the set of all valid traces of T on w is the set L = L1 ∩ L2 ∩ L3 ∩ L4 ∩ L5.
We are interested in building a CFG that recognizes the complement language L̄, which is the language
L̄ = L̄1 ∪ L̄2 ∪ L̄3 ∪ L̄4 ∪ L̄5.
The only nontrivial part is L̄5, the strings in which the copying fails. Such a string contains two consecutive pairings of the form
· · · [$/$] [x^R/y^R] [$/$] [y′/z] [$/$] · · ·
where y′ ≠ y. But if we ignore the rest of the string, and the top and bottom portions of these two pairings, this is just recognizing the language “not a palindrome”, which we know is context-free. Indeed, the grammar of not-palindrome (around a center $) over an alphabet Γ is
S2 → xS2x   (∀x ∈ Γ)
S2 → xCy   (∀x, y ∈ Γ and x ≠ y)
C → Cx | xC   (∀x ∈ Γ)
C → $.
We now extend this grammar to the extended alphabet Σ̂, as follows:
S3 → [u/x] S3 [x/v]   (∀u, v, x ∈ ΓT)
S3 → [u/x] C [y/v]   (∀x, y, u, v ∈ ΓT and x ≠ y)
C → [x/y] C | C [x/y]   (∀x, y ∈ ΓT)
C → [$/$].
(The matching rules require the bottom character of a tile on the left side to reappear as the top character of the corresponding tile on the right side.) The language
Σ̂∗ [$/$] L(S3) [$/$] Σ̂∗
is exactly L̄5. We conclude that L̄ is a context-free language (being the union of 5 context-free/regular languages). Furthermore, L̄ = Σ̂∗ if and only if T does not accept w. We conclude the following.
Theorem 29.3.2 The language ALLCFG = { ⟨G⟩ | G is a CFG and L(G) = Σ∗ } is undecidable.
Chapter 30
"Then you must begin a reading program immediately so that you man understand the crises of our age," Ignatius
said solemnly. "Begin with the late Romans, including Boethius, of course. Then you should dip rather extensively
into early Medieval. You may skip the Renaissance and the Enlightenment. That is mostly dangerous propaganda.
Now, that I think about of it, you had better skip the Romantics and the Victorians, too. For the contemporary
period, you should study some selected comic books."
"You’re fantastic."
"I recommend Batman especially, for he tends to transcend the abysmal society in which he’s found himself. His
morality is rather rigid, also. I rather respect Batman."
– A confederacy of Dunces, John Kennedy Toole
30.1 Introduction
The question governing this course is the development of efficient algorithms. Hopefully, the notion of an algorithm is by now a well-understood concept. But what is an efficient algorithm? A natural answer (but not the only one!) is an algorithm that runs quickly.
What do we mean by quickly? Well, we would like our algorithm to:
1. Scale with input size. That is, it should be able to handle large and hopefully huge inputs.
2. Low-level implementation details should not matter, since they correspond to small improvements in performance. Since faster CPUs keep appearing, such improvements would (usually) be taken care of by hardware.
3. What we will really care about is asymptotic running time. Explicitly, polynomial time.
In our discussion, we will consider the input size to be n, and we would like to bound the overall running
time by a function of n which is asymptotically as small as possible. An algorithm with better asymptotic
running time would be considered to be better.
Example 30.1.1 It is illuminating to consider a concrete example. So assume we have an algorithm for a problem that needs to perform c2^n operations to handle an input of size n, where c is a small constant (say 10). Let us assume that we have a CPU that can do 10^9 operations a second. (A somewhat conservative assumption, as currently [Jan 2006]¹ the Blue Gene supercomputer can do about 3 · 10^14 floating-point operations a second. Since this supercomputer has about 131,072 CPUs, it is not something you would have on your desktop any time soon.) Since 2^10 ≈ 10^3, our (cheap) computer can solve, in roughly a second, a problem of size n = 27.
¹ But the recently announced supercomputer that would be completed in 2011 in Urbana is naturally way faster. It supposedly would do 10^15 operations a second (i.e., a petaflop); Blue Gene probably can not sustain its theoretical speed stated above, which is only slightly slower.
But what if we increase the problem size to n = 54? This would take our computer about six years to solve (indeed, 10 · 2^54 ≈ 1.8 · 10^17 operations, which is about 1.8 · 10^8 seconds). (In fact, it is better to just wait for faster computers to show up, and then try to solve the problem. Although there are good reasons to believe that the exponential growth in computer performance we saw in the last 40 years is about to end. Thus, unless a substantial breakthrough in computing happens, it might be that solving problems of size, say, n = 100 for this problem would forever be outside our reach.)
The situation dramatically changes if we consider an algorithm with running time 10n². Then, in one second our computer can handle an input of size n = 10^4. A problem of size n = 10^8 can be solved in 10n²/10^9 = 10^{17−9} = 10^8 seconds, which is about 3 years of computing (but Blue Gene might be able to solve it in less than 20 minutes!).
Thus, algorithms that have asymptotically polynomial running time (i.e., running time bounded by O(n^c), where c is a constant) are able to solve large instances of the input, and can solve the problem even if the problem size increases dramatically.
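A quick Python computation reproduces the estimates above; the operation counts 10 · 2^n and 10 · n² and the rate of 10^9 operations per second are taken directly from the example.

OPS_PER_SEC = 10**9
YEAR = 365 * 24 * 3600

for n in (27, 54, 100):
    secs = 10 * 2**n / OPS_PER_SEC
    print(f"exponential, n={n}: {secs:.3g} s (= {secs / YEAR:.3g} years)")

for n in (10**4, 10**8):
    secs = 10 * n**2 / OPS_PER_SEC
    print(f"quadratic, n={n}: {secs:.3g} s (= {secs / YEAR:.3g} years)")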
Can we solve all problems in polynomial time? The answer to this question is unfortunately no.
There are several synthetic examples of this, but in fact it is believed that a large class of important problems
can not be solved in polynomial time.
Problem: Satisfiability
Instance: A boolean formula F with m variables
Question: Is there an assignment of values to variables, such that F evaluates to true?
The common belief is that SAT can NOT be solved in polynomial time in the size of the formula.
SAT has two interesting properties.
1. Given a supposed positive solution, with a detailed assignment (i.e., a proof): x1 ← 0, x2 ← 1, ..., xm ← 1, one can verify in polynomial time whether this assignment really satisfies F. This is done by evaluating F on the given input (see the sketch after this list).
Intuitively, this is the difference in hardness between coming up with a proof (hard), and checking that a proof is correct (easy).
2. It is a decision problem. For a specific input an algorithm that solves this problem has to output
either TRUE or FALSE.
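A minimal sketch of the verification step referred to in item 1, restricting to CNF formulas for concreteness (the F above is an arbitrary boolean formula): a clause is a list of nonzero integers, +i standing for x_i and −i for ¬x_i.

def satisfies(formula, assignment):
    # Time linear in the formula size: every literal is inspected once.
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in formula)

f = [[1, -2], [2, 3]]                                # (x1 or not x2) and (x2 or x3)
print(satisfies(f, {1: True, 2: True, 3: False}))    # True
print(satisfies(f, {1: False, 2: False, 3: False}))  # False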
A teaser. Can one find a satisfying assignment for the following circuit in polynomial time?
Definition 30.2.2 (NP: Nondeterministic Polynomial time) Let NP be the class of all decision prob-
lems that can be verified in polynomial time. Namely, for an input of size n, if the solution to the given
instance is true, one (i.e., an oracle) can provide you with a proof (of polynomial length!) that the answer
is indeed TRUE for this instance. Furthermore, you can verify this proof in polynomial time in the length of
the proof.
Definition 30.2.4 (co-NP) The class co-NP is the opposite of NP – if the answer is FALSE, then there
exists a short proof for this negative answer, and this proof can be verified in polynomial time.
See Figure 30.2 for the currently believed relationship between these classes (of course, as mentioned
above, P ⊆ NP and P ⊆ co-NP is easy to verify). Note, that it is quite possible that P = NP = co-NP,
although this would be extremely surprising.
Definition 30.2.5 A problem Π is NP-Hard, if being able to solve Π in polynomial time implies that
P = NP.
Intuitively, being NP-Hard implies that a problem is ridiculously hard. Conceptually, it would imply that proving and verifying are equally hard - which nobody who took 473g believes is true.
In particular, a problem which is NP-Hard is at least as hard as ALL the problems in NP; as such, it is safe to assume, based on overwhelming evidence, that it can not be solved in polynomial time.
Definition 30.2.8 A problem Π is NP-Complete (NPC in short) if it is both NP-Hard and in NP.
Clearly, Circuit Satisfiability is NP-Complete: it is NP-Hard, and it is in NP, since we can verify a positive solution in polynomial time in the size of the circuit.
By now, thousands of problems have been shown to be NP-Complete. It is extremely unlikely that any
of them can be solved in polynomial time.
Input: boolean formula F
⇓ n = size of F
transform F into a boolean circuit C
⇓
Find SAT assign’ for C using CSAT solver
⇓
Return TRUE if C is satisfied, otherwise false.
Figure 30.1: An algorithm for solving SAT using an algorithm that solves the CSAT problem
30.2.1 Reductions
Let A and B be two decision problems.
Given an input I for problem A, a reduction is a transformation of the input I into a new input I′, such that
A(I) is TRUE ⇔ B(I′) is TRUE.
Thus, one can solve A by first transforming an input I into an input I′ of B, and then solving B(I′).
This idea of using reductions is omnipresent, and is used in almost any program you write.
Let T : I → I′ be the input transformation that maps instances of A into instances of B. How fast is T? Well, for our nefarious purposes we need polynomial reductions; that is, reductions that take polynomial time.
Problem: Circuit Satisfiability
Instance: A circuit C with m inputs
Question: Is there an input for C such that C returns true on it?
For example, given an instance of SAT, we would like to generate an equivalent circuit C. We will
explicitly write down what the circuit computes in a formula form. To see how to do this, consider the
following example.
The resulting reduction is depicted in Figure 30.1.
Namely, given a solver for CSAT that runs in time TCSAT(n), we can solve the SAT problem in time
TSAT(n) ≤ O(n) + TCSAT(O(n)),
where n is the size of the boolean formula. Namely, if we have a polynomial-time algorithm that solves CSAT, then we can solve SAT in polynomial time.
Another way of looking at it: suppose we believe that solving SAT requires exponential time, namely TSAT(n) ≥ 2^n. The above reduction then implies that
O(n) + TCSAT(O(n)) ≥ TSAT(n) ≥ 2^n.
Namely, TCSAT(n) ≥ 2^{n/c} − O(n), where c is some positive constant. Namely, if we believe that we need exponential time to solve SAT, then we need exponential time to solve CSAT.
This implies that if CSAT ∈ P then SAT ∈ P.
We just proved that CSAT is as hard as SAT. Clearly, CSAT ∈ NP which implies the following theorem.
Chapter 31
This lecture covers Post’s Correspondence Problem (section 5.2 in Sipser). Undecidability of this problem
implies the undecidability of CFG ambiguity. We will also see how to simulate a TM with 2D tiling patterns
and, as a consequence, show how undecidability implies the existence of aperiodic tilings.
A match for S is an ordered list of one or more dominos from S, such that if you read the symbols on the tops, this makes the same string as reading the symbols on the bottoms. You can use the same domino more than once in a match, and you do not have to use all the elements of S. For example, here is a match for our example set:
[a/ab] [b/ca] [ca/a] [a/ab] [abc/c],
where [t/b] denotes the domino with top string t and bottom string b.
The tops and bottoms of the dominos both form the string abcaaabc.
Not all sets have a match. For example, T does not have a match because the tops are all longer than the bottoms:
T = { [abc/c], [ba/a], [bb/b] }.
The set R does not have a match because there are no d's or f's in the top strings:
R = { [ab/df], [cb/bd], [aa/fa] }.
It seems like it should be fairly easy to figure out whether a set of dominos has a match, but this problem
is actually undecidable.
Post’s Correspondence Problem (PCP) is the problem of deciding whether a set of dominos has a match
or not.
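Undecidability does not preclude searching: a brute-force program can look for short matches, it just can not be guaranteed to terminate when no match exists. A minimal sketch, where the cutoff max_len is an artificial bound added here, dominos are (top, bottom) pairs, and S below uses the dominos of the example match above:

from itertools import product

def pcp_match(dominos, max_len=6):
    # Try every sequence of at most max_len dominos; an unbounded
    # search would run forever on matchless instances, which is
    # consistent with PCP being undecidable.
    for n in range(1, max_len + 1):
        for seq in product(range(len(dominos)), repeat=n):
            top = "".join(dominos[i][0] for i in seq)
            bot = "".join(dominos[i][1] for i in seq)
            if top == bot:
                return [dominos[i] for i in seq]
    return None

S = [("a", "ab"), ("b", "ca"), ("ca", "a"), ("abc", "c")]
print(pcp_match(S))   # finds a match; tops and bottoms spell abcaaabc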
The modified Post’s Correspondence Problem (MPCP) is just like PCP except that we specify both
the set of tiles and also a special tile. Matches for MPCP have to start with the special tile.
We will show that PCP is undecidable in two steps. First, we will reduce ATM to MPCP. Then we will reduce MPCP to PCP.
Given ⟨M, w⟩, the match is forced to start with the special tile
[# / #q0w#],
where q0w is the initial configuration of M when executed on w. At this point, the bottom string is longer than the top string. As such, to get a match, the tiling has to copy the content of the bottom row to the top row. We will set up the tiles so that copying C0 to the top row also writes, on the bottom row, the configuration resulting from performing one step of the computation of M on C0. As such, at the end of this process, the tiling looks like
[#C0# / #C0#C1#].
The trick is that again the bottom part is longer, and again, to get a match, the only possibility is to copy C1 to the top part, writing out C2 on the bottom part in the process, resulting in
[#C0#C1# / #C0#C1#C2#].
Now, how are we going to do this copying/computation? The idea is to introduce, for every character x ∈ ΣM ∪ {#}, a copying tile
[x / x].
Here ΣM denotes the alphabet set used by M . Similarly, δM denotes the transition function of M .
Next, assume we have the transition δM(q, a) = (q′, b, R); then we will introduce the computation tile
[qa / bq′].
Similarly, for a left-move transition δM(q, c) = (q2, d, L), we introduce, for every y ∈ ΣM, the computation tile
[yqc / q2yd].
Here is what is going on in the ith stage: the bottom row has an additional configuration Ci written in it. To get a match, the tiling has to copy this configuration to the top row. But the copying tiles only copy regular characters; they can not copy states. Thus, when the copying process reaches the state character in Ci, it must use the right computation tile to copy this substring to the top row. Then it continues copying the rest of the configuration to the top. Naturally, as the copying goes on from the bottom row to the top row, new characters are added to the bottom row. The critical observation is that the computation tiles guarantee that the string added to the bottom row is exactly Ci+1, since we copied the characters verbatim in the areas of Ci unrelated to the computation of M on Ci, and the computation tile copied exactly the right string for the small region where the computation changes.
Thus, if the tiling generated before the ith stage was
top: #C0#C1# . . . #Ci−1#   bottom: #C0#C1# . . . #Ci#,
then after the ith stage, the string generated by the tiling (which is trying so hard to match the bottom and top rows) is
top: #C0#C1# . . . #Ci−1#Ci#   bottom: #C0#C1# . . . #Ci#Ci+1#.
Let this process continue until we reach the accepting configuration Cn = αqacc xβ, where α and β are some strings and x is some character in ΣM. At this point, the tiling we have so far looks like
top: #C0#C1# . . . #Cn−1#   bottom: #C0#C1# . . . #Cn−1#Cn#.
The question is how we make this into a match. The idea is that now, since Cn is an accepting configuration, we should treat the rest of the tiling as a cleanup stage, slowly reducing Cn to the empty string as we copy it up and down. How do we do that? Well, let us introduce a delete tile
[qacc x / qacc]
into our set of tiles T (also known as a pacman tile). Using the copying tiles to copy Cn = αqacc xβ to the top row, while erasing the character x in the process using the above “delete x” tile, we get
top: . . . #Cn−1#αqacc xβ#   bottom: . . . #Cn−1#αqacc xβ#αqacc β#.
We can now repeat this process, by introducing such delete tiles for every character of ΣM, and also introducing backward delete tiles like
[xqacc / qacc].
Thus, by using these delete tiles, we will get the tiling
top: . . . #αqacc xβ#αqacc β# . . . #qacc y#   bottom: . . . #αqacc xβ#αqacc β# . . . #qacc y#qacc#.
To finish off the tiling, we introduce a stopper tile
[qacc### / ##].
Adding it to the tiling results in the required match. In this tiling, both the top and the bottom rows spell out the string
#C0#C1# . . . #Cn−1#αqacc xβ# αqacc β# . . . #qacc y#qacc###,
where the first part (up to αqacc xβ#) is the accepting trace, and the rest is the winding down of the match.
It is now easy to argue that, with this set of tiles, if there is a match, it must have the above structure, which encodes an accepting trace.
Computing the tiles
Let us recap the above description. We are given a string ⟨M, w⟩, and we are generating a set of tiles T, as follows. The set containing the initial tile is
T1 = { [# / #q0w#] }. /* initial tile */
Let T = T1 ∪ T2 ∪ T3 ∪ T4 ∪ T5 ∪ T6 (the sets of copying, computation, delete and stopper tiles described above). Clearly, given ⟨M, w⟩, we can easily compute the set T. Let AlgTM2MPCP denote the algorithm performing this conversion.
We summarize our work so far.
Lemma 31.1.1 Given a string ⟨M, w⟩, the algorithm AlgTM2MPCP computes a set of tiles T that is an instance of MPCP. Furthermore, T contains a match if and only if M accepts w.
Theorem 31.1.2 The MPCP problem is undecidable.
Proof: The reduction is from ATM. Indeed, assume for the sake of contradiction that the MPCP problem is decidable, and that we are given a decider decider_MPCP for it. Next, we use it to build the following decider for ATM.
decider7-ATM(⟨M, w⟩)
    T ← AlgTM2MPCP(⟨M, w⟩)
    res ← decider_MPCP(T)
    return res
Clearly, this is a decider, and it accepts if and only if M accepts w. But this is a contradiction, since ATM is undecidable.
Thus, our assumption (that MPCP is decidable) is false, implying the claim.
Let X denote the set of tiles resulting from the starring modification: every tile [ti/bi] of T is replaced by [⋆ti / bi⋆], and we add the special tile [⋆t1 / ⋆b1⋆] for the start tile [t1/b1] of the MPCP instance. (Here ⋆u denotes u with a ⋆ added before every character, and u⋆ denotes u with a ⋆ added after every character.)
Note that in this new set of tiles, the only tile that can serve as the first tile in a match is
[⋆t1 / ⋆b1⋆],
since it is the only tile whose bottom string has ⋆ as its first character. Now, to take care of the balance of stars at the end of the string, we also add a closing tile, setting
Y = X ∪ { [⋆◇ / ◇] },
where ◇ is a new symbol. It is now easy to verify that if the original instance T of MPCP had a match, then the set Y also has a match.
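A minimal Python sketch of this MPCP-to-PCP transformation, following the description above; '*' plays the role of ⋆ and '$' stands in for the fresh symbol ◇:

def star_before(u):
    return "".join("*" + c for c in u)   # *u: a star before every character

def star_after(u):
    return "".join(c + "*" for c in u)   # u*: a star after every character

def mpcp_to_pcp(tiles):
    # tiles[0] is the special MPCP start tile [t1/b1].
    t1, b1 = tiles[0]
    new = [(star_before(t1), "*" + star_after(b1))]           # [*t1 / *b1*]
    new += [(star_before(t), star_after(b)) for t, b in tiles]
    new.append(("*$", "$"))                                    # closing tile
    return new

Only the modified start tile has matching first characters on top and bottom, so every PCP match of the new set begins with it, mirroring the MPCP requirement.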
The important thing about Y is that it does not need to specify a special initial tile (a minor difference from T, but a difference nevertheless). As such, Y is an instance of PCP. We conclude:
Lemma 31.1.3 Given a string ⟨M, w⟩, the algorithm AlgTM2PCP computes a set of tiles Y that is an instance of PCP. Furthermore, Y contains a match if and only if M accepts w.
As before, this implies the described result.
31.2 Reduction of PCP to AMBIGCFG
We can use the PCP result to prove a useful fact about context-free grammars. Let us define
AMBIGCFG = { ⟨G⟩ | G is a CFG and G is ambiguous }.
We remind the reader that a grammar G is ambiguous if there are two different ways for G to generate some
word w.
We will show this problem is undecidable by a reduction from PCP. That is, given a PCP instance S, we will construct a context-free grammar which is ambiguous exactly when S has a match. This means that any decider for AMBIGCFG could be used to solve PCP, so such a decider can not exist.
Specifically, suppose that S looks like
S = { [t1/b1], . . . , [tk/bk] }.
As a first attempt, consider the grammar
D→T|B
T → t1T | t2T | . . . | tkT | t1 | . . . | tk
B → b1B | b2B | . . . | bkB | b1 | . . . | bk,
with D as the initial symbol. This grammar is ambiguous if the tops of a sequence of tiles form the same
string as the bottoms of a sequence of tiles. However, there is nothing forcing the two sequences to use the
same tiles in the same order.
So, we will add some labels to our rules which name the set of tiles we have used. Let us suppose the
tiles are named d1 through dk . Then we will make our grammar generate strings like ti tj . . . tm dm . . . dj di
where the second part of the string contains the labels of the tiles used to build the first part of the string
(in reverse order).
So our final grammar H looks like
D → T | B
T → t1Td1 | . . . | tkTdk | t1d1 | . . . | tkdk
B → b1Bd1 | . . . | bkBdk | b1d1 | . . . | bkdk.
Here D is the initial symbol. Clearly, there is an ambiguous word w ∈ L(H) if and only if the given instance of PCP has a match. Namely, deciding if a grammar is ambiguous is equivalent to deciding an instance of PCP. But since PCP is undecidable, we get that deciding whether a CFG is ambiguous is undecidable.
31.3 2D tilings
Show some of the pretty tiling pictures linked on the 273 lectures web page, walking through the following
basic ideas.
A tiling of the plane is periodic if it is generated by repeating the contents of some rectangular patch of
the plane. Otherwise the tiling is aperiodic.
A set of tiles is aperiodic if these tiles can tile the whole plane, but all tilings generated by this set are aperiodic.
Wang conjectured, in 1961, that if a set of tiles can cover the plane at all, it can cover the plane periodically.
This is a tempting conjecture, but it is wrong.
In 1966, Robert Berger found a set of 20426 Wang tiles that is aperiodic. A Wang tile is a square tile with colors on its edges; the colors need to match when you put the tiles together. (Picture: a single Wang tile with a green top edge, a blue left edge, a black right edge, and a red bottom edge.)
Various people found smaller and smaller aperiodic sets of Wang tiles. The minimum so far is due to
Karel Culik II, and it is made out of 13 tiles.
Other researchers have built aperiodic sets of tiles with pretty shapes, e.g. the Penrose tiles. (Show
pretty pictures.)
However, such a search procedure will loop forever if the set T can tile the plane, but only aperiodically.
In general, we have the following correspondence between 2D tilings and Turing machine behaviors:
tile set | Turing machine
can not cover the plane | halts
has a periodic tiling | loops, repeating configurations
has only aperiodic tilings | runs forever without repeating configurations
The starter row spells out the initial configuration: B q0 B B . . .
The tile set is engineered so that the rest of this row can only be filled by repeating the left and right tiles from this set.
Copy tiles: For every character c in Mw ’s alphabet, we have a tile that just copies c from one row to
the next. We also have an entirely blank tile which must (given the design of this tile set) cover the lower
half-plane.
Action tiles: A transition δ(q, c) = (r, d, R) of Mw is implemented using a “split tile” and a set of “merge tiles”, one for every character t in the tape alphabet:
split tile: top d, bottom qc, with the state r on its right edge;
merge tile: top rt, bottom t, with the state r on its left edge.
For example, a row containing . . . d a qc a b . . . forces the row above it to read . . . d a d ra b . . . , with the state r passed across the shared edge between the two middle columns.
So, this reduction shows that the tiling completion problem is undecidable.
Chapter 32
32.1 Introduction
The theory of computation is perhaps the fundamental theory of computer science. It sets out to define,
mathematically, what exactly computation is, what is feasible to solve using a computer, and also what is
not possible to solve using a computer.
The main objective is to define a computer mathematically, without the reliance on real-world computers,
hardware or software, or the plethora of programming languages we have in use today. The notion of a Turing
machine serves this purpose and defines what we believe is the crux of all computable functions.
The course is also about weaker forms of computation, concentrating on two classes, regular languages
and context-free languages. These two models help understand what we can do with restricted means of
computation, and offer a rich theory using which you can hone your mathematical skills in reasoning with
simple machines and the languages they define. However, they are not simply there as a weak form of
computation— the most attractive aspect of them is that problems formulated on them are tractable, i.e.
we can build efficient algorithms to reason with objects such as finite automata, context-free grammars and
pushdown automata. For example, we can model a piece of hardware (a circuit) as a finite-state system and
solve whether the circuit satisfies a property (like whether it performs addition of 16-bit registers correctly).
We can model the syntax of a programming language using a grammar, and build algorithms that check if
a string parses according to this grammar.
On the other hand, most problems that ask properties about Turing machines are undecidable. Undecid-
ability is an important topic in this course. You have seen and proved yourself that several tasks involving
Turing machines are unsolvable— i.e. no computer, no software, can solve it. For example, you know now
that there is no software that can check whether a C-program will halt on a particular input. This is quite
amazing, if you think about it. To prove something is possible is, of course, challenging, and you will learn
in other courses several ways of showing how something is possible. But to show something is impossible
is rare in computer science, and you will probably see no other instance of it in any other undergraduate
course. To show something is impossible requires an argument quite unlike any other, and you have seen the
method of diagonalization to prove impossibilities and reduction that help you prove infer one impossibility
from another. Impossibility results for regular languages and context-free languages are shown using the
pumping lemma.
In conclusion, you have formally learnt how to define a computer, and analyze the properties of com-
putable functions, which surely is the theoretical foundation of computer science.
The main players in our drama have been the four classes of languages: regular languages (REG), context-free languages (CFL), Turing-decidable languages (TM-DEC) and Turing-recognizable languages (TM-RECOG).
Regular languages are the languages accepted by deterministic finite automata (DFAs) and context-free
languages are those languages generated by context-free grammars (CFGs). Turing-decidable languages are
those languages L for which there are Turing machines that always halt on every input, and decide whether
a word is in L or not.
Turing-recognizable languages are more subtle. A language L is Turing-recognizable if there is a TM M
which (a) when run on a word in L, halts eventually and accepts, and (b) when run on a word not in L,
M either halts and rejects, or does not halt. In other words, a TM recognizing L has to halt and accept all
words in L, and for words not in L, can reject or go off into a loop.
The main things to remember are:
• Each of the above inclusions is strict: i.e. there is a language that is context-free but not regular, there
is a language that is TM-DEC but not context-free, etc.
Regular languages are trivially contained within context-free languages (as DFAs can be easily converted
to PDAs). However, it is not easy to see that a CFG/PDA for L can be converted to a TM deciding L. However,
this is possible (see Theorem 4.9). TM-DEC languages are clearly TM-RECOG as well, by definition.
For example, if Σ = {a, b}, then
• {a^i b^j | i, j ∈ N} is regular (and hence also a CFL, and TM-DEC and TM-RECOG),
The notion of what we call an “algorithm” in computer science accords with Turing-decidability. In other
words, when we build an algorithm for a decision problem in computer science, we want it to always halt
and say ’yes’ or ’no’. Hence the notion of a computable function is that it be TM-decidable.
Regular expressions over an alphabet Σ are built up by the grammar R ::= ε | a | ∅ | R1 ∪ R2 | R1 · R2 | R1∗, where a ∈ Σ.
• Non-deterministic finite automata can be converted to equivalent DFAs. This construction is the “subset construction” and is important. See Theorem 1.39 in Sipser. Intuitively, for any NFA, we can build a DFA that tracks the set of all states the NFA can be in. Handling ε-transitions is a bit complex, and you should know this construction. Hence NFAs are equivalent to DFAs.
• Regular languages are closed under union, intersection, complement, concatenation and Kleene-* (The-
orems 1.45, 1.47 and 1.49 in Sipser).
• Regular expressions define exactly the class of regular languages (Theorem 1.54). In other words, any
language generated by a regular expression is accepted by some DFA (Lemma 1.55) and any language
accepted by a DFA/NFA can be generated by a regular expression (Lemma 1.60).
• So the trinity: DFA ≡ NFA ≡ Regular Expression holds.
• The pumping lemma says that, if L is regular, then there is a p ∈ N such that for every s ∈ L with
|s| > p, there are words x, y, z with s = xyz such that (a) |y| > 0 (b) |xy| ≤ p and (c) for every i,
xy^i z ∈ L. In mathematical language,
L is regular ⟹ ∃p ∈ N. ∀s ∈ L : |s| > p ⟹ ( ∃x, y, z ∈ Σ∗ : s = xyz ∧ |y| > 0 ∧ |xy| ≤ p ∧ ∀i. xy^i z ∈ L ).
• The contrapositive to the pumping lemma says that, if for every p ∈ N, there is an s ∈ L with |s| > p, such that for every x, y, z with s = xyz, |y| > 0, and |xy| ≤ p, there is some i such that xy^i z ∉ L, then L is not regular.
In mathematical language,
[ ∀p ∈ N. ∃s ∈ L : |s| > p ∧ ( ∀x, y, z ∈ Σ∗ : (s = xyz ∧ |y| > 0 ∧ |xy| ≤ p) → ∃i. xy^i z ∉ L ) ] ⟹ L is not regular.
• The contrapositive to the pumping lemma gives us a way to prove a language is not regular. We take
an arbitrary p, and construct a particular word wp , which depends on p, such that wp ∈ L and |wp | > p.
Then we show that no matter which x, y, z are chosen such that wp = xyz, |xy| ≤ p and |y| > 0, there is an i such that xy^i z ∉ L.
Knowing how to prove a language non-regular using the pumping lemma is important.
• We can, using the above technique, show several languages to be non-regular, for example (see Eg. 1.73, 1.74, 1.75, 1.76, 1.77):
– {0^n 1^n | n ≥ 0} is not regular.
– {w | w has an equal number of 0s and 1s} is not regular.
– {ww | w ∈ Σ∗} is not regular.
– {1^{n²} | n ≥ 0} is not regular.
Choosing wp ∈ L should be done carefully and cleverly. However, the choice of i being 0 or 2 usually works for most examples.
Note that you are allowed to pick wp (but not p), and allowed to pick i (not x,y or z).
• Deterministic finite automata can be uniquely minimized. In other words, for any regular language,
there is a unique minimal automaton accepting it (here, by minimal, we mean an automaton with the
least number of states). Moreover, given a DFA A, we can build an efficient algorithm to build the
minimal DFA for the language L(A). This is not covered in Sipser; see the handout on suffix languages
and minimization:
http://uiuc.edu/class/fa07/cs273/Handouts/minimization/suffix.pdf
and the minimization algorithm:
http://uiuc.edu/class/fa07/cs273/Handouts/minimization/minimization.pdf.
For the final exam, you are not required to know this algorithm, but just know that regular languages
have a unique minimal DFA.
Turning to algorithms for manipulating automata, here are some things worth knowing (read Sipser
Section 4.1):
• We can build an algorithm that checks, given a DFA/NFA A, whether L(A) 6= ∅. In other words, the
problem of checking emptiness of an automaton is decidable. (see Sipser Theorems 4.1 and 4.2). In
fact, this algorithm runs in linear (i.e. O(n)) time.
• Automata are closed under operations union, intersection, complement, concatenation, Kleene-*, etc.
Moreover, we can build algorithms to do all these closures. That is, we can build algorithms that will
take two automata and compute an automaton accepting the union of the languages accepted by the
two automata, etc.
All constructions we did on automata are actually computable algorithmically. For example, we can
build algorithms to convert regular expressions to automata, automata to regular expressions, etc.
Several other questions regarding automata are also decidable: For example:
• Pushdown automata define exactly the class of context-free languages. I.e. PDA ≡ CFG. (Sipser
Theorems 2.20).
• Context-free languages are closed under union, concatenation and Kleene-*, but not under intersection
or complement. (See the last section in this article for more details.)
• Deterministic pushdown automata are strictly weaker than (nondeterministic) pushdown automata. (For instance, deterministic PDAs can be complemented, essentially by toggling the final states, while general PDAs can not be.)
• The membership problem for CFGs and PDAs is decidable. In particular, the CYK algorithm uses dynamic programming to solve the membership problem for CFGs in O(n³) time, and in fact produces a parse tree as well (a sketch of CYK appears after this list).
• The problem of checking if a context-free language generates all words (i.e. if L(G) = Σ∗ ) is undecid-
able. This is proved in Theorem 5.13, using context-free grammars that check computation histories
of Turing machines.
• The problem of checking if a context-free language is ambiguous is undecidable (Exercise 5.21 in Sipser),
and is proved by a reduction from the Post’s correspondence problem. You need to know this fact, not
the proof.
• The language {a^n b^n c^n | n ∈ N} is not a context-free language. There is a pumping lemma for context-free languages, and we can use it to show that this language is not context-free. You are not required to know this proof.
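The CYK algorithm mentioned above, sketched minimally in Python for a grammar in Chomsky Normal Form; the dictionary encoding of the grammar (a variable maps to a list of bodies, each body a single terminal or a pair of variables) is an assumption of this sketch.

def cyk(word, grammar, start):
    n = len(word)
    # table[i][j] = set of variables deriving word[i:j+1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, c in enumerate(word):
        for A, bodies in grammar.items():
            if c in bodies:
                table[i][i].add(A)
    for length in range(2, n + 1):         # substring length
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):          # split point
                for A, bodies in grammar.items():
                    for body in bodies:
                        if isinstance(body, tuple) and \
                           body[0] in table[i][k] and body[1] in table[k + 1][j]:
                            table[i][j].add(A)
    return start in table[0][n - 1]

# CNF grammar for {a^n b^n : n >= 1}: S -> AT | AB, T -> SB, A -> a, B -> b
G = {"S": [("A", "T"), ("A", "B")], "T": [("S", "B")], "A": ["a"], "B": ["b"]}
print(cyk("aaabbb", G, "S"))  # True
print(cyk("aab", G, "S"))     # False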
• AT M is not Turing-decidable (Theorem 4.11). This is the fundamental undecidable problem and is
shown undecidable using a diagonalization method, which takes the code for a purported TM deciding
AT M and pits it against itself to lead to a contradiction. Diagonalization is an important technique to
prove impossibility results (almost the only technique we know!).
• AT M is Turing-recognizable. This is easy to show: we can build a Turing machine that on input
hM, wi, simulates M on w and accepts if M accepts w. Hence the class TM-DEC is a strict subclass of
TM-RECOG.
• A language L is Turing-decidable iff both L and its complement are Turing-recognizable (Theorem 4.22).
If L is decidable, then its complement is decidable as well, and so both are Turing-recognizable. If L and its complement are both Turing-recognizable, we can build a decider for L by simulating the machines for the two languages in “parallel”, and accepting or rejecting depending on which of them accepts.
• A corollary to the above theorem is that if L is Turing-recognizable and not Turing-decidable, then the complement of L is not TM-recognizable. Hence, the complement of ATM is not even TM-recognizable (Corollary 4.23).
32.5.2 Reductions
Reductions are a technique to deduce undecidability of problems using another problem that is known to be
undecidable.
A language S reduces to a language T if, given a TM deciding T , we can build a TM that decides S.
In other words, if T is decidable, then S is decidable. Which, paraphrased, says that if S is undecidable
then T is undecidable.
Hence, to show T is undecidable, we choose a language S that we know is undecidable, and reduce S to
T.
Many reductions are from AT M ; to show L is undecidable, we try to reduce AT M to L, i.e. assuming we
have a decider for L, we show that we can build a decider for AT M . Since AT M has no decider, it follows
that L has no decider.
Reduction proofs are important to understand and learn. Reductions from AT M to languages that accept
Turing machine descriptions often go roughly like this:
• Assume L has a decider R; we build a decider D for AT M .
• D takes as input hM, wi.
D then modifies M to construct a TM NM,w .
D then feeds this machine NM,w to R.
Depending on whether R accepts or rejects, D accepts or rejects (sometimes switching the answer).
Using reductions we can prove several languages undecidable. For example, (see Theorems 5.1, 5.2, 5.3,
5.4)
• HALTTM = {⟨M, w⟩ | M is a TM and M halts on input w} is undecidable.
It is Turing-recognizable, though.
• ETM = {⟨M⟩ | M is a TM and L(M) = ∅} is undecidable.
It is not even Turing-recognizable, since its complement is Turing-recognizable (and ETM is undecidable).
• REGULARTM = {⟨M⟩ | L(M) is regular} is undecidable.
• EQTM = {⟨M1, M2⟩ | L(M1) = L(M2)} is undecidable.
Rice’s theorem generalizes many undecidability results. Consider a class P of Turing machine descriptions.
Assume that P is a property of Turing machines that depends only on the language of the Turing machines
(i.e. if M and N are Turing machines accepting the same language, then either both are in P or both are not
in P ). Also assume that P is not the empty set nor the set of all Turing machines. Then P is undecidable.
Note that if P was the empty set or the set of all Turing machine descriptions, then clearly it is decidable.
Note: There will be a question on reductions in the exam. The reduction will be one which is a direct
corollary of Rice’s theorem, but you will be asked to give a proof without using Rice’s theorem.
32.5.3 Other undecidability problems
There were several other problems that were shown to be undecidable. Knowing these are undecidable is
important; you will not be asked for proofs of any of these, however:
• A linear bounded automaton (LBA) is a Turing machine that uses only the space occupied by the input, and does not use any extra cells. The emptiness problem for LBAs is undecidable (Theorem 5.10):
i.e., ELBA = {⟨M⟩ | M is an LBA and L(M) = ∅} is undecidable.
However, the membership problem for LBAs is decidable (Theorem 5.9):
i.e., ALBA = {⟨M, w⟩ | M is an LBA accepting w} is decidable.
The results for regular languages are in Sipser and class notes.
See also
http://uiuc.edu/class/fa07/cs273/Handouts/closure/regular-closure.html.
Sipser doesn’t cover closure properties of context-free languages very clearly. However, note that closure
under union is easy as it is simple to combine two grammars to realize their union. Non-closure under
intersection follows from the fact that L1 = {a^i b^j c^k | i = j} and L2 = {a^i b^j c^k | j = k} are both context-free, but their intersection L1 ∩ L2 = {a^i b^j c^k | i = j = k} is not. Non-closure under complement is easy to see, as L = {a^n b^n c^n | n ∈ N} is not context-free but its complement is context-free. Closure under Kleene-* and
homomorphisms are easy as one can easily transform a grammar to do these operations.
See
http://uiuc.edu/class/fa07/cs273/Handouts/closure/cfl-closure.html
for more detailed proofs.
Turning to TM-DEC, these languages are closed under union, as you can run TM M1 followed by M2, and accept if one of them accepts. For intersection, you can run them one after the other, and accept if both accept. Closure
under complement is easy as we can swap the accept and reject states of a TM. Kleene-* and homomorphisms
were not covered, but it is easy to see that TM-DEC languages are closed under these operations (try them
as an exercise!).
Finally, TM-RECOG is closed under union as you can run two Turing machines in “parallel”, and accept if
one of them accepts. The class is closed under intersection, as we can run them one after another, and accept
if both accept. (Note the subtleties of the construction here; simulating a TM that recognizes a language has
to be done carefully as it may not halt). The class of TM-RECOG languages is not closed under complement
(for example, AT M is TM-RECOG but its complement is not). In fact, if L is TM-RECOG and its complement
is also TM-RECOG, then L is TM-DEC. Since we know AT M is not TM-DEC, it follows that its complement
is not TM-RECOG. TM-RECOG languages are closed under Kleene-* and homomorphisms— we leave these
as exercises.
32.6.2 Decision problems
For each class of languages, let’s consider four problems—
Notice that regular languages are the most tractable class, and context-free languages have the important
property that membership (Theorem 4.7) and emptiness (Theorem 4.8) are decidable. In particular, mem-
bership of context-free languages is close to the problem of parsing, and hence is an important algorithmic
problem. Context-free languages do not admit a decidable inclusion or equivalence problem (Theorem 5.13
shows that checking if a CFG generates all words is undecidable; we can reduce this to both the problem of
inclusion– L(A) ⊆ L(B)– and equivalence– L(A) = L(B)– by setting A to be a CFG generating all words).
For Turing machines, almost nothing interesting is decidable. However, note that the membership
problem for Turing machines (AT M ) (but not emptiness problem for Turing machines (ET M )) is Turing-
recognizable.
Part II
Discussions
Chapter 33
Discussion 1: Review
20 January 2009
Purpose: Most of this material is review from CS 173, though students may have forgotten
some of it, especially details of notation. The exceptions are the sections on strings and graph
induction.
Before you start, introduce yourself. Promise office hours will be posted very soon and encourage them
to come.
Do not forget to register with the news server and start reading the newsgroups!
33.2 Numbers
What are Z, N (no zero), N0? (Mention quickly Q and R.)
33.3 Divisibility
What does it mean for x to divide q? Namely, there exists integer n such that q = xn. As such, every
number n is divisible by 1 and n (i.e., itself).
An integer number is even if it can be divided by 2. A number which is not even, is odd.
33.4 √2 is not rational
A number x is rational if it can be written as the ratio of two integers, x = α/β. The fraction α/β is irreducible if α and β do not have any common divisors (except 1, of course). Note that a rational number has a unique way to be written in irreducible form.
Lemma 33.4.1 An integer k is even if and only if k² is even.
Proof: If k is even, then it can be written as k = 2u, where u is an integer. As such, k² = (2u)² = 4u² is even, since it can be divided by 2. As for the other possibility, if k = 2u + 1, then
k² = (2u + 1)² = 4u² + 4u + 1 = 2(2u² + 2u) + 1
is odd (since it is the sum of an even number and an odd number), implying the claim.
Theorem 33.4.2 The number √2 is not rational (i.e., √2 is an irrational number).
Proof: Assume, for the sake of contradiction, that √2 is rational and can be written as the irreducible ratio √2 = α/β.
Let us square both size of this equation. We get that
α2
2= .
β2
That is, the number 2 is a divisor of α². Namely, α² is an even number. But then, by Lemma 33.4.1, α must be an even number.
So, let α = 2a. We have that 2 = (2a)²/β², which implies that 2β² = (2a)² = 4a². As such, β² = 2a².
Namely, β² is even, which implies, again, that β is even. As such, let us write β = 2b, where b is an integer.
We thus have that
√2 = α/β = 2a/(2b) = a/b.
Namely, we started from a rational number in irreducible form (i.e., α/β) and we reduced it further to a/b.
But this is impossible. A contradiction. Our assumption that √2 is a rational number is thus false. We conclude
that √2 is irrational.
33.6 Strings
strings, empty string, sets of strings
substring, prefix, suffix, string concatenation
What happens when you concatenate the empty string with another string?
Suppose w is a string. Is the empty string a substring of w? (Yes!)
33.7 Recursive definition
Recursive definition (concrete example).
Let U be the set that contains the point (0, 0). Also:
• if (x, y) ∈ U , then (x + 1, y) ∈ U .
• if (x, y) ∈ U , then (x, y + 1) ∈ U
Q: What is U ?
A: Clearly, (0, 1) ∈ U, (0, 2) ∈ U, ..., (0, k) ∈ U (for any k ≥ 0).
As such, (1, k) ∈ U (for any k ≥ 0).
As such, (2, k) ∈ U (for any k ≥ 0).
As such, (j, k) ∈ U (for any j ≥ 0, k ≥ 0).
Note that U = N0 × N0.
(A more complicated example of this is in the homework.)
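As a small illustration (ours, not part of the discussion), a derivation of any (j, k) from (0, 0) can be produced mechanically, mirroring the induction above:

def derive(j, k):
    # Build a derivation of (j, k) in U, starting from the base point.
    steps = [(0, 0)]
    for y in range(1, k + 1):     # rule: (x, y) in U  =>  (x, y+1) in U
        steps.append((0, y))
    for x in range(1, j + 1):     # rule: (x, y) in U  =>  (x+1, y) in U
        steps.append((x, k))
    return steps

print(derive(2, 1))   # [(0, 0), (0, 1), (1, 1), (2, 1)]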
33.8 Graphs

Lemma 33.8.1 A connected acyclic graph G on n > 1 vertices must contain a leaf.
Proof: Indeed, start from a vertex v ∈ V(G). If it is a leaf, we are done; otherwise, since G is connected,
there must be an edge e = vu of G incident to v. Travel on this edge to u, and mark the edge e as used.
Repeat this process of “walking” in the graph till you reach a leaf (and then you stop), where the walk can
not use an edge that was used before.
If this walk process reached a leaf then we are done. The other possibility is that the walk never ends.
But then, we must visit some vertex we already visited a second time (since the graph is finite). But that
would imply that G contains a cycle. But that is impossible, since G is acyclic (i.e., it does not contain
cycles).
Claim 33.8.2 A connected acyclic graph G over n vertices has exactly n − 1 edges.
Proof: The proof is by induction on n.
The base of the induction is n = 2. Here we have two vertices, and since the graph is connected, it must
contain the edge connecting the two vertices, implying the claim.
As for n > 2, we know by the above lemma that G contains a leaf, and let w denote this leaf.
Consider the graph H formed from G by removing the vertex w and the single edge e0 attached to
it. Clearly the graph H is connected and acyclic, and it has n − 1 vertices. By the induction hypothesis, the
graph H has n − 2 edges. But then, the graph G has all the edges of H plus one (i.e., e0 ). Namely, G has
(n − 2) + 1 = n − 1 edges, as claimed.
Chapter 34
Purpose: This discussion demonstrates a few constructions of DFAs. However, its main
purpose is to show how to move from a diagram describing a DFA into a formal description,
in particular of the transition function.
This material (probably) cannot be covered in one discussion section.
Questions on homework 1?
Any questions? Complaints, etc?
[Figure: state diagram of the DFA, with dead state H (self-loop on a, b).]
Advice To TA:: Do not erase this diagram from the board, you would need to modify it shortly,
for the next example. end
This automaton is formally the tuple (Q, Σ, δ, S, F):
1. Q = {S, T, H, q0 , q1 } - states.
2. Σ = {a, b} - alphabet.
211
δ    a    b
S    T    H
T    q0   H
H    H    H
q0   H    q1
q1   H    q0
34.1.2 aab^{5i}
Consider the following language:
L5 = { aab^n | n is a multiple of 5 }.
[Figure: the DFA for L5. The a-transitions go S → T → q0; the b-transitions cycle q0 → q1 → q2 → q3 → q4 → q0; all other transitions go to the dead state H.]
Advice To TA:: Do not erase this diagram from the board, you would need to modify it shortly,
for the next example. end
This automaton is formally the tuple (Q, Σ, δ, S, F):
1. Q = {S, T, H, q0, q1, q2, q3, q4} - states.
2. Σ = {a, b} - alphabet.
3. δ : Q × Σ → Q - see table.
4. S is the start state.
5. F = {q0} is the set of accepting states.

δ    a    b
S    T    H
T    q0   H
H    H    H
q0   H    q1
q1   H    q2
q2   H    q3
q3   H    q4
q4   H    q0
34.1.3 aab^{ki}
Let k be a fixed constant, and consider the following language:
Lk = { aab^n | n is a multiple of k }.
Advice To TA:: Skip the first two forms in the discussion. Show only the one in Eq. (34.1). end
Another way of writing the transition function δk , for the above example, is the following:
δk (S, a) = T,
δk (S, b) = H,
δk (T, a) = q0 ,
δk (T, b) = H,
δk (H, a) = H,
δk (H, b) = H,
δk (qi , a) = H, ∀i
δk (qi , b) = qi+1 for i < k − 1,
δk (qk−1 , b) = q0 .
213
This can be made slightly more compact using the mod notation:

δk(s, x) =   T                  if s = S, x = a
             H                  if s = S or T, x = b
             q0                 if s = T, x = a
             H                  if s = H, x = a or b          (34.1)
             H                  if s = qi, x = a, ∀i
             q_{(i+1) mod k}    if s = qi, x = b, ∀i.
Note that using good state names helps you describe the automaton compactly (thus q0 here is
not the starting state). Generally speaking, the shorter your description, the less work needs to be
done, and the lower the chance of a silly mistake.
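As an illustration (ours, not part of the discussion), the compact description of δk translates directly into a short simulator; the function names are our own:

def delta_k(k, s, x):
    # The transition function of Eq. (34.1), with states named by strings.
    if s == 'S':
        return 'T' if x == 'a' else 'H'
    if s == 'T':
        return 'q0' if x == 'a' else 'H'
    if s == 'H':
        return 'H'                       # dead state
    i = int(s[1:])                       # s is one of q0, ..., q(k-1)
    return 'H' if x == 'a' else 'q%d' % ((i + 1) % k)

def accepts(k, w):
    s = 'S'
    for x in w:
        s = delta_k(k, s, x)
    return s == 'q0'                     # F = {q0}

print(accepts(5, 'aa' + 'b' * 10))   # True: 10 is a multiple of 5
print(accepts(5, 'aa' + 'b' * 7))    # False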
34.3 The third character from the end is 0

1. States:
Q = {e, q0, q1, q00, q01, q10, q11, q000, q001, q010, q011, q100, q101, q110, q111}
2. Σ = {0, 1} - alphabet.
3. δ : Q × Σ → Q - see table.
4. e is the start state.
5. F = {q000, q001, q010, q011} is the set of accepting states.

δ      0      1
e      q0     q1
q0     q00    q01
q1     q10    q11
q00    q000   q001
q01    q010   q011
q10    q100   q101
q11    q110   q111
q000   q000   q001
q001   q010   q011
q010   q100   q101
q011   q110   q111
q100   q000   q001
q101   q010   q011
q110   q100   q101
q111   q110   q111
Advice To TA:: Please show the explicit long way of writing the transition table (shown above), and
only then show the following more compact way. It is beneficial to see how using a more formal representation
can save you a lot of time and space. Say it explicitly in the discussion. end
0 1
e q0 q1
qx qx0 qx1
qxy qxy0 qxy1
qxyz qyz0 qyz1
This is clearly the most compact way to describe this transition function. And here is a drawing of
this automaton:
[Figure: the state diagram of this automaton.]
Advice To TA:: Trust me. You do not want to erase this diagram before doing the next example. end
34.3.1 Being smarter
Since we care only whether the third character from the end is zero, we can pretend that when the input
starts, the automaton has already seen three ones on the input. Thus, we set the initial state to q111. Now, we can
get rid of the special states we had before.
1. States:
Q = {q000, q001, q010, q011, q100, q101, q110, q111}
2. Σ = {0, 1} - alphabet.
3. δ : Q × Σ → Q - see table:

       0      1
qxyz   qyz0   qyz1

4. q111 is the start state.
5. F = {q000, q001, q010, q011} is the set of accepting states.
This brings to the forefront several issues: (i) the most natural way to design an automaton does not
necessarily lead to the simplest automaton, (ii) a bit of thinking ahead of time will save you much pain, and
(iii) how do you know that what you came up with is the simplest (i.e., fewest number of states) automaton
accepting a language?
The third question is interesting, and we will come back to it later in the course.
Let L'k be the language of all binary strings such that none of the last k characters is 0. We can of course
adapt the previous automaton to this language, by changing the accepting states and the start state. However,
we can solve it more efficiently, by remembering the length of the maximal suffix of ones in the input seen
so far.
In particular, let qi be the state reached when the suffix of the input is a zero followed by i ones. Clearly, qk
is the accept state, and q0 is the starting state. The transition function is also simple. If the automaton sees
a 0, it goes back to q0. If it is at qi and it reads a 1, it goes to q(i+1) if i < k (and stays at qk otherwise). We get the following automaton.
1. States: Q = { qi | i = 0, . . . , k }.
2. Σ = {0, 1} - alphabet.
3. δk : Q × Σ → Q, where
δk(qi, x) = q0 if x = 0, and δk(qi, x) = q_min(i+1,k) if x = 1.
So, a little change in the definition of the language can make a dramatic difference in the number of states
needed.
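Here is a small sketch (ours) of this automaton in code; the current state is exactly min(number of trailing ones, k):

def accepts_last_k_ones(k, w):
    i = 0                                # start state q0
    for x in w:
        i = 0 if x == '0' else min(i + 1, k)
    return i == k                        # accept state q_k

print(accepts_last_k_ones(3, '0111'))    # True
print(accepts_last_k_ones(3, '11011'))   # False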
Chapter 35
Purpose: This discussion demonstrates a few simple NFAs, and how to formally define a
NFA. We also demonstrate that complementing a NFA is a tricky business.
Questions on homework 2?
Any questions? Complaints, etc?
In the above NFA, we have δ(A, 0) = {C}, despite the ε-transition from C to D. As such, δ(A, 0) ≠
{C, D}. If δ(A, 0) = {C, D} then the NFA is a different NFA:
[Figure: the modified NFA.]
In any case, the NFA M1 (depicted in the first figure) is the 5-tuple (Q, Σ, δ, A, F), where
δ : Q × Σε → P(Q).
Here Σ = {0, 1}, Q = {A, B, C, D, E, G, H}, and F = {H}.

δ    0       1     ε
A    {C}     {B}   ∅
B    {E, G}  ∅     ∅
C    ∅       ∅     {D}
D    ∅       {E}   ∅
E    ∅       {H}   ∅
G    {H}     {E}   ∅
H    ∅       ∅     ∅

[Figure: the NFA M1.]
Claim 35.1.1 The NFA N accepts a string w ∈ Σ*, if and only if there exist two strings x, y ∈ Σ*, such
that w = xy and x ∈ L(M) and y ∈ L(M').
Proof: If x ∈ L(M) then there is an accepting trace (i.e., a sequence of states and inputs that shows that x
is being accepted by M). Let the sequence of states be A = r0, r1, . . . , rα, and the corresponding input
sequence be x1, . . . , xα ∈ Σε. Here x = x1 x2 . . . xα (note that some of these characters might be ε).
Similarly, let A' = r'0, r'1, . . . , r'β be the accepting trace of M' accepting y, with the input characters y1, y2, . . . , yβ ∈
Σε, where y = y1 y2 . . . yβ.
Note that, by our assumption, rα = f. As such, the following is an accepting trace of w = xy for N:
r0, r1, . . . , rα = f, r'0, r'1, . . . , r'β (using the ε-transition from f to A' = r'0).
Indeed, it is a valid trace, as can be easily verified, and r'β ∈ F' (otherwise y would not be in L(M')).
Similarly, given a word w ∈ L(N) and an accepting trace for it, we can break this trace into two parts.
The first part is the trace before using the ε-transition f → A', and the other is the rest of the trace. Clearly, if
we remove this transition from the given accepting trace, we end up with two accepting traces for M and
M', implying that we can break w into two strings x and y, such that x ∈ L(M) and y ∈ L(M').
had 8 states. Note that the following NFA does the same job, by guessing the position of the third character from
the end of the string.
[Figure: the NFA M3. State A has a self-loop on 0,1; A → B on 0; B → C on 0,1; C → D on 0,1; D is accepting.]
Q: Is there a language L for which some DFA has a smaller number of states than any NFA for L?
A: No. Because any DFA is also a NFA.
Naively, the easiest thing would be to complement the accepting states of the NFA. We get the following NFA M4.
[Figure: the NFA M4, obtained from M3 by complementing the accepting states.]
But this is of course complete and total nonsense. Indeed, L(M4) = Σ*, which is definitely
not the complement of L3. Here is the correct solution.
[Figure: the NFA M5.]
The conclusion of this tragic and sad example is that complementing a NFA is a non-trivial task (unlike
DFAs, where all you needed to do was flip the accepting/non-accepting states). So, for some tasks
DFAs are better than NFAs, and vice versa.
Designing a DFA for L, using the most obvious logic, we will have:
[Figure: the DFA designed in the obvious way.]
With NFA we can go this way:
[Figure: the NFA, with two branches C, D, E and C', D', E'.]
Note that the NFA approach is easily extendable to more than 2 substrings.
abc?(ba)*b
where ? represents a substring of length 1 or more, and * represents 0 or more repetitions of the previous expression.
The NFA for this pattern would be
[Figure: the NFA for this pattern.]
How do we formally show a string w is accepted by M? We need a sequence of states r0, r1, . . . , rn such that:
1. r0 = q0
2. ri+1 = δ(ri, wi+1), for i = 0, . . . , n − 1
3. rn ∈ F
Let us show that the automaton (on page 1) accepts the string 101.
We show that there exist states r0, r1, . . . , r3 satisfying the above three conditions. We claim that
the sequence A, B, E, G satisfies the three conditions.
1. A = q0
2. δ(A, 1) = B
δ(B, 0) = E
δ(E, 1) = G
3. G ∈ F
Chapter 36
Discussion 4: More on
non-deterministic finite automatas
10 February 2008
Questions on homework 3?
[Figure: the NFA, with states A, B, C, D, E and ε-transitions.]
And after removing ε-transitions:
[Figure: the NFA after removing the ε-transitions.]
General rule: For every state q ∈ Q and every a ∈ Σ, compute the set of all states reachable from q when
we feed it just the character a.
[Figure: part of the resulting DFA, with subset states {A}, {A, B, C, D} and {A, B, C, D, E}.]
Note that to convert an NFA to a DFA, here we first remove ε-transitions and then we apply the subset
construction. You could reverse the order of these two operations (like what Sipser does), but note that the
new start state will then be the ε-closure of the old start state.
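For concreteness, here is a sketch (ours, not from the discussion) of the subset construction on an ε-free NFA, represented as a hypothetical dictionary mapping (state, character) pairs to sets of states:

from collections import deque

def subset_construction(delta, start, accepting, alphabet):
    start_set = frozenset([start])
    dfa, queue = {}, deque([start_set])
    while queue:
        S = queue.popleft()
        if S in dfa:
            continue                     # subset already processed
        dfa[S] = {}
        for a in alphabet:
            # All states reachable from some state of S on character a.
            T = frozenset(q for s in S for q in delta.get((s, a), ()))
            dfa[S][a] = T
            queue.append(T)
    # A subset state is accepting iff it contains an accepting NFA state.
    final = [S for S in dfa if S & set(accepting)]
    return dfa, start_set, final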
Chapter 37
Discussion 5: More on
non-deterministic finite automatas
17 February 2009
Questions on homework 4?
Any questions? Complaints, etc?
Direct proof

Lemma 37.1.2 The language L2 = { x | #a(x) + #b(x) = #c(x) } is not regular.
have that M accepts the string c^j if we start from qj, since M accepts the string a^j c^j. But then, it must
be that M accepts a^i c^j. Indeed, after M reads a^i it is in state qi = qj, and we know that it accepts c^j if
we start from qj. But this is a contradiction, since a^i c^j ∉ L2, for i ≠ j.
This implies that M has an infinite number of states, which is of course impossible.
By closure properties
Here is another proof of Lemma 37.1.2.
Proof: Assume, for the sake of contradiction, that L2 is regular. Then, since regular languages are closed
under intersection, and the language a*c* is regular, we have that L3 = L2 ∩ a*c* is regular. But L3 is
clearly the language
L3 = { a^n c^n | n ≥ 0 },
which is not regular. Indeed, if L3 were regular then f(L3) would be regular (by closure under homomorphism),
which is false by Lemma 37.1.1, where f(·) is the homomorphism mapping f(a) = 0, f(b) = ε,
and f(c) = 1.
is not regular.
Is it regular or not? It seems natural to think that it is not regular. However, it is in fact regular. Indeed, L7
is the set of all strings where the first and last character are the same, which is definitely a regular language.
Chapter 38
Questions on homework 5?
Any questions? Complaints, etc?
L/a = { x | xa ∈ L }.
This operation preserves regularity: consider a DFA for L with set of final states F. Redefine the set of final states as follows:
F' = { s | δ(s, a) ∈ F }.
Again consider a DFA for L and remove all outgoing transitions from final states of that DFA to have a NFA
for min(L).
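Both constructions are mechanical. A quick sketch (ours, not from the notes), for automata given as explicit data; all names are illustrative:

def quotient_dfa(states, delta, start, F, a):
    # DFA for L/a: same states and transitions, but a state is final
    # iff reading `a` from it lands in the old final set F.
    F_prime = {s for s in states if delta[(s, a)] in F}
    return states, delta, start, F_prime

def min_nfa(states, delta, start, F):
    # NFA for min(L): drop all outgoing transitions of final states.
    delta_prime = {(s, c): t for (s, c), t in delta.items() if s not in F}
    return states, delta_prime, start, F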
Chapter 39
S → aSb | ε.
S → aSb | bSa | SS | ε,
Proof: It is easy to see that every string that G generates has an equal number of a's and b's. As such,
L(G) ⊆ L^{mix}_{a=b}.
We will use induction on the length of the string x ∈ L(G), 2n = |x|. For n = 0, we can generate ε by G. For
n = 1, we can generate both ab and ba by G.
Now for n > 1, consider a balanced string of length 2n, x = x1 x2 x3 · · · x2n ∈ L^{mix}_{a=b}. Let #c(y) be the
number of appearances of the character c in the string y. Let αi = #a (x1 · · · xi ) − #b (x1 · · · xi ). Observe
that α0 = α2n = 0. If αj = 0, for some 1 < j < 2n, then we can break x into two words y = x1 . . . xj and
z = xj+1 . . . x2n that are both balanced. By induction, y, z ∈ L(G), and as such S ⇒∗ y and S ⇒∗ z. This
implies that
S ⇒ SS ⇒∗ yz = x.
Namely, x ∈ L(G).
The remaining case is that αj ≠ 0 for j = 2, . . . , 2n − 1. If x1 = a then α1 = 1. As such, for all
j = 1, . . . , 2n − 1, we must have that αj > 0. But α2n = 0, which implies that α2n−1 = 1. We conclude
that x1 = a and x2n = b. As such, x2 . . . x2n−1 is a balanced word, which by induction is generated by G.
Thus, x can be derived via S → aSb ⇒* a x2 x3 . . . x2n−1 b = x. Thus, x ∈ L(G).
The case x1 = b is handled in a similar fashion, and implies that x ∈ L(G) also in this case. We conclude
that L^{mix}_{a=b} ⊆ L(G).
Thus L^{mix}_{a=b} = L(G).
If n ≠ m then either n > m or m > n; therefore we can design this grammar by first starting with the basic
grammar for the case n = m, and then transitioning into making more a's or b's.
Let A be the non-terminal representing “choosing” to generate more a's than b's, and B the non-terminal
for the other case. One grammar that generates La≠b is therefore:
S → aSb | aA | bB,    A → aA | ε,    B → bB | ε.
We can essentially combine two copies of the previous grammar (with one version that works on b and c) in
order to create a grammar that generates L2:
S → S_{a=b} C | A S_{b=c},
where A → aA | ε and C → cC | ε, and S_{a=b} and S_{b=c} generate { a^n b^n } and { b^n c^n }, respectively.
Exercise 39.1.2 Derive a CFG for the language L'4 = { a^i b^j c^k | i = j or j = k or i = k }.
We can combine copies of the previous grammars (one for each of the three cases) in order to create a
grammar that generates L'4, using
A → Aa | ε,    B → Bb | ε,    C → Cc | ε.
39.1.6 Anything but balanced
Let Σ = {a, b}, and let L = Σ* \ { a^n b^n | n ≥ 1 }.
The idea is to first generate all the words that contain a b somewhere before an a. The grammar for these words is
S1 → ZbZaZ,    Z → aZ | bZ | ε.
Clearly L(S1) ⊆ L. The only words we miss must have all their a's before their b's. But these are all
words of the form a^i b^j, where i ≠ j ≥ 0. And we already saw how to generate such words in Section 39.1.3.
Putting everything together, we get the following grammar.
S → S1 | S_{a≠b}
S1 → ZbZaZ
Z → aZ | bZ | ε
S_{a≠b} → a S_{a≠b} b | aA | bB,
A → aA | ε,
B → bB | ε.
where #a(w) is the number of appearances of the character a in w. The grammar for this language is
S → ε | bS | aS0.
E → E ∗ E | E + E | N,    N → 0N | 1N | 0 | 1.
The ambiguity is caused because there is no inherent preference for combining expressions with ∗ over +
or vice versa. It can be fixed by introducing a preference:
E → E + E | T,    T → N ∗ T | N,    N → 0N | 1N | 0 | 1.
However, some languages are inherently ambiguous: no unambiguous context-free grammar can generate
them.
Consider the following language:
L = { a^n b^n c^k d^k | n, k ≥ 1 } ∪ { a^n b^k c^k d^n | n, k ≥ 1 }.
That is, a word a^i b^j c^k d^m is in L if either:
1. the number of a's equals the number of b's, and the number of c's equals the number of d's; or
2. the number of a's equals the number of d's, and the number of b's equals the number of c's.
The reason why all grammars for this language must be ambiguous can be seen in strings of the form
a^n b^n c^n d^n, for n ≥ 1. Any grammar needs some way of generating such a string so that either the a's
and b's are equal and the c's and d's are equal, or the a's and d's are equal and the b's and c's are equal.
When generating equal a's and b's, it must still be possible to have the same number of c's and d's. When
generating equal a's and d's, it must still be possible to have the same number of b's and c's. No matter
what grammar is designed, any string of the form a^n b^n c^n d^n, n ≥ 1, must have at least two possible parse
trees.
(This is of course only an intuitive explanation. A formal proof that any grammar for this language must
be ambiguous is considerably more tedious and harder.)
It should be clear that this language cannot be regular. However, it may not be obvious that we can in
fact design a context-free grammar for it. The strings x and y are guaranteed to be different if, for some k, the kth
character is 0 in x and 1 in y (or vice versa). It is important to notice that we should not try to build x
and y separately as, in a CFG, we would have no way to enforce them being of the same length. Instead,
we just remember that if the string is of length 2n, the first n characters are considered x and the second n
characters are y. Similarly, notice that we cannot choose k ahead of time, for similar reasons.
So, consider breaking the string into two parts X and Y. Now, X is a word of odd length with 1 in the middle (and we definitely know how to generate this kind of
words using context-free grammars). And Y is a word of odd length, with 0 in the middle. In particular,
any word of L can be written as either XY or YX, where X and Y are as above. We conclude that the
grammar for this language is
S → XY | YX,    X → DXD | 1,    Y → DYD | 0,    D → 0 | 1.
Chapter 40
Questions on homework 7?
Any questions? Complaints, etc?
[Figure: a PDA with states p, q, r, s and transitions ε,ε→$ (from p to q), b,a→ε, and ε,$→ε (into s).]
The equivalent grammar is (note that in this case we can simplify it to get our familiar grammar for L):
Chapter 41
Questions on homework 8?
Any questions? Complaints, etc?
S → ASA | aB
A → B | S
B → ε
Removing the ε-productions and the unit rules yields:
S → AS | SA | ASA | a
A → S
Chapter 42
Questions on homework 9?
Any questions? Complaints, etc?
4. yzw is between the first run of b's and the second run of a's: xzv ∉ L (why?)
5. yzw is in the second run of a's: x y² z w² v ∉ L (why?).
6. yzw is between the second run of a's and b's: xzv ∉ L (why?)
Chapter 43
[Figure: the transition diagram of the three-tape TM (states n0, . . . , n5) implementing the mod operation.]
The basic idea is that immediately after we move the head of tape 2 back to the beginning of the tape (state
n2), we write a $ on the first tape (i.e., the transition from n3 to n1). Thus, conceptually, every time this loop
is performed, a block of b characters of 0 is chopped off tape 1.
To use this box as a template in our future designs (more or less like macros in C++), we name this
Mod(tape 1, tape 2, tape 3).
[Figure 43.1: the transition diagram of the three-tape TM for binary addition, with states L1 (carry = 0), L2 (carry = 1), L3, and qacc.]
43.2.2 Multiplication
Question 43.2.2 Design a TM that, given $0^a on tape 1 and $0^b on tape 2, writes 0^{ab} on tape 3.
Solution: The idea is to append a copies of tape 2 at the end of tape 3.
[Figure: the transition diagram of the three-tape TM (states n0, . . . , n4) implementing multiplication.]
To use this box as a template in our future designs (more or less like macros in C++), we name this
Mult(tape 1, tape 2, tape 3).
where w1, w2, w3 ∈ {0, 1}* and w3 is the binary addition of w1 and w2 (the numbers are written with the
least significant bit first).
Thus, if tape 1 contains 01 (i.e., this is the number 10₂ = 2) and tape 2 contains 1011 (i.e., this is the binary
number 1101₂ = 13), then the output should be 1111 (which is 15).
/*** t2 = 0^a and t3 = 0^b ***/
CLEAR(t1)
CLEAR(t7)
Mod(t2, t3, t5)
/*** t5 = 0^(a mod b) ***/
do
    if EQ(t7, t5) then accept.
    COPY(t1, t4)
    Mult(t1, t4, t6)
    Mod(t6, t3, t7)
while true
Solution: We sum starting from the least significant digit, using the normal procedure. See Figure 43.1. Here,
a transition of the form 0, 1, ␣ → 0, 1, 0, R, R, R stands for the situation where the TM reads 0 on the first
tape, 1 on the second tape and ␣ on the third tape; next, it writes 0, 1 and 0 to these three tapes, respectively,
and moves the three heads to the right.
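The same computation, sketched in code (ours): the variable carry plays the role of the two states L1 (carry = 0) and L2 (carry = 1), and a missing digit is treated like the blank ␣, i.e., as 0:

def add_lsb_first(w1, w2):
    # Numbers are strings with the least significant bit first.
    out, carry = [], 0
    for i in range(max(len(w1), len(w2))):
        b1 = int(w1[i]) if i < len(w1) else 0
        b2 = int(w2[i]) if i < len(w2) else 0
        s = b1 + b2 + carry
        out.append(str(s % 2))
        carry = s // 2
    if carry:
        out.append('1')
    return ''.join(out)

print(add_lsb_first('01', '1011'))   # '1111', i.e., 2 + 13 = 15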
43.3 MTM
Question 43.3.1 Show that an MTM (an imaginary more powerful TM) whose head can read the character
under the head and the character to the left of the head (if such a character does not exist, it will read a ␣,
i.e. blank character) and can just rewrite the character under the head, is equivalent to a normal TM.
Solution: First of all it is obvious that an MTM can simulate a TM since it can ignore the extra
information that it can read using its special head.
Now observe that a TM can simulate an MTM this way: For making a move using the transition function
of MTM, the TM that simulates it must read the character under the head (which a normal TM can) and the
237
character to the left of the head (which a normal TM can't). What the simulating TM does is remember
the current state of the MTM inside its own states (note that we have done this kind of “remembering a
finite amount of information inside states by redefining them, e.g., extending them to tuples” several times in class),
bring the head to the left, read that character and remember it inside its states, and then move the head to
the right. Now the head is in its original place, and our TM knows the missed character and can perform the
correct move using the MTM's transition function.
Chapter 44
Questions on homework 8?
Any questions? Complaints, etc?
Definition 44.1.1 For two arbitrary sets (maybe infinite) X and Y , we have |X| ≤ |Y |, iff there exists an
injective mapping f : X → Y .
Definition 44.1.2 Two arbitrary sets (maybe infinite) X and Y are of the same cardinality (i.e., same
“size”), denoted |X| = |Y|, if there exists an injective and onto mapping f : X → Y.
Observation 44.1.3 For two sets X and Y , if |X| ≤ |Y | and |Y | ≤ |X| then |X| = |Y |.
For N, the set of all natural numbers, we define |N| = ℵ0 . Any set X, with |X| ≤ ℵ0 , is referred to as a
countable set.
Claim 44.1.4 For any set X, we have |X| < |P(X)|. That is, |X| ≤ |P(X)| and |P(X)| ≠ |X|.
(Here P(X) is the power set of X.)
Proof: It is easy to verify that |X| ≤ |P(X)|. Indeed, consider the mapping h(x) = {x} ∈ P(X), for all
x ∈ X.
So, assume for the sake of contradiction that |X| = |P(X)|, and let f be a one-to-one and onto mapping
from X onto P(X). Next, consider the set B = { x ∈ X | x ∉ f(x) }.
Now, consider the element b = f⁻¹(B), and consider the question of whether it is a member of the set B
or not. If b = f⁻¹(B) ∈ B, then by the definition of B, we have b = f⁻¹(B) ∉ B. Similarly, if
b = f⁻¹(B) ∉ B, then by the definition of B, we have b = f⁻¹(B) ∈ B.
A contradiction. We conclude that our assumption that f exists (i.e., that X and P(X) have the same
cardinality) is false. We conclude that |X| ≠ |P(X)|.
Definition 44.1.5 An enumerator T for a language L is a Turing Machine that writes out a list of all
strings in L. It has no input tape, only an output tape on which it prints the strings, with some separator
character (say, #) printed between them.
The strings can be printed in any order, and the enumerator is allowed to print duplicates of a string it
already printed. However, every string in L must eventually be printed by T. Naturally, all the
strings printed by T are in L.
We remind the reader that two natural numbers a and b are relatively prime (or coprime) if they
have no common factor other than 1 or, equivalently, if their greatest common divisor is 1. Thus 2 and 3 are
coprime, but 4 and 6 are not coprime. Thus, although 2/3 = 4/6, we will consider only the representation
2/3 to be in the set Q.
We show that this set is enumerable by giving the pseudo-code for an enumerator for it.
EnumerateRationals
for i = 1 . . . ∞ do
for x = 0 . . . i − 1 do
y =i−x
if x, y are relatively prime then
print x/y onto the tape followed by #
print −x/y onto the tape followed by #.
It is obvious that every rational number will be enumerated at some point. Any rational number is of
the form a/b, and as such, when i = a + b and y = b, it will enumerate this rational number.
It helps to picture this as travelling along each line x + y = i.
It is now easy to verify that given w ∈ Σ∗ , we can figure out the i such that wi = w. Similarly, given i,
we can output the word wi .
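The implied enumeration w1, w2, . . . of Σ* lists words by length, and lexicographically within each length. A small sketch (ours) for Σ = {a, b}:

from itertools import count, islice, product

def words():
    for n in count(0):                   # lengths 0, 1, 2, ...
        for t in product('ab', repeat=n):
            yield ''.join(t)

print(list(islice(words(), 7)))   # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']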
We just demonstrated that there is a one-to-one and onto mapping from N to Σ∗ , and we can conclude
the following.
44.4 Languages are not countable
Let
Lall = { L | L is some language, and L ⊆ {a, b}* }.
Chapter 45
Solution: Let N be a decider for L. To decide whether ⟨M, w⟩ ∈ ATM, build a new TM M' (with input x) that
simulates M on w; if M accepts and x is 00 (how can it verify this?), M' accepts, otherwise it rejects. Now
observe that:
⟨M, w⟩ ∈ ATM ⟺ ⟨M'⟩ ∈ L.
The left-hand side can thus be decided using N.
Solution: Let N be a decider for L. To decide whether ⟨M, w⟩ ∈ ATM, build a new TM M' (with input x) that
simulates M on w; if M accepts and |x| is even (how can it verify this?), M' accepts, otherwise it rejects.
Now observe that:
⟨M, w⟩ ∈ ATM ⟺ ⟨M'⟩ ∈ L.
The left-hand side can thus be decided using N.
Chapter 46
Questions on homework ?
Any questions? Complaints, etc?
Solution:
Proof: We assume, for the sake of contradiction, that ODDTM is decidable. Given hT, wi, consider the following procedure.
Z(x):
if x = ab then
Accept
r ← Simulate T on w
if r =Accept then Accept
Reject.
Thus L(Z) = Σ* (which includes odd-length strings) if T accepts w, and L(Z) = {ab} (a set with only an even-length string) otherwise.
We reduce ATM to ODDTM.
Suppose isOdd(⟨T⟩) is a decider for ODDTM. We build the following decider for ATM:
DeciderATM(⟨T, w⟩): construct the machine Z above (with T and w hard-coded into it), and return isOdd(⟨Z⟩).
Clearly, this is a decider for ATM, which is impossible. We conclude our assumption is false, and thus ODDTM is not decidable.
(Q2) Prove that the language SUBSETTM = { ⟨M, N⟩ | L(M) ⊆ L(N) } is undecidable. Hint: Reduce
EQTM = { ⟨M, N⟩ | L(M) = L(N) } to SUBSETTM for this purpose.
Solution:
Assume, for the sake of contradiction, that isSubset(M, N ) is a decider for SUBSETTM . Then, we have the following decider
for EQTM .
DeciderEQTM(M, N ):
if isSubset(M, N ) and isSubset(N, M ) then
return Accept
else
return Reject.
However, we already know that EQTM is undecidable. A contradiction.
(Q3) Assume that the languages L1 and L2 are recognizable languages, where L1 ∪ L2 = Σ∗ . Now, assume
you have a decider for the language L1 ⊕ L2 . Show that L1 is decidable.
Solution:
Let ORACxor be a decider for L1 ⊕ L2 and let T1 (resp. T2 ) be a machine that recognizes L1 (resp. L2 ).
Decider1 (w)
r ← Simulate ORACxor on w.
x1 = start a simulation of T1 on w
x2 = start a simulation of T2 on w
while (true)
Advance x1 and x2 by one step
if x1 accepts then accept
if x1 rejects then reject
if x2 accepts then
return not r
if x2 rejects then
return r
Since both L1 and L2 are recognizable, and L1 ∪ L2 = Σ*, it follows that one of the two simulations x1 and x2 must stop.
This implies that the above procedure is indeed a decider. It is now an easy case analysis to verify that this is indeed a decider
for L1.
Indeed, if x1 accepts then w ∈ L1 and we are done. Similarly, if x1 rejects then w ∉ L1 and we are done.
If x2 accepts, but r = reject, this implies that w ∈ L2 and w ∉ L1 ⊕ L2. Namely, it must be that w ∈ L1, and we accept.
If x2 accepts, but r = accept, this implies that w ∈ L2 and w ∈ L1 ⊕ L2. Namely, it must be that w ∉ L1, and we reject.
If x2 rejects, but r = reject, this implies that w ∉ L2 and w ∉ L1 ⊕ L2. This implies that w ∉ L1 ∪ L2, which is impossible in
our case (so this case never happens).
If x2 rejects, but r = accept, this implies that w ∉ L2 and w ∈ L1 ⊕ L2. This implies that w ∈ L1, and we accept.
(Q4) Let EQTM = { ⟨M, N⟩ | L(M) = L(N) }. Reduce ATM to EQTM as another way to prove that EQTM is
undecidable.
Solution:
For a given hT, wi, consider Tw defined as follows
Tw (y) :
if y 6= w then
reject
r ←Simulate T on w
return r
And, consider Nw defined as follows:
Nw (y) :
if y = w then
accept
else
reject
Let deciderEQ be a decider for EQTM . We can design a decider for ATM as follows using Tw and Nw :
DeciderA (hT, wi):
Compute hTw i and hNw i from hT, wi.
r ←Simulate deciderEQ on hTw , Nw i.
return r
Nw accepts only w, and L(Tw) = ∅ if T does not accept w, and L(Tw) = {w} otherwise. Therefore, the two languages are
equal iff T accepts w.
(Q5) Prove that the following language is not recursively enumerable (namely, it is not recognizable):
L = { ⟨T⟩ | T is a TM, and L(T) is infinite }.
Solution:
We reduce the complement of ATM (which is not recognizable) to L. Let us check the following routine first (fix T and w):
Tw (x) :
Simulate T(w) for |x| steps.
if simulation above does not accept then
accept
else
reject
Observe that L(Tw) is infinite iff T does not accept w. So now we have the following reduction: Assume, for the sake of contradiction,
that we are given a recognizer recogL. We get the following recognizer for the complement of ATM:
recognizerCoATM(⟨T, w⟩):
    Compute ⟨Tw⟩
    return recogL(⟨Tw⟩)
So, if T does not accept w, then L(Tw) = Σ*, and then recogL(⟨Tw⟩) would stop and accept.
If T accepts w, then the language L(Tw) is finite, and then recogL(⟨Tw⟩) might reject (or it might run forever). If recogL(⟨Tw⟩)
halts and rejects, then recognizerCoATM would reject.
In any case, we got a recognizer for the complement of ATM, which is impossible, since this language is not recognizable.
Chapter 47
Questions on homework ?
Any questions? Complaints, etc?
47.1 Problems
(Q1) If B is regular (or a CFL), and A ⊆ B, can we deduce that A is regular (or a CFL)?
Solution:
No: every language is a subset of Σ*, which is regular. A more interesting example is
L1 = { a^n b^n c^n | n ≥ 0 } ⊆ L2 = { a^n b^n c^k | n, k ≥ 0 } ⊆ L3 = { a^i b^j c^k | i, j, k ≥ 0 }.
The language L3 is regular, L2 is not regular but is a CFL, and L1 is not a CFL.
(Q2) Give a direct proof why L = { ww | w ∈ Σ* } is not regular.
Solution:
Proof: Assume, for the sake of contradiction, that this language is regular, and let D = (Q, Σ, qinit, δ, F) be a DFA for it.
Consider the strings wi = a^i b, for i = 1, . . . , ∞. Let qi = δ(qinit, wi), for i = 1, . . . , ∞. We claim all the qi's are distinct. Indeed,
if for some i ≠ j we had qi = qj, then D would accept a^j b a^i b (since it accepts a^i b a^i b ∈ L), which is impossible, as
a^j b a^i b ∉ L for i ≠ j.
Solution:
For a given G1, G2 and k, the TM produces all strings w of length at most k and checks whether each string is in both L(G1)
and L(G2) (first we convert each grammar into CNF, and then we use the CYK algorithm).
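For reference, here is a compact sketch (ours) of the CYK algorithm used above, assuming the CNF grammar is given as hypothetical lists of unit rules (A → a) and binary rules (A → BC):

def cyk(word, unit_rules, binary_rules, start):
    n = len(word)
    if n == 0:
        return False                     # S -> eps must be checked separately
    # table[i][j] = variables deriving the substring of length j+1 at i.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, c in enumerate(word):
        table[i][0] = {A for (A, a) in unit_rules if a == c}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for (A, B, C) in binary_rules:
                    if B in left and C in right:
                        table[i][length - 1].add(A)
    return start in table[0][n - 1]

# Example: a CNF grammar for { a^n b^n | n >= 1 }:
# S -> AT | AB, T -> SB, A -> a, B -> b.
units = [('A', 'a'), ('B', 'b')]
binaries = [('S', 'A', 'T'), ('S', 'A', 'B'), ('T', 'S', 'B')]
print(cyk('aabb', units, binaries, 'S'))   # True
print(cyk('aab', units, binaries, 'S'))    # False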
(Q4) Show that L = { ⟨G1, G2⟩ | L(G1) \ L(G2) = ∅ } (G1 and G2 are grammars) is undecidable.
Solution:
We can reduce EQCFG = { ⟨G1, G2⟩ | L(G1) = L(G2) } to L. If we have a decider for L, we can build a decider for EQCFG by
querying it once for ⟨G1, G2⟩ and once for ⟨G2, G1⟩. (Note that for any two sets A and B we have that A = B if and only if
A \ B = ∅ and B \ A = ∅.)
(Q5) Show that L = { ⟨P, w⟩ | P is an RA and w ∈ L(P) } is decidable.
Solution:
Given hP, wi, the decider converts P into a CFG G (remember we have seen the algorithm for this) and then converts G into CNF
(we also saw this algorithm), and finally using CYK it detects if w ∈ L(G) or not.
(Q6) Assume we have a language L and a TM T with the following property. For every string w of length
at least 2, T(w) halts and outputs k strings w1, · · · , wk, where |wi| < |w| for all i (k is not a constant
and depends on the string w). We know that w ∈ L iff, for all i, wi ∈ L.
Assuming that 0 ∈ L, 1 ∉ L, and ε ∉ L, design a decider for the set L (you can use T as a subroutine) and
prove that your decider works.
Solution:
DeciderL (w):
if |w| ≤ 1 then
if w = 0 then
return Yes
else
return No
w1 , · · · , wk ← T(w)
for i = 1, . . . , k do
if DeciderL (wi ) =No then
return No
return Yes
Claim 47.1.1 For any string w ∈ Σ∗ , we have that DeciderL (w) halts, and furthermore DeciderL (w) = Yes ⇐⇒ w ∈ L.
Part III
Exams
Chapter 48
48.1 Midterm 1 - Spring 2009
6 May 2009
NAME:
NETID: DISC:
• The point value of each problem is indicated next to the problem, and in the table below.
• Points may be deducted for solutions which are correct but excessively complicated, hard to understand,
or poorly explained. Please keep your solutions short and crisp.
• The exam is designed for one hour, but you have the full two hours to finish it.
• It is wise to skim all problems and point values first, to best plan your time.
• This is a closed book exam. No notes of any kind are allowed. Do all work in the space provided, using
the backs of sheets if necessary. See the proctor if you need more paper.
• After the midterm is over, discuss its contents with other CS 373 students only after verifying that
they have also taken the exam (e.g. they aren’t about to take the conflict exam).
• We indicate next to each problem how much time we suggest you spend on it. We also suggest you
spend the last 25 minutes of the exam reviewing your answers.
(A) If a DFA M has k states then M must accept some word of length at most k − 1. True or false?
True False
Yes No
(C) Suppose that L is a regular language over the alphabet Σ = {a, b}, and consider the language
L' = { 0^i | i = |w|, w ∈ L }.
Yes No
True False
(E) Let L = { wx | w ∈ Σ*, x ∈ Σ*, |w| = |x| }. Is L regular?
Yes No
(F) Let Li be a regular language, for i = 1, . . . , ∞. Is the language ∩_{i=1}^{∞} Li always regular? True or false?
True False
(G) If L1 and L2 are two regular languages accepted by two DFAs D1 and D2, respectively, with k1 and
k2 states, then the language L1 \ L2 can always be recognized by a DFA with k1 · k2 states.
True or false?
True False
(H) The minimum-size NFA for a regular language L always has strictly fewer states than the minimum-size
DFA for the language L. True or False?
True False
[Figure: state diagram of the NFA (states q1, . . . , q7; alphabet {a, b, c}).]
Fill in the following values:
(A) F =
(B) δ(q2 , a) =
(C) δ(q3 , b) =
(D) δ(q6 , c) =
(E) List the members of the set { q ∈ Q | q5 ∈ δ(q, b) }:
(F) Does the NFA accept the word acbacbb? (Yes / No)
Yes No
[Figure: the state diagrams for this problem.]
Draw M' for this case.
(B) In general, define the language of M' in terms of the language of an arbitrary M. (Hint: Make sure
that your answer works for the above example!)
Problem 5: NFA construction (8 points)
[15 minutes.]
A string y is a subsequence of string x = x1 x2 · · · xn ∈ Σ∗ , if there exists indices i1 < i2 < · · · < im
such that y = xi1 xi2 · · · xim . Note, that the empty word is a subsequence of every string. For example,
aaba is a subsequence of cadcdacba, but abc is not a subsequence of cbacba.
Let N = (Q, Σ, δ, q0, F) be an NFA with language L. Describe a construction for an NFA N' such that:
L(N') = { x | there is a y ∈ L, such that x is a subsequence of y }.
Proof:
Prove formally that L(B) = S.
Hint: The proof goes by showing that L(B) is contained in S, and that S is contained in L(B). So, you
must show both inclusions using precise arguments. You do not need to use complete mathematical notation,
but your proof should be precise, correct, short and convincing. (Make sure you are not writing unnecessary
text [like copying the hint, or writing text unrelated to the proof].)
48.2 Midterm 2 - Spring 2009
6 May 2009
NAME:
NETID: DISC:
• There are 7 problems, on pages numbered 1 through 8. Make sure you have a complete exam.
• The point value of each problem is indicated next to the problem, and in the table below.
• Points may be deducted for solutions which are correct but excessively complicated, hard to understand,
or poorly explained.
• The exam is designed for slightly over one hour, but you have the full two hours to finish it.
• It is wise to skim all problems and point values first, to best plan your time.
• This is a closed book exam. No notes of any kind are allowed. Do all work in the space provided, using
the backs of sheets if necessary. See the proctor if you need more paper.
• After the midterm is over, discuss its contents with other students in the class only after verifying
that they have also taken the exam (e.g. they aren’t about to take the conflict exam).
Yes: No:
(b) If L is a language over Σ*, h is a homomorphism, and h(L) is regular, then L must be regular. Is
this statement correct?
Yes: No:
(c) Suppose L is a language over {0}*, such that if 0^i is in L then 0^{i+2} is in L. Then L must be regular. Is
this statement correct?
Yes: No:
(d) Let L1 and L2 be two languages. If L1 and L2 are both context-free, then L1 \ L2 must also be context-free.
Is this statement correct?
Yes: No:
(f) Consider a parallel recursive automata D: it is made out of two recursive automatas B1 and B2, and
it accepts a word w if both B1 and B2 accept w. The parallel recursive automata D might accept a
language that is not context-free. Is this statement correct?
Yes: No:
where #z(w) is the number of appearances of the character z in w. For example, the word x = baccacbbcb ∈
J, since #a(x) = 2, #b(x) = 4, and #c(x) = 4. Similarly, the word y = abbccc ∉ J, since #a(y) = 1,
#b(y) = 2, and #c(y) = 3.
Give a context-free grammar whose language is J. Be sure to indicate what its start symbol is. (Hint:
First provide a CFG for the easier language K = { w ∈ {a, b}* | #a(w) = #b(w) }, and modify it into the
desired grammar.)
where #z (w) is the number of appearances of the character z in w. Prove that L is not context-free
using closure properties and the fact that B is not context-free.
(b) Let w be a word of length n in a language generated by a grammar G which is in Chomsky Normal Form
(CNF). First, how many internal nodes does the parse tree for w using G have?
Secondly, assume that G has k variables, and n = |w| > 2^k. Is the language L(G) finite or not? Justify
your answer.
48.3 Final - Spring 2009
6 May 2009
(a) Consider a regular language L, and let L' = { w | w ∈ L and w is a palindrome }. The language L' is
regular. True or false?
False: True:
(c) There exist two context-free languages L1 and L2 such that L1 ∩ L2 is also context-free.
False: True:
(d) Let T and Z be two non-deterministic Turing machine deciders. Then there is a TM that recognizes the
language L(T) ∩ L(Z).
False: True:
(e) Let h : Σ∗ → Σ∗ be a string sorting operator; that is, it sorts the characters of the given string and
returns the sorted string (it uses alphabetical ordering on the characters of Σ). For example, we have
h(baba) = aabb and h(peace) = aceep. Let L be a language, and consider the sorted language
h(L) = { h(w) | w ∈ L }.
The statement “if L is regular, then h(L) is also regular” is true or false?
False: True:
(f) Let T be a TM decider for some language L. Then h(L) (see previous question) is decidable.
False: True:
(h) If a language L is context-free and there is a Chomsky normal form (CNF) grammar for it with k variables, then
either L is infinite, or L is finite and contains only words of length < 2^{k+4}.
False: True:
(i) If a language L is accepted by a RA D then L will be accepted by a RA C which is just like D except that
the set of accepting states has been complemented.
False: True:
(j) Let f : Σ∗ → N be a function that for a string w ∈ Σ∗ it returns some positive integer number f (w).
Furthermore, assume that you are given a TM T that, given w, computes f (w) and this TM always halts.
Then the language
L = { ⟨M, w⟩ | M stops on w after at most f(w) steps }
is decidable.
False: True:
Problem 2: Classification (16 points)
For each language L described below, we have listed 2–3 language classes. Mark the most restrictive listed
class to which L must belong. E.g. if L must always be regular and we have listed “regular” and “context-free”,
mark only “regular”.
For example, if you are given a language L, and given choices Regular, Context-free, and Decidable, then
you must mark Regular if L is regular, you must mark Context-free if L is context-free and not regular, and
mark Decidable if L is Decidable but not context-free.
(a) L = { xc^n | x ∈ {a, b}*, n = |x| }.
(b) L = { ⟨G⟩ | G is a CFG and L(G) is not empty }.
(c) L = { a^i b^j c^k d^m | i + j + k + m is a multiple of 3 and i + j = k + m }.
(d) L = { ⟨w, M1, M2⟩ | w is a string, M1 and M2 are TMs, and M1 accepts w or M2 accepts w }.
(e) L = { x1#x2# . . . #xn | xi ∈ {a, b}* for each i and, for some r < s, we have xr = xs^R }.
(f) L = { ⟨G, D⟩ | G is a CFG, D is a DFA, and L(D) ∩ L(G) = ∅ }.
(g) L = { ⟨G1, G2⟩ | G1 and G2 are CFGs and |L(G1) ∩ L(G2)| > 0 }.
(h) L = { ⟨G⟩ | G is a CFG and L(G) ≠ Σ* }.
Problem 3: Context-free grammars (9 points)
(i) Let Σ = { [ , ] }, and consider the language L0 of all balanced bracketed words. For example, we have
[][] ∈ L0 and [[][]] ∈ L0, but ][ ∉ L0.
Give a context-free grammar for this language, with the start symbol S0.
(ii) Let L1 be the language of all balanced bracketed words, where the character x might appear somewhere
at most once. For example, we have [x][] ∈ L1 and [[][]] ∈ L1, but ][ ∉ L1 and [x]x ∉ L1.
Give an explicit context-free grammar for L1, with the start symbol S1. You can use the grammar for
the language L0 that you constructed above.
(iii) Let L2 be the language of all balanced bracketed words, where the character x might appear somewhere
in the word at most twice. For example, we have x[x][] ∈ L2 and [[][]] ∈ L2, but ][ ∉ L2 and
x[x]x ∉ L2.
Give a context-free grammar for this language, with the start symbol S2.
S → aTb
T → aTb | ε.
(b) A Quadratic Bounded Automaton (QBA) is a Turing machine that can use at most n² cells of the
tape (assume it has a single tape) given an input string of length n. Let
LQBA = { ⟨D, w⟩ | D is a QBA and D halts on w }.
Explain why LQBA is Turing decidable, and shortly describe a TM that decides this language.
Formally, if the BTM uses a transition of the form
p1 → p2 (labeled backtrack_q),
then it resets the current contents of the tape and the tape head position to those they had the last
time the machine was in state q, and then it sets the control to be at state p2.
In other words, assume the BTM has run through a sequence of configurations c1, c2, . . . , ck, and ck is a
configuration where the state is p1. Then, if it uses the transition t when in the configuration ck, it
can go to a new configuration c = w p2 w', where ci = w q w' is the last configuration in the sequence where
the state was q.
Joe thinks that this greatly enhances a Turing machine, as it allows the Turing machine to roll-back to
an earlier configuration by undoing changes.
Note, that a BTM can perform several (say three) consecutive transitions of backtrackx one after the
other. That would be equivalent to restoring to the second to last configuration in the execution history
with the state x in it. Also, observe that BTM are deterministic.
Show that Joe is wrong by giving, for any BTM, a deterministic Turing machine that performs the same
job. Keep your description of the deterministic Turing machine to high-level pseudo-code, and do write down
the intuition of how the deterministic Turing machine will work.
(Your answer should not exceed a hundred words.)
(Recall that for any word w, wR denotes the reverse of the word w.) We assume here that the input alphabet
of T has at least two characters in it.
Show that L is undecidable using a reduction from ATM .
Prove this directly, not by citing Rice’s Theorem.
48.4 Mock Final Exam - Spring 2009
6 May 2009
9. There is a bijection between the set of recognizable languages and the set of decidable languages.
False: True:
Problem 2: Classification (20 points)
For each language L described below, classify L as
• R: Any language satisfying the information must be regular.
• C: Any language satisfying the information must be context-free, but not all languages satisfying the
information are regular.
• DEC: Any language satisfying the information must be decidable, but not all languages satisfying the
information are context-free.
• NONDEC: Not all languages satisfying the information are decidable. (Some might be only Turing
recognizable or perhaps even not Turing recognizable.)
For each language, circle the appropriate choice (R, C, DEC, or NONDEC). If you change your
answer be sure to erase well or otherwise make your final choice clear. Ambiguously marked answers
will receive no credit.
1. R C DEC NONDEC
L = { ⟨T⟩ | T is a linear bounded automaton and L(T) = ∅ }.
2. R C DEC NONDEC
L = { ww^R w | w ∈ {a, b}* }.
3. R C DEC NONDEC
L = { w | the string w occurs on some web page indexed by Google on May 3, 2007 }.
4. R C DEC NONDEC
L = { w | w = x#x1#x2#...#xn such that n ≥ 1 and there is some i for which x ≠ xi }.
5. R C DEC NONDEC
L = { a^i b^j | i + j = 27 mod 273 }.
6. R C DEC NONDEC
L = L1 ∩ L2, where L1 and L2 are context-free languages.
7. R C DEC NONDEC
L = { ⟨T⟩ | T is a TM and L(T) is finite }.
8. R C DEC NONDEC
L = L1 \ L2, where L1 is context-free and L2 is regular.
9. R C DEC NONDEC
L = L1 ∩ L2, where L1 is regular and L2 is an arbitrary language.
Problem 3: Short answer I (8 points)
(a) Give a regular expression for the set of all strings in {0, 1}∗ that contain at most one pair of consecutive
1’s.
(b) Let M be a DFA. Sketch an algorithm for determining whether L(M ) = Σ∗ . Do not generate all strings
(up to some bound on length) and feed them one-by-one to the DFA. Your algorithm must manipulate
the DFA’s state diagram.
δ'(P, a) = ?????
Suppose N has ε-transitions. How would your answer to the previous question change?
Problem 9: RA modification (8 points)
Let L be a context-free language on the alphabet Σ. Let (M, main, {(Qm , Σ ∪ M, δm , q0m , Fm )}m∈M ) be an
RA recognizing L. Give an RA recognizing the language
L' = { xy | x ∈ L and y ∈ Σ* and |y| is even }.
48.5 Quiz 1 - Spring 2009
6 May 2009
Quiz I
(Q1) Consider L = { a^n b^n | n ≥ 0 } and the homomorphism h(a) = a and h(b) = a.
Which of the following statements is true?
(e) All of the above.
(Q6) Suppose L is a context-free language (CFL) and R is regular, then L \ R is CFL. This is
(a) True.
(b) False.
(Q7) Consider the grammar G = (V, Σ, R, S), where V = {S, X, Y}, with the rules (i.e., R) defined as
S → aSb | X.
X → cXd | Y.
Y → eYf | ε.
If we associate a language with each variable, what is the language of X ?
(a) { e^n f^n | n ≥ 0 }
(b) { c^k e^n f^n d^k | n ≥ 0, k ≥ 0 }
(c) { c^n e^n f^n d^n | n ≥ 0 }
(d) { a^n c^n e^n f^n d^n b^n | n ≥ 0 }
(a) 0012202
(b) 002
(c) (122222)2
[Figure: the recursive automaton S, with states q0, . . . , q4.]
(a) { (^n )^n | n ≥ 0 }
(b) { (^n )^n (^m )^m | n, m ≥ 0 }
(c) { w | w is a string with balanced parenthesis }.
(Q10) Consider the execution trace of the string (()) on the following recursive-automata.

[Figure: the recursive automaton S, with states q0, . . . , q4.]

(b) q0,⟨⟩ --(--> q1,⟨⟩ --S--> q0,⟨q2⟩ --(--> q1,⟨q2⟩ --S--> q0,⟨q2, q2⟩ --> q3,⟨q2, q2⟩
    --pop--> q2,⟨q2⟩ --)--> q3,⟨q2⟩ --pop--> q2,⟨⟩ --)--> q3,⟨⟩.

(c) q0,⟨⟩ --(--> q1,⟨⟩ --S--> q4,⟨q2⟩ --(--> q4,⟨q2⟩ --S--> q0,⟨q2, q2⟩ --> q3,⟨q2, q2⟩
    --pop--> q2,⟨q2⟩ --)--> q3,⟨q2⟩ --pop--> q2,⟨⟩ --)--> q3,⟨⟩.
48.6 Quiz 2 – Spring 2009
6 May 2009
(Q1) Consider the Turing machine TM = (Q, Σ, Γ, δ, q0, qacc, qrej), where Q = {q0, q1, q2, qacc, qrej}, Σ =
{0, 1} and Γ = {0, 1, B}, and the transitions are defined as follows:
δ(q0 , 0) = (q1 , 0, R); δ(q0 , 1) = (q1 , 1, R);
δ(q1 , 0) = (q2 , 0, R); δ(q1 , 1) = (q2 , 1, R);
δ(q2 , B) = (qacc , B, R) and all other transitions are to the reject state.
Assuming that we start the T M on some input and the current configuration is 1q1 100, what is the
next configuration?
(Q2) Consider the following statements about the T M described in the question above. Which of them is
correct?
(a) the T M always halts.
(b) there exist inputs on which the T M does not halt.
(c) there exist inputs on which the T M halts and inputs on which it does not halt.
(Q3) How many Turing machines can you have with the same Q, Σ, Γ as described in the first question?
(a) 30^9.
(b) 30^6.
(c) 30^15.
(d) infinitely many.
(Q4) Which of the following is true?
The set of Turing recognizable languages is closed under
(a) complementation and intersection.
(b) intersection.
(c) complementation.
(d) neither intersection nor complement.
(Q5) Suppose L is undecidable and suppose L is a Turing recognizable language. Which of the following
statements is true?
(a) L is not recognizable.
(b) L is recognizable.
(c) L may or may not be recognizable.
(Q6) Which of the following properties is a property of the language of a TM M?
(a) M is a TM that has 481 states.
(b) M is a TM and |L(M)| = 481.
(c) M is a TM that takes more than 481 steps on some input.
(Q7) Consider the language L = { ⟨M⟩ | M takes no more than 481 steps on some input }.
Which of the following statements is true?
(a) L is decidable.
(b) L is not decidable.
(c) L may be decidable or may not be decidable.
(Q8) Consider the language L = { ⟨M⟩ | M has 481 states }.
Which of the following statements is true?
(a) L is decidable and recognizable.
(b) L is decidable but not recognizable.
(c) L is not decidable but recognizable.
(d) L is not decidable nor recognizable.
(Q9) For any TM M and word w, consider a procedure WriteN(M, w) that produces the following procedure
N as output:
Assume the procedure IsEmpty(M) is one that takes a Turing machine M as input and returns “Yes”
if L(M) is empty and “No” otherwise.
Consider the following procedure:
Decider(M, w) { N = WriteN(M, w); return ¬IsEmpty(N) }
(a) G is decidable.
(b) G is undecidable by Rice's theorem.
(c) G is not decidable, as we can reduce L = { ⟨M⟩ | L(M) = ∅ } to it.
(d) G is decidable, as we can reduce L = { ⟨M⟩ | L(M) = ∅ } to it.
Chapter 49
49.1 Midterm 1
19 February 2008
1. If an NFA M accepts the empty string (ε), does M's start state have to be an accepting state? Why
or why not?
3. Suppose that an NFA M = (Q, Σ, δ, q0, F) accepts a language L. Create a new NFA M' by flipping
the accept/non-accept markings on M. That is, M' = (Q, Σ, δ, q0, Q − F). Does M' accept the complement
of L? Why or why not?
2. Let Σ and Γ be alphabets. Suppose that h is a function from Σ∗ to Γ∗ . Define what it means for h to
be a homomorphism.
(a) F =
(b) δ(A, 0) =
(c) δ(C, 1) =
(d) δ(D, 1) =
(e) List the members of the set {q ∈ Q | D ∈ δ(q, 2)}:
(f) Does the NFA accept the word 11120? (Yes / No)
[Figure: the state diagram of the NFA (states A, B, C).]
(a) Briefly explain the idea behind your construction, using English and/or pictures.
(b) Suppose that M = (Q, Σ, δ, q0, F). Give the details of your construction of M', using tuple notation.
49.2 Midterm 2
27 March 2008
(a) Is the language { ww^R w | w ∈ {a, b}* } a context-free language?
Yes: No:
(b) If L is a non-regular language over Σ∗ , and h is a homomorphism, then h(L) must also be non-regular.
Is this statement correct?
Yes: No:
(c) Suppose all the words in language L are no more than 1024 characters long. Then L must be regular.
Is this statement correct?
Yes: No:
If L1 and L2 are both context-free, then L1 ⊕ L2 must also be context-free. Is this statement correct?
Yes: No:
If L is context-free, then the language P (L) is context free. Is this statement correct?
Yes: No:
(f) A PDA that is allowed to enter the accept state only if the stack is empty is a strict PDA . There are
context-free languages for which no strict PDA exists. Is this statement correct? Yes: No:
Problem 3: PDA design (8 points)
Let
J = { w ∈ {a, b}* | w contains only a's, or w has an equal number of a's and b's }.
(b) Define what it means for a grammar G to be in Chomsky Normal Form (CNF).
Suppose that L were regular. Let p be the constant given by the pumping lemma.
Since is not in L, we have a contradiction. Therefore, L must not have been regular.
Fill in
Describe how to construct a PDA recognizing L0 . (You can safely assume that # is not in the alphabets Σ
and Γ used by M and L.)
(a) Describe the ideas behind your construction in words and/or pictures.
(b) When you read the # from the input, or shortly after you read it, you will need to do something about
whatever is left on the stack from reading the x part of the input string. Do you need to push anything
onto the stack or pop anything off? If so, what? Namely, describe what your PDA does upon reading
#.
(c) Give the details of your construction in formal notation. That is, for the new PDA recognizing L0 , specify
the set of states, the initial and final states, the stack alphabet, the details of the transition function.
The input alphabet for the new machine will be Σ ∪ {#}.
Problem 7: Induction (8 points)
Let Σ = {a, b}. Given any string w in Σ∗ , let A(w) be the number of a’s in w and B(w) be the number of
b’s in w.
Suppose that grammar G has the following rules:
where S is the start symbol. Use induction on the derivation length to prove that A(w) ≥ B(w) for any
string w in L(G).
49.3 Final – Spring 2008
7 May 2008
• The exam contains 9 problems on pages numbered 1 through 10. Make sure you have a complete exam.
• The point value of each problem is indicated next to the problem and in the table on the right.
• It is wise to skim all problems and point values first, to best plan your time. If you get stuck on a
problem, move on and come back to it later.
• Points may be deducted for solutions which are correct but excessively complicated, hard to understand,
hard to read, or poorly explained.
• If you change your answers (especially for T/F questions or multiple-choice questions), be sure to erase
well or otherwise make sure your final choice is clear.
• Please bring apparent bugs or unclear questions to the attention of the proctors.
• This is a closed book exam. You may only consult a cheat sheet, handwritten (by yourself) on a single
8.5 x 11 inch sheet (both sides is ok). You can only use your normal eyeglasses (if any) to read it, e.g.
no magnifying glasses.
• Do all work in the space provided, using the backs of sheets if necessary. See the proctor if you need
more paper.
(b) Let G be a context-free grammar given in Chomsky normal form (CNF). Since the grammar G is in CNF
form it must be ambiguous.
False: True:
(d) Suppose that the language L = { a^n b^n c^n | n ≥ 0 } is not context-free. Let h(·) be a homomorphism.
Then the language h(L) cannot be context-free.
False: True:
(e) For any k > 1, there is no language that is decided by a TM with k tapes, but is undecidable by any
TM having k − 1 (or less) tapes.
False: True:
(g) If a language L is regular and recognized by a DFA with k states, then either L is infinite, or L is finite
and contains only words of length < k.
False: True:
(h) If a language L is accepted by a PDA P then L will be accepted by a PDA R which is just like P except
that the set of accepting states has been complemented.
False: True:
(i) The language A_TM = { ⟨M, w⟩ | M does not accept w } is TM recognizable.
False: True:
(b) L = { ⟨G⟩ | G is a CFG and G is not ambiguous }
(c) L = { a^i b^j c^k d^m | i + j + k + m is a multiple of 13 }
(d) L = { ⟨w, M1, M2, . . . , Mk⟩ | w is a string, k is an odd number larger than 2, each Mi is a TM,
and a majority of the Mi’s accept w }
(e) L = {x1 #x2 # . . . #xn | xi ∈ {a, b}∗ for each i and, for some i, xi is a palindrome}.
(f) L = { ⟨G, D⟩ | G is a CFG, D is a DFA, and L(G) ⊆ L(D) }
(g) L = { ⟨G⟩ | G is a CFG and L(G) is finite }
(h) L = { ⟨G⟩ | G is a CFG and L(G) ≠ Σ∗ }
(a) [4 points] Give a context-free grammar for L for the case that x and y have the same length, where the
start symbol is S.
(b) [4 points] Give a context-free grammar for L for the case that x and y have different lengths, where
the start symbol is X. (You can use portions of the grammar you created for part (a) in your answer, if
you want to. Do not re-use variables from your answer to part (a).)
(c) [1 point] Give a context-free grammar for L for all cases. Let the start symbol be Y. (You can use
portions of the grammar you created for parts (a) and (b) in your answer, if you want to.)
Prove that L is not regular by filling in the missing parts of the following pumping lemma proof.
Suppose that L were regular. Let p be the pumping length given by the pumping lemma.
Consider the string wp = ______
Because wp ∈ L and |wp | ≥ p, there must exist strings x, y, and z such that wp = xyz, |xy| ≤ p, |y| > 0,
and xy^i z ∈ L for every i ≥ 0.
Since ______ isn’t in L, because ______,
As such, we have a contradiction. Therefore, L must not have been regular.
Let δ be the transition function of M, and δ′ be the transition function of M′, recalling that M′ is a
two-tape machine. Suppose that in M, δ(p, a) = (p′, b, D) where D ∈ {L, R, S} indicates that M moves
left, right, or remains in one place, respectively. Then your construction of M′ should somehow simulate
this transition of M . Below, give values of the place-holder keywords to describe how your simulation
works.
(c) For the above transition of M , we create the transition
δ′(state, tape1sym, tape2sym) = (new-state, new-tape1sym, new-tape2sym, direction1, direction2)
where (fill in the blanks)
direction1 = ______   direction2 = ______
Problem 7: Palindrome (10 points)
Let L = { ⟨M⟩ | M is a Turing machine and M accepts at least one palindrome }.
Show that L is TM-recognizable, i.e., explain how to construct a Turing machine that accepts ⟨M⟩ exactly
when M accepts at least one palindrome. Assume that it is easy to extract M’s alphabet Σ from ⟨M⟩. Of
course, M might run forever on some input strings.
Suppose that R is a decider for E_CFG and M is a PDA accepting J. Show how to construct a decider for L.
49.4 Mock Final
Spring 2007
This mock final exam was generated from a previous semester’s exam. Some parts were modified or removed (if
they were not relevant).
• The exam contains 12 pages and 11 problems. Make sure you have a complete exam.
• The point value of each problem is indicated next to the problem and in the table below.
• It is wise to skim all problems and point values first, to best plan your time. If you get stuck on a
problem, move on and come back to it later.
• Points may be deducted for solutions which are correct but excessively complicated, hard to understand,
hard to read, or poorly explained.
• This is a closed book exam. No notes of any kind are allowed. Do all work in the space provided, using
the backs of sheets if necessary. See the proctor if you need more paper.
• Please bring apparent bugs or unclear questions to the attention of the proctors.
(a) Let M be a DFA with n states such that L(M ) is infinite. Then L(M ) contains a string of length at
most 2n − 1.
(b) Let Lw = { ⟨M⟩ | M is a TM and M accepts w }, where w is some fixed string. Then there is an
enumerator for Lw.
(d) For a TM M and a string w, let CH_{M,w} = { x | x is an accepting computation history for M on w }.
Then CH_{M,w} is decidable.
(e) The language { ⟨M, w⟩ | M is a linear bounded automaton and M accepts w } is undecidable.
(f) The language { ⟨G⟩ | G is a context-free grammar and G is ambiguous } is Turing-recognizable.
(h) There is a bijection between the set of Turing-recognizable languages and the set of decidable languages.
Problem 2: Classification (20 points)
For each language L described below, classify L as
• R: Any language satisfying the information must be regular.
• C: Any language satisfying the information must be context-free, but not all languages satisfying the
information are regular.
• DEC: Any language satisfying the information must be decidable, but not all languages satisfying the
information are context-free.
• NONDEC: Not all languages satisfying the information are decidable. (Some might be only Turing
recognizable or perhaps even not Turing recognizable.)
For each language, circle the appropriate choice (R, C, DEC, or NONDEC). If you change your
answer be sure to erase well or otherwise make your final choice clear. Ambiguously marked answers
will receive no credit.
1. R C DEC NONDEC
L = { ⟨M⟩ | M is a linear bounded automaton and L(M) = ∅ }.
2. R C DEC NONDEC
L = { w w^R w | w ∈ {a, b}∗ }.
3. R C DEC NONDEC
L = {w | the string w occurs on some web page indexed by Google on May 3, 2007}
4. R C DEC NONDEC
L = { w | w = x#x1#x2#. . . #xn such that n ≥ 1 and there is some i for which x ≠ xi }.
5. R C DEC NONDEC
L = { a^i b^j | i + j ≡ 27 (mod 273) }
6. R C DEC NONDEC
L = L1 ∩ L2 where L1 and L2 are context-free languages
7. R C DEC NONDEC
L = { ⟨M⟩ | M is a TM and L(M) is finite }
8. R C DEC NONDEC
L = L1 − L2 where L1 is context-free and L2 is regular.
9. R C DEC NONDEC
L = L1 ∩ L2 where L1 is regular and L2 is an arbitrary language.
(b) Let M be a DFA. Sketch an algorithm for determining whether L(M ) = Σ∗ . Do not generate all strings
(up to some bound on length) and feed them one-by-one to the DFA. Your algorithm must manipulate
the DFA’s state diagram.
Problem 5: Pumping Lemma (8 points)
For each of the languages below you can apply the pumping lemma either to prove that the language is
non-regular or to prove that the language is not context-free. Assuming that p is the pumping length, your
goal is to give a candidate string for wp that can be pumped appropriately to obtain a correct proof. You
ONLY need to give the string and no further justification is necessary. Assume Σ = {a, b} unless specified
explicitly.
(b) L = {w | w contains at least twice as many a’s as b’s}. To show L is not regular using the pumping
lemma
wp =
δ′(P, a) =
Suppose N has ε-transitions. How would your answer to the previous question change?
δ′(P, a) =
Problem 10: Decidability (6 points)
Show that
EQINT_DFA = { ⟨A, B, C⟩ | A, B, C are DFAs over the same alphabet Σ and L(A) = L(B) ∩ L(C) }
is decidable.
This question does not require detail at the level of tuple notation. Rather, keep your proof short by
exploiting theorems and constructions we’ve seen in class.
49.5 Mock Final with Solutions
Spring 2007
1. Let M be a DFA with n states such that L(M ) is infinite. Then L(M ) contains a string of length at
most 2n − 1.
Solution: True.
2. Let Lw = { ⟨M⟩ | M is a TM and M accepts w }, where w is some fixed string. Then there is an
enumerator for Lw.
Solution: True.
4. For a TM M and a string w, let CH_{M,w} = { x | x is an accepting computation history for M on w }.
Then CH_{M,w} is decidable.
5. The language { ⟨M, w⟩ | M is a linear bounded automaton and M accepts w } is undecidable.
Solution: False.
The language { ⟨G⟩ | G is a context-free grammar and G is ambiguous } is Turing-recognizable.
Solution: True.
9. There is a bijection between the set of Turing-recognizable languages and the set of decidable languages.
Solution: True.
Problem 2: Classification (20 points)
For each language L described below, classify L as
• R: Any language satisfying the information must be regular.
• C: Any language satisfying the information must be context-free, but not all languages satisfying the
information are regular.
• DEC: Any language satisfying the information must be decidable, but not all languages satisfying the
information are context-free.
• NONDEC: Not all languages satisfying the information are decidable. (Some might be only Turing
recognizable or perhaps even not Turing recognizable.)
For each language, circle the appropriate choice (R, C, DEC, or NONDEC). If you change your
answer be sure to erase well or otherwise make your final choice clear. Ambiguously marked answers
will receive no credit.
1. R C DEC NONDEC
L = { ⟨M⟩ | M is a linear bounded automaton and L(M) = ∅ }.
Solution: NONDEC
2. R C DEC NONDEC
L = { w w^R w | w ∈ {a, b}∗ }.
Solution: DEC
3. R C DEC NONDEC
L = {w | the string w occurs on some web page indexed by Google on May 3, 2007}
Solution: R
4. R C DEC NONDEC
L = { w | w = x#x1#x2#. . . #xn such that n ≥ 1 and there is some i for which x ≠ xi }.
Solution: C
5. R C DEC NONDEC
L = { a^i b^j | i + j ≡ 27 (mod 273) }
Solution: R
6. R C DEC NONDEC
L = L1 ∩ L2 where L1 and L2 are context-free languages
Solution: DEC
7. R C DEC NONDEC
L = { ⟨M⟩ | M is a TM and L(M) is finite }
Solution: NONDEC
8. R C DEC NONDEC
L = L1 − L2 where L1 is context-free and L2 is regular.
Solution: C
9. R C DEC NONDEC
L = L1 ∩ L2 where L1 is regular and L2 is an arbitrary language.
Solution: NONDEC
Solution:
First, a regular expression for the set of strings that have no consecutive 1’s is:
R = 0∗·(10·0∗)∗·(ε + 1).
The regular expression for the set of strings that contain at most one pair of consecutive 1’s is just:
R·R = 0∗·(10·0∗)∗·(ε + 1)·0∗·(10·0∗)∗·(ε + 1)
since a string w has at most one pair of consecutive 1’s iff w = xy, for some x, y, where x and y have no pair of consecutive
ones.
There are, of course, many solutions to this question; whatever your solution is, be careful to check border cases. For example,
check if you allow initial 0’s, final 0’s, allow 1 to occur in the beginning and the end, allow ε, and also test your expression
against some random strings, some in the language and some outside it.
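Following that testing advice mechanically: below is a small Python sanity check (not part of the original solution). The helper at_most_one_11 is a hypothetical reference predicate written straight from the language description; the check compares it against the expression on all short binary strings.

```python
import re
from itertools import product

# R: no two consecutive 1's, i.e. 0*(10 0*)*(eps + 1), in Python regex syntax
R = r"0*(?:100*)*1?"
RR = re.compile(f"(?:{R})(?:{R})")   # R.R: at most one pair of consecutive 1's

def at_most_one_11(w: str) -> bool:
    # reference predicate, straight from the language definition
    return sum(w[i:i + 2] == "11" for i in range(len(w) - 1)) <= 1

# compare the expression against the reference on all strings of length < 10
for n in range(10):
    for bits in product("01", repeat=n):
        w = "".join(bits)
        assert (RR.fullmatch(w) is not None) == at_most_one_11(w), w
```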
2. Let M be a DFA. Sketch an algorithm for determining whether L(M ) = Σ∗ . Do not generate all strings
(up to some bound on length) and feed them one-by-one to the DFA. Your algorithm must manipulate
the DFA’s state diagram.
Solution:
If M does not accept all of Σ∗, then there must be a word w ∉ L(M). On the word w, M would reach a unique state
which is non-final. Also, if there is some way to reach a non-final state from the initial state, then clearly the DFA does
not accept the word that labels the path to that non-final state. Hence it is easy to see that the DFA M does not accept
Σ∗ iff there is a path from the initial state to a non-final state (which can be the initial state itself).
The algorithm for detecting whether L(M ) = Σ∗ proceeds as follows:
1. Check whether the input is a valid DFA (in particular, make sure it is complete; i.e., from any state, on any
input letter, there is a transition to some state).
2. Consider the transition graph of the DFA.
3. Do a depth-first search on this graph, searching for a state that is not final.
4. If any non-final state is reached on this search, report that M does not accept Σ∗ ; otherwise report that M accepts
Σ∗ .
An alternative solution is to flip the final states to non-final and the non-final states to final (i.e., complementing
the DFA), and then check the emptiness of the resulting automaton by searching for a final state reachable from the
initial state.
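A minimal sketch of the first algorithm in Python, assuming the DFA is given as a complete transition table delta (a dict); the names are illustrative, not from the original.

```python
def accepts_everything(states, alphabet, delta, start, finals):
    """L(M) = Sigma* iff no non-final state is reachable from the start state."""
    seen, todo = {start}, [start]
    while todo:
        q = todo.pop()
        if q not in finals:
            return False              # the word labeling the path to q is rejected
        for a in alphabet:
            r = delta[(q, a)]         # assumes the DFA is complete
            if r not in seen:
                seen.add(r)
                todo.append(r)
    return True
```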
Solution:
The grammar G′ = (V′, Σ, R′, S′) where
• V′ = V ∪ {S′}, where S′ is a new variable, not in V;
• R′ consists of all rules in R as well as the following rules:
– One rule S′ → aS′a, for each a ∈ Σ
– The rule S′ → S
Intuitively, S′ is the new start variable, and for any word wxw^R ∈ L′, S′ generates it by first generating the w and
w^R parts, and then calling S to generate x.
2. Let P = { [a/c], [a/aa], [cba/b] } (written here as [top/bottom]) be an instance of the Post correspondence problem. Does P have a match?
Show a match or explain why no match is possible.
Solution:
Yes, the PCP has a match: [a/aa] [a/c] [cba/b] [a/aa].
The top word and the bottom word are both aacbaa.
For each of the languages below you can apply the pumping lemma either to prove that the language is
non-regular or to prove that the language is not context-free. Assuming that p is the pumping length, your
goal is to give a candidate string for wp that can be pumped appropriately to obtain a correct proof. You
ONLY need to give the string and no further justification is necessary. Assume Σ = {a, b} unless specified
explicitly.
Solution:
wp = a^p b b a^p
(b) L = {w | w contains at least twice as many a’s as b’s}. To show L is not regular using the pumping
lemma. Solution:
wp = b^p a^{2p}
Give the state diagram of a TM M that does the following on input #w where w ∈ {0, 1}∗ . Let n = |w|. If
n is even, then M converts #w to #0^n. If n is odd, then M converts #w to #1^n. Assume that ε is an even-
length string.
The TM should enter the accept state after the conversion. We don’t care where you leave the head at
the end of the conversion. The TM should enter the reject state if the input string is not in the right format.
However, your state diagram does not need to explicitly show the reject state or the transitions into it.
Solution:
The Turing machine’s description is given by the following state diagram. [Figure lost in extraction: state diagram with states qin, q0, q1, qacc.]
In the diagram, the initial state is qin , and the accept state is qacc ; the reject state is not shown, and we assume that all
transitions from states that are not depicted go to the reject state. We are assuming that the blank tape-symbol is #.
Intuitively, the TM first reads the tape content w, moving right, alternating between states q0 and q1 in order to determine
whether |w| is even or odd. If |w| is even, it ends in state q0 , moves left rewriting every letter in w with a 0, till it reaches the
first symbol on the tape, and then accepts and halts. If |w| is odd, it does the same except that it rewrites letters in w with
1’s.
δ′(P, a) = { q ∈ Q | ∃p ∈ P, q ∈ E(δ(p, a)) }
where
E(R) = { s | s can be reached from some state in R using zero or more ε-edges }
is the ε-closure of the set R.
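A small Python sketch of E(·) and the resulting δ′, assuming the NFA's letter-transitions and ε-edges are given as the dictionaries moves and eps (both names are illustrative assumptions):

```python
def E(R, eps):
    """Epsilon-closure: everything reachable from R by zero or more eps-edges."""
    closure, todo = set(R), list(R)
    while todo:
        s = todo.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                todo.append(t)
    return closure

def delta_prime(P, a, moves, eps):
    """delta'(P, a) = { q | exists p in P with q in E(delta(p, a)) }."""
    out = set()
    for p in P:
        out |= E(moves.get((p, a), ()), eps)
    return out
```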
Solution:
The complement language is
ALL_CFG^c = { ⟨G⟩ | G is not the encoding of a CFG, or G is a CFG and L(G) ≠ Σ∗ }.
First, recall that membership of a word in the language generated by a grammar is decidable, i.e., for any grammar
G and any word w, we can build a TM that decides whether G generates w.
Also, for any Σ, we can fix some ordering of the symbols in Σ, and enumerate all the words in Σ∗ in lexicographic
order. In particular, we can build a TM that constructs the i-th word in Σ∗ for any given i.
We can build a TM recognizing ALL_CFG^c as follows:
1. Input is hGi.
2. Check if hGi is a proper encoding of a CFG; if not, halt and accept.
3. Set i := 1;
4. while (true) do {
5. Generate the i’th word wi in Σ∗ , where Σ is the set of terminals in G.
6. Check if wi is generated by G. If it is not, halt and accept.
7. Increment i;
8. }
The TM above systematically generates all the words in Σ∗ and checks whether there is any word that is not generated by G;
if it finds one, it halts and accepts ⟨G⟩.
Note that if ⟨G⟩ is either not a well-formed grammar or L(G) ≠ Σ∗, then the TM will eventually halt and accept.
If L(G) = Σ∗, the TM will never halt, and hence will never accept.
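The loop above, as a Python sketch of the recognizer for ALL_CFG^c. The helpers is_cfg, cfg_generates, and words_of are assumptions standing in for the decidable checks described in the solution:

```python
def recognize_complement_of_all_cfg(G):
    """Sketch of the recognizer above. Assumed helpers: is_cfg(G) checks the
    encoding, cfg_generates(G, w) is the decidable membership test (e.g. CYK),
    and words_of(Sigma) yields Sigma* in lexicographic order."""
    if not is_cfg(G):
        return True                    # malformed encoding: halt and accept
    for w in words_of(G.terminals):    # G.terminals is an assumed attribute
        if not cfg_generates(G, w):
            return True                # a word outside L(G): halt and accept
    # if L(G) = Sigma*, the loop never ends, so we never accept
```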
Solution:
ALL_CFG is not Turing-recognizable. We know that ALL_CFG is not Turing-decidable (the universality problem
for context-free grammars is undecidable), and we showed above that its complement ALL_CFG^c is Turing-recognizable. Also, for any
language R, if both R and its complement are Turing-recognizable, then R is Turing-decidable. Hence, if ALL_CFG were Turing-recognizable,
then ALL_CFG would be Turing-decidable, which we know is not true. Hence ALL_CFG is not Turing-recognizable.
Solution:
The idea would be to construct the PDA M′ which will essentially simulate M, and from any of the final states
of M, nondeterministically jump on an ε-transition to a new state that checks whether the rest of the input is of even
length.
Solution:
M′ = (Q′, Σ, Γ, δ′, q, {p0}), where Q′ = Q ∪ {p0, p1}, where p0 and p1 are two new states (that are not in Q), and the transition
function δ′ : Q′ × Σε × Γε → P(Q′ × Γε) is defined as follows:
• For every q ∈ Q,
δ′(q, ε, ε) = δ(q, ε, ε) ∪ {(p0, ε)}, if q ∈ F, and
δ′(q, ε, ε) = δ(q, ε, ε), if q ∉ F.
• For every a ∈ Σ,
δ′(p0, a, ε) = {(p1, ε)}, and δ′(p1, a, ε) = {(p0, ε)}.
• For every a ∈ Σε, d ∈ Γε, where either a = ε or d ≠ ε,
δ′(p0, a, d) = ∅, and δ′(p1, a, d) = ∅.
Solution:
Let A_TM = { ⟨M, w⟩ | M is a TM that accepts w }. We know that A_TM is undecidable. Let us reduce A_TM to ODD_TM
to prove that ODD_TM is undecidable. In other words, given a decider R for ODD_TM, let us show how to build a decider for
A_TM.
The decider D for A_TM works as follows:
1. The input is ⟨M, w⟩.
2. Construct the code for a TM N_{M,w} that works as follows:
   (a) The input to N_{M,w} is a word x.
   (b) Simulate M on w; accept x if M accepts w.
3. Feed ⟨N_{M,w}⟩ to R, the decider for ODD_TM.
4. Accept if R rejects; reject if R accepts.
For any TM M and word w, if M accepts w, then N_{M,w} is a TM that accepts all words, and if M does not accept w then
N_{M,w} accepts no word. Hence M accepts w iff L(N_{M,w}) contains an odd-length word.
Hence D accepts ⟨M, w⟩ iff R rejects ⟨N_{M,w}⟩ iff L(N_{M,w}) contains an odd-length word iff M accepts w. Hence D decides
A_TM.
Note that D does not simulate M on w; it simply constructs the code of the TM N_{M,w} (which, if run, may simulate M on w).
So D simply constructs N_{M,w} and runs R on it. Since R is a decider, it always halts, and hence D also always halts.
Since A_TM reduces to ODD_TM, i.e., any decider for ODD_TM can be turned into a decider for A_TM, we know that if
ODD_TM is decidable, then A_TM is decidable as well. Since we know A_TM is undecidable, ODD_TM must be undecidable.
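A compact sketch of D in Python, where R is the hypothetical decider for ODD_TM and build_N stands for the construction of N_{M,w} described above (both are assumptions of the sketch):

```python
def D(M, w, R):
    """Sketch of the decider for A_TM, given a decider R for ODD_TM."""
    N = build_N(M, w)   # N accepts every word iff M accepts w, else no word
    return not R(N)     # accept iff R rejects <N>; R always halts, so D halts too
```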
49.6 Quiz 1
6 February 2008
This quiz has 3 pages containing 7 questions. None requires a long answer. No proofs are required;
explanations are only required when explicitly requested. Ensure your answers are legible. You have 20
minutes to finish.
1. (2 points) To formally define a DFA, what five components do you need to specify?
2. (3 points) Suppose that A = {aa, bb} and B = {1, 2}. List the members of B × P(A).
3. (2 points) Is the following a valid state diagram for a DFA? Explain your answer.
[Figure lost in extraction: state diagram with states A, B, C, D and edges labeled 0 and 1.]
[Figure lost in extraction: NFA state diagram with states Q0–Q5 and edges labeled a, b, and ε.]
Suppose the transition function is named δ. Fill in the following output values for the transition
function:
(a) δ(Q0, a) =
(b) δ(Q4, a) =
(c) δ(Q4, ε) =
5. (5 points) Give the state diagram of an NFA which recognizes the language represented by the regular
expression (a + ε)(ba)∗ba. (It’s not necessary to follow any specific mechanical construction.)
6. (3 points) Give a regular expression for the following language
49.7 Quiz 2
5 March 2008
This quiz has 3 pages containing 7 questions. None requires a long answer. No proofs are required;
explanations are only required when explicitly requested. Ensure your answers are legible. You have 20
minutes to finish.
1. (2 points) Given a context-free grammar G with start symbol S, generating language L, explain how
to construct a grammar for L∗ .
2. (5 points) Finish the following statement of the pumping lemma.
If L is a regular language, there is an integer p such that, if w is any string in L of length at least p,
then we can divide w into substrings w = xyz such that
(1) |y| ≥ 1
(2)
(3)
3. (3 points) Suppose that δ is the transition function for a PDA. What types of objects does δ take as
input? What type of object does δ produce as its output value?
4. (4 points) Let our alphabet be Σ = {a, b, c}. Give a context-free grammar that generates L = { a^n b^j c^n |
n ≥ 0, j ≥ 1 }. Just give the rule(s) for the grammar. Your start symbol should be S and all variables
should be uppercase letters.
5. (2 points) Let Σ = {0, 1}. Let L = { 0^n x 1^n | x ∈ Σ∗, n ≥ 1 }. Is L regular? Why or why not?
6. (5 points) Suppose we know that the language B = { a^n b^n | n ≥ 0 } is not regular. Let L = { a^n b^j c^n |
j ≥ 1, n ≥ 1 }. Prove that L is not regular using closure properties and the fact that B isn’t regular.
7. (4 points) Let G be the grammar with start symbol S and the following rules:
S → AS | B
A → ab | b
B→c
Give a parse tree and a leftmost derivation for the string abbc.
49.8 Quiz 3
23 April 2008
This quiz has 6 questions. None requires a long answer. No proofs are required; explanations are only
required when explicitly requested. Ensure your answers are legible. You have 20 minutes to finish.
(c) There are more possible languages than there are different Turing machines.
2. (4 points) Suppose that a Turing machine M contains the transition δ(q, 0) = (r, 2, R) (q and r are
states; 0 and 2 are tape symbols.) If M is now in configuration 021q03, what will its next configuration
be?
3. (3 points) An LBA is a Turing machine with one important restriction. What’s the restriction?
4. (4 points) Briefly explain why the following statement is true. (Don’t give a proof, just outline the key
ideas.)
5. (4 points) Suppose that an extended TM is like a normal TM, except its transitions have the option
of staying in place (S) rather than always having to move left or right. Show how to simulate the
transition δ(s, a) = (r, b, S) using transitions of a normal TM. (Either write out the transitions as
equations or draw a fragment of a state diagram.)
6. (4 points) Let M be a TM and w be a string. Define a TM Mw (with input alphabet Σ) as follows:
• Input = x
• If x is a palindrome, then accept.
• Otherwise, simulate M on w.
• Accept if M accepts. Reject if M rejects.
Part IV
Homeworks
Chapter 50
Spring 2009
Show why the above induction argument is wrong. In particular, illustrate one set for which the
inductive argument fails.
4. True/false/whatever.
[Category: Notation, Points: 20]
Answer each of the following with true, false or meaningless. The notation \ denotes “set-minus” or
“set-difference”.
D1) ∅ ∈ {∅, 1}
D2) ∅ ⊆ {∅, 1}
D3) 1 ⊆ {∅, 1}
D4) {1} + {1, 2} = {1, 2}
D5) {1, 2} \ {1} = {2}
D6) {1, 2} \ {0} = {1, 2}
D7) {1, 2} ∩ {3, 4} = {}
D8) {1, 2} ∩ {3, 4} = {∅}
D9) {1, 2} ∪ {1, 3} = {1, 1, 2, 3}
D10) {1, 2} ∪ {1, 3} = {1, 2, 3}
D11) {1, {1}, {{1}}} = {1}
D12) {1, {1}, {{1}}} = {1, 1, 1}
D13) {1} ∈ {1, {1}, {{1}}}
D14) {1} ⊆ {1, {1}, {{1}}}
D15) {{1}} ∈ {1, {1}, {{1}}}
D16) {A, B} × {C, D} = {(A, B), (C, D)}.
D17) {A, B} × {C, D} ∩ {C, D} × {A, B} = {}.
D18) |{A, B, C} × {D, E}| = 6.
D19) {A, B} × {} = {A, B}.
D20) {A, B} \ {B, A} = {}.
5. Getting to 100.
[Category: Proof, Points: 10]
(Extra credit.)
We are given one copy of every digit in the list 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. You are asked to form numbers
from these digits (you can use each digit only once, and you must use each digit), such that these
numbers add up to 100.
For example, a valid set of numbers you can form from the digits is {23, 17, 40, 5, 6, 8, 9}, but it adds
up to 108, not 100.
Prove that this is impossible; i.e., no matter how you form the numbers, you cannot find a way for
them to add up to exactly 100.
1. DFA building
[Category: Construction, Points: 10]
Let Σ = {a, b}. Let L be the set of words w in Σ∗ such that w has an even number of b’s and an odd
number of a’s and does not contain the substring ba.
(a) [1 Points] Give a regular expression for the language of all strings over Σ that do not contain ba
as a substring.
(b) [1 Points] Give a regular expression for the language L above (shorter expression is better, natu-
rally).
(c) [8 Points] Construct a deterministic finite automaton for L that has at most 5 states. Make sure
that your DFA is complete (has transitions on all letters from all states).
If you find this hard, you can also give a DFA with more states that accepts L for partial credit.
Hint: You may want to try to enumerate some elements in L, and see whether you can simplify
the description of L.
2. Control.
[Category: Construction, Points: 10]
We want to build the controller for an automatic door, that opens for some time when a sensor senses
that a person is approaching the door.
Let Σ = {tick, approach, open, close}.
Every odd event is a “tick” or an “approach”, and every even event is the controller’s response to it,
“open” or “close”. An “open” event directs the door to open if it is closed, and to remain open if it
is open. Similarly, a “close” event directs the door to close or remain closed. An “approach” event
happens when the sensor detects that a person is approaching the door, and a “tick” event denotes
the passage of one second of time. We want the door to open immediately after detecting an “approach”
event, and remain open for exactly 2 ticks after the last “approach” event, at which point the door
must close. (Thus, the input has even length and is made out of pairs: each pair is a tick or approach event,
followed by the controller’s open or close response.)
Build an automaton that accepts all valid sequences of this controller behavior. Your automaton should
be over the alphabet Σ, and, for example, must accept
tick .close.approach.open.tick .open.tick .open.tick .close,
and accepts
approach.open.tick .open.approach.open.tick .open.tick .open.tick .close.
But it rejects the word
tick .open,
and rejects the word
tick .close.approach.open.tick .open.tick .close.
3. Recursive definitions
[Category: Construction, Points: 10]
Consider the following recursive definition of a set S.
(a) (1, 58) ∈ S
(b) If (x, y) ∈ S then (x + 2, y) ∈ S
(c) If (x, y) ∈ S then (−x, −y) ∈ S
(d) If (x, y) ∈ S then (y, x) ∈ S
(e) S is the smallest set satisfying the above conditions.
Give a nonrecursive definition of the set S. Explain why it is correct.
4. Set Theory
[Category: Proof, Points: 10]
For any two arbitrary sets X and Y , we have that (X \ (X ∩ Y )) ∩ (Y \ (Y ∩ X)) is an empty set.
(a) Explain informally why this is true, using words and/or a Venn diagram.
(b) Prove it formally.
5. No such thing.
[Category: Proof, Points: 10]
(Extra credit.)
Let Lsame be the language of all strings over {0, 1, 2} that have the same number of 0s as the sum
of the number of 1s and 2s. Provide a formal (correct) proof that no finite automaton (i.e., DFA) can
accept Lsame.
2. Express yourself
[Category: Comprehension, Points: 10]
3. Modifying DFAs
[Category: Comprehension, Points: 5+5]
Suppose that M = (Q, Σ, δ, q0 , F ) and N = (R, Σ, γ, r0 , G) are two DFAs sharing the common alphabet
Σ = {a, b}.
(a) Define a new DFA M′ = ((Q ∪ {qx}) × {0, 1}, Σ ∪ {#}, δ′, (q0, 0), F′) whose transition function is
defined as follows:
δ′((q, i), t) = (δ(q, t), i)   if q ∈ Q and t ∈ Σ,
δ′((q, i), t) = (q0, 1)        if q ∈ F, i = 0, t = #,
δ′((q, i), t) = (qx, i)        otherwise,
and where (qj, i) ∈ F′ iff qj ∈ F and i = 1.
Describe the language accepted by M′ in terms of the language accepted by M.
(b) Show how to design a DFA N′ over Σ ∪ {#} that accepts the language
L′ = { x#y#z | x ∈ L(M), y ∈ L(N), and z ∈ a∗ }.
Define your DFA formally using notation similar to the definition of M′ in part (a).
4. Multiple destinations
[Category: Proof, Points: 7+3]
Let L = aa∗ + bb∗ .
(a) Prove that any DFA accepting L must have more than one final state.
(b) Show that L is acceptable by an NFA with only one final state.
5. Equality and more.
[Category: Construction, Points: 10]
Let Σ = {0, 1, $}. For any n ∈ N, let the language Ln be:
Ln = { w1$w2 | w1, w2 ∈ {0, 1}∗, |w1| = |w2| = n, and w1 = w2 }.
Argue, as precisely as you can (a proof would be best), that L is not regular.
(d) We can express the language L as ∪_{k=1}^∞ Lk. It is tempting to reason as follows: the language L
is the union of regular languages, and hence is regular.
What is the flaw in this argument, and why is it wrong in our case?
6. How big?
[Category: Proof, Points: 10]
(Extra credit.)
Let Σ = {0, 1, $}, and consider the language
Ln = { w1$w2 | w1, w2 ∈ {0, 1}∗, |w1| = |w2| = n, and w1 = w2 }.
(i) Prove that any DFA accepting Ln must have at least 2(2^{n+1} − 1) states.
(ii) Prove that any NFA accepting Ln must have at least 2(2^{n+1} − 1) states.
1. NFA interpretation.
[Category: Notations., Points: 10]
Consider the following NFA M .
[Figure lost in extraction: state diagram of M with states A, B, E, F and edges labeled a, b, and ε.]
(a) Give a regular expression that represents the language of M . Explain briefly why it is correct.
Note: You needn’t go through the process of converting this to a regular expression using GNFAs;
you can guess (correctly) the regular expression.
(b) Recall the definition of an NFA accepting a string w (Sipser p. 54). Show formally that M accepts
the string w = abba
(c) Let Σ = {a, b}. Give the formal definition of the following NFA N (in tuple notation). Make sure
you describe the transition function completely (for every state and every letter).
[Figure lost in extraction: state diagram of N with states A, B, C, E and edges labeled a, b, and ε.]
2. NFA to DFA.
[Category: Construction., Points: 10]
Convert the following NFA to an equivalent DFA (with no more than 10 states). You must construct
this DFA using the subset-construction as done in class. Draw the DFA diagram, and also write down
the DFA in tuple notation (there is no need to include states that are unreachable from the initial
state).
[Figure lost in extraction: NFA state diagram with states A, C, D and edges labeled a, b, and ε.]
– removing states p, q, s in that order.
Provide detailed drawing of the GNFA after each step in this process.
Note that in this problem you will get interesting self-loops. For example, one can travel from q to
p and then back to q. This creates a self-loop at q when p is removed.
5. NFA to Regex by other means.
[Category: Proof., Points: 10]
(Extra credit.)
There is another technique one can use to compute a regular expression from an NFA. As a concrete
example, consider the following NFA M seen in class (lecture 8, the examples section).
[Figure lost in extraction: state diagram of M with states A, B, C and edges labeled a and b; from A there is an a-edge to B and a b-edge to C.]
Consider, for each state in the above automaton, the language that the automaton would accept if we
set this state to be the initial state. The language accepted from state A is denoted by L(A), which we
will write as A to make the notation cleaner. Clearly, a word in A is either a followed by a word that
is in B (the language the automaton accepts when B is the initial state), or it is b followed by a word
in C (the language the automaton accepts with C as the initial state). We can write this observation as an
equation over the three languages:
A = aB ∪ bC.
As a concrete example, in the above automaton, C is a final state, which implies that ε ∈ C. As such,
by the above equation, A must contain the word b·ε = b ∈ bC. Now, since A is the initial state of M,
it follows that L(M) = A. This implies that b ∈ L(M). (That’s a very indirect way to see this, but it
will be useful for us shortly.)
(A) Write down the system of three equations for the languages in the above automata. (Note, that
one gets one equation for each state of the NFA.)
(B) Let r, s be two regular expressions over some alphabet Σ, and consider an equation of the form
D = rD ∪ s.
Let D be the minimal language that satisfies this equation. Give a regular expression for the
language D. Prove your answer.
(C) For a character a ∈ Σ, what is the smallest language E satisfying the equation E = aE? Prove
your answer.
(D) The above suggests a way to get the regular expression for a language. Take the system of
equations from (A), and eliminate from it (by substitution) the variable B. Then, eliminate from
it the variable C. After further manipulation, you are left with an equation of the form
A = something,
where “something” does not contain any of our variables (i.e., B, and C). Now, convert the right
side into a regular expression describing all the words in A.
Carry out this procedure in detail for the above NFA M (specifying all the details in your solution).
What is the regular expression of the language of M that results from your solution?
since for any w ∈ L′, we have that h(w) ∈ L. Note that the inverse homomorphism is not a
unique mapping. A word w ∈ L might have several inverse words w1, . . . , wj ∈ L′, such that
h(w1) = h(w2) = · · · = h(wj) = w. For example, for the word a^6 ∈ L, we have h^{−1}(a^6) =
{000, 001, 010, 011, 100, 101, 110, 111}, since, for example, h(000) = h(101) = a^6.
Similarly, it might be that a word w ∈ L has no inverse in h^{−1}(L). For example, aaa ∈ L, but it has no
inverse in L′, since h(w) is always an even-length word.
To prove that if L is regular then h^{−1}(L) is regular, assume you are given a DFA D such that L = L(D).
Show how to modify this DFA D = (Q, Γ, δ, q0, F) into a DFA C for h^{−1}(L). Describe the construction
formally, and prove formally that h^{−1}(L) = L(C). (Hint: If w = w1w2 . . . wk ∈ h^{−1}(L) then
h(w) = h(w1)h(w2) . . . h(wk) ∈ L.)
For example, for the languages shown above, we have the following two DFAs.
[Figures lost in extraction: a DFA for L = { (aaa)^i | i ≥ 0 } (states q0, q1, q2 in an a-cycle) and the corresponding DFA for L′ = h^{−1}(L) (states q0, q1, q2 over {0, 1}).]
This implies that regular languages are closed under inverse homomorphism.
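A minimal sketch of this construction in Python, instantiated on the example above (h(0) = h(1) = aa and L = { (aaa)^i | i ≥ 0 }); the dict-based DFA representation is an assumption of the sketch.

```python
def inverse_hom_dfa(Q, Sigma_new, delta, q0, F, h):
    """DFA C for h^{-1}(L(D)): on a letter a, run D over the whole block h(a)."""
    def run(q, block):
        for c in block:
            q = delta[(q, c)]
        return q
    delta_C = {(q, a): run(q, h[a]) for q in Q for a in Sigma_new}
    return Q, Sigma_new, delta_C, q0, F      # same states, same final states

# the example above: a 3-state cycle counting a's mod 3 accepts L = (aaa)*
delta = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 0}
C = inverse_hom_dfa({0, 1, 2}, {"0", "1"}, delta, 0, {0}, {"0": "aa", "1": "aa"})
# C accepts exactly the binary words w with 3 | 2|w|, i.e. 3 | |w|, as expected
```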
(a) Give a direct proof (without using the pumping lemma) that Lpal, the language of all palindromes
over the alphabet {a, b}, is not regular. Your proof should show that any DFA for this language
must have an infinite number of states.
(b) (Hard?) Prove using only closure properties that Lpal is not a regular language.
For example, the grammar corresponding to the three-state DFA for { (aaa)^i | i ≥ 0 } shown above (states q0, q1, q2 in an a-cycle)
is the following grammar
2 The longest palindromic word in the Oxford English Dictionary is tattarrattat, coined by James Joyce in Ulysses
for a knock on the door.
(G)   X0 → aX1 | ε
      X1 → aX2
      X2 → aX0
Prove that in general this construction works. That is, prove formally that for any DFA D, the
language L(G(D)) is equal to L(D).
(Hint: Do not use induction in your proof. Instead, argue directly about accepting traces and deriva-
tions of words.)
(a) Is abcd in the language of the grammar? If so, give an accompanying derivation and parse tree.
(b) Is acaada in the language of the grammar? If so, give an accompanying derivation and parse tree.
(c) What is the language generated by the grammar? Explain your answer.
(Note: assume Σ = {a, b, c, d} and the start symbol is S for both grammars.)
G1:   S → aSd | A | C
      A → aAc | B
      B → bBc | ε
      C → bCd | B

G2:   S → B | AA
      A → cA | dB
      B → aSa | ε
(Q3) Prove this.
[Category: Proof, Points: 10]
Lemma 50.6.1 If L1 and L2 are both context-free languages, then L1 ∪ L2 is a context-free language.
Proof: Let G1 = (V1 , Σ, R1 , S1 ) and G2 = (V2 , Σ, R2 , S2 ) be context free grammars for L1 and L2 ,
respectively, where V1 ∩ V2 = ∅. Create a new grammar
G = (V1 ∪ V2 , Σ, R, S) ,
where S ∉ V1 ∪ V2 and R = R1 ∪ R2 ∪ { S → S1, S → S2 }.
We next prove that L(G) = L1 ∪ L2 .
L(G) ⊆ L1 ∪ L2 :
Consider any w ∈ L(G), and any derivation of w by G. It must be of the following form:
S → Si → X1 X2 → . . . → w,
where i is either 1 or 2. Assume, without loss of generality, that i = 1, and observe that if we
remove the first step, this derivation becomes
S1 → X1 X2 → . . . → w.
Namely, S1 ⇒∗ w using grammar rules only from R1. We conclude that w ∈ L(G1) = L1, as S1
is the start symbol of G1.
The case i = 2 is handled in a similar fashion.
Thus, we conclude that w ∈ L1 ∪ L2 , implying that L(G) ⊆ L1 ∪ L2 .
L1 ∪ L2 ⊆ L(G):
Consider any word w ∈ L1 ∪ L2, and assume, without loss of generality, that w ∈ L1. As such,
we have that S1 ⇒∗ w in G1. But S → S1 is a rule in G, and as such we have that
S → S1 ⇒∗ w in G1.
Namely, S ⇒∗ w in G, since all the rules of G1 are in G. We conclude that w ∈ L(G).
Provide a detailed formal proof to the following claim, similar in spirit and structure to the above
proof.
Lemma 50.6.2 If L1 and L2 are both context-free languages, then L1L2 is a context-free language.
The edit distance between two strings w and w′ is the minimal number of edit operations one has
to perform to modify w into w′. We denote this distance between two strings x and y by EditDist(x, y). We
allow the following edit operations: (i) insert a character, (ii) delete a character, and (iii) replace a
character by a different character.
For example, the edit distance between shalom and halo is 2. The edit distance between har-peled
and sharp␣eyed is 4.
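For concreteness, here is a sketch of the textbook dynamic program for EditDist (not part of the original problem statement):

```python
def edit_dist(x: str, y: str) -> int:
    """Classic dynamic program for edit distance (insert/delete/replace)."""
    m, n = len(x), len(y)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i                  # delete all of x[:i]
    for j in range(n + 1):
        D[0][j] = j                  # insert all of y[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(D[i - 1][j] + 1,                           # delete
                          D[i][j - 1] + 1,                           # insert
                          D[i - 1][j - 1] + (x[i - 1] != y[j - 1]))  # replace/keep
    return D[m][n]

assert edit_dist("shalom", "halo") == 2   # the first example above
```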
For the sake of simplicity, assume that Σ = {a, b, $}. For a parameter k, describe a CFG for the
language
Lk = { x$y^R | x, y ∈ {a, b}∗, EditDist(x, y) ≤ k }.
For example, since EditDist(aba, bab) = 2, we have that aba$bab ∈ L2, but aba$bab ∉ L1. Similarly,
EditDist(aaaa, abb) = 3, and as such aaaa$bba ∈ L3, but aaaa$bba ∉ L2.
(Hint: What is the language L0? Try to give a grammar for L1 before solving the general case.)
Provide a short argument why your CFG works.
Assume you are given a CFG G = (V, Σ, R, S), such that any word w ∈ L(G) has a derivation with
f(n) steps, where n = |w|. Here f(n) is some function.
Prove the following claim.
Claim 50.6.3 There exists a grammar G′ such that L(G) = L(G′), and furthermore, any word
w ∈ L(G) of length n has a derivation in G′ with at most ⌈f(n)/2⌉ derivation steps.
S0 → aaS0bb | ab | ε.
(Q2) Understanding Recursive Automata.
[Category: Understanding, Points: 10]
For the following recursive automaton with initial module S, give the language of the automaton precisely.
[Figure lost in extraction: recursive-automaton diagram with a module S (states p0, p1, calls to S′) and a module S′ (states q0–q7, edges labeled a, b, #, and ε, and further calls to S′).]
(Q5) Shuffle
[Category: Proof, Points: 10]
For a given language L, let
Shuffle(L) = { w | there exists y ∈ L with |y| = |w| such that w is a permutation of the letters of y }.
For instance, if L = {ab, ada}, then Shuffle(L) = {ab, ba, aad, ada, daa}.
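For finite languages the definition can be checked directly; a small Python sketch reproducing the example above (the function name is illustrative):

```python
from itertools import permutations

def shuffle_of(L):
    """Shuffle(L) for a finite L: all distinct rearrangements of its words."""
    return {"".join(p) for w in L for p in permutations(w)}

assert shuffle_of({"ab", "ada"}) == {"ab", "ba", "aad", "ada", "daa"}
```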
Prove that if L is a regular language, then Shuffle(L) is not necessarily a CFL. In other words, prove
that the statement “For every regular language L, the language Shuffle(L) is a CFL.” is false.
(E.g., the word “(¬(0 ∨ 1)) = 0” is in L but the word “(¬(0 ∨ 1)) = 1” is not in L.)
Construct a recursive automaton for L and briefly describe why it works.
(b) For the string w = (0 ∨ 1) = 1 show the run of your automaton on it, including stack contents at
each point of the run (i.e., list the sequence of configurations in the accept trace for w for your
RA). Use the formal definition given in the lecture notes.
(Q2) Recursive Automata with Finite Memory.
[Category: Proof, Points: 10]
You are working on a computer, which has a limited stack size of (say) 5. You know this means that
you can have a call depth of at most 5 recursive calls.
(a) Argue that the language accepted by any RA on this machine is regular.
More formally, given an RA
D = (M, main, { (Qm, Σ ∪ M, δm, q0m, Fm) | m ∈ M }),
50.9 Homework 9: Turing Machines
Spring 09
(Q2) TM Encoding.
[Category: Comprehension, Points: 10]
In this problem we demonstrate a possible encoding of a TM using the alphabet {0, 1, ;}, where ; is used
as a separator. We encode M = (Q, Σ, Γ, δ, q0, qacc, qrej) as a string representing |Q|, |Σ|, |Γ|, q0, qacc,
qrej, and then δ. Each of these quantities is represented using a numbering system where n
is represented by a 1 followed by n 0’s.
Thus, if |Q| = n, this means we have n states numbered 1 to n.
If |Σ| = n, this means we have n symbols in the alphabet. We adopt the convention that these symbols
are 0 to n − 1, where each can be represented using the numbering system mentioned above.
We use a similar scheme for Γ, with the restriction that the blank symbol is assigned the largest number.
The remaining string represents δ as follows. Each transition (q, a) → (q′, b, D) is represented as the
concatenation of the representations of each of the 5 quantities, and the representation of δ is simply the
concatenation of the representations of the transitions. We use the convention that transitions not
mentioned go to the reject state in the encoded machine.
1000000;
1000;
10000;
10;
100000;
1000000;
10; 100; 100; 100; 100;
100; 1; 100; 1; 100;
100; 10; 100; 10; 100;
100; 1000; 1000; 1000; 10;
1000; 1; 100000; 10; 100;
1000; 10; 10000; 1; 10;
10000; 10; 10000; 1; 10;
10000; 100; 100000; 10; 10;
10000; 1; 100000; 10; 10
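To make the sample encoding above easier to read, here is a small decoder sketch for the numbering scheme (the function name is illustrative, not part of the problem):

```python
def decode(enc: str):
    """Decode the ';'-separated fields, where '1' followed by n zeros encodes n."""
    fields = [f.strip() for f in enc.split(";") if f.strip()]
    assert all(f.startswith("1") and set(f[1:]) <= {"0"} for f in fields)
    return [len(f) - 1 for f in fields]

# e.g. the first field above, "1000000", decodes to 6, so |Q| = 6
```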
(a) Draw a state diagram for this TM (omitting the reject state).
An ITM is a special TM with one head and an infinite number of tapes (one-sided tapes). The tapes are
numbered 0, 1, 2, 3, · · · . The cells on each tape are also numbered from left to right with 0, 1, 2, 3, · · · .
When the machine starts, the head is on cell number 0 of tape number 0. If the head is on cell number
i of tape number j, it can overwrite that cell and move either to cell i + 1 or i − 1 (if i ≥ 1) of tape j
or move to cell i of tape j + 1 or tape j − 1 (if j ≥ 1).
[Figure lost in extraction: TM state diagram with states q0, q1, q2, q5, q8–q11 and transitions such as 0 → 0, R; 0 → X, R; ␣ → $, R; ␣ → 0, R; X → X, R; X → 0, L; $ → $, L.]
(Q1) Enumerators
[Category: Construction, Points: 10]
(a) Design an enumerator that will list all positive integral solutions to a polynomial inequality in
three variables, which will be given as input on the input tape of the Turing machine.
I.e., you need to list all positive integral solutions to a given inequality P(x, y, z) ≥ c. An example
of such an inequality is 2x^2y^2 + xy + z ≥ 5.
You may use pseudo code to describe your solution.
(b) Modify the enumerator above to give all integral (positive or negative) solutions.
L2 = { ⟨M, x, i⟩ : M(x) = “yes” and the head of M uses only the first i cells of its tape }
(c) Explain how to build an enumerator for L3 :
(Q4) I am a liar.
[Category: Puzzle, Points: 4]
A town has two kinds of people, visually indistinguishable, called Grubsies and Greepies. Greepies
always tell the truth; Grubsies always lie (names can be deceiving, you see).
You come to the town, and chance upon a person (who could be a Grubsy or a Greepy) and you want
to find out whether a particular road leads to Wimperland (assume all people in the town know the
answer to the question).
Find a single YES/NO question that you can ask the person so that you can figure out whether the
road leads to Wimperland.
Grubsies and Greepies are fictional; any resemblance to person or persons living or dead or undead is
purely coincidental.
Hint: think of the diagonalization proof of undecidability of the membership problem for Turing ma-
chines.
3 See http://en.wikipedia.org/wiki/Invisible_Pink_Unicorn.
50.12 Homework 12: Preparation for Final
Spring 09
1. Language classification.
[Category: Understanding, Points: 10]
Suppose that we have a set of Turing machine encodings defined by each of the following properties.
That is, we have a set
L = { ⟨M⟩ | M is a TM and M has property P },
and we are considering different ways to fill in P . Assume that the Turing machines M have only a
single tape.
For each of these languages, determine whether it is Turing decidable, Turing recognizable, or not
Turing recognizable. Briefly justify your answers.
2. Reduction I.
[Category: Construction, Points: 10]
Define the language L to be
L = { ⟨M⟩ | M is a TM and L(M) is decidable but not context-free }.
Show that L is undecidable by reducing ATM to L. (Do the reduction directly. Do not use Rice’s
Theorem.)
3. Reduction II
[Category: Construction, Points: 10]
Define the language L to be
L = { ⟨M⟩ | M is a TM and ∀n ∃x ∈ L(M) where |x| = n }.
Show that L is undecidable by reducing ATM to L. (Do the reduction directly. Do not use Rice’s
Theorem.)
4. Enumerate this.
[Category: Construction, Points: 10]
Construct an enumerator for the following set:
L = { ⟨T⟩ | T is a Turing machine and |L(T)| ≥ 3 }.
5. DFAs are from Mars, TMs are from Venus.
[Category: Understanding / Proof., Points: 10]
Consider the language
L = { ⟨D, w⟩ | D is a DFA and it accepts w }.
Proof: For every ⟨D, w⟩, let TD be a TM that simulates the DFA D. So D will accept w
iff TD halts on w and accepts it, which is exactly equivalent to ⟨TD, w⟩ ∈ ATM; that
is,
⟨D, w⟩ ∈ L ⇐⇒ ⟨TD, w⟩ ∈ ATM.
This completes the reduction. But since ATM is undecidable, L should be unde-
cidable too.
Why is this result strange? Is it because we did something wrong in the above proof? Is this proof
correct? Explain briefly.
6. Using reductions when building algorithms.
[Category: Construction., Points: 10]
Reductions are not only a technique for proving hardness of problems; more importantly, they are used
to solve problems by transforming them into other problems that we know how to solve. This is an
extremely useful technique, and the following is an example of such a reduction in an algorithmic
context.
A set of domino pieces has a solution iff one can put them around
a circle (face up), such that every two adjacent numbers from different
pieces match. The figure on the right shows a set of pieces which has a
solution.
Assume we have an algorithm (i.e., think of it as a black box) called
isDomino which, given a set of dominoes, can tell us whether they have a
solution or not (it returns a “yes” or “no” answer).
Using isDomino, describe an algorithm which, given a set of dominoes,
outputs “no” if there is no solution and, if there is a solution, prints out
one possible solution (that is, prints out an ordered list of the input dominoes,
such that one can put them in that order around a circle properly).
For example if the input to your algorithm is (1, 2)#(3, 5)#(4, 3)#(1, 3)#(1, 1)#(2, 4) #(3, 7) #(7, 5),
it may output (1, 3)#(3, 7)#(7, 5)#(3, 5)#(4, 3)#(2, 4)#(1, 2)#(1, 1) (which is the solution depicted
in the above figure).
(We emphasize that your solution must use isDomino to construct the solution. There is a direct
algorithm to solve this problem, but we are more interested in the reduction here than in an efficient
solution.)
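One possible shape such a search-to-decision reduction can take, sketched in Python under the stated assumption that is_domino is the black box deciding solvability. The idea: repeatedly glue two pieces into one "super-piece", keeping only gluings the oracle confirms still leave a solvable set; this is a sketch, not a prescribed solution.

```python
def solve_dominoes(dominoes, is_domino):
    """Search-to-decision sketch. is_domino(list_of_end_pairs) -> bool is the
    assumed black box; pieces may be flipped when placed around the circle."""
    if not is_domino(dominoes):
        return None
    # a super-piece = (exposed end pair, ordered list of original pieces inside)
    pieces = [((a, b), [(a, b)]) for (a, b) in dominoes]
    while len(pieces) > 1:
        done = False                    # a valid gluing always exists (oracle is sound)
        for i in range(len(pieces)):
            if done:
                break
            (a, b), left = pieces[i]
            for j in range(len(pieces)):
                if done or i == j:
                    continue
                (c, d), right = pieces[j]
                # try piece j in both orientations immediately after piece i
                for (x, y), hist in (((c, d), right), ((d, c), right[::-1])):
                    if b != x:
                        continue
                    rest = [ends for k, (ends, _) in enumerate(pieces)
                            if k != i and k != j]
                    if is_domino(rest + [(a, y)]):   # gluing keeps it solvable
                        pieces = [pieces[k] for k in range(len(pieces))
                                  if k != i and k != j] + [((a, y), left + hist)]
                        done = True
                        break
    return pieces[0][1]   # the input pieces, in an order that closes the circle
```

Note that each round makes O(n^2) oracle calls and removes one piece, so the whole reduction uses polynomially many calls to isDomino.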
Chapter 51
Spring 2008
C = {ba | a ∈ A, b ∈ B}.
(a) (3, 5) ∈ S
(b) If (x, y) ∈ S, then (x + 2, y) ∈ S
(c) If (x, y) ∈ S, then (−x, y) ∈ S
(d) If (x, y) ∈ S, then (y, x) ∈ S
(e) S is the smallest set satisfying the above conditions.
51.2 Homework 2: Problem Set 2
Spring 08
1. Building a DFA.
Let Σ = {0, 1}. Give DFA state diagrams for the following languages.
(c) L = Σ∗ − {ε}
2. Interpreting a DFA
(a) What is the language of the following DFA? That is, explicitly specify all the strings that the
following DFA accepts. Briefly explain why your answer is correct.
[Figure lost in extraction: DFA state diagram with states A–G and edges labeled 0 and 1.]
(b) What about this one? Again, briefly justify your answer.
[Figure lost in extraction: DFA state diagram with states A–G and edges labeled 0 and 1.]
For example, R = {{3, 2}, 4} and P = {3, {{7, 9}, {8, 3}}} are SBBTs. But {2, {4, 5}, 27} and {2, {3}}
are not SBBTs. Let T be the set of all SBBTs.
(a) Let’s define the following function f mapping SBBTs to sets of integers:
f(X) = ∪_{Y ∈ X} f(Y)   if X is a set
f(X) = {X}              if X is an integer
Notice that f(Y) is always a set, for any input Y. The operation ∪_{Y ∈ X} f(Y) unions together the sets
f(Y), for all the items Y that are in the set X.
For the SBBTs P and R defined above, compute f (P ) and f (R). Give a general description of what
f does.
(b) Similarly, we can define a function g mapping SBBTs to integers:
g(X) = Σ_{Y ∈ X} g(Y)   if X is a set
g(X) = 1                if X is an integer
Give the values for g(P ) and g(R), as well as a general description of what g does.
(c) For certain SBBTs, g(X) = |f (X)|. For which SBBTs does this equation work? Explain why it’s
not true in general.
4. Balanced strings
A string over {0, 1} is balanced if it has the same number of zeros and ones.
(a) Provide pseudo-code (as simple as possible) for a program that decides if the input is balanced.
You can use only a single integer-typed variable in your program, and one variable containing the
current character (of type char). You can read the next character using a library function called,
say, get_char, which returns −1 if it has reached the end of the input. Otherwise, get_char returns
the next character in the input stream.
In particular, the program prints “accept” if the input is a balanced string, and prints “reject”
otherwise.
(b) For any fixed prespecified value of k, describe how to construct an automaton that accepts a
balanced string if, in every prefix of the input string, the absolute difference between the number of
zeros and ones does not exceed k. How many states does your automaton need?
(c) Provide an intuitive explanation of why any automaton for the problem of part
(b) must have at least, say, k states.
(d) Argue that there is no finite automaton that accepts only balanced strings.
For bonus credit, you can provide formal proofs for the claims above.
5. Bonus problem (Coins)
A journalist, named Jane Austen, unfortunately (for her) interviews one of the presidential candidates.
The candidate refuses to let Jane end the interview, going on and on about the candidate’s plans for how
to solve all the problems in the world. In the end, the candidate offers Jane a game. If she wins the
game, she can leave.
The game board is a 2 × 2 grid of coins:
H T
T H
At each round, Jane can decide to flip one or two coins, by specifying which coins she is flipping (for
example, flip the left bottom coin and the right top coin), next the candidate goes and rotates the
board by either 90, 180, 270, or 0 degrees. (Of course, rotation by 0 degrees is just keeping the coins in
their current configuration.)
The game is over when the four coins are either all heads or all tails. To make things interesting,
Jane does not see the board, and does not know the starting configuration.
Describe an algorithm that Jane can deploy so that she always wins. How many rounds are required
by your algorithm?
(c) Lc = { w | w does not contain two consecutive b’s }.
2. Product construction
(a) Consider the following two DFAs. Use the product construction (pp. 45–47 in Sipser) to construct
the state diagram for a DFA recognizing the intersection of the two languages.
[Figures lost in extraction: two DFA state diagrams, one with states A, B, C and one with states E, F, edges labeled a and b.]
(b) When the product construction was presented in class (and in Sipser), we assumed that the two
DFAs had the same alphabet. Suppose that we are given two DFAs M1 and M2 with different
alphabets. E.g. M1 = (Q1 , Σ1 , δ1 , q1 , F1 ) and M2 = (Q2 , Σ2 , δ2 , q2 , F2 ). To build a DFA M that
recognizes L(M1 ) ∪ L(M2 ), we need to add two additional sink states s1 and s2 . We send the first
or the second element of each pair to the appropriate sink state if the incoming character is not in
the alphabet for its DFA.
Write out the new equations for M ’s transition function δ and its set of final states F .
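One way such a construction can look in code: a Python sketch in which each DFA is a tuple with a dict transition table, and "s1"/"s2" are the two added sink states (names and representation are illustrative assumptions, not the requested equations themselves):

```python
def union_dfa(M1, M2):
    """Modified product construction for DFAs over different alphabets.
    Each Mi = (Qi, Sigmai, deltai, qi, Fi); assumes "s1"/"s2" are fresh names."""
    Q1, S1, d1, q1, F1 = M1
    Q2, S2, d2, q2, F2 = M2
    def step1(p, a):   # send component 1 to its sink on foreign letters
        return d1[(p, a)] if p != "s1" and a in S1 else "s1"
    def step2(q, a):   # likewise for component 2
        return d2[(q, a)] if q != "s2" and a in S2 else "s2"
    states = {(p, q) for p in Q1 | {"s1"} for q in Q2 | {"s2"}}
    delta = {((p, q), a): (step1(p, a), step2(q, a))
             for (p, q) in states for a in S1 | S2}
    F = {(p, q) for (p, q) in states if p in F1 or q in F2}   # union acceptance
    return states, S1 | S2, delta, (q1, q2), F
```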
(a) Define a new DFA M′ = (Q ∪ {qX, qR}, Σ ∪ {#}, δ′, q0, {qX}) whose transition function is defined
as follows:
δ′(q, t) = δ(q, t)   if q ∈ Q and t ∈ Σ,
δ′(q, t) = qX        if q ∈ F, t = #,
δ′(q, t) = qX        if q = qX,
δ′(q, t) = qR        otherwise.
Define your DFA using notation similar to the definition of M′ in part (a).
4. Shared channel.
Funky Computer Systems, who have now gone out of business, submitted the lowest bid for wiring
the DFAs supporting the Siebel center classrooms. These idiots wired two of the DFAs M and N so
that their inputs come in on a shared input channel. When you try to submit a string w to M and a
string y to N , this single channel receives the characters for the two strings interleaved. For example,
if w = abba and y = cccd, then the channel will get a string like abbcccad or acbccbad.
Fortunately, these two DFAs have alphabets that do not overlap, so it’s possible to sort this out. Your
job is to design a DFA that accepts a string on the shared channel exactly when M and N would have
accepted the two input strings separately.
Specifically, let M = (Q, Σ, δ, q0, F) and N = (R, Γ, γ, r0, G), where Σ ∩ Γ = ∅. Your new machine M′
should read strings from (Σ ∪ Γ)∗. It should be designed using a variation on the product construction,
i.e. its state set should be Q × R.
Give a formal definition of M 0 in terms of M and N . Also briefly explain the ideas behind how it
works (very important especially if your formal notation is buggy).
(a) Let Lk be the language of all palindromes of length k, over the alphabet Σ = {a, b}. Show a DFA
for L4 .
(b) For any fixed k, specify the DFA accepting Lk .
(c) Let L be the language of all palindromes over Σ. Argue, as precisely as you can, that L is not
regular.
(d) We can express the language L as ∪_{k=1}^∞ Lk. It’s tempting to reason as follows: the language L is
the union of regular languages, and as such it is regular.
What is the flaw in this argument, and why is it wrong in our case?
(b) Recall the definition of an NFA accepting a string w (Sipser p. 54). Show formally that M accepts
the string w = aabb
(c) Let Σ = {a, b}. Give the formal definition of the following NFA N .
[Figure lost in extraction: state diagram of N with states A, B, C, D, E and edges labeled a, b, and ε.]
That is, each xi is a string of two characters from Σ. And two of the xi ’s need to be identical, but you
don’t know which two are identical. So the language contains ab#bb#cc#ab and ac#bb#ac#ab, but
not aa#ac#bb.
Design an NFA that recognizes L. This NFA should “guess” when it is at the start of each matching
string and verify that its guess is correct.
4. NFA modification.
The 2SWP operation on strings interchanges the character in each odd position with the character in
the following even position. That is, if the string length k is even, the string w1 w2 w3 w4 . . . wk−1 wk
becomes w2 w1 w4 w3 . . . wk wk−1 . E.g. abcbac becomes babcca. If the string has odd length, we just
leave the last (unpaired) character alone. E.g. abcba becomes babca.
Given a whole language L, we define 2SWP(L) to be { 2SWP(w) | w ∈ L }.
Show that regular languages are closed under the 2SWP operation. That is, show that if L is a regular
language, then 2SWP(L) is regular. That is, suppose that L is recognized by some DFA M . Explain
how to build an NFA N which accepts 2SWP(L).
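The string-level operation itself is easy to pin down; a small Python sketch matching the examples above (this is only an illustration of 2SWP, not the requested closure construction):

```python
def two_swp(w: str) -> str:
    """Swap each odd-position character with the one in the following even
    position; an unpaired final character stays put."""
    out = list(w)
    for i in range(0, len(out) - 1, 2):
        out[i], out[i + 1] = out[i + 1], out[i]
    return "".join(out)

assert two_swp("abcbac") == "babcca"
assert two_swp("abcba") == "babca"
```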
Provide detailed drawing of the GNFA after each step in this process.
Note that in this problem you will get interesting self-loops. For example, one can travel from B to
A and then back to B. This creates a self-loop at B when A is removed.
L = { a^k b^m : k ≤ m or m ≤ 2k }
L = {00, 11}
From the fact that it satisfies the pumping lemma, can we deduce that L is regular? Why or why
not?
2. Decide whether each of the following languages is regular. If it is regular, give a DFA, NFA, or regular expression
for the language. If it is not regular, give a proof using either closure properties or the pumping
lemma.
3. Let T be the language { 0^n 1^n : n ≥ 0 }. Use closure properties to show that the following languages are
not regular, using a proof by contradiction and the fact that T is known not to be regular.
(a) L = { a^n b^m c^{n+m} : n ≥ m ≥ 0 }
(b) J = { 0^n 1^n 2^n : n ≥ 1 }
51.7 Homework 7: CFGs
Spring 08
1. Suffix languages.
Consider the following DFA:
a,b
6 7 5
a b
b b a a
a
1 4
b
b
a,b 3 2
a
(a) Write down the suffix language for each state.
(b) Draw a DFA that has the same language as the one above, but has the minimal number of states.
(a) What is the language of this grammar? The alphabet is {a, b, c, d} and start symbol is T .
S → aSb | ε
T → S | cT | Td
(b) Answer the same question for this grammar, with same alphabet and start symbol.
S → aSb | ε
T → S | cS | Sd
(c) Answer the same question for this grammar, with same alphabet and start symbol.
S → Tb
T → aaS | cd
4. NFA pattern matching.
Pattern-search programs take two inputs: a pattern given by the user and a file of text. The program
determines whether the text file contains a match to the pattern, typically using some variation on
NFA/DFA technology. Fully developed programs, such as grep, accept patterns containing regular-
expression operators (e.g. union) and also other convenient shorthands. Our patterns will be much
simpler.
Let’s fix an alphabet Σ = {a, b, . . . , z, ␣}. Let Γ = Σ ∪ {?, [, ], ∗}. A pattern will be any string in Γ∗.
A string w matches a pattern p if you can line up the characters in the two strings such that:
• When p contains a character from Σ, it must be paired with an identical character in w.
• The character ? in p can match any substring x in w, where x contains at least one character.
• When p contains a substring of the form [w]∗, this can match zero or more repetitions of whatever
w matches.
For example, the pattern “fleck” matches only the string “fleck”. The pattern “margaret?fleck”
will match anything containing “margaret” and “fleck”, separated by at least one character. The
pattern “i␣ate␣[many␣]∗donuts” matches strings like
“i␣ate␣donuts” and
“i␣ate␣many␣many␣donuts”.
Instances of []∗ can be nested. So the pattern cc[bb[a]∗bb]∗dd matches strings like ccdd or ccbbaaaaabbdd
or ccbbabbbbabbdd.
A text file t contains a match to a pattern p if t contains some substring w such that w matches p.
Design an algorithm which converts a pattern p to an NFA Np that searches for matches to p. That
is, the NFA Np will read an input text file t and accept t if and only if t contains a match to p. Np
searches for only one fixed pattern p. However you must describe a general method of constructing Np
from any input pattern p.
You can assume that your input pattern p has been checked to ensure that it’s well-formed, and that we
have a function m which matches open and close brackets. For example, you can assume that a close
bracket (]) at position i in the pattern is immediately followed by a star (∗), and that there is a matching
open bracket ([) at position m(i) in the pattern. The function m is a bijection, so if there is an open
bracket at position j in the pattern, m^{-1}(j) returns the position of the corresponding close bracket.
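To make the intended shape of such an algorithm concrete, here is a minimal Python sketch of one possible compilation, with the NFA kept as dictionaries of labeled and ǫ-transitions. It finds matching brackets by depth counting rather than calling the supplied function m, and the state-numbering scheme is an arbitrary choice; treat it as a sketch under those assumptions, not the required construction.

from collections import defaultdict
from itertools import count

def pattern_to_nfa(p, alphabet):
    trans = defaultdict(set)   # (state, character) -> set of next states
    eps = defaultdict(set)     # state -> set of epsilon-successors
    fresh = count().__next__   # fresh() returns 0, 1, 2, ...

    def build(i, j):           # NFA fragment for the slice p[i:j]
        cur = start = fresh()
        while i < j:
            c = p[i]
            if c == '[':       # [w]* : zero or more repetitions of w
                depth, k = 1, i + 1
                while depth:   # scan for the matching close bracket
                    depth += {'[': 1, ']': -1}.get(p[k], 0)
                    k += 1
                s, e = build(i + 1, k - 1)
                eps[cur].add(s)    # enter one repetition ...
                eps[e].add(cur)    # ... and return, ready to repeat or stop
                i = k + 1          # k sits on the '*' once the scan ends
            elif c == '?':     # ? : one or more arbitrary characters
                nxt = fresh()
                for a in alphabet:
                    trans[(cur, a)].add(nxt)
                    trans[(nxt, a)].add(nxt)
                cur, i = nxt, i + 1
            else:              # literal character from Sigma
                nxt = fresh()
                trans[(cur, c)].add(nxt)
                cur, i = nxt, i + 1
        return start, cur

    start, accept = build(0, len(p))
    for a in alphabet:         # Np scans a whole text file for a match
        trans[(start, a)].add(start)
        trans[(accept, a)].add(accept)
    return start, accept, trans, eps

The final two self-loops implement the substring search: the NFA may skip arbitrary text before the match begins and after it ends.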
51.8 Homework 8: PDAs
Spring 08
1. [Figure: state diagram of a PDA; transition labels include ǫ, ǫ → $; a, ǫ → aa; b, ǫ → ǫ; c, a → ǫ; and ǫ, $ → ǫ.]
2. Converting CFG to PDA.
For each of the following languages (with alphabet Σ = {a, b}) construct a pushdown automaton
recognizing that language, following the general construction for converting a context-free grammar to
a PDA (lecture 13, pp. 115–118 in Sipser).
For each language, also give a parse tree for the word w, a leftmost derivation for w, and the first 10
configurations (state and stack contents) for the PDA as it accepts w.
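The construction being referenced pushes the start symbol and then repeatedly either expands a top-of-stack variable (nondeterministically choosing a rule) or matches a top-of-stack terminal against the next input character. The following is a minimal Python sketch that simulates this behavior with explicit backtracking; the stack-size cap is only there so the demo terminates, and is not part of the actual PDA.

# Sketch: simulate the CFG-to-PDA construction's behavior by backtracking
# over (remaining input, stack) pairs. rules maps a variable to its
# alternatives, with '' standing for the empty right-hand side.
def cfg_accepts(rules, start, w):
    seen, todo = set(), [(w, start + '$')]
    while todo:
        rest, stack = todo.pop()
        if (rest, stack) in seen or len(stack) > 2 * len(w) + 10:
            continue                      # crude cap to keep the demo finite
        seen.add((rest, stack))
        if stack == '$':
            if rest == '':
                return True               # input consumed, stack emptied
            continue
        top, below = stack[0], stack[1:]
        if top in rules:                  # expand a variable
            todo.extend((rest, alt + below) for alt in rules[top])
        elif rest and rest[0] == top:     # match a terminal
            todo.append((rest[1:], below))
    return False

print(cfg_accepts({'S': ['aSb', '']}, 'S', 'aabb'))   # True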
3. Language to PDA.
Let Σ = {a, b, c} and consider the language
L = {a^i b^j c^k | i ≠ j or j ≠ k}.
Design a PDA for L. Present your PDA as a state diagram, with brief comments about how it works.
(a) Notice that for any word x ∈ Σ∗, it holds that (x, x) is always an S-pair. Briefly explain why.
(b) Let LS = {wx^R | x, w ∈ Σ∗ and (x, w) is an S-pair}.
Give a context-free grammar that generates LS.
(c) Let
LT = {w1#w2# · · · #wk# | wi ∈ Σ∗ for all i, and there exist i, j such that i < j and (wi, wj^R) is an S-pair}.
S → aS | aSbS | ε
(a) Show that the grammar is ambiguous, by giving two parse trees for some string w.
(b) Give an efficient test to determine whether a string w in L(S) is ambiguous. Explain informally
why your test works.
51.9 Homework 9: Chomsky Normal Form
Spring 08
(a) Remove nullable variables from the following grammar (with start symbol S):
S → aAa | bBb | BB
A → C
B → S | A
C → S | ε
(b) This grammar (with start symbol S) has no nullable variables. Generate its Chomsky normal
form.
S → ASB
A → aAS | a
B → SbS | A | b
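As a check on part (a), the nullable variables can be computed by a fixpoint iteration: a variable is nullable if some alternative consists entirely of nullable symbols. A minimal sketch, writing each alternative as a string with '' standing for ε:

# Sketch: compute the nullable variables of a CFG by fixpoint iteration.
rules = {
    'S': ['aAa', 'bBb', 'BB'],
    'A': ['C'],
    'B': ['S', 'A'],
    'C': ['S', ''],
}

nullable = set()
changed = True
while changed:
    changed = False
    for var, alts in rules.items():
        if var not in nullable and any(all(s in nullable for s in alt)
                                       for alt in alts):
            nullable.add(var)
            changed = True

print(sorted(nullable))   # ['A', 'B', 'C', 'S']: every variable is nullable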
Using closure properties of context-free languages, and the fact that C is not context-free, prove that
the following languages are not context free:
(a) J = {a^i b^j c^{k−1} | 1 ≤ i ≤ j ≤ k}
(b) K = {a^i c^j b^k c^n | 0 ≤ i ≤ j ≤ k, 0 ≤ n}
3. Grammar-based induction.
Let G be the grammar with start symbol S, terminal alphabet Σ = {a, b} and the following rules:
S → aX | Y.
X → aS | a.
Y → bbY | aa.
It’s actually easier to prove the following stronger and more explicit claim:
Claim (version 2). For any n, if a string w ∈ Σ∗ can be derived from either S or Y in n steps, then w
contains an even number of a’s.
(a.) The original claim involved strings in L(G), i.e. strings that can be derived from the start symbol
S.
Why did we extend the claim to include derivations starting with the variable Y? Why didn’t we
extend it even further to include derivations starting with the variable X?
(b.) Prove version 2 of the claim using strong induction on the derivation length n.
4. PDA to CFG conversion.
Consider the following PDA.
[Figure: state diagram of a PDA with states A, B, X, D, E, and C; transition labels include ǫ, ǫ → $; ǫ, ǫ → x; ǫ, x → ǫ; ǫ, $ → ǫ; ǫ, ǫ → a; a, ǫ → a; and b, a → ǫ.]
Recall the proof that the language of a PDA is context free, on pages 119–120 of Sipser (and in the notes
for Lecture 14). For the above PDA:
(a.) Generate the rules defined by the first bullet point of the proof on page 120.
(b.) What is the start variable?
(c.) How many rules are generated by the second bullet point? Explain why your answer is correct.
5. Give me that old time PDA, it’s good enough for me, it was good enough for my father.
Tony had just released into the market a new model of PDA called Blu-PDA (Sushiba released a
competing PDA product called HD-PDA, but that’s really a topic for a different exercise).
Instead of a stack, like the good old PDA, the new Blu-PDA has a queue. You can push/pop characters
from both sides of the queue (thus, the Blu-PDA can see the characters stored at the front and back
of the queue when making a transition decision [and the current input character, of course]). Since
Tony is targeting this product to the amateur market, they decided to limit the queue size to 25 (if
the queue size exceeds 25, then the Blu-PDA stops and rejects the input). Tony claims that the new
Blu-PDA is a major breakthrough and a considerably stronger computer than a PDA.
(Of course, if the Blu-PDA does strange things like reading or popping characters from an empty queue,
then it immediately rejects the input.)
(a) So, given a Blu-PDA, is it equivalent to a PDA, DFA, or is it stronger than both?
(b) Explain clearly why your answer is correct.
(c) (5 point bonus) Prove your answer in detail.
1. Turing machines.
Give the state diagram for a Turing Machine for the following language.
To simplify your design, you can assume the beginning of the string is marked with ∗. (Inputs that
don’t start with a ∗ should be rejected.) For example, the input may look like ∗abc.
You do not need to draw transitions that lead to the (implicit) reject state. Hence, any transition that
is not present in your diagram will be assumed to lead to the reject state. Indicate which symbol (e.g.
⊔ or B) you are using for the special blank symbol that fills the end of the tape.
L = {a^i b^j c^k | i < j < k}.
[Figure: state diagram of a Turing machine; transition labels include ∗ → R; x → R; a → x, R; b → x, R; c → x, R; a, x → R; b, x → R; ⊔ → R; and a, b, c, x → L.]
(b) Informally and briefly explain why this TM accepts the language you claimed in the previous part.
(c) Trace the execution of this TM as it processes the string aaaaaaaaa (i.e., a sequence of nine a’s) by
providing the sequence of configurations it goes through (i.e., the tape and state at each step; use the
configuration notation shown in class).
[Figure: state diagram of a Turing machine with states 1–8; transition labels include a → b, R; c → c, L; d → d, L; a → a, L; a → d, L; a → c, R; b → c, R; d → b, R; c → c, R; and d → d, R.]
Design a Turing machine that, for a given input tape with n cells containing O’s, marks the positions
which are composite numbers. Specifically, cells at the prime-numbered positions are left containing O,
cells at composite-numbered positions are left containing X, and the cell at the first (leftmost) position
is left containing U (for unit). For example, consider the input OOOOOOOOOO. This represents the first
10 numbers. So the Turing machine should halt with UOOXOXOXXX on the tape.
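The machine has to discover this with head movements, but the target tape is easy to predict. Here is a small sketch (ordinary Python, not a TM) of the output the machine should halt with, useful for checking a design against examples:

# Sketch of the expected final tape: position 1 -> 'U', primes -> 'O',
# composites -> 'X'. This only predicts the output; it is not the TM.
def expected_tape(n):
    out = []
    for pos in range(1, n + 1):
        if pos == 1:
            out.append('U')
        elif any(pos % d == 0 for d in range(2, pos)):
            out.append('X')          # composite position
        else:
            out.append('O')          # prime position
    return ''.join(out)

print(expected_tape(10))             # UOOXOXOXXX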
In this problem we demonstrate a possible encoding of a TM using the alphabet {0, 1, ;, |}, where | is the
newline character. We encode M = (Q, Σ, Γ, δ, q0, qacc, qrej) as a string n|i|j|t|r|s|w, where n, i, j, t, r, s
are integers in binary representing |Q|, |Σ|, |Γ|, q0, qacc, qrej, and w represents δ as described below.
We adopt the convention that states are numbered from 0 to n − 1, the input alphabet symbols are
numbered from 0 to i − 1, and the tape alphabet symbols are numbered from 0 to j − 1 with j − 1
representing the special blank symbol (therefore j > i).
Here is the representation of a mystery Turing machine M , using this encoding. For ease of reading,
337
we have shown | as an actual line break and given the integers in decimal rather than binary.
8
2
3
7
3
5
7; 1; 0; 2; 1
0; 1; 0; 1; 1
0; 0; 1; 1; 1
1; 1; 1; 1; 1
1; 0; 0; 0; 1
0; 2; 2; 2; 0
6; 2; 3; 2; 1
2; 1; 2; 1; 0
2; 0; 6; 0; 0
6; 1; 6; 1; 0
6; 0; 4; 0; 0
4; 0; 4; 0; 0
4; 1; 4; 1; 0
4; 2; 0; 2; 1
(a) Draw a state diagram for this TM (omitting the reject state).
(b) What is the language of this TM? Give a brief justification.
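When decoding the table by hand, it can help to unpack the rows mechanically first. Below is a minimal sketch under one plausible reading of the row format, namely state; read symbol; new state; written symbol; direction, with direction 1 meaning right. Both of those readings are assumptions to verify against the conventions stated above.

# Sketch: unpack the mystery machine's transition rows. The five-field
# reading (state; read; new state; write; direction) and "1 = move right"
# are assumptions, since the row-format description is not shown here.
header = [8, 2, 3, 7, 3, 5]        # |Q|, |Sigma|, |Gamma|, q0, q_acc, q_rej
rows = """7;1;0;2;1 0;1;0;1;1 0;0;1;1;1 1;1;1;1;1 1;0;0;0;1 0;2;2;2;0
          6;2;3;2;1 2;1;2;1;0 2;0;6;0;0 6;1;6;1;0 6;0;4;0;0 4;0;4;0;0
          4;1;4;1;0 4;2;0;2;1""".split()

delta = {}
for row in rows:
    q, a, q2, b, d = map(int, row.split(';'))
    delta[(q, a)] = (q2, b, 'R' if d else 'L')

for (q, a), (q2, b, d) in sorted(delta.items()):
    print(f"state {q}, read {a}: go to state {q2}, write {b}, move {d}")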
From this, we can conclude that TMs and PlaneTMs are equivalent.
51.12 Homework 12: Enumerators
Spring 08
1. Decidable problems.
Prove that L is a decidable language:
L = {⟨D, k⟩ | D is an NFA and D accepts no string of length ≤ k}.
2. Enumerators I.
An enumerator for a language L is a Turing machine that writes out a list of all strings in L. See pp.
152–153 in Sipser.
The enumerator has no input tape. Instead, it has an output tape on which it prints the strings, with
some sort of separator (e.g. #) between them. The strings can be printed in any order, and duplicates
of the same string are OK. But each string in L must be printed eventually.
Design an enumerator that writes all tuples of the form (n, p) where n ∈ N, p ∈ N, and n is a multiple
of p.
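One standard shape for such an enumerator: sweep an increasing bound N, and at stage N print every qualifying pair whose entries are at most N. Every pair then appears at some finite stage, even though the language is infinite. A minimal sketch (the pair syntax and # separator are illustrative choices, and 0 is treated as a multiple of every p; adjust if your N starts at 1):

from itertools import count

# Sketch: print every pair (n, p) with n a multiple of p, eventually.
def enumerate_multiples():
    for N in count(1):                       # stage N
        for p in range(1, N + 1):
            for n in range(0, N + 1, p):     # multiples of p up to N
                print(f"({n},{p})", end="#")

# enumerate_multiples()   # runs forever by design, like any enumerator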
3. Enumerators II.
If L and J are two languages, define L ⊕ J to be the language containing all strings that are in exactly
one of L and J. That is
L ⊕ J = {w | (w ∈ L and w ∉ J) or (w ∈ J and w ∉ L)}
(a) Design an enumerator that will print all strings in L(G) ⊕ L(H).
(b) Is L(G) ⊕ L(H) context-free? TM recognizable? TM decidable? Briefly justify your answers.
(c) Recall that
EQ_CFG = {⟨G, H⟩ | G and H are CFGs and L(G) = L(H)}.
We have mentioned in class that EQ_CFG is undecidable. Why is this problem harder than the
ones you just solved in parts (a) and (b)?
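Returning to part (a): the key fact is that membership in a context-free language is decidable (e.g. by the CYK algorithm), so an enumerator can simply test every string over Σ in length order. A minimal sketch, where in_G and in_H stand for hypothetical membership deciders for L(G) and L(H):

from itertools import count, product

# Sketch: enumerate L(G) (+) L(H), given decision procedures for the two
# CFL memberships. Strings are generated in length order over `alphabet`.
def enumerate_sym_diff(in_G, in_H, alphabet):
    for n in count(0):
        for tup in product(alphabet, repeat=n):
            w = ''.join(tup)
            if in_G(w) != in_H(w):     # w is in exactly one of the two
                print(w, end='#')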
1. Language classification.
Suppose that we have a set of Turing machine encodings defined by each of the following properties.
That is, we have a set
L = {⟨M⟩ | M is a TM and M has property P},
and we are considering different ways to fill in P . Assume that the Turing machines M have only a
single tape.
(a) P is “there is an input string which M accepts after no more than 327 transitions.”
(b) P is “on blank input, M halts leaving the entire tape blank.”
(c) P is “M ’s code has no transitions into the reject state.”
(d) P is “on input UIUC, M never changes the contents of the even-numbered positions on its tape.”
(That is, it can read the even-numbered positions, but not write a different symbol onto them.)
For each of these languages, determine whether it is Turing decidable, Turing recognizable, or not
Turing recognizable. Briefly justify your answers.
2. Reduction I.
Define the language L to be
L = {⟨M⟩ | M is a TM and L(M) is context free but not regular}.
Show that L is undecidable by reducing A_TM to L. (Do the reduction directly. Do not use Rice’s
Theorem.)
3. Reduction II.
Define the language L to be
L = {⟨M⟩ | M is a TM and 100 ≤ |L(M)| ≤ 200}.
Show that L is undecidable by reducing A_TM to L. (Do the reduction directly. Do not use Rice’s
Theorem.)
4. Interleaving.
Suppose that we have Turing machines M and M′ which enumerate languages L and L′, respectively.
(a) Describe how to construct an enumerator P for the language L ∪ L′. The code for P will need
to make use of the code for M and M′ (e.g. call it as a subroutine, or run it in simulation using
U_TM).
(b) Suppose that the languages L and L′ are infinite, and that M and M′ enumerate their
respective languages in lexicographic order. Explain how to modify your construction from part
(a) so that the output of P is in lexicographic order.
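A minimal sketch of both parts, modeling the two enumerators as Python generators (a stand-in for running M and M′ in simulation with U_TM). Part (a) alternates outputs; part (b) is a sorted-streams merge, with Python's string comparison standing in for the lexicographic order. Both sketches assume the streams are infinite, as part (b) stipulates; finite streams would need a StopIteration guard (or dovetailing) in part (a).

# Sketch: (a) alternate between two enumerators, so P outputs L union L'.
def interleave(gen1, gen2):
    while True:
        print(next(gen1))
        print(next(gen2))

# Sketch: (b) merge two streams already in sorted order, smallest first,
# so P's output is itself sorted.
def merge_sorted(gen1, gen2):
    x, y = next(gen1), next(gen2)
    while True:
        if x <= y:
            print(x)
            x = next(gen1)
        else:
            print(y)
            y = next(gen2)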
5. Confusing but interesting(?) reduction. (bonus)
Reduce L to A_TM (note the different direction of the reduction; in particular, do not reduce A_TM to L):
1. Dovetailing
(a) Briefly sketch an algorithm for enumerating all Turing machine encodings. Remember that each
encoding is just a string, with some specific internal syntax (e.g. number of states, then number of
symbols in Σ, etc.).
(b) Now consider a language L that contains Turing machines which take other Turing machines as
input. Specifically,
L = {⟨M⟩ | M halts on some input ⟨N⟩, where N is a TM}.
If M1 is a decider that checks whether a given input ⟨X⟩ (that encodes a TM) halts in ≤ 37 steps, then ⟨M1⟩
is in L.
Suppose that M2 halts and rejects if its input ⟨X⟩ is not the encoding of a TM, and M2 spins off into
an infinite loop if ⟨X⟩ is a TM encoding. Then ⟨M2⟩ is not in L.
Show that L is (nevertheless) TM recognizable by giving a recognizer for it.
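The recognizer is a dovetailing argument: at stage s, simulate M for s steps on each of the first s encodings ⟨N⟩. If M halts on some TM encoding, this is detected at a finite stage; otherwise the recognizer runs forever, which is allowed for recognizers. A minimal sketch, where simulate(M, x, steps) is a hypothetical bounded simulator (true iff M halts on x within the given number of steps) and tm_encodings() is the enumeration from part (a):

from itertools import count, islice

# Sketch of dovetailing: stage s tries the first s encodings for s steps.
def recognize(M, simulate, tm_encodings):
    for s in count(1):
        for x in islice(tm_encodings(), s):
            if simulate(M, x, s):     # M halts on input <N> = x in s steps
                return True           # accept
        # no luck at this stage; retry more inputs for more steps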
and we are considering different ways to fill in P . Assume that the Turing machines M have only a
single tape.
(a) P is “M accepts some word w ∈ Σ∗ which has |w| ≤ 58”.
(b) P is “M does not accept any word w ∈ Σ∗ which has |w| ≤ 249”.
(c) P is “M stops on some string w containing the character a in ≤ 37 steps.”
(d) P is “M stops on some string w ∈ {a^n b^n | n ≥ 0}”.
(e) Given some additional (fixed) TM M′, the property P is “there is a word w such that both M
and M′ accept it.”
For each of these languages, determine whether it is Turing decidable, Turing recognizable, or not
Turing recognizable. Briefly justify your answers.
3. LBAs emptiness revisited. (15 points)
Consider a TM M and a string w. Suppose that $ and c are two fixed characters that are not in M’s
tape alphabet Γ. Now define the following language:
L_{M,w} = {z = w$$$c^i | i ≥ 1 and M accepts w in at most i steps}.
(a) Show that given M and w, the language L_{M,w} can be decided by an LBA. That is, explain how to
build a decider D_{M,w} for L_{M,w} that uses only the tape region where the input string z is written,
and no additional space on the tape.
(b) M accepts w if and only if L(D_{M,w}) ≠ ∅. Explain briefly why this is true.
(c) Assume that we can figure out how to compute the encoding ⟨D_{M,w}⟩, given ⟨M⟩ and w. Prove
that the language
E_LBA = {⟨M⟩ | M is an LBA and L(M) = ∅}
is undecidable.
Bibliography
[EZ74] A. Ehrenfeucht and P. Zeiger. Complexity measures for regular expressions. In Proc. 6th Annu. ACM
Sympos. Theory Comput., pages 75–79, 1974. http://portal.acm.org/citation.cfm?id=803886.
[Sip05] M. Sipser. Introduction to the Theory of Computation. 2nd edition, February 2005.
Index
enumerated, 172
enumerator, 240
exactly, 36
final, 29, 135, 136
finite, 70
Finite-state Transducers, FST, FSTs, 32
FST, 32, 33
generalized non-deterministic finite automata, 58
GNFA, 58–63, 308, 329
graph search, 144
halting, 153
halting problem, 152
halts, 158
height, 108
homomorphism, 63
Kleene star, 36
language, 22, 138
language recognized, 30
LBA, 163, 173–176, 178, 204, 282, 340
LBA, 173
leftmost derivation, 103
length, 21
Lexicographic ordering, 22
linear bounded automata, 173
match, 189
modified Post’s Correspondence Problem, 190
MPCP, 190, 192, 193
NFA, 39
NFA, 39–56, 58, 59, 61–63, 80, 92–94, 112, 121, 123, 127, 128, 131, 144, 145, 147, 173, 181, 199–201, 218–221, 223, 224, 252–254, 266, 274, 275, 286, 304–309, 315, 327–329, 331, 338, 342
non-deterministic finite automata, 39
non-deterministic finite automaton, 199
non-deterministic Turing machine, 171
nondeterministic Turing machine, 202
NTM, 171, 172, 202
nullable, 86, 87
onto, 239
oracle, 157
pairing, 181
palindrome, 310
parallel recursive automata, 257
parse tree, 80
PCP, 190, 193, 194
PDA
  strict, 276
PDA, 12, 80, 92–95, 101, 102, 112–116, 126–131, 173, 178, 179, 181, 199, 201, 202, 205, 232, 276, 277, 280, 286, 293, 297, 334
periodic, 194
polynomial reductions, 187
pop, 122
Post’s Correspondence Problem, 189
prefix, 22
Problem
  3Colorable, 188
  3DM, 188
  A, 157
  B, 157
  Circuit Satisfiability, 186, 187
  Clique, 188
  CSAT, 187, 188
  formula satisfiability, 187, 188
  Hamiltonian Cycle, 188
  Independent Set, 188
  Partition, 188
  SAT, 185–188
  Satisfiability, 185
  Subset Sum, 188
  TSP, 188
  Vec Subset Sum, 188
  Vertex Cover, 188
  X, 157
product automata, 31, 35
push, 122
pushdown automaton, 126
QBA, 262
Quadratic Bounded Automaton, 262
Quote
  A confederacy of Dunces, John Kennedy Toole, 184
  Andrew Jackson, 173
  Dirk Gently’s Holistic Detective Agency, Douglas Adams, 130
  Moby Dick, Herman Melville, 12
  The Hitch Hiker’s Guide to the Galaxy, by Douglas Adams, 152
  The lake of tears, by Emily Rodda, 152
  Through the Looking Glass, by Lewis Carroll, 37
RA, 12, 121, 124, 125, 128, 131, 148, 247, 257, 260, 267, 315
recognizable, 141
recursion, 131
Recursive automata, 12
  Acceptance, 122
reduces, 157
regex, 59
regular, 30, 44
Regular expressions, 36
regular operations, 46
reject, 131
rejecting, 135, 136
rejects, 133
relatively prime, 240
reverse, 45
reversed, 45
Rice Theorem, 164
roots, 130
terminal, 82
Theorem
Rice, 164
tiling completion problem, 195
TM, 12, 132–145, 147–178, 180–182, 189, 190, 192,
193, 195, 199, 203–205, 235–238, 242–245,
247, 260–267, 279–283, 298, 316, 317, 320,
321, 335–340
top of the stack, 122
transition function, 29
Turing decidable, 139, 141
Turing machine, 132, 134, 136
Turing recognizable, 138
Union, 36
unistate TM, 282
unit pair, 88
unit production, 85
unit rule, 85, 88
unit-rules, 87
universal Turing machine, 151
useless, 85, 86
xor, 276