Book
CS, UIUC
Department of Computer Science; University of Illinois; 201 N. Goodwin Avenue; Urbana, IL, 61801, USA;
sariel@uiuc.edu; http://www.uiuc.edu/~sariel/.
Contents
Contents 2
Preface 12
Preface 14
I Lectures 15
1 Lecture 1: Overview and Administrivia 17
1.1 Course overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.2 Necessary Administrivia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4 Lecture 4: Regular Expressions and Product Construction 34
4.1 Product Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.1.1 Product Construction: Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.2 Product Construction: Formal construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.3 Operations on languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.4 Regular Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.4.1 More interesting examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
9.3.5 A note on finite languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
9.4 Irregularity via closure properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
9.4.1 Being careful in using closure arguments . . . . . . . . . . . . . . . . . . . . . . . . . . 70
15 Leftover: CFG to PDA, and Alternative proof of CNF effectiveness 101
15.1 PDA– Pushing multiple symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
15.2 CFG to PDA conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
15.3 Alternative proof of CNF effectiveness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
22 Lecture 18: More on Turing Machines 136
22.1 A Turing machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
22.2 Turing machine configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
22.3 The languages recognized by Turing machines . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
22.4 Variations on Turing Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
22.4.1 Doubly infinite tape . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
22.4.2 Allow the head to stay in the same place . . . . . . . . . . . . . . . . . . . . . . . . . 139
22.4.3 Non-determinism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
22.4.4 Multi-tape . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
22.5 Multiple tapes do not add any power . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
24 Lecture 20: More decidable problems, and simulating TM and “real” computers 147
24.1 Review: decidability facts for regular languages . . . . . . . . . . . . . . . . . . . . . . . . . . 147
24.2 Problems involving context-free languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
24.2.1 Context-free languages are TM decidable . . . . . . . . . . . . . . . . . . . . . . . . . 148
24.2.2 Is a word in a CFG? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
24.2.3 Is a CFG empty? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
24.2.4 Undecidable problems for CFGs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
24.3 Simulating a real computer with a Turing machine . . . . . . . . . . . . . . . . . . . . . . . . 149
24.4 Turing machine simulating a Turing machine . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
24.4.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
24.4.2 The universal Turing machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
27 Lecture 23: Rice Theorem and Turing machine behavior properties 163
27.1 Outline & Previous lecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
27.1.1 Forward outline of lectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
27.1.2 Recap of previous class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
27.2 Rice’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
27.2.1 Another Example - The language L3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
27.2.2 Rice’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
27.3 TM decidability by behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
27.3.1 TM behavior properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
27.3.2 A decidable behavior property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
27.3.3 An undecidable behavior property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
27.4 More examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
27.4.1 The language LUIUC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
27.4.2 The language Halt_Empty_TM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
27.4.3 The language L111 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
29 Lecture 25: Linear Bounded Automata and Undecidability for CFGs 173
29.1 Linear bounded automatas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
29.1.1 LBA halting is decidable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
29.1.2 LBAs with empty language are undecidable . . . . . . . . . . . . . . . . . . . . . . . . 174
29.2 On undecidable problems for context free grammars . . . . . . . . . . . . . . . . . . . . . . . 177
29.2.1 TM consecutive configuration pairs is a CFG . . . . . . . . . . . . . . . . . . . . . . . . 177
29.2.2 The language of a context-free grammar generates all strings is undecidable . . . . . . 178
29.2.3 CFG equivalence is undecidable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
29.3 Avoiding PDAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
32 Review of topics covered 198
32.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
32.2 The players . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
32.3 Regular languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
32.4 Context-free Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
32.5 Turing machines and computability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
32.5.1 Turing machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
32.5.2 Reductions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
32.5.3 Other undecidability problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
32.6 Summary of closure properties and decision problems . . . . . . . . . . . . . . . . . . . . . . . 204
32.6.1 Closure Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
32.6.2 Decision problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
II Discussions 206
37 Discussion 5: More on non-deterministic finite automatas 225
37.1 Non-regular languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
37.1.1 L(0^n 1^n) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
37.1.2 L(#a + #b = #c) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
37.1.3 Not too many a's please . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
37.1.4 A Trick Example (Optional) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
47 Discussion 15: Review 246
47.1 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
IV Homeworks 299
50 Spring 2009 301
50.1 Homework 1: Problem Set 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
50.2 Homework 2: DFAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
50.3 Homework 3: DFAs II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
50.4 Homework 4: NFAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
50.5 Homework 5: On non-regularity. . . . . . . . . . . . . . . . . . . . . . . . 309
50.6 Homework 6: Context-free grammars. . . . . . . . . . . . . . . . . . . 311
50.7 Homework 7: Context-free grammars II . . . . . . . . . . . . . . . . 313
50.8 Homework 8: Recursive Automatas . . . . . . . . . . . . . . . . . . . . 315
50.9 Homework 9: Turing Machines . . . . . . . . . . . . . . . . . . . . . . . . 316
50.10 Homework 10: Turing Machines II . . . . . . . . . . . . . . . . . . . . . 316
50.11 Homework 11: Enumerators and Diagonalization . . . . . . . . 318
50.12 Homework 12: Preparation for Final . . . . . . . . . . . . . . . . . . . 320
51.10 Homework 10: Turing Machines . . . . . . . . . . . . . . . . . . . . . . . 334
51.11 Homework 11: Turing Machines . . . . . . . . . . . . . . . . . . . . . . . 335
51.12 Homework 12: Enumerators . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
51.13 Homework 13: Enumerators . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
51.14 Homework 14: Dovetailing, etc. . . . . . . . . . . . . . . . . . . . . . . . . 339
Bibliography 341
Index 342
Preface – Spring 2009
Finally: It was stated at the outset, that this system would not be here, and at once, perfected. You cannot but
plainly see that I have kept my word. But I now leave my cetological System standing thus unfinished, even as
the great Cathedral of Cologne was left, with the crane still standing upon the top of the uncompleted tower.
For small erections may be finished by their first architects; grand ones, true ones, ever leave the copestone to
posterity. God keep me from ever completing anything. This whole book is but a draft - nay, but the draft of a
draft. Oh, Time, Strength, Cash, and Patience!
– Moby Dick, Herman Melville
This manuscript is a collection of class notes used in teaching CS 373 (Theory of Computation), in the
spring of 2009, in the Computer Science department at UIUC. The instructors were Sariel Har-Peled and
Madhusudan Parthasarathy. The notes are based on older class notes; see the second preface for details.
These class notes diverge from previous semesters on two main points:
(A) Regular languages pumping lemma. Although we still taught the pumping lemma for regular
languages, we did not expect the students to use it to prove that languages are not regular. Instead, we
provided direct proofs showing that any automaton for these languages would require an infinite number
of states. This leads to much simpler proofs than using the pumping lemma, and it seems the students
find them easier to understand. Naturally, we are not the first to come up with this idea; it is sometimes
referred to as the "technique of many states".
The main problem with the pumping lemma is the large number of quantifiers involved in stating it.
They seem to make it harder for the students to use it.
(B) Recursive automata. Instead of teaching PDAs, we used an alternative machine model of recursive
automata (RA) for context-free languages. RAs are PDAs that do not manipulate the stack directly,
but only through the calling stack. For a discussion of this issue, see Chapter 20 (page 127).
This led to various changes later in the course. In particular, the fact that the intersection of a context-free
language and a regular language is still context-free is proven directly on the grammar. Similarly, the proof
that deciding if a grammar generates all words is undecidable now follows by a simpler but different proof;
see the relevant portion for details.
In particular, the alternative proof uses the fact that given two configurations of a TM written on top
of each other, a DFA can verify that the top configuration yields the bottom configuration. This
is a cute observation that seems to be worthy of describing in class, and it leads naturally into the
proof of the Cook-Levin theorem.
What remains to be done. Students suggested that more examples would be useful to have in the class
notes. In future instances of the class it would be a good idea to allocate two lectures towards the end to
teach the Cook-Levin Theorem properly. A more algorithmic emphasis might be a good preparation for
later courses.
And of course, no class notes are perfect. These class notes can definitely be further improved.
Format. Every chapter corresponds to material covered in one lecture in the course. Every week we also
had a discussion section run by the TAs. The TAs also wrote (most of) the notes included for the discussion
section.
Acknowledgements
The chapters on recursive automata and the review chapter were written by Madhusudan Parthasarathy.
The TAs in the class (Reza Zamani, Aparna Sundar, and Micha Hadosh) provided considerable help in
writing the exercises, and their solutions, and we thank them for their valuable work.
For further acknowledgements, see the older preface.
Copyright
This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 License. To view a
copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/ or send a letter to Creative
Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
Sariel Har-Peled
May 2009, Urbana, IL. USA.
Preface - Spring 2008
This manuscript is a collection of class notes used in teaching CS 273 (Introduction to the Theory of
Computation), in the spring of 2008, in the Computer Science department at UIUC. The instructors were
Margaret Fleck and Sariel Har-Peled.
These class notes are an initial effort to have class notes covering the material taught in this class, and
they are largely based on handwritten class notes from previous semesters, and the book used in the class
(Sipser [Sip05]).
Quality. We do not consider these class notes to be perfect; in fact, they are far from it. However, it is our
hope that people will improve these class notes in succeeding semesters, and bring them to acceptable
quality. From previous experience, it takes 3–4 iterations before class notes reach acceptable quality.
Even getting the class notes to their current form required a non-trivial amount of work.
Format. Every chapter corresponds to material covered in one lecture in the course. Every week we also
had a discussion section run by the TAs. The TAs also wrote (most of) the notes included for the discussion
section.
Why? We have no complaints about the book, but rather we prefer the form of class notes over the form
of a book. Writing class notes is also an effective (if somewhat time consuming) way to prepare for lectures.
And, as usual, at some points we preferred to present some material in our own way.
Ultimately, we hope that after several semesters of polishing these class notes they will be good enough
to replace the required textbook in the class.
Acknowledgements
We had the benefit of interacting with several people on the work on these class notes. Other instructors
that taught this class and contributed (directly or indirectly) to the material covered in it are
Chandra Chekuri, Madhusudan Parthasarathy, Lenny Pitt, and Mahesh Viswanathan.
In addition, the TAs in the class (Reza Zamani, James Lai, and Raman Sharykin) provided considerable
help in writing the exercises, and their solutions, and we thank them for their valuable work.
We would also like to thank the students in the class for their input, which helped in discovering numerous
typos and errors in the manuscript.
Part I
Lectures
Chapter 1

Lecture 1: Overview and Administrivia

1.1 Course overview
1. Theory of Computation.
• Build formal mathematical models of computation.
• Analyze the inherent capabilities and limitations of these models.
2. Course goals:
• Simple practical tools you can use in later courses, projects, etc. The course will provide you with
tools to model complicated systems and analyze them.
• Inherent limits of computers: problems that no computer can solve.
• Better fluency with formal mathematics (closely related to skill at debugging programs).
3. What is computable?
(a) check if a number n is prime
(b) compute the product of two numbers
(c) sort a list of numbers
(d) find the maximum number from a list
4. Computability, complexity, automata.
5. Example:
input n;
assume n>1;
while (n !=1) {
if (n is even)
n := n/2;
else
n := 3*n+1;
}
Does this program always stop? Not known.
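This is the famous 3n + 1 process. A direct transcription into Python, for experimenting (a sketch; the function name and the test range are ours):

    def collatz_stops(n):
        # Run the 3n+1 process; returns True if it reaches 1.
        assert n > 1
        while n != 1:
            if n % 2 == 0:
                n = n // 2
            else:
                n = 3 * n + 1
        return True  # reached only if the loop terminates

    # Conjectured (the Collatz conjecture), but not known, to hold for all n > 1:
    print(all(collatz_stops(n) for n in range(2, 1000)))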
Regular languages
9. Difference in scope
Illustrate with your favorite example from programming languages or natural language.
Illustrate with your favorite simple state machine, e.g. a vending machine.
11. Decidability:
• David Hilbert (1920’s) tries to formalize all of math and prove it correct
• Kurt Gödel (1931) shows that one can not prove consistency of a mathematical formalism having
non-trivial power.
14. It is mysterious and “cool” that some simple-looking problems are undecidable.
15. The proofs of undecidability are a bit abstract. Earlier parts of the course will help prepare you, so you
can understand the last part.
http://www.cs.uiuc.edu/class/sp09/cs373/.
• Prerequisites: CS 125, CS 173, CS 225 (or equivalents). Other experience can sometimes substitute
(e.g. advanced math). Speak to us if you are not sure.
• Vital to join the class newsgroup (details on web page). Carries important announcements, e.g. exam
times, hints and corrections on homeworks.
Especially see the Lectures page for schedule of topics, readings, quiz/exam dates.
• Homework 1 should be available on the class website. Due next Thursday. (Normally they will be
due Thursdays at 12:30, but next Monday is a holiday.) Browse chapter 0 and read section 1.1.
Normally, homeworks and readings will not be announced in class and you must watch the website and
newsgroups.
• Read and follow the homework format guidelines on the web page. Especially: each problem on a
separate sheet, your name on each problem, your section time (e.g. 10) in the upper-right corner. This
makes a big difference when grading and sorting graded homeworks.
• Course staff.
• Discussion sections. Office hours will be posted in the near future. Email and the newsgroup are always
an option. Please do not be shy about contacting us.
• Problem sets, exams, etc are common to all sections. It may be easier to start with your lecture and
discussion section instructors, but feel free to also talk to the rest of us.
• Sipser textbook: get a copy. We follow the textbook fairly closely. Our lecture notes only outline what
was covered and don’t duplicate the text. Used copies, international or first editions, etc are available
cheap through Amazon.
• Graded work:
(d) 25%: Homeworks and self-evaluations.
The worst homework will be dropped.
Self-evaluations will be online quizzes on the web.
(e) 5%: Attending discussion section.
• Late homeworks are not accepted, except in rare cases where you have a major excuse (e.g. serious
illness, family emergency, weather unsafe for travel).
• Homeworks can be done in groups of ≤ 3 students. Write their names under your own on your
homework. Also document any other major help you may have gotten. Each person turns in their own
write-up IN THEIR OWN WORDS.
• Doing homeworks is vital preparation for success on the exams. Getting help from your partners is
good, but don’t copy their solutions blindly. Make sure you understand the solutions.
• See the web pages for details of our cheating policy. First offense → zero on the exam or assignment
involved. Second offense or cheating on the final ⇒ fail the course. Please do not cheat.
• If you are not sure what is allowed, talk to us and/or document clearly what you did. That is enough
to ensure it is not “cheating” (though you might lose points).
• Bugs happen, on homeworks and even in the textbook and on exams. If you think you see a bug,
please bring it to our attention.
• Please tell us if you have any disabilities or other special circumstances that we should be aware of.
Chapter 2
This lecture covers material on strings and languages from Sipser chapter 0. Also chapter 1 up to (but
not including) the formal definition of computation (i.e. pages 31–40).
2.1.2 Strings
This section should be recapping stuff already seen in discussion section 1.
A string over an alphabet Σ is a finite sequence of characters from Σ.
Some sample strings with alphabet (say) Σ = {a, b, c} are abc, baba, and aaaabbbbccc.
The length of a string x is the number of characters in x, and it is denoted by |x|. Thus, the length of
the string w = abcdef is |w| = 6.
The empty string is denoted by ε, and it (of course) has length 0. The empty string is the string
containing zero characters in it.
The concatenation of two strings x and w is denoted by xw, and it is the string formed by the string
x followed by the string w. As a concrete example, consider x = cat, w = nip and the concatenated strings
xw = catnip and wx = nipcat.
Naturally, concatenating with the empty string results in no change in the string. Formally, for any string
x, we have that εx = xε = x. As such, εε = ε.
For a string w, the string x is a substring of w if the string x appears contiguously in w. As such, for
w = abcdef, we have that bcd is a substring of w, but ace is not a substring of w.
A string x is a suffix of w if it is a substring of w appearing at the end of w. Similarly, y is a prefix of
w if y is a substring of w appearing at the beginning of w.
Definition 2.1.1 The string x is a prefix of a string w, if there exists a string z, such that w = xz.
Similarly, x is a substring of w if there exist strings y and z such that w = yxz.
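As a quick sanity check of these definitions, strings over Σ behave exactly like Python strings (an illustrative sketch; the sample words are ours):

    w = "abcdef"
    print(len(w))                # |w| = 6
    print("cat" + "nip")         # concatenation: catnip
    print("bcd" in w)            # substring test: True
    print(w.startswith("abc"))   # prefix test: True, since abcdef = abc . def
    print(w.endswith("def"))     # suffix test: True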
2.1.3 Languages
A language is a set of strings. One special language is Σ∗ , which is the set of all possible strings generated
over the alphabet Σ. For example, if Σ = {0, 1}, then
Σ∗ = {ε, 0, 1, 00, 01, 10, 11, 000, 001, . . .} .
Namely, Σ∗ is the "full" language made of characters of Σ. Naturally, any language over Σ is going to be
a subset of Σ∗ .
Lexicographic ordering of a set of strings is an ordering that puts shorter strings first, and sorts the
strings alphabetically within each length. Naturally, we assume that we have an order on the given
alphabet.
Thus, for Σ = {a, b}, the lexicographic ordering of Σ∗ is
ε, a, b, aa, ab, ba, bb, aaa, aab, . . . .
Consider the language L1 = { w ∈ {a, b}∗ | |w| is even } . In words, L1 is the language of all strings made
out of a, b that have even length.
Next, consider the following set:
L2 = { x | there is a w such that xw = illinois } .
So L2 is the language made out of all prefixes of illinois. We can write L2 explicitly, but it is tedious.
Indeed,
L2 = {ε, i, il, ill, illi, illin, illino, illinoi, illinois} .
Why should we care about languages?
Consider the language Lprimes that contains all strings over Σ = {0, 1, . . . , 9} which are prime numbers. If
we can build a fast computer program (or an automaton) that can tell us whether a string s (i.e., a number)
is in Lprimes , then we can decide if a number is prime or not. And this is a very useful program to have, since
most encryption schemes currently used by computers (e.g., RSA) rely on the ability to find very large prime
numbers.
Let us state it explicitly: The ability to decide if a word is in a specific language (like Lprimes ) is
equivalent to performing a computational task (which might be extremely non-trivial). You can think
about this schematically, as a program that gets as input a number (i.e., a string made out of digits), and
decides if it is prime or not. If the input is a prime number, it outputs Yes, and otherwise it outputs No.
See the figure on the right.
[Figure: a program deciding if the input is a prime number, with outputs Yes and No.]
[State diagram: a three-state DFA with states q0 , q1 , and qrej ; here ∗ represents any possible character.]
Notice key pieces of this machine: three states, q0 is the start state (arrow coming in), q1 is the final
state (double circle), transition arcs.
To run the machine, we start at the start state. On each input character, we follow the corresponding
arc. When we run out of input characters, we answer “yes” or “no”, depending on whether we are in the final
state.
The language of a machine M is the set of strings it accepts, written L(M ). In this case L(M ) =
{a, aa, ab, aaa, . . .}.
1 Here, we are considering simple programs that just read some input, and print out output, without fancy windows and stuff
like that.
2.2.2 Another automata
(This section is optional and can be skipped in the lecture.)
Here is a simple state machine (i.e., finite automaton) M that accepts all ASCII strings ending with
ing.
[State diagram: q0 →i q1 →n q2 →g q3 , with a self-loop on q0 for any character.]
Notice key pieces of this machine: four states, q0 is the start state (arrow coming in), q3 is the final state
(double circle), transition arcs.
To run the machine, we start at the start state. On each input character, we follow the corresponding
arc. When we run out of input characters, we answer “yes” or “no”, depending on whether we are in the final
state.
The language of a machine M is the set of strings it accepts, written L(M ). In this case L(M ) =
{walking, flying, ing, . . .}.
[State diagram: a legal fragment, where q0 has a transition on a to q1 and a transition on b to q2 .]
Both of the following are bad, where q1 ≠ q2 and the right-hand machine has no outgoing transition for
the input character b.
[State diagrams: on the left, q0 has two outgoing transitions on a (to q1 and to q2 ); on the right, q0 has
an outgoing transition on a only.]
[State diagrams: a DFA accepting strings ending in ing, with failure transitions labelled "not i",
"not i or n", and "not i or g"; a two-state DFA (q0 , q1 ) accepting strings of 0's of even length; a
three-state DFA (q0 , q1 , q2 ) accepting strings of 0's whose length is divisible by 3; and a six-state DFA
(q0 , . . . , q5 ) accepting strings of 0's whose length is divisible by 6.]
This example is especially interesting, because we can achieve the same purpose, by observing that
n mod 6 = 0 if and only if n mod 2 = 0 and n mod 3 = 0 (i.e., to be divisible by 6, a number has to be
divisible by 2 and divisible by 3 [a generalization of this idea is known as the Chinese remainder theorem]).
So, we could run the two automata of Section 2.3.1 and Section 2.3.2 in parallel (feeding each input
character to each of the two automata), and accept only if both automata are in an accept state.
This idea will become more useful later in the course, as it provides a building operation for constructing
complicated automata from simple automata.
2.3.4 Number of ones is even
Input is a string over Σ = {0, 1}.
Accept: all strings in which the number of ones is even.
[State diagram: states q0 (accepting) and q1 , with transitions on 1 between them and self-loops on 0.]
2.3.5 Number of zero and ones is always within two of each other
Input is a string over Σ = {0, 1}.
Accept: all strings in which the difference between the number of ones and zeros in any prefix of the
string is in the range −2, . . . , 2. For example, the language contains ε, 0, 001, and 1101. The language can
even contain an extended run of one character, e.g. 001111, but it depends on what preceded it. So 111100
is not in the language.
[State diagram: states q−2 , q−1 , q0 , q1 , q2 , with 1-transitions moving right and 0-transitions moving left,
and a reject state qrej entered when the difference leaves the range −2, . . . , 2.]
Notice that the names of the states reflect their role in the computation. When you come to analyze
these machines formally, good names for states often makes your life much easier. BTW, the language of
this DFA is
L(M ) = { w | w ∈ {0, 1}∗ and for every x that is a prefix of w, |#1(x) − #0(x)| ≤ 2 } .
[State diagram: a similar DFA with states named A, B, C, D, and a reject state qrej .]
You can name states anything you want. Names of the form qX are often convenient, because they remind
you that the name denotes a state. And people often make the initial state q0 . But this is not obligatory.
2.4 The pieces of a DFA
To specify a DFA (deterministic finite automaton), we need to describe
– a (finite) alphabet
– a (finite) set of states
Chapter 3
The DFA that accepts nothing is just a single (non-accepting) state S with a self-loop on a, b.
(v) what is the transition from each state, on each input character?
Formally, a deterministic finite automaton is a 5-tuple (Q, Σ, δ, q0 , F ) where
For example, let Σ = {a, b} and consider the following DFA M , whose language L(M ) contains strings
consisting of one or more a’s followed by one or more b’s.
[State diagram: q0 →a q1 →b q2 , with a self-loop on a at q1 and on b at q2 ; q2 is accepting, and all other
transitions go to qrej , which loops on a, b.]
Then M = (Q, Σ, δ, q0 , F ), Q = {q0 , q1 , q2 , qrej }, and F = {q2 }. The transition function δ is defined by
δ a b
q0 q1 qrej
q1 q1 q2
q2 qrej q2
qrej qrej qrej
Alternatively, the transition function can be specified by formulas:
δ(q0 , a) = q1
δ(q1 , a) = q1
δ(q1 , b) = q2
δ(q2 , b) = q2
δ(q, t) = qrej for all other values of q and t.
Tables and state diagrams are most useful for small automata. Formulas are helpful for summarizing a
group of transitions that fit a common pattern. They are also helpful for describing algorithms that modify
automata.
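For instance, here is a minimal sketch, in Python, of the example DFA above, with the transition table stored as a dictionary (the encoding and names are ours):

    delta = {
        ("q0", "a"): "q1",     ("q0", "b"): "qrej",
        ("q1", "a"): "q1",     ("q1", "b"): "q2",
        ("q2", "a"): "qrej",   ("q2", "b"): "q2",
        ("qrej", "a"): "qrej", ("qrej", "b"): "qrej",
    }
    accepting = {"q2"}

    def accepts(w):
        state = "q0"
        for c in w:                  # follow one transition per input character
            state = delta[(state, c)]
        return state in accepting

    print(accepts("aabb"))   # True: one or more a's followed by one or more b's
    print(accepts("ba"))     # False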
The DFA M accepts a word w = w1 w2 . . . wk if there is a sequence of states r0 , r1 , . . . , rk such that
1. r0 = q0 ,
2. ri = δ(ri−1 , wi ), for i = 1, . . . , k, and
3. rk ∈ F .
The language recognized by M , denoted by L(M ), is the set { w | M accepts w } .
For example, when our automaton above accepts the string aabb, it uses the state sequence q0 q1 q1 q2 q2 .
(Draw a picture of the transitions.) That is r0 = q0 , r1 = q1 , r2 = q1 , r3 = q2 , and r4 = q2 .
Note that the states do not have to occur in numerical order in this sequence, e.g. the following DFA
accepts aaa using the state sequence q0 q1 q0 q1 .
[State diagram: states q0 and q1 (accepting), with a-transitions in both directions.]
A language (i.e. set of strings) is regular if it is recognized by some DFA.
Consider the set of odd integers. If we multiply two odd integers, the answer is always odd. So the set of
odd integers is said to be closed under multiplication. But it is not closed under addition. For example,
3 + 5 = 8 which is not odd.
To talk about closure, you need two sets: a larger universe U and a smaller set X ⊆ U . The universe
is often supposed to be understood from context. Suppose you have a function F that maps values in U to
values in U . Then X is closed under F if F applied to values from X always produces an output value
that is also in X.
For automata theory, U is usually the set of all languages and X contains languages recognized by some
specific sort of machine, e.g. regular languages.
Here we are interested in the question of whether the regular languages are closed under set complement.
(The complement language keeps the same alphabet.) That is, if we have a DFA M = (Q, Σ, δ, q0 , F )
accepting some language L, can we construct a new DFA M ′ accepting the complement language L̄ = Σ∗ \ L?
Consider the automaton M from above, where L is the set of all strings of at least one a followed by at
least one b.
[State diagram: the DFA M from above, with accepting state q2 .]
The complement language L̄ contains the empty string, strings in which some b's precede some a's, and
strings that contain only a's or only b's.
Our new DFA M 0 should accept exactly those strings that M rejects. So we can make M 0 by swapping
final/non-final markings on the states:
[State diagram: the same DFA with the final/non-final markings swapped: q0 , q1 , and qrej are now
accepting, and q2 is not.]
Formally, M ′ = (Q, Σ, δ, q0 , Q \ F ).
[State diagrams: M1 , with states q0 , q1 , and M2 , with states q0 , q1 , q2 ; all transitions are on a.]
Assume that we would like to build an automaton that accepts the intersection of the languages of both
automata. That is, we would like to accept the language L(M1 ) ∩ L(M2 ). How do we build an automaton
for that?
The idea is to build the product automaton of the two automata. See the following example.
[State diagram: the product automaton, with states (qi , pj ) for qi ∈ {q0 , q1 } and pj ∈ {p0 , p1 , p2 }, and
transitions on a.]
Given two automata M = (Q, Σ, δ, q0 , F ) and M ′ = (Q′ , Σ′ , δ ′ , q0′ , F ′ ), their product automaton is the
automaton formed by the product of the states. Thus, a state in the resulting automaton N = M × M ′ is a
pair (q, q ′ ), where q ∈ Q and q ′ ∈ Q′ .
The key invariant of the product automaton is that after reading a word w, it is in the state (q, q ′ ), where
q is the state that M is in after reading w, and q ′ is the state that M ′ is in after reading w.
As such, the intersection language L(M ) ∩ L(M ′ ) is recognized by the product automaton, where we set
a pair (q, q ′ ) ∈ Q(N ) to be an accepting state for N if q ∈ F and q ′ ∈ F ′ .
Similarly, the automaton accepting the union L(M ) ∪ L(M ′ ) is created from the product automaton by
setting the accepting states to be all pairs (q, q ′ ) such that either q ∈ F or q ′ ∈ F ′ .
As such, the automaton accepting the union language L(M1 ) ∪ L(M2 ) is the following.
[State diagram: the same product automaton, with every pair containing an accepting state of M1 or of
M2 marked as accepting.]
So, formally, an FST (finite state transducer) is a 5-tuple (Q, Σ, Γ, δ, q0 ), where
Notation: Γε = Γ ∪ {ε} .
The transition table for our example FST might look like the following.
δ     0          1
q0    (q1 , ε)   (q2 , ε)
q1    (q0 , 0)   (q0 , 1)
q2    (q0 , 2)   (q0 , 3)
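A minimal sketch of running this transducer in Python, using the table above (the encoding is ours; ε is represented by the empty string):

    def run_fst(delta, q0, w):
        # Returns the concatenation of the outputs along the run.
        state, out = q0, []
        for c in w:
            state, o = delta[(state, c)]
            out.append(o)
        return "".join(out)

    d = {("q0", "0"): ("q1", ""),  ("q0", "1"): ("q2", ""),
         ("q1", "0"): ("q0", "0"), ("q1", "1"): ("q0", "1"),
         ("q2", "0"): ("q0", "2"), ("q2", "1"): ("q0", "3")}
    print(run_fst(d, "q0", "0110"))   # pairs of bits become base-4 digits: "12"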
Chapter 4

Lecture 4: Regular Expressions and Product Construction
This lecture finishes section 1.1 of Sipser and also covers the start of 1.3.
[State diagrams: a DFA with states q0 , q1 accepting L1 , and a DFA with states r0 , r1 and a drain state
accepting L2 .]
We can run these two DFAs together, by creating states that remember the states of both machines.
[State diagram: the combined machine, with states (q0 , r0 ), (q1 , r0 ), (drain, r0 ), and so on.]
State of a DFA after reading a word w. In the following, given a DFA M = (Q, Σ, δ, q0 , F ) , we will
be interested in what state the DFA M is in, after reading the characters of a string w = w1 w2 . . . wk ∈ Σ∗ .
As in the definition of acceptance, we can just define the sequence of states that M would go through as it
reads w. Formally, r0 = q0 , and
ri = δ(ri−1 , wi ) , for i = 1, . . . , k.
As such, rk is the state M would be in after reading the string w. We will denote this state by δ(q0 , w).
Note that, by definition,
δ(q0 , w) = δ(δ(q0 , w1 . . . wk−1 ), wk ) .
In general, if the DFA is in a state q, and we want to know in what state it would be after reading a string
w, we will denote it by δ(q, w).
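The recursive identity above translates directly into code. A sketch, assuming the dictionary encoding of δ used earlier:

    def delta_star(delta, q, w):
        # delta_star(q, w) = delta(delta_star(q, w[:-1]), w[-1]);
        # for the empty word, the DFA stays in q.
        if w == "":
            return q
        return delta[(delta_star(delta, q, w[:-1]), w[-1])]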
The set FN ⊆ Q × Q′ of accepting states is free to be whatever we need it to be, depending on what we want
N to recognize. For example, if we would like N to accept the intersection L(M ) ∩ L(M ′ ), then we set
FN = F × F ′ . If we want N to recognize the union language L(M ) ∪ L(M ′ ), then FN = (F × Q′ ) ∪ (Q × F ′ ).
Lemma 4.2.1 For any input word w ∈ Σ∗ , the product automaton N of the DFAs M = (Q, Σ, δ, q0 , F ) and
M ′ = (Q′ , Σ, δ ′ , q0′ , F ′ ) is in state (q, q ′ ) after reading w, if and only if (i) M is in the state q after reading w,
and (ii) M ′ is in the state q ′ after reading w.
Proof: The proof is by induction on the length of the word w.
If w = ε is the empty word, then N is initially in the state (q0 , q0′ ) by construction, where q0 (resp. q0′ )
is the initial state of M (resp. M ′ ). As such, the claim holds in this case.
Otherwise, assume w = w1 w2 . . . wk−1 wk , and that the claim is true by induction for all input words of
length strictly smaller than k.
Let (qk−1 , q′k−1 ) be the state that N is in after reading the string ŵ = w1 . . . wk−1 . By induction, as
|ŵ| = k − 1, we know that M is in the state qk−1 after reading ŵ, and M ′ is in the state q′k−1 after reading
ŵ.
Let
qk = δ(qk−1 , wk ) = δ(δ(q0 , ŵ), wk ) = δ(q0 , w)   and   q′k = δ ′ (q′k−1 , wk ) = δ ′ (δ ′ (q0′ , ŵ), wk ) = δ ′ (q0′ , w) .
As such, by definition, M (resp. M ′ ) would be in the state qk (resp. q′k ) after reading w.
Also, by the definition of its transition function, after reading w the DFA N would be in the state
δN ((q0 , q0′ ), w) = δN (δN ((q0 , q0′ ), ŵ), wk ) = δN ((qk−1 , q′k−1 ), wk ) = (δ(qk−1 , wk ), δ ′ (q′k−1 , wk )) = (qk , q′k ) ,
as claimed.
Lemma 4.2.2 Let M = (Q, Σ, δ, q0 , F ) and M ′ = (Q′ , Σ, δ ′ , q0′ , F ′ ) be two given DFAs. Let N be their
product automaton, where its set of accepting states is F × F ′ . Then L(N ) = L(M ) ∩ L(M ′ ).
Proof: If w ∈ L(M ) ∩ L(M ′ ), then qw = δ(q0 , w) ∈ F and q′w = δ ′ (q0′ , w) ∈ F ′ . By Lemma 4.2.1, this
implies that δN ((q0 , q0′ ), w) = (qw , q′w ) ∈ F × F ′ . Namely, N accepts the word w, implying that w ∈ L(N ),
and as such L(M ) ∩ L(M ′ ) ⊆ L(N ).
Similarly, if w ∈ L(N ), then (pw , p′w ) = δN ((q0 , q0′ ), w) must be an accepting state of N . But the set
of accepting states of N is F × F ′ . That is, (pw , p′w ) ∈ F × F ′ , implying that pw ∈ F and p′w ∈ F ′ .
Now, by Lemma 4.2.1, we know that δ(q0 , w) = pw ∈ F and δ ′ (q0′ , w) = p′w ∈ F ′ . Thus, M and M ′ both
accept w, which implies that w ∈ L(M ) and w ∈ L(M ′ ). Namely, w ∈ L(M ) ∩ L(M ′ ), implying that
L(N ) ⊆ L(M ) ∩ L(M ′ ).
Putting the above together implies the claim.
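The product construction is short enough to write out in full. A sketch in Python (the encoding and names are ours); switching between F × F ′ and (F × Q′ ) ∪ (Q × F ′ ) switches between intersection and union:

    from itertools import product

    def product_dfa(d1, s1, F1, Q1, d2, s2, F2, Q2, Sigma, union=False):
        # States of N = M x M' are pairs (q, p); a transition moves
        # both coordinates simultaneously.
        Q = set(product(Q1, Q2))
        delta = {((q, p), c): (d1[(q, c)], d2[(p, c)])
                 for (q, p) in Q for c in Sigma}
        if union:
            F = {(q, p) for (q, p) in Q if q in F1 or p in F2}
        else:
            F = {(q, p) for (q, p) in Q if q in F1 and p in F2}
        return Q, delta, (s1, s2), F

Feeding in the mod-2 and mod-3 automata of Section 2.3 yields, for instance, the six-state divisibility-by-6 automaton mentioned in Chapter 2.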
We (hopefully) all understand what union does. The other two have some subtleties. Let
L = {under, over} and K = {ground, water, work} .
Then
LK = {underground, underwater, underwork, overground, overwater, overwork} .
Similarly,
K∗ = {ε, ground, water, work, groundground, groundwater, groundwork, workground, waterworkwork, . . .} .
For the star operator, note that the resulting set always contains the empty string (because n can be zero).
Also, each of the substrings is chosen independently from the base set, and you can repeat. E.g.,
waterworkwork is in K∗ .
Regular languages are closed under many operations, including the three “regular operations” listed above,
set intersection, set complement, string reversal, “homomorphism” (formal version of shifting alphabets). We
have seen (last class) why regular languages are closed under set complement. We will prove the rest of these
bit by bit over the next few lectures.
In particular, for a regular expression ⟨exp⟩, we will use the notation L(⟨exp⟩) to denote the language
associated with this regular expression. Thus,
1. Rε = εR = R.
2. R∅ = ∅ = ∅R.
This is a bit confusing, so let us see why this is true. Recall that
R∅ = { xy | x ∈ R and y ∈ ∅ } .
But the empty set (∅) does not contain any element, and as such, no concatenated string can be created.
Namely, it is the empty language.
4. R ∪ ε = ε ∪ R.
This expression cannot always be simplified, since ε might not be in the language L(R).
5. ∅∗ = {ε}, since the empty word is always contained in the language generated by the star operator.
6. ε∗ = {ε}.
1 From Through the Looking Glass, by Lewis Carroll:
‘And only one for birthday presents, you know. There’s glory for you!’
‘I don’t know what you mean by “glory”,’ Alice said.
Humpty Dumpty smiled contemptuously. ‘Of course you don’t – till I tell you. I meant “there’s a nice knock-
down argument for you!” ’
‘But “glory” doesn’t mean “a nice knock-down argument”,’ Alice objected.
‘When I use a word,’ Humpty Dumpty said, in rather a scornful tone, ‘it means just what I choose it to mean
– neither more nor less.’
‘The question is,’ said Alice, ‘whether you can make words mean so many different things.’
‘The question is,’ said Humpty Dumpty, ‘which is to be master – that’s all.’
4.4.1 More interesting examples
Suppose Σ = {a, b, c}.
3. aΣ∗ a + bΣ∗ b + cΣ∗ c is all strings that start and end with the same character.
For instance, consider a regular expression for decimal numbers, where D = (0 ∪ 1 ∪ · · · ∪ 9) denotes a
digit. A first attempt is
(− ∪ ε) D∗ (ε ∪ .) D∗ .
But this does not force the number to contain any digits, which is probably wrong. As such, the correct
expression is
(− ∪ ε)(D+ (ε ∪ .)D∗ ∪ D∗ (ε ∪ .)D+ ).
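This expression can be checked directly with Python's re module, writing D as \d and each (ε ∪ ·) alternative as an optional item (the anchors and the test strings are ours):

    import re

    # (- U eps)(D+ (eps U .) D*   U   D* (eps U .) D+):
    decimal = re.compile(r"^-?(?:\d+\.?\d*|\d*\.?\d+)$")

    for s in ["3.14", "-7", ".5", "42.", "-", "."]:
        print(s, bool(decimal.match(s)))
    # 3.14, -7, .5, and 42. match; "-" and "." do not, since a
    # number is now forced to contain at least one digit.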
Notice that a^n is not a regular expression. Some things written with non-star exponents are regular
and some are not. It depends on what conditions you put on n. E.g., { a^{2n} | n ≥ 0 } is regular (even-length
strings of a's). But { a^n b^n | n ≥ 0 } is not regular.
However, a^3 (or any other fixed power) is regular, as it is just a shorthand for aaa. Similarly, if R is a
regular expression, then R^3 is regular since it is a shorthand for RRR.
Chapter 5
Lecture 5: Nondeterministic Automata
February 3, 2009
This lecture covers the first part of section 1.2 of Sipser, through p 54.
An NFA accepts a string if there is some path through the state diagram that consumes the whole input
string and ends in an accept state.
Here are two possible ways to think about this:
(i) the NFA magically guesses the right path which will lead to an accept state.
(ii) the NFA searches all paths through the state diagram to find such a path.
The first view is often the best for mathematical analysis. The second view is one reasonable approach to
implementing NFAs.
Example. Consider the DFA that accepts all the strings over {a, b} that start with aab. Here is the
resulting DFA.
[State diagram: q0 →a q1 →a q2 →b q3 , where q3 is accepting and loops on a, b; all other transitions go
to a sink state snk, which loops on a, b.]
The NFA for the same language is even simpler, since we can omit the missing transitions and the sink
state. In particular, the NFA for the above language is the following.
[State diagram: q0 →a q1 →a q2 →b q3 , where q3 is accepting and loops on a, b.]
As another example, the automaton below accepts strings containing the substring abab.
[State diagram (N1): 1 →a 2 →b 3 →a 4 →b 5, with self-loops on a, b at states 1 and 5; state 5 is
accepting.]
The respective DFA, shown below, needs a lot more transitions and is somewhat harder to read.
[State diagram: the equivalent DFA, with all the failure transitions filled in.]
Consider the language L of strings of the form w#c, where w is a string of digits and c is a single digit
that appears somewhere in w. For example, the word 314159#5 is in L, and so is 314159#3. But the word
314159#7 is not in L.
Here is the NFA M that recognizes this language.
[State diagram: the start state qs has a self-loop on [0, 9]; on reading a digit c, the NFA may move to a
state qc that remembers c (for c ∈ {0, . . . , 9}); each qc has a self-loop on [0, 9] and moves on # to qc′ ;
from qc′ , reading the digit c leads to the accepting state qf .]
The NFA M scans the input string until it “guesses” that it is at the character c in w that will be at the
end of the input string. When it makes this guess, M transitions into a state qc that “remembers” the value
c. The rest of the transitions then confirm that the rest of the input string matches this guess.
A DFA for this problem is considerably more taxing. We would need a state to remember the set of digits
encountered in the string read so far. Since there are 2^10 = 1024 different subsets of the digits, we will
require an automaton with at least 1024 states! The NFA above requires only 22 states, and is much easier
to draw and understand.
Formally, the transition function of an NFA is
δ : Q × Σε → P(Q),
where Σε = Σ ∪ {ε} and P(Q) is the power set of Q (i.e., all possible subsets of Q). As such, the input
character for δ(·) can be either a real input character or ε (in this case the NFA does not eat [or drink] any
input character when using this transition). The output value of δ is a set of states (unlike a DFA).
[State diagram: A →a B →ε C →b D, with a self-loop on A and the transitions listed below.]
Here
δ(A, a) = {A, B}
δ(B, a) = ∅ (NB: not {∅})
δ(B, ε) = {C} (NB: not just C)
δ(B, b) = {C} (NB: just follows one transition arc).
The trace for recognizing the input abab:
t = 0: state = A, remaining input abab.
t = 1: state = A, remaining input bab.
t = 2: state = A, remaining input ab.
t = 3: state = B, remaining input b.
t = 4: state = C, remaining input b (ε-transition used, and no input eaten).
t = 5: state = D, remaining input ε.
Is every DFA an NFA? Technically, no (why?1 ). However, it is easy to convert any DFA into an NFA.
If δ is the transition function of the DFA, then the corresponding transition of the NFA is going to be
δ 0 (q, t) = {δ(q, t)}.
(ii) r0 = q0 .
(The NFA starts from the start state.)
(iii) rn ∈ F .
(The final state in the trace is an accepting state.)
Chapter 6
This lecture covers the last part of section 1.2 of Sipser (pp. 58–63), part of 1.3 (pp. 66–69), and also
closure under string reversal and homomorphism.
6.1 Overview
We defined a language to be regular if it is recognized by some DFA. The agenda for the next few lectures
is to show that three different ways of defining languages, namely DFAs, NFAs, and regexes, are in fact
all equivalent; that is, they all define exactly the regular languages. We will show this equivalence as
follows.
[Diagram: conversions between DFA, NFA, and regular expressions, with arrows labelled "today",
"next lecture", and "next next lecture".]
One of the main properties of languages we are interested in are closure properties, and the fact that
regular languages are closed under union, intersection, complement, concatenation, and star (and also under
homomorphism).
However, closure operations are easier to show in one model than in another. For example, for DFAs,
showing closure under union, intersection, and complement is easy, but showing closure of DFAs under
concatenation and ∗ is hard.
Here is a table that lists each closure property (intersection ∩, union ∪, complement, concatenation ◦,
and star ∗) and how hard it is to show in the various models of regular languages.
[Table: the properties ∩, ∪, complement, ◦, ∗ against the models.]
Recall what it means for regular languages to be closed under an operation op. If L1 and L2 are regular,
then L1 op L2 is regular. That is, if we have an NFA recognizing L1 and an NFA recognizing L2 , we can
construct an NFA recognizing L1 op L2 .
The extra power of NFAs makes it easy to prove closure properties for NFAs. When we know all DFAs,
NFAs, and regexes are equivalent, these closure results then apply to all three representations. Namely, they
would imply that regular languages have these closure properties.
We would like to claim that if L is regular, then so is LR . Formally, we need to be a little bit more
careful, since we still did not show that a language being regular implies that it is recognized by an NFA.
where qS is the only accepting state for M . Note that δ ′ is identical to δ, except that
∀q ∈ F, δ ′ (q, ε) = {qS } . (6.1)
Now, we need to prove formally that if w ∈ L(M ) then wR ∈ L(N ), but this is an easy induction, and we
omit it.
Note that this will not work for a DFA. First, we cannot force a DFA to have a single final state. Second, a
state may have two incoming transitions on the same character, resulting in non-determinism when reversed.
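One way to realize the reversal construction in code, assuming δ maps a pair (state, character) to a set of states and ε is encoded as the empty string (all names here are ours):

    def reverse_nfa(Q, delta, q0, F):
        # Reverse every transition, make the old start state the accepting
        # state, and add a fresh start state with eps-moves into the old
        # accepting states ("qs" is assumed not to clash with Q).
        qs = "qs"
        d = {}
        for (q, c), targets in delta.items():
            for p in targets:
                d.setdefault((p, c), set()).add(q)
        d[(qs, "")] = set(F)     # "" plays the role of eps
        return Q | {qs}, d, qs, {q0}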
6.3 Closure of NFAs under regular operations
We consider the regular operations to be union, concatenation, and the star operator.
Advice to instructor: Do the following constructions via pictures, and give detailed tuple notation for only one of them.
for only one of them.
[Figure: the NFAs N (start state q0 , accepting states f1 , . . . , fm ) and N ′ (start state q0′ , accepting states
f1′ , . . . , fm′ ), and the new NFA M obtained by adding a new start state qs with ε-transitions to q0 and q0′ .]
Formally, we are given two NFAs N = (Q, Σ, δ, q0 , F ) and N ′ = (Q′ , Σ, δ ′ , q0′ , F ′ ), where Q ∩ Q′ = ∅ and
the new state qs is not in Q or Q′ . The new NFA M is
M = (Q ∪ Q′ ∪ {qs } , Σ, δM , qs , F ∪ F ′ ) ,
where
δM (q, c) =
  δ(q, c)       if q ∈ Q and c ∈ Σε ,
  δ ′ (q, c)    if q ∈ Q′ and c ∈ Σε ,
  {q0 , q0′ }   if q = qs and c = ε,
  ∅             if q = qs and c ≠ ε.
We thus showed the following.
We thus showed the following.
Lemma 6.3.1 Given two NFAs N and N ′ , one can construct an NFA M such that L(M ) = L(N ) ∪ L(N ′ ).
Given a word w ∈ L(N )L(N ′ ), the new automaton needs to guess how to break it into two strings x ∈ L(N )
and y ∈ L(N ′ ), so that w = xy. Now, there exists an execution trace for N accepting x; then we can jump
into the starting state of N ′ and use the execution trace accepting y to reach an accepting state of the new
NFA M . Here is how the resulting automaton looks, visually.
[Figure: the NFAs N and N ′ , and the new NFA M in which every accepting state of N gets an ε-transition
to q0′ ; the accepting states of M are those of N ′ .]
Formally, we are given two NFAs N = (Q, Σ, δ, q0 , F ) and N ′ = (Q′ , Σ, δ ′ , q0′ , F ′ ), where Q ∩ Q′ = ∅. The
new automaton is
M = (Q ∪ Q′ , Σ, δM , q0 , F ′ ) ,
where
δM (q, c) =
  δ(q, ε) ∪ {q0′ }   if q ∈ F and c = ε,
  δ(q, c)            if q ∈ F and c ≠ ε,
  δ(q, c)            if q ∈ Q \ F and c ∈ Σε ,
  δ ′ (q, c)         if q ∈ Q′ and c ∈ Σε .
Lemma 6.3.2 Given two NFAs N and N ′ , one can construct an NFA M such that L(M ) = L(N ) ◦ L(N ′ ) =
L(N )L(N ′ ).
Proof: The construction is described above, and the proof of the correctness (of the construction) is easy
and sketched above, so we skip it. You might want to verify that you know how to fill in the details for this
proof (wink, wink).
The idea is to connect the final states of N back to the initial state using ε-transitions, so that it can
loop back after recognizing a word of L(N ). As such, in the ith loop of the execution, the new NFA M
recognizes the word wi . Naturally, the NFA needs to guess when to jump back to the start state of N . One
minor technicality is that ε ∈ (L(N ))∗ , but ε might not be in L(N ). To overcome this, we introduce a new
start state qs (which is accepting), and it is connected by (you guessed it) an ε-transition to the initial state
of N . This way, ε ∈ L(M ), and as such M recognizes the required language. Visually, the transformation
looks as follows.
[Figure: the NFA N , and the NFA M obtained by adding a new accepting start state qs with an
ε-transition to q0 , and ε-transitions from the accepting states of N back to q0 .]
Formally, we are given the NFA N = (Q, Σ, δ, q0 , F ), where qs ∉ Q. The new NFA is
M = (Q ∪ {qs } , Σ, δM , qs , F ∪ {qs }) ,
where
δM (q, c) =
  δ(q, ε) ∪ {q0 }   if q ∈ F and c = ε,
  δ(q, c)           if q ∈ F and c ≠ ε,
  δ(q, c)           if q ∈ Q \ F ,
  {q0 }             if q = qs and c = ε,
  ∅                 if q = qs and c ≠ ε.
Why the extra state? The construction for star needs some explanation. We add arcs from the final states
back to the initial state to do the loop. But then we need to ensure that ε is accepted. It is tempting to just
make the initial state final, but this does not work for examples like the following. So we need to add a new
initial state to handle ε.
[State diagram: q0 →a q1 , with a b-transition from q1 back to q0 .]
Notice that it also works to send the loopback arcs to the new initial state rather than to the old initial
state.
Lemma 6.3.3 Given an NFA N , one can construct an NFA M that accepts the language (L(N ))∗ .
We can now show that every regular expression R can be converted into an NFA recognizing L(R).
Proof: The proof is by induction on the structure of R (it can be interpreted as induction over the number
of operators in R).
The base of the induction is when R contains no operator (i.e., the number of operators in R is zero);
then R must be one of the following:
(i) If R = c, where c ∈ Σ, then the corresponding NFA is q0 →c q1 .
For the induction step, assume that we have proved the claim for all expressions having at most k − 1
operators, and that R has k operators in it. We consider whether R can be written in any of the following
forms:
(i) R = R1 + R2 . By the induction hypothesis, there exist two NFAs N1 and N2 such that L(N1 ) = L(R1 )
and L(N2 ) = L(R2 ). By Lemma 6.3.1, there exists an NFA M that recognizes the union; that is,
L(M ) = L(N1 ) ∪ L(N2 ) = L(R1 ) ∪ L(R2 ) = L(R).
(ii) R = R1 ◦ R2 ≡ R1 R2 . By the induction hypothesis, there exist two NFAs N1 and N2 such that
L(N1 ) = L(R1 ) and L(N2 ) = L(R2 ). By Lemma 6.3.2, there exists an NFA M that recognizes the
concatenated language; that is, L(M ) = L(N1 ) ◦ L(N2 ) = L(R1 ) ◦ L(R2 ) = L(R).
(iii) R = (R1 )∗ . By the induction hypothesis, there exists an NFA N1 such that L(N1 ) = L(R1 ). By
Lemma 6.3.3, there exists an NFA M that recognizes the star language; that is, L(M ) = (L(N1 ))∗ =
(L(R1 ))∗ = L(R).
This completes the proof of the lemma, since we have shown how to build an NFA for every possible regular
expression with k operators.
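The induction above is, in effect, a recursive program over the structure of R. A compact sketch, where a regular expression is a small tuple-tree and ε is encoded as the empty string (the representation and all names are ours):

    def regex_to_nfa(r, fresh):
        # r is ("char", c), ("union", r1, r2), ("concat", r1, r2), or ("star", r1);
        # ("char", "") stands for the expression eps.  fresh() returns a new
        # state name.  Returns (delta, start, finals), where delta maps
        # (state, char-or-"") to a set of states.
        kind = r[0]
        if kind == "char":                    # base case
            s, f = fresh(), fresh()
            return {(s, r[1]): {f}}, s, {f}
        if kind == "union":                   # Lemma 6.3.1
            d1, s1, F1 = regex_to_nfa(r[1], fresh)
            d2, s2, F2 = regex_to_nfa(r[2], fresh)
            s = fresh()
            return {**d1, **d2, (s, ""): {s1, s2}}, s, F1 | F2
        if kind == "concat":                  # Lemma 6.3.2
            d1, s1, F1 = regex_to_nfa(r[1], fresh)
            d2, s2, F2 = regex_to_nfa(r[2], fresh)
            d = {**d1, **d2}
            for f in F1:
                d.setdefault((f, ""), set()).add(s2)
            return d, s1, F2
        if kind == "star":                    # Lemma 6.3.3
            d1, s1, F1 = regex_to_nfa(r[1], fresh)
            s = fresh()
            d = {**d1, (s, ""): {s1}}
            for f in F1:                      # loop back to the old start state
                d.setdefault((f, ""), set()).add(s1)
            return d, s, F1 | {s}

    # Example: the NFA for (a + b)*.
    c = iter(range(1000)); fresh = lambda: "q%d" % next(c)
    nfa = regex_to_nfa(("star", ("union", ("char", "a"), ("char", "b"))), fresh)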
Consider the regular expression R = (a + ε)(aa + ba)∗ . We have that R = R1 ◦ R2 , where R1 = a + ε and
R2 = (aa + ba)∗ .
Let us first build an NFA for R1 = a + ε. The NFA for ε is the single accepting state q2 , and the NFA
for a is q0 →a q1 ; by Lemma 6.3.1, the NFA for R1 adds a new start state qs with ε-transitions to q2 and
to q0 .
Next, let R3 = aa + ba. Its NFA is built, in the same way, from the NFA for aa (states q4 , . . . , q7 ) and
the NFA for ba (states q8 , . . . , q11 ), joined by a new start state q12 with ε-transitions to q4 and q8 .
[Figure: the NFA for R3 = aa + ba.]
By Lemma 6.3.3, the NFA for R2 = (R3 )∗ adds a new accepting start state q13 , with an ε-transition to
q12 , and ε-transitions from the accepting states back to q12 .
[Figure: the NFA for R2 = (aa + ba)∗ .]
Now, R = R1 R2 = R1 ◦ R2 , and by Lemma 6.3.2, the NFA for R is depicted in Figure 6.1.
[Figure 6.1: The NFA constructed for the regular expression R = (a + ε)(aa + ba)∗ .]
Note that the resulting NFA is by no means the simplest or most elegant NFA for this language (far from
it), but rather the NFA we get by following our construction carefully.
Chapter 7
For a set of states X ⊆ Q and a word w ∈ Σ∗ , let ∆N (X, w) denote the set of all the states N might be
in, if it starts from a state of X and it handles the input w.
The proof of the following lemma is by an easy induction on the length of w.
Lemma 7.1.1 Let N = (Q, Σ, δ, q0 , F ) be a given NFA with no ε-transitions. For any word w ∈ Σ∗ , we have
that q ∈ ∆N ({q0 } , w) if and only if there is a way for N to be in q after reading w (when starting from the
start state q0 ).
More details. We include the proof for the sake of completeness, but the reader should by now be able to fill in such
a proof on their own.
Proof: The proof is by induction on the length of w = w1 w2 . . . wk .
If k = 0 then w is the empty word, and N stays in q0 . Also, by definition, we have ∆N ({q0 } , w) = {q0 }, and
the claim holds in this case.
Assume that the claim holds for all words of length at most n, and let k = n + 1 be the length of w. Consider a
state qn+1 that N reaches after reading w1 w2 . . . wn wn+1 , and let qn be the state N was in before handling the
character wn+1 and reaching qn+1 . By induction, we know that qn ∈ ∆N ({q0 } , w1 w2 . . . wn ). Furthermore,
we know that qn+1 ∈ δ(qn , wn+1 ). As such, we have that
qn+1 ∈ δ(qn , wn+1 ) ⊆ ∪q∈∆N ({q0 },w1 w2 ...wn ) δ(q, wn+1 ) = ∆N ({q0 } , w) ,
which establishes the claim.
7.1.2 Simulating NFAs with DFAs
One possible way of thinking about simulating NFAs is to consider each state to be a “light” that can be
either on or off. In the beginning, only the initial state is on. At any point in time, all the states that the
NFA might be in are turned on. As a new input character arrives, we need to update the states that are on.
As a concrete example, consider the automaton below (which you have seen before), which accepts strings
containing the substring abab.
[State diagram (N1): A →a B →b C →a D →b E, with self-loops on a, b at A and at E.]
Let us run an explicit search for the above NFA (N1) on the input string ababa.
t = 0: states on: {A}; remaining input: ababa.
t = 1: states on: {A, B}; remaining input: baba.
t = 2: states on: {A, C}; remaining input: aba.
t = 3: states on: {A, B, D}; remaining input: ba.
t = 4: states on: {A, C, E}; remaining input: a.
t = 5: states on: {A, B, D, E}; remaining input: ε.
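The "lights" view of this search is one line of code per input character: the new on-set is the union of the successors of the old one. A sketch reproducing the trace above (the encoding is ours):

    def on_states(delta, start, w):
        # delta maps (state, char) -> set of states (no eps-transitions).
        X = {start}
        trace = [X]
        for c in w:
            X = set().union(*(delta.get((q, c), set()) for q in X))
            trace.append(X)
        return trace

    d = {("A", "a"): {"A", "B"}, ("A", "b"): {"A"},
         ("B", "b"): {"C"}, ("C", "a"): {"D"}, ("D", "b"): {"E"},
         ("E", "a"): {"E"}, ("E", "b"): {"E"}}
    for X in on_states(d, "A", "ababa"):
        print(sorted(X))
    # {A}, {A,B}, {A,C}, {A,B,D}, {A,C,E}, {A,B,D,E}: E is on, so accept.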
[Figure 7.1: The resulting DFA.]
Note that (N1) accepted ababa because when it is done reading the input, the accepting state is on.
This provides us with a scheme to simulate this NFA with a DFA: (i) generate all possible configurations of
states that might be turned on, and (ii) decide, for each configuration and each input character, what the
next configuration is. In our case, in all configurations the first state is turned on. The initial configuration
is when only state A is turned on. If this sounds familiar, it should, because what you get is just a big nasty,
hairy DFA, as shown on the last page of these class notes. The same DFA with the unreachable states removed
is shown in Figure 7.1.
Every state in the DFA of Figure 7.1 can be identified by the subset of the original states that is turned
on (namely, the original automaton might be in any of these states).
[State diagram: a more conventional drawing of this DFA, with states {A}, {A, B}, {A, C}, {A, B, D},
{A, C, E}, {A, B, E}, {A, E}, and {A, B, D, E}.]
Thus, to convert an NFA N with a set of states Q into a DFA, we consider all the subsets of Q that N
might be realized as. Namely, every subset of Q (i.e., a member of P(Q), the power set of Q) is going to
be a state in the new automaton. Now, consider a subset X ⊆ Q, and for every input character c ∈ Σ, let
us figure out in what states the original NFA N might be if it is in one of the states of X and it handles
the character c. Let Y be the resulting set of such states.
Clearly, we have just computed the transition function of the new (equivalent) DFA, showing that if the
NFA is in one of the states of X, and we receive c, then the NFA now might be in one of the states of Y .
Now, if the initial state of the NFA N is q0 , then the new DFA MDFA would start with the state (i.e.,
configuration) {q0 } (since the original NFA might be only in q0 at this point in time).
It is important that our simulation is faithful: at any point in time, if we are in state X in MDFA then
there is a path in the original NFA N , with the given input, to reach each state of Q that is in X (and
similarly, X includes all the states that are reachable with such an input).
When does MDFA accept? Well, if it is in state X (here X ⊆ Q), then it accepts only if X includes one
of the accepting states of the original NFA N .
Clearly, the resulting DFA MDFA is equivalent to the original NFA.
Formally, given an NFA N = (Q, Σ, δ, q0 , F ) without ε-transitions, the equivalent DFA is
MDFA = (P(Q), Σ, δ̂, q̂0 , F̂ ) ,
where P(Q) is the power set of Q, and δ̂ (the transition function), q̂0 (the initial state), and the set of accepting
states F̂ are to be specified shortly. Note that the states of MDFA are subsets of Q (which is slightly confusing),
and as such the starting state of MDFA is q̂0 = {q0 } (and not just q0 ).
We need to specify the transition function, so consider X ∈ P(Q) (i.e., X ⊆ Q) and a character c. For a
state s ∈ X, the NFA might go into any state in δ(s, c) after reading c. As such, the set of all possible states
the NFA might be in, if it started from a state in X and received c, is the set
Y = ∪s∈X δ(s, c).
As such, the transition of MDFA from X receiving c is the state of MDFA defined by Y . Formally,
δ̂(X, c) = Y = ∪s∈X δ(s, c). (7.1)
As for the accepting states, consider a state X ∈ P(Q) of MDFA . Clearly, if there is a state of F in X,
then X is an accepting state; namely, F ∩ X ≠ ∅. Thus,
F̂ = { X | X ∈ P(Q), X ∩ F ≠ ∅ } .
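A sketch of this subset construction in Python, building only the subsets that are actually reachable (so it produces the trimmed DFA of Figure 7.1 rather than all of P(Q); the encoding is ours):

    def determinize(delta, q0, F, Sigma):
        # NFA without eps-transitions: delta maps (state, char) -> set of states.
        start = frozenset({q0})
        d_hat, seen, todo = {}, {start}, [start]
        while todo:
            X = todo.pop()
            for c in Sigma:
                Y = frozenset(set().union(*(delta.get((q, c), set()) for q in X)))
                d_hat[(X, c)] = Y
                if Y not in seen:
                    seen.add(Y)
                    todo.append(Y)
        F_hat = {X for X in seen if X & F}   # accept iff X contains a state of F
        return seen, d_hat, start, F_hat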
Proof of correctness
Claim 7.1.2 For any w ∈ Σ∗ , the set of states reached by the NFA N on w is precisely the state reached by
MDFA on w. That is, ∆N ({q0 } , w) = δ̂({q0 } , w).
Similarly, by the definition of MDFA , we have that from the state X, after reading wk+1 , the DFA MDFA is in
the state
Y = δ̂(X, wk+1 ) = ∪s∈X δ(s, wk+1 ) .
Lemma 7.1.3 Any NFA N , without ε-transitions, can be converted into a DFA MDFA , such that MDFA
accepts the same language as N .
The DFA MDFA is in the state δ̂({q0 } , w) after reading w. Claim 7.1.2 implies that Y = δ̂({q0 } , w) =
∆N ({q0 } , w). By construction, MDFA accepts at this state if and only if Y ∈ F̂ , which is equivalent to
Y containing a final state of N ; that is, Y ∩ F ≠ ∅. Namely, MDFA accepts w if and only if
δ̂({q0 } , w) ∩ F ≠ ∅ ⇐⇒ ∆N ({q0 } , w) ∩ F ≠ ∅.
Handling ε-transitions
Now, we would like to handle a general NFA that might have ε-transitions. The problem is demonstrated by
the following NFA in its initial configuration:
[State diagram (N2): A →a,ε B →b C →a,ε D →b,ε E, with self-loops on a, b at A and at E; only A is
marked as on.]
Clearly, the initial configuration here is {A, B} (and not the one drawn above), since the automata can
immediately jump to B if the NFA is already in A. So, the configuration {A} should not be considered at
all. As such, the true initial configuration for this automata is
[Figure (N2): the same NFA, with the configuration {A, B} marked.]
Next, consider the following more interesting configuration.
[Figure: the same NFA, with a configuration containing A and C marked.]
But here, not only can we jump from A to B, but we can also jump from C to D, and from D to E. As such, this configuration is in fact the following configuration
[Figure (N3): the same NFA, with the extended configuration (including B, D and E) marked.]
In fact, this automaton can only be in these two configurations, because of the ε-transitions.
So, let us formalize the above idea: whenever the NFA N might be in a state s, we need to extend the configuration to all the states of the NFA reachable by ε-transitions from s. Let R_ε(s) denote the set of all states of N that are reachable by a sequence of ε-transitions from s (naturally, s ∈ R_ε(s), since we can reach s without moving anywhere).
Thus, if N might be in any state of X ⊆ Q, then it might be in any state of
E(X) = ⋃_{s∈X} R_ε(s).
As such, whenever we consider a set of states X ⊆ Q, we in fact need to consider the extended set of states E(X). As such, for the above automaton, we have E({A}) = {A, B}.
Theorem 7.1.4 Any NFA N (with or without ε-transitions) can be converted into a DFA M_DFA, such that M_DFA accepts the same language as N.
Proof: We modify the construction of Lemma 7.1.3, so that the new transition function is
δ'(X, c) = E(δ̂(X, c)),
where δ̂ is the old transition function from the proof of Lemma 7.1.3; namely, we always extend the new set of states to include all the states we can reach by ε-transitions. Similarly, the initial state is now
q_S = E({q0}).
It is now straightforward to verify that the new DFA is indeed equivalent to the original NFA, using the argumentation of Lemma 7.1.3.
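To make the construction concrete, here is a small sketch of the whole conversion in Python (a sketch under our own naming, not part of the original notes): eps_closure computes E(X), and nfa_to_dfa builds only the subset-states actually reachable from E({q0}), rather than all of P(Q).

def eps_closure(states, eps):
    # E(X): all states reachable from X by zero or more eps-transitions.
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def nfa_to_dfa(Sigma, delta, eps, q0, F):
    # delta[(s, c)] is the set of NFA states reachable from s on character c;
    # eps[s] is the set of states reachable from s by one eps-transition.
    start = eps_closure({q0}, eps)
    trans, seen, todo = {}, {start}, [start]
    while todo:
        X = todo.pop()
        for c in Sigma:
            # Y = E( union of delta(s, c) over s in X ), as in Eq. (7.1).
            Y = eps_closure(set().union(*[delta.get((s, c), set()) for s in X]), eps)
            trans[(X, c)] = Y
            if Y not in seen:
                seen.add(Y)
                todo.append(Y)
    accepting = {X for X in seen if X & F}   # the analogue of F-hat
    return seen, trans, start, accepting

# Toy NFA with an eps-jump from A to B:
delta = {('A', 'a'): {'A'}, ('A', 'b'): {'A'}, ('B', 'b'): {'C'}}
eps = {'A': {'B'}}
states, trans, start, acc = nfa_to_dfa('ab', delta, eps, 'A', {'C'})
print(sorted(start))   # ['A', 'B'], the extended initial configuration

Running it on this toy NFA reproduces the extended initial configuration {A, B} discussed above.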
Chapter 8
In this lecture, we will show that any DFA can be converted into a regular expression. Our construction works by allowing regular expressions to be written on the edges of the DFA, and then showing how one can remove states from this generalized automaton (getting a new equivalent automaton with fewer states). At the end of this state-removal process, we will remain with a generalized automaton with a single initial state and a single accepting state, and it will then be easy to convert it into a single regular expression.
A generalized nondeterministic finite automaton (GNFA) is an automaton whose transitions are labelled by regular expressions, and which satisfies the following conditions.
(C1) There are transitions going from the initial state to all other states, and there are no transitions into the initial state.
(C2) There is a single accept state that has only transitions coming into it (and no outgoing transitions).
(C4) Except for the initial and accepting states, all other states are connected to all other states via a
transition. In particular, each state has a transition to itself.
When you cannot actually go between two states, the GNFA has a transition labelled with ∅, which will not match any string of input characters. We do not have to draw these transitions explicitly in the state diagrams.
8.1.2 Top-level outline of conversion
We will convert a DFA to a regular expression as follows:
(A) Convert DFA to a GNFA, adding new initial and final states.
(B) Remove all states one-by-one, until we have only the initial and final states.
(C) Output regex is the label on the (single) transition left in the GNFA. (The word regex is just a shortcut
for regular expression.)
Lemma 8.1.1 Any DFA M can be converted into an equivalent GNFA G.
Proof: We can consider M to be an NFA. Next, we add a special initial state qinit that is connected to the old initial state via an ε-transition. Next, we add a special final state qfinal, such that all the final states of M are connected to qfinal via an ε-transition. The modified NFA M′ has a single initial state and a single final state, such that no transition enters the initial state and no transition leaves the final state; thus M′ complies with conditions (C1–C3) above. Next, we consider all pairs of states x, y ∈ Q(M′), and if there is no transition between them, we introduce the transition x → y labelled ∅. The resulting GNFA G is now compliant also with condition (C4).
It is easy now to verify that G is equivalent to the original DFA M.
We will remove all the intermediate states from the GNFA, leaving a GNFA with only initial and final
states, connected by one transition with a (typically complex) label on it. The equivalent regular expression
is obvious: the label on the transition.
Lemma 8.1.2 Given a GNFA N with k = 2 states, one can generate an equivalent regular expression.
Proof: A GNFA with only two states (that complies with conditions (C1)–(C4)) has the following form.
[Figure: the two-state GNFA, with a single transition from qS to qF labelled by some regex R.]
The GNFA has a single transition from the initial state to the accepting state, and this transition has the regular expression R associated with it. Since the initial state and the accepting state do not have self loops, we conclude that N accepts exactly the words that match the regular expression R. Namely, L(N) = L(R).
Lemma 8.1.3 Given a GNFA N with k > 2 states, one can construct an equivalent GNFA with k − 1 states.
We first describe the construction. Since k > 2, there is at least one state in N which is neither initial nor accepting; let qrip denote this state. We will “rip” this state out of N and fix the GNFA, so that we get a GNFA with one less state.
Transition paths going through qrip might come from any of a variety of states q1, q2, etc. They might go from qrip to any of another set of states r1, r2, etc.
[Figure: states q1, q2, q3 with transitions into qrip, and transitions from qrip out to states r1, r2, r3.]
For each pair of states qi and ri, we need to convert the transition through qrip into a direct transition from qi to ri.
59
Reworking connections for a specific triple of states
To understand how this works, let us focus on the connections between qrip and two other specific states qin and qout. Notice that qin and qout might be the same state, but they both have to be different from qrip.
The state qrip has a self loop with the regular expression Rrip associated with it. So, consider a fragment of an accepting trace that goes through qrip. It transitions into qrip from a state qin on an edge with the regular expression Rin, and travels out of qrip into the state qout on an edge with the associated regular expression Rout. This trace corresponds to the regular expression Rin, followed by zero or more traversals of the self loop (Rrip is used each time we traverse the loop), and then a transition out to qout using the regular expression Rout. As such, we can introduce a direct transition from qin to qout with the regular expression
R = Rin (Rrip)∗ Rout.
Clearly, any fragment of a trace traveling qin → qrip → qout can be replaced by the direct transition qin → qout labelled R. So, let us do this replacement for any two such states: we connect them directly via a new transition labelled Rin (Rrip)∗ Rout, so that traces no longer need to travel through qrip.
[Figure: the path qin → qrip → qout, with the self loop Rrip on qrip, replaced by a direct edge from qin to qout labelled Rin (Rrip)∗ Rout.]
Clearly, if we do that for all such pairs, the new automaton accepts the same language, but no longer needs to use qrip. As such, we can just remove qrip from the resulting automaton; let M′ denote the resulting automaton.
The automaton M′ is not quite legal yet. Indeed, we will now have parallel transitions because of the above process (we might even have parallel self loops). But this is easy to fix: we replace two such parallel transitions from qi to qj, labelled R1 and R2 respectively, by a single transition from qi to qj labelled R1 + R2.
As such, for the triple qin, qrip, qout, if the label on the original direct transition from qin to qout was Rdir, then the label for the new transition (that skips qrip) will be
Rdir + Rin (Rrip)∗ Rout.    (8.1)
Clearly, the new transition is equivalent to the two transitions it replaces. If we repeat this process for all the parallel transitions, we get a new GNFA M which has k − 1 states, and furthermore it accepts exactly the same language as N.
Proof: Since k > 2, N contains at least one state which is neither initial nor accepting; let qrip denote this state. We will “rip” this state out of N and fix the GNFA, so that we get a GNFA with one less state.
For every pair of states qin and qout, both distinct from qrip, we replace the transitions that go through qrip with direct transitions from qin to qout, as described in the previous section.
Correctness. Consider an accepting trace T of N for a word w. If T does not use the state qrip, then the exact same trace is an accepting trace of M. So, assume that it uses qrip; in particular, the trace looks like
T = . . . qi → qrip → qrip → · · · → qrip → qj . . . ,
where the transition into qrip reads Si, the self loop of qrip is traversed zero or more times reading Si+1, . . . , Sj−1, and the transition out to qj reads Sj; here Si Si+1 · · · Sj is a substring of w. Clearly, Si ∈ Rin, where Rin is the regular expression associated with the transition qi → qrip. Similarly, Sj ∈ Rout, where Rout is the regular expression associated with the transition qrip → qj. Finally, Si+1 Si+2 · · · Sj−1 ∈ (Rrip)∗, where Rrip is the regular expression associated with the self loop of qrip.
Now, clearly, the string Si Si+1 · · · Sj matches the regular expression Rin (Rrip)∗ Rout. In particular, we can replace this portion of the trace T by
T′ = . . . qi → qj . . . ,
where the direct transition reads Si Si+1 . . . Sj.
This step uses the new transition between qi and qj introduced by our construction. Repeating this replacement process on T until all the appearances of qrip are removed results in an accepting trace T̂ of M. Namely, we proved that any string accepted by N is also accepted by M.
We also need to prove the other direction; namely, that given an accepting trace of M, we can rewrite it into an equivalent accepting trace of N. This is done in a similar way to the above. Indeed, if a portion of the trace uses a new transition of M (that does not appear in N), we can replace it by a fragment of transitions going through qrip. In light of the above proof, this is easy, and we omit the straightforward but tedious details.
Theorem 8.1.4 Any DFA can be translated into an equivalent regular expression.
Proof: Indeed, convert the DFA into a GNFA N . As long as N has more than two states, reduce its
number of states by removing one of its states using Lemma 8.1.3. Repeat this process till N has only two
states. Now, we convert this GNFA into an equivalent regular expression using Lemma 8.1.2.
So, if the original DFA has n states, then the algorithm will do the inner step O(n³) times (which is not too bad). Worse, each time we remove a state, we replace the regex on each remaining transition with a regex that is potentially four times as large. (That is, we replace the regular expression Rdir associated with a transition by the regular expression Rdir + Rin (Rrip)∗ Rout, see Eq. (8.1).)
So, every time we rip a state out of the GNFA, the length of the regular expressions associated with the edges of the GNFA can grow by a factor of (at most) four. We repeat this n times, so the length of the final output regex is O(4ⁿ), and the actual running time of the algorithm is O(n³ 4ⁿ).
Typically, output sizes and running times are not quite that bad. We really only need to consider triples of states that are connected by arcs with labels other than ∅. Many transitions are labelled with ε or ∅, so the regular expression size often increases by less than a factor of 4. However, actual running times are still unpleasant for anything but very small examples.
Interestingly, while this algorithm is not very efficient, it is not the algorithm's “fault”. Indeed, it is known that regular expressions for automata can be exponentially large: there is a lower bound of 2ⁿ on the length of regular expressions describing an automaton of size n, see [EZ74] for details.
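As a concrete illustration of the ripping step, here is a small Python sketch (our own naming and a string-based regex representation, not from the notes) that performs one application of Lemma 8.1.3 on a GNFA whose edge labels are regex strings, with None standing for the ∅ label:

def rip_state(edges, states, q_rip):
    # One elimination step: edges maps a pair (p, q) to a regex string,
    # or to None when the label is the empty set.  Returns the edge map
    # of the smaller GNFA, using Eq. (8.1).
    loop = edges.get((q_rip, q_rip))
    star = f'({loop})*' if loop is not None else ''      # (R_rip)*
    new = {}
    for p in states:
        for q in states:
            if q_rip in (p, q):
                continue
            r_in = edges.get((p, q_rip))
            r_out = edges.get((q_rip, q))
            r_dir = edges.get((p, q))
            if r_in is None or r_out is None:            # no detour through q_rip
                new[(p, q)] = r_dir
            else:
                bypass = f'({r_in}){star}({r_out})'      # R_in (R_rip)* R_out
                new[(p, q)] = bypass if r_dir is None else f'({r_dir})+{bypass}'
    return new

Repeating rip_state until only the initial and final states remain leaves a single edge whose label is the output regular expression. The sketch parenthesizes aggressively, which is exactly why labels can grow by a factor of about four per ripped state, as discussed above.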
8.2 Examples
8.2.1 Example: From GNFA to regex in 8 easy figures
[Figure sequence: 1: the original NFA (states A, B, C over {a, b}). 2: Normalizing it (adding a new initial state init and a new accepting state AC, connected by ε-transitions). 3: Removing state A. 4: Redrawn without the old edges. 5: Removing B (introducing labels such as ab∗a and ab∗a + b). 6: Redrawn. 7: Removing C, leaving a single transition from init to AC labelled by the final regular expression.]
8.3 Closure under homomorphism
Suppose that Σ and Γ are two alphabets (possibly the same, but maybe different). A homomorphism h
is a function from Σ∗ to Γ∗ such that h(xy) = h(x)h(y) for any strings x and y. Equivalently, if we divide w
into a sequence of individual characters w = c1 c2 . . . ck , then h(w) = h(c1 )h(c2 ) . . . h(ck ). (It’s a nice exercise
to prove that the two definitions are equivalent.)
Example 8.3.1 Let Σ = {a, b, c} and Γ = {0, 1}, and let h be the mapping h : Σ → Γ∗, such that h(a) = 01, h(b) = 00, and h(c) = ε. Clearly, h is a homomorphism.
So, suppose that we have a regular language L. If L is represented by a regular expression R, then it is
easy to build a regular expression for h(L). Just replace every character c in R by its image h(c).
Example 8.3.2 The regular expression R = (ac + b)∗ over Σ becomes h(R) = (01 + 00)∗ .
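This character-replacement recipe is easy to mechanize; here is a tiny Python sketch (names ours, not from the notes) that applies a homomorphism directly to a regex string, assuming single-character terminals and only the operators (, ), +, ∗:

def apply_homomorphism(regex, h):
    # Replace every alphabet character by its image under h; operators pass through.
    out = []
    for ch in regex:
        if ch in '()+*':
            out.append(ch)
        else:
            img = h.get(ch, ch)
            if len(img) > 1:
                img = '(' + img + ')'   # keep * and + binding correctly
            out.append(img)             # an empty image simply deletes ch (h(ch) = eps)
    return ''.join(out)

print(apply_homomorphism('(ac+b)*', {'a': '01', 'b': '00', 'c': ''}))
# prints ((01)+00)*, equivalent to (01+00)* from Example 8.3.2

Deleting a character works inside concatenations, as here; a character that forms an entire union branch by itself would need to be replaced by an explicit ε instead.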
Lemma 8.3.3 Let L be a regular language over Σ, and let h : Σ → Γ∗ be a homomorphism. Then the language h(L) is regular.
Proof: (Informal.) Let R be a regular expression for L. Replace any character c ∈ Σ appearing in R by the string h(c). Clearly, the resulting regular expression R′ recognizes all the words in h(L).
Proof: (More formal.) Let D be an NFA for L with a single accept state qfinal and an initial state qinit, so that the only transition out of qinit is an ε-transition, and there are no outgoing transitions from qfinal and only ε-transitions into it.
Now, replace every transition of D from q to q′ that reads c by a transition from q to q′ that reads h(c). Clearly, the resulting automaton is a GNFA C that accepts the language h(L). We showed in the previous lecture that a GNFA can be converted into an equivalent regular expression R, such that L(C) = L(R). As such, we have that h(L) = L(C) = L(R). Namely, h(L) is a regular language, as claimed.
Note that in the above proof, instead of creating a GNFA, we can also create an NFA, by introducing temporary states. Thus, if we have a transition from q to q′ reading c in D, and h(c) = w1 w2 . . . wk, then we introduce new temporary states s1, . . . , s_{k−1}, and replace the transition by the chain of transitions
q → s1 (reading w1), s1 → s2 (reading w2), . . . , s_{k−2} → s_{k−1} (reading w_{k−1}), s_{k−1} → q′ (reading wk).
Note that when you have several equivalent representations, do your proofs in the one that makes the proof easiest. So we did set complement using DFAs, concatenation using NFAs, and homomorphism using regular expressions. Now we just have to finish the remaining bits of the proof that the three representations are equivalent.
An interesting point is that if a language L is not regular, then h(L) might be regular or not.
Example 8.3.4 Consider the language L = { a^n b^n | n ≥ 0 }. The language L is not regular. Now, consider the homomorphism h(a) = a and h(b) = a. Clearly, h(L) = { a^n a^n = a^{2n} | n ≥ 0 }, which is definitely regular. However, the identity homomorphism I(a) = a and I(b) = b maps L to itself, I(L) = L, and as such I(L) is not regular.
Intuitively, a homomorphism cannot make a language “harder” than it is (if it is regular, it remains regular under homomorphism). However, if it is not regular, its image might or might not remain non-regular.
Chapter 9
In this lecture, we will see how to prove that a language is not regular.
We will see two methods for showing that a language is not regular. The “pumping lemma” shows that
certain key “seed” languages are not regular. From these seed languages, we can show that many similar
languages are also not regular, using closure properties.
[Figure: a DFA with states q0, q1, q2, q3 over the alphabet {0, 1}.]
You can do any of the following operations:
The rule of the game is that when the DFA is in a final state, you would know it.
So, the question is how to decide what the initial state of the above DFA is.
Here is one possible solution.
Definition 9.1.1 For a DFA M = (Q, Σ, δ, q0, F), p ∈ Q and x ∈ Σ∗, let M(p, x) be true if setting the DFA to be in the state p, and then reading the input x, causes M to arrive at an accepting state. Formally, M(p, x) is true if and only if δ(p, x) ∈ F, and false otherwise.
The moral of this story. So, we can differentiate between two states p and q of a DFA M by finding a string x such that M(p, x) accepts but M(q, x) rejects, or vice versa.
Definition 9.1.2 Two states p and q of a DFA M disagree with each other if there exists a string x such that M(p, x) ≠ M(q, x) (that is, M(p, x) accepts but M(q, x) rejects, or vice versa).
Lemma 9.1.4 Let M be a DFA, and let q1, . . . , qn be states of M such that qi and qj disagree with each other, for all i ≠ j. Then M has at least n different states.
Proof: For i ≠ j, since qi and qj disagree with each other, they cannot possibly be the same state of M, since if they were the same state then they would agree with each other on all possible strings. We conclude that q1, . . . , qn are all different states of M; namely, M has at least n different states.
A Motivating Example
Consider the language L = { a^n b^n | n ≥ 0 }. Intuitively, L cannot be regular, because we have to remember how many a's we have seen before reading the b's, and this cannot be done with a finite number of states.
Claim 9.1.5 The language L = { a^n b^n | n ≥ 0 } is not regular.
Proof: Assume, for the sake of contradiction, that L is regular, and let M = (Q, Σ, δ, q0, F) be a DFA for it.
Let qi denote the state M is in after reading the string a^i, for i = 0, 1, 2, . . .. We claim that qi disagrees with qj if i ≠ j. Indeed, observe that M(qi, b^i) accepts but M(qj, b^i) rejects, since a^i b^i ∈ L and a^j b^i ∉ L.
As such, by Lemma 9.1.4, M has an infinite number of states, which is impossible.
9.2 Irregularity via differentiation
Definition 9.2.1 Two strings x, y ∈ Σ∗ are distinguishable by L ⊆ Σ∗ , if there exists a word w ∈ Σ∗ ,
such that exactly one of the strings xw and yw is in L.
Lemma 9.2.2 Let M = (Q, Σ, δ, q0, F) be a given DFA, and let x, y ∈ Σ∗ be two strings distinguishable by L(M). Then qx ≠ qy, where qx = δ(q0, x) (i.e., the state M is in after reading x) and qy = δ(q0, y) is the state that M is in after reading y.
Proof: Indeed, let w be the string causing x and y to be distinguished by L(M), and assume that xw ∈ L(M) and yw ∉ L(M) (the other case is symmetric). Clearly, if qx = qy, then M(q0, xw) = M(qx, w) = M(qy, w) = M(q0, yw). But it is given to us that M(q0, xw) ≠ M(q0, yw), since exactly one of the words xw and yw is in L(M).
Lemma 9.2.3 Let L be a language, and let W = {w1 , w2 , w3 , . . .} be an infinite set of strings, such that
every pair of them is distinguishable by L. Then L is not a regular language.
Proof: Assume, for the sake of contradiction, that L is regular, and let M = (Q, Σ, δ, q0, F) be a DFA for it. Let us set qi = δ(q0, wi). For i ≠ j, the strings wi and wj are distinguishable by L, and this implies, by Lemma 9.2.2, that qi ≠ qj. This implies that M has an infinite number of states, which is of course impossible.
9.2.1 Examples
Example
Lemma 9.2.4 The language
L = { 1^k y | y ∈ {0, 1}∗, and y contains at most k ones }
is not regular.
Proof: Let wi = 1^i, for i ≥ 0. Observe that for j > i we have that wi 0 1^j = 1^i 0 1^j ∉ L, but wj 0 1^j = 1^j 0 1^j ∈ L. As such, wi and wj are distinguishable by L, for any i ≠ j. We conclude, by Lemma 9.2.3, that L is not regular.
Similarly, consider the language L = { 0^n 1 0^n 1 | n ≥ 0 }, and let wi = 0^i, for i ≥ 0. For i ≠ j, we have
wi 1 0^j 1 = 0^i 1 0^j 1 ∉ L,  but  wj 1 0^j 1 = 0^j 1 0^j 1 ∈ L,
and this implies that wi and wj are distinguishable by L, using the string xj = 1 0^j 1. As such, by Lemma 9.2.3, we have that L is not regular.
Consider again the language L = { a^n b^n | n ≥ 0 }, and assume, for the sake of contradiction, that it is regular. Let M = (Q, Σ, δ, q0, F) be a DFA for it, and suppose that M has p states.
Consider the string a^p b^p. It is accepted using a sequence of states s0 s1 . . . s_{2p}. Right after we read the last a, the machine is in state s_p.
In the sub-sequence s0 s1 . . . s_p, there are p + 1 states. Since M has only p distinct states, two states in this sequence must be the same (by the pigeonhole principle). Let us call the pair of repeated states s_i and s_j, with i < j. This means that the path through M's state diagram looks like the following, where a^p = x y z1.
[Figure: the run of M: from s0, the prefix x leads to s_i = s_j, the substring y loops back to it, z1 leads to s_p, and b^p leads to s_{2p}.]
But this DFA will accept all strings of the form x y^j z1 b^p, for j ≥ 0. Indeed, for j = 0, this is just the string x z1 b^p, which this DFA accepts, but which is not in the language, since it has fewer a's than b's. That is, if |y| = m, the DFA accepts all strings of the form a^{p−m+jm} b^p, for any j ≥ 0. For any value of j other than 1, such strings are not in L.
So our DFA M accepts some strings that are not in L. This is a contradiction, because M was supposed to accept exactly L. Therefore, we must have been wrong in our assumption that L was regular.
Theorem 9.3.1 (Pumping Lemma.) Let L be a regular language. Then there exists an integer p (the
“pumping length”) such that for any string w ∈ L with |w| ≥ p, w can be written as xyz with the following
properties:
• |xy| ≤ p.
• |y| ≥ 1 (i.e. y is not the empty string).
• x y^k z ∈ L for every k ≥ 0.
Proof: The proof is written out in full detail in Sipser; here we just outline it.
Let M be a DFA accepting L, and let p be the number of states of M . Let w = c1 c2 . . . cn be a string of
length n ≥ p, and let the accepting state sequence (i.e., trace) for w be s0 s1 . . . sn .
There must be a repeat within the sequence from s0 to sp , since M has only p states, and as such, the
situation looks like the following.
[Figure: the run of M on w: from s0, x leads to s_i = s_j, y loops back to it, z1 leads to s_p, and z2 leads to s_n.]
So if we set z = z1 z2 , we now have x, y, and z satisfying the conditions of the lemma.
• |xy| ≤ p, because the repeat is within the first p + 1 states.
• |y| ≥ 1, because i and j are distinct.
• x y^k z ∈ L for every k ≥ 0, because a loop in the state diagram can be repeated as many or as few times as you want.
Formally, for any k, the word x y^k z goes through the following sequence of states:
s0 −x→ s_i −y→ s_i −y→ · · · −y→ s_i = s_j −z→ s_n,
where the loop on y is taken k times, and s_n is an accepting state. Namely, M accepts x y^k z, and as such x y^k z ∈ L.
This completes the proof of the theorem.
Notice that we do not know exactly where the repeat occurs, so we have very little control over the lengths of x and z.
Proving that a language is not regular
• By the Pumping Lemma, we know there exist x, y, z such that w = xyz, |xy| ≤ p, and |y| ≥ 1.
Notice that our adversary picks p. We get to pick w whose length depends on p. But then our adversary
gets to pick the specific division of w into x, y, and z.
9.3.4 Examples
The language L = { a^n b^n | n ≥ 0 } is not regular
Proof: For any p ≥ 0, consider the word w = a^p b^p, and consider any breakup of w into three parts, such that w = xyz, |y| ≥ 1, and |xy| ≤ p. Clearly, xy is a prefix of w made out of only a's. As such, the word xyyz has more a's in it than b's, and as such, it is not in L.
But then, by the Pumping Lemma (Theorem 9.3.2), L is not regular.
The language L = { 0^n 1 0^n 1 | n ≥ 0 } is not regular
Proof: For any p ≥ 0, consider the word w = 0^p 1 0^p 1, and consider any breakup of w into three parts, such that w = xyz, |y| ≥ 1, and |xy| ≤ p. Clearly, xy is a prefix of w made out of only 0's. As such, the word xyyz has more 0's in its first block than in its second block; as such, xyyz is not in L.
But then, by the Pumping Lemma (Theorem 9.3.2), L is not regular.
Some observations about the choice of w:
• These strings are a subset of L, chosen to exemplify what is not regular about L.
• The 1 in the middle serves as a barrier to separate the two groups of 0's. (Think about why the proof would fail if it was not there.)
• The 1 at the end of w does not matter to the proof, but we need it so that w ∈ L.
9.3.5 A note on finite languages
A language L is finite if it has a bounded number of words in it. Clearly, a finite language is regular (since you can always write a finite regular expression that matches all the words in the language).
It is natural to ask why we cannot apply the pumping lemma (Theorem 9.3.1) to such an L. The reason is that we can always choose the threshold p to be larger than the length of the longest word in L. Then there is no word in L of length at least p, and the claim of the Pumping Lemma holds vacuously: there is simply no word in L that can be pumped. So the pumping lemma makes sense even for finite languages!
Claim 9.4.1 The language L0 = { 0^n 1^n | n ≥ 0 } is not regular.
Proof: Assume, for the sake of contradiction, that L0 is regular. Let h be the homomorphism that maps 0 to a and 1 to b. Then h(L0) must be regular (closure under homomorphism). But h(L0) is the language
L = { a^n b^n | n ≥ 0 },    (9.1)
which is not regular (by Claim 9.1.5). A contradiction. As such, L0 is not regular.
Claim 9.4.2 The language L2 = { w ∈ {a, b}∗ | w contains an equal number of a's and b's } is not regular.
Proof: Suppose L2 were regular. Consider L2 ∩ a∗b∗. This must be regular, because L2 and a∗b∗ are both regular and regular languages are closed under intersection. But L2 ∩ a∗b∗ is just the language L from Eq. (9.1), which is not regular (by Claim 9.1.5). A contradiction. As such, L2 is not regular.
Claim 9.4.3 The language L3 = { a^n b^n | n ≥ 1 } is not regular.
Proof: Assume, for the sake of contradiction, that L3 is regular. Consider L3 ∪ {ε}. This must be regular, because L3 and {ε} are both regular and regular languages are closed under union. But L3 ∪ {ε} is just L from Eq. (9.1), which is not regular (by Claim 9.1.5).
A contradiction. As such, L3 is not regular.
Also, be sure to apply only closure properties that we know to be true. In particular, regular languages are not closed under the subset and superset relations. Indeed, consider L1 = {001, 00}, which is regular, but which is a subset of the non-regular language LB. Similarly, L2 = (0 + 1)∗ is regular, and it is a superset of L (from Eq. (9.1) in the proof of Claim 9.4.1). But you cannot deduce that L is therefore regular; we know it is not.
So regular languages can be subsets of non-regular ones and vice versa.
Chapter 10
In this lecture, we will see that every language has a unique minimal DFA. We will see this fact from two perspectives: first, we will see a practical algorithm for minimizing a DFA, and then we will provide a theoretical analysis of the situation.
For a language L ⊆ Σ∗ and a string x ∈ Σ∗, the suffix language of L with respect to x is ⟦L/x⟧ = { y | xy ∈ L }; that is, the set of all words that complete x into a word of L. Let C(L) denote the set of all suffix languages of L.
Example 10.1.2 For example, if L = 0∗1∗, then:
• ⟦L/ε⟧ = 0∗1∗ = L
• ⟦L/0⟧ = 0∗1∗ = L
• ⟦L/0^i⟧ = 0∗1∗ = L, for any i ∈ N
• ⟦L/1⟧ = 1∗
• ⟦L/1^i⟧ = 1∗, for any i ≥ 1
• ⟦L/10⟧ = { y | 10y ∈ L } = ∅.
Hence there are only three suffix languages for L: 0∗1∗, 1∗ and ∅. So C(L) = { 0∗1∗, 1∗, ∅ }.
As the above example demonstrates, if there is a word x such that any word that has x as a prefix is not in L, then ⟦L/x⟧ = ∅, which implies that ∅ is one of the suffix languages of L.
Example 10.1.3 The above suggests the following automaton for the language L = 0∗1∗ of Example 10.1.2.
[Figure: a three-state DFA whose states are labelled 0∗1∗, 1∗ and ∅; the state 0∗1∗ loops on 0 and moves to 1∗ on 1; the state 1∗ loops on 1 and moves to ∅ on 0; the state ∅ loops on both 0 and 1.]
And clearly, this is the automaton with the smallest number of states that accepts this language.
Lemma 10.1.4 For a regular language L, the number of different suffix languages it has is bounded; that is, |C(L)| is bounded by a constant (that depends on L).
Proof: Consider a DFA M = (Q, Σ, δ, q0, F) that accepts L, and for a state q ∈ Q, let L_q denote the language accepted by M when starting from the state q. For any string x, the suffix language ⟦L/x⟧ is just the language L_q, where q = δ(q0, x) is the state M is in after reading x.
Indeed, the suffix language ⟦L/x⟧ is the set of strings w such that xw ∈ L. Since the DFA reaches q on x, it is clear that the suffix language of x is precisely the language accepted by M starting from the state q, which is L_q. Hence, for every x ∈ Σ∗, we have ⟦L/x⟧ = L_{δ(q0,x)}.
As such, any suffix language of L is realizable as the language of a state of M. Since the number of states of M is some constant k, it follows that the number of suffix languages of L is bounded by k.
Lemma 10.1.5 If a language L has an infinite number of suffix languages, then L is not regular. (This is just the contrapositive of Lemma 10.1.4.)
The suffix languages of a non-regular language
Consider the language L = { a^n b^n | n ∈ N }. The suffix language of L for a^i is
⟦L/a^i⟧ = { a^{n−i} b^n | n ≥ i }.
Note that b^i ∈ ⟦L/a^i⟧, but this is the only string made out of only b's that is in this language. As such, for any i and j, where i and j are different, the suffix language of L with respect to a^i is different from that of L with respect to a^j (i.e., ⟦L/a^i⟧ ≠ ⟦L/a^j⟧). Hence L has infinitely many suffix languages, and hence is not regular, by Lemma 10.1.5.
• If two states are associated with the same suffix language, then we can merge them into a single state.
• At least one non-regular language, { a^n b^n | n ∈ N }, has an infinite number of suffix languages.
It is thus natural to conjecture that the number of suffix languages of a language is a good indicator of how many states an automaton for this language would require. And this is indeed true, as the following section testifies.
Lemma 10.2.1 For any language L ⊆ Σ∗ and any string x ∈ Σ∗, we have x ∈ L if and only if ε ∈ ⟦L/x⟧. (Indeed, ε ∈ ⟦L/x⟧ if and only if xε = x ∈ L, by definition.)
Lemma 10.2.2 Let L be a language over an alphabet Σ. For all x, y ∈ Σ∗, if ⟦L/x⟧ = ⟦L/y⟧, then for all a ∈ Σ we have ⟦L/xa⟧ = ⟦L/ya⟧.
Proof: If w ∈ ⟦L/xa⟧, then (by definition) xaw ∈ L. But then, aw ∈ ⟦L/x⟧. Since ⟦L/x⟧ = ⟦L/y⟧, this implies that aw ∈ ⟦L/y⟧, which implies that yaw ∈ L, which implies that w ∈ ⟦L/ya⟧. This implies that ⟦L/xa⟧ ⊆ ⟦L/ya⟧; a symmetric argument implies that ⟦L/ya⟧ ⊆ ⟦L/xa⟧. We conclude that ⟦L/xa⟧ = ⟦L/ya⟧.
Theorem 10.2.3 (Myhill-Nerode theorem.) A language L ⊆ Σ∗ is regular if and only if the number of suffix languages of L is finite (i.e., C(L) is finite).
Moreover, if C(L) contains exactly k languages, we can build a DFA for L that has k states; also, any DFA accepting L must have at least k states.
Proof: If L is regular, then C(L) is a finite set, by Lemma 10.1.4.
Second, let us show that if C(L) is finite, then L is regular. Let the suffix languages of L be
C(L) = { ⟦L/x1⟧, ⟦L/x2⟧, . . . , ⟦L/xk⟧ }.    (10.1)
Note that for any y ∈ Σ∗, we have ⟦L/y⟧ = ⟦L/xj⟧, for some j ∈ {1, . . . , k}.
We will construct a DFA whose states are the various suffix languages of L; hence we will have k states in the DFA. Moreover, the DFA will be designed such that after reading y, the DFA will end up in the state ⟦L/y⟧.
The DFA is M = (Q, Σ, δ, q0, F), where
• Q = { ⟦L/x1⟧, ⟦L/x2⟧, . . . , ⟦L/xk⟧ },
• q0 = ⟦L/ε⟧,
• F = { ⟦L/x⟧ | ε ∈ ⟦L/x⟧ }. Note that, by Lemma 10.2.1, if ε ∈ ⟦L/x⟧ then x ∈ L.
• δ(⟦L/x⟧, a) = ⟦L/xa⟧, for every a ∈ Σ. (This is well defined, by Lemma 10.2.2.)
Remark 10.2.4 The full Myhill-Nerode theorem also shows that all minimal DFAs for L are isomorphic,
i.e. have identical transitions as well as the same number of states, but we will not show that part.
This is done by arguing that any DFA for L that has k states must be identical to the DFA we created
above. This is a bit more involved notationally, and is proved by showing a 1 − 1 correspondence between
the two DFAs and arguing they must be connected the same way. We omit this part of the theorem and
proof.
10.2.3 Examples
Let us explain the theorem we just proved using examples.
The suffix language of x ∈ Σ∗ , where x has an even number of a’s is:
z n o
r
L/x = w w has an odd number of a’s = L.
Hence there are only two distinct suffix languages for L. By the theorem, we know L must be regular
and the minimal DFA for L has two states. Going with r the r ofzthe DFA mentioned in the proof
z construction
of the theorem, we see that we have two states, q0 = L/ and q1 = L/a . The transitions are as follows:
r z r z
• From q0 = L/ , on a we go to L/a , which is the state q1 .
r z r z r z
• From q0 = L/ , on b we go to L/b , which is same as L/ , i.e. the state q0 .
r z r z r z
• From q1 = L/a , on a we go to L/aa , which is same as L/ , i.e. the state q0 .
r z r z r z
• From q1 = L/a , on b we go to L/ab , which is same as L/a , i.e. the state q1 .
r z r z
The initial state is L/ which is the state q0 , and the final states are those states L/x that have
in them, which is the set {q1 }.
We hence have a DFA for L, and in fact this is the minimal automaton accepting L.
Recall that two states p and q of a DFA M are distinct because of a word w = c1 c2 . . . cm if M(p, w) accepts and M(q, w) rejects, or vice versa.
In particular, this implies that if p and q are distinct because of a word w of length m, then δ(p, c1) and δ(q, c1) are distinct because of the word w′ = c2 . . . cm of length m − 1.
Thus, it is easy to compute the pairs of states distinct because of the empty word, and once we have computed all the pairs of states distinct because of words of length m − 1, we can “propagate” this information to the pairs of states distinct because of words of length m.
10.3.2 The algorithm
The algorithm for marking distinct states follows the above (recursive) definition. Create a table Distinct with an entry for each pair of states; table cells are initially blank.
(1) For every pair of states (p, q) such that exactly one of p and q is accepting, set Distinct(p, q) to be ε (the empty word already distinguishes them).
(2) Repeat until the table no longer changes: for each pair of states (p, q) and each character a in the alphabet, if Distinct(p, q) is empty and Distinct(δ(p, a), δ(q, a)) is not empty, set Distinct(p, q) to be a.
(3) Two states p and q are distinct iff Distinct(p, q) is not empty.
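A minimal Python sketch of this table-filling procedure (our own naming; for brevity it records only which pairs are distinct, not the distinguishing character; delta must be a total transition function):

def distinct_pairs(Q, Sigma, delta, F):
    # Step (1): the empty word distinguishes accepting from non-accepting states.
    distinct = {frozenset((p, q)) for p in Q for q in Q
                if p != q and (p in F) != (q in F)}
    changed = True
    while changed:                       # step (2): propagate
        changed = False
        for p in Q:
            for q in Q:
                pair = frozenset((p, q))
                if p == q or pair in distinct:
                    continue
                if any(frozenset((delta[(p, a)], delta[(q, a)])) in distinct
                       for a in Sigma):
                    distinct.add(pair)
                    changed = True
    return distinct                      # step (3): unmarked pairs can be merged

Pairs that never get marked are equivalent; merging each such group of states yields the minimal DFA, exactly as in the example below.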
[Figure 10.1: (a) a DFA with states q0, . . . , q7 over {0, 1}; (b) the equivalent smaller DFA obtained by merging the states q0/q1, q2/q3 and q5/q6.]
The following is the execution of the algorithm on the DFA of Figure 10.1.
[Table: the Distinct table after step (1), with a row and a column for each of q0, . . . , q7; the pairs where exactly one state is accepting are marked with ε.]
(Note that for a pair of states (qi, qj) we need only a single entry, since (qj, qi) is equivalent, and we do not need to consider pairs on the diagonal of the form (qi, qi).)
[Tables: the Distinct table after one iteration of step (2), and after the second iteration; the new entries record the distinguishing character (0 or 1) for pairs such as (q2, q0), (q3, q0) and (q4, q0).]
The third iteration of step (2) makes no changes to the table, so we halt. The cells (q0, q1), (q2, q3) and (q5, q6) are still empty, so these pairs of states are not distinct. Merging them produces the simpler DFA of Figure 10.1(b), which recognizes the same language.
Chapter 11
This lecture introduces context-free grammars, covering section 2.1 from Sipser.
Our purpose is to come up with a way to describe a language like L = { a^n b^n | n ≥ 0 } in a compact way. It turns out that context-free grammars are one possible way to capture such languages.
Here is a diagram demonstrating the classes of languages we will encounter in this class. Currently, we have only seen the weakest class, the regular languages. Next, we will see context-free grammars.
[Figure: nested language classes, from the inside out: regular ⊂ context-free ⊂ Turing decidable ⊂ Turing recognizable ⊂ not Turing recognizable (territory of the fire-breathing dragons).]
A compiler, or a natural language understanding program, uses these languages as follows:
• It uses regular languages to convert character strings to tokens (e.g. words, variable names, function names).
• It uses context-free languages to parse token sequences into functions, programs, and sentences.
Just as for regular languages, context-free languages have a procedural and a declarative representation, which we will show to be equivalent.
    procedural                      declarative
    NFAs/DFAs                       regular expressions
    pushdown automata (PDAs)        context-free grammars
[Figure: the parse tree for the word aaabbb: the root S derives a S b, each nested S again derives a S b, and the innermost S derives ε.]
This tree is known as the parse tree of the grammar S → aSb | ε of Eq. (11.1) for the word aaabbb.
Deriving the context-free grammars by constructing sentence structure
A context-free grammar defines the syntax of a program or sentence. The structure is easiest to see in a
parse tree.
[Figure: a parse tree for an English sentence: the root S has children NP and VP; the NP expands as D N, and the VP as V NP; the leaves spell out a sentence such as “the Groke ate my homework”.]
(i) S → NP VP
(ii) NP → D N
(iii) VP → V NP
(v) D → the | my . . .
If projection is working, show a sample computer-language grammar from the net. (See pointers on web
page.)
Synthetic examples
In practical applications, the terminals are often whole words, as in the example above. In synthetic examples
(and often in the homework problems), the terminals will be single letters.
Consider L = { 0^n 1^n | n ≥ 0 }. We can capture this language with a grammar that has start symbol S and the rule
S → 0S1 | ε.
For example, we can derive the string 000111 as follows:
S → 0S1 → 00S11 → 000S111 → 000111.
Or, consider the language of palindromes L = { w ∈ {a, b}∗ | w = w^R }. Here is a grammar with start symbol P for this language:
P → aPa | bPb | ε | a | b.
A possible derivation of the string abbba is
P → aPa → abPba → abbba.
11.2 Derivations
Consider our Groke example again. It has only one parse tree, but multiple derivations: After we apply the
first rule, we have two variables in our string. So we have two choices about which to expand first:
S → NP VP → . . .
If we expand the leftmost variable first, we get this derivation:
S → NP VP → D N VP → the N VP → the Groke VP → the Groke V NP → . . .
If we expand the rightmost variable first, we get this derivation:
S → NP VP → NP V NP → NP V D N → NP V D homework
→ NP V my homework . . .
The first is called the leftmost derivation. The second is called the rightmost derivation. There
are also many other possible derivations. Each parse tree has many derivations, but exactly one rightmost
derivation, and exactly one leftmost derivation.
Definition 11.2.2 (CFG yields.) Suppose x, y, and w are strings in (V ∪ Σ)∗ and B is a variable. Then xBy yields xwy, written as
xBy ⇒ xwy,
if there is a rule in R of the form B → w.
Definition 11.2.3 (CFG derives.) A string u ∈ (V ∪ Σ)∗ derives a string v ∈ (V ∪ Σ)∗, written u ⇒* v, if u = v, or if there is a sequence u ⇒ u1 ⇒ u2 ⇒ · · · ⇒ v. Notice that x ⇒* x, for any x and any set of rules.
Definition 11.2.4 If G = (V, Σ, R, S) is a grammar, then L(G) (the language of G) is the set
L(G) = { w ∈ Σ∗ | S ⇒* w }.
That is, L(G) is all the strings containing only terminals which can be derived from the start symbol of G.
11.2.2 Ambiguity
Consider the following grammar G = (V, Σ, R, S). Here
V = {S, N, NP, ADJ} and Σ = {and, eggs, ham, pencil, green, cold, tasty, . . .},
and the set R contains rules such as S → NP and NP and NP → ADJ NP | N, together with rules deriving the individual words.
Here are two possible parse trees for the string green eggs and ham: one groups the words as (green eggs) and (ham), the other as green (eggs and ham).
[Figure: the two parse trees.]
Removing ambiguity
There are several ways to remove ambiguity:
(A) Fix the grammar so it is not ambiguous. (Not always possible or reasonable.)
(B) Add grouping/precedence rules.
(C) Use semantics: choose parse that makes the most sense.
Grouping/precedence rules are the most common approach in programming language applications. E.g.
“else” goes with the closest “if”, * binds more tightly than +.
Invoking semantics is more common in natural language applications. For example, “The policeman killed the burglar with the knife.” Did the burglar have the knife, or the policeman? The previous context from the news story or the mystery novel may have made this clear, e.g. perhaps we have already been told that the burglar had a knife and the policeman had a gun.
Fixing the grammar is less often useful in practice, but neat when you can do it. Here’s an ambiguous
grammar with start symbol E. N stands for “number” and E stands for “expression”.
E → E × E | E+E | N
N → 0N | 1N | 0 | 1
An expression like 0110 × 110 + 01111 has two parse trees and, therefore, we do not know which operation
to do first when we evaluate it.
We can remove this ambiguity as follows, by rewriting the grammar as
E→E+T|T
T→N×T|N
N → 0N | 1N | 0 | 1
Now, the expression 0110 × 110 + 01111 must be parsed with the + as the topmost operation.
Chapter 12
In this lecture, we are interested in transforming a given grammar into a cleaner form. We start by
describing how to clean up a grammar. Then, we show how to transform a cleaned up grammar into a
grammar in Chomsky Normal Form.
Note, that some of the cleanup steps are not necessary if one just wants to transform a grammar into
Chomsky Normal Form. In particular, Section 12.1.2 and Section 12.1.3 are not necessary for the CNF
conversion. Note, however, that the algorithm of Section 12.1.3 gives us an immediate way to decide if the language of a grammar is empty or not; see Theorem 12.1.2.
Consider, for example, the following grammar:
⇒ S0 → S | X | Z
   S → A
   A → B
   B → C            (G1)
   C → Aa
   X → C
   Y → aY | a
   Z → ε.
(i) The variable Y can never be derived from the start symbol S0. It is a useless variable.
(ii) The rule S → A is redundant. We can replace any appearance of S by A, reducing the number of variables by one. A rule of the form S → A is called a unit production (or unit rule).
(iii) The variable A is also useless, since we cannot derive any word in Σ∗ from A (because once we start deriving from A we get into an infinite loop).
(iv) We also do not like Z, since one can generate ε from it (that is, Z ⇒* ε). Such a variable is called nullable. We would like to have the property that only the start variable can derive ε.
We are going to present a sequence of algorithms that transform the grammar to not have these drawbacks.
Clearly, this algorithm returns all the variables that are derivable from the start symbol S. As such, setting V′ = compReachableVars(G), we can set our new grammar to be G′ = (V′, Σ, R′, S), where R′ is the set of rules of R having only variables in V′.
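compReachableVars is the straightforward reachability search over the rules; a minimal sketch in Python (our own naming and rule representation, with single-character symbols on the right-hand sides for simplicity):

def comp_reachable_vars(V, R, S):
    # R is a list of rules (head, rhs), where rhs is a sequence of
    # variables and terminals.  Returns the variables reachable from S.
    reachable, todo = {S}, [S]
    while todo:
        X = todo.pop()
        for head, rhs in R:
            if head != X:
                continue
            for sym in rhs:
                if sym in V and sym not in reachable:
                    reachable.add(sym)
                    todo.append(sym)
    return reachable

# For the grammar (G1) above (writing S0 as 'T'): Y is not reachable.
R = [('T', 'S'), ('T', 'X'), ('T', 'Z'), ('S', 'A'), ('A', 'B'),
     ('B', 'C'), ('C', 'Aa'), ('X', 'C'), ('Y', 'aY'), ('Y', 'a'), ('Z', '')]
print(comp_reachable_vars({'T', 'S', 'A', 'B', 'C', 'X', 'Y', 'Z'}, R, 'T'))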
Lemma 12.1.1 Given a context-free grammar (CFG) G = (V, Σ, R, S) one can compute an equivalent CFG
G 0 such that any variable of G 0 can derive some string in Σ∗ .
Note, that if a grammar G has an empty language, then the equivalent grammar generated by Lemma 12.1.1
will have no variables in it. Namely, given a grammar we have an algorithm to decide if the language it
generates is empty or not.
86
Theorem 12.1.2 (CFG emptiness.) Given a CFG G, there is an algorithm that decides if the language of
G is empty or not.
Applying the algorithm of Section 12.1.2 together with the algorithm of Lemma 12.1.1 results in a CFG
without any useless variables.
Lemma 12.1.3 Given a CFG one can compute an equivalent CFG without any useless variables.
For example, if the grammar contains the rule X → AC, and the variable A is nullable, then we also add the rule X → C. In general, every rule X → X1 X2 · · · Xk is replaced by all the rules of the form X → α1 α2 · · · αk, where:
(i) If Xi is not nullable, then αi is Xi.
(ii) If Xi is nullable, then αi is either Xi or ε.
Let G′′ = (V, Σ, R′, S0) be the resulting grammar. Clearly, no variable is nullable, except maybe the start variable, and there are no ε-production rules (except, again, for the possible special rule for the start variable).
Note, that we might need to feed G 00 into our procedures to remove useless variables. Since this process
does not introduce new rules or variables, we have to do it only once.
Theorem 12.3.1 Given an arbitrary CFG, one can compute an equivalent grammar G′, such that G′ has no unit rules, no ε-productions (except maybe a single ε-production for the start variable), and no useless variables.
88
12.4 Chomsky Normal Form
Chomsky Normal Form requires that each rule in the grammar is either
(C1) of the form A → BC, where A, B, C are all variables and neither B nor C is the start variable (that is, the rule has exactly two variables on its right side);
(C2) of the form A → a, where a is a terminal; or
(C3) of the form S0 → ε, where S0 is the start variable.
Note that rules of the form A → B, A → BCD or A → aC are all illegal in a CNF.
Why should we care for CNF? Well, its an effective grammar, in the sense that every variable that being
expanded (being a node in a parse tree), is guaranteed to generate a letter in the final string. As such, a
word w of length n, must be generated by a parse tree that has O(n) nodes. This is of course not necessarily
true with general grammars that might have huge trees, with little strings generated by them.
(i) Create a new start symbol S0, with the new rule S0 → S mapping it to the old start symbol (i.e., S).
(ii) Remove nullable variables (i.e., variables that can generate the empty string).
(iii) Remove unit rules (i.e., variables that can generate each other).
(iv) Restructure the remaining rules into the required form.
The only step we did not describe yet is the last one.
Removing characters from the right side of rules. As a first step, we introduce a variable Vc for every character c ∈ Σ, and add it to V. Next, we add the rules Vc → c to the grammar, for every c ∈ Σ.
Now, for any string w ∈ (V ∪ Σ)∗, let ŵ denote the string in which every appearance of a character c ∈ Σ in w is replaced by Vc. We now replace every rule X → w, such that |w| > 1, by the rule X → ŵ.
Clearly, (C2) and (C3) hold for the resulting grammar, and furthermore, any rule having variables on the right side is made only of variables.
Making rules with only two variables on the right side. The only remaining problem is that the current grammar might have rules that are too long, with a long string of variables on the right side. For example, we might have a rule in the grammar of the form
X → B1 B2 · · · Bk.
To turn this into binary rules (with only two variables on the right side), we remove this rule from the grammar, and replace it by the following set of rules:
X → B1 Z1
Z1 → B2 Z2
Z2 → B3 Z3
. . .
Z_{k−3} → B_{k−2} Z_{k−2}
Z_{k−2} → B_{k−1} Bk,
where Z1, . . . , Z_{k−2} are new variables.
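A small Python sketch of this binarization step (our own naming; fresh supplies the new variables Z1, Z2, . . .):

from itertools import count

def binarize(head, body, fresh):
    # Replace head -> body (a sequence of >= 2 variables) by binary rules.
    rules = []
    while len(body) > 2:
        z = next(fresh)
        rules.append((head, [body[0], z]))   # X -> B1 Z1, then Z1 -> B2 Z2, ...
        head, body = z, body[1:]
    rules.append((head, list(body)))         # finally Z_{k-2} -> B_{k-1} Bk
    return rules

fresh = (f'Z{i}' for i in count(1))
print(binarize('X', ['B1', 'B2', 'B3', 'B4'], fresh))
# [('X', ['B1', 'Z1']), ('Z1', ['B2', 'Z2']), ('Z2', ['B3', 'B4'])]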
Theorem 12.4.1 (CFG → CNF.) Any context-free grammar can be converted into Chomsky normal form.
⇒ S → ASA | aB
   A → B | S          (G0)
   B → b | ε
After adding the new start symbol S0 , we get the following grammar.
⇒ S0 → S
   S → ASA | aB       (G1)
   A → B | S
   B → b | ε
Removing nullable variables. In the above grammar, both A and B are nullable variables. We have
the rule S → ASA. Since A is nullable, we need to add S → SA and S → AS and S → S (which is of course a
silly rule, so we will not waste our time putting it in). We also have S → aB. Since B is nullable, we need
to add S → a. The resulting grammar is the following.
⇒ S0 → S
S → ASA | aB | a | SA | AS
(G2)
A→B|S
B→b
Removing unit rules. The unit pairs for this grammar are {A → B, A → S, S0 → S}. We need to copy the productions for S up to S0, copy the productions for S down to A, and copy the production B → b to A.
⇒ S0 → ASA | aB | a | SA | AS
S → ASA | aB | a | SA | AS
(G3)
A → b | ASA | aB | a | SA | AS
B→b
90
Final restructuring. Now, we can directly patch any places where our grammar rules have the wrong form for CNF. First, if a rule has at least two symbols on its right-hand side but some of them are terminals, we introduce new variables which expand into these terminals. For our example, the offending rules are S0 → aB, S → aB, and A → aB. We can fix these by replacing the a's with a new variable U, and adding the rule U → a.
⇒ S0 → ASA | UB | a | SA | AS
S → ASA | UB | a | SA | AS
(G4) A → b | ASA | UB | a | SA | AS
B→b
U→a
Finally, the rules with ASA on the right-hand side still need to be binarized, as described above: replace ASA by A Z, and add the rule Z → SA. We are done!
Chapter 13
This lecture introduces pushdown automata, i.e. about the first half of section 2.2 from Sipser.
PDAs are the procedural counterpart to context-free grammars. A pushdown automaton (PDA) is simply
an NFA with a stack. In a couple lectures, we’ll prove that they generate the same languages.
[Figure: a PDA for balanced brackets, with states q0, q1, q2: the transition q0 → q1 pushes $, self loops on q1 push [ and ( and pop them on ] and ), and the transition q1 → q2 pops $.]
Do a short trace of the state sequence and the sequence of stack contents as this machine recognizes the string [()()].
The formal notation for the labels on PDA transition arcs is
c, s → t,
where c is the character to be read from the input stream, s is the character to be popped from the top of the stack, and t is the character to be pushed back onto the stack. All of these can be ε.
For example, to pop something from the stack, we use a label like c, s → ε. To push something onto the stack: c, ε → t. A transition like c, s → t pops s from the stack and substitutes t in its place.
So a properly drawn version of our state diagram would look like:
[Figure: the properly drawn bracket PDA: q0 →(ε, ε → $) q1; self loops on q1 labelled [, ε → [ and (, ε → ( and ], [ → ε and ), ( → ε; and q1 →(ε, $ → ε) q2.]
[Figure: a PDA for { w w^R | w ∈ {a, b}∗ }: q0 →(ε, ε → $) q1; self loops on q1 labelled a, ε → a and b, ε → b; q1 →(ε, ε → ε) q2 (guessing the midpoint); self loops on q2 labelled a, a → ε and b, b → ε; and q2 →(ε, $ → ε) q3.]
• You can take a transition if the input character and the top-of-stack character both match what is on the transition, or are ε.
• It is non-deterministic (like an NFA): more than one transition might match, and you need to pick the “right” one, e.g. guessing when you have reached the midpoint of the string and need to take the transition from q1 to q2.
[Figure: a PDA with states q0, q1, q2, q2a, q3: q0 →(ε, ε → $) q1, q1 →(ε, ε → ε) q2, q2 →(ε, $ → ε) q3, and a 2-step loop q2 →(b, X → ε) q2a →(b, X → ε) q2.]
Things to notice:
• The stack alphabet is different from the input alphabet. You can use some of the same characters but
you don’t have to.
• The 2-step loop involving q2 and q2a checks that the number of b’s is even.
13.4 The language a^n b^{2n}
Another example: L = { a^n b^{2n} | n ≥ 1 }. Notice that elements of L have twice as many b's as a's. Also, the empty string isn't in the language; the shortest string in L is abb.
[Figure: a PDA for L with states q0, q1, q2, q2a, q3: q0 →(ε, ε → $) q1; a self loop on q1 labelled a, ε → X; q1 →(a, ε → X) q2; a 2-step loop q2 →(b, X → ε) q2a →(b, ε → ε) q2; and q2 →(ε, $ → ε) q3.]
Here is another one of the possible PDAs that recognizes this language.
[Figure: a PDA with states q0, q1, q1a, q2, q3: q0 →(ε, ε → $) q1; a 2-step loop q1 →(a, ε → X) q1a →(ε, ε → X) q1; q1 →(b, X → ε) q2; a self loop on q2 labelled b, X → ε; and q2 →(ε, $ → ε) q3.]
Things to notice:
• The 2-step loop involving q1 and q1a reads one a off the input and pushes two X’s onto the stack.
• The transition from q1 to q2 explicitly reads an input character, so the empty string cannot make it through to the final state.
Formally, a PDA is a tuple M = (Q, Σ, Γ, δ, q0, F), where Γ is the stack alphabet and the transition function has the form δ : Q × Σε × Γε → P(Q × Γε), with Σε = Σ ∪ {ε} and Γε = Γ ∪ {ε}. That is, the input to the transition function is a state, an input symbol, and a stack symbol. The input and stack symbols can be real characters from Σ and Γ, or they can be ε. The output is a set of pairs (q, t), where q is a state and t is a stack symbol or ε.
A PDA M = (Q, Σ, Γ, δ, q0, F) accepts a string w if there are
(a) a sequence of states s0, s1, . . . , sk,
(b) a sequence of characters and ε's c1, c2, . . . , ck, and
(c) a sequence of strings t0, t1, . . . , tk (the stack snapshots),
such that
• w = c1 c2 . . . ck,
• s0 = q0 and t0 = ε (we start at the start state with an empty stack),
• sk ∈ F (we end at an accepting state), and
• for each i = 1, . . . , k, there are characters (or ε's) a and b and a string x such that t_{i−1} = ax, t_i = bx, and (s_i, b) ∈ δ(s_{i−1}, c_i, a). (Each change to the machine's state follows what is allowed by δ.)
The last condition will need some discussion and probably a picture or two.
[Figure: a nondeterministic PDA: after pushing $, it guesses one of two branches; one branch (through q2a, q3a) loops on a, ε → a before popping $, while the other branch (through q2b, q3b, q4b) loops on b, ε → ε and c, a → ε before popping $.]
It turns out that recognizing this language requires non-determinism. There is a deterministic version
of a PDA, but it does not recognize as many languages as a normal (non-deterministic) PDA.
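To make the acceptance definition concrete, here is a sketch of a simulator for a nondeterministic PDA in Python (our own encoding of δ, not from the notes). It explores all reachable (state, stack, position) configurations by breadth-first search; a PDA that can keep pushing on ε-moves forever would make the search run forever, so this is only a sketch for well-behaved examples.

from collections import deque

def pda_accepts(w, delta, q0, final):
    # delta maps (state, read, pop) -> set of (state, push) pairs, where
    # read, pop and push are single characters or '' (standing for eps).
    # A stack is a string whose first character is the top.
    start = (q0, '', 0)                     # (state, stack, input position)
    seen, todo = {start}, deque([start])
    while todo:
        q, stack, i = todo.popleft()
        if q in final and i == len(w):
            return True
        reads = [''] + ([w[i]] if i < len(w) else [])
        pops = [''] + ([stack[0]] if stack else [])
        for c in reads:
            for s in pops:
                for (q2, t) in delta.get((q, c, s), ()):
                    cfg = (q2, t + stack[len(s):], i + len(c))
                    if cfg not in seen:
                        seen.add(cfg)
                        todo.append(cfg)
    return False

# The PDA for { w w^R } from earlier in this chapter: push in q1,
# guess the midpoint, match in q2.
delta = {
    ('q0', '', ''):   {('q1', '$')},
    ('q1', 'a', ''):  {('q1', 'a')},  ('q1', 'b', ''): {('q1', 'b')},
    ('q1', '', ''):   {('q2', '')},
    ('q2', 'a', 'a'): {('q2', '')},   ('q2', 'b', 'b'): {('q2', '')},
    ('q2', '', '$'):  {('q3', '')},
}
print(pda_accepts('abba', delta, 'q0', {'q3'}))   # True
print(pda_accepts('ab',   delta, 'q0', {'q3'}))   # False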
Chapter 14
Observation 14.1.1 Consider a grammar G which is CNF, and a variable X of G which is not the start
variable. Then, any string derived from X must be of length at least one.
Claim 14.1.2 Let G = (V, Σ, R, S) be a context-free grammar in Chomsky normal form, and let w be a string of length at least one. Furthermore, assume that there is an X ∈ V such that X ⇒* w (i.e., w can be derived from X), and let T be the corresponding parse tree for w. Then the tree T has exactly 2|w| − 1 internal nodes.
Proof: A full binary tree is a tree where every node other than the leaves has two children. It is easy to verify, by an easy induction, that such a tree with m leaves has m − 1 internal nodes.
Now, the given tree T is not quite a full binary tree. Indeed, the kth leaf (from the left) of T, denoted by ℓk, is the kth character of w, and its parent must be a node labeled by a variable Xk, for k = 1, . . . , n, where n = |w|. Furthermore, the parent of ℓk has only this single child. As such, if we remove the n leaves of T, we remain with a tree T′ with n leaves (every parent of a leaf ℓk became a leaf). This tree is a full binary tree, because any remaining internal node must correspond to a non-terminal derivation of a CNF grammar, and any such derivation has the form X → YZ; that is, the derivation corresponds to an internal node with two children in T. By the aforementioned fact about full binary trees, T′ has n − 1 internal nodes. Every node of T′ (its n − 1 internal nodes and its n leaves) is an internal node of T. We conclude that T has (n − 1) + n = 2n − 1 internal nodes.
(The claim can also be proven by induction: if the root of T corresponds to a rule X → X1 X2, let w1 and w2 denote the substrings of w derived from X1 and X2, respectively, and let T1 and T2 denote the corresponding subtrees of T. Clearly w = w1 w2, with |w1| > 0 and |w2| > 0, by the above observation. Clearly, T1 (resp. T2) is a tree that derives w1 (resp. w2) from X1 (resp. X2). By induction, T1 (resp. T2) has 2|w1| − 1 (resp. 2|w2| − 1) internal nodes. As such, the number of internal nodes of T is
N = 1 + (2|w1| − 1) + (2|w2| − 1) = 2(|w1| + |w2|) − 1 = 2|w| − 1,
as claimed.)
The following is the same claim, restated as a claim on the number of derivation steps used.
Claim 14.1.3 If G is a context-free grammar in Chomsky normal form, and w is a string of length n ≥ 1, then any derivation of w from any variable X contains exactly 2n − 1 steps.
Theorem 14.1.4 Given a context-free grammar G and a word w, one can decide if w ∈ L(G) by an algorithm that always stops.
Proof: Convert G into Chomsky normal form, and let G′ be the resulting grammar. Let n = |w|. Observe that if w ∈ L(G), then w has a parse tree using G′ with 2n − 1 internal nodes, by Claim 14.1.2. Enumerate all possible parse trees with 2n − 1 internal nodes (their number is large, but finite), and check for each of them whether it is (i) a legal parse tree for G′, and (ii) derives the word w. If we find such a legal tree deriving w, then w ∈ L(G). Otherwise, w cannot be generated by G′, which implies that w ∉ L(G).
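The enumeration in this proof is finite but astronomically slow. In practice, one decides membership for a grammar in CNF by dynamic programming, using the CYK algorithm (which is not covered in these notes). Here is a minimal Python sketch, under our own rule representation:

def cyk(w, unit, binary, start):
    # Decides w in L(G) for a CNF grammar G (ignoring the S -> eps rule,
    # which must be checked separately when w is empty).
    # unit:   set of pairs (X, c)     for the rules X -> c
    # binary: set of triples (X, Y, Z) for the rules X -> YZ
    n = len(w)
    if n == 0:
        return False
    # table[i][j] holds the variables deriving the substring w[i : i+j+1].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, c in enumerate(w):
        table[i][0] = {X for (X, a) in unit if a == c}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                for (X, Y, Z) in binary:
                    if (Y in table[i][split - 1]
                            and Z in table[i + split][length - split - 1]):
                        table[i][length - 1].add(X)
    return start in table[0][n - 1]

# The grammar S -> AB, A -> a, B -> b accepts exactly 'ab':
print(cyk('ab', {('A', 'a'), ('B', 'b')}, {('S', 'A', 'B')}, 'S'))   # True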
CFGs are not closed under intersection
Consider the two languages L1 = { a^i b^n c^n | i, n ≥ 0 } and L2 = { a^n b^n c^i | i, n ≥ 0 }. The grammar
S → Astar X
Astar → aAstar | ε
X → bXc | ε,
with the start symbol being S, is a CFG for L1 (a similar grammar works for L2). As such, both L1 and L2 are context-free, but their intersection L = L1 ∩ L2 = { a^n b^n c^n | n ≥ 0 } is not context-free. Thus, context-free languages are not closed under intersection.
CFGs are closed under union
Suppose we have grammars for two languages, with start symbols S and T , respectively. Rename variables
(in the two grammars) as needed to ensure that the two grammars do not share any variables. Then construct
a grammar for the union of the languages, with start symbol Z, by taking all the rules from both grammars
together and adding a new rule Z → S | T .
Concatenation.
Suppose we have grammars for two languages, with start symbols S and T. Rename variables as needed to ensure that the two grammars do not share any variable. Then construct a grammar for the concatenation of the languages, with start symbol Z, by taking all the rules from both grammars and adding a new rule Z → S T.
Star operator
Suppose that we have a grammar for the language L, with start symbol S. The grammar for L∗, with start symbol T, contains all the rules from the original grammar plus the rule T → T S | ε.
String reversal
Reverse the character string on the righthand side of every rule in the grammar.
Homomorphism
Suppose that we have a grammar G for language L and a homomorphism h. To construct a grammar for
h(L), modify the righthand side of every rule in G to replace each terminal symbol t with its image h(t)
under the homomorphism.
CFGs are closed under intersection with a regular language
Here, the variable X_{q,q′} represents all the strings that can be derived from the variable X of G such that, furthermore, if we feed such a string to D starting at the state q, then we reach the state q′. So, consider a rule of the form
X → YZ
in G. For every possible starting state q and ending state q′, we want to generate a rule for the variable X_{q,q′}. Say we derive a substring w from Y. Feeding D the string w, starting at q, leads to some state s. As such, the string generated from Y moves D from q to s, and the string generated from Z must then move D from s to q′. That is, this rule can be rewritten as
X_{q,q′} → Y_{q,s} Z_{s,q′}, for all q, q′, s ∈ Q.
¹Here, and in a lot of other places, we abuse notation: when we write a^n b^n, what we really mean is the language { a^n b^n | n ≥ 0 }.
If we have a rule of the form X → c in G, then we create the rule X_{q,q′} → c if there is a transition in D from q to q′ that reads the character c, where c ∈ Σ.
Finally, we create a new start variable S0, and we introduce the rules S0 → S_{qinit,q′}, where qinit is the initial state of D, and q′ ∈ F is an accept state of D.
We claim that the resulting grammar generates exactly the words in the language L(G) ∩ L(D).
Formal description
We have a CFG G = (V, Σ, R, S) and a DFA D = (Q, Σ, δ, qinit, F). We now build a new grammar for the language L(G) ∩ L(D). The set of new variables is
V′ = {S0} ∪ { X_{q,q′} | X ∈ V, q, q′ ∈ Q },
and the set of new rules is
R′ = { X_{q,q′} → Y_{q,s} Z_{s,q′} | q, q′, s ∈ Q, (X → YZ) ∈ R }    (14.1)
  ∪ { S0 → S_{qinit,q′} | q′ ∈ F }    (14.2)
  ∪ { X_{q,q′} → c | (X → c) ∈ R and δ(q, c) = q′ }.    (14.3)
If S → ε ∈ R and qinit ∈ F (i.e., ε is in the intersection language), then we also add the rule S0 → ε to R′.
The new grammar is G∩D = (V′, Σ, R′, S0).
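Generating the rule set R′ is entirely mechanical; here is a small Python sketch (our own rule encoding, not from the notes) that emits the rules of Eq. (14.1)-(14.3):

def intersect_rules(R, Q, delta, q_init, F, S):
    # R: CNF rules, ('bin', X, Y, Z) for X -> YZ and ('unit', X, c) for X -> c.
    # Variables of the new grammar are triples (X, q, q2).
    rules = []
    for rule in R:
        if rule[0] == 'bin':
            _, X, Y, Z = rule
            rules += [((X, q, q2), ((Y, q, s), (Z, s, q2)))
                      for q in Q for q2 in Q for s in Q]          # Eq. (14.1)
        else:
            _, X, c = rule
            rules += [((X, q, delta[(q, c)]), (c,)) for q in Q]   # Eq. (14.3)
    rules += [('S0', ((S, q_init, qf),)) for qf in F]             # Eq. (14.2)
    return rules

The S0 → ε rule, when S → ε ∈ R and qinit ∈ F, is added separately, exactly as in the text.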
Observation 14.2.2 The new grammar G∩D is “almost” in CNF. That is, if we ignore the rules involving the start symbol S0 of G∩D, then it is in CNF.
Correctness
Lemma 14.2.3 Let G be a context-free grammar in Chomsky normal form, and let D be a DFA. Then the grammar G∩D = (V′, Σ, R′, S0) constructed above satisfies the following: for any word w ∈ Σ∗ \ {ε}, we have that X ⇒* w and δ(q, w) = q′ if and only if X_{q,q′} ⇒* w.
Proof: The construction is described above; the proof is by induction on the length of w.
[|w| = 1]: If |w| = 1, then w = c, where c ∈ Σ.
Thus, if X ⇒* w and δ(q, w) = q′, then X → c is in R, which implies that we introduced the rule X_{q,q′} → c into R′, which implies that X_{q,q′} ⇒* w.
Similarly, if X_{q,q′} ⇒* w, then since G∩D is (almost) in CNF and |w| = 1, there must be a derivation using the rule X_{q,q′} → c. But this implies, by construction, that X → c is a rule of G and δ(q, c) = q′, as required.
[|w| > 1]: Assume that, by induction, the claim holds for all words strictly shorter than w.
– If X ⇒* w and δ(q, w) = q′, then consider the parse tree T of G deriving w from X. Since G is in CNF, the root of T corresponds to a rule of the form X → YZ. Let wY and wZ be the two sub-words derived by the two subtrees of T. Clearly, w = wY wZ, and since G is in CNF, we have |wY|, |wZ| > 0 (since any symbol except the root in a CNF derives a word of length at least 1). As such, |wY|, |wZ| < |w|. Now, let q′′ = δ(q, wY). We have that
Y ⇒* wY, 0 < |wY| < |w|, and q′′ = δ(q, wY).
As such, by induction, it must be that Y_{q,q′′} ⇒* wY. Similarly, since δ(q′′, wZ) = q′, by the same argument we have that Z_{q′′,q′} ⇒* wZ. Now, by Eq. (14.1), we have the rule X_{q,q′} → Y_{q,q′′} Z_{q′′,q′} in R′. Namely,
X_{q,q′} → Y_{q,q′′} Z_{q′′,q′} ⇒* wY wZ = w,
as required.
– If X_{q,q′} ⇒* w, consider the corresponding parse tree T′ of G∩D, and let
X_{q,q′} → Y_{q,q′′} Z_{q′′,q′}
be the rule used at the root of T′, and let wY, wZ be the two substrings of w generated by these two subtrees; that is, w = wY wZ. By induction, we have that
Y ⇒* wY, δ(q, wY) = q′′, and Z ⇒* wZ, δ(q′′, wZ) = q′.
Now, by construction, the rule X → YZ must be in R. As such, X → YZ ⇒* wY wZ = w, and δ(q, w) = q′, as required.
Theorem 14.2.4 Let L be a context-free language and L' be a regular language. Then, L ∩ L' is a context-free language.

Proof: Let G = (V, Σ, R, S) be a CNF grammar for L, and let D = (Q, Σ, δ, q_init, F) be a DFA for L'. We apply the above construction to compute a grammar $G_{\cap D} = (V', \Sigma, R', S_0)$ for the intersection.
• $w \in L \cap L' \implies w \in L(G_{\cap D})$.
If w = ε, then the rule $S_0 \to \epsilon$ is in $G_{\cap D}$, and we have that $\epsilon \in L(G_{\cap D})$.
For any other word, if $w \in L \cap L'$, then $S \overset{*}{\Rightarrow} w$ and $q' = \delta(q_{init}, w) \in F$. By Lemma 14.2.3, we have that
$$S_{q_{init}\,q'} \overset{*}{\Rightarrow} w.$$
Furthermore, by construction, we have the rule
$$S_0 \to S_{q_{init}\,q'}.$$
As such, $S_0 \overset{*}{\Rightarrow} w$, and $w \in L(G_{\cap D})$.
• $w \in L(G_{\cap D}) \implies w \in L \cap L'$.
Similar to the above proof, and we omit it.
Chapter 15
Here we start proving a set of results similar to those we proved for regular languages. That is, we will
show that context-free grammars and pushdown automata generate the same languages. We will show that
context-free languages are closed under a variety of operations, though not as many as regular languages.
We will also prove a version of the pumping lemma, which we can use to show that certain languages (e.g., $a^nb^nc^n$) are not context-free.
[Figure: a transition $q \xrightarrow{\;a,\, s \to cde\;} r$ pushing several symbols is replaced by a chain of transitions that push one symbol at a time: $q \xrightarrow{\;a,\, s \to e\;} X \xrightarrow{\;\epsilon,\, \epsilon \to d\;} Y \xrightarrow{\;\epsilon,\, \epsilon \to c\;} r$, where X and Y are new intermediate states.]
Idea: The PDA needs to verify that there is a derivation for the input string w. Here’s an algorithm that
almost works:
• Use the grammar rules from G to expand variables on the stack. The PDA guesses which rule to apply
at each step.
• When the stack contains only terminals, compare this terminal string against the input w. Accept iff
they are the same.
Consider the example grammar G with start symbol S:
S → XB
X → aXb | ε
B → cB | ε
This generates the language $L = \left\{ a^nb^nc^i \;\middle|\; i \geq 0,\ n \geq 0 \right\}$.
Here is a derivation for the string aabbc:
$$S \Rightarrow XB \Rightarrow aXbB \Rightarrow aaXbbB \Rightarrow aabbB \Rightarrow aabbcB \Rightarrow aabbc.$$
If we could do this on the stack, we would get the following sequence of snapshots for the stack of the PDA (top of the stack on the left, with $ marking the bottom):
$$S\$ \;\Rightarrow\; XB\$ \;\Rightarrow\; aXbB\$ \;\Rightarrow\; aaXbbB\$ \;\Rightarrow\; aabbB\$ \;\Rightarrow\; aabbcB\$ \;\Rightarrow\; aabbc\$.$$
At the end of this process, the PDA would guess that it finished applying all the grammar rules, and it would read each character of the input, verify that it is indeed equal to the top of the stack, pop the stack, and go on to the next letter. Then, the PDA would verify that we had reached the bottom of the stack (by popping the character $ out of it), and move into the accept state.
Let us try to do this on the stack. We get two steps in, and the stack looks like the following. At this stage, we cannot expand X because it is not on the top of the stack. There is a terminal (i.e., a) blocking our access to it.
a ⇐ top of the stack
X
b
B
$ ⇐ bottom of the stack
So, the “real” PDA has to interleave these two steps: expanding variables by guessing grammar rules and
matching the input against the terminals at the top of the stack. We also need to mark the bottom of the
stack, so we can make sure we have completely emptied it after we’ve finished reading the input string.
So the final PDA looks like the following: it has three states, $q_0 \xrightarrow{\;\epsilon,\,\epsilon \to S\$\;} q_1 \xrightarrow{\;\epsilon,\,\$ \to \epsilon\;} q_2$, where $q_2$ is the accept state, and $q_1$ carries a self-loop for each of the loop rules described next.
The rules on the loop in this PDA are of two types: expansion rules of the form ε, X → w (pop a variable X off the stack and push the right-hand side w of one of its grammar rules), and matching rules of the form c, c → ε for each terminal c (read c from the input and pop a matching c off the stack).
For our example grammar, the loop rules are:
• ε, S → XB
• ε, X → aXb
• ε, X → ε
• ε, B → cB
• ε, B → ε
• a, a → ε
• b, b → ε
• c, c → ε
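The following Python fragment sketches how these loop transitions are generated from an arbitrary grammar. The triple encoding (read, pop, push) of a PDA move, with "" standing for ε, is an assumption of this sketch, not a standard API.

    # Build the loop rules of the CFG-to-PDA construction.
    def loop_rules(grammar_rules, terminals):
        # grammar_rules: list of (variable, right-hand side), e.g. ("S", "XB").
        moves = []
        for (var, rhs) in grammar_rules:
            moves.append(("", var, rhs))   # expansion: pop the variable, push the rhs
        for c in terminals:
            moves.append((c, c, ""))       # matching: read c and pop c
        return moves

    example = loop_rules([("S", "XB"), ("X", "aXb"), ("X", ""),
                          ("B", "cB"), ("B", "")], "abc")
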
• We remove the first step in the derivation, not (for example) the final step. In general, results about grammars require removing the first step of a derivation or the top node in a parse tree.
• It almost never works right to start with a derivation or tree of size k and try to expand it into a derivation or parse tree of size k + 1. Please avoid this common mistake.
As an example of the inherent hopelessness of arguing in this wrong direction, consider the following grammar:
(G6)
S → A | B
A → aaA | ε
B → bbB | b
All the even-length strings in the language are generated from A and consist entirely of a's, and all the odd-length strings are generated from B and consist entirely of b's. As such, there is no way to prove this claim by induction by taking the parse tree for an even-length string and grafting on a couple of nodes to make a parse tree for the next longer length.
Chapter 16
(G5)
A → b | AZ | UB | a | SA | AS
B → b
U → a
Z → SA
[Figure: two parse trees over the grammar (G5), both rooted at $S_0$; each contains a subtree rooted at the variable S.]
What if we wanted to make the second word longer without thinking too much? Well, both parse trees have subtrees whose roots are labeled by the variable S. As such, we could just cut the subtree of the first word rooted at the variable S, and replace the subtree of S in the second word by this subtree, which would look like the following:
[Figure: the parse tree resulting from grafting the S-subtree of the first word's tree into the second word's tree.]
Even more interestingly, we can do this cut and paste on the original tree:
[Figure: the parse trees obtained by pumping once and pumping twice.]
Naturally, we can repeat this pumping operation (cutting and pasting a subtree) as many times as we want; see for example Figure 16.1. In particular, we get that the word $abb^iab^ia$, for any i, is in the language of the grammar (G5). Notice that, unlike the pumping lemma for regular languages, here the repetition happens in two places in the string. We claim that such a repetition (in two places) must happen for any sufficiently long word in any context-free language.
Figure 16.1: Naturally, one can pump the string as many times as one wants, to get a longer and longer string. [Parse trees omitted.]
16.2.1 If a variable repeats
So, assume we have a parse tree T for a word s (where the underlying grammar is in CNF), and there is a path in T from the root to a leaf on which some variable repeats. So, say nodes α and β have the same variable (say S) stored in them:
[Figure: the parse tree T; the subtree rooted at α generates the substring yzv, the subtree rooted at β generates z, and s = xyzvw.]
Now, if we take the subtree rooted at α and copy it over β, we get a new parse tree:
[Figure: the new parse tree; the subtree at β has been replaced by a copy of the subtree rooted at α.]
The new tree is much bigger, and the new string it represents is $s' = xyyzvvw$. In general, if we do this cut & paste operation $i - 1$ times, we get the string
$$xy^izv^iw.$$
We will refer to a parse tree generated from a context-free grammar in CNF form as a CNF tree. A CNF tree has the special property that the parent of a leaf has a single child, which is a terminal. The height of a tree is the maximum number of edges on a path from the root of the tree to a leaf. Thus, the tree depicted on the right has height 3.
[Figure: a small CNF tree of height 3: the root S has children A and Z; A derives b; Z has children S and A, which in turn derive a and b.]
The grammar G has m variables. As such, if the parse tree T has a path π from the root of length k, and k > m (i.e., the path has k edges), then π must contain at least m + 1 variables (the last edge is between a variable and a terminal). As such, by the pigeonhole principle, there must be a repeated variable along π. In particular, a parse tree that does not have a repeated variable has height at most m.
Since G is in CNF, its parse tree is a binary tree, and a variable either has two children, or a single child which is a leaf (and that leaf contains a single character of the input). As such, a tree of height at most m contains at most $2^m$ leaves¹, and as such represents a string of length at most $2^m$.
We restate the above observation formally for the record.
Observation 16.2.1 If a CNF parse tree (or a subtree of such a tree) has height h, then the string it generates is of length at most $2^h$.
Lemma 16.2.2 Let G be a grammar with m variables given in Chomsky Normal Form (CNF), and consider a word $s \in L(G)$, such that $\ell = |s|$ is strictly larger than $2^m$ (i.e., $\ell > 2^m$). Then, any parse tree T for s (generated by G) must have a path from the root to some leaf with a repeated variable on it.
¹In fact, a CNF tree of height m can have at most $2^{m-1}$ leaves (figure out why), but that is a subtlety we will ignore.
Proof: Assume, for the sake of contradiction, that T has no repeated variable on any path from the root. Then the height of T is at most m. But a CNF parse tree of height m can generate a string of length at most $2^m$, a contradiction, since $\ell = |s| > 2^m$.
Lemma 16.2.3 (CNF is effective.) In a CNF parse tree T, if u and v are two nodes, both storing variables in them, and u is an ancestor of v, then the string $S_u$ generated by the subtree of u is strictly longer than the substring $S_v$ generated by the subtree of v. Namely, $|S_u| > |S_v|$. (Of course, $S_v$ is a substring of $S_u$.)
Proof: Assume that the node u stores a variable X, and that we used the rule X → BC to generate its two children $u_L$ and $u_R$. Furthermore, assume that $u_L$ and $u_R$ generated the strings $S_L$ and $S_R$, respectively. The string generated by v must be a substring of either $S_L$ or $S_R$. However, CNF has the property that no variable² can generate the empty word ε. As such, $|S_R| > 0$ and $|S_L| > 0$.
In particular, assume, without loss of generality, that v is in the left subtree of u, and as such $S_v$ is a substring of $S_L$. We have that
$$|S_v| \leq |S_L| < |S_L| + |S_R| = |S_u|.$$
Lemma 16.2.4 Let T be a tree, and let π be the longest path in the tree, realizing the height h of T. Fix k ≥ 0, and let u be the kth node from the end of π (i.e., u is at distance h − k from the root of T). Then the tree rooted at u has height at most k.
Proof: Let r be the root of T , and assume, for the sake of contradiction, that Tu (i.e., the subtree rooted
at u) has height larger than k, and let σ be the path from u to the leaf γ of Tu realizing this height (i.e., the
length of σ is > k). Next, consider the path formed by concatenating the path in T from r to u with the
path σ. Clearly, this is a new path of length h − k + |σ| > h that leads from the root of T into a leaf of T .
As such, the height of T is larger than h, which is a contradiction.
Lemma 16.2.5 (Pumping lemma for Chomsky Normal Form (CNF).) Let G be a CNF context-free grammar with m variables in it. Then, given any word S in L(G) of length $> 2^m$, one can break S into 5 substrings S = xyzvw, such that for any i ≥ 0, we have that $xy^izv^iw$ is a word in L(G). In addition, the following holds:
1. The strings y and v are not both empty (i.e., the pumping is getting us new words).
2. $|yzv| \leq 2^m$.
Proof: Let T be a CNF parse tree for S (generated by G). Since $\ell = |S| > 2^m$, by Lemma 16.2.2 there is a path in T from its root to a leaf which has a repeated variable on it (and its length is longer than m). In fact, let π be the longest path in T from the root to a leaf (i.e., π is the path realizing the height of the tree T). We know that π has at least m + 1 variables on it, and as such it has a repetition.
We need to be a bit careful in picking the two nodes α and β on π to apply the pumping to. In particular, let α be the last node on π such that the symbol stored in α has a repeated appearance later in the path. Clearly, the subpath τ of π starting at α and going to the end of π has at most m symbols on it (because otherwise, there would be another repetition on π). Let β be the node of τ ⊆ π which has the repetition of the symbol stored in α.
By Lemma 16.2.4, the subtree $T_\alpha$ (i.e., the subtree of T rooted at α) has height at most m. As above, $T_\alpha$ and $T_\beta$ generate two strings $S_\alpha$ and $S_\beta$, respectively. By Observation 16.2.1, we have that $|S_\alpha| \leq 2^m$.
²Except the start variable, but that is not relevant here.
By Lemma 16.2.3, we have that $|S_\alpha| > |S_\beta|$. As such, the two substrings $S_\alpha$ and $S_\beta$ break S into 5 substrings S = xyzvw, where $S_\alpha = yzv$ and $S_\beta = z$:
$$S = x \;\overbrace{y \underbrace{z}_{=S_\beta} v}^{=S_\alpha}\; w.$$
As such, we know that $|yv| = |S_\alpha| - |S_\beta| > 0$. Namely, the strings y and v are not both empty. Furthermore, $|yzv| = |S_\alpha| \leq 2^m$.
The remaining task is to show the pumping. Indeed, if we replace $T_\beta$ by the tree $T_\alpha$, we get a parse tree generating the string $xy^2zv^2w$. If we repeat this process $i - 1$ times, we get the word
$$xy^izv^iw \in L(G),$$
for any i, establishing the lemma.
Lemma 16.2.6 (Pumping lemma for context-free languages.) If L is a context-free language, then there is a number p (the pumping length) where, if S is any string in L of length at least p, then S may be divided into five pieces S = xyzvw satisfying the conditions:
1. for any i ≥ 0, we have $xy^izv^iw \in L$,
2. |yv| > 0,
3. and |yzv| ≤ p.

Proof: Since L is context-free, it has a CNF grammar G that generates it. Now, if m is the number of variables in G, then for $p = 2^m + 1$ the lemma follows by Lemma 16.2.5.
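The conclusion of the lemma is easy to test numerically. The following small sketch picks, by hand, a decomposition of a word in $\{a^nb^n\}$ (the decomposition below is our own choice, not computed by any algorithm) and verifies that all the pumped variants stay in the language.

    # Check that x y^i z v^i w stays in { a^n b^n } for a hand-picked split.
    def in_anbn(s):
        n = len(s) // 2
        return s == "a" * n + "b" * n

    x, y, z, v, w = "aaa", "a", "", "b", "bbb"   # the word s = a^4 b^4
    assert all(in_anbn(x + y * i + z + v * i + w) for i in range(6))
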
Lemma 16.3.1 The language $L = \left\{ a^nb^nc^n \;\middle|\; n \geq 0 \right\}$ is not context-free.

Proof: Assume, for the sake of contradiction, that L is context-free, and apply the pumping lemma to it (Lemma 16.2.6). As such, there exists p > 0 such that any word in L longer than p can be pumped. So, consider the word $S = a^{p+1}b^{p+1}c^{p+1}$. By the pumping lemma, it can be written as $a^{p+1}b^{p+1}c^{p+1} = xyzvw$, where $|yzv| \leq p$.
We claim that yzv is made out of at most two of the three characters. Indeed, if yzv contained both a's and c's, it would have to contain the string $b^{p+1}$ as a substring (as $b^{p+1}$ separates all the appearances of a from all the appearances of c in S). This would require that |yzv| > p, but we know that |yzv| ≤ p.
In particular, let $i_a$, $i_b$ and $i_c$ be the number of a's, b's and c's in the string yv, respectively. All we know is that $i_a + i_b + i_c = |yv| > 0$ and that $i_a = 0$ or $i_c = 0$. Namely, $i_a \neq i_b$ or $i_b \neq i_c$ (the case $i_a \neq i_c$ implies one of these two cases). In particular, by the pumping lemma, the word
$$S_2 = xy^2zv^2w \in L.$$
We have the following:

character | how many times it appears in $S_2$
a | $p + 1 + i_a$
b | $p + 1 + i_b$
c | $p + 1 + i_c$

If $i_a \neq i_b$, then $S_2$, by the above table, does not have the same number of a's and b's, and as such it is not in L.
If $i_b \neq i_c$, then $S_2$, by the above table, does not have the same number of b's and c's, and as such it is not in L.
In either case, we get that $S_2 \notin L$, which is a contradiction. Namely, our assumption that L is context-free is false.
16.4 Closure properties
16.4.1 Context-free languages are not closed under intersection
We know that the languages
$$L_1 = \left\{ a^*b^nc^n \;\middle|\; n \geq 0 \right\} \quad \text{and} \quad L_2 = \left\{ a^nb^nc^* \;\middle|\; n \geq 0 \right\}$$
are both context-free. However, their intersection $L_1 \cap L_2 = \left\{ a^nb^nc^n \;\middle|\; n \geq 0 \right\}$ is not context-free by Lemma 16.3.1. We conclude that the intersection of two context-free languages is not necessarily context-free.
16.4.2 Context-free languages are not closed under complement
Consider the language $L_3 = \left\{ a^ib^jc^k \;\middle|\; i \neq j \text{ or } j \neq k \right\}$. The language $L_3$ is clearly context-free (why?). Consider its complement language $\overline{L_3}$. If $L_3$ is context-free, and context-free languages are closed under complement (which we are assuming here for the sake of contradiction), then $\overline{L_3}$ is context-free.
Now, $\overline{L_3}$ contains many strings we are not interested in (for example, cccccba). So, let us intersect it with the regular language $a^*b^*c^*$. We proved (in a previous lecture) that the intersection of a context-free language and a regular language is still context-free. As such,
$$\widehat{L} = \overline{L_3} \cap a^*b^*c^*$$
is context-free. But this language contains exactly the strings $a^ib^jc^k$ where i = j and j = k. That is, it is the language of Lemma 16.3.1, which we know is not context-free. A contradiction.
Here is an alternative argument. Observe that
$$L = a^nb^nc^* \cap a^*b^nc^n = \overline{\;\overline{a^nb^nc^*} \cup \overline{a^*b^nc^n}\;}. \tag{16.1}$$
We assume, for the sake of contradiction, that context-free languages are closed under complement, and we already know that they are closed under union. However, the languages
$$a^nb^nc^* \quad \text{and} \quad a^*b^nc^n$$
are context-free. As such, by closure properties and Eq. (16.1), the language L is context-free, which is a contradiction to Lemma 16.3.1.
Chapter 17
This lecture covers the construction for converting a PDA to an equivalent CFG (Section 2.2 of Sipser).
We also cover the Chomsky Normal Form for context-free grammars and an example of grammar-based
induction.
If it looks like this lecture is too long, we can push the grammar-based induction part into lecture 15.
$$L_i \to xL_j.$$
We also add, for any accepting state $q_i$, the rule $L_i \to \epsilon$. (Note that x can be ε. Thus, an ε-transition moving from $q_i$ to $q_j$ is translated into the rule $L_i \to L_j$.)
As a concrete example, consider the NFA on the right. We introduce a CFG with the variables $L_1, L_2, L_3$, where $L_1$ is the initial symbol. We have the following rules:
$$L_1 \to aL_2 \mid bL_3$$
$$L_2 \to bL_1 \mid bL_2 \mid aL_3$$
$$L_3 \to bL_2 \mid cL_1 \mid aL_3 \mid bL_3 \mid \epsilon.$$
[Figure: a three-state NFA over {a, b, c}, with initial state $q_1$ and accepting state $q_3$.]
Interestingly, the state $q_3$ is an accept state, and as such we add the rule $L_3 \to \epsilon$ to the rules.
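Here is a minimal Python sketch of this NFA-to-grammar translation, run on the example above. The dictionary encoding of δ (mapping a state and a character, with "" for ε, to a set of successors) is our own choice.

    # Translate an NFA into a right-linear grammar: L_q -> c L_q' and L_q -> eps.
    def nfa_to_cfg(delta, accepting):
        rules = []
        for (q, c), succs in delta.items():
            for qp in succs:
                rules.append((f"L{q}", c + f"L{qp}"))   # L_q -> c L_q' (c may be "")
        for q in accepting:
            rules.append((f"L{q}", ""))                 # L_q -> epsilon
        return rules

    # The example NFA from the text:
    delta = {(1, "a"): {2}, (1, "b"): {3},
             (2, "b"): {1, 2}, (2, "a"): {3},
             (3, "b"): {2, 3}, (3, "c"): {1}, (3, "a"): {3}}
    rules = nfa_to_cfg(delta, accepting=[3])
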
17.2.1 From PDA to a normalized PDA
Given a PDA N', we would like to convert it into an equivalent PDA N that has the following three properties:
(A) It has a single accept state.
(B) It empties its stack before accepting.
(C) Each transition either pushes a symbol onto the stack or pops one off it, but not both.
Transforming a given PDA into an equivalent PDA with these properties might seem like a tall order initially, but in fact it can be easily done with some care.
(C) Each transition either pushes a symbol onto the stack or pops one off it, but not both.
One bad case for us is a transition that both pushes and pops from the stack. For example, say we have a transition
$$q_i \xrightarrow{\;x,\, b \to c\;} q_j.$$
(Read the character x from the input, pop b from the stack, and push c instead of it.) To remove such transitions, we will introduce a special state $q'_{temp}$, and introduce two transitions: one doing the pop, the other doing the push. Formally, for the above transition, we will introduce the transitions
$$q_i \xrightarrow{\;x,\, b \to \epsilon\;} q'_{temp} \qquad \text{and} \qquad q'_{temp} \xrightarrow{\;\epsilon,\, \epsilon \to c\;} q_j.$$
Similarly, if we have a transition that neither pushes nor pops anything, we replace it with a sequence
of two transitions that push and then immediately pop some newly-created dummy stack symbol.
At the end of this normalization process, we end up with an equivalent PDA N that complies with our requirements.
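A sketch of this splitting step, assuming an ad-hoc five-tuple encoding of PDA transitions (with "" standing for ε):

    # Split every transition that both pops and pushes into two transitions.
    def split_push_pop(transitions):
        out, fresh = [], 0
        for (q, x, pop, push, qj) in transitions:
            if pop and push:                      # the bad case: pops and pushes
                fresh += 1
                temp = f"q_temp_{fresh}"          # new intermediate state
                out.append((q, x, pop, "", temp))     # first do the pop
                out.append((temp, "", "", push, qj))  # then do the push
            else:
                out.append((q, x, pop, push, qj))
        return out
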
Stack is empty in the middle. If during this execution, the stack ever becomes empty at some inter-
mediate state r, then a word of Lp,q can be formed by concatenating a word of Lp,r (that got N from state
p into state r with an empty stack), and a word of Lr,q (that got N from r to q).
Stack never empty in the middle. The other possibility is that the stack is never empty in the middle of the execution as N transits from p to q, for the input $w \in L_{p,q}$. But then, it must be that the first transition (from p into, say, $p_1$) was a push, and the last transition into q (from, say, $q_1$) was a pop. Furthermore, this pop transition popped exactly the character pushed onto the stack by the first transition (from p to $p_1$). Thus, if the PDA read the character x (from the input) as it moved from p to $p_1$, and read the character y (from the input) as it moved from $q_1$ to q, then
$$w = xw'y,$$
where w' is an input that causes the PDA N to start from $p_1$ with an empty stack, and end up in $q_1$ with an empty stack. Namely, $w' \in L_{p_1,q_1}$.
Formally, if there is a push transition (pushing z onto the stack) from p to $p_1$ (reading x), and a pop transition from $q_1$ to q (popping the same z from the stack and reading y), then a word in $L_{p,q}$ can be constructed from the expression
$$x\, L_{p_1,q_1}\, y.$$
Notice that x and/or y could be ε, if one of the two transitions did not read anything from the input.
The construction
We now explicitly state the construction. First, for every state p, we introduce the rule
$$S_{p,p} \to \epsilon.$$
The case that the stack is empty in the middle of transitioning from p to q is captured by introducing, for any states p, q, r of N, the following rule in our CFG:
$$S_{p,q} \to S_{p,r} S_{r,q}.$$
As for the other case, where the stack is never empty in the middle: for any given states $p, p_1, q_1, q$ of N, such that there is a push transition from p to $p_1$ and a pop transition from $q_1$ to q (that push and pop the same letter), we introduce an appropriate rule. Formally, for any $p, p_1, q_1, q$, if there are transitions in N of the form
$$\underbrace{p \xrightarrow{\;x,\, \epsilon \to z\;} p_1}_{\text{push } z} \qquad \text{and} \qquad \underbrace{q_1 \xrightarrow{\;y,\, z \to \epsilon\;} q}_{\text{pop } z},$$
then we introduce the rule
$$S_{p,q} \to x\, S_{p_1,q_1}\, y. \tag{17.1}$$
Remark 17.2.1 At the start of our construction, we got rid of all the transitions that do not touch the stack at all. Another option would have been to handle them with a variation of our second type of context-free rule. That is, say we have a transition from a state p to $p_1$ that does not touch the stack (and reads the character x from the input). A small extension of the above construction would give us, for every state q, the rule
$$S_{p,q} \to x\, S_{p_1,q}.$$
17.2.3 Proof of correctness
Here, we prove that the language generated by $S_{q_{init},q_{acc}}$ is the language recognized by the PDA N.

Claim 17.2.2 If the string w can be generated by $S_{p,q}$, then there is an execution of N starting at p (with an empty stack) and ending at q (with an empty stack) that reads w.

Proof: The proof is by induction on the number n of steps used in the derivation generating w from $S_{p,q}$.
For the base of the induction, consider n = 1. The only rules in C whose right-hand side contains no variables are of the form
$$S_{p,p} \to \epsilon,$$
which implies the claim trivially (w = ε, and the empty execution starts and ends at p).
Thus, consider the case where n > 1, and assume that we have proved that any word generated by at most n derivation steps (in the CFG grammar C) can be realized by an execution of the PDA N. We would like to prove the inductive step for n + 1. So, assume that w is generated from $S_{p,q}$ using n + 1 derivation steps. There are two possibilities for the first derivation rule used. The first possibility is that we used the rule
$$\underbrace{S_{p,q}}_{w} \to \underbrace{S_{p,r}}_{w_1} \underbrace{S_{r,q}}_{w_2},$$
where $w = w_1w_2$. As such, $w_1$ is generated from $S_{p,r}$ in at most (n + 1) − 1 = n steps, and $w_2$ is generated from $S_{r,q}$ in at most (n + 1) − 1 = n steps. As such, by induction, there is an execution of N starting at p and ending at r (with an empty stack at the beginning and the end) and, similarly, there is an execution of N starting at r and ending at q (with an empty stack at the beginning and the end). By performing these two executions one after the other, we end up with an execution starting at p and ending at q, with an empty stack on both ends, such that the PDA N reads the input w during this execution. This establishes the claim in this case.
The other possibility is that w was derived by first applying a rule of the form
$$S_{p,q} \to x\, S_{p_1,q_1}\, y,$$
see Eq. (17.1). Namely, $w = xw'y$, and there are a push transition from p to $p_1$ and a matching pop transition from $q_1$ to q that generated this rule. Furthermore, by induction, the word w' was generated from $S_{p_1,q_1}$ using n derivation steps, and as such there exists a compliant execution X from $p_1$ to $q_1$ reading w'. Thus, if we start at p, apply the first transition of Eq. (17.1), then perform the execution X, and then apply the second transition of Eq. (17.1), we end up with a compliant execution of N that starts at p, ends at q (with an empty stack on both ends), and reads the string w, which establishes the claim in this case.
Claim 17.2.3 If there is an execution of N (with an empty stack at both ends) starting at a state p and ending at a state q, that reads the string w, then w can be generated by $S_{p,q}$.

Proof: The proof is somewhat similar to the previous proof. Consider the execution for w, and assume that it takes n steps. We will prove the claim by induction on n.
For n = 0, the execution is empty, and starts at p and ends at q = p. But then, w is ε, and it can be derived from $S_{p,p}$, since the CFG C has the rule $S_{p,p} \to \epsilon$.
Otherwise, for n > 0, assume by induction that we have proved the claim for all executions of length at most n, and consider an execution of length n + 1.
If the first transition in the execution is a push of a character z onto the stack, and z is being popped by the last transition in the execution, then the first and last transitions are of the form
$$\underbrace{p \xrightarrow{\;x,\, \epsilon \to z\;} p_1}_{\text{push } z} \qquad \text{and} \qquad \underbrace{q_1 \xrightarrow{\;y,\, z \to \epsilon\;} q}_{\text{pop } z},$$
respectively, and furthermore $w = xw'y$. As such, we have an execution from $p_1$ to $q_1$, of length (n + 1) − 2 ≤ n, that reads w', and by induction, w' can be generated by the symbol $S_{p_1,q_1}$. But then, by construction, the rule $S_{p,q} \to x\, S_{p_1,q_1}\, y$ is in C, and as such $S_{p,q} \overset{*}{\Rightarrow} xw'y = w$.
The other possibility is that the stack becomes empty at some intermediate state r of the execution. Then $w = w_1w_2$, where $w_1$ is read while going from p to r and $w_2$ while going from r to q, both executions with empty stacks at both ends and of length at most n. By induction, $S_{p,r} \overset{*}{\Rightarrow} w_1$ and $S_{r,q} \overset{*}{\Rightarrow} w_2$, and the rule $S_{p,q} \to S_{p,r}S_{r,q}$ implies that $S_{p,q} \overset{*}{\Rightarrow} w$.
Together with the results from earlier lectures, we can conclude the following.
Theorem 17.2.5 A language L is context-free if and only if there is a PDA that recognizes it.
Chapter 18
S → N VP
N → N N
VP → V N
N → students | Jeff | geometry | trains
V → trains
¹In real life, long sentences in news reports often exhibit versions of this problem.
²Draw an n by n + 1 rectangle and fill in the lower half.
³It still takes exponential time to extract all parse trees from the table, but we are usually interested in only one of these trees.
Given a string w of length n, we build a triangular table with n rows and n columns. Conceptually, we write w below the bottom row of the table. The ith column corresponds to the ith word. The cell at the ith column and the jth row (from the bottom) of the table corresponds to the substring of length j starting at the ith word. The following is the table, and the substrings each entry corresponds to:

length 4 | Jeff trains geometry students
length 3 | Jeff trains geometry | trains geometry students
length 2 | Jeff trains | trains geometry | geometry students
length 1 | Jeff | trains | geometry | students
         (columns: first word in substring)
CYK builds a table containing a cell for each substring. The cell for a substring x contains a list of
variables V from which we can derive x (in one or more steps).
[The table: rows are indexed by the substring length (1 to 4), columns by the first word of the substring; initially all cells are empty.]
The bottom row contains the variables that can derive each substring of length 1. This is easy to fill in:

length 1 | N | N,V | N | N
         | Jeff | trains | geometry | students
Now we fill the table row-by-row, moving upwards. To fill in the cell for a 2-word substring x, we look
at the labels in the cells for its two constituent words and see what rules could derive this pair of labels. In
this case, we use the rules N → N N and VP → V N to produce:
length 2 | N | N,VP | N
length 1 | N | N,V | N | N
         | Jeff | trains | geometry | students
For each longer substring x, we have to consider all the ways to divide x into two shorter substrings. For example, suppose x is the substring of length 3 starting with "trains". This can be divided into (a) "trains geometry" plus "students", or (b) "trains" plus "geometry students."
Consider option (a). Looking at the lower rows of the table, “students” has label N. One label for “trains
geometry” is VP , but we don’t have any rule whose righthand side contains VP followed by N. The other
label for “trains geometry” is N. In this case, we find the rule N → N N. So one label for x is N. (That is, x
is one big long compound noun.)
Now consider option (b). Again, we have the possibility that both parts have label N. But we also find
that “trains” could have the label V. We can then apply the rule VP → V N to add the label VP to the cell
for x.
CYK(G, w)
    G = (V, Σ, R, S), Σ ∪ V = {X₁, ..., X_r}, w = w₁w₂...w_n.
begin
    Initialize the 3d array B[1...n, 1...n, 1...r] to FALSE
    for i = 1 to n do
        for (X_j → x) ∈ R do
            if x = w_i then B[i, i, j] ← TRUE.
    for i = 2 to n do                  /* length of span */
        for L = 1 to n − i + 1 do      /* start of span */
            R = L + i − 1              /* current span s = w_L w_{L+1} ... w_R */
            for M = L + 1 to R do      /* partition of span */
                /* x = w_L ... w_{M−1}, y = w_M ... w_R, and s = xy */
                for (X_α → X_β X_γ) ∈ R do
                    /* can we match X_β to x and X_γ to y? */
                    if B[L, M − 1, β] and B[M, R, γ] then
                        B[L, R, α] ← TRUE   /* if so, then X_α can generate s */
    for i = 1 to r do
        if B[1, n, i] then return TRUE
    return FALSE

Figure 18.1: The CYK algorithm.
length 3 |   | N,VP
length 2 | N | N,VP | N
length 1 | N | N,V | N | N
         | Jeff | trains | geometry | students
length 4 | N,S
length 3 | N,S | N,VP
length 2 | N | N,VP | N
length 1 | N | N,V | N | N
         | Jeff | trains | geometry | students
Remember that a string is in the language if it can be derived from the start symbol S. The top cell in
the table contains the variables from which we can derive the entire input string. Since S is in that top cell,
we know that our string is in the language.
By adding some simple annotations to these tables as we fill them in, we can make it easy to read out
an entire parse tree by tracing downwards from the top cell. In this case, the tree:
(S (N Jeff) (VP (V trains) (N (N geometry) (N students))))
We have $O(n^2)$ cells in the table. For each cell, we have to consider n ways to divide its substring into two smaller substrings. So the table-filling procedure takes only $O(n^3)$ time.
Theorem 18.2.1 Let G = (V, Σ, R, S) be a grammar in CNF with r = |Σ| + |V| variables and terminals, and t = |R| rules. Let $w \in \Sigma^*$ be a word of length n. Then, one can compute a parse tree for w using G, if $w \in L(G)$. The running time of the algorithm is $O(n^3t)$.

The result just follows from the CYK algorithm depicted in Figure 18.1. Note that our pseudo-code just decides whether a word can be generated by a grammar; with slight modifications, one can also generate the parse tree.
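For completeness, here is a direct Python rendering of the pseudo-code of Figure 18.1 (using sets of variables per cell instead of the boolean array B), run on the example grammar from this lecture.

    # CYK: table[i][j] holds the variables deriving the span from word i to word j.
    def cyk(binary_rules, terminal_rules, start, words):
        n = len(words)
        table = [[set() for _ in range(n)] for _ in range(n)]
        for i, w in enumerate(words):
            table[i][i] = {A for (A, x) in terminal_rules if x == w}
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span - 1
                for m in range(i + 1, j + 1):        # split into i..m-1 and m..j
                    for (A, B, C) in binary_rules:   # rule A -> B C
                        if B in table[i][m - 1] and C in table[m][j]:
                            table[i][j].add(A)
        return start in table[0][n - 1]

    binary = [("S", "N", "VP"), ("N", "N", "N"), ("VP", "V", "N")]
    terminal = [("N", "students"), ("N", "Jeff"), ("N", "geometry"),
                ("N", "trains"), ("V", "trains")]
    print(cyk(binary, terminal, "S", "Jeff trains geometry students".split()))  # True
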
Chapter 19
where
• for any $m, m' \in M$, $m \neq m'$, we have $Q_m \cap Q_{m'} = \emptyset$ (the sets of states of different modules are disjoint).
Intuitively, we view a recursive automaton as a set of procedures/modules, where the execution starts with the main module, and the automaton processes the word by calling modules recursively.
[Figure: the module main, with states $q_0, q_1, q_2, q_3$: a transition from $q_0$ to $q_1$ reading 0, a call to main from $q_1$ to $q_2$, a transition from $q_2$ to $q_3$ reading 1, and an ε-transition from $q_0$ to $q_3$.]
Why? The recursive automaton consists of a single module, which is also the main module. The module either accepts ε, or reads 0, calls itself, and after returning from the call, reads 1 and reaches a final state (at which point it can return if it was called). In order to accept, we require the run to return from all calls and reach the final state of the module main.
For example, the recursive automaton accepts 01 because of the following execution:
$$q_0 \xrightarrow{\;0\;} q_1 \xrightarrow{\;\text{call main}\;} q_0 \xrightarrow{\;\epsilon\;} q_3 \xrightarrow{\;\text{return}\;} q_2 \xrightarrow{\;1\;} q_3.$$
19.2 CFGs and recursive automata
We will now show that context-free grammars and recursive automata accept precisely the same class of
languages.
S → aSb | aBb
B → c.
Each variable in the CFG corresponds to a language; this language is recursively defined using other
variables. We hence look upon each variable as a module; and define modules that accept words by calling
other modules recursively.
For example, the recursive automaton for the above grammar is:
S: S
q3 q4
a b
q0 q5
a b
q1 B q2
B: c
q6 q7′
Formal construction. Let G = (V, Σ, R, S) be the given context-free grammar. Let
$$D_G = \left( M, S, \left\{ (Q_m, \Sigma \cup M, \delta_m, q_0^m, F_m) \;\middle|\; m \in M \right\} \right),$$
where M = V, and the main module is S. Furthermore, for each $X \in M$, let $D_X = (Q_X, \Sigma \cup M, \delta_X, q_0^X, F_X)$ be an NFA that accepts the (finite, and hence regular) language $L_X = \left\{ w \;\middle|\; (X \to w) \in R \right\}$.
Let us elaborate on the construction of $D_X$. We create two special states $q_{init}^X$ and $q_{final}^X$. Here, $q_{init}^X$ is the initial state of $D_X$ and $q_{final}^X$ is the accepting state of $D_X$. Now, consider a rule $(X \to w) \in R$. We will introduce a path of length |w| in $D_X$ (corresponding to w) leading from $q_{init}^X$ to $q_{final}^X$. Creating this path requires introducing new "dummy" states in the middle of the path, if |w| > 1. The ith transition along this path reads the ith character of w. Naturally, if this ith character is a variable, then this edge corresponds to a recursive call to the corresponding module. As such, if the variable X has k rules in the grammar G, then $D_X$ contains k disjoint paths from $q_{init}^X$ to $q_{final}^X$, corresponding to each such rule. For example, if we have the rule $(X \to \epsilon) \in R$, then we have an ε-transition from $q_{init}^X$ to $q_{final}^X$.
• Regular transitions. For any $m \in M$, $q, q' \in Q_m$, $c \in \Sigma \cup \{\epsilon\}$, if $q' \in \delta_m(q, c)$, then the rule $X_q \to cX_{q'}$ is added to R.
Intuitively, a transition within a module is simulated by generating the letter on the transition and generating a variable that stands for the language generated from the next state.
• Recursive call transitions. For all $m, m' \in M$ and $q, q' \in Q_m$, if $q' \in \delta_m(q, m')$, then the rule $X_q \to X_{q_{init}^{m'}} X_{q'}$ is in R.
Intuitively, if $q' \in \delta_m(q, m')$, then $X_q$ can generate a word of the form xy, where x is accepted using a call to module m' and y is accepted from the state q'.
• Acceptance/return rules. For any $q \in \bigcup_{m \in M} F_m$, we add $X_q \to \epsilon$ to R.
When arriving at a final state, we can stop generating letters and return from the recursive call.
We have a CFG, and it is not too hard to see intuitively that the language generated by this grammar is equal to the language of the recursive automaton. We will not prove this formally here, but we state the result for the sake of completeness.
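A Python sketch of generating these rules, under our own ad-hoc encoding of a recursive automaton (transitions are triples (q, label, q'), where a label is a letter, "" for ε, or ("call", m) for a recursive call):

    # Generate the CFG rules corresponding to a recursive automaton.
    def ra_to_cfg(trans, module_init, final_states):
        rules = []
        for (q, label, qp) in trans:
            if isinstance(label, tuple) and label[0] == "call":
                m = label[1]                      # recursive call transition
                rules.append((f"X{q}", (f"X{module_init[m]}", f"X{qp}")))
            else:                                 # regular transition (label may be "")
                rules.append((f"X{q}", (label, f"X{qp}")))
        for q in final_states:
            rules.append((f"X{q}", ()))           # acceptance/return rule X_q -> eps
        return rules
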
[Figure: a recursive automaton with modules main (states $p_1, \ldots, p_4$), m1 (states $p_5, \ldots, p_8$), and m2 (states $p_9, \ldots, p_{12}$).] The resulting grammar is:
$$X_{p_1} \to X_{p_5}X_{p_2} \mid X_{p_3} \qquad X_{p_2} \to cX_{p_2} \mid \epsilon \qquad X_{p_3} \to aX_{p_3} \mid X_{p_9}X_{p_4} \qquad X_{p_4} \to \epsilon$$
$$X_{p_5} \to aX_{p_6} \mid X_{p_8} \qquad X_{p_6} \to X_{p_5}X_{p_7} \qquad X_{p_7} \to bX_{p_8} \qquad X_{p_8} \to \epsilon$$
$$X_{p_9} \to bX_{p_{10}} \mid X_{p_{12}} \qquad X_{p_{10}} \to X_{p_9}X_{p_{11}} \qquad X_{p_{11}} \to cX_{p_{12}} \qquad X_{p_{12}} \to \epsilon$$
The start variable is $X_{p_1}$.
19.3 More examples
19.3.1 Example 1: RA for the language $a^nb^{2n}$
Let us design a recursive automaton for the language $L = \left\{ a^nb^{2n} \;\middle|\; n \in \mathbb{N} \right\}$. We would like to generate this recursively. How do we generate $a^{n+1}b^{2n+2}$ using a procedure to generate $a^nb^{2n}$? We read a, followed by a call to generate $a^nb^{2n}$, and follow that by generating two b's. The "base case" of this recursion is when n = 0, when we must accept ε. This leads us to the following automaton:
[Figure: the module main: $p_1 \xrightarrow{a} p_2 \xrightarrow{\text{call main}} p_3 \xrightarrow{b} p_4 \xrightarrow{b} p_5$, together with an ε-move from $p_1$ to the final state for the base case.]
19.3.2 Example 2: Palindromes
Thinking recursively, the smallest palindromes are ε, a, b, c, and we can construct a longer palindrome by generating awa, bwb, or cwc, where w is a smaller palindrome. This gives us the following recursive automaton:
[Figure: the module main: from $p_1$, three branches read a, b, or c, call main, and then read the same letter again, all ending at the final state $p_8$; in addition, $p_1$ can move directly to $p_8$ reading a, b, c, or ε (the base cases).]
19.3.3 Example 3: #a = #b
Let us design a recursive automaton for the language L containing all strings $w \in \{a, b\}^*$ that have an equal number of a's and b's.
Let w be a string, of length at least one, with an equal number of a's and b's.
Let w be a string, of length at least one, with equal number of a’s and b’s.
Case 1: w starts with a. As we read longer and longer prefixes of w, we have the number of a’s seen is
more than the number of b’s seen. This situation can continue, but we must reach a place when
the number of a’s seen is precisely the number of b’s seen (at worst at the end of the word). Let us
consider some prefix longer than a where this happens. Then we have that w = aw1 bw2 , where the
number of a’s and b’s in aw1 b is the same, i.e. the number of a’s and b’s in w1 are the same. Hence
the number of a’s and b’s in w2 are also the same.
Case 2: If w starts with b, then by a similar argument as above, w = bw1 aw2 for some (smaller) words w1
and w2 in L.
125
Hence any word w in L of length at least one is of the form $aw_1bw_2$ or $bw_1aw_2$, where $w_1, w_2 \in L$, and they are strictly shorter than w. Also, note that ε is in L. So this gives us the following recursive automaton.
[Figure: the module main: from $p_1$, one branch reads a, calls main, reads b, and calls main again; the other branch reads b, calls main, reads a, and calls main again; both branches end at the final state $p_8$, which $p_1$ can also reach directly on ε.]
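The recursive automaton is really just a nondeterministic recursive program. The following Python sketch mirrors it directly; the memoization is an implementation convenience, not part of the model.

    # w is in L iff w = "" or w = a w1 b w2 or w = b w1 a w2 with w1, w2 in L.
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def in_L(w):
        if w == "":
            return True
        other = "b" if w[0] == "a" else "a"
        # nondeterministically guess the position of the matching letter
        return any(w[i] == other and in_L(w[1:i]) and in_L(w[i + 1:])
                   for i in range(1, len(w)))

    assert in_L("abba") and in_L("") and not in_L("aab")
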
Chapter 20
S → aSbS | bSaS | ε.
The PDA we come up with for this language explicitly thinks of using the stack to do counting: the excess
number of letters of one kind as opposed to the other is stored in the stack. In other words, we are using the
stack as a “counter”. This example shows clearly the difference between PDA-thinking and RA-thinking...
we believe that thinking recursively of the language is a lot more useful and typical in computer science.
Anyway, if we use recursive automata, we probably are not teaching the idea behind using the stack to
count— but the point is “who cares?”
In summary, going away from using an explicit pushdown stack, and relying instead on a recursion idiom
is natural, beautiful, and pleasing!
Finally, I think this model keeps things easy and natural to remember: regular languages are accepted by
programs with finite memory; CFLs are accepted by recursive programs with finite memory; and decidable
languages are accepted by Turing machines with infinite memory.
CFGs to RA and back: The conversion from CFGs to RAs is very simple (and visual!). In contrast, converting CFGs to PDAs is mildly harder and way less visual (we usually define an extension of PDAs that can push words in reverse onto a stack, and have the PDA guess a left-most derivation).
The conversion from RAs to CFGs is a lot easier than from PDAs to CFGs. To convert PDAs to CFGs, one
must convert the PDA to a PDA that accepts with empty stack, and then argue that any run of the PDA
can be “summarized” using what happens between a push and a pop of the same symbol, and build a CFG
that captures this idea. When going from a recursive automaton to a CFG, it is more natural to divide a run
into the part that is read by a module and the suffix read after returning from the module.
Closure properties: Turning to closure properties: given two programs, clearly one can create the au-
tomaton for the union by building a new initial module that calls either of them nondeterministically. Closure
under concatenation and Kleene-* are also easy. We can quickly sketch these closure properties in about
10 minutes (we will do them for CFGs anyway). The intuition behind writing programs will make this very
accessible to students.
Closure under intersection does not work as we cannot take the product of the two recursive automata—
for if one wants to call a module and the other does not, then there is no way to simulate both of them.
An aside: in visibly pushdown languages, the recursive automata synchronize on calls and returns, and
hence the class is closed under intersection. In fact, it’s closed under complement as well, and determinizable.
Turning to closure on intersection with a regular language, this is easy. We just do the product con-
struction... and at a call, guessing the return values of the finite automaton. We will do this using CFGs as
well.
Non-context-freeness: Once we have shown that $\{a^nb^nc^n \mid n \in \mathbb{N}\}$ is not context-free (using the pumping lemma for CFLs), we think it is nice to go back and say that this means there is no finite-memory recursive program to generate $a^nb^nc^n$, which is interesting and perhaps more appreciable.
CFLs are decidable: The negative aspect, that recursive automata cannot obviously be simulated by (deterministic) Turing machines, exists just as it did with pushdown automata. The way out is still to show that membership in CFGs is decidable, using a conversion to CNF or by showing the CYK algorithm. This has little to do with either automaton model.
Applications of CFLs: There are three main applications of CFLs we see: the first is parsing languages
(like programming languages), the second is parsing natural languages, and the third is static program
analysis.
Parsing programming languages and the like is done using an abstraction of pushdown automata, with
a million variants and lookahead distances and what not. The recursive automaton definition does yield a
pushdown automaton definition which we will talk about, and hence the students will be able to understand
these algorithms. However, they will be a tad unfamiliar with them.
In parsing natural languages, Julia provided us with papers where they build automata for grammars
almost exactly the way we do. In particular, there are many models with NFAs calling each other. And they
argue this is a more natural representation of a machine for a grammar (and often use it to characterize
languages that admit a faster membership problem).
Turning to program analysis, statically analyzing a program (for data-flows, control-flows, etc.) often in-
volves abstracting the variables into a finite set of data-facts and looking at the recursive program accurately.
The natural model one obtains (for flow-sensitive context-sensitive analysis) is a recursive automaton! In
fact, the recursive automata we have here are motivated by the class of “recursive state machines” studied
in the verification literature and used to model programs with finite memory.
Foreseeable objections:
model anyway. They would not have seen a formal conversion from PDAs to CFGs, which, arguably, they would not remember even if they had seen it.
Instead, now, they would know that CFGs are simply languages accepted by recursive programs with
finite memory. Which is a natural characterization worthy of knowledge.
• PDAs being far away from CFGs is a good thing.
We agree; showing that a model further away from CFGs is equivalent to CFGs is a more interesting result. However, the argument for teaching PDAs holds only if it is a natural or useful model in its own right. There is very little evidence to claim PDAs are better than recursive automata.
• Reasoning with an explicit stack data structure is mental weightlifting.
We don’t think the course is well-served by introducing weight-lifting exercises. The students, frankly,
lift too much weight already, given their level of maturity. Let’s teach them beautiful and natural
things; not geeky, useless, hard things.
• Don’t the upper level classes needs PDAs?
We have asked most people (Elsa, Sam, Margaret, etc.). The unanimous opinion we hear is that while
seeing some machine model for CFLs is good, pushdown automata are not crucial.
• We must carefully consider making changes to such a fundamental topic:
We think we have paid careful attention, and by seeking comments, we would have covered most
opinions. The problem is that this is not a small change. It’s a big change that can happen only if we
do it now... if we postpone it, it will never happen!
Chapter 21
The Electric Monk was a labor-saving device, like a dishwasher or a video recorder. Dishwashers washed tedious
dishes for you, thus saving you the bother of washing them yourself, video recorders watched tedious television
for you, thus saving you the bother of looking at it yourself; Electric Monks believed things for you, thus saving
you what was becoming an increasingly onerous task, that of believing all the things the world expected you to
believe.
– Dirk Gently’s Holistic Detective Agency, Douglas Adams.
21.1 Computability
For the alphabet
$$\Sigma = \{0, 1, \ldots, 9, +, =\},$$
consider the language
$$L = \left\{ a_na_{n-1}\ldots a_0 + b_mb_{m-1}\ldots b_0 = c_rc_{r-1}\ldots c_0 \;\middle|\; a_i, b_j, c_k \in [0, 9] \text{ and } \langle a_n\ldots a_0\rangle + \langle b_m\ldots b_0\rangle = \langle c_r\ldots c_0\rangle \right\},$$
where $\langle a_na_{n-1}\ldots a_0\rangle = \sum_{i=0}^{n} a_i \cdot 10^i$ is the number represented in base ten by the string $a_na_{n-1}\ldots a_0$. We are interested in the question of whether or not a given string belongs to this language. This is an example of a decision problem (where the output is either yes or no), which is easy in this specific case, but clearly too hard for a PDA to solve.¹
Usually, we are interested in algorithms that compute something from their input and output the result. For example, given the strings $a_na_{n-1}\ldots a_0$ and $b_mb_{m-1}\ldots b_0$ (i.e., two numbers), we want to compute the string representing their sum.
Here is another example of such a computational problem: given a quadratic equation $ax^2 + bx + c = 0$, we would like to find the roots of this equation. Namely, two numbers $r_1, r_2$ such that $ax^2 + bx + c = a(x - r_1)(x - r_2) = 0$. Thus, given numbers a, b and c, the algorithm should output the numbers $r_1$ and $r_2$.
To see how subtle this innocent question can be, consider the question of computing the roots of a
polynomial of degree 5. That is, given an equation
$$x^5 + a_4x^4 + a_3x^3 + a_2x^2 + a_1x + a_0 = 0,$$
can we compute the values of x for which this equation holds? Interestingly, if we limit our algorithm to using only the standard operators on numbers $+, -, *, /, \sqrt{\;}, \sqrt[k]{\;}$, then no such algorithm exists.²
In the final part of this course, we will look at the question of what (formally) is a computation? Or, in
other words, what is (what we consider to be) a computer or an algorithm? A precise model for computation
will allow us to prove that computers can solve certain problems but not others.
21.1.1 History
Early in the twentieth century, mathematicians (e.g., David Hilbert) thought that it might be possible to build formal algorithms that could decide whether any mathematical statement was true or false. For obvious reasons, there was great interest in whether this could really be done. In particular, Hilbert took upon himself the project of trying to formalize the mathematics known at the time. Gödel showed in 1931 that this project (of explicitly and completely describing all of mathematics) is hopeless: there is no finite, complete and consistent axiomatization of mathematics.
In 1936, Alonzo Church and Alan Turing independently showed that this goal was impossible. In his
paper, Alan Turing introduced the Turing machine (described below). Alonzo Church introduced the λ-
calculus, which formed the starting point for the development of a number of functional programming
languages and also formal models of meaning in natural languages. Since then, these two models and some
others (e.g. recursion theory) have been shown to be equivalent.
This has led to the Church-Turing Hypothesis: any reasonable model of computation is equivalent in power to a Turing machine.
This is not something you could actually prove is true (what is "reasonable" in the above statement, for example?). It could be proved false if someone found another model of computation that could solve more problems than a Turing machine, but no one has done this yet. Notice that we are ignoring how fast the computation can be done: it is certainly possible to improve on the speed of a Turing machine (in fact, every Turing machine can be sped up by making it more complicated). We are only interested in what problems the machines can or cannot solve.
Both types of machines read their input left-to-right. They halt exactly when the input is exhausted. Turing
machines are like a RA/PDA, in that they have a finite control and an unbounded one dimensional memory
tape (i.e., stack). However, a Turing machine is different in the following ways.
(A) The input is delivered on the memory tape (not in a separate stream).
(B) The machine head can move freely back and forth, reading and writing on the tape in any pattern.
(C) The machine halts immediately when it enters an accept or reject state.
Notice condition (C) in particular. A Turing machine can read through its input several times, or it might
halt without reading the whole input (e.g. the language of all strings that start with ab can be recognized
by just reading two letters).
²This is the main result of Évariste Galois, who died at the age of 20(!) in a duel. Niels Henrik Abel (who also died relatively young) proved this slightly before Galois, but Galois' work led to a more general theory.
Figure 21.1: Comic by Geoff Draper.
Moving back and forth along the tape allows a Turing machine to (somewhat slowly) simulate random
access to memory. Surprisingly, this very simple machine can simulate all the features of “regular” computers.
Here equivalent is meant only in the sense that whatever a regular computer can compute, so can a Turing
machine compute. Of course, Turing machines do not have graphics/sound cards, internet connection and
they are generally considered to be an inferior platform for computer games. Nevertheless, computationally,
TMs can compute whatever a “regular” computer can compute.
[Figure: a tape containing the word shalom followed by blanks; the arrow marks the read/write head, positioned on the first cell.]
Each step of the Turing machine first reads the symbol on the cell of the tape under the head. Depending on this symbol and the current state of the controller, it then writes a new symbol to this cell, moves the head left or right, and moves the controller to a new state.
For example, the following transition is taken if the controller is in state q and the symbol under the read
head is b. It replaces the b with the character c and then moves right, switching the controller to the state
r.
$$q \xrightarrow{\;b \to c,\ R\;} r$$
Note, that Turing machines are deterministic. That is, once you know the state of the controller and
which symbol is under the read/write head, there is exactly one choice for what the machine can (and must)
do.
The controller has two special states qacc and qrej . When the machine enters one of these states, it halts.
It either accepts or rejects, depending on which of the two it entered.
Note 21.2.1 If the Turing machine is at the start of the tape and tries to move left, it simply stays put on
the start position. This is not the only reasonable way to handle this case.
Note 21.2.2 Nothing guarantees that a Turing machine will eventually halt (i.e., stop). Like your favorite
Java program, it can get stuck in an infinite loop3 . This will have important consequences later, when we
show that deciding if a program halts or not is in fact a task that computers can not solve.
Remark 21.2.3 Some authors define Turing machines to have a doubly-infinite tape. This does not change
what the Turing machine can compute. There are many small variations on Turing machines which do not
change the power of the machine. Later, we will see a few sample variations and how to prove they are
equivalent to our basic model. The robustness of this model to minor changes in features is yet another
reason computer scientists believe the Church-Turing hypothesis.
Consider the language $L = \left\{ w\$w \;\middle|\; w \in \{a, b\}^* \right\}$, which is not context-free. So, let us describe a TM that accepts this language.
One algorithm for recognizing L works as follows:
1. Cross off the first character a or b in the input (i.e., replace it with x, where x is some special character) and remember what it was (by encoding the character in the current state). Let u denote this character.
2. Move right until we see a $.
3. Read across any x's.
4. Read the character (not x) on the tape. If this character is different from u, then immediately reject.
5. Cross off this character, replacing it by x.
6. Move left past the $, and then keep going until we see an x on the tape.
7. Move one position right and go back to the first step.
We repeat this until the first step cannot find any more a's and b's to cross off.
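Here is a direct Python simulation of this marking algorithm, on a list standing in for the tape (the representation is ours, and it assumes the language $\{w\$w\}$ as reconstructed above):

    # Simulate the crossing-off algorithm; "x" is the crossing-off mark.
    def accepts(s):
        tape = list(s)
        if tape.count("$") != 1:
            return False
        sep = tape.index("$")
        i, j = 0, sep + 1                 # next uncrossed character on each side
        while i < sep:                    # step 1: cross off on the left...
            u, tape[i] = tape[i], "x"
            if j >= len(tape) or tape[j] != u:   # steps 2-4: match on the right
                return False
            tape[j] = "x"                 # step 5: cross it off
            i, j = i + 1, j + 1
        return j == len(tape)             # everything after $ was matched

    assert accepts("ab$ab") and not accepts("ab$ba") and accepts("$")
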
Figure 21.2 depicts the resulting TM. Observe that, for the sake of simplicity of exposition, we did not include the state $q_{rej}$ in the diagram. In particular, all missing transitions in the diagram are transitions that go into the reject state.
Note 21.2.4 For most algorithms, the Turing machine code is complicated and tedious to write out explicitly. In particular, it is not reasonable to write it out as a state diagram or a transition function. This only works for relatively simple examples, like the ones shown here. In particular, it is important to be able to describe a TM at a high level in pseudo-code, and yet be able to translate it into the nitty-gritty details if necessary.
3 Or just get stuck inside of Mobile with the Memphis blues again...
[Figure 21.2: the state diagram of the TM for $\{w\$w \mid w \in \{a,b\}^*\}$, with states $q_0, \ldots, q_7$ and $q_{acc}$; the reject state and the transitions into it are omitted.]
• $q_{acc} \in Q$ is the accepting/final state.
• $q_{rej} \in Q$ is the rejecting state.
This definition assumes that we have already defined a special blank character. In Sipser, the blank is written ⊔ or ␣. A popular alternative is B. (If you use any other symbol for blank, you should write a note explaining what it is.)
The special blank character (i.e., ␣) is in the tape alphabet, but it is not in the input alphabet.
Example
For the TM of Figure 21.2, we have the following M = (Q, Σ, Γ, δ, q0 , qacc , qrej ), where
(i) Q = {q0 , q1 , q2 , q3 , q4 , q5 , q6 , q7 , qacc , qrej }.
Chapter 22
This lecture covers the formal definition of a Turing machine and related concepts such as configuration
and Turing decidable. It surveys a range of variant forms of Turing machines and shows for one of them
(multi-tape) why it is equivalent to the basic model.
Example 22.1.1 Here we describe a TM that takes its input on the tape, shifts it to the right by one character, and puts a $ in the leftmost position on the tape.
So, let Σ = {a, b} (but the machine we describe would work for any alphabet). Let
$$Q = \{q_0, q_{acc}, q_{rej}\} \cup \left\{ q_c \;\middle|\; c \in \Sigma \right\}.$$
The transition function is
$$\forall s \in \Sigma \quad \delta(q_0, s) = (q_s, \$, R)$$
$$\forall s, t \in \Sigma \quad \delta(q_s, t) = (q_t, s, R)$$
$$\forall s \in \Sigma \quad \delta(q_s, ␣) = (q_{acc}, s, R)$$
$$\delta(q_0, ␣) = (q_{acc}, \$, R).$$
Figure 22.1: A TM that shifts its input right by one position, and inserts $ at the beginning of the tape. [State diagram with states $q_0$, $q_a$, $q_b$, $q_{acc}$, $q_{rej}$ omitted.]
The resulting machine is depicted in Figure 22.1, and here is its pseudo-code:
Shift_Tape_Right
At first tape position,
remember character and write $
At later positions,
remember character on tape,
and write previously remembered character.
On blank, write remembered character and halt accepting.
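To make the shifting step concrete, here is a minimal Python sketch of the same idea, with the tape modeled as a list of characters (the single remembered character plays the role of the state memory qc):

BLANK = " "

def shift_tape_right(tape):
    remembered = "$"                  # the symbol written into the first cell
    i = 0
    while remembered != BLANK:
        tape.append(BLANK)            # the tape is padded with blanks
        tape[i], remembered = remembered, tape[i]
        i += 1
    return tape[:i] + [BLANK]         # trim the extra padding

print(shift_tape_right(list("ab")))   # ['$', 'a', 'b', ' ']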
Consider a tape whose content is α b β (followed by blanks ␣ ␣ ␣ . . .), where the read/write head is located on the cell containing b,
and the current control state of the TM is qi. In this case, it would be convenient to write the TM configuration as
α qi b β.
Namely, imagine that the head is just to the left of the cell it is reading/writing, and bβ is the string to the right of the head.
As such, the start configuration with a word w is q0 w; that is, the tape contains w followed by blanks, and the read/write head is located on the first cell of the tape.
We can now describe a transition of the TM using this configuration notation. Indeed, imagine the given
TM is in a configuration αqi aβ and its transition is
δ(qi , a) = (qj , c, R) ,
then the resulting configuration is αcqj β. We will write the resulting transition as
αqi aβ ⇒ αcqj β.
Similarly, consider a configuration of the form
γ d qk e τ,
where γ and τ are two strings, and d, e ∈ Γ. Assume the TM transition in this case is
δ(qk, e) = (qm, f, L).
Then
γ d qk e τ ⇒ γ qm d f τ.
Denoting the first configuration by c and the second by c′, we say that c yields c′, and we use the notation c ↦ c′.
As we have seen before, the two ends of the tape are special, as follows:
• You cannot move off the tape on the left side. If the head is instructed to move to the left, it just stays where it is.
• The tape is padded on the right side with blanks (i.e., ␣). Namely, you can think of the tape as initially being full of blanks (spaced out?), except for the input that is written at the beginning of the tape.
Definition 22.3.3 A TM that halts on all inputs is called a decider.
As such, a language L is Turing decidable if there is a decider TM M , such that L(M ) = L.
on the usual tape. Clearly, the doubly infinite tape now becomes the usual one-sided infinite tape, and we can easily simulate the original machine on this new machine. Indeed, as long as we are far from the folding point on the tape, all we need to do is move in jumps of two (i.e., a move L is mapped into the move LL). Now, if we reach the beginning of the tape, we need to switch between the odd locations and the even locations, but that is also easy to do with a bit of care. We omit the easy but tedious details.
Another approach would be to keep the working part of the doubly-infinite tape in its original order. When the machine tries to move off the left-hand end, push everything to the right to make more space.
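As a quick sanity check of the folding idea, here is a small Python sketch of the index mapping (the specific interleaving used here is one possible choice, not the only one):

def fold(i):
    # cell i of the doubly-infinite tape is stored at this index of the
    # one-sided tape: even indices hold the right half, odd the left half
    return 2 * i if i >= 0 else -2 * i - 1

for i in [-3, -2, -1, 0, 1, 2, 3]:
    print(i, "->", fold(i))   # -3 -> 5, -2 -> 3, -1 -> 1, 0 -> 0, 1 -> 2, ...

A move by one cell on the original tape becomes a move by two cells on the folded tape, except near the folding point fold(0), where the simulation switches between the even and odd halves.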
22.4.3 Non-determinism
This does not buy you anything, but the details are not trivial, and we will delay the discussion of this issue
to later.
22.4.4 Multi-tape
Consider a TM N that has k tapes, where k > 1 is some finite integer constant. Here each tape has its own read/write head, but there is only one finite control. The transition function of this machine is a function
δ : Q × Γ^k → Q × Γ^k × {L, R, S}^k,
and the initial input is placed on the first tape.
To simulate N on a single-tape TM M, we store the contents of all k tapes on the single tape of M, one after the other, separated by $’s. The string between the ith and (i + 1)th $ in this string is going to be the content of the ith tape. We also need to keep track, for each of these tapes, of where its head is supposed to be. To this end, for each character a ∈ Γ we create a dotted version ȧ, and the character under each virtual head is kept dotted. Thus, if the initial input is w = xw′, where x is a character, the newly rewritten tape would look like:
$ẋw′$␣̇$␣̇ . . . $␣̇$
(with k − 1 sections of the form ␣̇, one for each of the initially empty tapes). This way, we can keep track of the head location in each one of the tapes.
For each move of N, we go back on M to the beginning of the tape and scan the tape from left to right, reading all the dotted characters and storing them (encoding them in the current state). Once we have done that, we know which transition of N needs to be executed:
q⟨c1 ,...,ck⟩ → q′⟨d1 ,D1 ,d2 ,D2 ,...,dk ,Dk⟩ ,
where Di ∈ {L, R, S} is the instruction for where the ith head must move. To implement this transition, we scan the tape from left to right (first moving the head to the start of the tape), and when we encounter the ith dotted character ċi, we replace it by (the undotted) di, and we move the virtual head as instructed by Di, by rewriting the relevant character (immediately near the head location) by its dotted version. After doing that, we continue the scan to the right, to perform the operation for the remaining i + 1, . . . , k tapes.
After completing this process, we might have a dotted $̇ on the tape (i.e., the relevant head is located at the end of the space allocated to its tape). We use the Shift_Tape_Right algorithm described above to create space to the left of such a dotted dollar, and write in the newly created spot a dotted space. Thus, if the tape locally looked like
. . . a b $̇ c . . .
then after the shifting right and dotting the space, the new tape would look like
. . . a b ␣̇ $ c . . .
By doing this shift-right operation to all the dotted $’s, we end up with a new tape that is guaranteed to have enough space if we decide to write new characters to any of the k tapes of N.
It is now easy to verify that we can simulate N on this Turing machine M, which uses a single tape. In particular, any language that N recognizes is also recognized by M, which is a standard TM, establishing the claim.
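Here is a small Python sketch of the tape encoding used in this simulation; the dotting is represented, purely for illustration, by appending a '.' to the character under each virtual head:

def encode(tapes, heads):
    # tapes: list of k strings; heads: list of k head positions
    parts = []
    for content, h in zip(tapes, heads):
        cells = list(content) or [" "]
        cells[h] = cells[h] + "."      # dot the character under the head
        parts.append("".join(cells))
    return "$" + "$".join(parts) + "$"

print(encode(["abb", " "], [1, 0]))    # $ab.b$ .$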
Chapter 23
For the alphabet
Σ = {0, 1, . . . , 9, +, −} ,
consider the language
L = { an an−1 . . . a0 + bm bm−1 . . . b0 = cr cr−1 . . . c0 | ai, bj, ck ∈ [0, 9] and ⟨an an−1 . . . a0⟩ + ⟨bm bm−1 . . . b0⟩ = ⟨cr cr−1 . . . c0⟩ },
where ⟨an an−1 . . . a0⟩ = ∑_{i=0}^{n} ai · 10^i is the number represented in base ten by the string an an−1 . . . a0.
We then ask whether we can build a TM which decides the language L.
Reversing a tape
Given the content of tape 1, we can reverse it easily in two steps using a temporary tape. First, we put a marker onto the temporary tape. Moving the heads on both tapes to the right, we copy the contents of tape 1 onto the temporary tape. We then rewind the tape 1 head to the start of its tape, but the temporary tape head remains at the end of this tape. We copy the material back onto tape 1, moving the temporary head left (until it reaches the marker) while the tape 1 head moves right, which writes the string in reverse.
Now, let us assemble the addition algorithm. We will use five tapes: the input (tape 1), three tapes to hold numbers (tapes 2, 3, and 4), and a scratch tape used for the reversal operation.
The TM will first scan the input tape (i.e., tape 1), copying the digits before the + onto tape 2 and the digits between the + and the = onto tape 3. We then reverse tapes 2 and 3, so that their least-significant digits come first. Next, we rewind the heads of tapes 2 and 3 to the beginning of the tapes, and we start moving them together, computing the sum of the digits under the two heads (together with the carry, which is remembered in the state), writing the output to tape 4.
If one of the heads of tapes 2 or 3 reaches the end of the tape, then we continue moving it, interpreting ␣ as a 0. We halt when the heads on both tapes see ␣.
Next, we move the head of tape 4 back to the beginning of the tape, and do ReverseTape(4). Finally, we compare the content of tape 4 with the number written on tape 1 after the = character. If they are equal, the TM accepts, otherwise it rejects.
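The digit-by-digit phase of this algorithm is ordinary long addition performed least-significant-digit first, which is exactly why tapes 2 and 3 are reversed beforehand. A short Python rendition of that phase (with the carry, which the TM keeps in its state, held in a variable):

def add_reversed(a, b):
    # a, b: digit strings with the least-significant digit first
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        da = int(a[i]) if i < len(a) else 0    # running off the end reads 0
        db = int(b[i]) if i < len(b) else 0
        carry, digit = divmod(da + db + carry, 10)
        out.append(str(digit))
    if carry:
        out.append(str(carry))
    return "".join(out)

# 473 + 98: reverse the inputs, add, and reverse the output back.
print(add_reversed("374", "89")[::-1])         # 571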
Graph encoding
Figure 23.1: A graph encoded as text. The string encoding the graph is in fact “5⟨NL⟩7⟨NL⟩(1,2)⟨NL⟩(2,3)⟨NL⟩(3,5)⟨NL⟩(5,1)⟨NL⟩(3,4)⟨NL⟩(4,3)⟨NL⟩(4,2)”. Here ⟨NL⟩ denotes the special new-line character.
We are given a directed graph G = (V, E), and two vertices s, t ∈ V , and we would like to decide if there
is a way to reach t from s.
All sorts of encodings are possible. But it is easiest to understand if we use encodings that look like
standard ASCII file, of the sort you might use as input to your Java or C++ program. ASCII files look like
they are two-dimensional, but remember that they are actually one-dimensional strings inside the computer. Line breaks display in a special way, but underneath they are just a special separator character (<NL> on a unix system), very similar to the $ or # that we’ve used to subdivide items in our string examples.
To make things easy, we will number the vertices of V from 1 to n = |V |. To specify that there is an edge
between two vertices u and v, we then specify the two indices of u and v. We will use the notation (u, v).
Thus, to specify a graph as a text file, we could use the following format, where n is the number of vertices
and m is the number of edges in the graph.
n
m
(n1, n′1)
(n2, n′2)
...
(nm, n′m)
Namely, the first line of the file contains the number of vertices n (written explicitly in ASCII), and the second line is the number of edges of G (i.e., m). Then, every following line specifies one edge of the graph, by giving the two numbers that are its endpoint vertices. As a concrete example, consider the following graph.
The number of edges is a bit redundant, because we could just stop reading at the end of the file. But
it is convenient for algorithm design.
See Figure 23.1 for an example of a graph and its encoding using this scheme.
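Since the encoding is just an ASCII string, parsing it is routine. A Python sketch of a parser for this format (the helper name is made up):

def parse_graph(text):
    lines = text.strip().split("\n")           # the <NL>-separated records
    n, m = int(lines[0]), int(lines[1])
    edges = []
    for line in lines[2:2 + m]:
        u, v = line.strip("()").split(",")
        edges.append((int(u), int(v)))
    return n, edges

enc = "5\n7\n(1,2)\n(2,3)\n(3,5)\n(5,1)\n(3,4)\n(4,3)\n(4,2)"
print(parse_graph(enc))   # (5, [(1, 2), (2, 3), (3, 5), (5, 1), (3, 4), (4, 3), (4, 2)])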
To solve this problem, we will need to search the graph, starting with node s. The TM accepts iff this
search finds the node t. We will store information on four TM tapes, in addition to the input tape. The TM
would have the following tapes:
Tape 1: the input.
Tape 2: the target node t.
Tape 3: the edge list.
Tape 4: the done list: the list of nodes that we’ve finished processing.
Tape 5: the to-do list: the list of nodes whose outgoing edges have not been followed.
Given the graph, the TM reads the graph (checking that the input is in the right format). It puts the list of edges onto tape 3, puts t onto its own tape (i.e., tape 2), and puts the node s onto the to-do list tape (i.e., tape 5).
Next, the TM loops. In each iteration, it removes the first node x from the to-do list. If x = t, the TM halts and accepts. Otherwise, x is added to the done list (i.e., tape 4). Then the TM searches the edge list for all edges going outwards from x. Suppose an outgoing edge goes from x to y. Then, if y is not already on the done list or the to-do list, y is added to the to-do list.
If there is nothing left on the to-do list, the TM halts and rejects.
This algorithm is a graph search algorithm. It is breadth-first search if the new nodes are added at the end of the to-do list, and depth-first search if they are added at the start of the list. (Or, said another way, the to-do list operates as either a queue or a stack.)
The separate done list is necessary to prevent the algorithm from going into an infinite loop if the graph contains cycles.
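A Python sketch of this search, with the to-do list as a queue (so it behaves as BFS; popping from the end instead would give DFS):

from collections import deque

def reachable(edges, s, t):
    done, todo = set(), deque([s])     # tapes 4 and 5, respectively
    while todo:
        x = todo.popleft()             # remove the first node from the to-do list
        if x == t:
            return True                # halt and accept
        done.add(x)
        for (u, v) in edges:           # scan the edge list (tape 3)
            if u == x and v not in done and v not in todo:
                todo.append(v)
    return False                       # to-do list is empty: halt and reject

edges = [(1, 2), (2, 3), (3, 5), (5, 1), (3, 4), (4, 3), (4, 2)]
print(reachable(edges, 1, 4), reachable(edges, 1, 6))   # True False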
Emptiness for DFAs. Consider the language EDFA = { ⟨D⟩ | D is a DFA and L(D) = ∅ }. This language is decidable. Namely, given an instance ⟨D⟩, there is a TM that reads ⟨D⟩, always stops, and accepts if and only if L(D) is empty. Indeed, do a graph search on the DFA (as above) starting at the start state of D, and check whether any of the final states is reachable. If so, then L(D) ≠ ∅ and the TM rejects; otherwise it accepts.
Emptiness for NFAs. The language ENFA = { ⟨D⟩ | D is an NFA and L(D) = ∅ } is also decidable. Indeed, convert the given NFA into a DFA (as done in class, a long time ago) and then call the code for EDFA on the encoded DFA. Notice that the first step in this algorithm takes the encoded version of D and writes the encoding for the corresponding DFA. You can imagine this as taking a state diagram as input and producing a new state diagram as output.
Equal languages for DFAs. Consider the language
EQDFA = { ⟨D, C⟩ | D and C are DFAs, and L(D) = L(C) }.
This language is also decidable. Remember that the symmetric difference of two sets X and Y is X ⊕ Y = (X ∩ Y̅) ∪ (Y ∩ X̅). The set X ⊕ Y is empty if and only if the two sets are equal. But, given a DFA, we know how to make a DFA recognizing the complement of its language. And we also know how to take two DFA’s and make a DFA recognizing the union or intersection of their languages.
So, given the encodings for D and C, our TM will construct the encoding of a DFA ⟨B⟩ recognizing the symmetric difference of their languages. Then it would call the code for deciding if ⟨B⟩ ∈ EDFA.
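A Python sketch of this decision procedure. Instead of explicitly constructing the DFA B, it searches the product automaton of D and C for a reachable pair of states on which exactly one of the two DFAs accepts; such a pair exists if and only if the symmetric difference of the languages is nonempty. (The tuple format for DFAs here is made up for illustration.)

def equivalent(D, C):
    (deltaD, startD, finD), (deltaC, startC, finC) = D, C
    alphabet = set(a for (_, a) in deltaD)       # the common alphabet
    seen, stack = set(), [(startD, startC)]
    while stack:
        p, q = stack.pop()
        if (p, q) in seen:
            continue
        seen.add((p, q))
        if (p in finD) != (q in finC):           # a word separating D and C
            return False
        for a in alphabet:
            stack.append((deltaD[(p, a)], deltaC[(q, a)]))
    return True                                  # symmetric difference is empty

# Two DFAs over {0,1} for "even number of 1s"; C has a redundant state.
D = ({("e", "0"): "e", ("e", "1"): "o", ("o", "0"): "o", ("o", "1"): "e"}, "e", {"e"})
C = ({(0, "0"): 1, (0, "1"): 2, (1, "0"): 0, (1, "1"): 2, (2, "0"): 2, (2, "1"): 0},
     0, {0, 1})
print(equivalent(D, C))                          # True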
Informally, problems involving regular languages are always decidable, because they are so easy to manip-
ulate. Problems involving context-free languages are sometimes decidable. And only the simplest problems
involving Turing machines are decidable.
As before, the notation ⟨D, w⟩ is the encoding of the DFA D and the word w; that is, it is the pair ⟨D⟩ and ⟨w⟩. For example, if ⟨w⟩ is just w (it’s already a string), then ⟨D, w⟩ might be ⟨D⟩#w where # is some separator character. Or it might be (⟨D⟩, w). Or anything similar that encodes the input well. We will just assume that it is in some such reasonable encoding of a pair and that the low-level code for our TM (which we will not spell out in detail) knows what it is.
A Turing machine deciding ADFA needs to be able to take the code for some arbitrary DFA, plus some
arbitrary string, and decide if that DFA accepts that string. So it will need to contain a general-purpose
DFA simulator. This is called the acceptance problem for DFA’s.
It’s useful to contrast this with a similar-sounding claim. If D is any DFA, then L(D) is Turing-decidable.
Indeed, to build a TM that accepts L(D), we simply move the TM head to the right over the input, using the
TM’s controller to simulate the controller of the DFA directly.
In this case, we are given a specific fixed DFA D and we only need to cook up a TM that recognizes strings
from this one particular language. This is much easier than ADFA .
To decide ADFA, our TM will use five tapes:
Tape 1: the input ⟨D, w⟩,
Tape 2: the current state,
Tape 3: the final states,
Tape 4: the transition triples,
Tape 5: the input string.
The TM works as follows.
(1) Check the format of the input. Copy the start state to tape 2. Copy the final states and transition triples of the input machine ⟨D⟩ to tapes 3 and 4.
(2) Copy the input string w to tape 5, and move all the heads back to the beginnings of their tapes.
(3) If the tape 5 head sees ␣ (the input string is exhausted), halt: accept if the state written on tape 2 appears on tape 3, and reject otherwise. Otherwise, find the transition triple (p, c, q) on tape 4 in which p matches the current state (written on tape 2) and c matches the character under the tape 5 head.
(4) Change the current state of the simulated DFA from p to q. Specifically, copy the state q (written in the triple we just found on tape 4) to tape 2.
(5) Move the tape 5 head to the right (i.e., the simulation handled this input character).
(6) Goto step (3).
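The heart of this machine is just a DFA simulator, which is a few lines in ordinary code. A Python sketch, with tape 2 as the variable state, tape 3 as finals, tape 4 as the transition dictionary, and tape 5 as the input string:

def a_dfa(delta, start, finals, w):
    state = start                     # step (1): copy the start state
    for c in w:                       # steps (3)-(6): the main loop
        state = delta[(state, c)]     # find the matching transition triple
    return state in finals            # step (3)'s halting check

delta = {("e", "0"): "e", ("e", "1"): "o", ("o", "0"): "o", ("o", "1"): "e"}
print(a_dfa(delta, "e", {"e"}, "1011"))   # False: odd number of 1s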
Chapter 24
This lecture presents more examples of languages that are Turing decidable, from Sipser Section 4.1.
(B) EQDFA: the language of all pairs of DFAs that have the same language.
(C) ADFA = { ⟨D, w⟩ | D is a DFA, w is a word, and D accepts w }. Here ⟨D, w⟩ is in the language if and only if the DFA D accepts w.
(D) ANFA = { ⟨D, w⟩ | D is an NFA accepting w }.
(E) EQDFA = { ⟨D, C⟩ | D, C are DFA’s and L(D) = L(C) }.
(F) Aregex = { ⟨R, w⟩ | R is a regular expression generating w }. To decide this language, the TM can convert R into a DFA D, and then check if ⟨D, w⟩ ∈ ADFA.
24.2.1 Context-free languages are TM decidable
Given a PDA P, we are interested in the question of whether we can build a TM decider that accepts L(P). Observe that we can turn P into an equivalent CFG, and this CFG can be turned into an equivalent CNF grammar G. With G it is now easy to decide if an input word w is in L(G). Indeed, we can either use the CYK algorithm to decide if a word is in the grammar, or, alternatively, enumerate all possible parse trees for the given CNF grammar that might generate the given word w. That is, if n = |w|, then we need to generate all possible parse trees with 2n − 1 internal nodes (since this is the size of a parse tree deriving such a word in CNF), and see if any of them generates w. In either case, we have the following.
Lemma 24.2.1 Given a PDA P, there is a TM T which is a decider, and L(P) = L(T). Namely, for every PDA there exists an equivalent TM.
Proof: We build a TM TCFG for ACFG. The input for it is the pair ⟨G, w⟩. As a first step, we convert G to be in CNF (we saw the algorithm for doing this in detail in class). Let G′ denote the resulting grammar. Next, we use CYK to decide if w ∈ L(G′). If it is, the TM TCFG accepts, otherwise it rejects.
Given a TM decider TCFG for ACFG, building a TM decider whose language is equal to a specific given G is easy. Specifically, given G, we would like to build a TM decider T0 such that L(T0) = L(G).
So, modify the given TM to encode G. As a first step, the new TM T0 would write G on the input tape (next to the input word w). Next, it would run the TM TCFG on this given input to decide if ⟨G, w⟩ ∈ ACFG.
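For completeness, here is a compact Python sketch of CYK, assuming the CNF grammar is given as two rule lists, units (A → a) and pairs (A → BC); this representation is made up for illustration:

def cyk(units, pairs, start, w):
    n = len(w)
    if n == 0:
        return False      # the empty word is handled separately in CNF
    # table[i][j] = set of variables deriving w[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, c in enumerate(w):
        table[i][0] = {A for (A, a) in units if a == c}
    for length in range(2, n + 1):             # substring length
        for i in range(n - length + 1):        # substring start
            for split in range(1, length):     # split point
                for (A, B, C) in pairs:
                    if B in table[i][split - 1] and \
                       C in table[i + split][length - split - 1]:
                        table[i][length - 1].add(A)
    return start in table[0][n - 1]

# A CNF grammar for { a^n b^n | n >= 1 }:  S -> AT | AB, T -> SB, A -> a, B -> b.
units = [("A", "a"), ("B", "b")]
pairs = [("S", "A", "T"), ("S", "A", "B"), ("T", "S", "B")]
print(cyk(units, pairs, "S", "aabb"), cyk(units, pairs, "S", "abab"))   # True False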
Remark 24.2.3 (Encoding instances inside a TM.) The above demonstrates that given a more general algorithm, we can use it to solve the problem for specific instances. This is done by encoding the given specific instance into the constructed TM.
If you have trouble imagining encoding the whole CFG into the TM, as done above, think about storing a short string like UIUC in the TM state diagram, to be written out on (say) tape 2. The first state transition in the TM would write U onto tape 2, the next three transitions would write I, then U, then C. Finally, it would move the tape 2 head back to the beginning and transition into the first state that does the actual computation.
Note that doing this encoding of a specific instance inside the TM does not necessarily yield the most efficient TM for the problem. For example, in the above, we could first convert the given instance into CNF before encoding it into the TM.
We could also hard-code the string w into our TM but leave the grammar as a variable input. We omit
the proof of the following easy lemma.
Lemma 24.2.4 Let w be a specific string. The language ACFG,w = { ⟨G⟩ | G is a CFG and G generates w } is decidable.
Emptiness for CFGs. By a similar marking argument, one can also decide whether a given CFG G generates any string at all.
Proof: We already saw this argument in the conversion algorithm of a CFG into CNF (it was one of the initial steps of this conversion). We briefly re-sketch the algorithm.
To this end, the TM marks all the variables in G that can generate (in one step) a string of terminals (or ε, of course). We will refer to such a variable as being useful. Now, the TM iterates repeatedly over the rules of G. For a rule X → w, where w is a string of terminals and variables, the variable X is useful if all the variables of w are useful, and in such a case we mark X as useful. The loop halts when the TM has made a full pass through the rules of G without marking anything new as useful.
This TM accepts the input grammar if the initial variable of G is useful, and otherwise it rejects.
In every iteration over all the rules of G, the TM must designate at least one new variable as useful in order to repeat this process again. It follows that the number of outer iterations performed by this algorithm is bounded by the number of variables in the grammar G, implying that this algorithm always terminates.
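The marking loop translates almost verbatim into ordinary code. A Python sketch, where a grammar is a list of rules (X, body) — a made-up representation for illustration — and terminals are simply the symbols that never appear on a left-hand side:

def generates_something(rules, start):
    lhs = {X for (X, _) in rules}
    useful, changed = set(), True
    while changed:                              # the outer iterations
        changed = False
        for (X, body) in rules:
            if X not in useful and all(s in useful or s not in lhs
                                       for s in body):
                useful.add(X)                   # mark X as useful
                changed = True
    return start in useful

rules = [("S", ["A", "B"]), ("A", ["a"]), ("B", ["B", "b"])]
print(generates_something(rules, "S"))          # False: B never terminates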
• Numbers & arithmetic: We already saw in a previous lecture how some basic integer operations can be handled. It is not too hard to extend these to negative integers and to perform all required numerical operations if we allow a TM with multiple tapes. As such, we can assume that we can implement any standard numerical operation.
Of course, one can also do floating-point operations on a TM. The details are overwhelming but they are quite doable. In fact, until about 20 years ago, many computers implemented floating-point operations using integer arithmetic. Hardware implementation of floating-point operations became mainstream when Intel introduced the i486 in 1989, which had an FPU (floating-point unit). You will probably see how floating-point arithmetic works in computer architecture courses.
• Stored constant strings: The program we are trying to translate into a TM might have strings and constants in it. For example, it might check if the input contains the (all-important) string UIUC. As we saw above, we can encode such strings in the states. Initially, on power-up, the TM starts by writing out such strings onto a special tape that we use for this purpose.
• Random-access memory: We will use an associative memory. Here, consider each memory cell as having a unique label to identify it (i.e., its address), and content. Thus, if cell 17 contains the value abc, we will consider it as storing the pair (17, abc). We can store the memory on a tape as a list of such pairs. Thus, the tape might look like:
(17, abc)(1, samuel) . . .
Here, address 17 stores the string abc, address 1 stores the string samuel, and so on.
Reading the value of address x from the tape is easy. Suppose x is written on tape i, and we would like to find the value associated with x on the memory tape and write it onto tape j. To do this, the TM scans the memory tape mem (i.e., the tape we use to simulate the associative memory) from the beginning, until the TM encounters a pair in mem having x as its first element. It then copies the second part of the pair to the output tape j.
Storing a new value (x, y) in memory is almost as easy. If a pair having x as its first element exists, you delete it (by writing a special cross-out character over it), and then you write the new pair (x, y) at the end of the tape mem.
If you wanted to use memory more efficiently, the new value could be written into the original location,
whenever the original location had enough room. You could also write new pairs into crossed-out
regions, if they have enough room. Implementations of C malloc/free and Java garbage collection use
slightly more sophisticated versions of these ideas. However, TM designers rarely care about efficiency.
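A Python sketch of this associative memory, with the tape as a list of pairs and None as the cross-out mark:

CROSSED = None

def mem_read(tape, x):
    for pair in tape:                     # scan left to right
        if pair is not CROSSED and pair[0] == x:
            return pair[1]
    return None                           # address x was never written

def mem_write(tape, x, y):
    for i, pair in enumerate(tape):
        if pair is not CROSSED and pair[0] == x:
            tape[i] = CROSSED             # cross out the old pair
    tape.append((x, y))                   # write the new pair at the end

tape = []
mem_write(tape, 17, "abc"); mem_write(tape, 1, "samuel")
mem_write(tape, 17, "xyz")
print(mem_read(tape, 17))                 # xyz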
• Subroutine calls: To simulate a real program, we need to be able to do calls (and recursive calls).
The standard way to implement such things is by having a stack. It is clear how to implement a stack
on its own TM tape.
We need to store three pieces of information for each procedure call:
(i) private working space,
(ii) the return value,
(iii) and the name of the state to return to after the call is done.
The private working space needs to be implemented with a stack, because a set of nested procedure
calls might be active all at once, including several recursive calls to the same procedure.
The return value can be handled by just putting it onto a designated register tape, say tape 24.
Right before we give control over to a procedure, we need to store the name of the state it should
return to when it is done. This allows us to call a single fixed piece of code from several different places
in our TM. Again, these return points need to be put on a stack, to handle nested procedure calls.
After it returns from a procedure, the TM reads the state name to return to. A special set of TM
states handle reading a state name and transitioning to the corresponding TM state.
These are just the most essential features for a very simple general-purpose computer. In some computer
architecture class, you will see how to implement fancier program features (e.g. garbage collection, objects)
on top of this simple model.
For example, in a perfect world (which we are not living in, naturally), we would like to give a formal specification of a program (say, a TM that decides if a number is prime), and have another program
2 Things of course are way more complicated in practice, since Java virtual machines nowadays usually compile frequently-run portions of the code to achieve faster performance (i.e., just-in-time compilation [JIT]), but still, you can safely think about a JVM as an interpreter.
that would swallow this description and spit out the program performing this computation (i.e., have a computer that writes our programs for us).
A more realistic example is a compiler, which translates (say) Java code into assembly code. It takes code as input and produces code in a different language as output. We could also build an optimizer that reads Java code and produces new code, also in Java but more efficient. Or a cheating-helper program that reads Java code and writes out a new version with different variable names and modified comments.
We emphasize that UTM is not a decider. Namely, it stops only if T accepts w, but it might run forever if T does not accept w.
To simplify our discussion, we assume that T is a single-tape machine with some fixed alphabet (say ΣT = {0, 1} and the tape alphabet ΓT = {0, 1, ␣}). To simplify the discussion further, the TM for ATM is going to be a multi-tape machine. Naturally, one can convert this TM into a single-tape TM.
So, the input for UTM is an encoding ⟨T, w⟩. As a first step, the UTM would verify that the input is in the right format (such a reasonable encoding for a TM was given as an exercise in the homework). The UTM would copy the different components of the input onto different tapes:
Tape 1: the transition function δ of T. It is going to be a sequence (separated by $) of transitions. A transition (q, c) → (q′, t, L) would be encoded as a string of the form
(#q, c) − (#q′, t, L),
where #q is the index of the state q (in T) and #q′ is the index of q′. More specifically, you can think about the states of T as being numbered between 1 and m, and #q is just the binary representation of the index of the state q.
Tape 2: #q0 – the initial state of T.
Tape 3: #qacc – the accept state of T.
Tape 4: #qrej – the reject state of T.
Tape 5: $w – the input tape to be handled.
Once done copying the input, the UTM would move the head of tape 5 to the beginning of the tape. It then performs the following loop:
(I) Loop:
(i) Scan tape 1 to find the transition matching the state on tape 2 and the character under the head of tape 5.
(ii) Update the state on tape 2.
(iii) Update the character and the head position on tape 5.
We repeat this until the state on tape 2 is equal to the state written on either tape 3 (qacc) or tape 4 (qrej).
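The loop above is an ordinary interpreter loop. A Python sketch of it, with delta playing the role of tape 1, state of tape 2, and (cells, head) of tape 5:

def utm(delta, q0, q_acc, q_rej, w):
    state, cells, head = q0, list(w) or [" "], 0
    while state not in (q_acc, q_rej):
        # (i) find the transition matching the state and the scanned character
        state, c, move = delta[(state, cells[head])]
        cells[head] = c                       # (iii) update the character...
        head = max(head + (1 if move == "R" else -1), 0)  # ...and the head
        if head == len(cells):
            cells.append(" ")                 # pad with blanks on the right
    return state == q_acc

# The shift-right machine from Chapter 22, over {a, b}:
delta = {("q0", "a"): ("qa", "$", "R"), ("q0", "b"): ("qb", "$", "R"),
         ("q0", " "): ("acc", "$", "R"),
         ("qa", "a"): ("qa", "a", "R"), ("qa", "b"): ("qb", "a", "R"),
         ("qa", " "): ("acc", "a", "R"),
         ("qb", "a"): ("qa", "b", "R"), ("qb", "b"): ("qb", "b", "R"),
         ("qb", " "): ("acc", "b", "R")}
print(utm(delta, "q0", "acc", "rej", "ab"))   # True; the tape now holds $ab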
Chapter 25
‘There must be some mistake,’ he said, ‘are you not a greater computer than the Milliard Gargantubrain at
Maximegalon which can count all the atoms in a star in a millisecond?’
‘The Milliard Gargantubrain?’ said Deep Thought with unconcealed contempt. ‘A mere abacus - mention it not.’
– The Hitch Hiker’s Guide to the Galaxy, by Douglas Adams.
In this lecture we will discuss the halting problem and diagonalization. This covers most of Sipser
section 4.2.
“We will play a game to decide which way you will die,” said the man. “You may say one thing,
and one thing only. If what you say is true, I will strangle you with my bare hands. If what you
say is false, I will cut off your head.”
After some soul-searching, Lief replies “My head will be cut off.” At this point, there’s no way for the
giant to make good on his threat, so the spell he’s under melts away, he changes back to his original bird
form, and Lief gets to cross the bridge.
The key problem for the giant is that, if he strangles Lief, then Lief’s statement will have been false.
But he said he would strangle him only if his statement was true. So that does not work. And cutting off
his head does not work any better. So the giant’s algorithm sounded good, but it turned out not to work
properly for certain inputs.
A key property of this paradox is that the input (Lief’s reply) duplicates material used in the algorithm.
We’ve fed part of the algorithm back into itself.
We saw in the previous lecture that one can build a universal Turing machine UTM that can simulate any Turing machine on any input. As such, using UTM, we have the following TM recognizing ATM:
Recognize-ATM(⟨M, w⟩)
    Simulate M on w using UTM until it halts
    if M halts and accepts then
        accept
    else
        reject
25.2.1 Implications
So, let us suppose that the halting problem (i.e., deciding if a word is in ATM) were decidable. Namely, that there is an algorithm that solves it (for any input). This seems somewhat hard to believe, since even humans can not solve this problem (and we still live under the delusion that we are smarter than computers).
If we could decide the Halting problem, then we could build compilers that would automatically prevent
programs from going into infinite loops and other very useful debugging tools. We could also solve a variety
of hard mathematical problems. For example, consider the following program.
Percolate(n)
    for p ≤ q < n do
        if p is prime and q is prime, and p + q = n then
            return
    halt   // no pair found: n is a counterexample
Main:
    n ← 4
    while true do
        Percolate(n)
        n ← n + 2
Does this program stop? We do not know. If it does stop, then the Strong Goldbach conjecture is false.
Conjecture 25.2.1 (Strong Goldbach conjecture.) Every even integer greater than 2 can be written as
a sum of two primes.
This conjecture is still open, and it is considered to be one of the major open problems in mathematics. It was stated in a letter on 7 June 1742, and it is still open. It seems unlikely that a computer program would be able to solve this, and a large number of other mathematical conjectures. If ATM were decidable, then we could write a program that would try to generate all possible proofs of a conjecture and verify each proof. Now, if we can decide whether programs stop, then we can discover whether or not a mathematical conjecture is true, and this seems extremely unlikely (that a computer would be able to solve all problems in mathematics).
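To make the discussion concrete, here is a runnable Python rendition of the Percolate program (with an artificial bound so that it terminates here; the TM version would run unboundedly):

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def percolate(n):
    # is n a sum of two primes p + q with p <= q?
    return any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))

n = 4
while n < 10 ** 4:                      # the real program: while true
    if not percolate(n):
        print(n, "violates Goldbach")   # never reached (so far!)
        break
    n += 2
else:
    print("no counterexample below", n)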
I hope that this informal argument convinces you that it is extremely unlikely that ATM is TM decidable. Fortunately, we can prove this fact formally.
Theorem 25.4.1 (The halting theorem.) The language ATM is not TM decidable.
Proof: Assume ATM is TM decidable, and let Halt be this TM deciding ATM. That is, Halt is a TM that always halts, and works as follows:
Halt(⟨M, w⟩) = accept if M accepts w, and reject if M does not accept w.
We will now build a new TM Flipper, such that on the input ⟨M⟩, it runs Halt on the input ⟨M, M⟩. If Halt(⟨M, M⟩) accepts then Flipper rejects, and if Halt(⟨M, M⟩) rejects then Flipper accepts. Formally:
Flipper(⟨M⟩)
    res ← Halt(⟨M, M⟩)
    if res is accept then
        reject
    else
        accept
The key observation is that Flipper always stops. Indeed, it uses Halt as a subroutine, and Halt, by our assumption, always halts. In particular, we have the following:
Flipper(⟨M⟩) = reject if M accepts ⟨M⟩, and accept if M does not accept ⟨M⟩.
Flipper is a TM (duh!), and as such it has an encoding ⟨Flipper⟩. Now, consider running Flipper on itself. We get the following:
Flipper(⟨Flipper⟩) = reject if Flipper accepts ⟨Flipper⟩, and accept if Flipper does not accept ⟨Flipper⟩.
This is absurd. Ridiculous, even! Indeed, if Flipper accepts ⟨Flipper⟩, then it rejects it (by the above definition), which is impossible. And if Flipper rejects ⟨Flipper⟩ (note that Flipper always stops!), then by the above definition it must accept ⟨Flipper⟩, which is also impossible.
Thus, it must be that our assumption that Halt exists is false. We conclude that ATM is not TM decidable.
Theorem 25.5.1 There is no C program that reads a C program P and an input w, and decides if P “accepts” w.
The proof of the above theorem is identical to that of the halting theorem – we just perform our rewriting on the C program.
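In modern terms, the rewriting is a short program-manipulating program. Here is a Python rendition of the same self-reference trick, assuming (for contradiction) a function accepts(prog, inp) that always halts and decides acceptance; all names here are hypothetical:

def accepts(prog, inp):
    # the assumed decider for the acceptance problem;
    # the point of the theorem is that it cannot exist
    raise NotImplementedError

import sys

def flipper(prog_src):
    # do the opposite of what prog_src does when fed its own source
    if accepts(prog_src, prog_src):
        print("reject"); sys.exit(1)
    else:
        print("accept"); sys.exit(0)

# Running flipper on flipper's own source file yields the contradiction:
# flipper(open("flipper.py").read())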
Also, notice that being able to recognize a language and its complement implies that the language is
decidable, as the following theorem testifies.
Theorem 25.5.2 A language is TM decidable iff it is TM recognizable and its complement is also TM
recognizable.
Proof: It is obvious that decidability implies that the language and its complement are recognizable. To prove the other direction, assume that L and its complement L̅ are both recognizable. Let M and N be Turing machines recognizing them, respectively. Then we can build a decider for L by running M and N in parallel.
Specifically, suppose that w is the string input to M. Simulate both M and N using UTM, but single-step the simulations.
the simulations. Advance each simulation by one step, alternating between the two simulations. Halt when
either of the simulations halts, returning the appropriate answer.
If w is in L, then the simulation of M must eventually halt. If w is not in L, then the simulation of N
must eventually halt. So our combined simulation must eventually halt and, therefore, it is a decider for L.
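A Python sketch of this parallel simulation, with each recognizer modeled as a generator that yields True when (and if) it accepts; a run that halts without accepting simply ends. These conventions are made up for illustration:

def decide(M, N, w):
    # M recognizes L, N recognizes the complement of L
    runs = {"M": M(w), "N": N(w)}       # single-step both simulations
    while True:
        for name in list(runs):
            try:
                if next(runs[name]) is True:
                    return name == "M"  # M accepted: w is in L
            except StopIteration:
                del runs[name]          # halted without accepting

def even_len(w):                        # toy recognizer: |w| is even
    yield len(w) % 2 == 0

def odd_len(w):                         # recognizes the complement
    yield len(w) % 2 == 1

print(decide(even_len, odd_len, "ab"))  # True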
Chapter 26
Meta definition: Problem A reduces to problem B if a solution to B implies a solution to A. Namely, if we can solve B then we can solve A. We will denote this by A =⇒ B.
An oracle ORAC for a language L is a function that receives as input a word w, and returns true if and only if w ∈ L. An oracle can be thought of as a black box that can solve membership in a language without requiring us to consider the question of whether L is computable or not. Alternatively, you can think of an oracle as a provided library function that computes whatever it is required to compute, and always returns (i.e., it never goes into an infinite loop).
Intuitively, a TM decider for a language L is the ultimate oracle. Not only can it decide if a word is in L, but furthermore, it can be implemented as a TM that always stops.
In the context of showing languages are undecidable, the following more specific definition would be
useful.
Definition 26.1.1 A language X reduces to a language Y , if one can construct a TM decider for X using
a given oracle ORACY for Y .
We will denote this fact by X =⇒ Y .
In particular, if X reduces to Y, then given a decider for the language Y (i.e., an oracle for Y), there is a program that can decide X. So Y must be at least as “hard” as X. In particular, if X is undecidable, then it must be that Y is also undecidable.
Warning. It is easy to get confused about which of the two problems “reduces” to the other. Do not get
hung up on this. Instead, concentrate on getting the right outline for your proofs (proving them in the right
direction, of course).
Reduction proof technique. Formally, consider a problem B that we would like to prove is undecidable. We will prove this via reduction; that is, a proof by contradiction, similar in outline to the ones we have seen for regular and context-free languages. You assume that your new language L (i.e., the language of B) is decided by some TM M. Then you use M as a component to create a decider for some language known to be undecidable (typically ATM). This would imply that we have a decider for A (i.e., ATM). But this is a contradiction, since A (i.e., ATM) is not decidable. As such, we must have been wrong in assuming that L was decidable.
We will concentrate on using reductions to show that problems are undecidable. However, the technique
is actually very general. Similar methods can be used to show problems to be not TM recognizable. We have
used similar proofs to show languages to be not regular or not context-free. And reductions will be used in
CS 473 to show that certain problems are “NP complete”, i.e. these problems (probably) require exponential
time to solve.
Lemma 26.1.2 Let X and Y be two languages such that X reduces to Y. If Y is TM decidable, then X is also TM decidable.
Proof: Let T be the TM decider for Y. Since X reduces to Y, it follows that there is a procedure TX|Y (i.e., a TM decider) for X that uses an oracle for Y as a subroutine. We replace the calls to this oracle in TX|Y by calls to T. The resulting TM TX is a TM decider and its language is X. Thus X is TM decidable.
Lemma 26.1.3 Let X and Y be two languages, and assume that X =⇒ Y . If X is TM undecidable then
Y is TM undecidable.
26.2 Halting
We remind the reader that ATM is the language
ATM = { ⟨M, w⟩ | M is a TM and M accepts w }.
This is the problem that we showed (last class) to be undecidable (via diagonalization). Right now, it is
the only problem we officially know to be undecidable.
Consider the following slight modification, consisting of all the pairs ⟨M, w⟩ such that M halts on w. Formally,
AHalt = { ⟨M, w⟩ | M is a TM and M stops on w }.
Intuitively, this is very similar to ATM . The big obstacle to building a decider for ATM was deciding
whether a simulation would ever halt or not.
To show formally that AHalt is undecidable, we show that we can use an oracle for AHalt to build a decider for ATM. This construction looks like the following.
Lemma 26.2.1 The language ATM reduces to AHalt . Namely, given an oracle for AHalt one can build a
decider (that uses this oracle) for ATM .
Proof: Let ORACHalt be the given oracle for AHalt. We build the following decider for ATM.
Decider-ATM(⟨M, w⟩)
    res ← ORACHalt(⟨M, w⟩)
    // if M does not halt on w then reject.
    if res = reject then
        halt and reject.
    res2 ← result of simulating M on w using UTM
           // this simulation terminates, since M halts on w
    return res2.
Clearly, this procedure always returns, and as such it is a decider for ATM.
We will usually be less formal in our presentation. We will just show that a TM decider for AHalt implies that we can build a decider for ATM. This would imply that AHalt is undecidable.
Thus, given a black box (i.e., a decider) TMHalt that can decide membership in AHalt, we build a decider for ATM exactly as in the procedure Decider-ATM described above.
This would imply that if AHalt is decidable, then we can decide ATM , which is of course impossible.
26.3 Emptiness
Now, consider the language
ETM = { ⟨M⟩ | M is a TM and L(M) = ∅ }.
Again, we assume that we have a decider for ETM . Let us call it TMETM . We need to use the component
TMETM to build a decider for ATM .
A decider for ATM is given M and w and must decide whether M accepts w. We need to restructure this
question into a question about some Turing machine having an empty language. Notice that the decider for
ETM takes only one input: a Turing machine. So we have to somehow make the second input (w) disappear.
The key trick here is to hard-code w into M , creating a TM Mw which runs M on the fixed string w.
Specifically the code for Mw might look like:
TM Mw :
1. Input = x (which will be ignored)
2. Simulate M on w.
3. If the simulation accepts, accept. If the simulation rejects, reject.
It is important to understand what is going on. The input is ⟨M⟩ and w; namely, a string encoding M and the string w. The above shows that we can write a procedure (i.e., a TM) that accepts these two strings as input, and outputs the string ⟨Mw⟩ which encodes Mw. We will refer to this procedure as EmbedString. The algorithm EmbedString(⟨M, w⟩), as such, is a procedure reading its input, which is just two strings, and outputting a string that encodes the TM Mw.
It is natural to ask what the language of the machine encoded by the string ⟨Mw⟩ is; that is, what is L(Mw)?
Because Mw ignores its input x, the language of Mw is either Σ∗ or ∅. It is Σ∗ if M accepts w, and it is ∅ if M does not accept w.
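In modern terms, EmbedString is a program that writes a program. A toy Python analogue, assuming a hypothetical interpreter run(src, inp); note that we only build the source string of Mw here and never execute it:

def embed_string(M_src, w):
    return ("def M_w(x):                        # the input x is ignored\n"
            f"    return run({M_src!r}, {w!r})  # simulate M on the fixed w\n")

print(embed_string("<code of M>", "uiuc"))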
We are now ready to prove the following theorem.
Theorem 26.3.1 The language ETM is undecidable.
Proof: We assume, for the sake of contradiction, that ETM is decidable, and let TMETM be its decider.
Next, we build our decider AnotherDecider-ATM for ATM , using the EmbedString procedure described
above.
AnotherDecider-ATM(⟨M, w⟩)
    ⟨Mw⟩ ← EmbedString(⟨M, w⟩)
    r ← TMETM(⟨Mw⟩)
    if r = accept then
        reject
    return accept
Observe, that AnotherDecider-ATM never actually runs the code for Mw . It hands the code to a
function TMETM which analyzes what the code would do if we ever did choose to run it. But we never run
it. So it does not matter that Mw might go into an infinite loop.
Also notice that we have two input strings floating around our code: w (one input to the decider for
ATM ) and x (input to Mw ). Be careful to keep track of which strings are input to which functions. Also be
careful about how many inputs, and what types of inputs, each function expects.
26.4 Equality
An easy corollary of the undecidability of ETM is the undecidability of the language
EQTM = { ⟨M, N⟩ | M and N are TM’s and L(M) = L(N) }.
Proof: Suppose that we had a decider DeciderEqual for EQTM. Then we can build a decider for ETM as follows:
TM R:
1. Input = ⟨M⟩
2. Include the (constant) code for a TM T that rejects all its inputs. We denote the string encoding T by ⟨T⟩.
3. Run DeciderEqual on ⟨M, T⟩.
4. If DeciderEqual accepts, then accept.
5. If DeciderEqual rejects, then reject.
Since the decider for ETM (i.e., TMETM) takes one input but the decider for EQTM (i.e., DeciderEqual) requires two inputs, we are tying one of DeciderEqual’s inputs to a constant value (i.e., ⟨T⟩).
There are many Turing machines that reject all their input and could be used as T. Building code for R just requires writing code for one such TM.
26.5 Regularity
It turns out that almost any property defining a TM language induces a language which is undecidable, and
the proofs all have the same basic pattern. Let us do a slightly more complex example and study the outline
in more detail.
Let
RegularTM = { ⟨M⟩ | M is a TM and L(M) is regular }.
Suppose that we have a TM DeciderRegL that decides RegularTM. In this case, doing the reduction from halting would require turning the problem of deciding whether a TM M accepts w (i.e., whether ⟨M, w⟩ ∈ ATM) into a problem about whether some TM accepts a regular set of strings.
Given M and w, consider the following TM M′w:
TM M′w:
(i) Input = x
(ii) If x has the form aⁿbⁿ, halt and accept.
(iii) Otherwise, simulate M on w.
(iv) If the simulation accepts, then accept.
(v) If the simulation rejects, then reject.
Again, we are not going to execute M′w directly ourselves. Rather, we will feed its description ⟨M′w⟩ (which is just a string) into DeciderRegL. Let EmbedRegularString denote the algorithm which accepts as input ⟨M⟩ and w, and outputs ⟨M′w⟩, the encoding of the machine M′w.
If M accepts w, then every input x will eventually be accepted by the machine M′w. Some are accepted right away in step (ii), and some are accepted in step (iv). So if M accepts w, then the language of M′w is Σ∗.
If M does not accept w, then the strings x of the form aⁿbⁿ will be accepted in step (ii) of M′w. However, for all other strings, either step (iii) will never halt or step (v) will reject. So the rest of the strings (those in the set Σ∗ \ { aⁿbⁿ | n ≥ 0 }) will not be accepted. So the language of M′w is { aⁿbⁿ | n ≥ 0 } in this case.
Since { aⁿbⁿ | n ≥ 0 } is not regular, we can use our decider DeciderRegL on ⟨M′w⟩ to distinguish these two cases.
Notice that the test in step (ii) was cooked up specifically to match the capabilities of our given decider DeciderRegL. If DeciderRegL had instead been testing whether our language contained the string “uiuc”, step (ii) would compare x to see if it is equal to “uiuc”. This test can be anything that a TM can compute without the danger of going into an infinite loop.
Specifically, we can build a decider for ATM as follows.
YetAnotherDecider-ATM(⟨M, w⟩)
    ⟨M′w⟩ ← EmbedRegularString(⟨M, w⟩)
    r ← DeciderRegL(⟨M′w⟩)
    return r
— If DeciderRegL accepts, then L(M′w) is regular. So it must be Σ∗. This implies that M accepts w. So YetAnotherDecider-ATM should accept ⟨M, w⟩.
— If DeciderRegL rejects, then L(M′w) is not regular. So it must be { aⁿbⁿ | n ≥ 0 }. This implies that M does not accept w. So YetAnotherDecider-ATM should reject ⟨M, w⟩.
26.6 Windup
Notice that the code in Section 26.5 is almost exactly the same as the code for the ETM example in Section 26.3. The details of Mw and M′w were different. And one example passed on the return values from the inner decider directly, whereas the other example negated them. This similarity is not accidental, as many examples can be done with very similar proofs.
Next class, we will see Rice’s Theorem, which uses this common proof template to show a very general
result. Namely, almost any nontrivial property of a TM’s language is undecidable.
Chapter 27
To do this, we assumed that RegularTM was decided by some TM S. We then used this to build a decider for ATM (which cannot exist), using the following TM M′w:
(i) Input = x
(ii) If x has the form aⁿbⁿ, halt and accept.
(iii) Otherwise, simulate M on w.
(iv) If the simulation accepts, then accept.
(v) If the simulation rejects, then reject.
Consider the language L3 = { ⟨M⟩ | M is a TM and L(M) contains exactly three strings }. That is, L3 contains all Turing machines whose languages contain exactly three strings.
Proof: By reduction from ATM. Assume, for the sake of contradiction, that L3 is decidable, and let deciderL3 be a TM deciding it. We use deciderL3 to construct a Turing machine decider9-ATM deciding ATM. The decider decider9-ATM is constructed as follows:
decider9-ATM(⟨M, w⟩)
    Construct a new Turing machine Mw:
        Mw(x): // x: the input
            res ← Run M on w
            if (res = reject) then
                reject
            if x = UIUC or x = Iowa or x = Michigan then
                accept
            reject
    return deciderL3(⟨Mw⟩)
(We emphasize here again that constructing Mw involves taking the encoding ⟨M⟩ and w, and generating the encoding ⟨Mw⟩.)
Notice that the language of Mw has only two possible values. If M loops on or rejects w, then L(Mw) = ∅. If M accepts w, then the language of Mw contains exactly three strings: “UIUC”, “Iowa”, and “Michigan”.
So decider9-ATM(⟨M, w⟩) accepts exactly when M accepts w. Thus, decider9-ATM is a decider for ATM. But we know that ATM is undecidable. A contradiction. As such, our assumption that L3 is decidable is false.
Theorem 27.2.2 (Rice’s Theorem.) Suppose that L is a language of Turing machines; that is, each word in L encodes a TM. Furthermore, assume that the following two properties hold.
(a) Membership in L depends only on the Turing machine’s language; i.e., if L(M) = L(N) then ⟨M⟩ ∈ L ⇔ ⟨N⟩ ∈ L.
(b) The set L is “non-trivial,” i.e., L ≠ ∅ and L does not contain all Turing machines.
Then L is undecidable.
Proof: Assume, for the sake of contradiction, that L is decided by a TM deciderForL. We will construct a TM Decider4-ATM that decides ATM. Since Decider4-ATM cannot exist, we will have a contradiction, implying that deciderForL does not exist.
Remember from last class that TM∅ is a TM (pick your favorite) which rejects all input strings. Assume, for the time being, that ⟨TM∅⟩ ∉ L. This assumption will be removed shortly.
Since L is non-trivial, we can also choose some other TM Z with ⟨Z⟩ ∈ L. Now, given ⟨M, w⟩, Decider4-ATM will construct the encoding of the following TM Mw.
TM Mw:
(1) Input = x.
(2) Simulate M on w.
(3) If the simulation rejects, halt and reject.
(4) If the simulation accepts, simulate Z on x, and accept if and only if Z halts and accepts.
If M loops on or rejects w, then Mw will get stuck on line (2) or stop at line (3). So L(Mw) is ∅. Because membership in L depends only on a Turing machine’s language and ⟨TM∅⟩ is not in L, this means that ⟨Mw⟩ is not in L. So ⟨Mw⟩ will be rejected by deciderForL.
If M accepts w, then Mw will proceed to line (4), where it simulates the behavior of Z. So L(Mw) will be L(Z). Because membership in L depends only on a Turing machine’s language and ⟨Z⟩ is in L, this means that ⟨Mw⟩ is in L. So ⟨Mw⟩ will be accepted by deciderForL.
As usual, our decider for ATM looks like:
Decider4-ATM(⟨M, w⟩)
    Construct ⟨Mw⟩ from ⟨M, w⟩
    return deciderForL(⟨Mw⟩)
So Decider4-ATM(⟨M, w⟩) will accept ⟨M, w⟩ iff deciderForL accepts ⟨Mw⟩. But we saw above that deciderForL accepts ⟨Mw⟩ iff M accepts w. So Decider4-ATM is a decider for ATM. Since such a decider cannot exist, we must have been wrong in our assumption that there was a decider for L.
Now, let us remove the assumption that TM∅ ∉ L. The above proof showed that L is undecidable, assuming that ⟨TM∅⟩ was not in L. If TM∅ ∈ L, then we run the above proof using the complement language L̅ in place of L (and now ⟨TM∅⟩ ∉ L̅). At the end, we note that L is decidable iff L̅ is decidable.
27.3.2 A decidable behavior property
For example, consider the following set of Turing machines:
LR = { ⟨M⟩ | M never moves left when run on the input UIUC }.
Surprisingly, the language LR is decidable, because never moving left (equivalently: always moving right) destroys the Turing machine’s ability to do random access into its tape. It is effectively turned into a DFA.
Specifically, if a Turing machine M never moves left, it reads through the whole input, and then starts looking at blank tape cells. Once it is on the blank part of the tape, it can cycle through its set of states. But after |Q| moves, it has run out of distinct states and must be in a loop. So, if you watch M for four moves (the length of the string "UIUC") plus |Q| + 1 moves, it has either halted or it is in an infinite loop.
Therefore, to decide LR, you simulate the input Turing machine for |Q| + 5 moves. After that many moves, it has either halted, moved left at some point, or entered a loop in which it keeps moving right forever; in each case we know whether ⟨M⟩ ∈ LR.
This algorithm is a decider (not just a recognizer) for LR, because it definitely halts on any input Turing machine M.
This language Lx is undecidable. The reason is that a Turing machine with this restriction (no writing
x’s) can simulate a Turing machine without the restriction.
Proof: Suppose that Lx were decidable. Let R be a Turing machine deciding Lx . We will now construct
a Turing machine S that decides ATM .
S is constructed as follows:
• Input is ⟨M, w⟩, where M is the code for a Turing Machine and w is a string.
Appendix - more examples of
undecidable languages
• Input is ⟨M, w⟩, where M is the code for a Turing Machine and w is a string.
• Construct code for a new Turing machine Mw as follows:
– Input is a string x.
– Ignore the value of x.
– Simulate M on w.
• Feed ⟨Mw⟩ to R. If R accepts, then accept. If R rejects, then reject.
If M accepts w, the language of Mw contains all strings and, thus, in particular the empty string. If M does not accept w, the language of Mw is the empty set and, thus, does not contain the empty string. So R(⟨Mw⟩) accepts exactly when M accepts w. Thus, S decides ATM.
But we know that ATM is undecidable. So S can not exist. Therefore we have a contradiction. So Halt_Empty_TM must have been undecidable.
– every use of the character 1 is replaced by a new character 1′ which M does not use.
– when M would accept, M′ first prints 111 and then accepts.
• Similarly, create a string w′ in which every character 1 has been replaced by 1′.
• Create a second new Turing machine M′w which simulates M′ on the hard-coded string w′.
Chapter 28
This lecture covers dovetailing, a method for running a gradually expanding set of simulations in parallel.
We use it to demonstrate that non-deterministic TMs can be simulated by deterministic TMs.
28.1 Dovetailing
28.1.1 Interleaving
We have seen that you can run two Turing machines in parallel, to compute some function of their outputs,
e.g. recognize the union of their languages.
Suppose that we had Turing machines M1, . . . , Mk recognizing languages L1, . . . , Lk, respectively. Then, we can build a TM M which recognizes ∪_{i=1}^{k} Li. To do this, we assume that M has k simulation tapes, plus input and working tapes. The TM M cycles through the k simulations in turn, advancing each one by a single step. If any of the simulations halts and accepts, then M halts and accepts.
We could use this same method to run a single TM M on a set of k input strings w1 , . . . , wk ; that is,
accept the input list if M accepts any of the strings w1 , . . . , wk .
The limitation of this approach is that the number of tapes is finite and fixed for any particular Turing
machine.
The language L̂ is recognizable, but we have to be careful how we construct its recognizer M̂. Because M is not necessarily a decider, we can not process the input strings one after another, because one of them might get stuck in an infinite loop. Instead, we need to run all k simulations in parallel. But k is different for different inputs to M̂, so we can not just give k tapes to M̂.
Instead, we can store all the simulations on a single tape T. Divide up T into k sections, one for each simulation. If a simulation runs out of space in its section, push everything over to the right to make more room.
28.1.3 Dovetailing
algBuggyRecog(⟨M⟩)
    x ← ε
    while True do
        simulate M on x (using UTM)
        if M accepts then
            halt and accept
        x ← next string in lexicographic order
Unfortunately, if M never halts on one of the strings, this process will get stuck before it even reaches a string that M does accept. So we need to run our simulations in parallel. Since we can not start up an infinite number of simulations all at once, we use the following idea.
Dovetailing is the idea of running k simulations in parallel, while dynamically increasing k. So, for our example, suppose that we store all our simulations on tape T, and x lives on some other tape. Then our code might look like:
algDovetailingRecog(⟨M⟩)
    x ← ε
    while True do
        On T, start up the simulation of M on x
        Advance all the simulations on T by one step.
        if any simulation on T accepted then
            halt and accept
        x ← Next(x)
Alternatively, one can describe the dovetailing loop as follows: for i = 0, 1, . . .
(1) Let x0, . . . , xi be the first i + 1 strings of Σ∗ in lexicographic order.
(2) Make sure a simulation of M is running on each of x0, . . . , xi.
(3) Run the set of simulations for i steps.
(4) If any simulation has accepted, halt and accept.
(5) Otherwise, increment i and repeat the loop.
Each iteration of the loop does only a finite amount of work: i steps for each of i simulations. However,
because i increases without bound, the loop will eventually consider every string in Σ∗ and will run each
simulation for more and more steps. So if there is some string w which is accepted by M , our procedure will
eventually simulate M on w for enough steps to see it halt.
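A Python sketch of the dovetailing loop, mirroring algDovetailingRecog above: machines are modeled as generators that yield while running and yield True if they ever accept (these conventions are made up for illustration):

from itertools import count

def dovetail(make_run, strings):
    sims, strings = [], iter(strings)
    while True:
        w = next(strings, None)
        if w is not None:
            sims.append((w, make_run(w)))    # start one more simulation
        for (word, run) in sims:
            if next(run, False) is True:     # advance this run by one step
                return word

def make_run(w):
    # a toy machine: accepts exactly "aaa", loops forever on anything else
    if w == "aaa":
        yield True
    while True:
        yield

print(dovetail(make_run, ("a" * n for n in count(0))))   # aaa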
An NTM M accepts an input w if there is some possible run of M on w which reaches the accept state.
Otherwise, M does not accept w.
This works just like non-determinism for the simpler automata. That is, you can either imagine searching
through all possible runs, or you can imagine that the NTM magically makes the right guess for what option
to take in each transition.
For regular languages, the deterministic and non-deterministic machines do the same thing. For context-free languages, they do different things. We claim that non-deterministic Turing machines can recognize the same languages as ordinary Turing machines.
28.2.2 Halting and deciders
Like a regular TM, an NTM can cleanly reject an input string w or it can implicitly reject it by never halting.
An NTM halts if all possible runs eventually halt. Once it halts, the NTM accepts the input if some run
ended in the accept state, and rejects the input if all runs ended in the reject state.
An NTM is a decider if it halts on all possible inputs on all branches. Formally, if you think about all possible configurations that the NTM might generate for a specific input as a tree (i.e., a branch represents a non-deterministic choice), then an NTM is a decider if and only if this tree is finite, for all inputs.
The simulation we used above has the property that it halts exactly when the NTM would have halted. So we have also shown that a deterministic TM can simulate an NTM decider while itself remaining a decider.
28.2.3 Enumerators
A language can be enumerated if there exists a TM with an output tape (in addition to its working tapes) such that the TM prints out on this tape all the words in the language (assume that between two printed words we place a special separator character). Note that the output tape is a write-only tape.
Definition 28.2.3 (Lexicographical ordering.) For two strings s1 and s2, we have s1 < s2 in lexicographical ordering if |s1| < |s2|, or |s1| = |s2| and s1 appears before s2 in the dictionary ordering.
(That is, lexicographical ordering is the dictionary ordering for strings of the same length, with shorter strings appearing before longer strings.)
Proof: Let T be the TM recognizer for L; we need to build an enumerator for this language. Using dovetailing, we “run” T on all the strings in Σ∗ = {w1, w2, . . .} (say, in lexicographical ordering). Whenever one of these executions stops and accepts a string wi, we print this string wi to the enumerator output tape. Clearly, all the words of L would sooner or later be printed by this enumerator. As such, this language can be enumerated.
As for the other direction, assume that we are given an enumerator Tenum for L. Given a word x ∈ Σ∗, we can recognize if it is in L by running the enumerator and reading the strings it prints out one by one. If one of these strings is x, then we stop and accept. Otherwise, this TM would continue running. Clearly, if x ∈ L then sooner or later the enumerator would output x and our TM would stop and accept it.
Chapter 29
“It is a damn poor mind indeed which can’t think of at least two ways to spell any word.”
– Andrew Jackson
This lecture covers linear bounded automata (LBAs), an interesting compromise in power between Turing machines and the simpler automata (DFAs, NFAs, PDAs). We will use LBAs to show that two CFG grammar problems (equality and generating all strings) are undecidable.
In some of the descriptions we use PDAs. However, the last part of these notes shows how these PDAs can be avoided, resulting in an arguably simpler and slightly more elegant argument.
Consider an LBA T with q states, whose tape alphabet has k characters, running on an input of length n. Then T can be in at most
α(n) = k^n · n · q (29.1)
configurations.
Here is the shocker: if an LBA runs for more than α(n) steps, then it must be looping. As such, given an LBA T and a word w (with n characters), we can simulate it for α(n) steps. If it does not terminate by then, then it must be looping, and as such it will never stop on this input. Thus, an LBA that stops on input w must stop within α(|w|) steps.
This implies that
ALBA = { ⟨T, w⟩ | T is an LBA and T accepts w }
is decidable. Similarly, the language
HaltLBA = { ⟨T, w⟩ | T is an LBA and T stops on w }
is decidable.
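A Python sketch of the ALBA decider: simulate for α(n) steps and declare a loop beyond that. The simulation interface (a generator yielding None per step and finally True/False when the LBA halts) is an assumption made for illustration:

from itertools import repeat

def decide_A_LBA(run, q, k, w):
    n = max(len(w), 1)
    alpha = (k ** n) * n * q          # the configuration bound (29.1)
    for _, verdict in zip(range(alpha + 1), run):
        if verdict is not None:
            return verdict            # the LBA halted: accept/reject
    return False                      # ran past alpha(n) steps: it is looping

# a toy "LBA" that loops forever on every input:
print(decide_A_LBA(repeat(None), q=2, k=3, w="ab"))   # False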
We formally prove one of the above claims. The other one follows by a similar argumentation.
The idea. Assume we are given a general TM T (which we emphasize is not an LBA) and a word w. We
would like to decide if T accepts w (which is of course undecidable).
If T does accept w, we can demonstrate that it does by providing a trace of the execution of T on w. This
trace (defined formally below) is just a string. We can easily build a TM that verifies that a supposed trace
is legal (e.g. uses the correct transitions for T), and indeed shows that T accepts w.
Crucially, this trace verification can be done by a TM VrfT,w that just uses the space provided by the
string itself. That is, the verifier VrfT,w is an LBA. The language of VrfT,w is empty if T does not accept w
1 For the very careful reader: Sipser handles this case slightly differently. His encoding of T would specify that the machine is supposed to be an LBA. Attempts to move off the input region would cause the read head to stay put.
(because then there is no accepting trace). If T does accept w, then the language of VrfT,w contains a single word: the trace showing that T accepts w. So, if we have a decider that decides whether ⟨VrfT,w⟩ ∈ ELBA, then we can decide if T accepts w.
Observe that we assumed nothing about T or w. The only required property is that VrfT,w is an LBA.
The pair of sharp signs marks the end of the trace, so the algorithm knows when the trace ends.
Such a trace is an accepting trace if the configuration Ck is an accepting configuration (i.e., the accept
state qacc of T is the state of T encoded in Ck ).2
Initial checks. So, we are given ⟨T⟩ and w, and we want to build a verifier VrfT,w that checks, given a trace t as input, that this trace is indeed an accepting trace for T on w. As a first step, VrfT,w verifies that C1 (the first configuration written in t) is indeed q0 w. Next, it needs to verify that Ck (the last configuration in t) is an accepting configuration, which is also easy (i.e., just verify that qacc is the state written in it). Finally, the verifier needs to make sure that the ith configuration implies the (i + 1)th configuration in the trace t, for all i.
Verifying two consecutive configurations. So, consider the ith configuration in t, that is
Ci = αaqbβ,
where α and β are two strings. Naturally, Ci+1 is the next configuration in the input trace t. Since VrfT,w
has the code of T inside it (as a built-in constant), it knows what δT (q, b), the transition function of T, is.
Say it knows that δT(q, b) = (q′, c, R). If our input is a valid trace, then Ci+1 is supposed to be
Ci+1 = αacq′β.
To verify that Ci and Ci+1 do match up in this way, the TM VrfT,w goes back and forth on the tape, comparing the parts of Ci and Ci+1 that must be identical. We can not simply erase these symbols as we check them: we will need to keep Ci+1 around so we can check it against Ci+2. So instead we translate each checked letter a into a special marked version â of this character.3
After we have marked all the identical characters, we've verified this pair of configurations except for the middle two to three letters (depending on whether this was a left or right move). So the tape at this stage looks like
. . . # αaqbβ # α̂acq′β̂ # . . .
(the first block is Ci, the second is Ci+1). We have verified that the prefix of Ci (i.e., α) is equal to α̂, and the suffix of Ci (i.e., β) is equal to the suffix of Ci+1 (i.e., β̂). The only thing that remains to be verified is the middle part, which can easily be done since we know T's transition function.
After that, the verifier removes the hats from the characters in Ci+1 and moves right to match Ci+1
against Ci+2 . If it gets to the end of the trace and all these checks were successful, the verifier VrfT,w accepts
the input trace t.
2 It should also be the case that no previous configuration in this trace is either accepting or rejecting. This is implied by the fact that TMs don't have transitions out of the accept and reject states.
3 We have omitted some details about how to handle moves near the right and left ends of the non-blank tape area. These details are tedious but easy to fill in, and the reader should verify that they know how to fill in the missing details.
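The heart of the verifier is the check that one configuration yields the next. Here is a minimal Python sketch of that check, assuming configurations are given as lists of symbols with exactly one state symbol embedded in them, and that delta maps (state, symbol) to (new_state, written_symbol, direction). Like the verifier above, it ignores the corner cases at the two ends of the tape.

def step_config(config, delta, states):
    # Locate the state symbol; the head scans the symbol to its right.
    i = next(j for j, s in enumerate(config) if s in states)
    q, b = config[i], config[i + 1]
    q2, c, move = delta[(q, b)]
    nxt = list(config)
    if move == 'R':      # ...a q b...  becomes  ...a c q' ...
        nxt[i], nxt[i + 1] = c, q2
    else:                # ...a q b...  becomes  ...q' a c...
        nxt[i - 1], nxt[i], nxt[i + 1] = q2, config[i - 1], c
    return nxt

def consecutive_ok(c_i, c_next, delta, states):
    # The check Vrf performs on each pair C_i, C_{i+1} of the trace.
    return step_config(c_i, delta, states) == c_next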
Lemma 29.1.2 Given a (general) TM T and a string w, one can build a verifier VrfT,w such that, given an accepting trace t, the verifier accepts t, and it accepts no other string. Note that VrfT,w is a decider that always stops. Moreover, VrfT,w is an LBA.
Theorem 29.1.3 The language ELBA = { ⟨T⟩ | T is an LBA and L(T) = ∅ } is undecidable.
Proof: The proof is by reduction from ATM. Assume for the sake of contradiction that ELBA is decidable, and let ELBA-Decider be the TM that decides it. We will build a decider decider5-ATM for ATM.
decider5-ATM(⟨T, w⟩)
    Check that ⟨T⟩ is syntactically correct TM code.
    Compute ⟨VrfT,w⟩ from ⟨T, w⟩.
    res ← ELBA-Decider(⟨VrfT,w⟩).
    if res == accept then
        reject
    else
        accept
Since we can compute ⟨VrfT,w⟩ from ⟨T, w⟩, it follows that this algorithm is a decider. Furthermore, given ⟨T, w⟩ such that T accepts w, there exists an accepting trace t for T accepting w, and as such L(VrfT,w) ≠ ∅. As such, ELBA-Decider(⟨VrfT,w⟩) rejects its input, which implies that decider5-ATM accepts ⟨T, w⟩.
Similarly, if T does not accept w, then L(VrfT,w) = ∅. As such, ELBA-Decider(⟨VrfT,w⟩) accepts its input, which implies that decider5-ATM rejects ⟨T, w⟩.
Thus decider5 -ATM is indeed a decider for ATM , but this is impossible, and we thus conclude that our
assumption, that ELBA is decidable, was false, implying the claim.
We provide a direct proof of Theorem 29.1.3 because it is shorter and simpler. The benefit of the previous proof is that it introduces the idea of verifying accepting traces, which we will revisit shortly.
Alternative direct proof of Theorem 29.1.3: We are given ⟨T, w⟩, where T is a TM and w is an input for it. We will assume that the tape alphabet of T is Γ, its input alphabet is Σ, and that z and $ are not in Γ. We build a new machine Zw from T and w that gets as input a word of the form z^k$. The machine Zw first writes w at the beginning of the tape, moves the head back to the beginning of the tape, and then just runs T on this input, with the modification that the new machine treats z as a space. However, if the new machine ever reaches the $ character on the input (in any state), it immediately stops and rejects.
Clearly, Zw is an LBA (by definition). Furthermore, if T accepts w, then Zw accepts the word z^k$ for all sufficiently large k (it is enough that z^k$ is long enough to contain the whole computation of T on w). Similarly, if z^j$ is accepted by Zw, then T accepts w. We thus conclude that L(Zw) is not empty if and only if w ∈ L(T).
Going back to the proof: given ⟨T⟩ and w, the construction of ⟨Zw⟩ is easy. As such, assume for the sake of contradiction that ELBA is decidable, and that we are given a decider for ELBA. We can feed it ⟨Zw⟩; if this decider accepts (i.e., L(Zw) = ∅), then we know that T does not accept w. Similarly, if ⟨Zw⟩ is rejected by the decider, then L(Zw) ≠ ∅, which implies that T accepts w. Namely, we just constructed a decider for ATM, which is undecidable. A contradiction.
29.2 On undecidable problems for context free grammars
We would like to prove that some languages involving context-free grammars are undecidable. To this end, to reduce ATM to a question involving CFGs, we somehow need to map properties of TMs to CFGs.
Lemma 29.2.1 Given a TM T, the language { x#y^R | x and y are configurations of T, and x 7→ y } is a CFG (i.e., it is generated by a context-free grammar).
Proof: Let Γ be the tape alphabet of T, and Q be the set of states of T. Let δ be the transition function of
T. We have the following rewriting rules depending on δ:
∀α, β ∈ Γ∗, ∀b, c, d ∈ Γ, ∀q ∈ Q:
if δ(q, c) = (q′, d, R) then αqcβ 7→ αdq′β, or equivalently αqcβ 7→^R β^R q′dα^R;
if δ(q, c) = (q′, d, L) then αbqcβ 7→ αq′bdβ, or equivalently αbqcβ 7→^R β^R dbq′α^R.
Intuitively, x 7→ y says that the string x can be very locally edited to generate y. In the above, we need to copy the α and β portions, and then do the rewriting, which involves at most 3 letters. As such, the grammar
S1 → C
C → xCx   (∀x ∈ Γ)
C → T
T → qcZq′d   (∀c, d ∈ Γ, ∀q ∈ Q such that δ(q, c) = (q′, d, R))
T → bqcZdbq′   (∀b, c, d ∈ Γ, ∀q ∈ Q such that δ(q, c) = (q′, d, L))
Z → xZx   (∀x ∈ Γ)
Z → #
generates exactly the strings x#y^R with x 7→ y.
Theorem 29.2.3 The language { ⟨G, G′⟩ | L(G) ∩ L(G′) ≠ ∅ } is undecidable. Namely, given two context-free grammars, there is no decider that can decide whether there is a word that they both generate.
Proof: If this were decidable, then given ⟨T, w⟩, we could decide whether the language LT,w,trace of Lemma 29.2.2 is empty or not, since it is the intersection of the languages of two context-free grammars that can be computed from ⟨T, w⟩. But this language is not empty if and only if T accepts w. Namely, we would get a decider for ATM, which is a contradiction.
The idea
The idea is, given T and w, to build a verifier for accepting traces of T on w. Here the verifier is going to be a CFG. The problem, if you think about it, is that there is no way a CFG can verify a trace, as the checks that need to be performed are too complicated to be performed by a CFG.
Luckily, we can generate a CFG VrfGT,w that generates all the strings that are not accepting traces for T on w. Indeed, we will build several CFGs, each one “checking” one condition, and their union would be the required grammar. As such, L(VrfGT,w) is the set of all strings if and only if T does not have an accepting trace for w.
The alphabet of our grammar is going to be
Σ = Γ ∪ Q ∪ {#} ,
where Γ is the tape alphabet of T, Q is the set of states of T, and # is the special separator character.
(Or, almost. There is a small issue that needs to be fixed, but we will get to that in a second.)
or, if there are an even number of configurations in the trace, the trace would be written as
#C1#C2^R#C3#C4^R# · · · #C_{k−1}#C_k^R#
(that is, every second configuration in the trace is written reversed).
Our basic plan is still valid. Indeed, there will be an accepting trace in this modified format if and only
if T accepts w.
Verifying two consecutive configurations. Let us build a pushdown automaton that reads two configurations X#Y^R# and decides whether the configuration X does not imply Y. To make things easier, let us first build a PDA that checks that the configuration X does imply the configuration Y.
The PDA P would scan X and push it, as it goes, onto the stack. As it reads X, and in particular the state written in X, it can push onto the stack what the implied configuration should look like (there is a small issue with having to write the state on the stack; this can be done by a few pushes and pops, and is tedious but manageable). Thus, by the time we are done reading X (when P encounters #), the stack already contains the implied (reversed) configuration of X; let us denote it by Z^R. Now, P just reads the rest of the input (Y^R) and matches it against the stack contents. It accepts if and only if the configuration X implies the configuration Y.
Interestingly, the PDA P is deterministic, and as such we can complement it (this is not true for a general PDA, because of the nondeterminism). Alternatively, just observe that P has a reject state that is reached after the comparison fails; in the complement PDA, we just turn this “hell” state into an accept state. Thus, we have a PDA P that accepts X#Y^R# iff the configuration X does not imply the configuration Y.
Now, it is easy to modify this PDA so that it accepts the language
L1 = { u#X#Y^R#v | u, v ∈ Σ∗, and configuration X does not imply configuration Y },
which clearly contains only invalid traces. Similarly, we can build a PDA that accepts the language
L2 = { u#X^R#Y#v | u, v ∈ Σ∗, and configuration X does not imply configuration Y }.
Putting these two PDAs together yields a PDA that accepts all strings containing two consecutive configurations such that the first one does not imply the second one.
Now, since we have a PDA for this language, we clearly can build a CFG GM that generates all such strings.
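As a sanity check of this stack discipline, here is a small Python sketch of what P computes on a single block X#Y^R#. It reuses the hypothetical step_config and the configuration conventions from the earlier sketch, and works at the level of whole strings rather than simulating the PDA transition by transition.

def x_does_not_imply_y(block, delta, states):
    # block has the form X#Yrev#; split off the two configurations.
    parts = block.split('#')
    x, yrev = parts[0], parts[1]
    # Reading X, the PDA pushes the successor configuration Z of X,
    # so the stack (read top first) holds Z reversed.
    stack_top_first = step_config(list(x), delta, states)[::-1]
    # Matching the input Y^R against the pops succeeds exactly when Y = Z.
    return list(yrev) != stack_top_first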
Strings with invalid initial configurations. Consider all traces having an invalid initial configuration. Clearly, they are generated by strings of the form
(Σ \ {#, q0})∗ # Σ∗.
Let GI be a grammar generating this (regular) language.
Strings with invalid final configurations. Consider all traces having an invalid final configuration. Clearly, they are generated by strings of the form
Σ∗ # (Σ \ {#, qacc})∗.
Let GF be a grammar generating this language.
Putting things together. Clearly, all invalid (i.e., non-accepting) traces of T on w are generated by the grammars GI, GM, GF. Thus, consider the context-free grammar GM,w formed by the union of GI, GM and GF. When T does not accept w, there is no accepting trace for T on w, so L(GM,w) (the strings that are not accepting traces) is Σ∗. When T accepts w, there is an accepting trace for T on w, so L(GM,w) (the strings that are not accepting traces) is not equal to Σ∗.
Theorem 29.2.4 The language ALLCFG = { ⟨G⟩ | G is a CFG and L(G) = Σ∗ } is undecidable.
Proof: Let us assume, for the sake of contradiction, that the language ALLCFG is decidable, and let deciderAllCFG be its decider. We will now reduce ATM to it, by building a decider for ATM as follows.
decider6-ATM(⟨M, w⟩)
    Check that ⟨M⟩ is syntactically correct TM code.
    Compute ⟨GM,w⟩ from ⟨M, w⟩, as described above.
    res ← deciderAllCFG(⟨GM,w⟩).
    if res == accept then
        reject
    else
        accept
Clearly, this is a decider. Indeed, if M accepts w, then there exists an accepting trace t showing it. As such, L(GM,w) = Σ∗ \ {t} ≠ Σ∗. Thus, deciderAllCFG rejects ⟨GM,w⟩, and thus decider6-ATM accepts ⟨M, w⟩.
Similarly, if M does not accept w, then L(GM,w) = Σ∗, and as such deciderAllCFG accepts ⟨GM,w⟩, implying that decider6-ATM rejects ⟨M, w⟩.
Thus, decider6-ATM is a decider for ATM, which is impossible. We conclude that our assumption, that ALLCFG is decidable, is false, implying the claim.
Theorem 29.2.5 The language EQCFG = { ⟨G, G′⟩ | G and G′ are CFGs and L(G) = L(G′) } is undecidable. The proof is almost identical to the reduction of ETM to EQTM that we saw in lecture 21.
Proof: By contradiction. Suppose that EQCFG is decidable, and let deciderEqCFG be a TM that decides it.
Given an alphabet Σ, it is not hard to construct a grammar FΣ that generates all strings in Σ∗. E.g., if Σ = {c1, c2, . . . , ck}, then we could use the rules:
S → XS | ε
X → c1 | c2 | . . . | ck
Now, given a CFG G, we can decide whether L(G) = Σ∗ by feeding ⟨G, FΣ⟩ to deciderEqCFG; call the resulting TM deciderAllCFG. It is easy to verify that deciderAllCFG is indeed a decider. However, we have already shown that ALLCFG is undecidable, so this decider deciderAllCFG can not exist. As such, our assumption that EQCFG is decidable is false. As such, EQCFG is undecidable.
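The reduction is short enough to write out as code. A minimal sketch, where decider_eq_cfg and make_full_grammar are hypothetical stand-ins for the assumed EQCFG decider and the construction of FΣ:

def decider_all_cfg(G, sigma, decider_eq_cfg, make_full_grammar):
    # F_sigma generates Sigma* via  S -> XS | eps,  X -> c1 | ... | ck.
    F_sigma = make_full_grammar(sigma)
    # L(G) = Sigma*  iff  L(G) = L(F_sigma).
    return decider_eq_cfg(G, F_sigma)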
29.3 Avoiding PDAs
The proofs we used above are simpler when one uses PDAs. However, the same argument can be carried out without PDAs by slightly changing the rules. The basic idea is to interleave two configurations together. This is best imagined by thinking of each character as a tile of two characters. Thus, the two rows
b d x a b q c e b d x
b d x a q′ b d e b d x
describe the two configurations x = bdxabqcebdx and y = bdxaq′bdebdx. If we are given a TM T with tape alphabet Γ and set of states Q, then the alphabet of the tiles is
Σ̂ = { [x/y] | x, y ∈ Γ ∪ Q },
where [x/y] denotes the tile with top character x and bottom character y.
Note that in the above example x yields y, which implies that, except for a region of three columns, the two strings are identical:
b d x a (b q c) e b d x
b d x a (q′ b d) e b d x
Thus, a single step of a TM is no more than a local rewrite of the configuration string.
Given two configurations x, y of T, we will refer to the string over Σ̂ resulting from writing them together interleaved, as described above, as their pairing, denoted by [x/y]. Note that if one of the configurations is shorter than the other, we pad the shorter configuration with blank characters (i.e., ␣) so that they are of the same length.
Lemma 29.3.1 Given a TM T, one can construct an NFA D, such that D accepts a pairing [x/y] if and only if x and y are two valid configurations of T, and x 7→ y.
Proof: First, making sure that x and y are valid configurations when reading the string s = [x/y] is easy using a DFA (verify that x contains exactly one state character, and that the rest of the characters of x are from the tape alphabet of T; one also has to do the same check for y). Let us refer to the DFAs verifying the x and y parts of s as Dx and Dy, respectively. Note that Dx (resp. Dy) reads the string s but ignores the bottom (resp. top) part of each character of s.
As such, we just need to verify that x yields y. To this end, observe that x yields y if and only if they are identical except for three positions where the transition happens. We build an NFA that verifies that the top and bottom parts are equal, until it guesses that it has reached the 3-tile region that gets rewritten. It then guesses which tiles need to be written there (note that the transition function of T specifies all valid such tiles), verifies that this is indeed what appears in the next three characters of the input, and then compares the rest of the input. Let this NFA be D=.
Now, we construct an automaton that accepts the language L(Dx) ∩ L(Dy) ∩ L(D=) (via the product construction, converting D= to a DFA first if desired). Clearly, this automaton accepts the required language.
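The intersection step here is the standard product construction. Here is a minimal Python sketch for intersecting two DFAs; the dict representation of a DFA is an assumption of the sketch, and intersecting three automata is just two applications of it (after determinizing the NFA D=).

from itertools import product

def intersect_dfas(d1, d2):
    # Run both machines in lockstep; accept iff both accept.
    states = set(product(d1['states'], d2['states']))
    delta = {((p, q), a): (d1['delta'][(p, a)], d2['delta'][(q, a)])
             for (p, q) in states for a in d1['alphabet']}
    return {'states': states,
            'alphabet': d1['alphabet'],
            'delta': delta,
            'start': (d1['start'], d2['start']),
            'accept': {(p, q) for (p, q) in states
                       if p in d1['accept'] and q in d2['accept']}}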
Similarly, it is easy to build a DFA that verifies that the pairing [x^R/y^R] is valid and x yields y (according to T). Now, consider an extended execution trace, written as a sequence of pairings separated by dollar tiles:
[C1/C2] [$/$] [C2^R/C3^R] [$/$] [C3/C4] [$/$] · · ·
We would like to verify that this encodes a valid accepting trace for T on the input string w. This requires verifying that the following conditions are met.
(i) The trace has the right format of pairings separated by dollar tiles. This can easily be done by a DFA. Let L1 be the language that this DFA accepts.
(ii) The first configuration satisfies C1 = q0w. This can be done with a DFA. Let L2 be the language that this DFA accepts.
(iii) The last configuration Ck is an accepting configuration. Easily done by a DFA. Let L3 be the language that this DFA accepts.
(iv) The pairings [C2i−1/C2i] and [C2i^R/C2i+1^R] are valid pairings, such that C2i−1 7→ C2i and C2i 7→ C2i+1, for all i (again, according to T). This can be done by a DFA, by Lemma 29.3.1. Let L4 be the language that this DFA accepts.
(v) Finally, we need to verify that the configurations are copied correctly from the bottom of one pairing to the top of the next pairing. Let L5 be the language of all strings whose copying is valid.
Clearly, the set of all valid traces of T on w is the set L = L1 ∩ L2 ∩ L3 ∩ L4 ∩ L5.
We are interested in building a CFG that recognizes the complement language L̄, which is the language
L̄ = L̄1 ∪ L̄2 ∪ L̄3 ∪ L̄4 ∪ L̄5.
The only nontrivial part is L̄5, the strings in which the copying fails. Such a string contains two consecutive pairings of the form
· · · [$/$] [x^R/y^R] [$/$] [y′/z] [$/$] · · ·
where y′ ≠ y. But if we ignore the rest of the string, and the top and bottom portions of these two pairings, this is just recognizing the language “not a palindrome”, which we know is context-free. Indeed, the grammar of not-palindrome (around a center $) over an alphabet Γ is
S2 → xS2x   (∀x ∈ Γ)
S2 → xCy   (∀x, y ∈ Γ and x ≠ y)
C → Cx | xC   (∀x ∈ Γ)
C → $.
We now extend this grammar to the extended alphabet Σ̂, as follows:
S3 → [u/x] S3 [x/v]   (∀u, v, x ∈ ΓT)
S3 → [u/x] C [y/v]   (∀x, y, u, v ∈ ΓT and x ≠ y)
C → [x/y] C | C [x/y]   (∀x, y ∈ ΓT)
C → [$/$].
(The matching rules require the bottom character of a tile on the left side to reappear as the top character of the corresponding tile on the right side.) The language
Σ̂∗ [$/$] L(S3) [$/$] Σ̂∗
is exactly L̄5. We conclude that L̄ is a context-free language (being the union of 5 context-free/regular languages). Furthermore, L̄ = Σ̂∗ if and only if T does not accept w. We conclude the following.
Theorem 29.3.2 The language ALLCFG = { ⟨G⟩ | G is a CFG and L(G) = Σ∗ } is undecidable.
Chapter 30
"Then you must begin a reading program immediately so that you man understand the crises of our age," Ignatius
said solemnly. "Begin with the late Romans, including Boethius, of course. Then you should dip rather extensively
into early Medieval. You may skip the Renaissance and the Enlightenment. That is mostly dangerous propaganda.
Now, that I think about of it, you had better skip the Romantics and the Victorians, too. For the contemporary
period, you should study some selected comic books."
"You’re fantastic."
"I recommend Batman especially, for he tends to transcend the abysmal society in which he’s found himself. His
morality is rather rigid, also. I rather respect Batman."
– A confederacy of Dunces, John Kennedy Toole
30.1 Introduction
The question governing this course is the development of efficient algorithms. Hopefully, the notion of an algorithm is by now a well-understood concept. But what is an efficient algorithm? A natural answer (but not the only one!) is an algorithm that runs quickly.
What do we mean by quickly? Well, we would like our algorithm to:
1. Scale with input size. That is, it should be able to handle large and hopefully huge inputs.
2. Low-level implementation details should not matter, since they correspond to small improvements in performance. Since faster CPUs keep appearing, such improvements would (usually) be taken care of by hardware.
3. What we will really care about is asymptotic running time. Explicitly, polynomial time.
In our discussion, we will consider the input size to be n, and we would like to bound the overall running
time by a function of n which is asymptotically as small as possible. An algorithm with better asymptotic
running time would be considered to be better.
Example 30.1.1 It is illuminating to consider a concrete example. So assume we have an algorithm for a problem that needs to perform c2^n operations to handle an input of size n, where c is a small constant (say 10). Let us assume that we have a CPU that can do 10^9 operations a second. (A somewhat conservative assumption, as currently [Jan 2006]¹ the Blue Gene supercomputer can do about 3 · 10^14 floating-point operations a second. Since this supercomputer has about 131,072 CPUs, it is not something you would have on your desktop any time soon.) Since 2^10 ≈ 10^3, our (cheap) computer can solve, in roughly a second, a problem of size n = 27.
¹ But the recently announced supercomputer that would be completed in 2011 in Urbana is naturally way faster. It supposedly would do 10^15 operations a second (i.e., a petaflop); Blue Gene probably can not sustain its theoretical speed stated above, which is only slightly slower.
But what if we increase the problem size to n = 54? This would take our computer about six years to solve (indeed, 10 · 2^54 ≈ 1.8 · 10^17 operations, which is about 1.8 · 10^8 seconds). (In fact, it is better to just wait for faster computers to show up, and then try to solve the problem. Although there are good reasons to believe that the exponential growth in computer performance we saw in the last 40 years is about to end. Thus, unless a substantial breakthrough in computing happens, it might be that solving problems of size, say, n = 100 for this problem would forever be outside our reach.)
The situation dramatically changes if we consider an algorithm with running time 10n². Then, in one second our computer can handle an input of size n = 10^4. A problem of size n = 10^8 can be solved in 10n²/10^9 = 10^{17−9} = 10^8 seconds, which is about 3 years of computing (but Blue Gene might be able to solve it in less than 20 minutes!).
Thus, algorithms that have asymptotically polynomial running time (i.e., running time bounded by O(n^c), where c is a constant) are able to solve large instances of the input, and can solve the problem even if the problem size increases dramatically.
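A quick Python computation reproduces the estimates above; the operation counts 10 · 2^n and 10 · n² and the rate of 10^9 operations per second are taken directly from the example.

OPS_PER_SEC = 10**9
YEAR = 365 * 24 * 3600

for n in (27, 54, 100):
    secs = 10 * 2**n / OPS_PER_SEC
    print(f"exponential, n={n}: {secs:.3g} s (= {secs / YEAR:.3g} years)")

for n in (10**4, 10**8):
    secs = 10 * n**2 / OPS_PER_SEC
    print(f"quadratic, n={n}: {secs:.3g} s (= {secs / YEAR:.3g} years)")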
Can we solve all problems in polynomial time? The answer to this question is unfortunately no.
There are several synthetic examples of this, but in fact it is believed that a large class of important problems
can not be solved in polynomial time.
Problem: Satisfiability
Instance: A boolean formula F with m variables
Question: Is there an assignment of values to variables, such that F evaluates to true?
The common belief is that SAT can NOT be solved in polynomial time in the size of the formula.
SAT has two interesting properties.
1. Given a supposed positive solution, with a detailed assignment (i.e., a proof): x1 ← 0, x2 ← 1, ..., xm ← 1, one can verify in polynomial time whether this assignment really satisfies F. This is done by evaluating F on the given input (see the sketch after this list).
Intuitively, this is the difference in hardness between coming up with a proof (hard), and checking that a proof is correct (easy).
2. It is a decision problem. For a specific input an algorithm that solves this problem has to output
either TRUE or FALSE.
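A minimal sketch of the verification step referred to in item 1, restricting to CNF formulas for concreteness (the F above is an arbitrary boolean formula): a clause is a list of nonzero integers, +i standing for x_i and −i for ¬x_i.

def satisfies(formula, assignment):
    # Time linear in the formula size: every literal is inspected once.
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in formula)

f = [[1, -2], [2, 3]]                                # (x1 or not x2) and (x2 or x3)
print(satisfies(f, {1: True, 2: True, 3: False}))    # True
print(satisfies(f, {1: False, 2: False, 3: False}))  # False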
A teaser. Can one find a satisfying assignment for the following circuit in polynomial time?
Definition 30.2.2 (NP: Nondeterministic Polynomial time) Let NP be the class of all decision prob-
lems that can be verified in polynomial time. Namely, for an input of size n, if the solution to the given
instance is true, one (i.e., an oracle) can provide you with a proof (of polynomial length!) that the answer
is indeed TRUE for this instance. Furthermore, you can verify this proof in polynomial time in the length of
the proof.
Definition 30.2.4 (co-NP) The class co-NP is the opposite of NP – if the answer is FALSE, then there
exists a short proof for this negative answer, and this proof can be verified in polynomial time.
See Figure 30.2 for the currently believed relationship between these classes (of course, as mentioned
above, P ⊆ NP and P ⊆ co-NP is easy to verify). Note, that it is quite possible that P = NP = co-NP,
although this would be extremely surprising.
Definition 30.2.5 A problem Π is NP-Hard, if being able to solve Π in polynomial time implies that
P = NP.
Intuitively, being NP-Hard implies that a problem is ridiculously hard. Conceptually, it would imply that proving and verifying are equally hard - which nobody who took 473g believes is true.
In particular, a problem which is NP-Hard is at least as hard as ALL the problems in NP; as such, it is safe to assume, based on overwhelming evidence, that it can not be solved in polynomial time.
Definition 30.2.8 A problem Π is NP-Complete (NPC in short) if it is both NP-Hard and in NP.
Clearly, Circuit Satisfiability is NP-Complete: it is NP-Hard, and it is in NP, since we can verify a positive solution in polynomial time in the size of the circuit.
By now, thousands of problems have been shown to be NP-Complete. It is extremely unlikely that any
of them can be solved in polynomial time.
Input: boolean formula F
⇓ n = size of F
transform F into a boolean circuit C
⇓
Find SAT assign’ for C using CSAT solver
⇓
Return TRUE if C is satisfied, otherwise false.
Figure 30.1: An algorithm for solving SAT using an algorithm that solves the CSAT problem
30.2.1 Reductions
Let A and B be two decision problems.
Given an input I for problem A, a reduction is a transformation of the input I into a new input I′, such that
A(I) is TRUE ⇔ B(I′) is TRUE.
Thus, one can solve A by first transforming an input I into an input I′ of B, and then solving B(I′).
This idea of using reductions is omnipresent, and is used in almost any program you write.
Let T : I → I′ be the input transformation that maps instances of A into instances of B. How fast is T? Well, for our nefarious purposes we need polynomial reductions; that is, reductions that take polynomial time.
Problem: Circuit Satisfiability
Instance: A circuit C with m inputs
Question: Is there an input for C such that C returns true on it?
For example, given an instance of SAT, we would like to generate an equivalent circuit C. We will
explicitly write down what the circuit computes in a formula form. To see how to do this, consider the
following example.
The resulting reduction is depicted in Figure 30.1.
Namely, given a solver for CSAT that runs in time TCSAT(n), we can solve the SAT problem in time
TSAT(n) ≤ O(n) + TCSAT(O(n)),
where n is the size of the boolean formula. Namely, if we have a polynomial-time algorithm that solves CSAT, then we can solve SAT in polynomial time.
Another way of looking at it: suppose we believe that solving SAT requires exponential time, namely TSAT(n) ≥ 2^n. The above reduction then implies that
O(n) + TCSAT(O(n)) ≥ TSAT(n) ≥ 2^n.
Namely, TCSAT(n) ≥ 2^{n/c} − O(n), where c is some positive constant. Namely, if we believe that we need exponential time to solve SAT, then we need exponential time to solve CSAT.
This implies that if CSAT ∈ P then SAT ∈ P.
We just proved that CSAT is as hard as SAT. Clearly, CSAT ∈ NP which implies the following theorem.
Chapter 31
This lecture covers Post’s Correspondence Problem (section 5.2 in Sipser). Undecidability of this problem
implies the undecidability of CFG ambiguity. We will also see how to simulate a TM with 2D tiling patterns
and, as a consequence, show how undecidability implies the existence of aperiodic tilings.
A match for S is an ordered list of one or more dominos from S, such that if you read the symbols on the tops, this makes the same string as reading the symbols on the bottoms. You can use the same domino more than once in a match, and you do not have to use all the elements of S. For example, here is a match for our example set:
[a/ab] [b/ca] [ca/a] [a/ab] [abc/c],
where [t/b] denotes the domino with top string t and bottom string b.
The tops and bottoms of the dominos both form the string abcaaabc.
Not all sets have a match. For example, T does not have a match because the tops are all longer than the bottoms:
T = { [abc/c], [ba/a], [bb/b] }.
The set R does not have a match because there are no d's or f's in the top strings:
R = { [ab/df], [cb/bd], [aa/fa] }.
It seems like it should be fairly easy to figure out whether a set of dominos has a match, but this problem
is actually undecidable.
Post’s Correspondence Problem (PCP) is the problem of deciding whether a set of dominos has a match
or not.
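Undecidability does not preclude searching: a brute-force program can look for short matches, it just can not be guaranteed to terminate when no match exists. A minimal sketch, where the cutoff max_len is an artificial bound added here, dominos are (top, bottom) pairs, and S below uses the dominos of the example match above:

from itertools import product

def pcp_match(dominos, max_len=6):
    # Try every sequence of at most max_len dominos; an unbounded
    # search would run forever on matchless instances, which is
    # consistent with PCP being undecidable.
    for n in range(1, max_len + 1):
        for seq in product(range(len(dominos)), repeat=n):
            top = "".join(dominos[i][0] for i in seq)
            bot = "".join(dominos[i][1] for i in seq)
            if top == bot:
                return [dominos[i] for i in seq]
    return None

S = [("a", "ab"), ("b", "ca"), ("ca", "a"), ("abc", "c")]
print(pcp_match(S))   # finds a match; tops and bottoms spell abcaaabc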
The modified Post’s Correspondence Problem (MPCP) is just like PCP except that we specify both
the set of tiles and also a special tile. Matches for MPCP have to start with the special tile.
We will show that PCP is undecidable in two steps. First, we will reduce ATM to MPCP. Then we will reduce MPCP to PCP.
Given ⟨M, w⟩, the match is forced to start with the special tile
[# / #q0w#],
where q0w is the initial configuration of M when executed on w. At this point, the bottom string is longer than the top string. As such, to get a match, the tiling has to copy the content of the bottom row to the top row. We will set up the tiles so that copying C0 to the top row also writes, on the bottom row, the configuration resulting from performing one step of the computation of M on C0. As such, at the end of this process, the tiling looks like
[#C0# / #C0#C1#].
The trick is that again the bottom part is longer, and again, to get a match, the only possibility is to copy C1 to the top part, writing out C2 on the bottom part in the process, resulting in
[#C0#C1# / #C0#C1#C2#].
Now, how are we going to do this copying/computation? The idea is to introduce, for every character x ∈ ΣM ∪ {#}, a copying tile
[x / x].
Here ΣM denotes the alphabet set used by M . Similarly, δM denotes the transition function of M .
Next, assume we have the transition δM(q, a) = (q′, b, R); then we will introduce the computation tile
[qa / bq′].
Similarly, for a left-move transition δM(q, c) = (q2, d, L), we introduce, for every y ∈ ΣM, the computation tile
[yqc / q2yd].
Here is what is going on in the ith stage: the bottom row has an additional configuration Ci written in it. To get a match, the tiling has to copy this configuration to the top row. But the copying tiles only copy regular characters; they can not copy states. Thus, when the copying process reaches the state character in Ci, it must use the right computation tile to copy this substring to the top row. Then it continues copying the rest of the configuration to the top. Naturally, as the copying goes on from the bottom row to the top row, new characters are added to the bottom row. The critical observation is that the computation tiles guarantee that the string added to the bottom row is exactly Ci+1, since we copied the characters verbatim in the areas of Ci unrelated to the computation of M on Ci, and the computation tile copied exactly the right string for the small region where the computation changes.
Thus, if the tiling generated before the ith stage was
top: #C0#C1# . . . #Ci−1#   bottom: #C0#C1# . . . #Ci#,
then after the ith stage, the string generated by the tiling (which is trying so hard to match the bottom and top rows) is
top: #C0#C1# . . . #Ci−1#Ci#   bottom: #C0#C1# . . . #Ci#Ci+1#.
Let this process continue until we reach the accepting configuration Cn = αqacc xβ, where α and β are some strings and x is some character in ΣM. At this point, the tiling we have so far looks like
top: #C0#C1# . . . #Cn−1#   bottom: #C0#C1# . . . #Cn−1#Cn#.
The question is how we make this into a match. The idea is that now, since Cn is an accepting configuration, we should treat the rest of the tiling as a cleanup stage, slowly reducing Cn to the empty string as we copy it up and down. How do we do that? Well, let us introduce a delete tile
[qacc x / qacc]
into our set of tiles T (also known as a pacman tile). Using the copying tiles to copy Cn = αqacc xβ to the top row, while erasing the character x in the process using the above “delete x” tile, we get
top: . . . #Cn−1#αqacc xβ#   bottom: . . . #Cn−1#αqacc xβ#αqacc β#.
We can now repeat this process, by introducing such delete tiles for every character of ΣM, and also introducing backward delete tiles like
[xqacc / qacc].
Thus, by using these delete tiles, we will get the tiling
top: . . . #αqacc xβ#αqacc β# . . . #qacc y#   bottom: . . . #αqacc xβ#αqacc β# . . . #qacc y#qacc#.
To finish off the tiling, we introduce a stopper tile
[qacc### / ##].
Adding it to the tiling results in the required match. In this tiling, both the top and the bottom rows spell out the string
#C0#C1# . . . #Cn−1#αqacc xβ# αqacc β# . . . #qacc y#qacc###,
where the first part (up to αqacc xβ#) is the accepting trace, and the rest is the winding down of the match.
It is now easy to argue that, with this set of tiles, if there is a match, it must have the above structure, which encodes an accepting trace.
Computing the tiles
Let us recap the above description. We are given a string ⟨M, w⟩, and we are generating a set of tiles T, as follows. The set containing the initial tile is
T1 = { [# / #q0w#] }. /* initial tile */
Let T = T1 ∪ T2 ∪ T3 ∪ T4 ∪ T5 ∪ T6 (the sets of copying, computation, delete and stopper tiles described above). Clearly, given ⟨M, w⟩, we can easily compute the set T. Let AlgTM2MPCP denote the algorithm performing this conversion.
We summarize our work so far.
Lemma 31.1.1 Given a string ⟨M, w⟩, the algorithm AlgTM2MPCP computes a set of tiles T that is an instance of MPCP. Furthermore, T contains a match if and only if M accepts w.
Theorem 31.1.2 The MPCP problem is undecidable.
Proof: The reduction is from ATM. Indeed, assume for the sake of contradiction that the MPCP problem is decidable, and that we are given a decider decider_MPCP for it. Next, we use it to build the following decider for ATM.
decider7-ATM(⟨M, w⟩)
    T ← AlgTM2MPCP(⟨M, w⟩)
    res ← decider_MPCP(T)
    return res
Clearly, this is a decider, and it accepts if and only if M accepts w. But this is a contradiction, since ATM is undecidable.
Thus, our assumption (that MPCP is decidable) is false, implying the claim.
Let X denote the set of tiles resulting from the starring modification: every tile [ti/bi] of T is replaced by [⋆ti / bi⋆], and we add the special tile [⋆t1 / ⋆b1⋆] for the start tile [t1/b1] of the MPCP instance. (Here ⋆u denotes u with a ⋆ added before every character, and u⋆ denotes u with a ⋆ added after every character.)
Note that in this new set of tiles, the only tile that can serve as the first tile in a match is
[⋆t1 / ⋆b1⋆],
since it is the only tile whose bottom string has ⋆ as its first character. Now, to take care of the balance of stars at the end of the string, we also add a closing tile, setting
Y = X ∪ { [⋆◇ / ◇] },
where ◇ is a new symbol. It is now easy to verify that if the original instance T of MPCP had a match, then the set Y also has a match.
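A minimal Python sketch of this MPCP-to-PCP transformation, following the description above; '*' plays the role of ⋆ and '$' stands in for the fresh symbol ◇:

def star_before(u):
    return "".join("*" + c for c in u)   # *u: a star before every character

def star_after(u):
    return "".join(c + "*" for c in u)   # u*: a star after every character

def mpcp_to_pcp(tiles):
    # tiles[0] is the special MPCP start tile [t1/b1].
    t1, b1 = tiles[0]
    new = [(star_before(t1), "*" + star_after(b1))]           # [*t1 / *b1*]
    new += [(star_before(t), star_after(b)) for t, b in tiles]
    new.append(("*$", "$"))                                    # closing tile
    return new

Only the modified start tile has matching first characters on top and bottom, so every PCP match of the new set begins with it, mirroring the MPCP requirement.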
The important thing about Y is that it does not need to specify a special initial tile (a minor difference from T, but a difference nevertheless). As such, Y is an instance of PCP. We conclude:
Lemma 31.1.3 Given a string ⟨M, w⟩, the algorithm AlgTM2PCP computes a set of tiles Y that is an instance of PCP. Furthermore, Y contains a match if and only if M accepts w.
As before, this implies the described result.
31.2 Reduction of PCP to AMBIGCFG
We can use the PCP result to prove a useful fact about context-free grammars. Let us define
AMBIGCFG = { ⟨G⟩ | G is a CFG and G is ambiguous }.
We remind the reader that a grammar G is ambiguous if there are two different ways for G to generate some
word w.
We will show this problem is undecidable by a reduction from PCP. That is, given a PCP instance S, we will construct a context-free grammar which is ambiguous exactly when S has a match. This means that any decider for AMBIGCFG could be used to solve PCP, so such a decider can not exist.
Specifically, suppose that S looks like
S = { [t1/b1], . . . , [tk/bk] }.
As a first attempt, consider the grammar
D→T|B
T → t1T | t2T | . . . | tkT | t1 | . . . | tk
B → b1B | b2B | . . . | bkB | b1 | . . . | bk,
with D as the initial symbol. This grammar is ambiguous if the tops of a sequence of tiles form the same
string as the bottoms of a sequence of tiles. However, there is nothing forcing the two sequences to use the
same tiles in the same order.
So, we will add some labels to our rules which name the set of tiles we have used. Let us suppose the
tiles are named d1 through dk . Then we will make our grammar generate strings like ti tj . . . tm dm . . . dj di
where the second part of the string contains the labels of the tiles used to build the first part of the string
(in reverse order).
So our final grammar H looks like
D → T | B
T → t1Td1 | . . . | tkTdk | t1d1 | . . . | tkdk
B → b1Bd1 | . . . | bkBdk | b1d1 | . . . | bkdk.
Here D is the initial symbol. Clearly, there is an ambiguous word w ∈ L(H) if and only if the given instance of PCP has a match. Namely, deciding if a grammar is ambiguous is equivalent to deciding an instance of PCP. But since PCP is undecidable, we get that deciding whether a CFG is ambiguous is undecidable.
31.3 2D tilings
Show some of the pretty tiling pictures linked on the 273 lectures web page, walking through the following
basic ideas.
A tiling of the plane is periodic if it is generated by repeating the contents of some rectangular patch of
the plane. Otherwise the tiling is aperiodic.
A set of tiles is aperiodic if these tiles can tile the whole plane, but all tilings generated by this set are aperiodic.
Wang conjectured, in 1961, that if a set of tiles can cover the plane at all, it can cover the plane periodically.
This is a tempting conjecture, but it is wrong.
In 1966, Robert Berger found a set of 20426 Wang tiles that is aperiodic. A Wang tile is a square tile with colors on its edges; the colors need to match when you put the tiles together. (Picture: a single Wang tile with a green top edge, a blue left edge, a black right edge, and a red bottom edge.)
Various people found smaller and smaller aperiodic sets of Wang tiles. The minimum so far is due to
Karel Culik II, and it is made out of 13 tiles.
Other researchers have built aperiodic sets of tiles with pretty shapes, e.g. the Penrose tiles. (Show
pretty pictures.)
However, such a search procedure will loop forever if the set T can tile the plane, but only aperiodically.
In general, we have the following correspondence between 2D tilings and Turing machine behaviors:
tile set | Turing machine
can not cover the plane | halts
has a periodic tiling | loops, repeating configurations
has only aperiodic tilings | runs forever without repeating configurations
The starter row spells out the initial configuration: B q0 B B . . .
The tile set is engineered so that the rest of this row can only be filled by repeating the left and right tiles from this set.
Copy tiles: For every character c in Mw ’s alphabet, we have a tile that just copies c from one row to
the next. We also have an entirely blank tile which must (given the design of this tile set) cover the lower
half-plane.
Action tiles: A transition δ(q, c) = (r, d, R) of Mw is implemented using a “split tile” and a set of “merge tiles”, one for every character t in the tape alphabet:
split tile: top d, bottom qc, with the state r on its right edge;
merge tile: top rt, bottom t, with the state r on its left edge.
For example, a row containing . . . d a qc a b . . . forces the row above it to read . . . d a d ra b . . . , with the state r passed across the shared edge between the two middle columns.
So, this reduction shows that the tiling completion problem is undecidable.
Chapter 32
32.1 Introduction
The theory of computation is perhaps the fundamental theory of computer science. It sets out to define,
mathematically, what exactly computation is, what is feasible to solve using a computer, and also what is
not possible to solve using a computer.
The main objective is to define a computer mathematically, without the reliance on real-world computers,
hardware or software, or the plethora of programming languages we have in use today. The notion of a Turing
machine serves this purpose and defines what we believe is the crux of all computable functions.
The course is also about weaker forms of computation, concentrating on two classes, regular languages
and context-free languages. These two models help understand what we can do with restricted means of
computation, and offer a rich theory using which you can hone your mathematical skills in reasoning with
simple machines and the languages they define. However, they are not simply there as a weak form of
computation— the most attractive aspect of them is that problems formulated on them are tractable, i.e.
we can build efficient algorithms to reason with objects such as finite automata, context-free grammars and
pushdown automata. For example, we can model a piece of hardware (a circuit) as a finite-state system and
solve whether the circuit satisfies a property (like whether it performs addition of 16-bit registers correctly).
We can model the syntax of a programming language using a grammar, and build algorithms that check if
a string parses according to this grammar.
On the other hand, most problems that ask properties about Turing machines are undecidable. Undecid-
ability is an important topic in this course. You have seen and proved yourself that several tasks involving
Turing machines are unsolvable— i.e. no computer, no software, can solve it. For example, you know now
that there is no software that can check whether a C-program will halt on a particular input. This is quite
amazing, if you think about it. To prove something is possible is, of course, challenging, and you will learn
in other courses several ways of showing how something is possible. But to show something is impossible
is rare in computer science, and you will probably see no other instance of it in any other undergraduate
course. To show something is impossible requires an argument quite unlike any other, and you have seen the
method of diagonalization to prove impossibilities and reduction that help you prove infer one impossibility
from another. Impossibility results for regular languages and context-free languages are shown using the
pumping lemma.
In conclusion, you have formally learnt how to define a computer, and analyze the properties of com-
putable functions, which surely is the theoretical foundation of computer science.
The main players in our drama have been the four classes of languages: regular languages (REG), context-free languages (CFL), Turing-decidable languages (TM-DEC) and Turing-recognizable languages (TM-RECOG).
Regular languages are the languages accepted by deterministic finite automata (DFAs) and context-free
languages are those languages generated by context-free grammars (CFGs). Turing-decidable languages are
those languages L for which there are Turing machines that always halt on every input, and decide whether
a word is in L or not.
Turing-recognizable languages are more subtle. A language L is Turing-recognizable if there is a TM M
which (a) when run on a word in L, halts eventually and accepts, and (b) when run on a word not in L,
M either halts and rejects, or does not halt. In other words, a TM recognizing L has to halt and accept all
words in L, and for words not in L, can reject or go off into a loop.
The main things to remember are:
• Each of the above inclusions is strict: i.e. there is a language that is context-free but not regular, there
is a language that is TM-DEC but not context-free, etc.
Regular languages are trivially contained within context-free languages (as DFAs can be easily converted
to PDAs). However, it is not easy to see that a CFG/PDA for L can be converted to a TM deciding L. However,
this is possible (see Theorem 4.9). TM-DEC languages are clearly TM-RECOG as well, by definition.
For example, if Σ = {a, b}, then
• {a^i b^j | i, j ∈ N} is regular (and hence also a CFL, and TM-DEC and TM-RECOG),
The notion of what we call an “algorithm” in computer science accords with Turing-decidability. In other
words, when we build an algorithm for a decision problem in computer science, we want it to always halt
and say ’yes’ or ’no’. Hence the notion of a computable function is that it be TM-decidable.
Regular expressions over an alphabet Σ are built up by the grammar R ::= ε | a | ∅ | R1 ∪ R2 | R1 · R2 | R1∗, where a ∈ Σ.
• Non-deterministic finite automata can be converted to equivalent DFAs. This construction is the “subset construction” and is important. See Theorem 1.39 in Sipser. Intuitively, for any NFA, we can build a DFA that tracks the set of all states the NFA can be in. Handling ε-transitions is a bit complex, and you should know this construction. Hence NFAs are equivalent to DFAs.
• Regular languages are closed under union, intersection, complement, concatenation and Kleene-* (The-
orems 1.45, 1.47 and 1.49 in Sipser).
• Regular expressions define exactly the class of regular languages (Theorem 1.54). In other words, any
language generated by a regular expression is accepted by some DFA (Lemma 1.55) and any language
accepted by a DFA/NFA can be generated by a regular expression (Lemma 1.60).
• So the trinity: DFA ≡ NFA ≡ Regular Expression holds.
• The pumping lemma says that, if L is regular, then there is a p ∈ N such that for every s ∈ L with
|s| > p, there are words x, y, z with s = xyz such that (a) |y| > 0 (b) |xy| ≤ p and (c) for every i,
xy^i z ∈ L. In mathematical language,
L is regular ⟹ ∃p ∈ N. ∀s ∈ L : |s| > p ⟹ ( ∃x, y, z ∈ Σ∗ : s = xyz ∧ |y| > 0 ∧ |xy| ≤ p ∧ ∀i. xy^i z ∈ L ).
• The contrapositive to the pumping lemma says that, if for every p ∈ N, there is an s ∈ L with |s| > p, such that for every x, y, z with s = xyz, |y| > 0, and |xy| ≤ p, there is some i such that xy^i z ∉ L, then L is not regular.
In mathematical language,
[ ∀p ∈ N. ∃s ∈ L : |s| > p ∧ ( ∀x, y, z ∈ Σ∗ : (s = xyz ∧ |y| > 0 ∧ |xy| ≤ p) → ∃i. xy^i z ∉ L ) ] ⟹ L is not regular.
• The contrapositive to the pumping lemma gives us a way to prove a language is not regular. We take
an arbitrary p, and construct a particular word wp , which depends on p, such that wp ∈ L and |wp | > p.
Then we show that no matter which x, y, z are chosen such that wp = xyz, |xy| ≤ p and |y| > 0, there is an i such that xy^i z ∉ L.
Knowing how to prove a language non-regular using the pumping lemma is important.
• We can, using the above technique, show several languages to be non-regular, for example (see Eg. 1.73, 1.74, 1.75, 1.76, 1.77):
– {0^n 1^n | n ≥ 0} is not regular.
– {w | w has an equal number of 0s and 1s} is not regular.
– {ww | w ∈ Σ∗} is not regular.
– {1^{n²} | n ≥ 0} is not regular.
Choosing wp ∈ L should be done carefully and cleverly. However, the choice of i being 0 or 2 usually works for most examples.
Note that you are allowed to pick wp (but not p), and allowed to pick i (not x,y or z).
• Deterministic finite automata can be uniquely minimized. In other words, for any regular language,
there is a unique minimal automaton accepting it (here, by minimal, we mean an automaton with the
least number of states). Moreover, given a DFA A, we can build an efficient algorithm to build the
minimal DFA for the language L(A). This is not covered in Sipser; see the handout on suffix languages
and minimization:
http://uiuc.edu/class/fa07/cs273/Handouts/minimization/suffix.pdf
and the minimization algorithm:
http://uiuc.edu/class/fa07/cs273/Handouts/minimization/minimization.pdf.
For the final exam, you are not required to know this algorithm, but just know that regular languages
have a unique minimal DFA.
Turning to algorithms for manipulating automata, here are some things worth knowing (read Sipser
Section 4.1):
• We can build an algorithm that checks, given a DFA/NFA A, whether L(A) 6= ∅. In other words, the
problem of checking emptiness of an automaton is decidable. (see Sipser Theorems 4.1 and 4.2). In
fact, this algorithm runs in linear (i.e. O(n)) time.
• Automata are closed under operations union, intersection, complement, concatenation, Kleene-*, etc.
Moreover, we can build algorithms to do all these closures. That is, we can build algorithms that will
take two automata and compute an automaton accepting the union of the languages accepted by the
two automata, etc.
All constructions we did on automata are actually computable algorithmically. For example, we can
build algorithms to convert regular expressions to automata, automata to regular expressions, etc.
Several other questions regarding automata are also decidable: For example:
• Pushdown automata define exactly the class of context-free languages. I.e. PDA ≡ CFG. (Sipser
Theorems 2.20).
• Context-free languages are closed under union, concatenation and Kleene-*, but not under intersection
or complement. (See the last section in this article for more details.)
• Deterministic pushdown automata are strictly weaker than (nondeterministic) pushdown automata. (For instance, deterministic PDAs can be complemented, essentially by toggling the final states, while general PDAs can not be.)
• The membership problem for CFGs and PDAs is decidable. In particular, the CYK algorithm uses dynamic programming to solve the membership problem for CFGs in O(n³) time, and in fact produces a parse tree as well (a sketch of CYK appears after this list).
• The problem of checking if a context-free language generates all words (i.e. if L(G) = Σ∗ ) is undecid-
able. This is proved in Theorem 5.13, using context-free grammars that check computation histories
of Turing machines.
• The problem of checking if a context-free language is ambiguous is undecidable (Exercise 5.21 in Sipser),
and is proved by a reduction from the Post’s correspondence problem. You need to know this fact, not
the proof.
• The language {a^n b^n c^n | n ∈ N} is not a context-free language. There is a pumping lemma for context-free languages, and we can use it to show that this language is not context-free. You are not required to know this proof.
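The CYK algorithm mentioned above, sketched minimally in Python for a grammar in Chomsky Normal Form; the dictionary encoding of the grammar (a variable maps to a list of bodies, each body a single terminal or a pair of variables) is an assumption of this sketch.

def cyk(word, grammar, start):
    n = len(word)
    # table[i][j] = set of variables deriving word[i:j+1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, c in enumerate(word):
        for A, bodies in grammar.items():
            if c in bodies:
                table[i][i].add(A)
    for length in range(2, n + 1):         # substring length
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):          # split point
                for A, bodies in grammar.items():
                    for body in bodies:
                        if isinstance(body, tuple) and \
                           body[0] in table[i][k] and body[1] in table[k + 1][j]:
                            table[i][j].add(A)
    return start in table[0][n - 1]

# CNF grammar for {a^n b^n : n >= 1}: S -> AT | AB, T -> SB, A -> a, B -> b
G = {"S": [("A", "T"), ("A", "B")], "T": [("S", "B")], "A": ["a"], "B": ["b"]}
print(cyk("aaabbb", G, "S"))  # True
print(cyk("aab", G, "S"))     # False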
• AT M is not Turing-decidable (Theorem 4.11). This is the fundamental undecidable problem and is
shown undecidable using a diagonalization method, which takes the code for a purported TM deciding
AT M and pits it against itself to lead to a contradiction. Diagonalization is an important technique to
prove impossibility results (almost the only technique we know!).
• AT M is Turing-recognizable. This is easy to show: we can build a Turing machine that on input
hM, wi, simulates M on w and accepts if M accepts w. Hence the class TM-DEC is a strict subclass of
TM-RECOG.
• A language L is Turing-decidable iff both L and its complement are Turing-recognizable (Theorem 4.22).
If L is decidable, then its complement is decidable as well, and so both are Turing-recognizable. If L and its complement are both Turing-recognizable, we can build a decider for L by simulating the machines for the two languages in “parallel”, and accepting or rejecting depending on which of them accepts.
• A corollary to the above theorem is that if L is Turing-recognizable and not Turing-decidable, then the complement of L is not TM-recognizable. Hence, the complement of ATM is not even TM-recognizable (Corollary 4.23).
32.5.2 Reductions
Reductions are a technique to deduce undecidability of problems using another problem that is known to be
undecidable.
A language S reduces to a language T if, given a TM deciding T , we can build a TM that decides S.
In other words, if T is decidable, then S is decidable. Which, paraphrased, says that if S is undecidable
then T is undecidable.
Hence, to show T is undecidable, we choose a language S that we know is undecidable, and reduce S to
T.
Many reductions are from AT M ; to show L is undecidable, we try to reduce AT M to L, i.e. assuming we
have a decider for L, we show that we can build a decider for AT M . Since AT M has no decider, it follows
that L has no decider.
Reduction proofs are important to understand and learn. Reductions from AT M to languages that accept
Turing machine descriptions often go roughly like this:
• Assume L has a decider R; we build a decider D for AT M .
• D takes as input hM, wi.
D then modifies M to construct a TM NM,w .
D then feeds this machine NM,w to R.
Depending on whether R accepts or rejects, D accepts or rejects (sometimes switching the answer).
Using reductions we can prove several languages undecidable. For example, (see Theorems 5.1, 5.2, 5.3,
5.4)
• HALTTM = {⟨M, w⟩ | M is a TM and M halts on input w} is undecidable.
It is Turing-recognizable, though.
• ETM = {⟨M⟩ | M is a TM and L(M) = ∅} is undecidable.
It is not even Turing-recognizable, since its complement is Turing-recognizable (and ETM is undecidable).
• REGULARTM = {⟨M⟩ | L(M) is regular} is undecidable.
• EQTM = {⟨M1, M2⟩ | L(M1) = L(M2)} is undecidable.
Rice’s theorem generalizes many undecidability results. Consider a class P of Turing machine descriptions.
Assume that P is a property of Turing machines that depends only on the language of the Turing machines
(i.e. if M and N are Turing machines accepting the same language, then either both are in P or both are not
in P ). Also assume that P is not the empty set nor the set of all Turing machines. Then P is undecidable.
Note that if P was the empty set or the set of all Turing machine descriptions, then clearly it is decidable.
Note: There will be a question on reductions in the exam. The reduction will be one which is a direct
corollary of Rice’s theorem, but you will be asked to give a proof without using Rice’s theorem.
32.5.3 Other undecidability problems
There were several other problems that were shown to be undecidable. Knowing these are undecidable is
important; you will not be asked for proofs of any of these, however:
• A linear bounded automaton (LBA) is a Turing machine that uses only the space occupied by the input, and does not use any extra cells. The emptiness problem for LBAs is undecidable (Theorem 5.10):
i.e., ELBA = {⟨M⟩ | M is an LBA and L(M) = ∅} is undecidable.
However, the membership problem for LBAs is decidable (Theorem 5.9):
i.e., ALBA = {⟨M, w⟩ | M is an LBA accepting w} is decidable.
The results for regular languages are in Sipser and class notes.
See also
http://uiuc.edu/class/fa07/cs273/Handouts/closure/regular-closure.html.
Sipser doesn’t cover closure properties of context-free languages very clearly. However, note that closure
under union is easy as it is simple to combine two grammars to realize their union. Non-closure under
intersection follows from the fact that L1 = {a^i b^j c^k | i = j} and L2 = {a^i b^j c^k | j = k} are both context-free, but their intersection L1 ∩ L2 = {a^i b^j c^k | i = j = k} is not. Non-closure under complement is easy to see, as L = {a^n b^n c^n | n ∈ N} is not context-free but its complement is context-free. Closure under Kleene-* and
homomorphisms are easy as one can easily transform a grammar to do these operations.
See
http://uiuc.edu/class/fa07/cs273/Handouts/closure/cfl-closure.html
for more detailed proofs.
Turning to TM-DEC, these languages are closed under union, as you can run TM M1 followed by M2, and accept if one of them accepts. For intersection, you can run them one after the other, and accept if both accept. Closure
under complement is easy as we can swap the accept and reject states of a TM. Kleene-* and homomorphisms
were not covered, but it is easy to see that TM-DEC languages are closed under these operations (try them
as an exercise!).
Finally, TM-RECOG is closed under union as you can run two Turing machines in “parallel”, and accept if
one of them accepts. The class is closed under intersection, as we can run them one after another, and accept
if both accept. (Note the subtleties of the construction here; simulating a TM that recognizes a language has
to be done carefully as it may not halt). The class of TM-RECOG languages is not closed under complement
(for example, AT M is TM-RECOG but its complement is not). In fact, if L is TM-RECOG and its complement
is also TM-RECOG, then L is TM-DEC. Since we know AT M is not TM-DEC, it follows that its complement
is not TM-RECOG. TM-RECOG languages are closed under Kleene-* and homomorphisms— we leave these
as exercises.
32.6.2 Decision problems
For each class of languages, let’s consider four problems—
Notice that regular languages are the most tractable class, and context-free languages have the important
property that membership (Theorem 4.7) and emptiness (Theorem 4.8) are decidable. In particular, mem-
bership of context-free languages is close to the problem of parsing, and hence is an important algorithmic
problem. Context-free languages do not admit a decidable inclusion or equivalence problem (Theorem 5.13
shows that checking if a CFG generates all words is undecidable; we can reduce this to both the problem of
inclusion– L(A) ⊆ L(B)– and equivalence– L(A) = L(B)– by setting A to be a CFG generating all words).
For Turing machines, almost nothing interesting is decidable. However, note that the membership
problem for Turing machines (AT M ) (but not emptiness problem for Turing machines (ET M )) is Turing-
recognizable.
Part II
Discussions
Chapter 33
Discussion 1: Review
20 January 2009
Purpose: Most of this material is review from CS 173, though students may have forgotten
some of it, especially details of notation. The exceptions are the sections on strings and graph
induction.
Before you start, introduce yourself. Promise office hours will be posted very soon and encourage them
to come.
Do not forget to register with the news server and start reading the newsgroups!
33.2 Numbers
What are Z, N (no zero), N0? (Mention quickly Q and R.)
33.3 Divisibility
What does it mean for x to divide q? Namely, there exists integer n such that q = xn. As such, every
number n is divisible by 1 and n (i.e., itself).
An integer number is even if it can be divided by 2. A number which is not even, is odd.
33.4 √2 is not rational
A number x is rational if it can be written as the ratio of two integers, x = α/β. The fraction α/β is irreducible if α and β do not have any common divisors (except 1, of course). Note that a rational number has a unique way to be written in irreducible form.
Lemma 33.4.1 An integer k is even if and only if k² is even.
Proof: If k is even, then it can be written as k = 2u, where u is an integer. As such, k² = (2u)² = 4u² is even, since it can be divided by 2. As for the other possibility, if k = 2u + 1, then
k² = (2u + 1)² = 4u² + 4u + 1 = 2(2u² + 2u) + 1
is odd (since it is the sum of an even number and an odd number), implying the claim.
Theorem 33.4.2 The number √2 is not rational (i.e., √2 is an irrational number).
Proof: Assume, for the sake of contradiction, that √2 is rational and can be written as the irreducible ratio √2 = α/β.
Let us square both size of this equation. We get that
α2
2= .
β2
That is, the number 2 is a divisor of α². Namely, α² is an even number. But then, by Lemma 33.4.1, α must be an even number.
So, let α = 2a. We have that 2 = (2a)²/β², which implies that 2β² = (2a)² = 4a². As such, β² = 2a².
Namely, β² is even, which implies, again, that β is even. As such, let us write β = 2b, where b is an integer.
We thus have that
√2 = α/β = 2a/(2b) = a/b.
Namely, we started from a rational number in irreducible form (i.e., α/β) and we reduced it further to a/b.
But this is impossible. A contradiction. Our assumption that √2 is a rational number is thus false. We conclude
that √2 is irrational.
33.6 Strings
strings, empty string, sets of strings
substring, prefix, suffix, string concatenation
What happens when you concatenate the empty string with another string?
Suppose w is a string. Is the empty string a substring of w? (Yes!)
33.7 Recursive definition
Recursive definition (concrete example).
Let U be the set that contains the point (0, 0). Also:
• if (x, y) ∈ U , then (x + 1, y) ∈ U .
• if (x, y) ∈ U , then (x, y + 1) ∈ U
Q: What is U ?
A: Clearly, (0, 1) ∈ U, (0, 2) ∈ U, ..., (0, k) ∈ U (for any k ≥ 0).
As such, (1, k) ∈ U (for any k ≥ 0).
As such, (2, k) ∈ U (for any k ≥ 0).
As such, (j, k) ∈ U (for any j ≥ 0, k ≥ 0).
Note that U = N0 × N0.
(A more complicated example of this is in the homework.)
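As a small illustration (ours, not part of the discussion), a derivation of any (j, k) from (0, 0) can be produced mechanically, mirroring the induction above:

def derive(j, k):
    # Build a derivation of (j, k) in U, starting from the base point.
    steps = [(0, 0)]
    for y in range(1, k + 1):     # rule: (x, y) in U  =>  (x, y+1) in U
        steps.append((0, y))
    for x in range(1, j + 1):     # rule: (x, y) in U  =>  (x+1, y) in U
        steps.append((x, k))
    return steps

print(derive(2, 1))   # [(0, 0), (0, 1), (1, 1), (2, 1)]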
33.8 Graphs

Lemma 33.8.1 A connected acyclic graph G on n > 1 vertices must contain a leaf.
Proof: Indeed, start from a vertex v ∈ V(G). If it is a leaf, we are done; otherwise, since G is connected,
there must be an edge e = vu of G incident to v. Travel on this edge to u, and mark the edge e as used.
Repeat this process of “walking” in the graph till you reach a leaf (and then you stop), where the walk can
not use an edge that was used before.
If this walk process reached a leaf then we are done. The other possibility is that the walk never ends.
But then, we must visit some vertex we already visited a second time (since the graph is finite). But that
would imply that G contains a cycle. But that is impossible, since G is acyclic (i.e., it does not contain
cycles).
Claim 33.8.2 A connected acyclic graph G over n vertices has exactly n − 1 edges.
Proof: The proof is by induction on n.
The base of the induction is n = 2. Here we have two vertices, and since the graph is connected, it must
contain the edge connecting the two vertices, implying the claim.
As for n > 2, we know by the above lemma that G contains a leaf, and let w denote this leaf.
Consider the graph H formed from G by removing the vertex w and the single edge e0 attached to
it. Clearly the graph H is connected and acyclic, and it has n − 1 vertices. By the induction hypothesis, the
graph H has n − 2 edges. But then, the graph G has all the edges of H plus one (i.e., e0 ). Namely, G has
(n − 2) + 1 = n − 1 edges, as claimed.
Chapter 34
Purpose: This discussion demonstrates a few constructions of DFAs. However, its main
purpose is to show how to move from a diagram describing a DFA into a formal description,
in particular of the transition function.
This material (probably) cannot be covered in one discussion section.
Questions on homework 1?
Any questions? Complaints, etc?
[Figure: state diagram of the DFA, with dead state H (self-loop on a, b).]
Advice To TA:: Do not erase this diagram from the board, you would need to modify it shortly,
for the next example. end
This automaton is formally the tuple (Q, Σ, δ, S, F):
1. Q = {S, T, H, q0 , q1 } - states.
2. Σ = {a, b} - alphabet.
211
δ    a    b
S    T    H
T    q0   H
H    H    H
q0   H    q1
q1   H    q0
34.1.2 aab^{5i}
Consider the following language:
L5 = { aab^n | n is a multiple of 5 }.
[Figure: the DFA for L5. The a-transitions go S → T → q0; the b-transitions cycle q0 → q1 → q2 → q3 → q4 → q0; all other transitions go to the dead state H.]
Advice To TA:: Do not erase this diagram from the board, you would need to modify it shortly,
for the next example. end
This automaton is formally the tuple (Q, Σ, δ, S, F):
1. Q = {S, T, H, q0, q1, q2, q3, q4} - states.
2. Σ = {a, b} - alphabet.
3. δ : Q × Σ → Q - see table.
4. S is the start state.
5. F = {q0} is the set of accepting states.

δ    a    b
S    T    H
T    q0   H
H    H    H
q0   H    q1
q1   H    q2
q2   H    q3
q3   H    q4
q4   H    q0
34.1.3 aab^{ki}
Let k be a fixed constant, and consider the following language:
Lk = { aab^n | n is a multiple of k }.
Advice To TA:: Skip the first two forms in the discussion. Show only the one in Eq. (34.1). end
Another way of writing the transition function δk , for the above example, is the following:
δk (S, a) = T,
δk (S, b) = H,
δk (T, a) = q0 ,
δk (T, b) = H,
δk (H, a) = H,
δk (H, b) = H,
δk (qi , a) = H, ∀i
δk (qi , b) = qi+1 for i < k − 1,
δk (qk−1 , b) = q0 .
213
This can be made slightly more compact using the mod notation:

δk(s, x) =   T                  if s = S, x = a
             H                  if s = S or T, x = b
             q0                 if s = T, x = a
             H                  if s = H, x = a or b          (34.1)
             H                  if s = qi, x = a, ∀i
             q_{(i+1) mod k}    if s = qi, x = b, ∀i.
Note that using good state names helps you describe the automaton compactly (thus q0 here is
not the starting state). Generally speaking, the shorter your description, the less work needs to be
done, and the lower the chance of a silly mistake.
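As an illustration (ours, not part of the discussion), the compact description of δk translates directly into a short simulator; the function names are our own:

def delta_k(k, s, x):
    # The transition function of Eq. (34.1), with states named by strings.
    if s == 'S':
        return 'T' if x == 'a' else 'H'
    if s == 'T':
        return 'q0' if x == 'a' else 'H'
    if s == 'H':
        return 'H'                       # dead state
    i = int(s[1:])                       # s is one of q0, ..., q(k-1)
    return 'H' if x == 'a' else 'q%d' % ((i + 1) % k)

def accepts(k, w):
    s = 'S'
    for x in w:
        s = delta_k(k, s, x)
    return s == 'q0'                     # F = {q0}

print(accepts(5, 'aa' + 'b' * 10))   # True: 10 is a multiple of 5
print(accepts(5, 'aa' + 'b' * 7))    # False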
34.3 The third character from the end is 0

1. States:
Q = {e, q0, q1, q00, q01, q10, q11, q000, q001, q010, q011, q100, q101, q110, q111}
2. Σ = {0, 1} - alphabet.
3. δ : Q × Σ → Q - see table.
4. e is the start state.
5. F = {q000, q001, q010, q011} is the set of accepting states.

δ      0      1
e      q0     q1
q0     q00    q01
q1     q10    q11
q00    q000   q001
q01    q010   q011
q10    q100   q101
q11    q110   q111
q000   q000   q001
q001   q010   q011
q010   q100   q101
q011   q110   q111
q100   q000   q001
q101   q010   q011
q110   q100   q101
q111   q110   q111
Advice To TA:: Please show the explicit long way of writing the transition table (shown above), and
only then show the following more compact way. It is beneficial to see how using a more formal representation
can save you a lot of time and space. Say it explicitly in the discussion. end
0 1
e q0 q1
qx qx0 qx1
qxy qxy0 qxy1
qxyz qyz0 qyz1
This is clearly the most compact way to describe this transition function. And here is a drawing of
this automaton:
[Figure: the state diagram of this automaton.]
Advice To TA:: Trust me. You do not want to erase this diagram before doing the next example. end
34.3.1 Being smarter
Since we care only whether the third character from the end is zero, we can pretend that when the input
starts, the automaton has already seen three ones on the input. Thus, we set the initial state to q111. Now, we can
get rid of the special states we had before.
1. States:
Q = {q000, q001, q010, q011, q100, q101, q110, q111}
2. Σ = {0, 1} - alphabet.
3. δ : Q × Σ → Q - see table:

       0      1
qxyz   qyz0   qyz1

4. q111 is the start state.
5. F = {q000, q001, q010, q011} is the set of accepting states.
This brings to the forefront several issues: (i) the most natural way to design an automaton does not
necessarily lead to the simplest automaton, (ii) a bit of thinking ahead of time will save you much pain, and
(iii) how do you know that what you came up with is the simplest (i.e., fewest number of states) automaton
accepting a language?
The third question is interesting, and we will come back to it later in the course.
Let L'k be the language of all binary strings such that none of the last k characters is 0. We can of course
adapt the previous automaton to this language, by changing the accepting states and the start state. However,
we can solve it more efficiently, by remembering the length of the maximal suffix of ones in the input seen
so far.
In particular, let qi be the state reached when the suffix of the input is a zero followed by i ones. Clearly, qk
is the accept state, and q0 is the starting state. The transition function is also simple. If the automaton sees
a 0, it goes back to q0. If it is at qi and it reads a 1, it goes to q(i+1) if i < k (and stays at qk otherwise). We get the following automaton.
1. States: Q = { qi | i = 0, . . . , k }.
2. Σ = {0, 1} - alphabet.
3. δk : Q × Σ → Q, where
δk(qi, x) = q0 if x = 0, and δk(qi, x) = q_min(i+1,k) if x = 1.
So, a little change in the definition of the language can make a dramatic difference in the number of states
needed.
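Here is a small sketch (ours) of this automaton in code; the current state is exactly min(number of trailing ones, k):

def accepts_last_k_ones(k, w):
    i = 0                                # start state q0
    for x in w:
        i = 0 if x == '0' else min(i + 1, k)
    return i == k                        # accept state q_k

print(accepts_last_k_ones(3, '0111'))    # True
print(accepts_last_k_ones(3, '11011'))   # False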
Chapter 35
Purpose: This discussion demonstrates a few simple NFAs, and how to formally define a
NFA. We also demonstrate that complementing a NFA is a tricky business.
Questions on homework 2?
Any questions? Complaints, etc?
In the above NFA, we have δ(A, 0) = {C}, despite the ε-transition from C to D. As such, δ(A, 0) ≠
{C, D}. If δ(A, 0) = {C, D} then the NFA is a different NFA:
[Figure: the modified NFA.]
In any case, the NFA M1 (depicted in the first figure) is the 5-tuple (Q, Σ, δ, A, F), where
δ : Q × Σε → P(Q).
Here Σ = {0, 1}, Q = {A, B, C, D, E, G, H}, and F = {H}.

δ    0       1     ε
A    {C}     {B}   ∅
B    {E, G}  ∅     ∅
C    ∅       ∅     {D}
D    ∅       {E}   ∅
E    ∅       {H}   ∅
G    {H}     {E}   ∅
H    ∅       ∅     ∅

[Figure: the NFA M1.]
Claim 35.1.1 The NFA N accepts a string w ∈ Σ*, if and only if there exist two strings x, y ∈ Σ*, such
that w = xy and x ∈ L(M) and y ∈ L(M').
Proof: If x ∈ L(M) then there is an accepting trace (i.e., a sequence of states and inputs that shows that x
is being accepted by M). Let the sequence of states be A = r0, r1, . . . , rα, and the corresponding input
sequence be x1, . . . , xα ∈ Σε. Here x = x1 x2 . . . xα (note that some of these characters might be ε).
Similarly, let A' = r'0, r'1, . . . , r'β be the accepting trace of M' accepting y, with the input characters y1, y2, . . . , yβ ∈
Σε, where y = y1 y2 . . . yβ.
Note that, by our assumption, rα = f. As such, the following is an accepting trace of w = xy for N:
r0, r1, . . . , rα = f, r'0, r'1, . . . , r'β (using the ε-transition from f to A' = r'0).
Indeed, it is a valid trace, as can be easily verified, and r'β ∈ F' (otherwise y would not be in L(M')).
Similarly, given a word w ∈ L(N) and an accepting trace for it, we can break this trace into two parts.
The first part is the trace before using the ε-transition f → A', and the other is the rest of the trace. Clearly, if
we remove this transition from the given accepting trace, we end up with two accepting traces for M and
M', implying that we can break w into two strings x and y, such that x ∈ L(M) and y ∈ L(M').
had 8 states. Note that the following NFA does the same job, by guessing the position of the third character from
the end of the string.
[Figure: the NFA M3. State A has a self-loop on 0,1; A → B on 0; B → C on 0,1; C → D on 0,1; D is accepting.]
Q: Is there a language L for which some DFA has a smaller number of states than any NFA for L?
A: No. Because any DFA is also a NFA.
Naively, the easiest thing would be to complement the accepting states of the NFA. We get the following NFA M4.
[Figure: the NFA M4, obtained from M3 by complementing the accepting states.]
But this is of course complete and total nonsense. Indeed, L(M4) = Σ*, which is definitely
not the complement of L3. Here is the correct solution.
[Figure: the NFA M5.]
The conclusion of this tragic and sad example is that complementing a NFA is a non-trivial task (unlike
DFAs, where all you needed to do was flip the accepting/non-accepting states). So, for some tasks
DFAs are better than NFAs, and vice versa.
Designing a DFA for L, using the most obvious logic, we will have:
[Figure: the DFA designed in the obvious way.]
With NFA we can go this way:
[Figure: the NFA, with two branches C, D, E and C', D', E'.]
Note that the NFA approach is easily extendable to more than 2 substrings.
abc?(ba)*b
where ? represents a substring of length 1 or more, and * represents 0 or more repetitions of the previous expression.
The NFA for this pattern would be
[Figure: the NFA for this pattern.]
How do we formally show a string w is accepted by M? We need a sequence of states r0, r1, . . . , rn such that:
1. r0 = q0
2. ri+1 = δ(ri, wi+1), for i = 0, . . . , n − 1
3. rn ∈ F
Let us show that the automaton (on page 1) accepts the string 101.
We show that there exist states r0, r1, . . . , r3 satisfying the above three conditions. We claim that
the sequence A, B, E, G satisfies the three conditions.
1. A = q0
2. δ(A, 1) = B
δ(B, 0) = E
δ(E, 1) = G
3. G ∈ F
Chapter 36
Discussion 4: More on
non-deterministic finite automatas
10 February 2008
Questions on homework 3?
[Figure: the NFA, with states A, B, C, D, E and ε-transitions.]
And after removing ε-transitions:
[Figure: the NFA after removing the ε-transitions.]
General rule: For every state q ∈ Q and every a ∈ Σ, compute the set of all states reachable from q when
we feed it just the character a.
[Figure: part of the resulting DFA, with subset states {A}, {A, B, C, D} and {A, B, C, D, E}.]
Note that to convert an NFA to a DFA, here we first remove ε-transitions and then we apply the subset
construction. You could reverse the order of these two operations (like what Sipser does), but note that the
new start state will then be the ε-closure of the old start state.
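For concreteness, here is a sketch (ours, not from the discussion) of the subset construction on an ε-free NFA, represented as a hypothetical dictionary mapping (state, character) pairs to sets of states:

from collections import deque

def subset_construction(delta, start, accepting, alphabet):
    start_set = frozenset([start])
    dfa, queue = {}, deque([start_set])
    while queue:
        S = queue.popleft()
        if S in dfa:
            continue                     # subset already processed
        dfa[S] = {}
        for a in alphabet:
            # All states reachable from some state of S on character a.
            T = frozenset(q for s in S for q in delta.get((s, a), ()))
            dfa[S][a] = T
            queue.append(T)
    # A subset state is accepting iff it contains an accepting NFA state.
    final = [S for S in dfa if S & set(accepting)]
    return dfa, start_set, final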
Chapter 37
Discussion 5: More on
non-deterministic finite automatas
17 February 2009
Questions on homework 4?
Any questions? Complaints, etc?
Direct proof

Lemma 37.1.2 The language L2 = { x | #a(x) + #b(x) = #c(x) } is not regular.
have that M accepts the string c^j if we start from qj, since M accepts the string a^j c^j. But then, it must
be that M accepts a^i c^j. Indeed, after M reads a^i it is in state qi = qj, and we know that it accepts c^j if
we start from qj. But this is a contradiction, since a^i c^j ∉ L2, for i ≠ j.
This implies that M has an infinite number of states, which is of course impossible.
By closure properties
Here is another proof of Lemma 37.1.2.
Proof: Assume, for the sake of contradiction, that L2 is regular. Then, since regular languages are closed
under intersection, and the language a*c* is regular, we have that L3 = L2 ∩ a*c* is regular. But L3 is
clearly the language
L3 = { a^n c^n | n ≥ 0 },
which is not regular. Indeed, if L3 were regular then f(L3) would be regular (by closure under homomorphism),
which is false by Lemma 37.1.1, where f(·) is the homomorphism mapping f(a) = 0, f(b) = ε,
and f(c) = 1.
is not regular.
Is it regular or not? It seems natural to think that it is not regular. However, it is in fact regular. Indeed, L7
is the set of all strings where the first and last character are the same, which is definitely a regular language.
Chapter 38
Questions on homework 5?
Any questions? Complaints, etc?
L/a = { x | xa ∈ L }.
This operation preserves regularity: consider a DFA for L with set of final states F. Redefine the set of final states as follows:
F' = { s | δ(s, a) ∈ F }.
Again consider a DFA for L and remove all outgoing transitions from final states of that DFA to have a NFA
for min(L).
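Both constructions are mechanical. A quick sketch (ours, not from the notes), for automata given as explicit data; all names are illustrative:

def quotient_dfa(states, delta, start, F, a):
    # DFA for L/a: same states and transitions, but a state is final
    # iff reading `a` from it lands in the old final set F.
    F_prime = {s for s in states if delta[(s, a)] in F}
    return states, delta, start, F_prime

def min_nfa(states, delta, start, F):
    # NFA for min(L): drop all outgoing transitions of final states.
    delta_prime = {(s, c): t for (s, c), t in delta.items() if s not in F}
    return states, delta_prime, start, F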
Chapter 39
S → aSb | ε.
S → aSb | bSa | SS | ε,
Proof: It is easy to see that every string that G generates has an equal number of a's and b's. As such,
L(G) ⊆ L^{mix}_{a=b}.
We will use induction on the length of the string x ∈ L(G), 2n = |x|. For n = 0, we can generate ε by G. For
n = 1, we can generate both ab and ba by G.
Now for n > 1, consider a balanced string of length 2n, x = x1 x2 x3 · · · x2n ∈ L^{mix}_{a=b}. Let #c(y) be the
number of appearances of the character c in the string y. Let αi = #a (x1 · · · xi ) − #b (x1 · · · xi ). Observe
that α0 = α2n = 0. If αj = 0, for some 1 < j < 2n, then we can break x into two words y = x1 . . . xj and
z = xj+1 . . . x2n that are both balanced. By induction, y, z ∈ L(G), and as such S ⇒∗ y and S ⇒∗ z. This
implies that
S ⇒ SS ⇒∗ yz = x.
Namely, x ∈ L(G).
The remaining case is that αj ≠ 0 for j = 2, . . . , 2n − 1. If x1 = a then α1 = 1. As such, for all
j = 1, . . . , 2n − 1, we must have that αj > 0. But α2n = 0, which implies that α2n−1 = 1. We conclude
that x1 = a and x2n = b. As such, x2 . . . x2n−1 is a balanced word, which by induction is generated by G.
Thus, x can be derived via S → aSb ⇒* a x2 x3 . . . x2n−1 b = x. Thus, x ∈ L(G).
The case x1 = b is handled in a similar fashion, and implies that x ∈ L(G) also in this case. We conclude
that L^{mix}_{a=b} ⊆ L(G).
Thus L^{mix}_{a=b} = L(G).
If n ≠ m then either n > m or m > n; therefore we can design this grammar by first starting with the basic
grammar for the case n = m, and then transitioning into making more a's or b's.
Let A be the non-terminal representing “choosing” to generate more a's than b's, and B the non-terminal
for the other case. One grammar that generates La≠b is therefore:
S → aSb | aA | bB,    A → aA | ε,    B → bB | ε.
We can essentially combine two copies of the previous grammar (with one version that works on b and c) in
order to create a grammar that generates L2:
S → S_{a=b} C | A S_{b=c},
where A → aA | ε and C → cC | ε, and S_{a=b} and S_{b=c} generate { a^n b^n } and { b^n c^n }, respectively.
Exercise 39.1.2 Derive a CFG for the language L'4 = { a^i b^j c^k | i = j or j = k or i = k }.
We can combine copies of the previous grammars (one for each of the three cases) in order to create a
grammar that generates L'4, using
A → Aa | ε,    B → Bb | ε,    C → Cc | ε.
39.1.6 Anything but balanced
Let Σ = {a, b}, and let L = Σ* \ { a^n b^n | n ≥ 1 }.
The idea is to first generate all the words that contain a b somewhere before an a. The grammar for these words is
S1 → ZbZaZ,    Z → aZ | bZ | ε.
Clearly L(S1) ⊆ L. The only words we miss must have all their a's before their b's. But these are all
words of the form a^i b^j, where i ≠ j ≥ 0. And we already saw how to generate such words in Section 39.1.3.
Putting everything together, we get the following grammar.
S → S1 | S_{a≠b}
S1 → ZbZaZ
Z → aZ | bZ | ε
S_{a≠b} → a S_{a≠b} b | aA | bB,
A → aA | ε,
B → bB | ε.
where #a(w) is the number of appearances of the character a in w. The grammar for this language is
S → ε | bS | aS0.
E → E ∗ E | E + E | N,    N → 0N | 1N | 0 | 1.
The ambiguity is caused because there is no inherent preference for combining expressions with ∗ over +
or vice versa. It can be fixed by introducing a preference:
E → E + E | T,    T → N ∗ T | N,    N → 0N | 1N | 0 | 1.
However, some languages are inherently ambiguous: no unambiguous context-free grammar can generate
them.
Consider the following language:
L = { a^n b^n c^k d^k | n, k ≥ 1 } ∪ { a^n b^k c^k d^n | n, k ≥ 1 }.
That is, a word a^i b^j c^k d^m is in L if either:
1. the number of a's equals the number of b's, and the number of c's equals the number of d's; or
2. the number of a's equals the number of d's, and the number of b's equals the number of c's.
The reason why all grammars for this language must be ambiguous can be seen in strings of the form
a^n b^n c^n d^n, for n ≥ 1. Any grammar needs some way of generating such a string so that either the a's
and b's are equal and the c's and d's are equal, or the a's and d's are equal and the b's and c's are equal.
When generating equal a's and b's, it must still be possible to have the same number of c's and d's. When
generating equal a's and d's, it must still be possible to have the same number of b's and c's. No matter
what grammar is designed, any string of the form a^n b^n c^n d^n, n ≥ 1, must have at least two possible parse
trees.
(This is of course only an intuitive explanation. A formal proof that any grammar for this language must
be ambiguous is considerably more tedious and harder.)
It should be clear that this language cannot be regular. However, it may not be obvious that we can in
fact design a context-free grammar for it. The strings x and y are guaranteed to be different if, for some k, the kth
character is 0 in x and 1 in y (or vice versa). It is important to notice that we should not try to build x
and y separately as, in a CFG, we would have no way to enforce them being of the same length. Instead,
we just remember that if the string is of length 2n, the first n characters are considered x and the second n
characters are y. Similarly, notice that we cannot choose k ahead of time, for similar reasons.
So, consider breaking the string into two parts X and Y. Now, X is a word of odd length with 1 in the middle (and we definitely know how to generate this kind of
words using context-free grammars). And Y is a word of odd length, with 0 in the middle. In particular,
any word of L can be written as either XY or YX, where X and Y are as above. We conclude that the
grammar for this language is
S → XY | YX,    X → DXD | 1,    Y → DYD | 0,    D → 0 | 1.
Chapter 40
Questions on homework 7?
Any questions? Complaints, etc?
[Figure: a PDA with states p, q, r, s and transitions ε,ε→$ (from p to q), b,a→ε, and ε,$→ε (into s).]
The equivalent grammar is (note that in this case we can simplify it to get our familiar grammar for L):
Chapter 41
Questions on homework 8?
Any questions? Complaints, etc?
S → ASA | aB
A → B | S
B → ε
Removing the ε-productions and the unit rules yields:
S → AS | SA | ASA | a
A → S
Chapter 42
Questions on homework 9?
Any questions? Complaints, etc?
4. yzw is between the first run of b's and the second run of a's: xzv ∉ L (why?)
5. yzw is in the second run of a's: x y² z w² v ∉ L (why?).
6. yzw is between the second run of a's and b's: xzv ∉ L (why?)
Chapter 43
[Figure: the transition diagram of the three-tape TM (states n0, . . . , n5) implementing the mod operation.]
The basic idea is that immediately after we move the head of tape 2 back to the beginning of the tape (state
n2), we write a $ on the first tape (i.e., the transition from n3 to n1). Thus, conceptually, every time this loop
is performed, a block of b characters of 0 is chopped off tape 1.
To use this box as a template in our future designs (more or less like macros in C++), we name this
Mod(tape 1, tape 2, tape 3).
[Figure 43.1: the transition diagram of the three-tape TM for binary addition, with states L1 (carry = 0), L2 (carry = 1), L3, and qacc.]
43.2.2 Multiplication
Question 43.2.2 Design a TM that, given $0^a on tape 1 and $0^b on tape 2, writes 0^{ab} on tape 3.
Solution: The idea is to append a copies of tape 2 at the end of tape 3.
[Figure: the transition diagram of the three-tape TM (states n0, . . . , n4) implementing multiplication.]
To use this box as a template in our future designs (more or less like macros in C++), we name this
Mult(tape 1, tape 2, tape 3).
where w1, w2, w3 ∈ {0, 1}* and w3 is the binary addition of w1 and w2 (the numbers are written with the
least significant bit first).
Thus, if tape 1 contains 01 (i.e., this is the number 10₂ = 2) and tape 2 contains 1011 (i.e., this is the binary
number 1101₂ = 13), then the output should be 1111 (which is 15).
/*** t2 = 0^a and t3 = 0^b ***/
CLEAR(t1)
CLEAR(t7)
Mod(t2, t3, t5)
/*** t5 = 0^(a mod b) ***/
do
    if EQ(t7, t5) then accept.
    COPY(t1, t4)
    Mult(t1, t4, t6)
    Mod(t6, t3, t7)
while true
Solution: We sum starting from the least significant digit, using the normal procedure. See Figure 43.1. Here,
a transition of the form 0, 1, ␣ → 0, 1, 0, R, R, R stands for the situation where the TM reads 0 on the first
tape, 1 on the second tape and ␣ on the third tape; next, it writes 0, 1 and 0 to these three tapes, respectively,
and moves the three heads to the right.
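The same computation, sketched in code (ours): the variable carry plays the role of the two states L1 (carry = 0) and L2 (carry = 1), and a missing digit is treated like the blank ␣, i.e., as 0:

def add_lsb_first(w1, w2):
    # Numbers are strings with the least significant bit first.
    out, carry = [], 0
    for i in range(max(len(w1), len(w2))):
        b1 = int(w1[i]) if i < len(w1) else 0
        b2 = int(w2[i]) if i < len(w2) else 0
        s = b1 + b2 + carry
        out.append(str(s % 2))
        carry = s // 2
    if carry:
        out.append('1')
    return ''.join(out)

print(add_lsb_first('01', '1011'))   # '1111', i.e., 2 + 13 = 15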
43.3 MTM
Question 43.3.1 Show that an MTM (an imaginary more powerful TM) whose head can read the character
under the head and the character to the left of the head (if such a character does not exist, it will read a ␣,
i.e. blank character) and can just rewrite the character under the head, is equivalent to a normal TM.
Solution: First of all it is obvious that an MTM can simulate a TM since it can ignore the extra
information that it can read using its special head.
Now observe that a TM can simulate an MTM this way: For making a move using the transition function
of MTM, the TM that simulates it must read the character under the head (which a normal TM can) and the
237
character to the left of the head (which a normal TM can't). What the simulating TM does is remember
the current state of the MTM inside its own states (note that we have done this kind of “remembering a
finite amount of information inside states by redefining them, e.g., extending them to tuples” several times in class),
bring the head to the left, read that character and remember it inside its states, and then move the head to
the right. Now the head is in its original place, and our TM knows the missed character and can perform the
correct move using the MTM's transition function.
Chapter 44
Questions on homework 8?
Any questions? Complaints, etc?
Definition 44.1.1 For two arbitrary sets (maybe infinite) X and Y , we have |X| ≤ |Y |, iff there exists an
injective mapping f : X → Y .
Definition 44.1.2 Two arbitrary sets (maybe infinite) X and Y are of the same cardinality (i.e., same
“size”), denoted |X| = |Y|, if there exists an injective and onto mapping f : X → Y.
Observation 44.1.3 For two sets X and Y , if |X| ≤ |Y | and |Y | ≤ |X| then |X| = |Y |.
For N, the set of all natural numbers, we define |N| = ℵ0 . Any set X, with |X| ≤ ℵ0 , is referred to as a
countable set.
Claim 44.1.4 For any set X, we have |X| < |P(X)|. That is, |X| ≤ |P(X)| and |P(X)| ≠ |X|.
(Here P(X) is the power set of X.)
Proof: It is easy to verify that |X| ≤ |P(X)|. Indeed, consider the mapping h(x) = {x} ∈ P(X), for all
x ∈ X.
So, assume for the sake of contradiction that |X| = |P(X)|, and let f be a one-to-one and onto mapping
from X onto P(X). Next, consider the set B = { x ∈ X | x ∉ f(x) }.
Now, consider the element b = f⁻¹(B), and consider the question of whether it is a member of the set B
or not. If b = f⁻¹(B) ∈ B, then by the definition of B, we have b = f⁻¹(B) ∉ B. Similarly, if
b = f⁻¹(B) ∉ B, then by the definition of B, we have b = f⁻¹(B) ∈ B.
A contradiction. We conclude that our assumption that f exists (i.e., that X and P(X) have the same
cardinality) is false. We conclude that |X| ≠ |P(X)|.
Definition 44.1.5 An enumerator T for a language L is a Turing Machine that writes out a list of all
strings in L. It has no input tape, only an output tape on which it prints the strings, with some separator
character (say, #) printed between them.
The strings can be printed in any order, and the enumerator is allowed to print duplicates of a string it
already printed. However, every string in L must eventually be printed by T. Naturally, all the
strings printed by T are in L.
We remind the reader that two natural numbers a and b are relatively prime (or coprime) if they
have no common factor other than 1 or, equivalently, if their greatest common divisor is 1. Thus 2 and 3 are
coprime, but 4 and 6 are not coprime. Thus, although 2/3 = 4/6, we will consider only the representation
2/3 to be in the set Q.
We show that this set is enumerable by giving the pseudo-code for an enumerator for it.
EnumerateRationals
for i = 1 . . . ∞ do
for x = 0 . . . i − 1 do
y =i−x
if x, y are relatively prime then
print x/y onto the tape followed by #
print −x/y onto the tape followed by #.
It is obvious that every rational number will be enumerated at some point. Any rational number is of
the form a/b, and as such, when i = a + b and y = b, it will enumerate this rational number.
It helps to picture this as travelling along each line x + y = i.
It is now easy to verify that given w ∈ Σ∗ , we can figure out the i such that wi = w. Similarly, given i,
we can output the word wi .
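The implied enumeration w1, w2, . . . of Σ* lists words by length, and lexicographically within each length. A small sketch (ours) for Σ = {a, b}:

from itertools import count, islice, product

def words():
    for n in count(0):                   # lengths 0, 1, 2, ...
        for t in product('ab', repeat=n):
            yield ''.join(t)

print(list(islice(words(), 7)))   # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']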
We just demonstrated that there is a one-to-one and onto mapping from N to Σ∗ , and we can conclude
the following.
44.4 Languages are not countable
Let
Lall = { L | L is some language, and L ⊆ {a, b}* }.
Chapter 45
Solution: Let N be a decider for L. To decide whether ⟨M, w⟩ ∈ ATM, build a new TM M' (with input x) that
simulates M on w; if M accepts and x is 00 (how can it verify this?), M' accepts, otherwise it rejects. Now
observe that:
⟨M, w⟩ ∈ ATM ⟺ ⟨M'⟩ ∈ L.
The left-hand side can thus be decided using N.
Solution: Let N be a decider for L. To decide whether ⟨M, w⟩ ∈ ATM, build a new TM M' (with input x) that
simulates M on w; if M accepts and |x| is even (how can it verify this?), M' accepts, otherwise it rejects.
Now observe that:
⟨M, w⟩ ∈ ATM ⟺ ⟨M'⟩ ∈ L.
The left-hand side can thus be decided using N.
Chapter 46
Questions on homework ?
Any questions? Complaints, etc?
Solution:
Proof: We assume, for the sake of contradiction, that ODDTM is decidable. Given hT, wi, consider the following procedure.
Z(x):
if x = ab then
Accept
r ← Simulate T on w
if r =Accept then Accept
Reject.
Thus L(Z) = Σ* (which includes odd-length strings) if T accepts w, and L(Z) = {ab} (a set with only an even-length string) otherwise.
We reduce ATM to ODDTM.
Suppose isOdd(⟨T⟩) is a decider for ODDTM. We build the following decider for ATM:
DeciderATM(⟨T, w⟩): construct the machine Z above (with T and w hard-coded into it), and return isOdd(⟨Z⟩).
Clearly, this is a decider for ATM, which is impossible. We conclude our assumption is false, and thus ODDTM is not decidable.
(Q2) Prove that the language SUBSETTM = { ⟨M, N⟩ | L(M) ⊆ L(N) } is undecidable. Hint: Reduce
EQTM = { ⟨M, N⟩ | L(M) = L(N) } to SUBSETTM for this purpose.
Solution:
Assume, for the sake of contradiction, that isSubset(M, N ) is a decider for SUBSETTM . Then, we have the following decider
for EQTM .
DeciderEQTM(M, N ):
if isSubset(M, N ) and isSubset(N, M ) then
return Accept
else
return Reject.
However, we already know that EQTM is undecidable. A contradiction.
(Q3) Assume that the languages L1 and L2 are recognizable languages, where L1 ∪ L2 = Σ∗ . Now, assume
you have a decider for the language L1 ⊕ L2 . Show that L1 is decidable.
Solution:
Let ORACxor be a decider for L1 ⊕ L2 and let T1 (resp. T2 ) be a machine that recognizes L1 (resp. L2 ).
Decider1 (w)
r ← Simulate ORACxor on w.
x1 = start a simulation of T1 on w
x2 = start a simulation of T2 on w
while (true)
Advance x1 and x2 by one step
if x1 accepts then accept
if x1 rejects then reject
if x2 accepts then
return not r
if x2 rejects then
return r
Since both L1 and L2 are recognizable, and L1 ∪ L2 = Σ*, it follows that one of the two simulations x1 and x2 must stop.
This implies that the above procedure is indeed a decider. It is now an easy case analysis to verify that this is indeed a decider
for L1.
Indeed, if x1 accepts then w ∈ L1 and we are done. Similarly, if x1 rejects then w ∉ L1 and we are done.
If x2 accepts, but r = reject, this implies that w ∈ L2 and w ∉ L1 ⊕ L2. Namely, it must be that w ∈ L1, and we accept.
If x2 accepts, but r = accept, this implies that w ∈ L2 and w ∈ L1 ⊕ L2. Namely, it must be that w ∉ L1, and we reject.
If x2 rejects, but r = reject, this implies that w ∉ L2 and w ∉ L1 ⊕ L2. This implies that w ∉ L1 ∪ L2, which is impossible in
our case (so this case never happens).
If x2 rejects, but r = accept, this implies that w ∉ L2 and w ∈ L1 ⊕ L2. This implies that w ∈ L1, and we accept.
(Q4) Let EQTM = { ⟨M, N⟩ | L(M) = L(N) }. Reduce ATM to EQTM as another way to prove that EQTM is
undecidable.
Solution:
For a given hT, wi, consider Tw defined as follows
Tw (y) :
if y 6= w then
reject
r ←Simulate T on w
return r
And, consider Nw defined as follows:
Nw (y) :
if y = w then
accept
else
reject
Let deciderEQ be a decider for EQTM . We can design a decider for ATM as follows using Tw and Nw :
DeciderA (hT, wi):
Compute hTw i and hNw i from hT, wi.
r ←Simulate deciderEQ on hTw , Nw i.
return r
Nw accepts only w, and L(Tw) = ∅ if T does not accept w, and L(Tw) = {w} otherwise. Therefore, the two languages are
equal iff T accepts w.
(Q5) Prove that the following language is not recursively enumerable (namely, it is not recognizable):
L = { ⟨T⟩ | T is a TM, and L(T) is infinite }.
Solution:
We reduce the complement of ATM (which is not recognizable) to L. Let us check the following routine first (fix T and w):
Tw (x) :
Simulate T(w) for |x| steps.
if simulation above does not accept then
accept
else
reject
Observe that L(Tw) is infinite iff T does not accept w. So now we have the following reduction: Assume, for the sake of contradiction,
that we are given a recognizer recogL. We get the following recognizer for the complement of ATM:
recognizerCoATM(⟨T, w⟩):
    Compute ⟨Tw⟩
    return recogL(⟨Tw⟩)
So, if T does not accept w, then L(Tw) = Σ*, and then recogL(⟨Tw⟩) would stop and accept.
If T accepts w, then the language L(Tw) is finite, and then recogL(⟨Tw⟩) might reject (or it might run forever). If recogL(⟨Tw⟩)
halts and rejects, then recognizerCoATM would reject.
In any case, we got a recognizer for the complement of ATM, which is impossible, since this language is not recognizable.
Chapter 47
Questions on homework ?
Any questions? Complaints, etc?
47.1 Problems
(Q1) If B is regular (or a CFL), and A ⊆ B, can we deduce that A is regular (or a CFL)?
Solution:
No: every language is a subset of Σ*, which is regular. A more interesting example is
L1 = { a^n b^n c^n | n ≥ 0 } ⊆ L2 = { a^n b^n c^k | n, k ≥ 0 } ⊆ L3 = { a^i b^j c^k | i, j, k ≥ 0 }.
The language L3 is regular, L2 is not regular but is a CFL, and L1 is not a CFL.
(Q2) Give a direct proof why L = { ww | w ∈ Σ* } is not regular.
Solution:
Proof: Assume, for the sake of contradiction, that this language is regular, and let D = (Q, Σ, qinit, δ, F) be a DFA for it.
Consider the strings wi = a^i b, for i = 1, . . . , ∞. Let qi = δ(qinit, wi), for i = 1, . . . , ∞. We claim all the qi's are distinct. Indeed,
if for some i ≠ j we had qi = qj, then D would accept a^j b a^i b (since it accepts a^i b a^i b ∈ L), which is impossible, as
a^j b a^i b ∉ L for i ≠ j.
Solution:
For a given G1, G2 and k, the TM produces all strings w of length at most k and checks whether each string is in both L(G1)
and L(G2) (first we convert each grammar into CNF, and then we use the CYK algorithm).
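For reference, here is a compact sketch (ours) of the CYK algorithm used above, assuming the CNF grammar is given as hypothetical lists of unit rules (A → a) and binary rules (A → BC):

def cyk(word, unit_rules, binary_rules, start):
    n = len(word)
    if n == 0:
        return False                     # S -> eps must be checked separately
    # table[i][j] = variables deriving the substring of length j+1 at i.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, c in enumerate(word):
        table[i][0] = {A for (A, a) in unit_rules if a == c}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for (A, B, C) in binary_rules:
                    if B in left and C in right:
                        table[i][length - 1].add(A)
    return start in table[0][n - 1]

# Example: a CNF grammar for { a^n b^n | n >= 1 }:
# S -> AT | AB, T -> SB, A -> a, B -> b.
units = [('A', 'a'), ('B', 'b')]
binaries = [('S', 'A', 'T'), ('S', 'A', 'B'), ('T', 'S', 'B')]
print(cyk('aabb', units, binaries, 'S'))   # True
print(cyk('aab', units, binaries, 'S'))    # False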
(Q4) Show that L = { ⟨G1, G2⟩ | L(G1) \ L(G2) = ∅ } (G1 and G2 are grammars) is undecidable.
Solution:
We can reduce EQCFG = { ⟨G1, G2⟩ | L(G1) = L(G2) } to L. If we have a decider for L, we can build a decider for EQCFG by
querying it once for ⟨G1, G2⟩ and once for ⟨G2, G1⟩. (Note that for any two sets A and B we have that A = B if and only if
A \ B = ∅ and B \ A = ∅.)
(Q5) Show that L = { ⟨P, w⟩ | P is an RA and w ∈ L(P) } is decidable.
Solution:
Given hP, wi, the decider converts P into a CFG G (remember we have seen the algorithm for this) and then converts G into CNF
(we also saw this algorithm), and finally using CYK it detects if w ∈ L(G) or not.
(Q6) Assume we have a language L and a TM T with the following property. For every string w of length
at least 2, T(w) halts and outputs k strings w1, · · · , wk, where |wi| < |w| for all i (k is not a constant
and depends on the string w). We know that w ∈ L iff, for all i, wi ∈ L.
Assuming that 0 ∈ L, 1 ∉ L, and ε ∉ L, design a decider for the set L (you can use T as a subroutine) and
prove that your decider works.
Solution:
DeciderL (w):
if |w| ≤ 1 then
if w = 0 then
return Yes
else
return No
w1 , · · · , wk ← T(w)
for i = 1, . . . , k do
if DeciderL (wi ) =No then
return No
return Yes
Claim 47.1.1 For any string w ∈ Σ∗ , we have that DeciderL (w) halts, and furthermore DeciderL (w) = Yes ⇐⇒ w ∈ L.
Part III
Exams
Chapter 48
48.1 Midterm 1 - Spring 2009
6 May 2009
NAME:
NETID: DISC:
• The point value of each problem is indicated next to the problem, and in the table below.
• Points may be deducted for solutions which are correct but excessively complicated, hard to understand,
or poorly explained. Please keep your solutions short and crisp.
• The exam is designed for one hour, but you have the full two hours to finish it.
• It is wise to skim all problems and point values first, to best plan your time.
• This is a closed book exam. No notes of any kind are allowed. Do all work in the space provided, using
the backs of sheets if necessary. See the proctor if you need more paper.
• After the midterm is over, discuss its contents with other CS 373 students only after verifying that
they have also taken the exam (e.g. they aren’t about to take the conflict exam).
• We indicate next to each problem how much time we suggest you spend on it. We also suggest you
spend the last 25 minutes of the exam reviewing your answers.
(A) If a DFA M has k states then M must accept some word of length at most k − 1. True or false?
True False
Yes No
(C) Suppose that L is a regular language over the alphabet Σ = {a, b}, and consider the language
L' = { 0^i | i = |w|, w ∈ L }.
Yes No
True False
(E) Let L = { wx | w ∈ Σ*, x ∈ Σ*, |w| = |x| }. Is L regular?
Yes No
(F) Let Li be a regular language, for i = 1, . . . , ∞. Is the language ∩_{i=1}^{∞} Li always regular? True or false?
True False
(G) If L1 and L2 are two regular languages accepted by two DFAs D1 and D2, respectively, with k1 and
k2 states, then the language L1 \ L2 can always be recognized by a DFA with k1 · k2 states.
True or false?
True False
(H) The minimum-size NFA for a regular language L always has strictly fewer states than the minimum-size
DFA for the language L. True or False?
True False
[Figure: state diagram of the NFA (states q1, . . . , q7; alphabet {a, b, c}).]
Fill in the following values:
(A) F =
(B) δ(q2 , a) =
(C) δ(q3 , b) =
(D) δ(q6 , c) =
(E) List the members of the set { q ∈ Q | q5 ∈ δ(q, b) }:
(F) Does the NFA accept the word acbacbb? (Yes / No)
Yes No
[Figure: the state diagrams for this problem.]
Draw M' for this case.
(B) In general, define the language of M' in terms of the language of an arbitrary M. (Hint: Make sure
that your answer works for the above example!)
Problem 5: NFA construction (8 points)
[15 minutes.]
A string y is a subsequence of string x = x1 x2 · · · xn ∈ Σ∗ , if there exists indices i1 < i2 < · · · < im
such that y = xi1 xi2 · · · xim . Note, that the empty word is a subsequence of every string. For example,
aaba is a subsequence of cadcdacba, but abc is not a subsequence of cbacba.
Let N = (Q, Σ, δ, q0, F) be an NFA with language L. Describe a construction for an NFA N' such that:
L(N') = { x | there is a y ∈ L, such that x is a subsequence of y }.
Proof:
Prove formally that L(B) = S.
Hint: The proof goes by showing that L(B) is contained in S, and that S is contained in L(B). So, you
must show both inclusions using precise arguments. You do not need to use complete mathematical notation,
but your proof should be precise, correct, short and convincing. (Make sure you are not writing unnecessary
text [like copying the hint, or writing text unrelated to the proof].)
48.2 Midterm 2 - Spring 2009
6 May 2009
NAME:
NETID: DISC:
• There are 7 problems, on pages numbered 1 through 8. Make sure you have a complete exam.
• The point value of each problem is indicated next to the problem, and in the table below.
• Points may be deducted for solutions which are correct but excessively complicated, hard to understand,
or poorly explained.
• The exam is designed for slightly over one hour, but you have the full two hours to finish it.
• It is wise to skim all problems and point values first, to best plan your time.
• This is a closed book exam. No notes of any kind are allowed. Do all work in the space provided, using
the backs of sheets if necessary. See the proctor if you need more paper.
• After the midterm is over, discuss its contents with other students in the class only after verifying
that they have also taken the exam (e.g. they aren’t about to take the conflict exam).
Yes: No:
(b) If L is a language over Σ*, h is a homomorphism, and h(L) is regular, then L must be regular. Is
this statement correct?
Yes: No:
(c) Suppose L is a language over {0}*, such that if 0^i is in L then 0^{i+2} is in L. Then L must be regular. Is
this statement correct?
Yes: No:
(d) Let L1 and L2 be two languages. If L1 and L2 are both context-free, then L1 \ L2 must also be context-free.
Is this statement correct?
Yes: No:
(f) Consider a parallel recursive automata D: it is made out of two recursive automatas B1 and B2, and
it accepts a word w if both B1 and B2 accept w. The parallel recursive automata D might accept a
language that is not context-free. Is this statement correct?
Yes: No:
where #z(w) is the number of appearances of the character z in w. For example, the word x = baccacbbcb ∈
J, since #a(x) = 2, #b(x) = 4, and #c(x) = 4. Similarly, the word y = abbccc ∉ J, since #a(y) = 1,
#b(y) = 2, and #c(y) = 3.
Give a context-free grammar whose language is J. Be sure to indicate what its start symbol is. (Hint:
First provide a CFG for the easier language K = { w ∈ {a, b}* | #a(w) = #b(w) }, and modify it into the
desired grammar.)
where #z (w) is the number of appearances of the character z in w. Prove that L is not context-free
using closure properties and the fact that B is not context-free.
(b) Let w be a word of length n in a language generated by a grammar G which is in Chomsky Normal Form
(CNF). First, how many internal nodes does the parse tree for w using G have?
Secondly, assume that G has k variables, and n = |w| > 2^k. Is the language L(G) finite or not? Justify
your answer.
48.3 Final - Spring 2009
6 May 2009
(a) Consider a regular language L, and let L' = { w | w ∈ L and w is a palindrome }. The language L' is
regular. True or false?
False: True:
(c) There exist two context-free languages L1 and L2 such that L1 ∩ L2 is also context-free.
False: True:
(d) Let T and Z be two non-deterministic Turing machine deciders. Then there is a TM that recognizes the
language L(T) ∩ L(Z).
False: True:
(e) Let h : Σ∗ → Σ∗ be a string sorting operator; that is, it sorts the characters of the given string and
returns the sorted string (it uses alphabetical ordering on the characters of Σ). For example, we have
h(baba) = aabb and h(peace) = aceep. Let L be a language, and consider the sorted language
h(L) = { h(w) | w ∈ L }.
The statement “if L is regular, then h(L) is also regular” is true or false?
False: True:
(f) Let T be a TM decider for some language L. Then h(L) (see previous question) is decidable.
False: True:
(h) If a language L is context-free and there is a Chomsky normal form (CNF) grammar for it with k variables, then
either L is infinite, or L is finite and contains only words of length < 2^{k+4}.
False: True:
(i) If a language L is accepted by a RA D then L will be accepted by a RA C which is just like D except that
the set of accepting states has been complemented.
False: True:
(j) Let f : Σ∗ → N be a function that for a string w ∈ Σ∗ it returns some positive integer number f (w).
Furthermore, assume that you are given a TM T that, given w, computes f (w) and this TM always halts.
Then the language
L = { ⟨M, w⟩ | M stops on w after at most f(w) steps }
is decidable.
False: True:
Problem 2: Classification (16 points)
For each language L described below, we have listed 2–3 language classes. Mark the most restrictive listed
class to which L must belong. E.g. if L must always be regular and we have listed “regular” and “context-free”,
mark only “regular”.
For example, if you are given a language L, and given choices Regular, Context-free, and Decidable, then
you must mark Regular if L is regular, you must mark Context-free if L is context-free and not regular, and
mark Decidable if L is Decidable but not context-free.
(a) L = { xc^n | x ∈ {a, b}*, n = |x| }.
(b) L = { ⟨G⟩ | G is a CFG and L(G) is not empty }.
(c) L = { a^i b^j c^k d^m | i + j + k + m is a multiple of 3 and i + j = k + m }.
(d) L = { ⟨w, M1, M2⟩ | w is a string, M1 and M2 are TMs, and M1 accepts w or M2 accepts w }.
(e) L = { x1#x2# . . . #xn | xi ∈ {a, b}* for each i and, for some r < s, we have xr = xs^R }.
(f) L = { ⟨G, D⟩ | G is a CFG, D is a DFA, and L(D) ∩ L(G) = ∅ }.
(g) L = { ⟨G1, G2⟩ | G1 and G2 are CFGs and |L(G1) ∩ L(G2)| > 0 }.
(h) L = { ⟨G⟩ | G is a CFG and L(G) ≠ Σ* }.
Problem 3: Context-free grammars (9 points)
(i) Let Σ = { [ , ] }, and consider the language L0 of all balanced bracketed words. For example, we have
[][] ∈ L0 and [[][]] ∈ L0, but ][ ∉ L0.
Give a context-free grammar for this language, with the start symbol S0.
(ii) Let L1 be the language of all balanced bracketed words, where the character x might appear somewhere
at most once. For example, we have [x][] ∈ L1 and [[][]] ∈ L1, but ][ ∉ L1 and [x]x ∉ L1.
Give an explicit context-free grammar for L1, with the start symbol S1. You can use the grammar for
the language L0 that you constructed above.
(iii) Let L2 be the language of all balanced bracketed words, where the character x might appear somewhere
in the word at most twice. For example, we have x[x][] ∈ L2 and [[][]] ∈ L2, but ][ ∉ L2 and
x[x]x ∉ L2.
Give a context-free grammar for this language, with the start symbol S2.
S → aTb
T → aTb | ε.
(b) A Quadratic Bounded Automaton (QBA) is a Turing machine that can use at most n² cells of the
tape (assume it has a single tape) given an input string of length n. Let
LQBA = { ⟨D, w⟩ | D is a QBA and D halts on w }.
Explain why LQBA is Turing decidable, and shortly describe a TM that decides this language.
Formally, if the BTM uses a transition of the form
p1 → p2 (labeled backtrack_q),
then it resets the current contents of the tape and the tape head position to those they had the last
time the machine was in state q, and then it sets the control to be at state p2.
In other words, assume the BTM has run through a sequence of configurations c1, c2, . . . , ck, and ck is a
configuration where the state is p1. Then, if it uses the transition t when in the configuration ck, it
can go to a new configuration c = w p2 w', where ci = w q w' is the last configuration in the sequence where
the state was q.
Joe thinks that this greatly enhances a Turing machine, as it allows the Turing machine to roll-back to
an earlier configuration by undoing changes.
Note, that a BTM can perform several (say three) consecutive transitions of backtrackx one after the
other. That would be equivalent to restoring to the second to last configuration in the execution history
with the state x in it. Also, observe that BTM are deterministic.
Show that Joe is wrong by giving, for any BTM, a deterministic Turing machine that performs the same
job. Keep your description of the deterministic Turing machine to high-level pseudo-code, and do write down
the intuition of how the deterministic Turing machine will work.
(Your answer should not exceed a hundred words.)
(Recall that for any word w, wR denotes the reverse of the word w.) We assume here that the input alphabet
of T has at least two characters in it.
Show that L is undecidable using a reduction from ATM .
Prove this directly, not by citing Rice’s Theorem.
48.4 Mock Final Exam - Spring 2009
6 May 2009
9. There is a bijection between the set of recognizable languages and the set of decidable languages.
False: True:
Problem 2: Classification (20 points)
For each language L described below, classify L as
• R: Any language satisfying the information must be regular.
• C: Any language satisfying the information must be context-free, but not all languages satisfying the
information are regular.
• DEC: Any language satisfying the information must be decidable, but not all languages satisfying the
information are context-free.
• NONDEC: Not all languages satisfying the information are decidable. (Some might be only Turing
recognizable or perhaps even not Turing recognizable.)
For each language, circle the appropriate choice (R, C, DEC, or NONDEC). If you change your
answer be sure to erase well or otherwise make your final choice clear. Ambiguously marked answers
will receive no credit.
1. R C DEC NONDEC
L = { ⟨T⟩ | T is a linear bounded automaton and L(T) = ∅ }.
2. R C DEC NONDEC
L = { ww^R w | w ∈ {a, b}* }.
3. R C DEC NONDEC
L = { w | the string w occurs on some web page indexed by Google on May 3, 2007 }.
4. R C DEC NONDEC
L = { w | w = x#x1#x2#...#xn such that n ≥ 1 and there is some i for which x ≠ xi }.
5. R C DEC NONDEC
L = { a^i b^j | i + j = 27 mod 273 }.
6. R C DEC NONDEC
L = L1 ∩ L2, where L1 and L2 are context-free languages.
7. R C DEC NONDEC
L = { ⟨T⟩ | T is a TM and L(T) is finite }.
8. R C DEC NONDEC
L = L1 \ L2, where L1 is context-free and L2 is regular.
9. R C DEC NONDEC
L = L1 ∩ L2, where L1 is regular and L2 is an arbitrary language.
Problem 3: Short answer I (8 points)
(a) Give a regular expression for the set of all strings in {0, 1}∗ that contain at most one pair of consecutive
1’s.
(b) Let M be a DFA. Sketch an algorithm for determining whether L(M ) = Σ∗ . Do not generate all strings
(up to some bound on length) and feed them one-by-one to the DFA. Your algorithm must manipulate
the DFA’s state diagram.
δ'(P, a) = ?????
Suppose N has ε-transitions. How would your answer to the previous question change?
Problem 9: RA modification (8 points)
Let L be a context-free language on the alphabet Σ. Let (M, main, {(Qm , Σ ∪ M, δm , q0m , Fm )}m∈M ) be an
RA recognizing L. Give an RA recognizing the language
L' = { xy | x ∈ L and y ∈ Σ* and |y| is even }.
48.5 Quiz 1 - Spring 2009
6 May 2009
Quiz I
(Q1) Consider L = { a^n b^n | n ≥ 0 } and the homomorphism h(a) = a and h(b) = a.
Which of the following statements is true?
(e) All of the above.
(Q6) Suppose L is a context-free language (CFL) and R is regular, then L \ R is CFL. This is
(a) True.
(b) False.
(Q7) Consider the grammar G = (V, Σ, R, S), where V = {S, X, Y}, with the rules (i.e., R) defined as
S → aSb | X.
X → cXd | Y.
Y → eYf | ε.
If we associate a language with each variable, what is the language of X ?
(a) { e^n f^n | n ≥ 0 }
(b) { c^k e^n f^n d^k | n ≥ 0, k ≥ 0 }
(c) { c^n e^n f^n d^n | n ≥ 0 }
(d) { a^n c^n e^n f^n d^n b^n | n ≥ 0 }
(a) 0012202
(b) 002
(c) (122222)2
[Figure: the recursive automaton S, with states q0, . . . , q4.]
(a) { (^n )^n | n ≥ 0 }
(b) { (^n )^n (^m )^m | n, m ≥ 0 }
(c) { w | w is a string with balanced parenthesis }.
(Q10) Consider the execution trace of the string (()) on the following recursive-automata.

[Figure: the recursive automaton S, with states q0, . . . , q4.]

(b) q0,⟨⟩ --(--> q1,⟨⟩ --S--> q0,⟨q2⟩ --(--> q1,⟨q2⟩ --S--> q0,⟨q2, q2⟩ --> q3,⟨q2, q2⟩
    --pop--> q2,⟨q2⟩ --)--> q3,⟨q2⟩ --pop--> q2,⟨⟩ --)--> q3,⟨⟩.

(c) q0,⟨⟩ --(--> q1,⟨⟩ --S--> q4,⟨q2⟩ --(--> q4,⟨q2⟩ --S--> q0,⟨q2, q2⟩ --> q3,⟨q2, q2⟩
    --pop--> q2,⟨q2⟩ --)--> q3,⟨q2⟩ --pop--> q2,⟨⟩ --)--> q3,⟨⟩.
48.6 Quiz 2 – Spring 2009
6 May 2009
(Q1) Consider the Turing machine TM = (Q, Σ, Γ, δ, q0, qacc, qrej), where Q = {q0, q1, q2, qacc, qrej}, Σ =
{0, 1} and Γ = {0, 1, B}, and the transitions are defined as follows:
δ(q0 , 0) = (q1 , 0, R); δ(q0 , 1) = (q1 , 1, R);
δ(q1 , 0) = (q2 , 0, R); δ(q1 , 1) = (q2 , 1, R);
δ(q2 , B) = (qacc , B, R) and all other transitions are to the reject state.
Assuming that we start the T M on some input and the current configuration is 1q1 100, what is the
next configuration?
(Q2) Consider the following statements about the T M described in the question above. Which of them is
correct?
(a) the T M always halts.
(b) there exist inputs on which the T M does not halt.
(c) there exist inputs on which the T M halts and inputs on which it does not halt.
(Q3) How many Turing machines can you have with the same Q, Σ, Γ as described in the first question?
(a) 30^9.
(b) 30^6.
(c) 30^15.
(d) infinitely many.
(Q4) Which of the following is true?
The set of Turing recognizable languages is closed under
(a) complementation and intersection.
(b) intersection.
(c) complementation.
(d) neither intersection nor complement.
(Q5) Suppose L is undecidable and suppose L is a Turing recognizable language. Which of the following
statements is true?
(a) L is not recognizable.
(b) L is recognizable.
(c) L may or may not be recognizable.
(Q6) Which of the following properties is a property of the language of a TM M?
(a) M is a TM that has 481 states.
(b) M is a TM and |L(M)| = 481.
(c) M is a TM that takes more than 481 steps on some input.
(Q7) Consider the language L = { ⟨M⟩ | M takes no more than 481 steps on some input }.
Which of the following statements is true?
(a) L is decidable.
(b) L is not decidable.
(c) L may be decidable or may not be decidable.
(Q8) Consider the language L = { ⟨M⟩ | M has 481 states }.
Which of the following statements is true?
(a) L is decidable and recognizable.
(b) L is decidable but not recognizable.
(c) L is not decidable but recognizable.
(d) L is not decidable nor recognizable.
(Q9) For any TM M and word w, consider a procedure WriteN(M, w) that produces the following procedure
N as output:
Assume the procedure IsEmpty(M) is one that takes a Turing machine M as input and returns “Yes”
if L(M) is empty and “No” otherwise.
Consider the following procedure:
Decider(M, w) { N = WriteN(M, w); return ¬IsEmpty(N) }
(a) G is decidable.
(b) G is undecidable by Rice's theorem.
(c) G is not decidable, as we can reduce L = { ⟨M⟩ | L(M) = ∅ } to it.
(d) G is decidable, as we can reduce L = { ⟨M⟩ | L(M) = ∅ } to it.
Chapter 49
49.1 Midterm 1
19 February 2008
1. If an NFA M accepts the empty string (ε), does M's start state have to be an accepting state? Why
or why not?
3. Suppose that an NFA M = (Q, Σ, δ, q0, F) accepts a language L. Create a new NFA M' by flipping
the accept/non-accept markings on M. That is, M' = (Q, Σ, δ, q0, Q − F). Does M' accept the complement
of L? Why or why not?
2. Let Σ and Γ be alphabets. Suppose that h is a function from Σ∗ to Γ∗ . Define what it means for h to
be a homomorphism.
(a) F =
(b) δ(A, 0) =
(c) δ(C, 1) =
(d) δ(D, 1) =
(e) List the members of the set {q ∈ Q | D ∈ δ(q, 2)}:
(f) Does the NFA accept the word 11120? (Yes / No)
[Figure: the state diagram of the NFA (states A, B, C).]
(a) Briefly explain the idea behind your construction, using English and/or pictures.
(b) Suppose that M = (Q, Σ, δ, q0, F). Give the details of your construction of M', using tuple notation.
49.2 Midterm 2
27 March 2008
(a) Is the language { ww^R w | w ∈ {a, b}* } a context-free language?
Yes: No:
(b) If L is a non-regular language over Σ∗ , and h is a homomorphism, then h(L) must also be non-regular.
Is this statement correct?
Yes: No:
(c) Suppose all the words in language L are no more than 1024 characters long. Then L must be regular.
Is this statement correct?
Yes: No:
If L1 and L2 are both context-free, then L1 ⊕ L2 must also be context-free. Is this statement correct?
Yes: No:
If L is context-free, then the language P (L) is context free. Is this statement correct?
Yes: No:
(f) A PDA that is allowed to enter the accept state only if the stack is empty is a strict PDA . There are
context-free languages for which no strict PDA exists. Is this statement correct? Yes: No:
Problem 3: PDA design (8 points)
Let
J = { w ∈ {a, b}* | w contains only a's, or w has an equal number of a's and b's }.
(b) Define what it means for a grammar G to be in Chomsky Normal Form (CNF).
Suppose that L were regular. Let p be the constant given by the pumping lemma.
Since is not in L, we have a contradiction. Therefore, L must not have been regular.
Fill in
Describe how to construct a PDA recognizing L0 . (You can safely assume that # is not in the alphabets Σ
and Γ used by M and L.)
(a) Describe the ideas behind your construction in words and/or pictures.
(b) When you read the # from the input, or shortly after you read it, you will need to do something about
whatever is left on the stack from reading the x part of the input string. Do you need to push anything
onto the stack or pop anything off? If so, what? Namely, describe what your PDA does upon reading
#.
(c) Give the details of your construction in formal notation. That is, for the new PDA recognizing L0 , specify
the set of states, the initial and final states, the stack alphabet, the details of the transition function.
The input alphabet for the new machine will be Σ ∪ {#}.
Problem 7: Induction (8 points)
Let Σ = {a, b}. Given any string w in Σ∗ , let A(w) be the number of a’s in w and B(w) be the number of
b’s in w.
Suppose that grammar G has the following rules:
where S is the start symbol. Use induction on the derivation length to prove that A(w) ≥ B(w) for any
string w in L(G).
49.3 Final – Spring 2008
7 May 2008
• The exam contains 9 problems on pages numbered 1 through 10. Make sure you have a complete exam.
• The point value of each problem is indicated next to the problem and in the table on the right.
• It is wise to skim all problems and point values first, to best plan your time. If you get stuck on a
problem, move on and come back to it later.
• Points may be deducted for solutions which are correct but excessively complicated, hard to understand,
hard to read, or poorly explained.
• If you change your answers (especially for T/F questions or multiple-choice questions), be sure to erase
well or otherwise make sure your final choice is clear.
• Please bring apparent bugs or unclear questions to the attention of the proctors.
• This is a closed book exam. You may only consult a cheat sheet, handwritten (by yourself) on a single
8.5 x 11 inch sheet (both sides is ok). You can only use your normal eyeglasses (if any) to read it, e.g.
no magnifying glasses.
• Do all work in the space provided, using the backs of sheets if necessary. See the proctor if you need
more paper.
(b) Let G be a context-free grammar given in Chomsky normal form (CNF). Since the grammar G is in CNF
form it must be ambiguous.
False: True:
(d) Suppose that the language L = { a^n b^n c^n | n ≥ 0 } is not context-free. Let h(·) be a homomorphism.
Then the language h(L) cannot be context-free.
False: True:
(e) For any k > 1, there is no language that is decided by a TM with k tapes, but is undecidable by any
TM having k − 1 (or less) tapes.
False: True:
(g) If a language L is regular and recognized by a DFA with k states, then either L is infinite, or L is finite
and contains only words of length < k.
False: True:
(h) If a language L is accepted by a PDA P then L will be accepted by a PDA R which is just like P except
that the set of accepting states has been complemented.
False: True:
(i) The language A_TM = { ⟨M, w⟩ | M does not accept w } is TM recognizable.
False: True:
(b) L = { ⟨G⟩ | G is a CFG and G is not ambiguous }
(c) L = { a^i b^j c^k d^m | i + j + k + m is a multiple of 13 }
(d) L = { ⟨w, M1, M2, . . . , Mk⟩ | w is a string, k is an odd number larger than 2, each Mi is a TM,
and a majority of the Mi’s accept w }
(e) L = {x1 #x2 # . . . #xn | xi ∈ {a, b}∗ for each i and, for some i, xi is a palindrome}.
(f) L = { ⟨G, D⟩ | G is a CFG, D is a DFA, and L(G) ⊆ L(D) }
(g) L = { ⟨G⟩ | G is a CFG and L(G) is finite }
(h) L = { ⟨G⟩ | G is a CFG and L(G) ≠ Σ∗ }
(a) [4 points] Give a context-free grammar for L for the case that x and y have the same length, where the
start symbol is S.
(b) [4 points] Give a context-free grammar for L for the case that x and y have different lengths, where
the start symbol is X. (You can use portions of the grammar you created for part (a) in your answer, if
you want to. Do not re-use variables from your answer to part (a).)
(c) [1 point] Give a context-free grammar for L for all cases. Let the start symbol be Y. (You can use
portions of the grammar you created for parts (a) and (b) in your answer, if you want to.)
Prove that L is not regular by filling in the missing parts of the following pumping lemma proof.
Suppose that L were regular. Let p be the pumping length given by the pumping lemma.
Consider the string wp = ______
Because wp ∈ L and |wp | ≥ p, there must exist strings x, y, and z such that wp = xyz, |xy| ≤ p, |y| > 0,
and xy^i z ∈ L for every i ≥ 0.
Since ______ isn’t in L, because ______,
As such, we have a contradiction. Therefore, L must not have been regular.
Let δ be the transition function of M, and δ′ be the transition function of M′, recalling that M′ is a
two-tape machine. Suppose that in M, δ(p, a) = (p′, b, D) where D ∈ {L, R, S} indicates that M moves
left, right, or remains in one place, respectively. Then your construction of M′ should somehow simulate
this transition of M . Below, give values of the place-holder keywords to describe how your simulation
works.
(c) For the above transition of M , we create the transition
δ′(state, tape1sym, tape2sym) = (new-state, new-tape1sym, new-tape2sym, direction1, direction2)
where (fill in the blanks)
direction1 = ______   direction2 = ______
Problem 7: Palindrome (10 points)
Let L = { ⟨M⟩ | M is a Turing machine and M accepts at least one palindrome }.
Show that L is TM-recognizable, i.e., explain how to construct a Turing machine that accepts ⟨M⟩ exactly
when M accepts at least one palindrome. Assume that it is easy to extract M’s alphabet Σ from ⟨M⟩. Of
course, M might run forever on some input strings.
Suppose that R is a decider for E_CFG and M is a PDA accepting J. Show how to construct a decider for L.
49.4 Mock Final
Spring 2007
This mock final exam was generated from a previous semester’s exam. Some parts were modified or removed (if
they were not relevant).
• The exam contains 12 pages and 11 problems. Make sure you have a complete exam.
• The point value of each problem is indicated next to the problem and in the table below.
• It is wise to skim all problems and point values first, to best plan your time. If you get stuck on a
problem, move on and come back to it later.
• Points may be deducted for solutions which are correct but excessively complicated, hard to understand,
hard to read, or poorly explained.
• This is a closed book exam. No notes of any kind are allowed. Do all work in the space provided, using
the backs of sheets if necessary. See the proctor if you need more paper.
• Please bring apparent bugs or unclear questions to the attention of the proctors.
(a) Let M be a DFA with n states such that L(M ) is infinite. Then L(M ) contains a string of length at
most 2n − 1.
(b) Let Lw = { ⟨M⟩ | M is a TM and M accepts w }, where w is some fixed string. Then there is an
enumerator for Lw.
(d) For a TM M and a string w, let CH_{M,w} = { x | x is an accepting computation history for M on w }.
Then CH_{M,w} is decidable.
(e) The language { ⟨M, w⟩ | M is a linear bounded automaton and M accepts w } is undecidable.
(f) The language { ⟨G⟩ | G is a context-free grammar and G is ambiguous } is Turing-recognizable.
(h) There is a bijection between the set of Turing-recognizable languages and the set of decidable languages.
Problem 2: Classification (20 points)
For each language L described below, classify L as
• R: Any language satisfying the information must be regular.
• C: Any language satisfying the information must be context-free, but not all languages satisfying the
information are regular.
• DEC: Any language satisfying the information must be decidable, but not all languages satisfying the
information are context-free.
• NONDEC: Not all languages satisfying the information are decidable. (Some might be only Turing
recognizable or perhaps even not Turing recognizable.)
For each language, circle the appropriate choice (R, C, DEC, or NONDEC). If you change your
answer be sure to erase well or otherwise make your final choice clear. Ambiguously marked answers
will receive no credit.
1. R C DEC NONDEC
L = { ⟨M⟩ | M is a linear bounded automaton and L(M) = ∅ }.
2. R C DEC NONDEC
L = { w w^R w | w ∈ {a, b}∗ }.
3. R C DEC NONDEC
L = {w | the string w occurs on some web page indexed by Google on May 3, 2007}
4. R C DEC NONDEC
L = { w | w = x#x1#x2#. . . #xn such that n ≥ 1 and there is some i for which x ≠ xi }.
5. R C DEC NONDEC
L = { a^i b^j | i + j ≡ 27 (mod 273) }
6. R C DEC NONDEC
L = L1 ∩ L2 where L1 and L2 are context-free languages
7. R C DEC NONDEC
L = { ⟨M⟩ | M is a TM and L(M) is finite }
8. R C DEC NONDEC
L = L1 − L2 where L1 is context-free and L2 is regular.
9. R C DEC NONDEC
L = L1 ∩ L2 where L1 is regular and L2 is an arbitrary language.
(b) Let M be a DFA. Sketch an algorithm for determining whether L(M ) = Σ∗ . Do not generate all strings
(up to some bound on length) and feed them one-by-one to the DFA. Your algorithm must manipulate
the DFA’s state diagram.
Problem 5: Pumping Lemma (8 points)
For each of the languages below you can apply the pumping lemma either to prove that the language is
non-regular or to prove that the language is not context-free. Assuming that p is the pumping length, your
goal is to give a candidate string for wp that can be pumped appropriately to obtain a correct proof. You
ONLY need to give the string and no further justification is necessary. Assume Σ = {a, b} unless specified
explicitly.
(b) L = {w | w contains at least twice as many a’s as b’s}. To show L is not regular using the pumping
lemma
wp =
δ′(P, a) =
Suppose N has ε-transitions. How would your answer to the previous question change?
δ′(P, a) =
Problem 10: Decidability (6 points)
Show that
EQINT_DFA = { ⟨A, B, C⟩ | A, B, C are DFAs over the same alphabet Σ and L(A) = L(B) ∩ L(C) }
is decidable.
This question does not require detail at the level of tuple notation. Rather, keep your proof short by
exploiting theorems and constructions we’ve seen in class.
49.5 Mock Final with Solutions
Spring 2007
1. Let M be a DFA with n states such that L(M ) is infinite. Then L(M ) contains a string of length at
most 2n − 1.
Solution: True.
2. Let Lw = { ⟨M⟩ | M is a TM and M accepts w }, where w is some fixed string. Then there is an
enumerator for Lw.
Solution: True.
4. For a TM M and a string w, let CH_{M,w} = { x | x is an accepting computation history for M on w }.
Then CH_{M,w} is decidable.
5. The language { ⟨M, w⟩ | M is a linear bounded automaton and M accepts w } is undecidable.
Solution: False.
The language { ⟨G⟩ | G is a context-free grammar and G is ambiguous } is Turing-recognizable.
Solution: True.
9. There is a bijection between the set of Turing-recognizable languages and the set of decidable languages.
Solution: True.
Problem 2: Classification (20 points)
For each language L described below, classify L as
• R: Any language satisfying the information must be regular.
• C: Any language satisfying the information must be context-free, but not all languages satisfying the
information are regular.
• DEC: Any language satisfying the information must be decidable, but not all languages satisfying the
information are context-free.
• NONDEC: Not all languages satisfying the information are decidable. (Some might be only Turing
recognizable or perhaps even not Turing recognizable.)
For each language, circle the appropriate choice (R, C, DEC, or NONDEC). If you change your
answer be sure to erase well or otherwise make your final choice clear. Ambiguously marked answers
will receive no credit.
1. R C DEC NONDEC
L = { ⟨M⟩ | M is a linear bounded automaton and L(M) = ∅ }.
Solution: NONDEC
2. R C DEC NONDEC
L = { w w^R w | w ∈ {a, b}∗ }.
Solution: DEC
3. R C DEC NONDEC
L = {w | the string w occurs on some web page indexed by Google on May 3, 2007}
Solution: R
4. R C DEC NONDEC
L = { w | w = x#x1#x2#. . . #xn such that n ≥ 1 and there is some i for which x ≠ xi }.
Solution: C
5. R C DEC NONDEC
L = { a^i b^j | i + j ≡ 27 (mod 273) }
Solution: R
6. R C DEC NONDEC
L = L1 ∩ L2 where L1 and L2 are context-free languages
Solution: DEC
7. R C DEC NONDEC
L = { ⟨M⟩ | M is a TM and L(M) is finite }
Solution: NONDEC
8. R C DEC NONDEC
L = L1 − L2 where L1 is context-free and L2 is regular.
Solution: C
9. R C DEC NONDEC
L = L1 ∩ L2 where L1 is regular and L2 is an arbitrary language.
Solution: NONDEC
Solution:
First, a regular expression for the set of strings that have no consecutive 1’s is:
R = 0∗·(10·0∗)∗·(ε + 1).
The regular expression for the set of strings that contain at most one pair of consecutive 1’s is just:
R·R = 0∗·(10·0∗)∗·(ε + 1)·0∗·(10·0∗)∗·(ε + 1)
since a string w has at most one pair of consecutive 1’s iff w = xy, for some x, y, where x and y have no pair of consecutive
ones.
There are, of course, many solutions to this question; whatever your solution is, be careful to check border cases. For example,
check if you allow initial 0’s, final 0’s, allow 1 to occur in the beginning and the end, allow ε, and also test your expression
against some random strings, some in the language and some outside it.
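Following that testing advice mechanically: below is a small Python sanity check (not part of the original solution). The helper at_most_one_11 is a hypothetical reference predicate written straight from the language description; the check compares it against the expression on all short binary strings.

```python
import re
from itertools import product

# R: no two consecutive 1's, i.e. 0*(10 0*)*(eps + 1), in Python regex syntax
R = r"0*(?:100*)*1?"
RR = re.compile(f"(?:{R})(?:{R})")   # R.R: at most one pair of consecutive 1's

def at_most_one_11(w: str) -> bool:
    # reference predicate, straight from the language definition
    return sum(w[i:i + 2] == "11" for i in range(len(w) - 1)) <= 1

# compare the expression against the reference on all strings of length < 10
for n in range(10):
    for bits in product("01", repeat=n):
        w = "".join(bits)
        assert (RR.fullmatch(w) is not None) == at_most_one_11(w), w
```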
2. Let M be a DFA. Sketch an algorithm for determining whether L(M ) = Σ∗ . Do not generate all strings
(up to some bound on length) and feed them one-by-one to the DFA. Your algorithm must manipulate
the DFA’s state diagram.
Solution:
If M does not accept all of Σ∗, then there must be a word w ∉ L(M). On the word w, M would reach a unique state
which is non-final. Also, if there is some way to reach a non-final state from the initial state, then clearly the DFA does
not accept the word that labels the path to that non-final state. Hence it is easy to see that the DFA M does not accept
Σ∗ iff there is a path from the initial state to a non-final state (which can be the initial state itself).
The algorithm for detecting whether L(M ) = Σ∗ proceeds as follows:
1. Check whether the input is a valid DFA (in particular, make sure it is complete; i.e., from any state, on any
input letter, there is a transition to some state).
2. Consider the transition graph of the DFA.
3. Do a depth-first search on this graph, searching for a state that is not final.
4. If any non-final state is reached on this search, report that M does not accept Σ∗ ; otherwise report that M accepts
Σ∗ .
An alternative solution is to flip the final states to non-final and the non-final states to final (i.e., complementing
the DFA), and then check the emptiness of the resulting automaton by searching for a final state reachable from the
initial state.
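A minimal sketch of the first algorithm in Python, assuming the DFA is given as a complete transition table delta (a dict); the names are illustrative, not from the original.

```python
def accepts_everything(states, alphabet, delta, start, finals):
    """L(M) = Sigma* iff no non-final state is reachable from the start state."""
    seen, todo = {start}, [start]
    while todo:
        q = todo.pop()
        if q not in finals:
            return False              # the word labeling the path to q is rejected
        for a in alphabet:
            r = delta[(q, a)]         # assumes the DFA is complete
            if r not in seen:
                seen.add(r)
                todo.append(r)
    return True
```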
Solution:
The grammar G′ = (V′, Σ, R′, S′) where
• V′ = V ∪ {S′}, where S′ is a new variable, not in V;
• R′ consists of all rules in R as well as the following rules:
– One rule S′ → aS′a, for each a ∈ Σ
– The rule S′ → S
Intuitively, S′ is the new start variable, and for any word wxw^R ∈ L′, S′ generates it by first generating the w and
w^R parts, and then calling S to generate x.
2. Let P = { [a/c], [a/aa], [cba/b] } (written here as [top/bottom]) be an instance of the Post correspondence problem. Does P have a match?
Show a match or explain why no match is possible.
Solution:
Yes, the PCP has a match: [a/aa] [a/c] [cba/b] [a/aa].
The top word and the bottom word are both aacbaa.
For each of the languages below you can apply the pumping lemma either to prove that the language is
non-regular or to prove that the language is not context-free. Assuming that p is the pumping length, your
goal is to give a candidate string for wp that can be pumped appropriately to obtain a correct proof. You
ONLY need to give the string and no further justification is necessary. Assume Σ = {a, b} unless specified
explicitly.
Solution:
wp = a^p b b a^p
(b) L = {w | w contains at least twice as many a’s as b’s}. To show L is not regular using the pumping
lemma. Solution:
wp = b^p a^{2p}
Give the state diagram of a TM M that does the following on input #w where w ∈ {0, 1}∗ . Let n = |w|. If
n is even, then M converts #w to #0^n. If n is odd, then M converts #w to #1^n. Assume that ε is an even-
length string.
The TM should enter the accept state after the conversion. We don’t care where you leave the head at
the end of the conversion. The TM should enter the reject state if the input string is not in the right format.
However, your state diagram does not need to explicitly show the reject state or the transitions into it.
Solution:
The Turing machine’s description is given by the following state diagram. [Figure lost in extraction: state diagram with states qin, q0, q1, qacc.]
In the diagram, the initial state is qin , and the accept state is qacc ; the reject state is not shown, and we assume that all
transitions from states that are not depicted go to the reject state. We are assuming that the blank tape-symbol is #.
Intuitively, the TM first reads the tape content w, moving right, alternating between states q0 and q1 in order to determine
whether |w| is even or odd. If |w| is even, it ends in state q0 , moves left rewriting every letter in w with a 0, till it reaches the
first symbol on the tape, and then accepts and halts. If |w| is odd, it does the same except that it rewrites letters in w with
1’s.
δ′(P, a) = { q ∈ Q | ∃p ∈ P, q ∈ E(δ(p, a)) }
where
E(R) = { s | s can be reached from some state in R using zero or more ε-edges }
is the ε-closure of the set R.
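A small Python sketch of E(·) and the resulting δ′, assuming the NFA's letter-transitions and ε-edges are given as the dictionaries moves and eps (both names are illustrative assumptions):

```python
def E(R, eps):
    """Epsilon-closure: everything reachable from R by zero or more eps-edges."""
    closure, todo = set(R), list(R)
    while todo:
        s = todo.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                todo.append(t)
    return closure

def delta_prime(P, a, moves, eps):
    """delta'(P, a) = { q | exists p in P with q in E(delta(p, a)) }."""
    out = set()
    for p in P:
        out |= E(moves.get((p, a), ()), eps)
    return out
```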
Solution:
The complement language is
ALL_CFG^c = { ⟨G⟩ | G is not the encoding of a CFG, or G is a CFG and L(G) ≠ Σ∗ }.
First, recall that membership of a word in the language generated by a grammar is decidable, i.e., for any grammar
G and any word w, we can build a TM that decides whether G generates w.
Also, for any Σ, we can fix some ordering of the symbols in Σ, and enumerate all the words in Σ∗ in lexicographic
order. In particular, we can build a TM that constructs the i-th word in Σ∗ for any given i.
We can build a TM recognizing ALL_CFG^c as follows:
1. Input is hGi.
2. Check if hGi is a proper encoding of a CFG; if not, halt and accept.
3. Set i := 1;
4. while (true) do {
5. Generate the i’th word wi in Σ∗ , where Σ is the set of terminals in G.
6. Check if wi is generated by G. If it is not, halt and accept.
7. Increment i;
8. }
The TM above systematically generates all the words in Σ∗ and checks whether there is any word that is not generated by G;
if it finds one, it halts and accepts ⟨G⟩.
Note that if ⟨G⟩ is either not a well-formed grammar or L(G) ≠ Σ∗, then the TM will eventually halt and accept.
If L(G) = Σ∗, the TM will never halt, and hence will never accept.
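The loop above, as a Python sketch of the recognizer for ALL_CFG^c. The helpers is_cfg, cfg_generates, and words_of are assumptions standing in for the decidable checks described in the solution:

```python
def recognize_complement_of_all_cfg(G):
    """Sketch of the recognizer above. Assumed helpers: is_cfg(G) checks the
    encoding, cfg_generates(G, w) is the decidable membership test (e.g. CYK),
    and words_of(Sigma) yields Sigma* in lexicographic order."""
    if not is_cfg(G):
        return True                    # malformed encoding: halt and accept
    for w in words_of(G.terminals):    # G.terminals is an assumed attribute
        if not cfg_generates(G, w):
            return True                # a word outside L(G): halt and accept
    # if L(G) = Sigma*, the loop never ends, so we never accept
```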
Solution:
ALL_CFG is not Turing-recognizable. We know that ALL_CFG is not Turing-decidable (the universality problem
for context-free grammars is undecidable), and we showed above that its complement ALL_CFG^c is Turing-recognizable. Also, for any
language R, if both R and its complement are Turing-recognizable, then R is Turing-decidable. Hence, if ALL_CFG were Turing-recognizable,
then ALL_CFG would be Turing-decidable, which we know is not true. Hence ALL_CFG is not Turing-recognizable.
Solution:
The idea would be to construct the PDA M′ which will essentially simulate M, and from any of the final states
of M, nondeterministically jump on an ε-transition to a new state that checks whether the rest of the input is of even
length.
Solution:
M′ = (Q′, Σ, Γ, δ′, q, {p0}), where Q′ = Q ∪ {p0, p1}, where p0 and p1 are two new states (that are not in Q), and the transition
function δ′ : Q′ × Σε × Γε → P(Q′ × Γε) is defined as follows:
• For every q ∈ Q,
δ′(q, ε, ε) = δ(q, ε, ε) ∪ {(p0, ε)}, if q ∈ F, and
δ′(q, ε, ε) = δ(q, ε, ε), if q ∉ F.
• For every a ∈ Σ,
δ′(p0, a, ε) = {(p1, ε)}, and δ′(p1, a, ε) = {(p0, ε)}.
• For every a ∈ Σε, d ∈ Γε, where either a = ε or d ≠ ε,
δ′(p0, a, d) = ∅, and δ′(p1, a, d) = ∅.
Solution:
Let A_TM = { ⟨M, w⟩ | M is a TM that accepts w }. We know that A_TM is undecidable. Let us reduce A_TM to ODD_TM
to prove that ODD_TM is undecidable. In other words, given a decider R for ODD_TM, let us show how to build a decider for
A_TM.
The decider D for A_TM works as follows:
1. The input is ⟨M, w⟩.
2. Construct the code for a TM N_{M,w} that works as follows:
   (a) The input to N_{M,w} is a word x.
   (b) Simulate M on w; accept x if M accepts w.
3. Feed ⟨N_{M,w}⟩ to R, the decider for ODD_TM.
4. Accept if R rejects; reject if R accepts.
For any TM M and word w, if M accepts w, then N_{M,w} is a TM that accepts all words, and if M does not accept w then
N_{M,w} accepts no word. Hence M accepts w iff L(N_{M,w}) contains an odd-length word.
Hence D accepts ⟨M, w⟩ iff R rejects ⟨N_{M,w}⟩ iff L(N_{M,w}) contains an odd-length word iff M accepts w. Hence D decides
A_TM.
Note that D does not simulate M on w; it simply constructs the code of the TM N_{M,w} (which, if run, may simulate M on w).
So D simply constructs N_{M,w} and runs R on it. Since R is a decider, it always halts, and hence D also always halts.
Since A_TM reduces to ODD_TM, i.e., any decider for ODD_TM can be turned into a decider for A_TM, we know that if
ODD_TM is decidable, then A_TM is decidable as well. Since we know A_TM is undecidable, ODD_TM must be undecidable.
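A compact sketch of D in Python, where R is the hypothetical decider for ODD_TM and build_N stands for the construction of N_{M,w} described above (both are assumptions of the sketch):

```python
def D(M, w, R):
    """Sketch of the decider for A_TM, given a decider R for ODD_TM."""
    N = build_N(M, w)   # N accepts every word iff M accepts w, else no word
    return not R(N)     # accept iff R rejects <N>; R always halts, so D halts too
```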
49.6 Quiz 1
6 February 2008
This quiz has 3 pages containing 7 questions. None requires a long answer. No proofs are required;
explanations are only required when explicitly requested. Ensure your answers are legible. You have 20
minutes to finish.
1. (2 points) To formally define a DFA, what five components do you need to specify?
2. (3 points) Suppose that A = {aa, bb} and B = {1, 2}. List the members of B × P(A).
3. (2 points) Is the following a valid state diagram for a DFA? Explain your answer.
[Figure lost in extraction: state diagram with states A, B, C, D and edges labeled 0 and 1.]
[Figure lost in extraction: NFA state diagram with states Q0–Q5 and edges labeled a, b, and ε.]
Suppose the transition function is named δ. Fill in the following output values for the transition
function:
(a) δ(Q0, a) =
(b) δ(Q4, a) =
(c) δ(Q4, ε) =
5. (5 points) Give the state diagram of an NFA which recognizes the language represented by the regular
expression (a + ε)(ba)∗ba. (It’s not necessary to follow any specific mechanical construction.)
6. (3 points) Give a regular expression for the following language
49.7 Quiz 2
5 March 2008
This quiz has 3 pages containing 7 questions. None requires a long answer. No proofs are required;
explanations are only required when explicitly requested. Ensure your answers are legible. You have 20
minutes to finish.
1. (2 points) Given a context-free grammar G with start symbol S, generating language L, explain how
to construct a grammar for L∗ .
2. (5 points) Finish the following statement of the pumping lemma.
If L is a regular language, there is an integer p such that, if w is any string in L of length at least p,
then we can divide w into substrings w = xyz such that
(1) |y| ≥ 1
(2)
(3)
3. (3 points) Suppose that δ is the transition function for a PDA. What types of objects does δ take as
input? What type of object does δ produce as its output value?
4. (4 points) Let our alphabet be Σ = {a, b, c}. Give a context-free grammar that generates L = { a^n b^j c^n |
n ≥ 0, j ≥ 1 }. Just give the rule(s) for the grammar. Your start symbol should be S and all variables
should be uppercase letters.
5. (2 points) Let Σ = {0, 1}. Let L = { 0^n x 1^n | x ∈ Σ∗, n ≥ 1 }. Is L regular? Why or why not?
6. (5 points) Suppose we know that the language B = { a^n b^n | n ≥ 0 } is not regular. Let L = { a^n b^j c^n |
j ≥ 1, n ≥ 1 }. Prove that L is not regular using closure properties and the fact that B isn’t regular.
7. (4 points) Let G be the grammar with start symbol S and the following rules:
S → AS | B
A → ab | b
B→c
Give a parse tree and a leftmost derivation for the string abbc.
49.8 Quiz 3
23 April 2008
This quiz has 6 questions. None requires a long answer. No proofs are required; explanations are only
required when explicitly requested. Ensure your answers are legible. You have 20 minutes to finish.
(c) There are more possible languages than there are different Turing machines.
2. (4 points) Suppose that a Turing machine M contains the transition δ(q, 0) = (r, 2, R) (q and r are
states; 0 and 2 are tape symbols.) If M is now in configuration 021q03, what will its next configuration
be?
3. (3 points) An LBA is a Turing machine with one important restriction. What’s the restriction?
4. (4 points) Briefly explain why the following statement is true. (Don’t give a proof, just outline the key
ideas.)
5. (4 points) Suppose that an extended TM is like a normal TM, except its transitions have the option
of staying in place (S) rather than always having to move left or right. Show how to simulate the
transition δ(s, a) = (r, b, S) using transitions of a normal TM. (Either write out the transitions as
equations or draw a fragment of a state diagram.)
6. (4 points) Let M be a TM and w be a string. Define a TM Mw (with input alphabet Σ) as follows:
• Input = x
• If x is a palindrome, then accept.
• Otherwise, simulate M on w.
• Accept if M accepts. Reject if M rejects.
Part IV
Homeworks
Chapter 50
Spring 2009
Show why the above induction argument is wrong. In particular, illustrate one set for which the
inductive argument fails.
4. True/false/whatever.
[Category: Notation, Points: 20]
Answer each of the following with true, false or meaningless. The notation \ denotes “set-minus” or
“set-difference”.
D1) ∅ ∈ {∅, 1}
D2) ∅ ⊆ {∅, 1}
D3) 1 ⊆ {∅, 1}
D4) {1} + {1, 2} = {1, 2}
D5) {1, 2} \ {1} = {2}
D6) {1, 2} \ {0} = {1, 2}
D7) {1, 2} ∩ {3, 4} = {}
D8) {1, 2} ∩ {3, 4} = {∅}
D9) {1, 2} ∪ {1, 3} = {1, 1, 2, 3}
D10) {1, 2} ∪ {1, 3} = {1, 2, 3}
D11) {1, {1}, {{1}}} = {1}
D12) {1, {1}, {{1}}} = {1, 1, 1}
D13) {1} ∈ {1, {1}, {{1}}}
D14) {1} ⊆ {1, {1}, {{1}}}
D15) {{1}} ∈ {1, {1}, {{1}}}
D16) {A, B} × {C, D} = {(A, B), (C, D)}.
D17) {A, B} × {C, D} ∩ {C, D} × {A, B} = {}.
D18) |{A, B, C} × {D, E}| = 6.
D19) {A, B} × {} = {A, B}.
D20) {A, B} \ {B, A} = {}.
5. Getting to 100.
[Category: Proof, Points: 10]
(Extra credit.)
We are given one copy of every digit in the list 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. You are asked to form numbers
from these digits (you can use each digit only once, and you must use each digit), such that these
numbers add up to 100.
For example, a valid set of numbers you can form from the digits is {23, 17, 40, 5, 6, 8, 9}, but it adds
up to 108, not 100.
Prove that this is impossible; i.e., no matter how you form the numbers, you cannot find a way for
them to add up to exactly 100.
1. DFA building
[Category: Construction, Points: 10]
Let Σ = {a, b}. Let L be the set of words w in Σ∗ such that w has an even number of b’s and an odd
number of a’s and does not contain the substring ba.
(a) [1 Points] Give a regular expression for the language of all strings over Σ that do not contain ba
as a substring.
(b) [1 Points] Give a regular expression for the language L above (shorter expression is better, natu-
rally).
(c) [8 Points] Construct a deterministic finite automaton for L that has at most 5 states. Make sure
that your DFA is complete (has transitions on all letters from all states).
If you find this hard, you can also give a DFA with more states that accepts L for partial credit.
Hint: You may want to try to enumerate some elements in L, and see whether you can simplify
the description of L.
2. Control.
[Category: Construction, Points: 10]
We want to build the controller for an automatic door, that opens for some time when a sensor senses
that a person is approaching the door.
Let Σ = {tick, approach, open, close}.
Every odd event is a “tick” or an “approach”, and every even event is the controller’s response to it,
“open” or “close”. An “open” event directs the door to open if it is closed, and to remain open if it
is open. Similarly, a “close” event directs the door to close or remain closed. An “approach” event
happens when the sensor detects that a person is approaching the door, and a “tick” event denotes
the passage of one second of time. We want the door to open immediately after detecting an “approach”
event, and remain open for exactly 2 ticks after the last “approach” event, at which point the door
must close. (Thus, the input has even length and is made out of pairs: each pair is a tick or approach event,
followed by the controller’s open or close response.)
Build an automaton that accepts all valid sequences of this controller behavior. Your automaton should
be over the alphabet Σ, and, for example, must accept
tick .close.approach.open.tick .open.tick .open.tick .close,
and accepts
approach.open.tick .open.approach.open.tick .open.tick .open.tick .close.
But it rejects the word
tick .open,
and rejects the word
tick .close.approach.open.tick .open.tick .close.
3. Recursive definitions
[Category: Construction, Points: 10]
Consider the following recursive definition of a set S.
(a) (1, 58) ∈ S
(b) If (x, y) ∈ S then (x + 2, y) ∈ S
(c) If (x, y) ∈ S then (−x, −y) ∈ S
(d) If (x, y) ∈ S then (y, x) ∈ S
(e) S is the smallest set satisfying the above conditions.
Give a nonrecursive definition of the set S. Explain why it is correct.
4. Set Theory
[Category: Proof, Points: 10]
For any two arbitrary sets X and Y , we have that (X \ (X ∩ Y )) ∩ (Y \ (Y ∩ X)) is an empty set.
(a) Explain informally why this is true, using words and/or a Venn diagram.
(b) Prove it formally.
5. No such thing.
[Category: Proof, Points: 10]
(Extra credit.)
Let Lsame be the language of all strings over {0, 1, 2} that have the same number of 0s as the sum
of the number of 1s and 2s. Provide a formal (correct) proof that no finite automaton (i.e., DFA) can
accept Lsame.
2. Express yourself
[Category: Comprehension, Points: 10]
3. Modifying DFAs
[Category: Comprehension, Points: 5+5]
Suppose that M = (Q, Σ, δ, q0 , F ) and N = (R, Σ, γ, r0 , G) are two DFAs sharing the common alphabet
Σ = {a, b}.
(a) Define a new DFA M′ = ((Q ∪ {qx}) × {0, 1}, Σ ∪ {#}, δ′, (q0, 0), F′) whose transition function is
defined as follows:
δ′((q, i), t) = (δ(q, t), i)   if q ∈ Q and t ∈ Σ,
δ′((q, i), t) = (q0, 1)        if q ∈ F, i = 0, t = #,
δ′((q, i), t) = (qx, i)        otherwise,
and where (qj, i) ∈ F′ iff qj ∈ F and i = 1.
Describe the language accepted by M′ in terms of the language accepted by M.
(b) Show how to design a DFA N′ over Σ ∪ {#} that accepts the language
L′ = { x#y#z | x ∈ L(M), y ∈ L(N), and z ∈ a∗ }.
Define your DFA formally using notation similar to the definition of M′ in part (a).
4. Multiple destinations
[Category: Proof, Points: 7+3]
Let L = aa∗ + bb∗ .
(a) Prove that any DFA accepting L must have more than one final state.
(b) Show that L is acceptable by an NFA with only one final state.
5. Equality and more.
[Category: Construction, Points: 10]
Let Σ = {0, 1, $}. For any n ∈ N, let the language Ln be:
Ln = { w1$w2 | w1, w2 ∈ {0, 1}∗, |w1| = |w2| = n, and w1 = w2 }.
Argue, as precisely as you can (a proof would be best), that L is not regular.
(d) We can express the language L as ∪_{k=1}^∞ Lk. It is tempting to reason as follows: the language L
is the union of regular languages, and hence is regular.
What is the flaw in this argument, and why is it wrong in our case?
6. How big?
[Category: Proof, Points: 10]
(Extra credit.)
Let Σ = {0, 1, $}, and consider the language
Ln = { w1$w2 | w1, w2 ∈ {0, 1}∗, |w1| = |w2| = n, and w1 = w2 }.
(i) Prove that any DFA accepting Ln must have at least 2(2^{n+1} − 1) states.
(ii) Prove that any NFA accepting Ln must have at least 2(2^{n+1} − 1) states.
1. NFA interpretation.
[Category: Notations., Points: 10]
Consider the following NFA M .
[Figure lost in extraction: state diagram of M with states A, B, E, F and edges labeled a, b, and ε.]
(a) Give a regular expression that represents the language of M . Explain briefly why it is correct.
Note: You needn’t go through the process of converting this to a regular expression using GNFAs;
you can guess (correctly) the regular expression.
(b) Recall the definition of an NFA accepting a string w (Sipser p. 54). Show formally that M accepts
the string w = abba
(c) Let Σ = {a, b}. Give the formal definition of the following NFA N (in tuple notation). Make sure
you describe the transition function completely (for every state and every letter).
[Figure lost in extraction: state diagram of N with states A, B, C, E and edges labeled a, b, and ε.]
2. NFA to DFA.
[Category: Construction., Points: 10]
Convert the following NFA to an equivalent DFA (with no more than 10 states). You must construct
this DFA using the subset-construction as done in class. Draw the DFA diagram, and also write down
the DFA in tuple notation (there is no need to include states that are unreachable from the initial
state).
[Figure lost in extraction: NFA state diagram with states A, C, D and edges labeled a, b, and ε.]
– removing states p, q, s in that order.
Provide detailed drawing of the GNFA after each step in this process.
Note that in this problem you will get interesting self-loops. For example, one can travel from q to
p and then back to q. This creates a self-loop at q when p is removed.
5. NFA to Regex by other means.
[Category: Proof., Points: 10]
(Extra credit.)
There is another technique one can use to compute a regular expression from an NFA. As a concrete
example, consider the following NFA M seen in class (lecture 8, the examples section).
[Figure lost in extraction: state diagram of M with states A, B, C and edges labeled a and b; from A there is an a-edge to B and a b-edge to C.]
Consider, for each state in the above automaton, the language that the automaton would accept if we
set this state to be the initial state. The language accepted from state A is denoted by L(A), which we
will write as A to make the notation cleaner. Clearly, a word in A is either a followed by a word that
is in B (the language the automaton accepts when B is the initial state), or it is b followed by a word
in C (the language the automaton accepts with C as the initial state). We can write this observation as an
equation over the three languages:
A = aB ∪ bC.
As a concrete example, in the above automaton, C is a final state, which implies that ε ∈ C. As such,
by the above equation, A must contain the word b·ε = b ∈ bC. Now, since A is the initial state of M,
it follows that L(M) = A. This implies that b ∈ L(M). (That’s a very indirect way to see this, but it
will be useful for us shortly.)
(A) Write down the system of three equations for the languages in the above automata. (Note, that
one gets one equation for each state of the NFA.)
(B) Let r, s be two regular expressions over some alphabet Σ, and consider an equation of the form
D = rD ∪ s.
Let D be the minimal language that satisfies this equation. Give a regular expression for the
language D. Prove your answer.
(C) For a character a ∈ Σ, what is the smallest language E satisfying the equation E = aE? Prove
your answer.
(D) The above suggests a way to get the regular expression for a language. Take the system of
equations from (A), and eliminate from it (by substitution) the variable B. Then, eliminate from
it the variable C. After further manipulation, you are left with an equation of the form
A = something,
where “something” does not contain any of our variables (i.e., B, and C). Now, convert the right
side into a regular expression describing all the words in A.
Carry out this procedure in detail for the above NFA M (specifying all the details in your solution).
What is the regular expression of the language of M that results from your solution?
since for any w ∈ L′, we have that h(w) ∈ L. Note that the inverse homomorphism is not a
unique mapping. A word w ∈ L might have several inverse words w1, . . . , wj ∈ L′, such that
h(w1) = h(w2) = · · · = h(wj) = w. For example, for the word a^6 ∈ L, we have h^{−1}(a^6) =
{000, 001, 010, 011, 100, 101, 110, 111}, since, for example, h(000) = h(101) = a^6.
Similarly, it might be that a word w ∈ L has no inverse in h^{−1}(L). For example, aaa ∈ L, but it has no
inverse in L′, since h(w) is always an even-length word.
To prove that if L is regular then h^{−1}(L) is regular, assume you are given a DFA D such that L = L(D).
Show how to modify this DFA D = (Q, Γ, δ, q0, F) into a DFA C for h^{−1}(L). Describe the construction
formally, and prove formally that h^{−1}(L) = L(C). (Hint: If w = w1w2 . . . wk ∈ h^{−1}(L) then
h(w) = h(w1)h(w2) . . . h(wk) ∈ L.)
For example, for the languages shown above, we have the following two DFAs.
[Figures lost in extraction: a DFA for L = { (aaa)^i | i ≥ 0 } (states q0, q1, q2 in an a-cycle) and the corresponding DFA for L′ = h^{−1}(L) (states q0, q1, q2 over {0, 1}).]
This implies that regular languages are closed under inverse homomorphism.
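A minimal sketch of this construction in Python, instantiated on the example above (h(0) = h(1) = aa and L = { (aaa)^i | i ≥ 0 }); the dict-based DFA representation is an assumption of the sketch.

```python
def inverse_hom_dfa(Q, Sigma_new, delta, q0, F, h):
    """DFA C for h^{-1}(L(D)): on a letter a, run D over the whole block h(a)."""
    def run(q, block):
        for c in block:
            q = delta[(q, c)]
        return q
    delta_C = {(q, a): run(q, h[a]) for q in Q for a in Sigma_new}
    return Q, Sigma_new, delta_C, q0, F      # same states, same final states

# the example above: a 3-state cycle counting a's mod 3 accepts L = (aaa)*
delta = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 0}
C = inverse_hom_dfa({0, 1, 2}, {"0", "1"}, delta, 0, {0}, {"0": "aa", "1": "aa"})
# C accepts exactly the binary words w with 3 | 2|w|, i.e. 3 | |w|, as expected
```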
(a) Give a direct proof (without using the pumping lemma) that Lpal, the language of all palindromes
over the alphabet {a, b}, is not regular. Your proof should show that any DFA for this language
must have an infinite number of states.
(b) (Hard?) Prove using only closure properties that Lpal is not a regular language.
For example, the grammar corresponding to the three-state DFA for { (aaa)^i | i ≥ 0 } shown above (states q0, q1, q2 in an a-cycle)
is the following grammar
2 The longest palindromic word in the Oxford English Dictionary is tattarrattat, coined by James Joyce in Ulysses
for a knock on the door.
(G)   X0 → aX1 | ε
      X1 → aX2
      X2 → aX0
Prove that in general this construction works. That is, prove formally that for any DFA D, the
language L(G(D)) is equal to L(D).
(Hint: Do not use induction in your proof. Instead, argue directly about accepting traces and deriva-
tions of words.)
(a) Is abcd in the language of the grammar? If so, give an accompanying derivation and parse tree.
(b) Is acaada in the language of the grammar? If so, give an accompanying derivation and parse tree.
(c) What is the language generated by the grammar? Explain your answer.
(Note: assume Σ = {a, b, c, d} and the start symbol is S for both grammars.)
G1:   S → aSd | A | C
      A → aAc | B
      B → bBc | ε
      C → bCd | B

G2:   S → B | AA
      A → cA | dB
      B → aSa | ε
(Q3) Prove this.
[Category: Proof, Points: 10]
Lemma 50.6.1 If L1 and L2 are both context-free languages, then L1 ∪ L2 is a context-free language.
Proof: Let G1 = (V1 , Σ, R1 , S1 ) and G2 = (V2 , Σ, R2 , S2 ) be context free grammars for L1 and L2 ,
respectively, where V1 ∩ V2 = ∅. Create a new grammar
G = (V1 ∪ V2 , Σ, R, S) ,
where S ∉ V1 ∪ V2 and R = R1 ∪ R2 ∪ { S → S1, S → S2 }.
We next prove that L(G) = L1 ∪ L2 .
L(G) ⊆ L1 ∪ L2 :
Consider any w ∈ L(G), and any derivation of w by G. It must be of the following form:
S → Si → X1 X2 → . . . → w,
where i is either 1 or 2. Assume, without loss of generality, that i = 1, and observe that if we
remove the first step, this derivation becomes
S1 → X1 X2 → . . . → w.
Namely, S1 ⇒∗ w using grammar rules only from R1. We conclude that w ∈ L(G1) = L1, as S1
is the start symbol of G1.
The case i = 2 is handled in a similar fashion.
Thus, we conclude that w ∈ L1 ∪ L2 , implying that L(G) ⊆ L1 ∪ L2 .
L1 ∪ L2 ⊆ L(G):
Consider any word w ∈ L1 ∪ L2, and assume, without loss of generality, that w ∈ L1. As such,
we have that S1 ⇒∗ w in G1. But S → S1 is a rule in G, and as such we have that
S → S1 ⇒∗ w in G1.
Namely, S ⇒∗ w in G, since all the rules of G1 are in G. We conclude that w ∈ L(G).
Provide a detailed formal proof to the following claim, similar in spirit and structure to the above
proof.
Lemma 50.6.2 If L1 and L2 are both context-free languages, then L1L2 is a context-free language.
The edit distance between two strings w and w′ is the minimal number of edit operations one has
to perform to modify w into w′. We denote this distance between two strings x and y by EditDist(x, y). We
allow the following edit operations: (i) insert a character, (ii) delete a character, and (iii) replace a
character by a different character.
For example, the edit distance between shalom and halo is 2. The edit distance between har-peled
and sharp␣eyed is 4.
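For concreteness, here is a sketch of the textbook dynamic program for EditDist (not part of the original problem statement):

```python
def edit_dist(x: str, y: str) -> int:
    """Classic dynamic program for edit distance (insert/delete/replace)."""
    m, n = len(x), len(y)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i                  # delete all of x[:i]
    for j in range(n + 1):
        D[0][j] = j                  # insert all of y[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(D[i - 1][j] + 1,                           # delete
                          D[i][j - 1] + 1,                           # insert
                          D[i - 1][j - 1] + (x[i - 1] != y[j - 1]))  # replace/keep
    return D[m][n]

assert edit_dist("shalom", "halo") == 2   # the first example above
```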
For the sake of simplicity, assume that Σ = {a, b, $}. For a parameter k, describe a CFG for the
language
Lk = { x$y^R | x, y ∈ {a, b}∗, EditDist(x, y) ≤ k }.
For example, since EditDist(aba, bab) = 2, we have that aba$bab ∈ L2, but aba$bab ∉ L1. Similarly,
EditDist(aaaa, abb) = 3, and as such aaaa$bba ∈ L3, but aaaa$bba ∉ L2.
(Hint: What is the language L0? Try to give a grammar for L1 before solving the general case.)
Provide a short argument why your CFG works.
Assume you are given a CFG G = (V, Σ, R, S), such that any word w ∈ L(G) has a derivation with
f(n) steps, where n = |w|. Here f(n) is some function.
Prove the following claim.
Claim 50.6.3 There exists a grammar G′ such that L(G) = L(G′), and furthermore, any word
w ∈ L(G) of length n has a derivation in G′ with at most ⌈f(n)/2⌉ derivation steps.
S0 → aaS0bb | ab | ε.
(Q2) Understanding Recursive Automata.
[Category: Understanding, Points: 10]
For the following recursive automaton with initial module S, give the language of the automaton precisely.
[Figure lost in extraction: recursive-automaton diagram with a module S (states p0, p1, calls to S′) and a module S′ (states q0–q7, edges labeled a, b, #, and ε, and further calls to S′).]
(Q5) Shuffle
[Category: Proof, Points: 10]
For a given language L, let
Shuffle(L) = { w | there exists y ∈ L with |y| = |w| such that w is a permutation of the letters of y }.
For instance, if L = {ab, ada}, then Shuffle(L) = {ab, ba, aad, ada, daa}.
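For finite languages the definition can be checked directly; a small Python sketch reproducing the example above (the function name is illustrative):

```python
from itertools import permutations

def shuffle_of(L):
    """Shuffle(L) for a finite L: all distinct rearrangements of its words."""
    return {"".join(p) for w in L for p in permutations(w)}

assert shuffle_of({"ab", "ada"}) == {"ab", "ba", "aad", "ada", "daa"}
```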
Prove that if L is a regular language, then Shuffle(L) is not necessarily a CFL. In other words, prove
that the statement “For every regular language L, the language Shuffle(L) is a CFL.” is false.
(E.g., the word “(¬(0 ∨ 1)) = 0” is in L but the word “(¬(0 ∨ 1)) = 1” is not in L.)
Construct a recursive automaton for L and briefly describe why it works.
(b) For the string w = (0 ∨ 1) = 1 show the run of your automaton on it, including stack contents at
each point of the run (i.e., list the sequence of configurations in the accept trace for w for your
RA). Use the formal definition given in the lecture notes.
(Q2) Recursive Automata with Finite Memory.
[Category: Proof, Points: 10]
You are working on a computer, which has a limited stack size of (say) 5. You know this means that
you can have a call depth of at most 5 recursive calls.
(a) Argue that the language accepted by any RA on this machine is regular.
More formally, given an RA
D = (M, main, { (Qm, Σ ∪ M, δm, q0m, Fm) | m ∈ M }),
50.9 Homework 9: Turing Machines
Spring 09
(Q2) TM Encoding.
[Category: Comprehension, Points: 10]
In this problem we demonstrate a possible encoding of a TM using the alphabet {0, 1, ;}, where ; is used
as a separator. We encode M = (Q, Σ, Γ, δ, q0, qacc, qrej) as a string representing |Q|, |Σ|, |Γ|, q0, qacc,
qrej, and then δ. Each of these quantities is represented using a numbering system where n
is represented by a 1 followed by n 0’s.
Thus, if |Q| = n, this means we have n states numbered 1 to n.
If |Σ| = n, this means we have n symbols in the alphabet. We adopt the convention that these symbols
are 0 to n − 1, where each can be represented using the numbering system mentioned above.
We use a similar scheme for Γ, with the restriction that the blank symbol is assigned the largest number.
The remaining string represents δ as follows. Each transition (q, a) → (q′, b, D) is represented as the
concatenation of the representations of each of the 5 quantities, and the representation of δ is simply the
concatenation of the representations of the transitions. We use the convention that transitions not
mentioned go to the reject state in the encoded machine.
1000000;
1000;
10000;
10;
100000;
1000000;
10; 100; 100; 100; 100;
100; 1; 100; 1; 100;
100; 10; 100; 10; 100;
100; 1000; 1000; 1000; 10;
1000; 1; 100000; 10; 100;
1000; 10; 10000; 1; 10;
10000; 10; 10000; 1; 10;
10000; 100; 100000; 10; 10;
10000; 1; 100000; 10; 10
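To make the sample encoding above easier to read, here is a small decoder sketch for the numbering scheme (the function name is illustrative, not part of the problem):

```python
def decode(enc: str):
    """Decode the ';'-separated fields, where '1' followed by n zeros encodes n."""
    fields = [f.strip() for f in enc.split(";") if f.strip()]
    assert all(f.startswith("1") and set(f[1:]) <= {"0"} for f in fields)
    return [len(f) - 1 for f in fields]

# e.g. the first field above, "1000000", decodes to 6, so |Q| = 6
```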
(a) Draw a state diagram for this TM (omitting the reject state).
An ITM is a special TM with one head and an infinite number of tapes (one-sided tapes). The tapes are
numbered 0, 1, 2, 3, · · · . The cells on each tape are also numbered from left to right with 0, 1, 2, 3, · · · .
When the machine starts, the head is on cell number 0 of tape number 0. If the head is on cell number
i of tape number j, it can overwrite that cell and move either to cell i + 1 or i − 1 (if i ≥ 1) of tape j
or move to cell i of tape j + 1 or tape j − 1 (if j ≥ 1).
[Figure lost in extraction: TM state diagram with states q0, q1, q2, q5, q8–q11 and transitions such as 0 → 0, R; 0 → X, R; ␣ → $, R; ␣ → 0, R; X → X, R; X → 0, L; $ → $, L.]
(Q1) Enumerators
[Category: Construction, Points: 10]
(a) Design an enumerator that will list all positive integral solutions to a polynomial inequality in
three variables, which will be given as input on the input tape of the Turing machine.
I.e., you need to list all positive integral solutions to a given inequality P(x, y, z) ≥ c. An example
of such an inequality is 2x^2y^2 + xy + z ≥ 5.
You may use pseudo code to describe your solution.
(b) Modify the enumerator above to give all integral (positive or negative) solutions.
L2 = { ⟨M, x, i⟩ : M(x) = “yes” and the head of M uses only the first i cells of its tape }
(c) Explain how to build an enumerator for L3 :
(Q4) I am a liar.
[Category: Puzzle, Points: 4]
A town has two kinds of people, visually indistinguishable, called Grubsies and Greepies. Greepies
always tell the truth; Grubsies always lie (names can be deceiving, you see).
You come to the town, and chance upon a person (who could be a Grubsy or a Greepy) and you want
to find out whether a particular road leads to Wimperland (assume all people in the town know the
answer to the question).
Find a single YES/NO question that you can ask the person so that you can figure out whether the
road leads to Wimperland.
Grubsies and Greepies are fictional; any resemblance to person or persons living or dead or undead is
purely coincidental.
Hint: think of the diagonalization proof of undecidability of the membership problem for Turing ma-
chines.
3 See http://en.wikipedia.org/wiki/Invisible_Pink_Unicorn.
50.12 Homework 12: Preparation for Final
Spring 09
1. Language classification.
[Category: Understanding, Points: 10]
Suppose that we have a set of Turing machine encodings defined by each of the following properties.
That is, we have a set
L = { ⟨M⟩ | M is a TM and M has property P },
and we are considering different ways to fill in P . Assume that the Turing machines M have only a
single tape.
For each of these languages, determine whether it is Turing decidable, Turing recognizable, or not
Turing recognizable. Briefly justify your answers.
2. Reduction I.
[Category: Construction, Points: 10]
Define the language L to be
L = { ⟨M⟩ | M is a TM and L(M) is decidable but not context-free }.
Show that L is undecidable by reducing ATM to L. (Do the reduction directly. Do not use Rice’s
Theorem.)
3. Reduction II
[Category: Construction, Points: 10]
Define the language L to be
L = { ⟨M⟩ | M is a TM and ∀n ∃x ∈ L(M) where |x| = n }.
Show that L is undecidable by reducing ATM to L. (Do the reduction directly. Do not use Rice’s
Theorem.)
4. Enumerate this.
[Category: Construction, Points: 10]
Construct an enumerator for the following set:
L = { ⟨T⟩ | T is a Turing machine and |L(T)| ≥ 3 }.
5. DFAs are from Mars, TMs are from Venus.
[Category: Understanding / Proof., Points: 10]
Consider the language
L = { ⟨D, w⟩ | D is a DFA and it accepts w }.
Proof: For every ⟨D, w⟩, let TD be a TM that simulates the DFA D. So D will accept w
iff TD halts on w and accepts it, which is exactly equivalent to ⟨TD, w⟩ ∈ ATM; that
is,
⟨D, w⟩ ∈ L ⇐⇒ ⟨TD, w⟩ ∈ ATM.
This completes the reduction. But since ATM is undecidable, L should be unde-
cidable too.
Why is this result strange? Is it because we did something wrong in the above proof? Is this proof
correct? Explain briefly.
6. Using reductions when building algorithms.
[Category: Construction., Points: 10]
Reductions are not only a technique for proving hardness of problems; more importantly, they are used
to solve problems by transforming them into other problems that we know how to solve. This is an
extremely useful technique, and the following is an example of such a reduction in an algorithmic
context.
A set of domino pieces has a solution iff one can put them around
a circle (face up), such that every two adjacent numbers from different
pieces match. The figure on the right shows a set of pieces which has a
solution.
Assume we have an algorithm (i.e., think of it as a black box) called
isDomino which, given a set of dominoes, can tell us whether they have a
solution or not (it returns a “yes” or “no” answer).
Using isDomino, describe an algorithm which, given a set of dominoes,
outputs “no” if there is no solution and, if there is a solution, prints out
one possible solution (that is, prints out an ordered list of the input dominoes,
such that one can put them in that order around a circle properly).
For example if the input to your algorithm is (1, 2)#(3, 5)#(4, 3)#(1, 3)#(1, 1)#(2, 4) #(3, 7) #(7, 5),
it may output (1, 3)#(3, 7)#(7, 5)#(3, 5)#(4, 3)#(2, 4)#(1, 2)#(1, 1) (which is the solution depicted
in the above figure).
(We emphasize that your solution must use isDomino to construct the solution. There is a direct
algorithm to solve this problem, but we are more interested in the reduction here than in an efficient
solution.)
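One possible shape such a search-to-decision reduction can take, sketched in Python under the stated assumption that is_domino is the black box deciding solvability. The idea: repeatedly glue two pieces into one "super-piece", keeping only gluings the oracle confirms still leave a solvable set; this is a sketch, not a prescribed solution.

```python
def solve_dominoes(dominoes, is_domino):
    """Search-to-decision sketch. is_domino(list_of_end_pairs) -> bool is the
    assumed black box; pieces may be flipped when placed around the circle."""
    if not is_domino(dominoes):
        return None
    # a super-piece = (exposed end pair, ordered list of original pieces inside)
    pieces = [((a, b), [(a, b)]) for (a, b) in dominoes]
    while len(pieces) > 1:
        done = False                    # a valid gluing always exists (oracle is sound)
        for i in range(len(pieces)):
            if done:
                break
            (a, b), left = pieces[i]
            for j in range(len(pieces)):
                if done or i == j:
                    continue
                (c, d), right = pieces[j]
                # try piece j in both orientations immediately after piece i
                for (x, y), hist in (((c, d), right), ((d, c), right[::-1])):
                    if b != x:
                        continue
                    rest = [ends for k, (ends, _) in enumerate(pieces)
                            if k != i and k != j]
                    if is_domino(rest + [(a, y)]):   # gluing keeps it solvable
                        pieces = [pieces[k] for k in range(len(pieces))
                                  if k != i and k != j] + [((a, y), left + hist)]
                        done = True
                        break
    return pieces[0][1]   # the input pieces, in an order that closes the circle
```

Note that each round makes O(n^2) oracle calls and removes one piece, so the whole reduction uses polynomially many calls to isDomino.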
Chapter 51
Spring 2008
C = {ba | a ∈ A, b ∈ B}.
(a) (3, 5) ∈ S
(b) If (x, y) ∈ S, then (x + 2, y) ∈ S
(c) If (x, y) ∈ S, then (−x, y) ∈ S
(d) If (x, y) ∈ S, then (y, x) ∈ S
(e) S is the smallest set satisfying the above conditions.
51.2 Homework 2: Problem Set 2
Spring 08
1. Building a DFA.
Let Σ = {0, 1}. Give DFA state diagrams for the following languages.
(c) L = Σ∗ − {ε}
2. Interpreting a DFA
(a) What is the language of the following DFA? That is, explicitly specify all the strings that the
following DFA accepts. Briefly explain why your answer is correct.
[Figure lost in extraction: DFA state diagram with states A–G and edges labeled 0 and 1.]
(b) What about this one? Again, briefly justify your answer.
[Figure lost in extraction: DFA state diagram with states A–G and edges labeled 0 and 1.]
For example, R = {{3, 2}, 4} and P = {3, {{7, 9}, {8, 3}}} are SBBTs. But {2, {4, 5}, 27} and {2, {3}}
are not SBBTs. Let T be the set of all SBBTs.
(a) Let’s define the following function f mapping SBBTs to sets of integers:
f(X) = ∪_{Y ∈ X} f(Y)   if X is a set
f(X) = {X}              if X is an integer
Notice that f(Y) is always a set, for any input Y. The operation ∪_{Y ∈ X} f(Y) unions together the sets
f(Y), for all the items Y that are in the set X.
For the SBBTs P and R defined above, compute f (P ) and f (R). Give a general description of what
f does.
(b) Similarly, we can define a function g mapping SBBTs to integers:
g(X) = Σ_{Y ∈ X} g(Y)   if X is a set
g(X) = 1                if X is an integer
Give the values for g(P ) and g(R), as well as a general description of what g does.
(c) For certain SBBTs, g(X) = |f (X)|. For which SBBTs does this equation work? Explain why it’s
not true in general.
4. Balanced strings
A string over {0, 1} is balanced if it has the same number of zeros and ones.
(a) Provide pseudo-code (as simple as possible) for a program that decides if the input is balanced.
You can use only a single integer-typed variable in your program, and one variable containing the
current character (of type char). You can read the next character using a library function called,
say, get_char, which returns −1 if it has reached the end of the input. Otherwise, get_char returns
the next character in the input stream.
In particular, the program prints “accept” if the input is a balanced string, and prints “reject”
otherwise.
(b) For any fixed prespecified value of k, describe how to construct an automaton that accepts a
balanced string if, in every prefix of the input string, the absolute difference between the number of
zeros and ones does not exceed k. How many states does your automaton need?
(c) Provide an intuitive explanation of why any automaton for the problem of part
(b) must have at least, say, k states.
(d) Argue that there is no finite automaton that accepts only balanced strings.
For bonus credit, you can provide formal proofs for the claims above.
5. Bonus problem (Coins)
A journalist, named Jane Austen, unfortunately (for her) interviews one of the presidential candidates.
The candidate refuses to let Jane end the interview, going on and on about the candidate’s plans for how
to solve all the problems in the world. In the end, the candidate offers Jane a game. If she wins the
game, she can leave.
The game board is a 2 × 2 grid of coins:
H T
T H
At each round, Jane can decide to flip one or two coins, by specifying which coins she is flipping (for
example, flip the left bottom coin and the right top coin), next the candidate goes and rotates the
board by either 90, 180, 270, or 0 degrees. (Of course, rotation by 0 degrees is just keeping the coins in
their current configuration.)
The game is over when the four coins are either all heads or all tails. To make things interesting,
Jane does not see the board, and does not know the starting configuration.
Describe an algorithm that Jane can deploy so that she always wins. How many rounds are required
by your algorithm?
(c) Lc = { w | w does not contain two consecutive b’s }.
2. Product construction
(a) Consider the following two DFAs. Use the product construction (pp. 45–47 in Sipser) to construct
the state diagram for a DFA recognizing the intersection of the two languages.
[Figures lost in extraction: two DFA state diagrams, one with states A, B, C and one with states E, F, edges labeled a and b.]
(b) When the product construction was presented in class (and in Sipser), we assumed that the two
DFAs had the same alphabet. Suppose that we are given two DFAs M1 and M2 with different
alphabets. E.g. M1 = (Q1 , Σ1 , δ1 , q1 , F1 ) and M2 = (Q2 , Σ2 , δ2 , q2 , F2 ). To build a DFA M that
recognizes L(M1 ) ∪ L(M2 ), we need to add two additional sink states s1 and s2 . We send the first
or the second element of each pair to the appropriate sink state if the incoming character is not in
the alphabet for its DFA.
Write out the new equations for M ’s transition function δ and its set of final states F .
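One way such a construction can look in code: a Python sketch in which each DFA is a tuple with a dict transition table, and "s1"/"s2" are the two added sink states (names and representation are illustrative assumptions, not the requested equations themselves):

```python
def union_dfa(M1, M2):
    """Modified product construction for DFAs over different alphabets.
    Each Mi = (Qi, Sigmai, deltai, qi, Fi); assumes "s1"/"s2" are fresh names."""
    Q1, S1, d1, q1, F1 = M1
    Q2, S2, d2, q2, F2 = M2
    def step1(p, a):   # send component 1 to its sink on foreign letters
        return d1[(p, a)] if p != "s1" and a in S1 else "s1"
    def step2(q, a):   # likewise for component 2
        return d2[(q, a)] if q != "s2" and a in S2 else "s2"
    states = {(p, q) for p in Q1 | {"s1"} for q in Q2 | {"s2"}}
    delta = {((p, q), a): (step1(p, a), step2(q, a))
             for (p, q) in states for a in S1 | S2}
    F = {(p, q) for (p, q) in states if p in F1 or q in F2}   # union acceptance
    return states, S1 | S2, delta, (q1, q2), F
```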
(a) Define a new DFA M′ = (Q ∪ {qX, qR}, Σ ∪ {#}, δ′, q0, {qX}) whose transition function is defined
as follows:
δ′(q, t) = δ(q, t)   if q ∈ Q and t ∈ Σ,
δ′(q, t) = qX        if q ∈ F, t = #,
δ′(q, t) = qX        if q = qX,
δ′(q, t) = qR        otherwise.
Define your DFA using notation similar to the definition of M′ in part (a).
4. Shared channel.
Funky Computer Systems, who have now gone out of business, submitted the lowest bid for wiring
the DFAs supporting the Siebel center classrooms. These idiots wired two of the DFAs M and N so
that their inputs come in on a shared input channel. When you try to submit a string w to M and a
string y to N , this single channel receives the characters for the two strings interleaved. For example,
if w = abba and y = cccd, then the channel will get a string like abbcccad or acbccbad.
Fortunately, these two DFAs have alphabets that do not overlap, so it’s possible to sort this out. Your
job is to design a DFA that accepts a string on the shared channel exactly when M and N would have
accepted the two input strings separately.
Specifically, let M = (Q, Σ, δ, q0, F) and N = (R, Γ, γ, r0, G), where Σ ∩ Γ = ∅. Your new machine M′
should read strings from (Σ ∪ Γ)∗. It should be designed using a variation on the product construction,
i.e. its state set should be Q × R.
Give a formal definition of M 0 in terms of M and N . Also briefly explain the ideas behind how it
works (very important especially if your formal notation is buggy).
(a) Let Lk be the language of all palindromes of length k, over the alphabet Σ = {a, b}. Show a DFA
for L4 .
(b) For any fixed k, specify the DFA accepting Lk .
(c) Let L be the language of all palindromes over Σ. Argue, as precisely as you can, that L is not
regular.
(d) We can express the language L as ∪_{k=1}^∞ Lk. It’s tempting to reason as follows: the language L is
the union of regular languages, and as such it is regular.
What is the flaw in this argument, and why is it wrong in our case?
(b) Recall the definition of an NFA accepting a string w (Sipser p. 54). Show formally that M accepts
the string w = aabb
(c) Let Σ = {a, b}. Give the formal definition of the following NFA N .
[Figure lost in extraction: state diagram of N with states A, B, C, D, E and edges labeled a, b, and ε.]
That is, each xi is a string of two characters from Σ. And two of the xi ’s need to be identical, but you
don’t know which two are identical. So the language contains ab#bb#cc#ab and ac#bb#ac#ab, but
not aa#ac#bb.
Design an NFA that recognizes L. This NFA should “guess” when it is at the start of each matching
string and verify that its guess is correct.
4. NFA modification.
The 2SWP operation on strings interchanges the character in each odd position with the character in
the following even position. That is, if the string length k is even, the string w1 w2 w3 w4 . . . wk−1 wk
becomes w2 w1 w4 w3 . . . wk wk−1 . E.g. abcbac becomes babcca. If the string has odd length, we just
leave the last (unpaired) character alone. E.g. abcba becomes babca.
Given a whole language L, we define 2SWP(L) to be { 2SWP(w) | w ∈ L }.
Show that regular languages are closed under the 2SWP operation. That is, show that if L is a regular
language, then 2SWP(L) is regular. That is, suppose that L is recognized by some DFA M . Explain
how to build an NFA N which accepts 2SWP(L).
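The string-level operation itself is easy to pin down; a small Python sketch matching the examples above (this is only an illustration of 2SWP, not the requested closure construction):

```python
def two_swp(w: str) -> str:
    """Swap each odd-position character with the one in the following even
    position; an unpaired final character stays put."""
    out = list(w)
    for i in range(0, len(out) - 1, 2):
        out[i], out[i + 1] = out[i + 1], out[i]
    return "".join(out)

assert two_swp("abcbac") == "babcca"
assert two_swp("abcba") == "babca"
```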
Provide detailed drawing of the GNFA after each step in this process.
Note that in this problem you will get interesting self-loops. For example, one can travel from B to
A and then back to B. This creates a self-loop at B when A is removed.
L = { a^k b^m : k ≤ m or m ≤ 2k }
L = {00, 11}
From the fact that it satisfies the pumping lemma, can we deduce that L is regular? Why or why
not?
2. Decide whether each of the following languages is regular. If it is regular, give a DFA, NFA, or regular expression
for the language. If it is not regular, give a proof using either closure properties or the pumping
lemma.
3. Let T be the language { 0^n 1^n : n ≥ 0 }. Use closure properties to show that the following languages are
not regular, using a proof by contradiction and the fact that T is known not to be regular.
(a) L = { a^n b^m c^{n+m} : n ≥ m ≥ 0 }
(b) J = { 0^n 1^n 2^n : n ≥ 1 }
51.7 Homework 7: CFGs
Spring 08
1. Suffix languages.
Consider the following DFA:
a,b
6 7 5
a b
b b a a
a
1 4
b
b
a,b 3 2
a
(a) Write down the suffix language for each state.
(b) Draw a DFA that has the same language as the one above, but has the minimal number of states.
(a) What is the language of this grammar? The alphabet is {a, b, c, d} and start symbol is T .
S → aSb | ε
T → S | cT | Td
(b) Answer the same question for this grammar, with same alphabet and start symbol.
S → aSb | ε
T → S | cS | Sd
(c) Answer the same question for this grammar, with same alphabet and start symbol.
S → Tb
T → aaS | cd
4. NFA pattern matching.
Pattern-search programs take two inputs: a pattern given by the user and a file of text. The program
determines whether the text file contains a match to the pattern, typically using some variation on
NFA/DFA technology. Fully developed programs, such as grep, accept patterns containing regular-
expression operators (e.g. union) and also other convenient shorthands. Our patterns will be much
simpler.
Let’s fix an alphabet Σ = {a, b, . . . , z, ␣}. Let Γ = Σ ∪ {?, [, ], ∗}. A pattern will be any string in Γ∗.
A string w matches a pattern p if you can line up the characters in the two strings such that:
• When p contains a character from Σ, it must be paired with an identical character in w.
• The character ? in p can match any substring x in w, where x contains at least one character.
• When p contains a substring of the form [w]∗, this can match zero or more repetitions of whatever
w matches.
For example, the pattern “fleck” matches only the string “fleck”. The pattern “margaret?fleck”
will match anything containing “margaret” and “fleck”, separated by at least one character. The
pattern “i␣ate␣[many␣]∗donuts” matches strings like
“i␣ate␣donuts” and
“i␣ate␣many␣many␣donuts”.
Instances of []∗ can be nested. So the pattern cc[bb[a]∗bb]∗dd matches strings like ccdd or ccbbaaaaabbdd
or ccbbabbbbabbdd.
A text file t contains a match to a pattern p if t contains some substring w such that w matches p.
Design an algorithm which converts a pattern p to an NFA Np that searches for matches to p. That
is, the NFA Np will read an input text file t and accept t if and only if t contains a match to p. Np
searches for only one fixed pattern p. However you must describe a general method of constructing Np
from any input pattern p.
You can assume that your input pattern p has been checked to ensure that it’s well-formed, and that we
have a function m which matches open and close brackets. For example, you can assume that a close
bracket (]) at position i in the pattern is immediately followed by a star (∗), and that there is a matching
open bracket ([) at position m(i) in the pattern. The function m is a bijection, so if there is an open
bracket at position j in the pattern, m^{-1}(j) returns the position of the corresponding close bracket.
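To make the intended shape of such an algorithm concrete, here is a minimal Python sketch of one possible compilation, with the NFA kept as dictionaries of labeled and ǫ-transitions. It finds matching brackets by depth counting rather than calling the supplied function m, and the state-numbering scheme is an arbitrary choice; treat it as a sketch under those assumptions, not the required construction.

from collections import defaultdict
from itertools import count

def pattern_to_nfa(p, alphabet):
    trans = defaultdict(set)   # (state, character) -> set of next states
    eps = defaultdict(set)     # state -> set of epsilon-successors
    fresh = count().__next__   # fresh() returns 0, 1, 2, ...

    def build(i, j):           # NFA fragment for the slice p[i:j]
        cur = start = fresh()
        while i < j:
            c = p[i]
            if c == '[':       # [w]* : zero or more repetitions of w
                depth, k = 1, i + 1
                while depth:   # scan for the matching close bracket
                    depth += {'[': 1, ']': -1}.get(p[k], 0)
                    k += 1
                s, e = build(i + 1, k - 1)
                eps[cur].add(s)    # enter one repetition ...
                eps[e].add(cur)    # ... and return, ready to repeat or stop
                i = k + 1          # k sits on the '*' once the scan ends
            elif c == '?':     # ? : one or more arbitrary characters
                nxt = fresh()
                for a in alphabet:
                    trans[(cur, a)].add(nxt)
                    trans[(nxt, a)].add(nxt)
                cur, i = nxt, i + 1
            else:              # literal character from Sigma
                nxt = fresh()
                trans[(cur, c)].add(nxt)
                cur, i = nxt, i + 1
        return start, cur

    start, accept = build(0, len(p))
    for a in alphabet:         # Np scans a whole text file for a match
        trans[(start, a)].add(start)
        trans[(accept, a)].add(accept)
    return start, accept, trans, eps

The final two self-loops implement the substring search: the NFA may skip arbitrary text before the match begins and after it ends.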
51.8 Homework 8: PDAs
Spring 08
1. [Figure: state diagram of a PDA; transition labels include ǫ, ǫ → $; a, ǫ → aa; b, ǫ → ǫ; c, a → ǫ; and ǫ, $ → ǫ.]
2. Converting CFG to PDA.
For each of the following languages (with alphabet Σ = {a, b}) construct a pushdown automaton
recognizing that language, following the general construction for converting a context-free grammar to
a PDA (lecture 13, pp. 115–118 in Sipser).
For each language, also give a parse tree for the word w, a leftmost derivation for w, and the first 10
configurations (state and stack contents) for the PDA as it accepts w.
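The construction being referenced pushes the start symbol and then repeatedly either expands a top-of-stack variable (nondeterministically choosing a rule) or matches a top-of-stack terminal against the next input character. The following is a minimal Python sketch that simulates this behavior with explicit backtracking; the stack-size cap is only there so the demo terminates, and is not part of the actual PDA.

# Sketch: simulate the CFG-to-PDA construction's behavior by backtracking
# over (remaining input, stack) pairs. rules maps a variable to its
# alternatives, with '' standing for the empty right-hand side.
def cfg_accepts(rules, start, w):
    seen, todo = set(), [(w, start + '$')]
    while todo:
        rest, stack = todo.pop()
        if (rest, stack) in seen or len(stack) > 2 * len(w) + 10:
            continue                      # crude cap to keep the demo finite
        seen.add((rest, stack))
        if stack == '$':
            if rest == '':
                return True               # input consumed, stack emptied
            continue
        top, below = stack[0], stack[1:]
        if top in rules:                  # expand a variable
            todo.extend((rest, alt + below) for alt in rules[top])
        elif rest and rest[0] == top:     # match a terminal
            todo.append((rest[1:], below))
    return False

print(cfg_accepts({'S': ['aSb', '']}, 'S', 'aabb'))   # True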
3. Language to PDA.
Let Σ = {a, b, c} and consider the language
L = {a^i b^j c^k | i ≠ j or j ≠ k}.
Design a PDA for L. Present your PDA as a state diagram, with brief comments about how it works.
(a) Notice that for any word x ∈ Σ∗, it holds that (x, x) is always an S-pair. Briefly explain why.
(b) Let LS = {wx^R | x, w ∈ Σ∗ and (x, w) is an S-pair}.
Give a context-free grammar that generates LS.
(c) Let
LT = {w1#w2# · · · #wk# | wi ∈ Σ∗ for all i, and there exist i, j such that i < j and (wi, wj^R) is an S-pair}.
S → aS | aSbS | ε
(a) Show that the grammar is ambiguous, by giving two parse trees for some string w.
(b) Give an efficient test to determine whether a string w in L(S) is ambiguous. Explain informally
why your test works.
51.9 Homework 9: Chomsky Normal Form
Spring 08
(a) Remove nullable variables from the following grammar (with start symbol S):
S → aAa | bBb | BB
A → C
B → S | A
C → S | ε
(b) This grammar (with start symbol S) has no nullable variables. Generate its Chomsky normal
form.
S → ASB
A → aAS | a
B → SbS | A | b
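As a check on part (a), the nullable variables can be computed by a fixpoint iteration: a variable is nullable if some alternative consists entirely of nullable symbols. A minimal sketch, writing each alternative as a string with '' standing for ε:

# Sketch: compute the nullable variables of a CFG by fixpoint iteration.
rules = {
    'S': ['aAa', 'bBb', 'BB'],
    'A': ['C'],
    'B': ['S', 'A'],
    'C': ['S', ''],
}

nullable = set()
changed = True
while changed:
    changed = False
    for var, alts in rules.items():
        if var not in nullable and any(all(s in nullable for s in alt)
                                       for alt in alts):
            nullable.add(var)
            changed = True

print(sorted(nullable))   # ['A', 'B', 'C', 'S']: every variable is nullable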
Using closure properties of context-free languages, and the fact that C is not context-free, prove that
the following languages are not context free:
(a) J = {a^i b^j c^{k−1} | 1 ≤ i ≤ j ≤ k}
(b) K = {a^i c^j b^k c^n | 0 ≤ i ≤ j ≤ k, 0 ≤ n}
3. Grammar-based induction.
Let G be the grammar with start symbol S, terminal alphabet Σ = {a, b} and the following rules:
S → aX | Y.
X → aS | a.
Y → bbY | aa.
It’s actually easier to prove the following stronger and more explicit claim:
Claim (version 2). For any n, if a string w ∈ Σ∗ can be derived from either S or Y in n steps, then w
contains an even number of a’s.
(a.) The original claim involved strings in L(G), i.e. strings that can be derived from the start symbol
S.
Why did we extend the claim to include derivations starting with the variable Y? Why didn’t we
extend it even further to include derivations starting with the variable X?
(b.) Prove version 2 of the claim using strong induction on the derivation length n.
4. PDA to CFG conversion.
Consider the following PDA.
[Figure: state diagram of a PDA with states A, B, X, D, E, and C; transition labels include ǫ, ǫ → $; ǫ, ǫ → x; ǫ, x → ǫ; ǫ, $ → ǫ; ǫ, ǫ → a; a, ǫ → a; and b, a → ǫ.]
Recall the proof that the language of a PDA is context free, on pages 119–120 of Sipser (and in the notes
for Lecture 14). For the above PDA:
(a.) Generate the rules defined by the first bullet point of the proof on page 120.
(b.) What is the start variable?
(c.) How many rules are generated by the second bullet point? Explain why your answer is correct.
5. Give me that old time PDA, it’s good enough for me, it was good enough for my father.
Tony had just released into the market a new model of PDA called Blu-PDA (Sushiba released a
competing PDA product called HD-PDA, but that’s really a topic for a different exercise).
Instead of a stack, like the good old PDA, the new Blu-PDA has a queue. You can push/pop characters
from both sides of the queue (thus, the Blu-PDA can see the characters stored at the front and back
of the queue when making a transition decision [and the current input character, of course]). Since
Tony is targeting this product to the amateur market, they decided to limit the queue size to 25 (if
the queue size exceeds 25, then the Blu-PDA stops and rejects the input). Tony claims that the new
Blu-PDA is a major breakthrough and a considerably stronger computer than a PDA.
(Of course, if the Blu-PDA does strange things like reading or popping characters from an empty queue,
then it immediately rejects the input.)
(a) So, given a Blu-PDA, is it equivalent to a PDA, DFA, or is it stronger than both?
(b) Explain clearly why your answer is correct.
(c) (5 point bonus) Prove your answer in detail.
1. Turing machines.
Give the state diagram for a Turing Machine for the following language.
To simplify your design, you can assume the beginning of the string is marked with ∗. (Inputs that
don’t start with a ∗ should be rejected.) For example, the input may look like ∗abc.
You do not need to draw transitions that lead to the (implicit) reject state. Hence, any transition that
is not present in your diagram will be assumed to lead to the reject state. Indicate which symbol (e.g.
⊔ or B) you are using for the special blank symbol that fills the end of the tape.
L = {a^i b^j c^k | i < j < k}.
[Figure: state diagram of a Turing machine; transition labels include ∗ → R; x → R; a → x, R; b → x, R; c → x, R; a, x → R; b, x → R; ⊔ → R; and a, b, c, x → L.]
(b) Informally and briefly explain why this TM accepts the language you claimed in the previous part.
(c) Trace the execution of this TM as it processes the string aaaaaaaaa (i.e., a sequence of nine a’s) by
providing the sequence of configurations it goes through (i.e., the tape and state at each step; use the
configuration notation shown in class).
[Figure: state diagram of a Turing machine with states 1–8; transition labels include a → b, R; c → c, L; d → d, L; a → a, L; a → d, L; a → c, R; b → c, R; d → b, R; c → c, R; and d → d, R.]
Design a Turing machine that, for a given input tape with n cells containing O’s, marks the positions
which are composite numbers. Specifically, cells at the prime-numbered positions are left containing O,
cells at composite-numbered positions are left containing X, and the cell at the first (leftmost) position
is left containing U (for unit). For example, consider the input OOOOOOOOOO. This represents the first
10 numbers. So the Turing machine should halt with UOOXOXOXXX on the tape.
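The machine has to discover this with head movements, but the target tape is easy to predict. Here is a small sketch (ordinary Python, not a TM) of the output the machine should halt with, useful for checking a design against examples:

# Sketch of the expected final tape: position 1 -> 'U', primes -> 'O',
# composites -> 'X'. This only predicts the output; it is not the TM.
def expected_tape(n):
    out = []
    for pos in range(1, n + 1):
        if pos == 1:
            out.append('U')
        elif any(pos % d == 0 for d in range(2, pos)):
            out.append('X')          # composite position
        else:
            out.append('O')          # prime position
    return ''.join(out)

print(expected_tape(10))             # UOOXOXOXXX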
In this problem we demonstrate a possible encoding of a TM using the alphabet {0, 1, ;, |}, where | is the
newline character. We encode M = (Q, Σ, Γ, δ, q0, qacc, qrej) as a string n|i|j|t|r|s|w, where n, i, j, t, r, s
are integers in binary representing |Q|, |Σ|, |Γ|, q0, qacc, qrej, and w represents δ as described below.
We adopt the convention that states are numbered from 0 to n − 1, the input alphabet symbols are
numbered from 0 to i − 1, and the tape alphabet symbols are numbered from 0 to j − 1 with j − 1
representing the special blank symbol (therefore j > i).
Here is the representation of a mystery Turing machine M , using this encoding. For ease of reading,
337
we have shown | as an actual line break and given the integers in decimal rather than binary.
8
2
3
7
3
5
7; 1; 0; 2; 1
0; 1; 0; 1; 1
0; 0; 1; 1; 1
1; 1; 1; 1; 1
1; 0; 0; 0; 1
0; 2; 2; 2; 0
6; 2; 3; 2; 1
2; 1; 2; 1; 0
2; 0; 6; 0; 0
6; 1; 6; 1; 0
6; 0; 4; 0; 0
4; 0; 4; 0; 0
4; 1; 4; 1; 0
4; 2; 0; 2; 1
(a) Draw a state diagram for this TM (omitting the reject state).
(b) What is the language of this TM? Give a brief justification.
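When decoding the table by hand, it can help to unpack the rows mechanically first. Below is a minimal sketch under one plausible reading of the row format, namely state; read symbol; new state; written symbol; direction, with direction 1 meaning right. Both of those readings are assumptions to verify against the conventions stated above.

# Sketch: unpack the mystery machine's transition rows. The five-field
# reading (state; read; new state; write; direction) and "1 = move right"
# are assumptions, since the row-format description is not shown here.
header = [8, 2, 3, 7, 3, 5]        # |Q|, |Sigma|, |Gamma|, q0, q_acc, q_rej
rows = """7;1;0;2;1 0;1;0;1;1 0;0;1;1;1 1;1;1;1;1 1;0;0;0;1 0;2;2;2;0
          6;2;3;2;1 2;1;2;1;0 2;0;6;0;0 6;1;6;1;0 6;0;4;0;0 4;0;4;0;0
          4;1;4;1;0 4;2;0;2;1""".split()

delta = {}
for row in rows:
    q, a, q2, b, d = map(int, row.split(';'))
    delta[(q, a)] = (q2, b, 'R' if d else 'L')

for (q, a), (q2, b, d) in sorted(delta.items()):
    print(f"state {q}, read {a}: go to state {q2}, write {b}, move {d}")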
From this, we can conclude that TMs and PlaneTMs are equivalent.
51.12 Homework 12: Enumerators
Spring 08
1. Decidable problems.
Prove that L is a decidable language:
L = {⟨D, k⟩ | D is an NFA and D accepts no string of length ≤ k}.
2. Enumerators I.
An enumerator for a language L is a Turing machine that writes out a list of all strings in L. See pp.
152–153 in Sipser.
The enumerator has no input tape. Instead, it has an output tape on which it prints the strings, with
some sort of separator (e.g. #) between them. The strings can be printed in any order, and duplicates
of the same string are OK. But each string in L must be printed eventually.
Design an enumerator that writes all tuples of the form (n, p) where n ∈ N, p ∈ N, and n is a multiple
of p.
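One standard shape for such an enumerator: sweep an increasing bound N, and at stage N print every qualifying pair whose entries are at most N. Every pair then appears at some finite stage, even though the language is infinite. A minimal sketch (the pair syntax and # separator are illustrative choices, and 0 is treated as a multiple of every p; adjust if your N starts at 1):

from itertools import count

# Sketch: print every pair (n, p) with n a multiple of p, eventually.
def enumerate_multiples():
    for N in count(1):                       # stage N
        for p in range(1, N + 1):
            for n in range(0, N + 1, p):     # multiples of p up to N
                print(f"({n},{p})", end="#")

# enumerate_multiples()   # runs forever by design, like any enumerator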
3. Enumerators II.
If L and J are two languages, define L ⊕ J to be the language containing all strings that are in exactly
one of L and J. That is
L ⊕ J = {w | (w ∈ L and w ∉ J) or (w ∈ J and w ∉ L)}
(a) Design an enumerator that will print all strings in L(G) ⊕ L(H).
(b) Is L(G) ⊕ L(H) context-free? TM recognizable? TM decidable? Briefly justify your answers.
(c) Recall that
EQ_CFG = {⟨G, H⟩ | G and H are CFGs and L(G) = L(H)}.
We have mentioned in class that EQ_CFG is undecidable. Why is this problem harder than the
ones you just solved in parts (a) and (b)?
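Returning to part (a): the key fact is that membership in a context-free language is decidable (e.g. by the CYK algorithm), so an enumerator can simply test every string over Σ in length order. A minimal sketch, where in_G and in_H stand for hypothetical membership deciders for L(G) and L(H):

from itertools import count, product

# Sketch: enumerate L(G) (+) L(H), given decision procedures for the two
# CFL memberships. Strings are generated in length order over `alphabet`.
def enumerate_sym_diff(in_G, in_H, alphabet):
    for n in count(0):
        for tup in product(alphabet, repeat=n):
            w = ''.join(tup)
            if in_G(w) != in_H(w):     # w is in exactly one of the two
                print(w, end='#')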
1. Language classification.
Suppose that we have a set of Turing machine encodings defined by each of the following properties.
That is, we have a set
L = {⟨M⟩ | M is a TM and M has property P},
and we are considering different ways to fill in P . Assume that the Turing machines M have only a
single tape.
(a) P is “there is an input string which M accepts after no more than 327 transitions.”
(b) P is “on blank input, M halts leaving the entire tape blank.”
(c) P is “M ’s code has no transitions into the reject state.”
(d) P is “on input UIUC, M never changes the contents of the even-numbered positions on its tape.”
(That is, it can read the even-numbered positions, but not write a different symbol onto them.)
For each of these languages, determine whether it is Turing decidable, Turing recognizable, or not
Turing recognizable. Briefly justify your answers.
2. Reduction I.
Define the language L to be
L = {⟨M⟩ | M is a TM and L(M) is context free but not regular}.
Show that L is undecidable by reducing A_TM to L. (Do the reduction directly. Do not use Rice’s
Theorem.)
3. Reduction II.
Define the language L to be
L = {⟨M⟩ | M is a TM and 100 ≤ |L(M)| ≤ 200}.
Show that L is undecidable by reducing A_TM to L. (Do the reduction directly. Do not use Rice’s
Theorem.)
4. Interleaving.
Suppose that we have Turing machines M and M′ which enumerate languages L and L′, respectively.
(a) Describe how to construct an enumerator P for the language L ∪ L′. The code for P will need
to make use of the code for M and M′ (e.g. call it as a subroutine, or run it in simulation using
U_TM).
(b) Suppose that the languages L and L′ are infinite, and that M and M′ enumerate their
respective languages in lexicographic order. Explain how to modify your construction from part
(a) so that the output of P is in lexicographic order.
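A minimal sketch of both parts, modeling the two enumerators as Python generators (a stand-in for running M and M′ in simulation with U_TM). Part (a) alternates outputs; part (b) is a sorted-streams merge, with Python's string comparison standing in for the lexicographic order. Both sketches assume the streams are infinite, as part (b) stipulates; finite streams would need a StopIteration guard (or dovetailing) in part (a).

# Sketch: (a) alternate between two enumerators, so P outputs L union L'.
def interleave(gen1, gen2):
    while True:
        print(next(gen1))
        print(next(gen2))

# Sketch: (b) merge two streams already in sorted order, smallest first,
# so P's output is itself sorted.
def merge_sorted(gen1, gen2):
    x, y = next(gen1), next(gen2)
    while True:
        if x <= y:
            print(x)
            x = next(gen1)
        else:
            print(y)
            y = next(gen2)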
5. Confusing but interesting(?) reduction. (bonus)
Reduce L to A_TM (note the different direction of the reduction; in particular, do not reduce A_TM to L):
1. Dovetailing
(a) Briefly sketch an algorithm for enumerating all Turing machine encodings. Remember that each
encoding is just a string, with some specific internal syntax (e.g. number of states, then number of
symbols in Σ, etc.).
(b) Now consider a language L that contains Turing machines which take other Turing machines as
input. Specifically,
L = {⟨M⟩ | M halts on some input ⟨N⟩, where N is a TM}.
If M1 is a decider that checks whether a given input ⟨X⟩ (that encodes a TM) halts in ≤ 37 steps, then ⟨M1⟩
is in L.
Suppose that M2 halts and rejects if its input ⟨X⟩ is not the encoding of a TM, and M2 spins off into
an infinite loop if ⟨X⟩ is a TM encoding. Then ⟨M2⟩ is not in L.
Show that L is (nevertheless) TM recognizable by giving a recognizer for it.
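The recognizer is a dovetailing argument: at stage s, simulate M for s steps on each of the first s encodings ⟨N⟩. If M halts on some TM encoding, this is detected at a finite stage; otherwise the recognizer runs forever, which is allowed for recognizers. A minimal sketch, where simulate(M, x, steps) is a hypothetical bounded simulator (true iff M halts on x within the given number of steps) and tm_encodings() is the enumeration from part (a):

from itertools import count, islice

# Sketch of dovetailing: stage s tries the first s encodings for s steps.
def recognize(M, simulate, tm_encodings):
    for s in count(1):
        for x in islice(tm_encodings(), s):
            if simulate(M, x, s):     # M halts on input <N> = x in s steps
                return True           # accept
        # no luck at this stage; retry more inputs for more steps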
and we are considering different ways to fill in P . Assume that the Turing machines M have only a
single tape.
(a) P is “M accepts some word w ∈ Σ∗ which has |w| ≤ 58”.
(b) P is “M does not accept any word w ∈ Σ∗ which has |w| ≤ 249”.
(c) P is “M stops on some string w containing the character a in ≤ 37 steps.”
(d) P is “M stops on some string w ∈ {a^n b^n | n ≥ 0}”.
(e) Given some additional (fixed) TM M′, the property P is “there is a word w such that both M
and M′ accept it.”
For each of these languages, determine whether it is Turing decidable, Turing recognizable, or not
Turing recognizable. Briefly justify your answers.
3. LBAs emptiness revisited. (15 points)
Consider a TM M and a string w. Suppose that $ and c are two fixed characters that are not in M’s
tape alphabet Γ. Now define the following language:
L_{M,w} = {z = w$$$c^i | i ≥ 1 and M accepts w in at most i steps}.
(a) Show that given M and w, the language L_{M,w} can be decided by an LBA. That is, explain how to
build a decider D_{M,w} for L_{M,w} that uses only the tape region where the input string z is written,
and no additional space on the tape.
(b) M accepts w if and only if L(D_{M,w}) ≠ ∅. Explain briefly why this is true.
(c) Assume that we can figure out how to compute the encoding ⟨D_{M,w}⟩, given ⟨M⟩ and w. Prove
that the language
E_LBA = {⟨M⟩ | M is an LBA and L(M) = ∅}
is undecidable.
Bibliography
[EZ74] A. Ehrenfeucht and P. Zeiger. Complexity measures for regular expressions. In Proc. 6th Annu. ACM
Sympos. Theory Comput., pages 75–79, 1974. http://portal.acm.org/citation.cfm?id=803886.
[Sip05] M. Sipser. Introduction to the Theory of Computation. 2nd edition, February 2005.
Index
enumerated, 172
enumerator, 240
exactly, 36
final, 29, 135, 136
finite, 70
Finite-state Transducers, FST, FSTs, 32
FST, 32, 33
generalized non-deterministic finite automata, 58
GNFA, 58–63, 308, 329
graph search, 144
halting, 153
halting problem, 152
halts, 158
height, 108
homomorphism, 63
Kleene star, 36
language, 22, 138
language recognized, 30
LBA, 163, 173–176, 178, 204, 282, 340
LBA, 173
leftmost derivation, 103
length, 21
Lexicographic ordering, 22
linear bounded automata, 173
match, 189
modified Post’s Correspondence Problem, 190
MPCP, 190, 192, 193
NFA, 39
NFA, 39–56, 58, 59, 61–63, 80, 92–94, 112, 121, 123, 127, 128, 131, 144, 145, 147, 173, 181, 199–201, 218–221, 223, 224, 252–254, 266, 274, 275, 286, 304–309, 315, 327–329, 331, 338, 342
non-deterministic finite automata, 39
non-deterministic finite automaton, 199
non-deterministic Turing machine, 171
nondeterministic Turing machine, 202
NTM, 171, 172, 202
nullable, 86, 87
onto, 239
oracle, 157
pairing, 181
palindrome, 310
parallel recursive automata, 257
parse tree, 80
PCP, 190, 193, 194
PDA
  strict, 276
PDA, 12, 80, 92–95, 101, 102, 112–116, 126–131, 173, 178, 179, 181, 199, 201, 202, 205, 232, 276, 277, 280, 286, 293, 297, 334
periodic, 194
polynomial reductions, 187
pop, 122
Post’s Correspondence Problem, 189
prefix, 22
Problem
  3Colorable, 188
  3DM, 188
  A, 157
  B, 157
  Circuit Satisfiability, 186, 187
  Clique, 188
  CSAT, 187, 188
  formula satisfiability, 187, 188
  Hamiltonian Cycle, 188
  Independent Set, 188
  Partition, 188
  SAT, 185–188
  Satisfiability, 185
  Subset Sum, 188
  TSP, 188
  Vec Subset Sum, 188
  Vertex Cover, 188
  X, 157
product automata, 31, 35
push, 122
pushdown automaton, 126
QBA, 262
Quadratic Bounded Automaton, 262
Quote
  A confederacy of Dunces, John Kennedy Toole, 184
  Andrew Jackson, 173
  Dirk Gently’s Holistic Detective Agency, Douglas Adams, 130
  Moby Dick, Herman Melville, 12
  The Hitch Hiker’s Guide to the Galaxy, by Douglas Adams, 152
  The lake of tears, by Emily Rodda, 152
  Through the Looking Glass, by Lewis Carroll, 37
RA, 12, 121, 124, 125, 128, 131, 148, 247, 257, 260, 267, 315
recognizable, 141
recursion, 131
Recursive automata, 12
  Acceptance, 122
reduces, 157
regex, 59
regular, 30, 44
Regular expressions, 36
regular operations, 46
reject, 131
rejecting, 135, 136
rejects, 133
relatively prime, 240
reverse, 45
reversed, 45
Rice Theorem, 164
roots, 130
terminal, 82
Theorem
Rice, 164
tiling completion problem, 195
TM, 12, 132–145, 147–178, 180–182, 189, 190, 192,
193, 195, 199, 203–205, 235–238, 242–245,
247, 260–267, 279–283, 298, 316, 317, 320,
321, 335–340
top of the stack, 122
transition function, 29
Turing decidable, 139, 141
Turing machine, 132, 134, 136
Turing recognizable, 138
Union, 36
unistate TM, 282
unit pair, 88
unit production, 85
unit rule, 85, 88
unit-rules, 87
universal Turing machine, 151
useless, 85, 86
xor, 276