0% found this document useful (0 votes)

18 views78 pages

Compiler Construction Lecture 3-4

The document provides an overview of compiler construction, focusing on formal languages, automata theory, and lexical analysis. It explains key concepts such as tokens, patterns, and regular expressions, and describes the process of constructing a lexical analyzer. Additionally, it covers finite automata, including deterministic (DFA) and nondeterministic (NFA) types, and discusses the conversion between regular expressions and finite automata.

Uploaded by

programmerareeba

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

18 views78 pages

Compiler Construction Lecture 3-4

Uploaded by

programmerareeba

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 78

1

COMPILER CONSTRUCTION(CS-636)

Gul Sher Ali

MS(CS & Tech.)-China

______________________________________________________
GIMS- PMAS Arid Agriculture University, Gujrat Campus
B-
Formal Languages & Automata2
Theory (a review in one slide)
 Alphabet: a finite set of symbols
 String: a finite, possibly empty sequence of
symbols from an alphabet
 Language: a set, often infinite, of strings
 Finite specifications of (possibly infinite)
languages
 Automaton – a recognizer; a machine that
accepts all strings in a language (and rejects all
other strings)
 Grammar – a generator; a system for producing
all strings in the language (and no other strings)
 A particular language may be specified by
many different grammars and automata
 A grammar or automaton specifies only one
language
Lexical Analyzer

 Reads the input characters of the source program and

produces a sequence of tokens for each lexeme.

Source Program Lexical Tokens Parser

Analyzer (Syntax analyzer)
(Scanner)
Get next Token

Symbol
Table
3
Lexical Analyzer

 Token : A pair consisting of token name & attribute

value.

 E.g. :
 <id , a>

 <keyword, int >

 < keyword , if >

 < number , 123>

4
Lexical Analyzer

 Pattern: Describes the form that a lexeme can take.

Id [R.E]

Letter {letter + digit )*

Letter = a-z

Digit = 0-9

 Lexeme: A sequence of characters that matches the pattern of a

token.

 Literal: Any thing surrounded by double quotes “”.

5
Lexical Analyzer

 Task of Lexical analyzer:

1. Deletion of comments, white spaces, blanks , tabs, etc.

2. Produces the sequence of tokens.

3. Keeps the tracks of lines in case of errors.

6
Tokens 7
Example:

if( i == j )
z = 0;
else
z = 1;
Tokens 8
 Input is just a sequence of characters:

i f ( \b i \b = = \b j \n \t ....
Tokens 9

Goal:
 partition input string into
substrings
 classify them according to
their role
Tokens 10

A token is a syntactic
category
 Natural language:
“He wrote the program”
 Words: “He”, “wrote”,
“the”, “program”
Tokens 11

 Programming language:
“if(b == 0) a = b”
 Words:
“if”, “(”, “b”, “==”, “0”,
“)”, “a”, “=”, “b”
Tokens 12

 Identifiers:x y11 maxsize

 Keywords: if else while
for
 Integers: 2 1000 -44 5L
 Floats: 2.0 0.0034 1e5
 Symbols: ( ) + * / { } < > ==
 Strings: “enter x” “error”
Scanning 13

 When a token is found:

 It is passed to the next phase of compiler.
 Sometimes values associated with the token,
called attributes, need to be calculated.
 Some tokens, together with their attributes,
must be stored in the symbol/literal table.
 it is necessary to check if the token is already in the
table
 Examples of attributes
 Attributes of a variable are name, address,
type, etc.
 An attribute of a numeric constant is its value.
How to construct a 14

scanner
 Define tokens in the source language.
 Describe the patterns allowed for
tokens.
 Write regular expressions describing
the patterns.
 Construct an FA for each pattern.
 Combine all FA’s which results in an
NFA.
 Convert NFA into DFA
 Write a program simulating the DFA.
Ad-hoc Lexer 15

 Hand-write code to
generate tokens.
 Partition the input string
by reading left-to-right,
recognizing one token at
a time
Ad-hoc Lexer 16

 Look-ahead required to
decide where one token
ends and the next token
begins.
Ad-hoc Lexer 17

Problems:
 Do not know what kind of
token we are going to
read from seeing first
character.
Ad-hoc Lexer 18

Problems:
 If token begins with “i”, is
it an identifier “i” or
keyword “if”?
 If token begins with “=”, is
it “=” or “==”?
Ad-hoc Lexer 19

 Need a more principled

approach
 Use lexer generator that
generates efficient
tokenizer automatically.
How to Describe Tokens? 20

 Regular
Languages are the
most popular for specifying
tokens
Simple and useful theory
Easy to understand
Efficient implementations
Languages 21

 Let S be a set of
characters. S is called
the alphabet.
 A language over S is set
of strings of characters
drawn from S.
Example of Languages 22

Alphabet = English characters

Language = English sentences

Alphabet = ASCII
Language = C++ programs,
Java, C#
Notation 23

 Languages are sets of

strings (finite sequence of
characters)
 Need some notation for
specifying which sets we
want
Notation 24

 Forlexical analysis we
care about regular
languages.
 Regular languages can
be described using
regular expressions.
Regular Languages 25

 Each regular expression is

a notation for a regular
language (a set of words).
 If A is a regular expression,
we write L(A) to refer to
language denoted by A.
Regular Expression 26

A regular expression (RE)

is defined inductively
a ordinary character
from S
e the empty string
Regular Expression 27

R|S = either R or S
RS = R followed by S
(concatenation)
R* = concatenation of R
zero or more times
(R*= e |R|RR|RRR...)
RE Extentions 28

R? = e | R (zero or one R)
R + = RR* (one or more R)
(R) = R (grouping)
RE Extentions 29

[abc] = a|b|c (any of

listed)
[a-z] = a|b|....|z (range)
[^ab] = c|d|... (anything
but
‘a’‘b’)
Regular Expression 30

RE Strings in L(R)
a “a”
ab “ab”
a|b “a” “b”
(ab)* “” “ab” “abab” ...
(a|e)b “ab” “b”
Example: integers 31

 integer: a non-empty
string
of digits
 digit =
‘0’|’1’|’2’|’3’|’4’|
’5’|’6’|’7’|’8’|’9’
 integer = digit digit*
Example: identifiers 32

 identifier:
string or letters or digits
starting with a letter
 C identifier:
[a-zA-Z_][a-zA-Z0-9_]*
Recap 33

Tokens:
strings of characters
representing lexical units
of programs such as
identifiers, numbers,
operators.
Recap 34

Regular Expressions:
concise description of
tokens. A regular
expression describes a
set of strings.
Recap 35

Language L(R):
set of strings represented
by a regular expression
R. L(R) is the language
denoted by regular
expression R.
How to Use REs 36

 We need mechanism to
determine if an input
string w belongs to L(R),
the language denoted
by regular expression R.
Acceptor 37

 Sucha mechanism is called

an acceptor.
input w
string yes, if w e L
acceptor
no, if w e L
language L
Finite Automata (FA) 38

 Specification:
Regular Expressions
 Implementation:
Finite Automata
Finite Automata
Finite Automaton consists of
 An input alphabet (S)
 A set of states
 A start (initial) state
 A set of transitions
 A set of accepting (final) states

39
Finite Automaton 40

State Graphs
A state

The start state

An accepting
state
Finite Automaton 41

State Graphs
a

A transition
Finite Automata 42

A finite automaton
accepts a string if we can
follow transitions labelled
with characters in the
string from start state to
some accepting state.
FA Example 43

A FA that accepts only “1”

1
FA Example 44
 A FA that accepts any number of 1’s followed by a single 0

1
0
FA Example 45
 A FA that accepts ab*a
 Alphabet: {a,b}

b
a a
Table Encoding of FA
 Transition b
table a a
0 1 2

a b
0 1 err
1 2 1
2 err err
46
RE → Finite Automata 47

 Can we build a finite

automaton for every
regular expression?
 Yes, – build FA inductively
based on the definition of
Regular Expression
NFA 48

Nondeterministic Finite
Automaton (NFA)
 Can have multiple
transitions for one input
in a given state
 Can have e - moves
Epsilon Moves 49
 ε – moves
machine can move from state A to state B without consuming
input

e
A B
NFA 50
operation of the automaton is
not completely defined by input
1
0 1
A B C

On input “11”, automaton could be

in either state
Execution of FA 51

A NFA can choose

 Whether to make e-
moves.
 Which of multiple
transitions to take for a
single input.
Acceptance of NFA 52
 NFA can get into multiple states
 Rule: NFA accepts if it can get in a final state

1
0 1
A B C

0
DFA and NFA 53

Deterministic Finite
Automata (DFA)
 One transition per input
per state.
 No e - moves
Execution of FA 54

A DFA
 can take only one path
through the state graph.
 Completely determined by
input.
NFA vs DFA 55

 NFAs and DFAs recognize

the same set of
languages (regular
languages)
 DFAs are easier to
implement – table driven.
NFA vs DFA 56

 Fora given language,

the NFA can be simpler
than the DFA.
 DFA can be
exponentially larger than
NFA.
NFA vs DFA 57

 NFAsare the key to

automating RE → DFA
construction.
RE → NFA Construction 58

Thompson’s construction
(CACM 1968)
 Build an NFA for each RE
term.
 Combine NFAs with
e-moves.
RE → NFA Construction 59

Subset construction
NFA → DFA
 Build the simulation.
 Minimize number of
states in DFA (Hopcroft’s
algorithm)
RE → NFA Construction 60

Key idea:
 NFA pattern for each
symbol and each
operator.
 Join them with e-moves in
precedence order.
RE → NFA Construction 61

a
s0 s1
NFA for a

s0 a
s1 e s3 b s4

NFA for ab
RE → NFA Construction 62
a
NFA for a s0 s1
RE → NFA Construction 63
a
NFA for a s0 s1
b
NFA for b s3 s4
RE → NFA Construction 64
a
NFA for a s0 s1
b
NFA for b s3 s4

s0 a b
s1 s3 s4
RE → NFA Construction 65
a
NFA for a s0 s1
b
NFA for b s3 s4

s0 a
s1 e s3 b s4

NFA for ab
RE → NFA Construction 66

a
e s1 s2 e
s0 s5
e b
s3 s4 e

NFA for a | b
RE → NFA Construction 67

s1 a
s2

NFA for a
RE → NFA Construction 68

s1 a
s2

s3 b s4

NFA for a and b

RE → NFA Construction 69

a
e s1 s2 e
s0 s5
e b
s3 s4 e

NFA for a | b
RE → NFA Construction 70

s0 e s1 a
s2 e s4

e
NFA for a*
RE → NFA Construction 71

s1 a
s2

NFA for a
RE → NFA Construction 72

s0 e s1 a
s2 e s4

e
NFA for a*
Example RE → NFA 73

NFA for a ( b|c )* e

b
e e e s4 s5 e
a e
s0 s1 s2 s3 s8 s9
e s6 c
s7 e

e
Example RE → NFA 74

building NFA for a ( b|c )*

a
s0 s1
Example RE → NFA 75

NFA for a, b and c

b
s4 s5
a
s0 s1
s6 c
s7
Example RE → NFA 76

NFA for a and b|c

b
e s4 s5 e
a
s0 s1 s3 s8
e s6 c
s7 e
Example RE → NFA 77

NFA for a and ( b|c )*

e
b
e e s4 s5 e
a e
s0 s1 s2 s3 s8 s9
e s6 c
s7 e

e
Example RE → NFA 78

NFA for a ( b|c )* e

b
e e e s4 s5 e
a e
s0 s1 s2 s3 s8 s9
e s6 c
s7 e

Lexical Analysis for Programmers
No ratings yet
Lexical Analysis for Programmers
67 pages
Chapter 2
No ratings yet
Chapter 2
99 pages
1st Phase Lexical Analyzer
No ratings yet
1st Phase Lexical Analyzer
33 pages
Lexical Analysis and Token Recognition
100% (3)
Lexical Analysis and Token Recognition
51 pages
Chapter 3 - Lexical Analysis
100% (1)
Chapter 3 - Lexical Analysis
51 pages
Lecture Week 03
No ratings yet
Lecture Week 03
24 pages
ch-2.pdf 2
No ratings yet
ch-2.pdf 2
27 pages
2 - Compilers (Lexical Analysis)
No ratings yet
2 - Compilers (Lexical Analysis)
60 pages
UNIT-I - Lexical Analysis
No ratings yet
UNIT-I - Lexical Analysis
51 pages
Chapter 3 - Lexical Analysis
No ratings yet
Chapter 3 - Lexical Analysis
51 pages
File 1675742677 110405 LexicalAnalysis-Continue1
No ratings yet
File 1675742677 110405 LexicalAnalysis-Continue1
39 pages
Chapter 2
No ratings yet
Chapter 2
56 pages
Chapter 2
No ratings yet
Chapter 2
91 pages
CompilerD L3
No ratings yet
CompilerD L3
36 pages
Chapter 3 - Lexical Analysis
No ratings yet
Chapter 3 - Lexical Analysis
51 pages
CH 2
No ratings yet
CH 2
36 pages
Chapter-2 Compiler Design
No ratings yet
Chapter-2 Compiler Design
98 pages
Code Source Tokens Scanner Parser IR
No ratings yet
Code Source Tokens Scanner Parser IR
26 pages
Lexical Analysis All Token List and Diffence
No ratings yet
Lexical Analysis All Token List and Diffence
4 pages
Unit II - Lexical Analysis-20-1-2021
No ratings yet
Unit II - Lexical Analysis-20-1-2021
49 pages
Compilers CH 3
No ratings yet
Compilers CH 3
58 pages
Lecture 06
No ratings yet
Lecture 06
27 pages
Lexical Analysis1
No ratings yet
Lexical Analysis1
44 pages
Regualr Languages
No ratings yet
Regualr Languages
29 pages
CS 346: Compilers: Lexical Analyzer Lexical Analyzer
No ratings yet
CS 346: Compilers: Lexical Analyzer Lexical Analyzer
52 pages
Lec02 Lexicalanalyzer
100% (1)
Lec02 Lexicalanalyzer
50 pages
Compiler Course: Lexical Analysis
No ratings yet
Compiler Course: Lexical Analysis
50 pages
Lexical Analysis
No ratings yet
Lexical Analysis
36 pages
Acknowledgements: The Slides For This Lecture Are A Modified Versions of The Offering by
No ratings yet
Acknowledgements: The Slides For This Lecture Are A Modified Versions of The Offering by
40 pages
Lect 03
No ratings yet
Lect 03
19 pages
Module 1
No ratings yet
Module 1
20 pages
Chapter 3 - Lexical Analysis
No ratings yet
Chapter 3 - Lexical Analysis
34 pages
19CSE401 CD 02 Scanners
No ratings yet
19CSE401 CD 02 Scanners
82 pages
Chapter Two LexicalAnalysis
No ratings yet
Chapter Two LexicalAnalysis
16 pages
04 Lexi Cal A Analysis
No ratings yet
04 Lexi Cal A Analysis
39 pages
Chapter 3 - Lexical Analysis
No ratings yet
Chapter 3 - Lexical Analysis
52 pages
Lecture 06
No ratings yet
Lecture 06
27 pages
Lexical Analysis
No ratings yet
Lexical Analysis
47 pages
CD - Unit1 - Lecture4 5 6 7
No ratings yet
CD - Unit1 - Lecture4 5 6 7
50 pages
2 - Scanner
No ratings yet
2 - Scanner
49 pages
Lecture 06
No ratings yet
Lecture 06
26 pages
Lecture 04
No ratings yet
Lecture 04
37 pages
Chapter 2
No ratings yet
Chapter 2
39 pages
Lexical Analysis and Lexical Analyzer Generators: COP5621 Compiler Construction
No ratings yet
Lexical Analysis and Lexical Analyzer Generators: COP5621 Compiler Construction
52 pages
Slides 02 - Compiler Construction - UET CS - Lexical Analyzer Rev 2
No ratings yet
Slides 02 - Compiler Construction - UET CS - Lexical Analyzer Rev 2
69 pages
Compiler-Lexical Analysis
100% (1)
Compiler-Lexical Analysis
59 pages
Chapter 3 Implementation - of - Lexical - Analysis
No ratings yet
Chapter 3 Implementation - of - Lexical - Analysis
63 pages
Chapter 3 - Lexical Analysis and Lexical Analyzer Generators
No ratings yet
Chapter 3 - Lexical Analysis and Lexical Analyzer Generators
52 pages
CH 3 Myppt
No ratings yet
CH 3 Myppt
59 pages
Compiler Lexical Analysis Guide
No ratings yet
Compiler Lexical Analysis Guide
56 pages
Chapter 2
No ratings yet
Chapter 2
31 pages
Ch2 CC
No ratings yet
Ch2 CC
47 pages
Scanner
No ratings yet
Scanner
56 pages
Chapter 2
No ratings yet
Chapter 2
77 pages
(3rd Year) Compiler PPT RM
No ratings yet
(3rd Year) Compiler PPT RM
50 pages
Lecture6 Chapter5 - State Reduction and Assignment
100% (1)
Lecture6 Chapter5 - State Reduction and Assignment
21 pages
Turing Machines: Part One
No ratings yet
Turing Machines: Part One
44 pages
Language Learning with Teachers
No ratings yet
Language Learning with Teachers
5 pages
Recognition of Tokens
No ratings yet
Recognition of Tokens
34 pages
Ddco 2020 Esa QP
No ratings yet
Ddco 2020 Esa QP
4 pages
UCS Faults Reference Guide PDF
No ratings yet
UCS Faults Reference Guide PDF
506 pages
Chapter 3 Regular Expression
No ratings yet
Chapter 3 Regular Expression
25 pages
Paper 4 Countering Malware Via Decoy Processes
No ratings yet
Paper 4 Countering Malware Via Decoy Processes
11 pages
Synchronous Sequential Circuits
No ratings yet
Synchronous Sequential Circuits
56 pages
UML Activity Diagrams Guide
No ratings yet
UML Activity Diagrams Guide
13 pages
Pushdown Automata Explained
No ratings yet
Pushdown Automata Explained
11 pages
EEE20001 Unit Outline Semester 2 2023
No ratings yet
EEE20001 Unit Outline Semester 2 2023
12 pages
Regular Languages & Finite Automata
No ratings yet
Regular Languages & Finite Automata
16 pages
Part-B - Toc
No ratings yet
Part-B - Toc
20 pages
Computer Engineering I (ECE290) : Digital System Design With Verilog
No ratings yet
Computer Engineering I (ECE290) : Digital System Design With Verilog
136 pages
Test Uml
No ratings yet
Test Uml
15 pages
Digital Circuits - Finite State Machines
No ratings yet
Digital Circuits - Finite State Machines
3 pages
Regular Languages T1 Mealy Machines
No ratings yet
Regular Languages T1 Mealy Machines
32 pages
Chapter6 StateMachines
No ratings yet
Chapter6 StateMachines
5 pages
Question Bank - Mod2
No ratings yet
Question Bank - Mod2
2 pages
JFLAP Setup Guide for Beginners
No ratings yet
JFLAP Setup Guide for Beginners
23 pages
ASM Charts & Binary Multiplier Design
No ratings yet
ASM Charts & Binary Multiplier Design
58 pages
Chapter 1 Intro To The Theory of Computation2016
No ratings yet
Chapter 1 Intro To The Theory of Computation2016
212 pages
DDCA Ch3
No ratings yet
DDCA Ch3
97 pages
AWS Step Functions Lab Guide
No ratings yet
AWS Step Functions Lab Guide
2 pages
7673formal Language and Automata Theory 2nd Edition K. V. N. Sunitha Available Any Format
No ratings yet
7673formal Language and Automata Theory 2nd Edition K. V. N. Sunitha Available Any Format
146 pages
Formal Language and Compiler Design - 2
No ratings yet
Formal Language and Compiler Design - 2
39 pages
Chapter 1 - Automata
No ratings yet
Chapter 1 - Automata
4 pages
Transition Graphs: National University of Computer and Emerging Sciences, FAST, Islamabad
No ratings yet
Transition Graphs: National University of Computer and Emerging Sciences, FAST, Islamabad
33 pages
Fundamentals of Software Engineering Third Ipm International Conference Fsen 2009 Kish Island Iran April 1517 2009 Revised Selected Papers 1st Edition J C M Baeten Download
100% (3)
Fundamentals of Software Engineering Third Ipm International Conference Fsen 2009 Kish Island Iran April 1517 2009 Revised Selected Papers 1st Edition J C M Baeten Download
90 pages

Compiler Construction Lecture 3-4

Uploaded by

Compiler Construction Lecture 3-4

Uploaded by

1

Gul Sher Ali

 Reads the input characters of the source program and

Source Program Lexical Tokens Parser

 Token : A pair consisting of token name & attribute

 <keyword, int >

 < keyword , if >

 < number , 123>

 Pattern: Describes the form that a lexeme can take.

Letter {letter + digit )*

 Lexeme: A sequence of characters that matches the pattern of a

 Literal: Any thing surrounded by double quotes “”.

 Task of Lexical analyzer:

1. Deletion of comments, white spaces, blanks , tabs, etc.

2. Produces the sequence of tokens.

3. Keeps the tracks of lines in case of errors.

 Identifiers:x y11 maxsize

 When a token is found:

 Need a more principled

Alphabet = English characters

 Languages are sets of

 Each regular expression is

A regular expression (RE)

[abc] = a|b|c (any of

 Sucha mechanism is called

The start state

A FA that accepts only “1”

 Can we build a finite

On input “11”, automaton could be

A NFA can choose

 NFAs and DFAs recognize

 Fora given language,

 NFAsare the key to

NFA for a and b

NFA for a ( b|c )* e

building NFA for a ( b|c )*

NFA for a, b and c

NFA for a and b|c

NFA for a and ( b|c )*

NFA for a ( b|c )* e

You might also like