Compiler Construction Week 4 Lecture 8

The document discusses the concept of the FIRST set in grammar, its importance in parsing, and the rules for computing it. It also covers left recursion and left factoring as techniques to make grammars suitable for top-down parsing, along with examples and the role of lexical analyzers in tokenizing input for compilers. Additionally, it outlines stack manipulation instructions and arithmetic operations relevant for expression evaluation in a stack machine context.


FIRST Set

The FIRST set of a grammar symbol (terminal or non-terminal) contains the set of terminals that
can appear at the start of strings derived from that symbol.
Why FIRST Set Is Important
• Used in LL(1) and LR(1) parsing tables.
• Helps decide which production rule to use.
• Assists in resolving ambiguity.
• Helps compute the FOLLOW set.
• Essential for recursive descent parsing.
Rules for Computing FIRST Set
Let’s compute FIRST(X):
1. If X is a terminal:
FIRST(X) = {X}
2. If X → ε is a production:
Add ε to FIRST(X)
3. If X → Y1 Y2 Y3 ... Yn:
o Add FIRST(Y1) to FIRST(X) (excluding ε)
o If FIRST(Y1) contains ε, add FIRST(Y2), and so on.
o If all Yi (1 to n) derive ε, then add ε to FIRST(X)
Example 1
Grammar:
E → T E’
E’ → + T E’ | ε
T → F T’
T’ → * F T’ | ε
F → ( E ) | id
FIRST Sets:
FIRST(E) = { (, id }
FIRST(E’) = { +, ε }
FIRST(T) = { (, id }
FIRST(T’) = { *, ε }
FIRST(F) = { (, id }
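The rules above can be sketched as a small fixed-point computation. This is a minimal sketch, assuming the example grammar is encoded as a dict of productions and using 'ε' as the empty-string marker; the function and variable names are choices of this sketch, not a standard API.

```python
EPS = 'ε'

# The example grammar, one entry per non-terminal, each production a symbol list.
grammar = {
    'E':  [['T', "E'"]],
    "E'": [['+', 'T', "E'"], [EPS]],
    'T':  [['F', "T'"]],
    "T'": [['*', 'F', "T'"], [EPS]],
    'F':  [['(', 'E', ')'], ['id']],
}

def compute_first(grammar):
    first = {nt: set() for nt in grammar}

    def first_of(symbol):
        if symbol not in grammar:      # Rule 1: a terminal's FIRST is itself
            return {symbol}
        return first[symbol]

    changed = True
    while changed:                     # iterate to a fixed point
        changed = False
        for nt, productions in grammar.items():
            for prod in productions:
                for sym in prod:       # Rule 3: scan Y1 Y2 ... while ε persists
                    f = first_of(sym)
                    before = len(first[nt])
                    first[nt] |= f - {EPS}
                    if len(first[nt]) != before:
                        changed = True
                    if EPS not in f:
                        break
                else:                  # every symbol derived ε: Rule 2 applies
                    if EPS not in first[nt]:
                        first[nt].add(EPS)
                        changed = True
    return first
```

Running `compute_first(grammar)` reproduces the FIRST sets listed above for this grammar.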
Features of FIRST Set

Feature      Description
Purpose      Identifies which terminal symbols can begin strings derived from a non-terminal.
Used in      LL(1) parsing, recursive descent parsers, predictive parsers.
Helps with   Rule selection, ambiguity resolution, and error detection.
Advantages
• Helps make parsing decisions fast.
• Supports error detection and recovery.
• Reduces ambiguity in grammar.
Disadvantages
• Complex to compute for large grammars.
• Not directly effective for ambiguous or left-recursive grammars.
Tip for Learning
Always start from terminals and work upward. Pay special attention to rules that include ε
because they affect whether we need to consider more symbols in a production.
Left Recursion
A grammar is left-recursive if a non-terminal appears as the leftmost symbol on the right-hand side of one of its own productions.
Why it's a problem:
Top-down parsers like recursive descent and LL(1) parsers cannot handle left recursion directly
— it can lead to infinite recursion.
Example of Left Recursion:
A → Aα | β
Here, A appears as the leftmost symbol of its own right-hand side, so the rule is left-recursive.
Example with expressions:
E → E + T | T
Removing Left Recursion
To remove left recursion, rewrite the productions using the following method:
Convert:
A → Aα | β
Into:
A → β A'
A' → α A' | ε
Example:
Original:
E → E + T | T
After eliminating left recursion:
E → T E'
E' → + T E' | ε
Now it's suitable for top-down parsing!
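The A → Aα | β rewrite can be sketched in a few lines. This is a minimal illustration, assuming productions are stored as lists of symbol lists; the primed non-terminal name and the 'ε' marker are conventions of this sketch, not a standard API.

```python
def eliminate_left_recursion(nt, productions):
    # Split productions into the left-recursive ones (A → Aα) and the rest (A → β)
    recursive = [p[1:] for p in productions if p and p[0] == nt]        # the α parts
    non_recursive = [p for p in productions if not p or p[0] != nt]     # the β parts
    if not recursive:
        return {nt: productions}   # nothing to do
    new_nt = nt + "'"
    return {
        nt:     [beta + [new_nt] for beta in non_recursive],            # A  → β A'
        new_nt: [alpha + [new_nt] for alpha in recursive] + [['ε']],    # A' → α A' | ε
    }
```

For example, `eliminate_left_recursion('E', [['E', '+', 'T'], ['T']])` produces the E → T E', E' → + T E' | ε rewrite shown above.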
Left Factoring
Left factoring is a grammar transformation used when two or more productions share a
common prefix. It helps remove ambiguity and make decisions easier during parsing.
Why it's useful:
LL(1) parsers need to choose the correct production based on a single lookahead symbol. Left
factoring helps in such decisions.
Example of Left Factoring:
Before:
A → if E then S else S
A → if E then S
The common prefix is: if E then S
After Left Factoring:
A → if E then S A'
A' → else S | ε
Now, the parser doesn’t have to guess which rule to use—it looks ahead and knows exactly
what to do.
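The prefix-extraction step can be sketched similarly. This is a minimal sketch for factoring out a single shared prefix, assuming productions are lists of symbols; the helper names are illustrative only.

```python
def common_prefix(prods):
    # Longest prefix of symbols shared by all productions
    prefix = []
    for symbols in zip(*prods):
        if all(s == symbols[0] for s in symbols):
            prefix.append(symbols[0])
        else:
            break
    return prefix

def left_factor(nt, productions):
    prefix = common_prefix(productions)
    if not prefix:
        return {nt: productions}   # no shared prefix: nothing to factor
    new_nt = nt + "'"
    # What remains of each production after the prefix; empty remainder becomes ε
    suffixes = [p[len(prefix):] or ['ε'] for p in productions]
    return {nt: [prefix + [new_nt]], new_nt: suffixes}
```

Applied to the if-then-else rules above, this yields A → if E then S A' and A' → else S | ε.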
Comparison Table

Feature            Left Recursion                     Left Factoring
Purpose            Eliminate infinite recursion       Remove ambiguity in rule selection
Problem it solves  Recursive descent parser failure   Multiple rules starting the same way
Changes            Converts recursion into iteration  Extracts common prefixes
Helps in           Making grammar LL(1)-compatible    Making parser decisions easier

Translator for Simple Expressions
A translator for simple expressions is a component of a compiler that converts arithmetic expressions (like a + b * c) into a different form, such as:
• Postfix (Reverse Polish Notation)
• Intermediate representation (IR)
• Machine code
Simple Expression
Simple expressions include basic arithmetic using identifiers and operators like:
a+b
a+b*c
(a + b) * c
Translator Goals
• Parse the expression
• Apply correct precedence and associativity
• Generate intermediate code or postfix notation
Example 1: Infix to Postfix Translation
Input:
a+b*c
Output:
abc*+
Grammar for Simple Arithmetic Expressions
Let’s define a grammar that captures precedence and associativity:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → (E) | id
Translation Scheme: Postfix Notation Generator
Here’s a Syntax-Directed Translation scheme using the above grammar:
Production Rules with Postfix Translation:
E → T E' { E.code = T.code || E'.code }
E' → + T E' { E'.code = T.code || "+" || E'.code }
E' → ε { E'.code = "" }
T → F T' { T.code = F.code || T'.code }
T' → * F T' { T'.code = F.code || "*" || T'.code }
T' → ε { T'.code = "" }
F → (E) { F.code = E.code }
F → id { F.code = id.lexeme }
Example Trace (Input: a + b * c)
Postfix Output:
abc*+
Adding a Lexical Analyzer for Simple Expressions
What is a Lexical Analyzer?
• It reads raw input (a string like "a + b * c")
• Splits it into tokens: identifiers, operators, parentheses
• Each token has a type and possibly a value
Tokens We Need for Simple Expressions

Token Type   Examples
ID           a, b, c, var1
PLUS         +
MULT         *
LPAREN       (
RPAREN       )
EOF          End-of-input marker

Example Lexer in Python


import re

# Token specification using regex
token_specification = [
    ('ID',     r'[a-zA-Z_][a-zA-Z_0-9]*'),  # Identifiers
    ('PLUS',   r'\+'),                      # Plus
    ('MULT',   r'\*'),                      # Multiply
    ('LPAREN', r'\('),                      # Left Parenthesis
    ('RPAREN', r'\)'),                      # Right Parenthesis
    ('SKIP',   r'[ \t]+'),                  # Skip spaces and tabs
]

def lexer(code):
    tokens = []
    pos = 0
    while pos < len(code):
        match = None
        for tok_type, tok_regex in token_specification:
            pattern = re.compile(tok_regex)
            match = pattern.match(code, pos)
            if match:
                if tok_type != 'SKIP':  # Ignore whitespace
                    tokens.append((tok_type, match.group(0)))
                pos = match.end()
                break
        if not match:
            raise SyntaxError(f'Illegal character: {code[pos]} at position {pos}')
    tokens.append(('EOF', ''))
    return tokens

# Example usage:
input_code = "a + b * (c + d)"
tokens = lexer(input_code)
for t in tokens:
    print(t)
Integrating Lexer with Parser
Modify the parser to consume tokens as (token_type, token_value) instead of raw strings.
Example snippet of parser token consumption:
tokens = lexer(input_code)
pos = 0

def next_token():
    return tokens[pos][0] if pos < len(tokens) else 'EOF'

def match(expected_type):
    global pos
    if next_token() == expected_type:
        pos += 1
    else:
        raise SyntaxError(f'Expected {expected_type}, got {next_token()}')
Use tokens[pos][1] whenever you want the actual lexeme (like the variable name).
Now you have a complete mini-compiler front-end for simple expressions:
• Lexer to tokenize input
• Parser to validate syntax and parse structure
• Translator to convert infix to postfix
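Putting the pieces together, the translation scheme above can be sketched as a recursive-descent translator that consumes (token_type, lexeme) pairs like those the lexer produces. The ε-productions for E' and T' become while-loops. This is a minimal sketch; the function names are choices of this sketch, not a standard API.

```python
def to_postfix(tokens):
    # tokens: list of (token_type, lexeme) pairs ending with ('EOF', '')
    pos = 0
    out = []

    def peek():
        return tokens[pos][0]

    def match(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f'Expected {expected}, got {peek()}')
        value = tokens[pos][1]
        pos += 1
        return value

    def E():                          # E → T E'
        T()
        while peek() == 'PLUS':       # E' → + T E' | ε, written as a loop
            match('PLUS'); T(); out.append('+')

    def T():                          # T → F T'
        F()
        while peek() == 'MULT':       # T' → * F T' | ε
            match('MULT'); F(); out.append('*')

    def F():                          # F → ( E ) | id
        if peek() == 'LPAREN':
            match('LPAREN'); E(); match('RPAREN')
        else:
            out.append(match('ID'))

    E()
    match('EOF')
    return ' '.join(out)
```

For the tokens of a + b * c this emits a b c * +, matching the trace above: * binds tighter because it is emitted inside T before the enclosing E appends +.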

Token Attributes:

When a lexical analyzer (lexer) scans the source code, it produces tokens. A token is a pair
consisting of a token type (like ID, NUMBER, PLUS) and its attributes, which hold additional
information about that token.

Examples of Token Attributes:

Token Type   Possible Attributes
ID           The actual variable name (lexeme), e.g., "count"
NUMBER       Numeric value (int or float), e.g., 42 or 3.14
STRING       The string literal value, e.g., "hello"
PLUS         Usually no attribute (just the symbol)
KEYWORD      The keyword string or an internal code

Why are Token Attributes Important?

• They carry semantic info for later stages (parsing, semantic analysis, code generation).
• For example, the parser needs the lexeme of an ID to build symbol tables.
• The code generator may use numeric values in NUMBER tokens directly.

Example of Token with Attributes

Token = {
    'type': 'ID',
    'lexeme': 'variableName',
    'line': 10,
    'column': 5
}

Instructions for Stack Manipulation (in Compilation)

Stack manipulation instructions are central in runtime environments for expression evaluation, function calls, and intermediate code generation (especially for postfix evaluation or stack-machine code).

Common Stack Operations:

Instruction Description
PUSH <value> Push a constant, variable value, or intermediate result onto the stack
POP <dest> Pop the top value from the stack into a variable or register
LOAD <var> Push the value of a variable onto the stack
STORE <var> Pop the top value from the stack and store it into a variable
DUP Duplicate the top value of the stack
SWAP Swap the top two values on the stack

Arithmetic Operations (Assuming operands on stack):

Instruction   Description
ADD           Pop two values, add, push result
SUB           Pop two values, subtract, push result
MUL           Pop two values, multiply, push result
DIV           Pop two values, divide, push result

Example: Evaluating a + b * c in a Stack Machine

For postfix expression a b c * +, instructions might be:

LOAD a    # push a
LOAD b    # push b
LOAD c    # push c
MUL       # pop c and b, push b * c
ADD       # pop (b * c) and a, push a + (b * c)
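A tiny interpreter for these instructions can be sketched as follows, assuming integer operands and a dict mapping variable names to values. The instruction names follow the tables in this section; the helper name `run` is a choice of this sketch.

```python
def run(instructions, env):
    # Execute a list of stack-machine instructions against variable bindings in env.
    stack = []
    for line in instructions:
        parts = line.split()
        op = parts[0]
        if op == 'LOAD':                  # push the value of a variable
            stack.append(env[parts[1]])
        elif op == 'STORE':               # pop into a variable
            env[parts[1]] = stack.pop()
        elif op == 'PUSH':                # push a constant
            stack.append(int(parts[1]))
        elif op == 'ADD':
            b, a = stack.pop(), stack.pop(); stack.append(a + b)
        elif op == 'SUB':
            b, a = stack.pop(), stack.pop(); stack.append(a - b)
        elif op == 'MUL':
            b, a = stack.pop(), stack.pop(); stack.append(a * b)
        elif op == 'DIV':                 # integer division in this sketch
            b, a = stack.pop(), stack.pop(); stack.append(a // b)
        else:
            raise ValueError(f'Unknown instruction: {op}')
    return stack[-1] if stack else None
```

With a = 2, b = 3, c = 4, the five-instruction sequence above evaluates a + b * c to 14.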
