
Compiler Construction (CSC-320)
Lecture # 04
Course Instructor: M. Ramzan Shahid Khan
Department of Computer Science, Namal University Mianwali
Fall Semester, 2024

Topics
• Phases of Compiler (Diagram)
• Symbol Table
• Tokens
• Three Types of Errors
• Lexeme
• Pattern (How to define a Pattern?)

Phases of Compiler
• A compiler is divided into 6 phases:
1. Lexical Analysis
2. Syntax Analysis
3. Semantic Analysis
4. Intermediate Code Generation
5. Code Optimizer
6. Target Code Generation

Phases of Compiler (Diagram)
Source Code
→ Lexical Analysis → Stream of Tokens
→ Syntax Analysis → Parse Tree
→ Semantic Analysis → Syntax Tree
→ Intermediate Code Generation → IR (Intermediate Representation)
→ Code Optimizer → Optimized Code
→ Target Code Generation → Target Code
• Symbol Table and Error Reporting are shared by all phases.
Symbol Table
• One of the most important components: no phase of the compiler can work properly without the symbol table.
• It is a data structure used by language translators (compilers) for storing information about identifiers (variable names, function names, class names, etc.) and their attributes (type, address, scope, etc.).
• E.g. the entry for the declaration public int x;

Name | Type | Location | Scope
x | int | memory address for this declaration | public
Symbol Table
• All 6 phases of the compiler use the symbol table to perform their functions.

• E.g. the compiler uses it to check whether a variable used by the programmer has been declared in the current scope; if there is no declaration, it reports an error.

• Semantic Analysis uses it for type checking (checking meaning): the type stored for each identifier is used to detect incompatible data types.
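As a rough illustration of the two uses above (checking declarations and checking types), here is a minimal Python sketch of a scoped symbol table, assuming one dictionary per nested scope. The names `Symbol`, `SymbolTable` and `check_assignment`, and the integer `location` offsets, are hypothetical and not taken from any particular compiler.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Symbol:
    name: str
    type: str
    location: int  # hypothetical memory offset, for illustration only
    scope: str


class SymbolTable:
    """Minimal scoped symbol table: a stack of dicts, innermost scope last."""

    def __init__(self) -> None:
        self.scopes: list[dict[str, Symbol]] = [{}]  # start with the global scope

    def enter_scope(self) -> None:
        self.scopes.append({})

    def exit_scope(self) -> None:
        self.scopes.pop()

    def declare(self, sym: Symbol) -> None:
        current = self.scopes[-1]
        if sym.name in current:
            raise NameError(f"redeclaration of '{sym.name}'")
        current[sym.name] = sym

    def lookup(self, name: str) -> Optional[Symbol]:
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


def check_assignment(table: SymbolTable, name: str, value_type: str) -> Optional[str]:
    """Return an error message for `name = <value of value_type>`, or None if valid."""
    sym = table.lookup(name)
    if sym is None:
        return f"'{name}' is not declared"  # undeclared variable
    if sym.type != value_type:
        return f"cannot assign {value_type} to {sym.type} '{name}'"  # type mismatch
    return None


table = SymbolTable()
table.declare(Symbol("x", "int", 0, "public"))  # public int x;
print(check_assignment(table, "x", "int"))      # None
print(check_assignment(table, "x", "string"))   # cannot assign string to int 'x'
print(check_assignment(table, "y", "int"))      # 'y' is not declared
```

A stack of dictionaries keeps lookups simple: an inner declaration hides an outer one with the same name, and leaving a scope discards its entries at once.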
Error Reporting/ Handling
1. Detect errors.
2. Report them to the user.
3. Apply a recovery strategy so that compilation can continue, e.g. Panic Mode Recovery, Phrase-Level Recovery, etc.

Errors in Compiler Design
Errors
• Compile time
  • Lexical Phase
    • Spelling errors
    • Exceeding the length of an identifier or numeric constant
    • Appearance of an illegal character
  • Syntactic Phase
    • Errors in structure
    • Missing operators
    • Unbalanced or missing parentheses
  • Semantic Phase
    • Incompatible types of operands
    • Undeclared variables
    • Actual arguments not matching formal arguments (parameters)
• Runtime
Errors – 3 Types of Compile Time Errors
1. Lexical Errors
• Misspelling of identifiers, keywords or operators
• E.g. Int x; (misspelled keyword) or int 2x; (invalid identifier) instead of int x;
2. Syntactical (Syntax) Errors
• Missing semicolon or unbalanced parentheses
• E.g. int x (missing semicolon), for( {} (unbalanced parenthesis)
3. Semantic Errors
• Incompatible value assignment, or type mismatch between an operator and its operands
• E.g. int x; x = "String Type";
• E.g. string x; int y = 10; x = "Hello"; x += y;
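One of the syntax errors above, unbalanced parentheses, can be illustrated in isolation with a stack. The following is a standalone Python sketch (the function name `check_brackets` is hypothetical); a real parser detects this as part of parsing, and this version assumes there are no string literals or comments containing brackets.

```python
from typing import Optional

CLOSING_TO_OPENING = {")": "(", "]": "[", "}": "{"}


def check_brackets(code: str) -> Optional[str]:
    """Return None if all brackets balance, else a message for the first problem."""
    stack = []  # (bracket, position) of each still-unmatched opening bracket
    for pos, ch in enumerate(code):
        if ch in "([{":
            stack.append((ch, pos))
        elif ch in CLOSING_TO_OPENING:
            if not stack or stack[-1][0] != CLOSING_TO_OPENING[ch]:
                return f"unexpected '{ch}' at position {pos}"
            stack.pop()
    if stack:
        ch, pos = stack[-1]  # innermost bracket that was never closed
        return f"unclosed '{ch}' opened at position {pos}"
    return None


print(check_brackets("for( {}"))        # unclosed '(' opened at position 3
print(check_brackets("if (x) { y; }"))  # None
```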

Error Reporting/ Handling
• Panic Mode Recovery in Compiler Design is used to handle syntactic errors.
• Syntactic errors include:
• Misspelled keyword
• Missing operator
• Errors in program structure
• Unbalanced Parenthesis

Panic Mode Recovery in Compiler Design
• It is the simplest error-recovery method and the easiest to implement.

• The parser uses panic mode recovery to handle syntax errors.

• In panic mode recovery, as soon as the compiler discovers a syntax error, it enters panic mode.

• It then discards input symbols until it finds a symbol from which it can resume normal parsing.
Panic Mode Recovery in Compiler Design
• The symbols that return the compiler to its normal state after entering panic mode are called synchronizing symbols (synchronization tokens).

• The basic idea behind panic mode recovery is to discard input until the parser reaches a point from which normal parsing can safely resume.

Working of Panic Mode Recovery
The following steps show how panic mode recovery works:

• The parser starts parsing the given input.

• As soon as a syntax error occurs, it starts discarding the input symbols one at a time.

• It continues to discard input symbols until it encounters a synchronizing token.
Working of Panic Mode Recovery
• The synchronizing symbols are a designated set of delimiters. A synchronizing token typically marks the end of a statement or other construct; examples are semicolons, braces, and other punctuation marks.

• Once a synchronizing token is found, the parser stops discarding: the invalid tokens read so far have been thrown away, and normal parsing resumes from the synchronizing token.

• If no synchronizing token is found, the parser keeps discarding tokens until the end of the enclosing block, function, or even class.
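The discarding step can be sketched in Python as follows, assuming the input is already a list of token strings, that `;` is the statement-level synchronizing token, and that `}` marks the end of the enclosing block. The names `panic_skip`, `STATEMENT_SYNC` and `BLOCK_END` are hypothetical.

```python
STATEMENT_SYNC = {";"}  # assumed synchronizing tokens
BLOCK_END = "}"         # fallback: stop at the end of the enclosing block


def panic_skip(tokens: list[str], pos: int) -> tuple[int, list[str]]:
    """Discard tokens starting at `pos` (where the error was detected).

    Returns (new_pos, discarded): new_pos is the index of the synchronizing
    token or block end, or len(tokens) if the input ran out first.
    """
    discarded = []
    while pos < len(tokens):
        tok = tokens[pos]
        if tok in STATEMENT_SYNC or tok == BLOCK_END:
            break              # normal parsing resumes from here
        discarded.append(tok)  # drop the invalid token
        pos += 1
    return pos, discarded


# Error detected at the second '=' in "x = = 5 ; y = 1 ;"
print(panic_skip(["x", "=", "=", "5", ";", "y", "=", "1", ";"], 2))  # (4, ['=', '5'])
# No ';' before the block ends, so discard up to '}'
print(panic_skip(["{", "x", "=", "@", "y", "}", "z"], 3))            # (5, ['@', 'y'])
```

Because the loop advances one token per step and stops at the end of the input, it always terminates; which tokens count as synchronizers is a design choice of the parser.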

Working of Panic Mode Recovery
• To understand the above steps, consider the following example in C:

    int compiler_construction, 10compiler_construction, compiler_design, #CC;

• Here parsing begins with int; then compiler_construction is parsed as a valid lexeme.

• As soon as the digit 1 is encountered (at the start of 10compiler_construction), an error is detected, since an identifier cannot begin with a digit, and the symbols are dropped until a synchronizing token is met.

• Here the comma (,) acts as the synchronizing token.

• Now, compiler_design is parsed as a valid lexeme.

• Lastly, # generates a syntax error, and panic mode recovery starts again.

• Symbols are dropped until ; is encountered; here the semicolon acts as the synchronizing token.

• This example shows how panic mode recovery lets the compiler recover from syntax errors and continue parsing the rest of the input.
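The walk-through above can be reproduced with a small Python sketch that parses only declarations of the form `int name, name, ... ;`. The tokenizer regex, the synchronizing set `{",", ";"}` and the function name `parse_declaration` are assumptions made for this illustration.

```python
import re

# Assumed token shapes: identifiers, integers, ',' and ';', or any other single character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[,;]|\S")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")
SYNC = {",", ";"}  # synchronizing tokens for a declaration list


def parse_declaration(src: str) -> tuple[list[str], list[str]]:
    """Parse `int name, name, ... ;` with panic mode recovery.

    Returns (declared names, error messages).
    """
    toks = TOKEN_RE.findall(src)
    declared: list[str] = []
    errors: list[str] = []
    if not toks or toks[0] != "int":
        return declared, ["expected 'int'"]
    i = 1
    while i < len(toks):
        tok = toks[i]
        if IDENT_RE.fullmatch(tok) and i + 1 < len(toks) and toks[i + 1] in SYNC:
            declared.append(tok)  # valid declarator
            i += 1                # move to the ',' or ';'
        else:
            errors.append(f"syntax error at {tok!r}")
            while i < len(toks) and toks[i] not in SYNC:
                i += 1            # panic mode: drop tokens until ',' or ';'
        if i < len(toks) and toks[i] == ";":
            break                 # end of the declaration
        i += 1                    # skip the ',' and parse the next name
    return declared, errors


declared, errors = parse_declaration(
    "int compiler_construction, 10compiler_construction, compiler_design, #CC;"
)
print(declared)  # ['compiler_construction', 'compiler_design']
print(errors)    # ["syntax error at '10'", "syntax error at '#'"]
```

In this sketch the first error is reported on the token `10`, the lexeme that contains the offending digit; everything up to the next `,` is then dropped, exactly as in the slides.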
Advantages of Panic Mode Recovery
Panic mode recovery has the following advantages:

• It is easy to implement and lets the parser recover from syntax errors.

• It works well when multiple errors in the same statement are rare.

• It can never fall into an infinite loop, because it always consumes input until a synchronizing token or the end of the input is reached.

Error Reporting/ Handling
Phrase-Level Recovery in Compiler Design:

• When an error is discovered, the parser performs a local correction on the remaining input.

• That is, it makes the necessary correction on the remaining input so that it can continue to parse the rest of the statement.

Phrase-Level Recovery
• You can correct the error by
• deleting extra semicolons,
• replacing commas with semicolons, or
• inserting missing semicolons.

• Great care must be taken to choose corrections that cannot lead to an infinite loop, e.g. a rule that always inserts a symbol ahead of the current input without ever consuming it.

• In general, a prefix of the remaining input is replaced by some string that allows the parser to continue.

• In this way, the parser can continue parsing.
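The three local corrections listed above can be sketched in Python for a toy language whose only statement is `NAME = NUMBER ;`, over input that is already split into tokens. The toy grammar and the function name `parse_assignments` are hypothetical. Every loop iteration consumes either a whole assignment or one extra `;`, which is how this sketch avoids the infinite-loop problem mentioned above.

```python
def parse_assignments(tokens: list[str]) -> tuple[list[tuple[str, int]], list[str]]:
    """Parse `NAME = NUMBER ;` statements with phrase-level recovery.

    Returns (statements, corrections made).
    """
    stmts: list[tuple[str, int]] = []
    fixes: list[str] = []
    i, n = 0, len(tokens)
    while i < n:
        if tokens[i] == ";":
            fixes.append(f"deleted extra ';' at token {i}")
            i += 1
            continue
        # This sketch only repairs statement terminators; anything else is fatal.
        if (i + 2 >= n or not tokens[i].isidentifier()
                or tokens[i + 1] != "=" or not tokens[i + 2].isdigit()):
            raise SyntaxError(f"cannot repair statement at token {i}")
        stmts.append((tokens[i], int(tokens[i + 2])))
        i += 3  # consumed NAME = NUMBER, so every iteration makes progress
        if i < n and tokens[i] == ";":
            i += 1  # correct terminator
        elif i < n and tokens[i] == ",":
            fixes.append(f"replaced ',' with ';' at token {i}")
            i += 1
        else:
            fixes.append(f"inserted missing ';' before token {i}")
    return stmts, fixes


stmts, fixes = parse_assignments(["a", "=", "1", ",", "b", "=", "2", ";", ";", "c", "=", "3"])
print(stmts)  # [('a', 1), ('b', 2), ('c', 3)]
for fix in fixes:
    print(fix)
```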


Tokens
• Valid sequences of characters (valid words) in a language are called tokens.
• Examples of tokens in a language:
• Constants
• Identifiers (y, x, class, z)
• Operators (+, {, -, ++, --)
• Punctuation (;, ,)
• Keywords (int, if, while)
• Some tokens consist of a single character (e.g. ; or +).

Lexemes
• A lexeme is the sequence of characters matched by the pattern of a token; OR
• A lexeme is a specific instance of a token class.

E.g. the lexemes:   if   ;   (   10   ++   int   x   public   variable   {   while   ,   -

Lexemes – Split into Different Classes
• Keywords – if, int, public, while
• Constants – 10
• Identifiers – x, variable
• Operators – (, ++, {, -
• Punctuation – ; and ,

Token class | Lexemes
Keywords | if, int, public, while
Constants | 10
Identifiers | x, variable
Operators | (, ++, {, -
Punctuation | ;  ,

Lexemes – Split into Different Token Classes
Input: int { x if

Token | Lexeme
Keyword | int
Operator | {
Identifier | x
Keyword | if
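The classification in these tables is what a lexical analyzer does automatically. Below is a rough Python sketch of a regex-based scanner that returns (token class, lexeme) pairs. The keyword set and the choice to group brackets with operators follow the slides; the `TOKEN_SPEC` table, the extra `Error` class, and the function `tokenize` are assumptions for illustration.

```python
import re

KEYWORDS = {"if", "int", "public", "while"}  # keywords used in the examples above

# (token class, pattern) pairs, tried in order at each position.
# Following the slides, brackets are grouped with the operators.
TOKEN_SPEC = [
    ("Constant", r"\d+"),
    ("Name", r"[A-Za-z_][A-Za-z0-9_]*"),  # keyword or identifier, decided below
    ("Operator", r"\+\+|--|[+\-(){}]"),
    ("Punctuation", r"[;,]"),
    ("Skip", r"\s+"),
    ("Error", r"."),  # any other single character
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(src: str) -> list[tuple[str, str]]:
    """Split `src` into (token class, lexeme) pairs."""
    tokens = []
    for m in MASTER.finditer(src):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "Skip":
            continue
        if kind == "Name":
            kind = "Keyword" if lexeme in KEYWORDS else "Identifier"
        tokens.append((kind, lexeme))
    return tokens


print(tokenize("int { x if"))
# [('Keyword', 'int'), ('Operator', '{'), ('Identifier', 'x'), ('Keyword', 'if')]
```

Keywords are matched by the same pattern as identifiers and then looked up in a table, a common way to keep the patterns simple.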

Pattern
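• A pattern is a set of rules that defines a token, i.e. the rules for forming tokens from input characters.

• Identifiers
• Must start with a letter or _ (underscore), followed by any number of alphanumeric characters (letters, digits) or underscores.
• E.g. 2x and {2 are not valid identifiers, because they do not start with a letter or underscore.

• A lexeme must follow these rules to fall into a specific token class.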
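Patterns like this are usually written as regular expressions. A minimal Python sketch of the identifier rule above, assuming only ASCII letters (the name `is_identifier` is hypothetical):

```python
import re

# Identifier pattern: a letter or underscore, then letters, digits or underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(lexeme: str) -> bool:
    """True if the whole lexeme matches the identifier pattern."""
    return IDENTIFIER.fullmatch(lexeme) is not None


for lexeme in ["x", "_count", "var2", "2x", "{2"]:
    print(lexeme, is_identifier(lexeme))
```

Note that keywords such as `int` also match this pattern; a lexical analyzer typically checks a keyword table afterwards to decide whether the lexeme is a keyword or an identifier.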

