Lecture 2:
Course Introduction
Course Materials
◼ Dragon Book
◼ Aho, Lam, Sethi, Ullman, “Compilers: Principles,
Techniques, and Tools”, 2nd ed, Addison 2007
200px-Purple_dragon_book_b
Compiler Review
What is a Compiler?
◼ A program that translates a program in one
language to another language
◼ The essential interface between applications &
architectures
◼ Typically lowers the level of abstraction
◼ analyzes and reasons about the program & architecture
◼ We expect the program to be optimized, i.e., better
than the original
◼ ideally exploiting architectural strengths and hiding
weaknesses
Compiler vs. Interpreter (1/5)
◼ Compilers: Translate a source (human-
writable) program to an executable (machine-
readable) program
◼ Interpreters: Convert a source program and
execute it at the same time.
Compiler vs. Interpreter (2/5)
Ideal concept:
Source code Compiler Executable
Input data Executable Output data
Source code
Interpreter Output data
Input data
Compiler vs. Interpreter (3/5)
◼ Most languages are usually thought of as
using either one or the other:
◼ Compilers: FORTRAN, COBOL, C, C++, Pascal, PL/1
◼ Interpreters: Lisp, scheme, BASIC, APL, Perl,
Python, Smalltalk
◼ BUT: not always implemented this way
◼ Virtual Machines (e.g., Java)
◼ Linking of executables at runtime
◼ JIT (Just-in-time) compiling
Compiler vs. Interpreter (4/5)
◼ Actually, no sharp boundary between them.
General situation is a combo:
Source code Translator Intermed. code
Intermed. code
Virtual machine Output
Input Data
Compiler vs. Interpreter (5/5)
Compiler Interpreter
◼ Pros ◼ Pros
◼ Less space ◼ Easy debugging
◼ Fast execution ◼ Fast Development
◼ Cons ◼ Cons
◼ Slow processing ◼ Not for large projects
◼ Partly Solved ◼ Exceptions: Perl, Python
(Separate compilation) ◼ Requires more space
◼ Debugging ◼ Slower execution
◼ Improved thru IDEs ◼ Interpreter in memory all
the time
Phase of compilations
1
1
The Phases of a Compiler
Phase Output Sample
Programmer (source code producer) Source string A=B+C;
Scanner (performs lexical analysis) Token string ‘A’, ‘=’, ‘B’, ‘+’, ‘C’, ‘;’
And symbol table with names
Parser (performs syntax analysis Parse tree or abstract syntax tree ;
|
based on the grammar of the =
programming language) / \
A +
/ \
B C
Semantic analyzer (type checking, Annotated parse tree or abstract
etc) syntax tree
Intermediate code generator Three-address code, quads, int2fp B t1
+ t1 C t2
:= t2 A
Optimizer Three-address code, quads, int2fp B t1
+ t1 #2.3 A
Code generator Assembly code MOVF #2.3,r1
ADDF2 r1,r2
MOVF r2,A
Peephole optimizer Assembly code ADDF2 #2.3,r2
MOVF r2,A
1
2
The Grouping of Phases
◼ Compiler front and back ends:
◼ Front end: analysis (machine independent)
◼ Back end: synthesis (machine dependent)
◼ Compiler passes:
◼ A collection of phases is done only once (single pass) or
multiple times (multi pass)
◼ Single pass: usually requires everything to be defined before being
used in source program
◼ Multi pass: compiler may have to keep entire program
representation in memory
1
3
Compiler-Construction Tools
◼ Software development tools are available to
implement one or more compiler phases
◼ Scanner generators
◼ Parser generators
◼ Syntax-directed translation engines
◼ Automatic code generators
◼ Data-flow engines
Scanning/Lexical analysis
◼ Break program down into its smallest
meaningful symbols (tokens, atoms)
◼ Tools for this include lex, flex
◼ Tokens include e.g.:
◼ “Reserved words”: do if float while
◼ Special characters: ( { , + - = ! /
◼ Names & numbers: myValue 3.07e02
◼ Start symbol table with new symbols found
Parsing
◼ Construct a parse tree from symbols
◼ A pattern-matching problem
◼ Language grammar defined by set of rules that identify
legal (meaningful) combinations of symbols
◼ Each application of a rule results in a node in the parse
tree
◼ Parser applies these rules repeatedly to the program until
leaves of parse tree are “atoms”
◼ If no pattern matches, it’s a syntax error
◼ yacc, bison are tools for this (generate c code that
parses specified language)
Parse tree
◼ Output of parsing
◼ Top-down description of program syntax
◼ Root node is entire program
◼ Constructed by repeated application of rules
in Context Free Grammar (CFG)
◼ Leaves are tokens that were identified during
lexical analysis
Example:
Parsing rules for Pascal
These are like the following:
◼ program PROGRAM identifier (identifier
more_identifiers) ; block .
◼ more_identifiers , identifier more_identifiers | ε
◼ block variables BEGIN statement
more_statements END
◼ statement do_statement | if_statement |
assignment | …
◼ if_statement IF logical_expression THEN
statement ELSE …
Pascal code example
program gcd (input, output)
var i, j : integer
begin
read (i , j)
while i <> j do
if i>j then i := i – j;
else j := j – i ;
writeln (i);
end .
Example: parse tree
Semantic analysis
◼ Discovery of meaning in a program using the symbol
table
◼ Do static semantics check
◼ Simplify the structure of the parse tree ( from parse tree
to abstract syntax tree (AST) )
◼ Static semantics check
◼ Making sure identifiers are declared before use
◼ Type checking for assignments and operators
◼ Checking types and number of parameters to subroutines
◼ Making sure functions contain return statements
◼ Making sure there are no repeats among switch statement
labels