Compiler
1
Overview and History (1)
Cause
Software for early computers was written in assembly language
The benefits of reusing software on different CPUs started to become
significantly greater than the cost of writing a compiler
The first real compiler
FORTRAN compilers of the late 1950s
18 person-years to build
2
Overview and History (2)
Compiler technology
is more broadly applicable and has been employed in rather
unexpected areas.
Text-formatting languages
Silicon compiler for the creation of VLSI circuits
Command languages of OS
Query languages of Database systems
3
What Do Compilers Do (1)
A compiler acts as a translator, transforming human-
oriented programming languages into computer-oriented
machine languages.
Ignore machine-dependent details for programmer
Programming Machine
Language Compiler Language
(Source) (Target)
4
What Do Compilers Do (2)
Compilers may generate three types of code:
Pure Machine Code
Machine instruction set without assuming the existence of any
operating system or library.
Mostly being OS or embedded applications.
Augmented Machine Code
Code with OS routines and runtime support routines.
More often
Virtual Machine Code
Virtual instructions, can be run on any architecture with a virtual
machine interpreter or a just-in-time compiler
Ex. Java
5
What Do Compilers Do (3)
Another way that compilers differ from one another is in
the format of the target machine code they generate:
Assembly or other source format
Relocatable binary
Relative address
A linkage step is required
Absolute binary
Absolute address
Can be executed directly
6
The Structure of a Compiler (1)
Any compiler must perform two major tasks
Compiler
Analysis Synthesis
Analysis of the source program
7 Synthesis of a machine-language program
The Structure of a Compiler (2)
Source
Program Tokens Syntactic Semantic
Scanner Parser Routines
(Character Stream) Structure
Intermediate
Representation
Symbol and Optimizer
Attribute
Tables
(Used by all Phases of The Compiler)
Code
Generator
8
Target machine code
The Structure of a Compiler (3)
Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines
Intermediate
Scanner Representation
The scanner begins the analysis of the source program by
reading the input, character by character, and grouping
Symbol and Optimizer
characters into individual words and symbols (tokens)
Attribute
Tables
RE
RE (( Regular
Regular expression
expression ))
NFA
NFA (( Non-deterministic
Non-deterministic Finite
Finite Automata
Automata ))
DFA
DFA (( Deterministic
Deterministic Finite (Used by)) all
Finite Automata
Automata
LEX
LEX Phases of
The Compiler) Code
Generator
9
Target machine code
The Structure of a Compiler (4)
Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines
Intermediate
Parser Representation
Given a formal syntax specification (typically as a context-free
grammar [CFG] ), the parser reads tokens and groups them into
Symbol and Optimizer
units as specified by the productions of the CFG being used.
As syntactic structure isAttribute
recognized, the parser either calls
Tables
corresponding semantic routines directly or builds a syntax tree.
(Used by all
Phases of
The Compiler) Code
Generator
10
Target machine code
The Structure of a Compiler (5)
Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines
Intermediate
Semantic Routines Representation
Perform two functions
Check the static semantics of each construct
Symbol and
Do the actual translation Optimizer
The heart of a compilerAttribute
Tables
Syntax
Syntax Directed
Directed Translation
Translation
Semantic
Semantic Processing Techniques
Processing Techniques
IR (Intermediate
(Used
by all
Representation)
IR (Intermediate Representation)
Phases of
The Compiler) Code
Generator
11
Target machine code
The Structure of a Compiler (6)
Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines
Intermediate
Optimizer Representation
The IR code generated by the semantic routines is analyzed and
transformed into functionally equivalent but improved IR code
This phase can be verySymbol and
complex and slow Optimizer
Attribute
loop optimization, register allocation, code scheduling
Tables
Register
Register and
and Temporary
Temporary Management
Management
(Used by all
Phases of
The Compiler) Code
Generator
12
Target machine code
The Structure of a Compiler (7)
Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines
Intermediate
Code Generator Representation
Interpretive
Interpretive Code
Code Generation
Generation
Generating
Generating Code from Tree
Code from Tree
Grammar-Based
Grammar-Based Code
Code Generator
Generator Optimizer
Code
Generator
13 Target machine code
The Structure of a Compiler (8)
Code
Code Generator
Generator
[Intermediate Code
[Intermediate Code Generator]
Generator]
Non-optimized Intermediate Code
Scanner
Scanner
[Lexical
[Lexical Analyzer]
Analyzer]
Tokens
Code
Code Optimizer
Optimizer
Parser
Parser
[Syntax
[Syntax Analyzer]
Analyzer]
Optimized Intermediate Code
Parse tree
Code
Code Generator
Generator
Semantic
Semantic Process
Process
[Semantic
[Semantic analyzer]
analyzer] Target machine code
Abstract Syntax Tree w/ Attributes
14
The Syntax and Semantics of
Programming Language (1)
A programming language must include the specification
of syntax (structure) and semantics (meaning).
Syntax typically means the context-free syntax because of
the almost universal use of context-free-grammar (CFGs)
Ex.
a = b + c is syntactically legal
b + c = a is illegal
15
The Syntax and Semantics of
Programming Language (2)
The semantics of a programming language are commonly
divided into two classes:
Static semantics
Semantics rules that can be checked at compiled time.
Ex. The type and number of a function’s arguments
Runtime semantics
Semantics rules that can be checked only at run time
16
Computer Architecture and Compiler
Design
Compilers should exploit the hardware-specific feature
and computing capability to optimize code.
The problems encountered in modern computing
platforms:
Instruction sets for some popular architectures are highly
nonuniform.
High-level programming language operations are not always
easy to support.
Ex. exceptions, threads, dynamic heap access …
Exploiting architectural features such as cache, distributed
processors and memory
Effective use of a large number of processors
17
Compiler Design Considerations
Debugging Compilers
Designed to aid in the development and debugging of
programs.
Optimizing Compilers
Designed to produce efficient target code
Retargetable Compilers
A compiler whose target architecture can be changed without
its machine-independent components having to be rewritten.
18