Introduction to
compiler
Dr. Tarik Elamsy
All course slides are mainly from the lecture notes of
Prof. Alex Aiken from Stanford University
Language Processors
• What is a programming language?
• A programming language is a notation for describing computations to people and to
machines.
• All the software running on all the computers was written in some programming
language.
• Before a program can be run, it first must be translated into a form in which it can be
executed by a computer.
• What is a compiler?
• A compiler is a program that can read a program in one language (the source
language) and translate it into an equivalent program in another language (the
target language).
• The compiler should report any errors in the source program that it detects during
the translation process.
Interpreters & Compilers
• Interpreter
• A program that reads an source program and produces the results of
executing that program
• Compiler
• A program that translates a program from one language (the source) to
another (the target)
05/13/2025 © 2002-08 Hal Perkins & UW CSE A-4
Common Issues
• Compilers and interpreters both must read the input – a stream of
characters – and “understand” it; analysis
w h i l e ( k < l e n g t h ) { <nl> <tab> i f ( a [ k ] > 0
) <nl> <tab> <tab>{ n P o s + + ; } <nl> <tab> }
05/13/2025 © 2002-08 Hal Perkins & UW CSE A-5
Interpreter
• Interpreter
• Execution engine
• Program execution interleaved with analysis
running = true;
while (running) {
analyze next statement;
execute that statement;
}
• Usually need repeated analysis of statements (particularly in loops, functions)
• But: immediate execution, good debugging & interaction
05/13/2025 © 2002-08 Hal Perkins & UW CSE A-6
Compiler
• Read and analyze entire program
• Translate to semantically equivalent program in another language
• Presumably easier to execute or more efficient
• Should “improve” the program in some fashion
• Offline process
• Tradeoff: compile time overhead (preprocessing step) vs execution
performance
05/13/2025 © 2002-08 Hal Perkins & UW CSE A-7
Typical Implementations
• Compilers
• FORTRAN, C, C++, Java, COBOL, etc. etc.
• Strong need for optimization in many cases
• Interpreters
• PERL, Python, Ruby, awk, sed, shells, Scheme/Lisp/ML, postscript/pdf, Java
VM
• Particularly effective if interpreter overhead is low relative to execution cost
of individual statements
05/13/2025 © 2002-08 Hal Perkins & UW CSE A-8
Hybrid approaches
• Well-known example: Java
• Compile Java source to byte codes – Java Virtual Machine language (.class
files)
• Execution
• Interpret byte codes directly, or
• Compile some or all byte codes to native code
• Just-In-Time compiler (JIT) – detect hot spots & compile on the fly to native code – standard
these days
• Variation: .NET
• Compilers generate MSIL
• All IL compiled to native code before execution
05/13/2025 © 2002-08 Hal Perkins & UW CSE A-9
Language Processors
• There are other kinds of language processors:
• Interpreters
• Preprocessors
• Assemblers
• Linkers
• Loaders
Language Processors
• In addition to a compiler, several other programs may be required to
create an executable target program.
Programming Language Basics
Programming languages can be classified as follows:
• First-Generation Languages are the machine languages (sequences of
0’s and 1’s)
• Second-Generation Languages are the assembly languages
• Third-Generation Languages are higher-level languages such as C and
Java.
• Fourth-Generation Languages are languages designed for specific
applications like SQL for data base queries.
Programming Language Basics
• Imperative programming languages are languages in which a program
specifies how a computation is done.
• In imperative languages there is a notion of program state and statements
that change the state.
• E.g., Java and C.
• Declarative programming languages are languages in which a program
specifies what computation is to done.
• E.g., Prolog, ML and Haskell
Programming Language Basics
• An object-oriented language is one that supports object-oriented
programming, a programming style in which a program consists of a
collection of objects and objects that interact with one another.
• E.g., Smalltalk, C++, Java, C# and Ruby
• Scripting languages are interpreted languages with high level
operators called scripts.
• Programs written in scripting languages are often much shorter than
equivalent programs written in languages like C.
• E.g., PHP, JavaScript and Awk.
Programming Language Basics
• We will cover the following important terminology:
• The static/dynamic distinction
• Environments and states
• Static scope and block structure
• Explicit access control
• Dynamic scope
• Parameter passing mechanisms
• Aliasing
Programming Language Basics
The Static/Dynamic Distinction
• If a language uses a policy that allows the compiler to decide an issue,
then we say that the language uses a static policy (the issue can be
decided at compile time).
• If a language uses a policy that only allows a decision to be made
when the program is executed, then we say that the language uses a
dynamic policy (the issue can be decided at run time).
Programming Language Basics
The Static/Dynamic Distinction
• Consider the issue of the scope of declarations:
• The scope of a declaration of x is the region of the program in which uses of x
refer to this declaration.
• In a language with static scope (lexical scope) policy, the scope of a
declaration can be determined by looking only at the program; the compiler
can determine the scope at compile time.
• E.g., Java and C.
• On the other hand, in a language with dynamic scope policy, the scope can
only be determined at run time. As the program runs, the same use of x could
refer to any of several declarations of x.
Programming Language Basics
Environments and States
• The environment is a mapping from names to locations in the memory
(store).
• Environments change according to the scope rules of a language.
• The state is a mapping from locations in the memory (store) to their
values.
• The environments and state mappings are generally speaking dynamic.
Programming Language Basics
Environments and States - Example
Programming Language Basics
Static Scope and Block Structure
• Most languages use static scope.
• In static scope, the scope rules are based on program structure. The
scope of a declaration is determined implicitly by where the
declaration appears in the program.
• Some languages, such as C++ and Java, also provide explicit control
over scopes through the use of keywords like public, private, and
protected.
Programming Language Basics
Static Scope and Block Structure
• In programming languages, a declaration is a statement that defines
the type of an identifier.
• E.g., int i
• A definition, on the other hand, is a statement that defines the value
of a variable.
• E.g., i=5
Programming Language Basics
Static Scope and Block Structure
• Names vs. variables vs. identifiers
• An identifier is a string of characters, typically letters or digits, that refers to
(identifies) an entity, such as a data object, a procedure, a class, or a type.
• An identifier is a compile-time name.
• A variable refers to a particular location of the store.
• An identifier can be declared more than once. Here, each such declaration introduces a
new variable.
• A variable is a run-time name.
• All identifiers are names, but not all names are identifiers. Names can also be
expressions.
Programming Language Basics
Static Scope and Block Structure
• The C static-scope policy example:
• A C program consists of a sequence of top-level declarations of
variables and functions.
• Functions may have variable declarations within them, where
variables include local variables and parameters. The scope of each
such declaration is restricted to the function in which it appears.
• The scope of a top-level declaration of a name x consists of the entire
program that follows, with the exception of those statements that lie
within a function that also has a declaration of x.
Programming Language Basics
Static Scope and Block Structure
• In C, the syntax of blocks is given by:
• A block is one type of statement. Blocks can appear anywhere that other
types of statements, such as assignment statements, can appear.
• A block is a sequence of declarations followed by a sequence of statements all
surrounded by braces.
• Blocks can be nested in each other.
Programming Language Basics
Static Scope and Block Structure
• The static-scope rule for variable declarations in block-structured
languages:
We say that a declaration D belongs to a block B if B is the most closely
nested block containing D; that is, D is located within B, but not within
any block that is nested within B.
If declaration D of name x belongs to block B, then the scope of D is all
of B, except for any blocks B’ nested to any depth within B, in which x is
redeclared.
Programming Language Basics
Static Scope and Block Structure
Programming Language Basics
Explicit Access Control
• Classes introduce a new scope for their members.
• Through the use of keywords like public, private, and protected,
object-oriented languages provide explicit control over access to
member names in a superclass.
• These keywords support encapsulation by restricting access.
• Private names are accessible within the class.
• Protected names are accessible also to subclasses.
• Public names are accessible from outside the class.
Programming Language Basics
Dynamic Scope
Programming Language Basics
Dynamic Scope
• Dynamic scope resolution is essential for polymorphic procedures.
• In such procedures, when two or more definitions for the same name
exist, the chosen definition is decided at run time depending only on
the types of arguments.
• Example in Java.
Programming Language Basics
Parameter Passing Mechanisms
• Here we consider how the actual parameters or arguments (the
parameters used in the call of a procedure) are associated with the
formal parameters (those used in the procedure definition).
• There are two main methods:
• Call-by-value
• Call-by-reference
Programming Language Basics
Call-by-Value
• The great majority of languages use the call-by-value mechanism.
• The actual parameter is evaluated (if it is an expression) or copied (if t
is a variable). Then, the value is placed in the location belonging to
the corresponding formal parameter.
• Be careful! Java uses call-by-value. However, many variables are really
references (pointers) to the things they stand for. This is true for
arrays, strings, and objects of all classes.
• Thus, when we pass the name of an object to a method, the value received by
that method is in effect a reference to the object. Thus, the called method can
affect the value of the object itself.
Programming Language Basics
Call-by-Reference
• In call-by-reference, the address of the actual parameter is passed to
the callee as the value of the corresponding formal parameter.
• Thus, changes to the value of the formal parameter actually changes
the value of the actual parameter.
Programming Language Basics
Aliasing
• When two variables refer to the same location in the store, such
variables are said to be aliases of one another.