12 Type System
12 Type System
1
Type system …
• Languages can be divided into three
categories with respect to the type:
– “untyped”
•No type checking needs to be done
•Assembly languages
– Statically typed
•All type checking is done at compile time
•Algol class of languages
•Also, called strongly typed
– Dynamically typed
•Type checking is done at run time
•Mostly functional languages like Lisp,
Scheme etc.
2
Type systems …
• Static typing
– Catches most common programming errors at compile
time
– Avoids runtime overhead
– May be restrictive in some situations
– Rapid prototyping may be difficult
8
Type constructors …
• Pointer: if T is a type expression then
pointer(T) is a type expression denoting
type pointer to an object of type T
• Function: function maps domain set to
range set. It is denoted by type expression
D→R
– For example % has type expression
int * int → int
– The type of function int* f(char a, char b) is
denoted by
char * char pointer(int)
9
Specifications of a type checker
• Consider a language which consists
of a sequence of declarations
followed by a single expression
P→D;E
D → D ; D | id : T
T → char | integer | T[num] | T*
E → literal | num | E%E | E [E] | *E
10
Specifications of a type checker …
• A program generated by this grammar is
key : integer;
key %1999
• Assume following:
– basic types are char, int, type-error
– all arrays start at 0
– char[256] has type expression
array(0 .. 255, char)
11
Rules for Symbol Table entry
D id : T addtype(id.entry, T.type)
T char T.type = char
T integer T.type = int
T T1* T.type = pointer(T1.type)
T T1 [num] T.type = array(0..num-1, T1.type)
12
Type checking of functions
E E1 (E2) E. type =
(E1.type == s → t and E2.type == s)
? t : type-error
13
Type checking for expressions
E → literal E.type = char
E → num E.type = integer
E → id E.type = lookup(id.entry)
E → E1 % E2 E.type = if E1.type == integer and E2.type==integer
then integer
else type_error
E → E1[E2] E.type = if E2.type==integer and E1.type==array(s,t)
then t
else type_error
E → *E1 E.type = if E1.type==pointer(t)
then t
else type_error
14
Type checking for expressions
E → literal E.type = char
E → num E.type = integer
E → id E.type = lookup(id.entry)
E → E1 % E2 E.type = if E1.type == integer and E2.type==integer
then integer
else type_error
E → E1[E2] E.type = if E2.type==integer and E1.type==array(s,t)
then t
else type_error
E → *E1 E.type = if E1.type==pointer(t)
then t
else type_error
15
Type checking for statements
• Statements typically do not have values. Special basic type void can
be assigned to them.
S → id := E S.Type = if id.type == E.type
then void
else type_error
S → if E then S1 S.Type = if E.type == boolean
then S1.type
else type_error
S → while E do S1 S.Type = if E.type == boolean
then S1.type
else type_error
S → S1 ; S2 S.Type = if S1.type == void
and S2.type == void
then void
else type_error
16
Type checking for statements
• Statements typically do not have values. Special basic type void can
be assigned to them.
S → id := E S.Type = if id.type == E.type
then void
else type_error
S → if E then S1 S.Type = if E.type == boolean
then S1.type
else type_error
S → while E do S1 S.Type = if E.type == boolean
then S1.type
else type_error
S → S1 ; S2 S.Type = if S1.type == void
and S2.type == void
then void
else type_error
17
Equivalence of Type expression
19
Efficient implementation
• Bit vectors can be used to represent type
expressions. Refer to: A Tour Through the Portable
C Compiler: S. C. Johnson, 1979.
20
Efficient implementation …
Basic type Encoding Type constructor Encoding
Boolean 0000 pointer 01
Char 0001 array 10
Integer 0010 function 11
real 0011
22
Name equivalence …
variable type expression
next link
last link
p pointer(cell)
q pointer(cell)
r pointer(cell)
23
Name equivalence …
• Some compilers allow type expressions to have names.
• However, some compilers assign implicit type names.
• A fresh implicit name is created every time a type
name appears in declarations.
• Consider
type link = ^ cell;
var next : link;
last : link;
p, q : ^ cell;
r : ^ cell;
• In this case type expression of q and r are given
different implicit names and therefore, those are not
of the same type
24
Name equivalence …
The previous code is equivalent to
type link = ^ cell;
np = ^ cell;
nr = ^ cell;
var next : link;
last : link;
p, q: np;
r : nr;
25
Cycles in representation of types
• Data structures like linked lists are defined
recursively
• Implemented through structures which contain
pointers to structure
• Consider following code
type link = ^ cell;
cell = record
info : integer;
next : link
end;
• The type name cell is defined in terms of link and
link is defined in terms of cell (recursive
definitions)
26
Cycles in representation of …
• Recursively defined type names link = ^ cell;
cell = record
can be substituted by definitions info : integer;
next : link
• However, it introduces cycles into end;
the type graph
record record
X X X X
cell
27
Cycles in representation of …
• C uses structural equivalence for all types
except records (struct)
• It uses the acyclic structure of the type graph
• Type names must be declared before they
are used
– However, allow pointers to undeclared record
types
– All potential cycles are due to pointers to records
• Name of a record is part of its type
– Testing for structural equivalence stops when a
record constructor is reached
28
Type conversion
• Consider expression like x + i where x is of
type real and i is of type integer
• Internal representations of integers and
reals are different in a computer
– different machine instructions are used for
operations on integers and reals
• The compiler has to convert both the
operands to the same type
• Language definition specifies what
conversions are necessary.
29
Type conversion …
• Usually conversion is to the type of the left
hand side
• Type checker is used to insert conversion
operations:
x+i
x real+ inttoreal(i)
• Type conversion is called implicit/coercion if
done by compiler.
• It is limited to the situations where no
information is lost
• Conversions are explicit if programmer has
to write something to cause conversion
30
Type checking for expressions
E → num E.type = int
E → num.num E.type = real
E → id E.type = lookup( id.entry )
E → E1 op E2 E.type =
if E1.type == int && E2.type == int
then int
elif E1.type == int && E2.type == real
then real
elif E1.type == real && E2.type == int
then real
elif E1.type == real && E2.type==real
then real
31
Overloaded functions and operators
• Overloaded symbol has different meaning
depending upon the context
• In math, + is overloaded; used for integer,
real, complex, matrices
• In Ada, () is overloaded; used for array,
function call, type conversion
• Overloading is resolved when a unique
meaning for an occurrence of a symbol is
determined
32
Overloaded functions and operators
• In Ada standard interpretation of * is
multiplication of integers
• However, it may be overloaded by saying
function “*” (i, j: integer) return complex;
function “*” (i, j: complex) return complex;
• Possible type expression for “ * ” include:
integer x integer → integer
integer x integer → complex
complex x complex → complex
33
Overloaded function resolution
• Suppose only possible type for 2, 3 and
5 is integer
• Z is a complex variable
• 3*5 is either integer or complex
depending upon the context
– in 2*(3*5): 3*5 is integer because 2 is
integer
– in Z*(3*5) : 3*5 is complex because Z
is complex
34
Type resolution
• Try all possible types of each overloaded
function (possible but brute force method!)
• Keep track of all possible types
• Discard invalid possibilities
• At the end, check if there is a single unique
type
• Overloading can be resolved in two passes:
– Bottom up: compute set of all possible
types for each expression
– Top down: narrow set of possible types
based on what could be used in an
expression
35
Determining set of possible types
E’ E E’.types = E.types
E id E.types = lookup(id)
E E1(E2) E.types =
{t |∃s in E2.types && st is in E1.types}
E {i,c}
{i} E * E {i}
{ixii
ixic
{i} 3 cxcc} 5 {i}
36
Narrowing the set of possible types
• Ada requires a complete expression to
have a unique type
• Given a unique type from the context
we can narrow down the type choices
for each expression
• If this process does not result in a
unique type for each sub expression
then a type error is declared for the
expression
37
Narrowing the set of …
E’ E E’.types = E.types
E.unique = if E’.types=={t} then t
else type_error
E id E.types = lookup(id)
E E1(E2) E.types =
{ t | ∃s in E2.types && st is in E1.types}
t = E.unique
S = {s | s∈E2.types and (st)∈E1.types}
E2.unique = if S=={s} then s else type_error
E1.unique = if S=={s} then st
else type_error
38
Narrowing the set of …
E’ E E’.types = E.types
E.unique = if E’.types=={t} then t
else type_error
E id E.types = lookup(id)
E E1(E2) E.types =
{ t | ∃s in E2.types && st is in E1.types}
t = E.unique
S = {s | s∈E2.types and (st)∈E1.types}
E2.unique = if S=={s} then s else type_error
E1.unique = if S=={s} then st
else type_error
39
Polymorphic functions
• A function can be invoked with arguments of
different types
• Built in operators for indexing arrays, applying
functions, and manipulating pointers are usually
polymorphic
• Extend type expressions to include expressions
with type variables
• Facilitate the implementation of algorithms that
manipulate data structures (regardless of types of
elements)
– Determine length of the list without knowing types of
the elements
40
Polymorphic functions …
• Strongly typed languages can make programming
very tedious
• Consider identity function written in a language
like Pascal
function identity (x: integer): integer;
• This function is the identity on integers: int int
• If we want to write identity function on char then
we must write
function identity (x: char): char;
• This is the same code; only types have changed.
However, in Pascal a new identity function must
be written for each type
• Templates solve this problem somewhat, for end-
users
• For compiler, multiple definitions still present!
41
Type variables
• Variables can be used in type expressions to
represent unknown types
• Important use: check consistent use of an
identifier in a language that does not require
identifiers to be declared
• An inconsistent use is reported as an error
• If the variable is always used as of the same
type then the use is consistent and has lead to
type inference
• Type inference: determine the type of a
variable/language construct from the way it is
used
– Infer type of a function from its body
42
function deref(p) { return *p; }
• Initially, nothing is known about type of p
– Represent it by a type variable
• Operator * takes pointer to an object and
returns the object
• Therefore, p must be pointer to an object of
unknown type α
– If type of p is represented by β then
β=pointer(α)
– Expression *p has type α
• Type expression for function deref is
for any type α: pointer(α) α
• For identity function, the type expression is
for any type α: α α
43
Reading assignment
• Rest of Section 6.6 and Section 6.7 of Old
Dragonbook [Aho, Sethi and Ullman]
44