PPL Unit-1
Understanding of implementation issues leads to an understanding of
why languages are designed the way they are.
This in turn leads to the ability to use a language more intelligently, as
it was designed to be used.
5. Ability to design new languages:
The more languages you know, the better your understanding of
programming language concepts.
Most professional programmers at some time design a language of one
kind or another to meet their needs, one that is easy to learn and
efficient to use.
6. Better use of languages that are already known.
Many contemporary programming languages are large and complex.
Accordingly, it is uncommon for a programmer to be familiar with and
use all of the features of a language he or she uses.
By studying the concepts of programming languages, programmers
can learn about previously unknown and unused parts of the
languages they already use and begin to use those features.
Some languages are better for some jobs than others.
Example – FORTRAN and APL for calculations, COBOL and RPG for
report generation, LISP and PROLOG for AI, etc.
By understanding how features are implemented, you can make more
efficient use of them. Examples: creating arrays, strings, lists, and
records; using recursion, object classes, etc.
7. Overall advancement of computing:
In some cases, a language became widely used, at least in part,
because those in positions to choose languages were not sufficiently
familiar with Programming Language concepts.
Because the popularity of a language is partly determined by such
choices, the most popular languages are not always the best available.
Many believe that ALGOL 60 was a better language than FORTRAN.
However, FORTRAN remained the most widely used, which is attributed
to the fact that programmers and managers did not understand the
conceptual design of ALGOL 60.
PROGRAMMING DOMAINS
A programming domain defines the suitability of a specific language for a
specific kind of use.
There are many programming domains; the common ones are:
1. Scientific Applications:
The first digital computers, which appeared in the late 1940s and early
1950s,
were invented and used for scientific applications. Typically, the scientific
applications of that time used relatively simple data structures, but
required large numbers of floating point arithmetic computations.
The most common data structures were arrays and matrices.
The most common control structures were counting loops and
selections.
2. Business Applications:
The use of computers for business applications began in the 1950s.
Special computers were developed for this purpose, along with special
languages. Business languages are characterized by facilities for
producing elaborate reports, precise ways of describing and storing
decimal numbers and character data, and the ability to specify decimal
arithmetic operations. The first successful high-level language for
business was COBOL (ISO/IEC, 2002), the initial version of which appeared
in 1960. It is still the most commonly used language for these
applications.
There have been few developments in business application languages
outside the development and evolution of COBOL.
3. Artificial Intelligence:
AI is a broad area of computer applications characterized by the use of
symbolic rather than numeric computations.
Symbolic computation means that symbols, consisting of names
rather than numbers, are manipulated.
Also, symbolic computation is more conveniently done with linked
lists of data rather than arrays.
This kind of programming sometimes requires more flexibility than
other programming domains.
For example, in some AI applications the ability to create and execute
code segments during execution is convenient.
The first widely used programming language developed for AI
applications was the functional language LISP.
4. Systems Programming:
The operating system and the programming support tools of a computer
system are collectively known as its systems software.
Systems software is used almost continuously and so it must be
efficient.
Furthermore, it must have low-level features that allow the software
interfaces to external devices to be written.
General-purpose programming languages, such as C and C++, are
usually preferred for systems programming.
5. Web Software:
Web software must support dynamic web content: text, audio, and
video data, content involving computation, and so on.
It needs support for a variety of languages, from markup languages
(HTML) and scripting languages (JavaScript, PHP) to general-purpose
languages (Java).
6. Scripting Languages:
These languages have evolved over the last 25 years.
Characteristics: a) A scripting language is used by putting a list of
commands, called a script, in a file to be executed. b) They suit simple
applications such as file management and file filtering.
Languages developed: sh, the first scripting language, began with
simple commands; control flow statements, variables, functions, and
other capabilities were later added, resulting in a complete
programming language. Perl is a later, widely used example.
7. Special-Purpose Languages:
They have evolved over the past 40 years, each designed for a
particular purpose. Languages developed:
a) RPG, to produce business reports
b) APT, for instructing programmable machine tools
c) GPSS, for system simulation
These are not widely used because of their narrow applicability.
LANGUAGE EVALUATION CRITERIA
Data types: adequate facilities for defining data types and structures.
Support for abstraction: the ability to define and use complicated
structures or operations in ways that allow many of the details to be
ignored.
Exception handling: the ability to intercept run-time errors.
Restricted aliasing: aliasing is referencing the same memory cell with
more than one name.
READABILITY
The ease with which programs can be read and understood is called
readability. The following describe characteristics that contribute to the
readability of a PL.
1) Simplicity-
a. A language with a large number of basic components is more difficult to
learn than one with a small number of basic components.
b. The language should not have feature multiplicity, i.e., more than one
way to perform the same operation.
c. For example: I = I + 1; I += 1; I++; ++I.
d. The language should not allow operator overloading in which a single
operator symbol has more than one meaning.
2) Orthogonality-
It makes the language easy to learn and read, since constructs are context
independent.
a. It means that a relatively small number of primitive constructs can be
combined in a consistent number of ways to build programs.
b. In an orthogonal language, the meaning of a construct is independent of
the context of its appearance in the program.
3) Control Statements-
a. A program that can be read from top to bottom is much easier to
understand than a program that requires the reader to jump from one
statement to some other non-adjacent statement.
b. Example: goto statements make programs harder to read.
4) Data Types and Structures-
a. The presence of adequate facilities for defining data types and data
structures in a language is another significant aid to readability.
b. There should be provision for structured data types, such as record
types (e.g., representing an array of employee records).
5) Syntax considerations-
a. Syntax is the form of the elements of a language.
b. Three syntactic design choices affect readability: the forms of
identifiers; special keywords (reserved words); and form and meaning,
since constructs that are similar in appearance but different in meaning
are not readable.
WRITABILITY
The measure of how easily a language can be used to create programs for a
chosen problem domain. The features that affect readability also affect
writability. Apart from them, the factors that influence writability are:
1) Simplicity-
a. A large language takes more time to learn.
b. Programmers might learn only a subset.
c. Feature multiplicity (having more than one way to perform a particular
operation) is often confusing.
d. For example, in C++ or Java you can decrement a variable in four
different ways:
x = x - 1; x -= 1; x--; --x.
e. Operator overloading (a single operator symbol has more than one
meaning) can lead to confusion. Some languages (e.g., assembly
languages) can be "too simple", that is, too low level: two, three, four,
five, or more statements may be needed to achieve the effect of one
statement in a high-level language.
2) Orthogonality-
a. In general use, it means being independent, non-redundant, non-
overlapping, or not related.
b. In computer languages, it means a construct can be used without
consideration as to how its use will affect something else.
c. A programming language is considered orthogonal if its features can be
used without thinking about how their use will affect other features, e.g.
objects and arrays.
d. Having fewer constructs and having few exceptions increases readability
and writability.
e. Orthogonal languages are easier to learn. Example: pointers should be
able to point to any type of variable or data structure.
3) Support for abstraction – process & data abstraction both.
a. Abstraction means the ability to define and use complicated structures or
operations in ways that allow many of the details to be ignored.
b. For example- To use a subprogram to implement a sort algorithm that is
required several times in a program.
c. Without the subprogram the sort code would have to be replicated in all
places where it was needed, which would make the program much longer
and more tedious to write.
4) Expressivity-
a. It means that a great deal of computation can be accomplished with a
very small program.
b. A language must have relatively convenient, rather than cumbersome,
ways of specifying computations.
c. For example, in C, the statement count++ is more convenient and
shorter than count = count + 1.
d. Also, the use of the for statement in Java makes writing counting loops
easier than with while, which is also possible.
RELIABILITY
A program is said to be reliable if it performs to its specifications under all
conditions. Along with all the features that affect readability and writability,
there are several other features that affect reliability:
1) Type checking -
Type checking is the testing for type errors in a given program, either by the
compiler or during program execution. Run-time checking is expensive.
Examples of failures of type checking: (i) countless loops, (ii) formal and
actual parameters being of different types, (iii) array indices out of bounds.
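For instance, a statically type-checked language such as Java rejects type
mismatches at compile time. A minimal illustrative sketch (the variable
names are hypothetical):

class TypeCheckDemo {
    public static void main(String[] args) {
        int count = 10;
        // count = "ten";     // rejected at compile time: String cannot be converted to int
        double d = count;     // legal: an int value widens safely to double
        // int n = d;         // rejected: possible lossy conversion from double to int
        System.out.println(d);
    }
}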
2) Exception Handling –
The ability of a program to intercept run-time errors, take corrective
measures, and then continue is a great aid to reliability. Ada, C++, and
Java include this capability, whereas C and FORTRAN do not.
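As a minimal Java sketch (with illustrative values), a run-time arithmetic
error can be intercepted and execution can continue:

class ExceptionDemo {
    public static void main(String[] args) {
        int[] divisors = {10, 0, 5};              // the 0 will trigger a run-time error
        for (int d : divisors) {
            try {
                System.out.println(100 / d);      // throws ArithmeticException when d == 0
            } catch (ArithmeticException e) {
                System.out.println("recovered: " + e.getMessage());  // corrective measure
            }
        }
        // Execution reaches here despite the error: prints 10, recovered, then 20
    }
}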
3) Aliasing-
Aliasing is referencing the same memory cell with more than one name
E.g., in C, both x and *y below refer to the same memory cell:
int x = 5;     /* x names a memory cell                        */
int *y = &x;   /* y holds the address of x, so *y aliases x    */
*y = 10;       /* changes x as well, although x is never named */
Aliasing is a dangerous feature in a programming language, because a
change made through one name silently affects the other.
COST
The ultimate cost of a programming language is a function of many of its
characteristics
1. The cost of training programmers
2. The cost of writing programs
3. The cost of compiling programs
4. The cost of executing programs
5. The cost of Language implementation System
6. The cost of poor reliability
7. The cost of maintaining programs
INFLUENCES ON LANGUAGE DESIGN
i. Computer architecture
The most commonly used computer architecture is the von Neumann
architecture (or model). The execution of a machine code program on a von
Neumann architecture computer occurs in a process called the fetch-execute
cycle.
Memory is shared between data and programs.
The CPU executes the instructions.
The basic operations are LOAD (read a value from a memory location)
and STORE (write a value to a memory location).
Instructions and data are transmitted from memory to the CPU, and the
results of execution are sent back to memory.
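The fetch-execute cycle can be summarized as a loop. A minimal Java
sketch of a toy von Neumann machine (the opcodes and memory layout are
invented for illustration):

class FetchExecute {
    // Toy machine: opcode 1 = LOAD memory[7] into acc, 2 = ADD memory[7],
    // 3 = STORE acc into memory[7], anything else = HALT.
    public static void main(String[] args) {
        int[] memory = {1, 2, 3, 0, 0, 0, 0, 5};   // program (cells 0-3) and data (cell 7) share memory
        int pc = 0;                                // program counter: address of the next instruction
        int acc = 0;                               // accumulator register
        boolean running = true;
        while (running) {
            int instruction = memory[pc];          // fetch the instruction from memory
            pc = pc + 1;                           // increment the program counter
            switch (instruction) {                 // decode and execute
                case 1 -> acc = memory[7];         // LOAD: read a value from a memory location
                case 2 -> acc = acc + memory[7];   // ADD
                case 3 -> memory[7] = acc;         // STORE: write a value to a memory location
                default -> running = false;        // HALT
            }
        }
        System.out.println(memory[7]);             // prints 10 (5 + 5)
    }
}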
LANGUAGE CATEGORIES
• Functional Programming Language.
• Logic Programming Language.
• Object Oriented Programming Language.
Functional Programming Language:
A functional language is one in which the primary means of making
computations is by applying functions to given parameters. Such
languages are called functional languages.
In this paradigm, computations are expressed in the form of functions.
The functional programming paradigm treats values as single entities;
unlike variables, values are never modified.
Functional programming languages are a class of languages designed
to reflect the way programmers think mathematically, not reflecting
the underlying machine.
Functional languages are based on the lambda calculus, a simple
model of computation, and have a solid theoretical foundation that
allows one to reason formally about the programs written in them.
The most commonly used functional languages are LISP (LISt
Processing), Standard ML, Haskell, and "pure" Scheme.
Merits:
Functions are reusable.
It is possible to develop and maintain large programs.
Because of usage of functions programs are easy to understand.
Demerits:
Functional programming is less efficient than imperative programming.
Functional programming is not a good option for commercial software
development.
Logic Programming Language:
In logic programming languages, programs are expressed in a form of
symbolic logic as facts and rules, and computation is carried out by a
logical inference process. Prolog is the best-known logic programming
language.
Demerits:
Execution of programs is very slow.
True/false statements cannot express most kinds of problems.
They efficiently solve only a limited set of problems.
Object Oriented Programming Language:
A language in which objects are used; an object has both data
members and member functions, and the object concept is used in
place of procedures. Such languages are called object-oriented
programming languages.
OOP is often not considered a separate category of programming
language, because object-oriented languages developed naturally from
imperative languages.
In these languages everything is modelled as an object.
They take a modular programming approach in which data and
functions are bound together in one entity called a class.
This programming paradigm has gained great popularity in recent
years because of characteristic features such as data abstraction,
encapsulation, inheritance, polymorphism, and so on.
Examples of object-oriented programming languages are Smalltalk,
C++, and Java.
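A minimal Java sketch (the class names are invented for illustration)
showing data and functions bound into a class, plus inheritance and
polymorphism:

class Shape {                              // data and functions bound in one entity (a class)
    protected String name;                 // data member held inside the class
    Shape(String name) { this.name = name; }
    double area() { return 0.0; }          // to be overridden by subclasses
}

class Circle extends Shape {               // inheritance: Circle reuses and extends Shape
    private double r;                      // encapsulation: hidden from code outside the class
    Circle(double r) { super("circle"); this.r = r; }
    @Override double area() { return Math.PI * r * r; }  // polymorphism via overriding
}

class OopDemo {
    public static void main(String[] args) {
        Shape s = new Circle(2.0);         // the call below dispatches to Circle.area() at run time
        System.out.println(s.name + " area = " + s.area());
    }
}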
Merits:
Re-usability
Reduced data redundancy
Security
Easier code maintenance
Objects can be created for real-world entities.
IMPLEMENTATION METHODS
The operating system and language implementations are layered over the
machine language interface of a computer. These layers can be thought of as
virtual computers, providing interfaces to the user at higher levels. For
example, an operating system and a C compiler provide a virtual C computer.
With other compilers, a machine can become other kinds of virtual
computers. Most computer systems provide several different virtual
computers. User programs form another layer over the top of the layer of
virtual computers.
Three methods are used for language implementation:
Compilation - programs are translated into machine language.
Pure Interpretation - programs are interpreted directly, with no
translation, by another program known as an interpreter.
Hybrid Implementation Systems - a compromise between compilers
and pure interpreters.
COMPILATION
Programming languages can be implemented by any of three general
methods. At one extreme, programs can be translated into machine
language, which can be executed directly on the computer. This method is
called a compiler implementation and has the advantage of very fast
program execution, once the translation process is complete. Most
production implementations of languages, such as C, COBOL, C++, and Ada,
are by compilers. The language that a compiler translates is called
the source language. The process of compilation and program execution
takes place in several phases, the most important of which are shown in
Figure.
Compilation process has several phases:
Lexical analysis:
This is the initial phase of compilation, in which the source program is
scanned line by line and broken into streams of strings called tokens.
The lexical analysis phase is also called scanning.
The lexical analyzer gathers the characters of the source program into
lexical units. The lexical units of a program are identifiers, special
words, operators, and punctuation symbols.
The lexical analyzer ignores comments in the source program because
the compiler has no use for them.
The lexical analyzer identifies the type of each lexeme and stores its
information in the symbol table.
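A minimal Java sketch of a scanner for a tiny expression language; the
token categories match the lexeme/token table shown later in these notes
(an illustrative toy, not a production lexer):

class ScanDemo {
    public static void main(String[] args) { scan("count + 17;"); }

    // Prints one (lexeme -> token) pair per lexical unit found in src
    static void scan(String src) {
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; }
            else if (Character.isLetter(c)) {          // identifier: letter (letter | digit)*
                int j = i;
                while (j < src.length() && Character.isLetterOrDigit(src.charAt(j))) j++;
                System.out.println(src.substring(i, j) + " -> identifier");
                i = j;
            } else if (Character.isDigit(c)) {         // integer literal: digit+
                int j = i;
                while (j < src.length() && Character.isDigit(src.charAt(j))) j++;
                System.out.println(src.substring(i, j) + " -> integer literal");
                i = j;
            } else if (c == '+') { System.out.println("+ -> add operator"); i++; }
            else if (c == ';')   { System.out.println("; -> semicolon"); i++; }
            else { i++; }                              // comments etc. would be skipped here
        }
    }
}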
Syntax analysis:
This is the second phase in the process of compilation.
The syntax analyzer takes the lexical units from the lexical analyzer
and uses them to construct hierarchical structures for checking the
syntax of source code called parse trees.
These parse trees represent the syntactic structure of the program.
This phase is also called parsing, as the parse tree is constructed
here.
Semantics analysis:
The semantic analyzer is a phase in which the meaning of source
program is analyzed.
The semantic analyzer is an integral part of the intermediate code
generator.
The semantic analyzer checks for errors, such as type errors, that are
difficult, if not impossible, to detect during syntax analysis.
Code Optimization:
The semantic analyzer, together with the intermediate code generator,
produces code called intermediate code. For example, consider the
statement
a = b + c - d
The intermediate code can be generated as follows:
t1 = b + c
t2 = t1 - d
a = t2
Optimization, which improves programs (usually in their intermediate
code version) by making them smaller or faster or both, is often an
optional part of compilation.
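For example, if the optimizer notices that the same subexpression is
computed twice, it can reuse the first result (an invented fragment in the
same intermediate-code notation as above):

Before optimization:        After optimization:
t1 = b + c                  t1 = b + c
x  = t1 * 2                 x  = t1 * 2
t2 = b + c                  y  = t1 - d
y  = t2 - d

The optimized version is both smaller and faster, since b + c is evaluated
only once.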
Code generation:
The code generator translates the optimized intermediate code version
of the program into an equivalent machine language program.
The output code produced during this phase can be directly executed,
or further translation steps may be required.
These steps can be assembly, linking, and loading.
Symbol table:
The symbol table serves as a database for the compilation process.
The primary contents of the symbol table are the type and attribute
information of each user-defined name in the program.
This information is placed in the symbol table by the lexical and syntax
analyzers and is used by the semantic analyzer and the code
generator.
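Conceptually, the symbol table is a map from each user-defined name to
its attribute information. A minimal Java sketch (the entries are invented
for illustration):

import java.util.HashMap;
import java.util.Map;

class SymbolTableDemo {
    public static void main(String[] args) {
        // Maps each user-defined name to its attribute information (here, just a type)
        Map<String, String> symbolTable = new HashMap<>();
        symbolTable.put("count", "int");               // entered by the lexical and syntax analyzers
        System.out.println(symbolTable.get("count"));  // consulted by the semantic analyzer and code generator
    }
}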
The user and system code together are sometimes called a load
module, or executable image.
The process of collecting system programs and linking them to user
programs is called linking and loading, or sometimes just linking. It
is accomplished by a systems program called a linker.
Finally, the complete program is loaded into memory as executable
machine code. This task is carried out by the loader.
The speed of the connection between a computer’s memory and its
processor usually determines the speed of the computer, because
instructions often can be executed faster than they can be moved to the
processor for execution. This connection is called the von Neumann
bottleneck.
It is the primary limiting factor in the speed of von Neumann architecture
computers. The von Neumann bottleneck has been one of the primary
motivations for the research and development of parallel computers.
Interpreter vs. Compiler:

Interpreter: The source program gets interpreted every time it is to be
executed, and every time the source program is analyzed. Hence
interpretation is less efficient than compilation.
Compiler: In the process of compilation, the program is analyzed only once
and then the code is generated. Hence a compiler is more efficient than an
interpreter.

Interpreter: Interpreters do not produce object code.
Compiler: Compilers produce object code.

Interpreter: Interpreters can be made portable because they do not produce
object code.
Compiler: The compiler has to be present on the host machine when a
particular program needs to be compiled.

Interpreter: Interpreters are simpler and give us an improved debugging
environment.
Compiler: The compiler is a complex program and it requires a large amount
of memory.
PURE INTERPRETATION
Pure interpretation lies at the opposite end (from compilation) of
implementation methods. With this approach, programs are interpreted by
another program called an interpreter, with no translation whatever.
Pure interpretation has the advantage of allowing easy implementation
of many source-level debugging operations, because all run-time error
messages can refer to source-level units. For example, if an array index
is found to be out of range, the error message can easily indicate the
source line and the name of the array.
On the other hand, this method has the serious disadvantage that
execution is 10 to 100 times slower than in compiled systems.
The primary source of this slowness is the decoding of the high-level
language statements, which are far more complex than machine
language instructions (although there may be fewer statements than
instructions in equivalent machine code).
Furthermore, regardless of how many times a statement is executed, it
must be decoded every time. Therefore, statement decoding, rather
than the connection between the processor and memory, is the
bottleneck of a pure interpreter.
Although some simple early languages of the 1960s (APL, SNOBOL, and LISP)
were purely interpreted, by the 1980s, the approach was rarely used on high-
level languages. However, in recent years, pure interpretation has made a
significant comeback with some Web scripting languages, such as JavaScript
and PHP, which are now widely used. The process of pure interpretation is
shown in Figure
HYBRID IMPLEMENTATION SYSTEMS
The process used in a hybrid implementation system is shown in
Figure. Instead of translating intermediate language code to machine
code, it simply interprets the intermediate code.
Examples
– Perl programs are partially compiled to detect errors before interpretation
– Initial implementations of Java were hybrid, the intermediate form, byte
code, provides portability to any machine that has a byte code interpreter
and a runtime system (together, these are called Java Virtual Machine)
– .NET languages are implemented with a JIT (just-in-time) system, in
which intermediate code is translated to machine code the first time it is
executed.
PREPROCESSOR
A preprocessor is a program that processes a source program immediately
before it is compiled. Preprocessor instructions are embedded in the source
program (for example, #include and #define in C), and the preprocessor
expands them, typically to include files or to expand macros.
PROGRAMMING ENVIRONMENTS
A programming environment is an environment in which programs can be
created and tested; it is the collection of tools used in the development of
software. This collection may consist of only a file system, a text editor, a
linker, and a compiler. The programming language is not the only measure
of the software development capability of a system. We now briefly
describe several programming environments.
UNIX is an older programming environment, first distributed in the
middle 1970s, built around a portable multiprogramming operating
system. It provides a wide array of powerful support tools for software
production and maintenance in a variety of languages. In the past, the
most important feature absent from UNIX was a uniform interface
among its tools. This made it more difficult to learn and to use.
However, UNIX is now often used through a graphical user interface
(GUI) that runs on top of UNIX. Examples of UNIX GUIs are the Solaris
Common Desktop Environment (CDE), GNOME, and KDE. These GUIs
make the interface to UNIX appear similar to that of Windows and
Macintosh systems.
Borland JBuilder is a programming environment that provides an
integrated compiler, editor, debugger, and file system for Java
development, where all four are accessed through a graphical
interface. JBuilder is a complex and powerful system for creating Java
software.
Microsoft Visual Studio .NET is a relatively recent step in the evolution
of software development environments. It is a large and elaborate
collection of software development tools, all used through a windowed
interface. This system can be used to develop software in any one of
the five .NET languages: C#, Visual BASIC .NET, JScript (Microsoft’s
version of JavaScript), F# (a functional language), and C++/CLI.
NetBeans is a development environment that is primarily used for Java
application development but also supports JavaScript, Ruby, and PHP.
Both Visual Studio and NetBeans are more than development
environments—they are also frameworks, which means they actually
provide common parts of the code of the application.
1. Separate compilation:
A varying number of programmers work on designing, coding, and
testing the parts of a program. This requires the language to be
structured so that individual subprograms or other parts can be
compiled and executed separately, without requiring the other parts.
Separate compilation is difficult because compiling one subprogram
may need information that is present in some other subprogram, or
shared data may be used in the program, due to which separate
compilation is not always possible. To solve this problem, some
languages have adopted certain features that aid separate compilation
(a two-file sketch follows this list):
1. Use of the keyword extern.
2. Use of scoping rules.
3. In object-oriented programming, features like inheritance can be
used to access shared data.
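As a minimal Java sketch (the file and class names are invented): each
class lives in its own file and is compiled separately, while still referring to
shared data in the other compilation unit:

// File Shared.java  (compile separately: javac Shared.java)
public class Shared {
    public static int counter = 0;          // shared data used by other compilation units
}

// File Main.java    (compile separately: javac Main.java)
public class Main {
    public static void main(String[] args) {
        Shared.counter++;                   // refers to the separately compiled class
        System.out.println(Shared.counter); // prints 1
    }
}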
2. Testing and debugging:
Breakpoint feature: By this feature, the programmer can set the
breakpoints in the program. When the breakpoint is reached during the
execution of the program, the execution of the program is interrupted
and control is given to the programmer. The programmer can then
inspect the program for testing and debugging.
Execution trace feature: By this feature, a particular variable or
statement can be tagged for tracing during the execution of the
program.
Assertions: An assertion is an expression that is inserted as a
separate statement in a program. For example, the following is a Java
code fragment in which an assertion is used.
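A representative fragment (the variable names are illustrative; assertions
are checked when the JVM is run with the -ea flag):

int numerator = 10, denominator = 0;
assert denominator != 0 : "denominator must not be zero";  // raises AssertionError here
int result = numerator / denominator;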
Lexemes and tokens (example):
Lexeme    Token
count     identifier
+         add operator
17        integer literal
;         semicolon
Syntax vs. Semantics:
Syntax: It is given by a context-free grammar. Syntactic errors are easy to
locate.
Semantics: It is given by syntax-directed definitions. Semantic errors are
hard to locate.
The forms of the tokens of programming languages can be described by
regular grammars. The syntax
of whole programming languages, with minor exceptions, can be described
by context-free grammars.
Grammar and Derivations:
The formal language-generation mechanism used to describe the syntax of
programming languages is called a grammar. A grammar is a
generative device for defining languages. The sentences of the language are
generated through a sequence of applications of the rules, beginning with a
special nonterminal of the grammar called the start symbol. This sequence
of rule applications is called a derivation. Derivation is a repeated
application of rules, starting with the start symbol and ending with a
sentence (all terminal symbols). In a grammar for a complete programming
language, the start symbol represents a complete program and is often
named <program>. A simple grammar is shown below:
<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<var> → A | B | C
<expression> → <var> + <var> | <var> - <var>
A program consists of
the special word begin, followed by a list of statements separated by
semicolons, followed by the special word end. An expression is either a
single variable or two variables separated by either a + or - operator. The
only variable names in this language are A, B, and C. A derivation of
A = B + C; B = C in this language follows:
<program> => begin <stmt_list> end
          => begin <stmt> ; <stmt_list> end
          => begin <var> = <expression> ; <stmt_list> end
          => begin A = <expression> ; <stmt_list> end
          => begin A = <var> + <var> ; <stmt_list> end
          => begin A = B + <var> ; <stmt_list> end
          => begin A = B + C ; <stmt_list> end
          => begin A = B + C ; <stmt> end
          => begin A = B + C ; <var> = <expression> end
          => begin A = B + C ; B = <expression> end
          => begin A = B + C ; B = <var> end
          => begin A = B + C ; B = C end
This derivation, like all derivations, begins with the start symbol, in this case
<program>. The symbol => is read “derives.” Each successive string in the
sequence is derived from the previous string by replacing one of the
nonterminals with one of that nonterminal’s definitions. Each of the strings in
the derivation, including <program>, is called a sentential form. In this
derivation, the replaced nonterminal is always the leftmost nonterminal in
the previous sentential form. Derivations that use this order of replacement
are called leftmost derivations.
A grammar for simple assignment statements is given below:
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
Parse Trees:
Hierarchical structures of the language are called parse trees. One of the
most attractive features of grammars is that they naturally describe the
hierarchical syntactic structure of the sentences of the languages they
define. These hierarchical structures are called parse trees.
For example: the parse tree for a = b + c, and the parse tree for
A = B * ( A + C ).
Ambiguity:
A grammar that generates a sentential form for which there are two or more
distinct parse trees is said to be ambiguous. An Ambiguous Grammar for
Simple Assignment Statements.
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>
Ex: A = B + C * A. Two distinct parse trees for the same sentence.
There are several other characteristics of a grammar that are sometimes
useful in determining whether a grammar is ambiguous. They include the
following:
(1) if the grammar generates a sentence with more than one leftmost
derivation and
(2) if the grammar generates a sentence with more than one rightmost
derivation.
Operator Precedence:
When an expression includes two different operators, for example,
x + y * z, one obvious semantic issue is the order of evaluation of the two
operators. This semantic question can be answered by assigning different
precedence levels to operators.
If * has been assigned higher precedence than +, multiplication will be
done first, regardless of the order of appearance of the two operators in the
expression.
An Unambiguous Grammar for Expressions:
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → ( <expr> ) | <id>
Every derivation with an unambiguous grammar has a unique parse tree,
although that tree can be represented by different derivations. For example,
the sentence A = B + C * A can be derived in more than one way, yet every
derivation yields the same parse tree.
The unique parse tree for A = B + C * A using an unambiguous grammar
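A recursive-descent parser follows such a grammar directly, with one
method per nonterminal. A minimal Java sketch (class and method names
are invented; the left recursion in <expr> and <term> is rewritten as
iteration, a standard transformation):

class ExprParser {
    private final String s;                 // input sentence, spaces removed
    private int i = 0;                      // cursor into the input
    ExprParser(String s) { this.s = s.replace(" ", ""); }

    private char peek() { return i < s.length() ? s.charAt(i) : '$'; }
    private void expect(char c) {
        if (peek() != c) throw new RuntimeException("expected " + c + " at " + i);
        i++;
    }

    void assign() { id(); expect('='); expr(); }                          // <assign> -> <id> = <expr>
    void expr()   { term(); while (peek() == '+') { i++; term(); } }      // <expr> -> <term> {+ <term>}
    void term()   { factor(); while (peek() == '*') { i++; factor(); } }  // <term> -> <factor> {* <factor>}
    void factor() {                                                       // <factor> -> ( <expr> ) | <id>
        if (peek() == '(') { i++; expr(); expect(')'); } else id();
    }
    void id() {                                                           // <id> -> A | B | C
        if (peek() == 'A' || peek() == 'B' || peek() == 'C') i++;
        else throw new RuntimeException("expected A, B, or C at " + i);
    }

    public static void main(String[] args) {
        new ExprParser("A = B + C * A").assign();   // completes without error: sentence is valid
        System.out.println("parse succeeded");
    }
}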
Associativity of Operators:
When an expression includes two operators that have the same precedence
(as * and / usually have), for example A / B * C, a semantic rule is required
to specify which should take precedence. This rule is named associativity.
As was the case with precedence, a grammar for expressions may correctly
imply operator associativity. Consider the following example of an
assignment statement:
A = B + C + A
The parse tree for this sentence, as defined with the grammar given above,
shows the left addition operator lower than the right addition operator,
meaning the left + is evaluated first: addition is left associative.
1. The first common extension allows an optional part of a right-hand
side to be placed in brackets, [ ].
2. The second allows a repeated part of a right-hand side to be placed
in braces, { }.
3. The third common extension deals with multiple-choice options. When
a single element must be chosen from a group, the options are placed
in parentheses and separated by the OR operator, |.
Ex: <term> → <term> (* | / | %) <factor>
In BNF, a description of this <term> would require the following three
rules:
<term> →
<term> * <factor> | <term> / <factor> | <term> % <factor>
BNF and EBNF Versions of an Expression Grammar
BNF:  <expr> → <expr> + <term> | <expr> - <term> | <term>
      <term> → <term> * <factor> | <term> / <factor> | <factor>
      <factor> → <exp> ** <factor> | <exp>
      <exp> → (<expr>) | id
EBNF: <expr> → <term> {(+ | -) <term>}
      <term> → <factor> {(* | /) <factor>}
      <factor> → <exp> {** <exp>}
      <exp> → (<expr>) | id
ATTRIBUTE GRAMMARS
An attribute grammar is a device used to describe more of the structure of
a programming language than can be described with a context-free
grammar. An attribute grammar is an extension to a context-free grammar.
The extension allows certain language rules to be conveniently described,
such as type compatibility.
Static Semantics:
There are some characteristics of the structure of programming languages
that are difficult to describe with BNF, and some that are impossible. As an
example of a syntax rule that is difficult to specify with BNF, consider type
compatibility rules. In Java, for example, a floating-point value cannot be
assigned to an integer type variable, although the opposite is legal.
If all of the typing rules of Java were specified in BNF, the grammar would
become too large to be useful, because the size of the grammar determines
the size of the syntax analyzer. The static semantics of a language is only
indirectly related to the meaning of programs during execution. Static
semantics is so named because the analysis required to check these
specifications can be done at compile time.
Basic Concepts:
Attribute grammars are context-free grammars to which have been added
attributes, attribute computation functions, and predicate
functions. Attributes, which are associated with grammar symbols (the
terminal and nonterminal symbols), are similar to variables in the sense that
they can have values assigned to them. Attribute computation functions,
sometimes called semantic functions, are associated with grammar rules.
They are used to specify how attribute values are computed. Predicate
functions, which state the static semantic rules of the language, are
associated with grammar rules.
Attribute Grammars Defined:
An attribute grammar is a grammar with the following additional features:
Associated with each grammar symbol X is a set of attributes A(X). The
set A(X) consists of two disjoint sets: S(X), called synthesized
attributes, and I(X), called inherited attributes.
A(X) = S(X) ∪ I(X)
Synthesized attributes are used to pass semantic information up a
parse tree, while inherited attributes pass semantic information down
and across a tree.
Associated with each grammar rule is a set of semantic functions and a
possibly empty set of predicate functions over the attributes of the
symbols in the grammar rule. For a rule X0 → X1 X2 ... Xn, the
synthesized attributes of X0 are computed with semantic functions of
the form S(X0) = f(A(X1), ..., A(Xn)).
A predicate function has the form of a Boolean expression on the union
of the attribute set {A(X0), ..., A(Xn)} and a set of literal attribute
values.
As a very simple example of how attribute grammars can be used to
describe static semantics, consider the following fragment of an attribute
grammar that describes the rule that the name on the end of an Ada
procedure must match the procedure’s name.
The syntax portion of our example attribute grammar is
<assign> → <var> = <expr>
<expr> → <var> + <var> | <var>
<var> → A | B | C
The attributes for the nonterminals in the example attribute grammar are
described in the following paragraphs:
actual_type—A synthesized attribute associated with the
nonterminals <var> and <expr>. It is used to store the actual type, int
or real, of a variable or expression.
expected_type—An inherited attribute associated with the
nonterminal <expr>. It is used to store the type, either int or real, that
is expected for the expression, as determined by the type of the
variable on the left side of the assignment statement.
The complete attribute grammar follows:
1. Syntax rule: <assign> → <var> = <expr>
   Semantic rule: <expr>.expected_type ← <var>.actual_type
2. Syntax rule: <expr> → <var>[2] + <var>[3]
   Semantic rule: <expr>.actual_type ←
       if (<var>[2].actual_type = int) and (<var>[3].actual_type = int)
       then int
       else real
       end if
   Predicate: <expr>.actual_type == <expr>.expected_type
3. Syntax rule: <expr> → <var>
   Semantic rule: <expr>.actual_type ← <var>.actual_type
   Predicate: <expr>.actual_type == <expr>.expected_type
4. Syntax rule: <var> → A | B | C
   Semantic rule: <var>.actual_type ← look-up(<var>.string)
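These rules can be read as ordinary code. A minimal Java sketch (the
look-up table and its entries are invented for illustration) that computes
actual_type for rule 2 and checks the predicate:

import java.util.Map;

class AttrDemo {
    // Hypothetical symbol table standing in for look-up(<var>.string) in rule 4
    static final Map<String, String> lookUp = Map.of("A", "int", "B", "int", "C", "real");

    // Rule 2: the expression is int only if both operands are int, otherwise real
    static String exprActualType(String left, String right) {
        boolean bothInt = lookUp.get(left).equals("int") && lookUp.get(right).equals("int");
        return bothInt ? "int" : "real";
    }

    public static void main(String[] args) {
        // For the assignment A = B + C:
        String expectedType = lookUp.get("A");        // rule 1: inherited from the LHS variable
        String actualType = exprActualType("B", "C"); // rule 2: synthesized up the parse tree
        // Predicate: the assignment is legal only if the two attributes agree
        System.out.println(actualType.equals(expectedType) ? "type-correct" : "type error");
    }
}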