PPL Unit-1

The document discusses the importance of studying programming language principles, highlighting benefits such as improved expression of ideas, better language selection, and enhanced learning of new languages. It also outlines various programming domains, including scientific, business, AI, systems programming, web software, scripting languages, and special-purpose languages. Additionally, it covers language evaluation criteria, influences on language design, and categorizes programming paradigms.


UNIT-1

REASONS FOR STUDYING PRINCIPLES OF PROGRAMMING LANGUAGES


The following is what we believe to be a compelling list of potential benefits
of studying principles of programming languages:

1. Increased ability to express ideas:


 It is widely believed that the depth at which programmers think is
influenced by the expressive power of the language in which they
communicate their thoughts. It is difficult for people to conceptualize
structures they cannot describe, verbally or in writing.
 The language in which a programmer develops software places limits on
the kinds of control structures, data structures, and abstractions they
can use.

 Awareness of a wider variety of programming language features can
reduce such limitations in software development.
 Can language constructs be simulated in other languages that do not
support those constructs directly?
2. Improved background for choosing appropriate languages:
 Many programmers, when given a choice of languages for a new
project, continue to use the language with which they are most
familiar, even if it is poorly suited to new projects.
 If these programmers were familiar with other languages available,
they would be in a better position to make informed language choices.
3. Increased ability to learn new languages:
 Programming languages are still in a state of continuous evolution,
which means continuous learning is essential.
 Programmers who understand the concepts of object-oriented
programming will have an easier time learning Java.
 Once a thorough understanding of the fundamental concepts of
languages is acquired, it becomes easier to see how concepts are
incorporated into the design of the language being learned.
 It is easier to learn a new language if you understand the underlying
structures of language.
Examples: – It is easier for a BASIC programmer to learn FORTRAN than
C. – It is easier for a C++ programmer to learn Java. – It is easier for a
Scheme programmer to learn LISP (List Processing).
4. Better understanding of the significance of implementation:

 Understanding of implementation issues leads to an understanding of
why languages are designed the way they are.
 This in turn leads to the ability to use a language more intelligently, as
it was designed to be used.
5. Ability to design new languages:
 The more languages you know, the better your understanding of
programming language concepts.
 Most professional programmers at some point design a language of one
kind or another to meet their needs, one that is easy to learn and
efficient.
6. Better use of languages that are already known.
 Many contemporary programming languages are large and complex.
 Accordingly, it is uncommon for a programmer to be familiar with and
use all of the features of a language he or she uses.
 By studying the concepts of programming languages, programmers
can learn about previously unknown and unused parts of the
languages they already use and begin to use those features.
 Some languages are better for some jobs than others.
Example – FORTRAN and APL for calculations, COBOL and RPG for
report generation, LISP and PROLOG for AI, etc.
By understanding how features are implemented, you can make more
efficient use of them. Examples: • Creating arrays, strings, lists,
records. • Using recursion, object classes, etc.
7. Overall advancement of computing:
 In some cases, a language became widely used, at least in part,
because those in positions to choose languages were not sufficiently
familiar with programming language concepts.
 The most popular languages are not always the best available
languages.
 Many believe that ALGOL 60 was a better language than FORTRAN;
however, FORTRAN was more widely used. This is attributed to the fact
that programmers and managers did not understand the conceptual
design of ALGOL 60.

PROGRAMMING DOMAINS
A programming domain defines the suitability of a language for a
specific kind of application.
There are many programming domains, but let's take the common domains:
1. Scientific Applications:

The first digital computers, which appeared in the late 1940s and early
1950s, were invented and used for scientific applications. Typically, the
scientific applications of that time used relatively simple data structures,
but required large numbers of floating-point arithmetic computations.
 The most common data structures were arrays and matrices.
 The most common control structures were counting loops and
selections.

The early high-level programming languages invented for scientific
applications were designed to provide for those needs. Their competition
was assembly language, so efficiency was a primary concern. The main
languages that were developed were FORTRAN and ALGOL 60.

2. Business Applications:
The use of computers for business applications began in the 1950s.
Special computers were developed for this purpose, along with special
languages. Business languages are characterized by facilities for
producing elaborate reports, precise ways of describing and storing
decimal numbers and character data, and the ability to specify decimal
arithmetic operations. The first successful high-level language for
business was COBOL(ISO/IEC, 2002), the initial version of which appeared
in 1960. It is still the most commonly used language for these
applications.
There have been few developments in business application languages
outside the development and evolution of COBOL.

3. Artificial Intelligence:
AI is a broad area of computer applications characterized by the use of
symbolic rather than numeric computations.
 Symbolic computation means that symbols, consisting of names
rather than numbers, are manipulated.
 Also, symbolic computation is more conveniently done with linked
lists of data rather than arrays.
 This kind of programming sometimes requires more flexibility than
other programming domains.
For example, in some AI applications the ability to create and execute
code segments during execution is convenient.
The first widely used programming language developed for AI
applications was the functional language LISP.
4. Systems Programming:

The operating system and the programming support tools of a computer
system are collectively known as its systems software.
 Systems software is used almost continuously and so it must be
efficient.
 Furthermore, it must have low-level features that allow the software
interfaces to external devices to be written.
General-purpose programming languages, such as C and C++, are usually
preferred for systems programming.

5. Web Software:
 World Wide Web software represents dynamic web content: text,
audio or video data, content containing computations, and so on.
 It needs support for a variety of languages, from markup languages
(HTML) and scripting languages (JavaScript/PHP) to general-purpose
languages (Java).

6. Scripting Languages:
These languages have evolved over the last 25 years.
Characteristics: a) A scripting language is used by putting a list of
commands, called a script, in a file to be executed. b) They suit simple
applications like file management and file filtering.
Languages developed:
I) sh is the first scripting language, with simple commands; later, control-
flow statements, variables, functions, and other capabilities were added,
resulting in a complete programming language.

II) ksh, developed by David Korn.

III) awk, developed by Aho, Weinberger, and Kernighan.

IV) Tcl, developed by John Ousterhout.

V) Perl.

7. Special-Purpose Languages:
They have evolved over the past 40 years, and these are designed for a
particular purpose. Languages developed:
a) RPG, to produce business reports.
b) APT, for instructing programmable machine tools.

c) GPSS for system simulation.
These are not used that much because of their narrow applicability.

LANGUAGE EVALUATION CRITERIA


The most important criteria for judging a programming language are
1. Readability
2. Writability
3. Reliability
4. Cost

Simplicity: A large language takes more time to learn.

Orthogonality: being independent, non-redundant, non-overlapping, or not
related.

Data Types: adequate facilities for defining data types and structures.

Syntax Design: syntax is the form of the elements of a language.

Support for Abstraction: the ability to define and use complicated
structures or operations in ways that allow many of the details to be
ignored.

Expressivity: a great deal of computation can be accomplished with a
very small program.

Type Checking: testing for type errors.

Exception Handling: the ability to intercept run-time errors.

Restricted Aliasing: aliasing is referencing the same memory cell with more
than one name.

READABILITY
The ease with which programs can be read and understood is called
readability. The following describe characteristics that contribute to the
readability of a PL.
1) Simplicity-
a. A language that has a large number of basic components is more difficult
to learn than one with a small number of basic components.
b. The language should not have a multiplicity of commands.
c. For example: I = I + 1; I += 1; I++; ++I.
d. The language should not have operator overloading, in which a single
operator symbol has more than one meaning.
2) Orthogonality –
It makes the language easy to learn and read, because constructs are
context-independent.
a. It means that a relatively small number of primitive constructs can be
combined in a number of ways to build the program.
b. An orthogonal construct is independent of the context of its appearance
in the program.
3) Control Statements-
a. A program that can be read from top to bottom is much easier to
understand than a program that requires the reader to jump from one
statement to some other non-adjacent statement.
b. Example- goto statements.
4) Data Types and Structures-
a. The presence of adequate facilities for defining data types and data
structures in a language is another significant aid to readability.
b. There should be provision for user-defined data types, such as record
types (for example, representing an array of employee records).
5) Syntax considerations-

a. Syntax is the form of the elements of a language.
b. There are three syntactic design choices that affect readability:
different forms of identifiers, special keywords (reserved words), and form
and meaning – constructs that are similar in appearance but different in
meaning are not readable.

WRITABILITY
The measure of how easily a language can be used to create programs for a
chosen problem domain. The features that affect the readability of a
language also affect its writability; apart from them, the factors that
influence writability are
1) Simplicity-
a. A large language takes more time to learn.
b. Programmers might learn only a subset.
c. Feature multiplicity (having more than one way to perform a particular
operation) is often confusing.
d. For example, in C++ or Java you can decrement a variable in four
different ways:
x = x – 1; x -= 1; x--; --x.
e. Operator overloading (a single operator symbol has more than one
meaning) can lead to confusion. Some languages (e.g. assembly languages)
can be "too simple", i.e. too low level: 2, 3, 4, 5 or more statements are
needed to have the effect of 1 statement in a high-level language.
2) Orthogonality-
a. In general use, it means being independent, non-redundant, non-
overlapping, or not related.
b. In computer languages, it means a construct can be used without
consideration as to how its use will affect something else.
c. A programming language is considered orthogonal if its features can be
used without thinking about how their use will affect other features, e.g.
objects and arrays.
d. Having fewer constructs and having few exceptions increases readability
and writability.
e. Orthogonal languages are easier to learn. Example: pointers should be
able to point to any type of variable or data structure.
3) Support for abstraction – both process and data abstraction.

7
a. Abstraction means the ability to define and use complicated structures or
operations in ways that allow many of the details to be ignored.
b. For example- To use a subprogram to implement a sort algorithm that is
required several times in a program.
c. Without the subprogram the sort code would have to be replicated in all
places where it was needed, which would make the program much longer
and more tedious to write.
4) Expressivity-
a. A great deal of computation must be accomplished with a very
small program.
b. A language must have relatively convenient, rather than cumbersome,
ways of specifying computations.
c. For example, in C, the statement count++ is more convenient and
shorter than count = count + 1.
d. Also, the use of the for statement in Java makes writing counting loops
easier than with the use of while, which is also possible.

RELIABILITY
A program is said to be reliable if it performs to its specifications under all
conditions. Along with all the features that affect readability and
writability, there are several other features that affect reliability:
1) Type checking –
It is the testing for type errors in a given program, either by the compiler
or during program execution. Run-time checking is expensive. Examples of
failures that type checking catches: (i) formal and actual parameters being
of different types; (ii) array indices out of bounds.
2) Exception Handling –
The ability of a program to intercept run-time errors, take corrective
measures, and then continue is a great aid to reliability. Ada, C++, and
Java include this capability, whereas C and FORTRAN do not.
3) Aliasing-
Aliasing is referencing the same memory cell with more than one name.
E.g., in C, both x and *y can be used to refer to the same memory cell:
int x = 5;
int *y = &x;
Aliasing is a dangerous feature in a programming language.

COST
The ultimate cost of a programming language is a function of many of its
characteristics
1. The cost of training programmers
2. The cost of writing programs
3. The cost of compiling programs
4. The cost of executing programs
5. The cost of the language implementation system
6. The cost of poor reliability
7. The cost of maintaining programs

Some other criteria:

i. Portability: the ease with which programs can be moved from one
implementation to another.
ii. Generality: applicability to a wide range of applications.
iii. Well-definedness: the completeness and precision of the language's
definition.

INFLUENCES ON LANGUAGE DESIGN


The following factors influence the evolution of programming languages.

i. Computer architecture
The most commonly used computer architecture is the von Neumann machine
(or model). The execution of a machine code program on a von Neumann
architecture computer occurs in a process called the fetch-execute cycle.

 Memory is shared between data and programs.
 The CPU executes the instructions.
 The basic operations are LOAD (read a value from a memory location)
and STORE (write a value to a memory location).
 Instructions and data are transmitted from memory to the CPU, and the
results of computation are sent back to memory.

II. Programming Methodologies:

 In the 1950s and early 1960s, simple applications were built, and the
main concern was machine efficiency.
 In the late 1960s, programming with better readability and better
control structures was developed, and the structured programming
approach was adopted. A new programming method was developed,
called top-down design and stepwise refinement.
 In the late 1970s, there was a shift from process-oriented to data-
oriented methodology, namely data abstraction.
 In the middle 1980s, object-oriented methodology came into
existence, which is the extension of the data-oriented methodology.

LANGUAGE CATEGORIES OR PROGRAMMING PARADIGMS


Programming Paradigm can be defined as a method of problem solving and
an approach to programming language design.
Programming languages are often categorized as follows:
• Imperative or Procedural Programming Language.

• Functional Programming Language.
• Logic Programming Language.
• Object Oriented Programming Language.

Imperative or Procedural Programming Language:


A language designed around the prevalent computer architecture is called an
imperative language. The imperative (or procedural) paradigm is the closest
to the structure of actual computers. It is a model that is based on moving
bits around and changing machine state. A program consists of a sequence
of statements. After execution of each statement, the values are stored in
the memory.
The main features of this language are variables, assignment statements and
iterations.
Programming languages based on the imperative paradigm have the
following characteristics:
 The basic unit of abstraction is the PROCEDURE, whose basic structure
is a sequence of statements that are executed in series, abstracting
the way that the program counter is incremented so as to proceed
through a series of machine instructions residing in sequential
hardware memory cells.
 The sequential flow of execution can be modified by conditional and
looping statements (as well as by the very low-level goto statement
found in many imperative languages), which abstract the conditional
and unconditional branch instructions found in the machine instruction
set.
 Variables play a key role, and serve as abstractions of hardware
memory cells. Typically, a given variable may assume many different
values over the course of the execution of a program, just as a hardware
memory cell may contain many different values. Thus, the assignment
statement is a very important and frequently used statement.
Examples of imperative languages:
FORTRAN, Algol, COBOL, Pascal, C (and to some extent C++), BASIC, Ada -
and many more.
Merits:
 Simple to implement.
 These languages have low memory utilization.
Demerits:
 Parallel programming is not possible.
 These languages are less productive and lower level compared to other
programming paradigms.
 Large, complex problems are difficult to implement.

Functional Programming Language:
 A language in which the primary means of making computations is
by applying functions to given parameters is called a functional
language.
 In this paradigm the computations are expressed in the form of
functions.
 Functional programming paradigms treat values as single entities.
Unlike variables, values are never modified.
 Functional programming languages are a class of languages designed
to reflect the way programmers think mathematically, not reflecting
the underlying machine.
 Functional languages are based on the lambda calculus, a simple
model of computation, and have a solid theoretical foundation that
allows one to reason formally about the programs written in them.
 The most commonly used functional languages are Standard LISP (List
processing), ML, Haskell, and “pure” Scheme.
Merits:
 Functions are reusable.
 It is possible to develop and maintain large programs.
 Because of usage of functions programs are easy to understand.
Demerits:
 Functional programming is less efficient.
 Functional programming is not a good option for commercial
software development.
Logic Programming Language:

 In this paradigm we express computations in terms of mathematical
logic only. It is also called a rule-based language.
 In an imperative language, an algorithm is specified in great detail, and
a specific order of instructions or statements must be included.
 In a rule-based language, however, rules are specified in no particular
order, and the language implementation system must choose an execution
order that produces the desired result.
 An example of a logic programming language is Prolog (programming
in logic).
Merits:
 It is reliable.
 The program can be quickly developed using the approach of
True/False statements.
Demerits:

 Execution of programs is very slow.
 True/False statements cannot solve most problems.
 They efficiently solve only a limited set of problems.
Object Oriented Programming Language:
 The languages in which objects are used are called object-oriented
programming languages. Objects have both data members and member
functions, and the object concept is used in place of procedures.
 OOP is not always considered a separate category of programming
language; OOP developed naturally from imperative languages.
 In this paradigm, everything is modelled as an object.
 These languages have a modular programming approach in which data
and functions are bound in one entity called a class.
 This programming paradigm has gained great popularity in recent
years because of characteristic features such as data abstraction,
encapsulation, inheritance, polymorphism and so on.
 Examples of object-oriented programming languages are Smalltalk,
C++, Java.
Merits:
 Re-usability
 Reduced data redundancy
 Security
 Code Maintenance
 Objects can be created for real world entities.

LANGUAGE DESIGN TRADE-OFFS


There are certain design criteria that conflict with each other; that is, an
increase in one generally decreases another. These are:
 Reliability vs. cost of execution: If a program is reliable then it is
probably slower to execute.
Conflicting criteria – Example: Java demands that all references to array
elements be checked for proper indexing, but that leads to increased
execution costs. C does not require index range checking, so a C program
executes faster than an equivalent Java program, although Java programs
are more reliable. The Java designers traded execution efficiency for
reliability.
 Readability vs. writability: If a program is readable, then it is probably
not writable, or vice versa. Another conflicting criterion – Example: APL
provides many powerful operators for array operands (and a large number
of new symbols), allowing complex computations to be written in a compact
program, but at the cost of poor readability. Many APL operators can be
used in a single long, complex expression. APL is very writable but has poor
readability.
 Writability (flexibility) vs. reliability: If a language is flexible, then
programs are probably not safe, or vice versa. Another conflicting criterion
– Example: C++ pointers are powerful and very flexible, but are not reliably
used. Because of the potential reliability problems with pointers, they are
not included in Java.
IMPLEMENTATION METHODS
The primary components of a computer are its internal memory and its
processor. The internal memory is used to store programs and data. The
processor is a collection of circuits that provides a realization of a set of
primitive operations, or machine instructions, such as those for arithmetic
and logic operations. In most computers, some of these instructions, which
are sometimes called macroinstructions, are actually implemented with a
set of instructions called microinstructions, which are defined at an even
lower level.
A language implementation system cannot be the only software on a
computer. Also required is a large collection of programs, called the
operating system, which supplies higher-level primitives than those of the
machine language. These primitives provide system resource management,
input and output operations, a file management system, text and/or program
editors, and a variety of other commonly needed functions. Because
language implementation systems need many of the operating system
facilities, they interface with the operating system rather than directly with
the processor (in machine language).

The operating system and language implementations are layered over the
machine language interface of a computer. These layers can be thought of as
virtual computers, providing interfaces to the user at higher levels. For
example, an operating system and a C compiler provide a virtual C computer.
With other compilers, a machine can become other kinds of virtual
computers. Most computer systems provide several different virtual
computers. User programs form another layer over the top of the layer of
virtual computers.
Three methods are used for language implementation:
 Compilation – programs are translated into machine language; this is
known as compilation.
 Pure interpretation – programs are interpreted directly, with no
translation, by a program known as an interpreter.
 Hybrid implementation systems – a compromise between compilers
and pure interpreters.

COMPILATION

Programming languages can be implemented by any of three general
methods. At one extreme, programs can be translated into machine
language, which can be executed directly on the computer. This method is
called a compiler implementation and has the advantage of very fast
program execution, once the translation process is complete. Most
production implementations of languages, such as C, COBOL, C++, and Ada,
are by compilers. The language that a compiler translates is called
the source language. The process of compilation and program execution
takes place in several phases, the most important of which are shown in
Figure.
Compilation process has several phases:

Lexical analysis:
 This is the initial phase of compilation, in which each statement in the
source program is scanned or read line by line and then broken into a
stream of strings called tokens. The lexical analysis phase is also called
scanning.
 The lexical analyzer gathers the characters of the source program into
lexical units. The lexical units of a program are identifiers, special
words, operators, and punctuation symbols.
 The lexical analyzer ignores comments in the source program because
the compiler has no use for them.
 The lexical analyzer identifies the type of each lexeme and stores its
information in the symbol table.

Syntax analysis:
 This is the second phase in the process of compilation.
 The syntax analyzer takes the lexical units from the lexical analyzer
and uses them to construct hierarchical structures for checking the
syntax of source code called parse trees.
 These parse trees represent the syntactic structure of the program.
 It is also called as parsing as the parse tree is constructed in this
phase.
Semantics analysis:
 The semantic analyzer is a phase in which the meaning of source
program is analyzed.

 The semantic analyzer is an integral part of the intermediate code
generator.
 The semantic analyzer checks for errors, such as type errors, that are
difficult, if not impossible, to detect during syntax analysis.
Code Optimization:
 The semantic analyzer produces code called intermediate code. For
example, consider the statement
a = b + c - d
The intermediate code can be generated as follows.
t1=b+c
t2=t1-d
a=t2
 Optimization, which improves programs (usually in their intermediate
code version) by making them smaller or faster or both, is often an
optional part of compilation.
Code generation:
 The code generator translates the optimized intermediate code version
of the program into an equivalent machine language program.
 The output code produced during this phase can be directly executed,
or other translation steps may be required to follow.
 These steps can be assembly, linking, and loading.
Symbol table:
 The symbol table serves as a database for the compilation process.
 The primary contents of the symbol table are the type and attribute
information of each user-defined name in the program.
 This information is placed in the symbol table by the lexical and syntax
analyzers and is used by the semantic analyzer and the code
generator.

 The user and system code together are sometimes called a load
module, or executable image.
 The process of collecting system programs and linking them to user
programs is called linking and loading, or sometimes just linking. It
is accomplished by a systems program called a linker.

Language Processing Using Translation:


The process of translation is performed in the following steps:
1. The program modules are separately translated into relocatable machine
code. This translator is called a compiler.
2. These translated modules are linked together in a relocatable unit. This
task is carried out by a linker or linkage editor.
3. Finally, the complete program is loaded into the memory as executable
machine code. This task is carried out by a loader.

The modern translation technique combines the two techniques,
translation and then interpretation. That means the source code is
translated into intermediate code, and then the intermediate code is
interpreted.

Execution of Machine Code:


Fetch-execute-cycle (on a von Neumann architecture)
initialize the program counter
repeat forever
fetch the instruction pointed by the counter
increment the counter
decode the instruction
execute the instruction
end repeat

In addition to systems programs, user programs must often be linked to


previously compiled user programs that reside in libraries. So the linker not
only links a given program to system programs, but also it may link it to
other user programs.

The speed of the connection between a computer’s memory and its
processor usually determines the speed of the computer, because
instructions often can be executed faster than they can be moved to the
processor for execution. This connection is called the von Neumann
bottleneck.
It is the primary limiting factor in the speed of von Neumann architecture
computers. The von Neumann bottleneck has been one of the primary
motivations for the research and development of parallel computers.

Differences between Interpreter and Compiler.

Interpreter:
 The source program gets interpreted every time it is to be executed,
and every time the source program is analyzed. Hence interpretation is
less efficient than compilation.
 Interpreters do not produce object code.
 Interpreters can be made portable because they do not produce object
code.
 Interpreters are simpler and give us an improved debugging
environment.

Compiler:
 In the process of compilation, the program is analyzed only once and
then the code is generated. Hence a compiler is more efficient than an
interpreter.
 Compilers produce object code.
 The compiler has to be present on the host machine when a particular
program needs to be compiled.
 The compiler is a complex program and requires a large amount of
memory.

PURE INTERPRETATION
Pure interpretation lies at the opposite end (from compilation) of
implementation methods. With this approach, programs are interpreted by
another program called an interpreter, with no translation whatever.
 Pure interpretation has the advantage of allowing easy implementation
of many source-level debugging operations, because all run-time error
messages can refer to source-level units. For example, if an array index
is found to be out of range, the error message can easily indicate the
source line and the name of the array.
 On the other hand, this method has the serious disadvantage that
execution is 10 to 100 times slower than in compiled systems.
 The primary source of this slowness is the decoding of the high-level
language statements, which are far more complex than machine

language instructions (although there may be fewer statements than
instructions in equivalent machine code).
 Furthermore, regardless of how many times a statement is executed, it
must be decoded every time. Therefore, statement decoding, rather
than the connection between the processor and memory, is the
bottleneck of a pure interpreter.

Although some simple early languages of the 1960s (APL, SNOBOL, and LISP)
were purely interpreted, by the 1980s, the approach was rarely used on high-
level languages. However, in recent years, pure interpretation has made a
significant comeback with some Web scripting languages, such as JavaScript
and PHP, which are now widely used. The process of pure interpretation is
shown in Figure

HYBRID IMPLEMENTATION SYSTEMS

 Some language implementation systems are a compromise between compilers and pure interpreters; they translate high-level language programs to an intermediate language designed to allow easy interpretation.
 This method is faster than pure interpretation because the source language statements are decoded only once. Such implementations are called hybrid implementation systems.

 The process used in a hybrid implementation system is shown in
Figure. Instead of translating intermediate language code to machine
code, it simply interprets the intermediate code.
Examples
– Perl programs are partially compiled to detect errors before interpretation
– Initial implementations of Java were hybrid, the intermediate form, byte
code, provides portability to any machine that has a byte code interpreter
and a runtime system (together, these are called Java Virtual Machine)

Just-in-Time Implementation Systems


 Initially translate programs to an intermediate language
 Then compile intermediate language into machine code
 Machine code version is kept for subsequent calls
 JIT systems are widely used for Java programs

 .NET languages are implemented with a JIT system

PREPROCESSOR

Definition: A preprocessor is a program that processes a program just before the program is compiled.
 Preprocessor instructions are embedded in the program.
 Preprocessor macros (instructions) are commonly used to specify that
code from another file is to be included.
 For example: The C preprocessor expands #include or #define in the
following manner –
 #include "myLib.h"
causes the preprocessor to copy the contents of myLib.h into the
program at the position of the #include.
 #define SIZE 5
In the program we can declare an array of size 5 in following
manner- int a[SIZE];
 #define max(A, B) ((A) > (B) ? (A) : (B))
Other preprocessor instructions are used to define symbols to
represent expressions.
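The substitution performed for the #define SIZE 5 example can be imitated in a few lines (a simplified sketch that handles only object-like macros; a real C preprocessor also handles #include, function-like macros such as max above, and much more):

```python
import re

def expand(source, macros):
    """Replace each macro name with its definition, as a C preprocessor
    would for simple object-like #define macros."""
    for name, value in macros.items():
        # \b boundaries keep the macro name from matching inside identifiers
        source = re.sub(r"\b" + re.escape(name) + r"\b", value, source)
    return source

print(expand("int a[SIZE];", {"SIZE": "5"}))  # int a[5];
```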

PROGRAMMING ENVIRONMENTS
A programming environment is an environment in which programs can be created and tested: the collection of tools used in the development of software. This collection may consist of only a file system, a text editor, a linker, and a compiler. A programming language is not the only measure of the software development capability of a system. We now briefly describe several programming environments.
 UNIX is an older programming environment, first distributed in the
middle 1970s, built around a portable multiprogramming operating
system. It provides a wide array of powerful support tools for software
production and maintenance in a variety of languages. In the past, the
most important feature absent from UNIX was a uniform interface
among its tools. This made it more difficult to learn and to use.
However, UNIX is now often used through a graphical user interface
(GUI) that runs on top of UNIX. Examples of UNIX GUIs are the Solaris
Common Desktop Environment (CDE), GNOME, and KDE. These GUIs

make the interface to UNIX appear similar to that of Windows and
Macintosh systems.
 Borland JBuilder is a programming environment that provides an
integrated compiler, editor, debugger, and file system for Java
development, where all four are accessed through a graphical
interface. JBuilder is a complex and powerful system for creating Java
software.
 Microsoft Visual Studio .NET is a relatively recent step in the evolution
of software development environments. It is a large and elaborate
collection of software development tools, all used through a windowed
interface. This system can be used to develop software in any one of
the five .NET languages: C#, Visual BASIC .NET, JScript (Microsoft’s
version of JavaScript), F# (a functional language), and C++/CLI.
 NetBeans is a development environment that is primarily used for Java
application development but also supports JavaScript, Ruby, and PHP.
Both Visual Studio and NetBeans are more than development
environments—they are also frameworks, which means they actually
provide common parts of the code of the application.

The programming environment influences language design in two major areas, namely separate compilation, and testing and debugging.

1. Separate compilation:
 Varying numbers of programmers work on the design, coding, and testing of the parts of a program. This requires the language to be structured so that individual subprograms or other parts can be separately compiled and executed without requiring the other parts.
 Separate compilation is difficult because compiling one subprogram may need information that is present in some other subprogram, or shared data may be used across the program, which makes separate compilation impossible. To solve this problem, some languages have adopted certain features that aid separate compilation:
1. Use of the keyword extern.
2. Use of scoping rules.
3. In object-oriented programming, features like inheritance can be used to access shared data.

2. Testing and Debugging:


Many languages contain some features that help in testing and
debugging. For example-

 Breakpoint feature: By this feature, the programmer can set the
breakpoints in the program. When the breakpoint is reached during the
execution of the program, the execution of the program is interrupted
and control is given to the programmer. The programmer can then
inspect the program for testing and debugging.
 Execution trace feature: By this feature particular variable or
statement can be tagged for tracing during the execution of the
program.
 Assertions: An assertion is an expression that is inserted as a separate statement in a program. For example, the following is a Java code fragment in which an assertion is used:

System.out.print("Enter your age: ");
int age = reader.nextInt();
assert age >= 18 : "You are not allowed to vote";
GENERAL PROBLEM OF DESCRIBING SYNTAX AND SEMANTICS
Sentences: A language, whether natural (such as English) or artificial (such as Java), is a set of strings of characters from some alphabet. The strings of a language are called sentences or statements. The syntax rules of a language specify which strings of characters from the language’s alphabet are in the language.
Language: A language is a set of sentences.

Lexeme: In a programming language the smallest syntactic unit is the lexeme. The lexemes of a programming language include its numeric literals, operators, and special words, among others. One can think of programs as strings of lexemes rather than of characters. Lexemes are partitioned into groups: for example, the names of variables, methods, classes, and so forth in a programming language form a group called identifiers.
Consider the following statement in the C language:
index = 2 * count + 17;
Its lexemes and tokens are:

Lexeme   Token
index    identifier
=        equal sign
2        integer literal
*        mult operator
count    identifier
+        add operator
17       integer literal
;        semicolon

Token: A token is the smallest element of the program that is meaningful to the compiler. A token is a category of lexemes. Tokens can be classified as follows: keywords, identifiers, constants, strings, special symbols, operators.
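The grouping of lexemes into tokens shown above can be sketched as a toy tokenizer (the regular expressions and category names are simplified assumptions, not a real C lexer):

```python
import re

# Token categories follow the lexeme/token table above; each pattern is
# tried in order at the current position of the input.
TOKEN_SPEC = [
    ("identifier",      r"[A-Za-z_]\w*"),
    ("integer_literal", r"\d+"),
    ("equal_sign",      r"="),
    ("mult_operator",   r"\*"),
    ("add_operator",    r"\+"),
    ("semicolon",       r";"),
]

def tokenize(text):
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for name, pattern in TOKEN_SPEC:
            m = re.match(pattern, text[pos:])
            if m:
                tokens.append((m.group(), name))  # (lexeme, token) pair
                pos += len(m.group())
                break
        else:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
    return tokens

print(tokenize("index = 2 * count + 17;"))
```

Each `(lexeme, token)` pair printed corresponds to one row of the table.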
Syntax: The syntax of a language is the set of rules that defines the form of the language. The syntax of a (programming) language is the set of rules that defines which sequences of symbols are considered to be valid expressions (programs) in the language. Syntax provides significant information needed for understanding a program and much of the information needed for translating a source program into an object program.
A valid expression: X = Y + Z
An invalid expression: XY+-
Example: dd/mm/yy or dd/mm/yyyy or mm/dd/yyyy
Semantics: Semantics of a programming language refers to the meaning assigned to the various syntactic constructs such as expressions, statements, and program units.
 Any language, whether English, C, C++, or Java, is a set of strings of characters from some alphabet.
 Those strings of characters are called sentences or statements.
 The syntax rules specify which statements belong to the language.
The syntax of the if statement in the C language:
if (<expression>)
<statement>
 The syntax and semantics of a programming language are closely related to each other.
 The semantics of a construct should follow from its syntax.

Difference between Syntax and Semantics.

Syntax: The structure of expressions, statements, and program units. It is handled at compile time. It is given by a context-free grammar. Syntactic errors are easy to locate.
Semantics: The meaning of expressions, statements, and program units. It is handled at run time. It is given by syntax-directed definitions. Semantic errors are hard to locate.

In general, languages can be formally defined in two distinct ways: language recognition and language generation.
Language Recognizer: The syntax analysis part of a compiler is a recognizer for the language the compiler translates. In this role, the recognizer need not test all possible strings of characters from some set to determine whether each is in the language; it need only determine whether the given programs are syntactically correct. Such syntax analyzers are also known as parsers.
Language Generators: A language generator is a device that can be used to generate the sentences of a language. Generators have the following advantages over language recognizers:
a. They are easy to read and understand.
b. They are more useful than recognizers because they can be used in modes other than trial and error.
c. Language generators are also called grammars, which are used for describing language syntax. There are two types of grammars: 1. regular grammars and 2. context-free grammars.
d. Regular grammars describe the tokens of a programming language, whereas context-free grammars describe the syntax of a programming language, with few exceptions.

FORMAL METHODS OF DESCRIBING SYNTAX

Two formal methods are used to describe syntax:
1. Backus-Naur Form and
2. Extended Backus-Naur Form.
BACKUS-NAUR FORM AND CONTEXT-FREE GRAMMARS:
In the middle to late 1950s, two men, Noam Chomsky and John Backus, in
unrelated research efforts, developed the same syntax description
formalism, which subsequently became the most widely used method for describing programming language syntax.
Context-Free Grammars:
In the mid-1950s, Chomsky described four classes of generative devices or
grammars that define four classes of languages. The forms of the tokens of

programming languages can be described by regular grammars. The syntax
of whole programming languages, with minor exceptions, can be described
by context-free grammars.
Grammar and Derivations:
The formal language-generation mechanism that are used to describe the
syntax of programming language is called grammar. A grammar is a
generative device for defining languages. The sentences of the language are
generated through a sequence of applications of the rules, beginning with a
special nonterminal of the grammar called the start symbol. This sequence
of rule applications is called a derivation. Derivation is a repeated
application of rules, starting with the start symbol and ending with a
sentence (all terminal symbols). In a grammar for a complete programming
language, the start symbol represents a complete program and is often
named <program>. A simple grammar is shown below:
<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<var> → A | B | C
<expression> → <var> + <var> | <var> - <var> | <var>
A program consists of
the special word begin, followed by a list of statements separated by
semicolons, followed by the special word end. An expression is either a
single variable or two variables separated by either a + or - operator. The
only variable names in this language are A, B, and C. A derivation of the program A = B + C; B = C in this language follows:
<program> => begin <stmt_list> end
           => begin <stmt> ; <stmt_list> end
           => begin <var> = <expression> ; <stmt_list> end
           => begin A = <expression> ; <stmt_list> end
           => begin A = <var> + <var> ; <stmt_list> end
           => begin A = B + <var> ; <stmt_list> end
           => begin A = B + C ; <stmt_list> end
           => begin A = B + C ; <stmt> end
           => begin A = B + C ; <var> = <expression> end
           => begin A = B + C ; B = <expression> end
           => begin A = B + C ; B = <var> end
           => begin A = B + C ; B = C end

This derivation, like all derivations, begins with the start symbol, in this case
<program>. The symbol => is read “derives.” Each successive string in the
sequence is derived from the previous string by replacing one of the
nonterminals with one of that nonterminal’s definitions. Each of the strings in
the derivation, including <program>, is called a sentential form. In this
derivation, the replaced nonterminal is always the leftmost nonterminal in
the previous sentential form. Derivations that use this order of replacement
are called leftmost derivations.
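The leftmost derivation above can be replayed mechanically. The sketch below transcribes the grammar into a Python dict and, at each step, replaces the leftmost nonterminal with a chosen alternative (the rule-choice indices simply follow the derivation in the text):

```python
import re

# The example grammar, transcribed rule for rule from the BNF above.
GRAMMAR = {
    "<program>":    ["begin <stmt_list> end"],
    "<stmt_list>":  ["<stmt>", "<stmt> ; <stmt_list>"],
    "<stmt>":       ["<var> = <expression>"],
    "<var>":        ["A", "B", "C"],
    "<expression>": ["<var> + <var>", "<var> - <var>", "<var>"],
}

def leftmost_step(form, choice):
    """Replace the leftmost nonterminal with its chosen alternative."""
    nonterminal = re.search(r"<\w+>", form).group()
    return form.replace(nonterminal, GRAMMAR[nonterminal][choice], 1)

# Replaying the derivation of A = B + C ; B = C from the text:
form = "<program>"
for choice in [0, 1, 0, 0, 0, 1, 2, 0, 0, 1, 2, 2]:
    form = leftmost_step(form, choice)
print(form)  # begin A = B + C ; B = C end
```

Each intermediate value of `form` is a sentential form; the loop ends when only terminal symbols remain.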
A Grammar for Simple Assignment Statements is given below

<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>

For example, the statement A = B * ( A + C ) is generated by the leftmost derivation:

<assign> => <id> = <expr>
         => A = <expr>
         => A = <id> * <expr>
         => A = B * <expr>
         => A = B * ( <expr> )
         => A = B * ( <id> + <expr> )
         => A = B * ( A + <expr> )
         => A = B * ( A + <id> )
         => A = B * ( A + C )

Parse Trees:

One of the most attractive features of grammars is that they naturally describe the hierarchical syntactic structure of the sentences of the languages they define. These hierarchical structures are called parse trees.
For example, consider the parse tree for a = b + c and the parse tree for A = B * ( A + C ).

Ambiguity:
A grammar that generates a sentential form for which there are two or more
distinct parse trees is said to be ambiguous. An Ambiguous Grammar for
Simple Assignment Statements.
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <expr> | <expr> * <expr> |
( <expr> ) | <id>
Ex: A = B + C * A. Two distinct parse trees for the same sentence.

There are several other characteristics of a grammar that are sometimes
useful in determining whether a grammar is ambiguous. They include the
following:
(1) if the grammar generates a sentence with more than one leftmost
derivation and
(2) if the grammar generates a sentence with more than one rightmost
derivation.
Operator Precedence:
When an expression includes two different operators, for example,
x + y * z, one obvious semantic issue is the order of evaluation of the two
operators. This semantic question can be answered by assigning different
precedence levels to operators.
If * has been assigned higher precedence than +, multiplication will be done first, regardless of the order of appearance of the two operators in the expression.
An Unambiguous Grammar for Expressions
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → ( <expr> ) | <id>
Every derivation with an unambiguous grammar has a unique parse tree,
although that tree can be represented by different derivations. For example,
the following derivation of the sentence A = B + C * A is different from the
derivation of the same sentence given previously.

The unique parse tree for A = B + C * A using an unambiguous grammar
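The unambiguous grammar maps directly onto a recursive-descent parser sketch: one function per nonterminal, with <term> handled below <expr> so that * binds tighter than + automatically. Parse trees are built as nested tuples, and single-letter tokens are assumed for brevity:

```python
def parse_expr(tokens):
    # <expr> -> <expr> + <term> | <term>, parsed iteratively (left associative)
    tree = parse_term(tokens)
    while tokens and tokens[0] == "+":
        tokens.pop(0)
        tree = ("+", tree, parse_term(tokens))
    return tree

def parse_term(tokens):
    # <term> -> <term> * <factor> | <factor>
    tree = parse_factor(tokens)
    while tokens and tokens[0] == "*":
        tokens.pop(0)
        tree = ("*", tree, parse_factor(tokens))
    return tree

def parse_factor(tokens):
    # <factor> -> ( <expr> ) | <id>
    if tokens[0] == "(":
        tokens.pop(0)
        tree = parse_expr(tokens)
        tokens.pop(0)   # discard the closing ")"
        return tree
    return tokens.pop(0)

print(parse_expr(list("B+C*A")))    # ('+', 'B', ('*', 'C', 'A'))
print(parse_expr(list("(B+C)*A")))  # ('*', ('+', 'B', 'C'), 'A')
```

Because <expr> calls <term> before consuming +, the * subtree is always completed first, and parentheses override the default grouping.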

Associativity of Operators:
When an expression includes two operators that have the same precedence (as * and / usually have), for example, A / B * C, a semantic rule is required to specify which should take precedence. This rule is named associativity. As was the case with precedence, a grammar for expressions may correctly imply operator associativity. Consider the following example of an assignment statement:
A = B + C + A
The parse tree for this sentence, as defined by the unambiguous grammar above, shows the left addition operator lower than the right addition operator, meaning the left addition is performed first: the grammar describes + as left associative.

EXTENDED BACKUS-NAUR FORM (EBNF):


Because of a few minor inconveniences in BNF, it has been extended in
several ways. Most extended versions are called Extended BNF, or simply
EBNF, even though they are not all exactly the same. The extensions do not
enhance the descriptive power of BNF; they only increase its readability and
writability. Three extensions are commonly included in the various versions
of EBNF.
1. The first of these denotes an optional part of an RHS, which is delimited by
brackets. For example, a C if-else statement can be described as
<if_stmt> → if (<expression>) <statement> [else <statement>]
Without the use of the brackets, the syntactic description of this
statement would require the following two rules:
<if_stmt> → if (<expression>) <statement>
| if (<expression>) <statement> else <statement>
2. The second extension is the use of braces in an RHS to indicate that
the enclosed part can be repeated indefinitely or left out altogether.
<ident_list> → <identifier> {, <identifier>}
In BNF, the same list requires recursion:
<ident_list> → <identifier> | <identifier> , <ident_list>

3. The third common extension deals with multiple-choice options. When
a single element must be chosen from a group, the options are placed
in parentheses and separated by the OR operator, |.
Ex: <term> → <term> (* | / | %) <factor>
In BNF, a description of this <term> would require the following three rules:
<term> → <term> * <factor>
       | <term> / <factor>
       | <term> % <factor>
BNF and EBNF Versions of an Expression Grammar
BNF:  <expr> → <expr> + <term> | <expr> - <term> | <term>
      <term> → <term> * <factor> | <term> / <factor> | <factor>
      <factor> → <exp> ** <factor> | <exp>
      <exp> → ( <expr> ) | id
EBNF: <expr> → <term> {(+ | -) <term>}
      <term> → <factor> {(* | /) <factor>}
      <factor> → <exp> {** <exp>}
      <exp> → ( <expr> ) | id
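The EBNF braces map naturally onto iteration when a parser is written, whereas the recursive BNF form maps onto recursive calls. A sketch for the <ident_list> rule above, with identifiers supplied as pre-split tokens:

```python
def parse_ident_list(tokens):
    # <ident_list> -> <identifier> {, <identifier>}
    idents = [tokens.pop(0)]             # the mandatory first <identifier>
    while tokens and tokens[0] == ",":   # the {, <identifier>} repetition
        tokens.pop(0)                    # discard the comma
        idents.append(tokens.pop(0))
    return idents

print(parse_ident_list(["sum", ",", "count", ",", "index"]))
# ['sum', 'count', 'index']
```

The loop body corresponds one-to-one with the braced part of the EBNF rule, which is why EBNF is often considered more readable than the equivalent recursive BNF.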

ATTRIBUTE GRAMMARS
An attribute grammar is a device used to describe more of the structure of
a programming language than can be described with a context-free
grammar. An attribute grammar is an extension to a context-free grammar.
The extension allows certain language rules to be conveniently described,
such as type compatibility.
Static Semantics:
There are some characteristics of the structure of programming languages
that are difficult to describe with BNF, and some that are impossible. As an
example of a syntax rule that is difficult to specify with BNF, consider type
compatibility rules. In Java, for example, a floating-point value cannot be
assigned to an integer type variable, although the opposite is legal.
If all of the typing rules of Java were specified in BNF, the grammar would
become too large to be useful, because the size of the grammar determines
the size of the syntax analyzer. The static semantics of a language is only

indirectly related to the meaning of programs during execution. Static
semantics is so named because the analysis required to check these
specifications can be done at compile time.
Basic Concepts:
Attribute grammars are context-free grammars to which have been added
attributes, attribute computation functions, and predicate
functions. Attributes, which are associated with grammar symbols (the
terminal and nonterminal symbols), are similar to variables in the sense that
they can have values assigned to them. Attribute computation functions,
sometimes called semantic functions, are associated with grammar rules.
They are used to specify how attribute values are computed. Predicate
functions, which state the static semantic rules of the language, are
associated with grammar rules.
Attribute Grammars Defined:
An attribute grammar is a grammar with the following additional features:
 Associated with each grammar symbol X is a set of attributes A(X). The
set A(X) consists of two disjoint sets S(X) called synthesized
attributes and I(X) called inherited attributes.
A(X) = S(X) ∪ I(X)
Synthesized attributes are used to pass semantic information up a
parse tree, while inherited attributes pass semantic information down
and across a tree.
 Associated with each grammar rule is a set of semantic functions and a possibly empty set of predicate functions over the attributes of the symbols in the grammar rule. For a rule X0 → X1 X2 … Xn, the synthesized attributes of X0 are computed with semantic functions of the form S(X0) = f(A(X1), … , A(Xn)).
 A predicate function has the form of a Boolean expression on the union of the attribute set {A(X0), … , A(Xn)} and a set of literal attribute values.
As a very simple example of how attribute grammars can be used to describe static semantics, consider the rule that the name on the end of an Ada procedure must match the procedure’s name:
Syntax rule: <proc_def> → procedure <proc_name>[1] <proc_body> end <proc_name>[2];
Predicate: <proc_name>[1].string == <proc_name>[2].string
The syntax portion of our example attribute grammar is
<assign> → <var> = <expr>
<expr> → <var> + <var> | <var>
<var> → A | B | C
The attributes for the nonterminals in the example attribute grammar are
described in the following paragraphs:

 actual_type—A synthesized attribute associated with the
nonterminals <var> and <expr>. It is used to store the actual type, int
or real, of a variable or expression.
 expected_type—An inherited attribute associated with the
nonterminal <expr>. It is used to store the type, either int or real, that
is expected for the expression, as determined by the type of the
variable on the left side of the assignment statement.
The complete attribute grammar follows in the example below.
1. Syntax rule: <assign> → <var> = <expr>
   Semantic rule: <expr>.expected_type ← <var>.actual_type
2. Syntax rule: <expr> → <var>[2] + <var>[3]
   Semantic rule: <expr>.actual_type ←
       if (<var>[2].actual_type = int) and (<var>[3].actual_type = int)
       then int
       else real
       end if
   Predicate: <expr>.actual_type == <expr>.expected_type
3. Syntax rule: <expr> → <var>
   Semantic rule: <expr>.actual_type ← <var>.actual_type
   Predicate: <expr>.actual_type == <expr>.expected_type
4. Syntax rule: <var> → A | B | C
   Semantic rule: <var>.actual_type ← look-up(<var>.string)
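The semantic rules and predicates above can be sketched as ordinary code. The sketch below assumes a toy symbol table playing the role of the look-up function; actual_type is synthesized from the operands, expected_type is inherited from the left-hand-side variable, and the predicate checks that the two agree:

```python
# Hypothetical symbol table standing in for look-up(<var>.string).
SYMBOL_TABLE = {"A": "int", "B": "int", "C": "real"}

def check_assign(var, expr_vars):
    """Type-check <var> = <expr> for the attribute grammar above.
    expr_vars is the list of variables appearing in <expr>."""
    # Rule 1: expected_type is inherited from the left-side variable
    expected_type = SYMBOL_TABLE[var]
    # Rule 4: each variable's actual_type comes from look-up
    operand_types = [SYMBOL_TABLE[v] for v in expr_vars]
    # Rules 2/3: actual_type is int only if every operand is int, else real
    actual_type = "int" if all(t == "int" for t in operand_types) else "real"
    # Predicate: actual_type == expected_type
    return actual_type == expected_type

print(check_assign("A", ["A", "B"]))  # True:  int  = int + int
print(check_assign("B", ["A", "C"]))  # False: int  = int + real
print(check_assign("C", ["A", "C"]))  # True:  real = int + real
```

The False case is exactly the static-semantics violation the predicate functions are meant to catch at compile time.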
