Distributed
Computing
Principles and Applications
M. L. LiuDistributed Computing
PRINCIPLES and APPLICATIONS
MLL. Liu
California Polytechnic State University,
San Luis Obispo
SAMPIT. COPY
NUI Fun SALE
PEARSON
—$<<—
Boa Wer Late tyContents
Preface
Acknowledgments 9
CHAPTE! 1
———DIstributed Computing, An introduction 27:
1.1 Definitions 21
1.2. The History of Distributed Computing 22
1.3 Different Forms of Computing 25
Monolithic Computing 25
Distributed Computing 25
Parallel Computing 6
‘Cooperative Computing 28
1.4. The Strengths and Weaknesses of Distributed Computing 28
1.5. Basics of Operating Systems 32
Computer Programs and Processes 32
Concurrent Programming 36
1.6 Network Basics “
Protocols 40
Network Architecture 41
Network Architecture Protocols 4g
Connection-Oriented versus Connectionless Communication a
Network Resources 4612
Contents
WwW
Host Identification and Internet Protocol Addresses
Identifying Processes with Protocol Ports
Email Addresses
URLs
Software Engineering Basics
Procedural versus Object-Oriented Programming
The Unified Modeling Language
The Architecture of Distributed Applications
Toolkits, Frameworks, and Components
Summary
Exercises 59 @ References 65
CHAPTER 2
_____Interprocess Communications === 67
21
22
23
24
25
26
27
28
29
2.10
211
‘An Archetypal IPC Program Interface
Event Synchronization
‘Synchronous Send and Synchronous Receive
Asynchronous Send and Synchronous Receive
Synchronous Send and Asynchronous Receive
Asynchronous Send and Asynchronous Receive
Timeouts and Threading
Deadlocks and Timeouts
Data Representation
Data Encoding
Text-Based Protocols
Request-Response Protocols
Event Diagram and Sequence Diagram
Connection-Oriented versus Connectionless IPC
The Evolution of Paradigms for Interprocess Communications
Summary
Exercises 89 @ References 93
51
52
52
55
55
55
37
70
2
73
74
74
76
”
7
79
seesesCHAPTER 3
Distributed Computing Paradigms = === 95
3.1 Paradigms and Abstraction s
Abstraction 95
Paradigms 96
3.2. An Example Application 7
3.3 Paradigms for Distributed Applications 7
Message Passing 9
The Client-Server Paradigm 98
The Peer-to-Peer Paradigm 99
The Message System Paradigm 102
Remote Procedure Call Model 103
The Distributed Objects Paradigms 105
The Object Space 107
The Mobile Agent Paradigm 108
The Network Services Paradigm 109
The Collaborative Application (Groupware) Paradigm 110
3.4 Trade-offs mw
Level of Abstraction versus Overhead 12
Scalability 113
Cross-Platform Support 113
Summary 114
Exercises115 ™ References 115
CHAPTER 4
———Thhe Socket APL CT
4.1 Background "7
4.2 The Socket Metaphor in IPC 18
4.3 The Datagram Socket API ne
The Connectionless Datagram Socket 19
Connection-Oriented Datagram Socket API 130
44 The Stream-Mode Socket API 133
Operations and Event Synchronization
135
Contents“4
Contents
45
Sockets with Nonblocking I/O Operations
The Java™ Secure Socket Extension
Summary
Exercises 148 ™ References 152
cuarrer 5
‘The Client-Server Paradigm 153
5.1
5.2
53
54
5.5
5.6
Background
Client-Server Paradigm Issues
A Service Session
‘The Protocol for a Service
Interprocess Communications and Event Synchronization
Data Representation
Software Engineering for a Network Service
Software Architecture
IPC Mechanism
Daytime Client-Server Using Connectionless Datagram Socket
Daytime Client-Server Using Stream-Mode Socket
Testing a Network Service
Connection-Oriented and Connectionless Servers
Connectionless Echo Client-Server
‘The Echo Server
Connection-Oriented Echo Client-Server
Iterative Server and Concurrent Server
Stateful Servers
Global State Information
Session State Information
Summary
Exercises 195 @ References 199
4
144
45
146
146
154
154
155
156
158
158
159
160
160
167
174
175
175
176
178
183
187
187
190
19416
7.8
79
7.10
mM
7.12
A Sample RMI Application
Steps for Building an RMI Application
Algorithm for Developing Server-Side Software
Algorithm for Developing Client-Side Software
Testing and Debugging
Comparison of RMI and Socket APIs
Food for Thought
Summary
Exercises 249 References 251
CHAPTER 8
——Advanced RMI
8.1
82
83
Client Callback
Client-Side Augmentation for Client Callback
Server-Side Augmentations for Client Callback
Steps for Building an RMI Application with Client Callback
Stub Downloading
RMI Security Manager
Instantiation of a Security Manager in an RMI Program
The Syntax of a Java Security Policy File
Specifying Stub Downloading and a Security Policy File
Algorithms for Building an RMI Application,
Allowing for Stub Downloading
Summary
Exercises 277 ™ References 278
CHAPTER 9
_____Internet Applications 9.0.9.9... S279
9.1
9.2
HTML
XML—The Extensible Markup Language
253
255
259
263
267
268
27
272
274
27593
94
95
9.6
HTTP
The Client Request
‘The Server Response
Content Type and MIME
A Basic HTTP Client
HTTP, a Connection-Oriented, Stateless Protocol
Dynamically Generated Web Contents
‘Common Gateway Interface (CG!)
A Web Form
Query String Processing
Encoding and Decoding Query Strings
Environment Variables Used with CGI
Web Session and Session State Data
Using Hidden Form Fields for Transferring Session State Data
Using Cookies for Transferring Session State Data
‘Syntax of the Set-Cookie HTTP Response Header Line
Syntax of the Cookie HTTP Request Header Line
Sample Code for Using Cookies to Pass State Data
Data Privacy and Security Concerns
Summary
Exercises 323 @ References 328
curren 10
——_The Common Object Request Broker Architecture 329
10.1
10.2
10.3
10.4
10.5
10.6
The Basic Architecture
The CORBA Object Interface
Inter-ORB Protocols
Object Servers and Object Clients
CORBA Object References
‘CORBA Naming Service and the Interoperable
Naming Service
The CORBA Naming Service
The Interoperable Naming Service
282
283
286
288
289
291
292
293
296
299
303
304
3
313
315
316
320
320
330
331
331
332
332
333
333
335
Contents
718
Contents
10.7 CORBA Object Services
10.8 Object Adapters
10.9 Java IDL
Key Java IDL Packages
Java IDL Tools
An Example CORBA Application
Compiling and Running a Java IDL Application
Client Callback
10.10 Trade-offs
Summary
Exercises 354 ™ References 355
CHAPTER 11
335
335
337
337
338
338
351
352
353
——_—Internet Applications—Part 2. ______357.
11.1 Applets
11.2 Serviets
Architectural Support
Serviet Programming
State Information Maintenance in Servlet Programming
11.3 Web Services
11.4 The Simple Object Access Protocol (SOAP)
A SOAP Request
A SOAP Response
‘Apache SOAP
Ready-Made Web Services
Invoking a Web Service Using Apache SOAP
Implementing a Web Service Using Apache SOAP
Summary
Exercises 397 ™ References 405
357
360
360
363
369
381
383
385
388
390
391
392
395
396cuarter 12
12.1 Message Queue System Paradigm 408
The Point-to-Point Message Model 408
The Publish/Subscribe Message Model 409
122 Mobile Agents a3
Basic Architecture 414
Advantages of Mobile Agents 420
Mobile-Agent Framework Systems 421
12.3 Network Services 422
12.4 Object Spaces 42s
Summary 429
Exercises 430 ™ References 431
Epilogue 433
Index 435
Contents
19CHAPTER
Distributed Computing,
An Introduction
This book addresses distributed computing. In this chapter, we will begin by clar-
ifying what is meant by distributed computing in the context of this book. We will
do so by looking at the history of distributed computing and by comparing this
type of computing with other forms of computing. We will then present some
basic concepts in the disciplines of operating systems, networks, and software
engineering, concepts that you will need to be familiar with in order to under-
stand the material presented in later chapters.
1.1 Definitions
A source of confusion in the field of distributed computing is the lack of a uni-
versal vocabulary, perhaps because of the breathtaking pace with which new
ideas evolve in the field. Following are the definitions of some of the key terms
used in the context of this book. When you read the book, please keep these def-
initions in mind, and be aware that some of these terms may not have the same
definition in other contexts.
Early computing was performed on a single processor. A uni-processor, or
monolithic computing, makes use of a single central processing unit (CPU) to
execute one or more programs for each application.
A distributed system is a collection of independent computers, interconnected
via a network, that are capable of collaborating on a task. Computers are con-
sidered independent if they do not share memory or program execution space.
2122. CHAPTER 1 Distributed Computing, An Introduction
Request for Comments
are specifications pro-
posed by Internet engi-
‘neers to invite public
comments. Over the
‘years, thousands of such
specifications have
arisen, and they are
archived and accessible
in a number of Web
sites, including The
Internet RFC/STD/FYV/BCP
Archives (faqs.org, 5).
‘The ARPANET, initiated
in 1970, was the prede-
cessor of the internet.
Such computers are called loosely coupled computers, as opposed to tightly
coupled computers; the latter can share data using common memory space.
Distributed computing is computing performed in a distributed system. In this
book, we explore the ways that computer programs, running on independent
computers, collaborate with each other to perform computing such as network
services and Web-based applications.
= A network service is a service provided by a special kind of program known
as a server on a network. The World Wide Web is such a service, as is elec-
tronic mail (email) and file transfer (FTP). A server program is just half of the
story in the so-called client-server model of distributed computing. Client-
server will be studied extensively in later chapters of this book.
™ A network application is an application that runs on networked computers
for end users. Network applications range from enterprise applications such
as online shopping carts and electronic auction sites to noncommercial
applications such as chatrooms and network games.
The distinction between network services and network applications is not
always clear-cut, and the terms are often used interchangeably.
1.2 The History of Distributed Computing
In the beginning there were stand-alone computers, each of which was capable
of executing stored programs. Connecting stand-alone computers so that data
could be exchanged among them was a natural progression. Rudimentary con-
nection of computers using cables for file sharing was practiced as early as the
1960s. However, such an undertaking requires manual intervention and cannot
be called a computer application when one or more computer programs execute
autonomously to carry out a task. Such a computer application requires data
communication, whereby two computers spontaneously exchange data using
software and hardware in order to carry out the tasks inherent in the application.
The first Internet Request for Comments (RFC), RFC 1, is a proposal that speci-
fies how participating hosts can exchange information with each other through
messages. Whereas there may have been individual attempts to create network
applications on a small scale (perhaps involving two or more computers con-
nected via cables), the earliest network application was electronic mail, or
email, by which the first message was reportedly sent in 1972 on a four-node
ARPANET. (A node on a network is a computer, or host, that participates in the
network.) Automated file transfer mechanisms, which allowed data files to be
exchanged between hosts, were another natural progression, and as early as
1971 there was a proposal for such a mechanism (see RFC 114 and RFC 141). To
this day, email and file transfer remain two of the most popular network serv-
ices. The best-known network service, however, is undoubtedly the World Wide
‘Web (WWW). The Web was originally conceived in the late 1980s by scientists
at the Swiss research institute CERN in Geneva as an application that could sup-
port the access of hypertext over a network. The WWW has since become a plat-
form for network applications and services, including email, search engines,
and electronic commerce (e-commerce).1.2 The History of Distributed Computing 23
‘The WWW was responsible for an explosion in the scale of the Internet. Until
1990, ARPANET, the predecessor of the Internet as we know it, was primarily a
data network used by scientists, researchers, and academicians. Spurred by the
popularity of the WWW, the network grew spectacularly in the 1990s, as illus-
trated in Figures 1.1 and 1.2.
If you are interested in the history of network computing, some Web sites that
are well worth visiting are [vimp.museophile.com, 1], [zakon.org, 2], and
[isoc.org, 38]. In addition, [Hafner and Lyon, 4] is a fascinating account of the
ly development of the Internet, including the people and the organizations
it.
Hobbes'Intemet Tinaline Copyright C2002 Robert H Zakon
YZakon orgfobertintemettimeine/
Figure 1.1 The growth of Intemet hosts [zakon.org, 2] (reprinted by permission).
Hobbes iret Timeline Copyright G2O12 Rober Zakon
hip mew zakonorghoben/eteretheeine!
PEPSEPGEPERTLD ESTE
Figure 1.2 Internet domains [zakon.org, 2] (reprinted by permission).24 © — CHAPTER 1 Distributed Computing, An Introduction
Copyrighted material1.3 Different Forms of Computing 25
‘Once there is sufficient computational | and mainly facilitates communication:
power, the ability to connect and com- | email, the Web. Recall how fast Internet
municate is the dominant factor deter- | popularity soared first with email and,
mining value. Today for most people, a | more recently, once the Web and
computer runs only a few browsers became prevalent.
1.3 Different Forms of Computing
To understand what is meant by distributed computing in the context of this
book, it is instructive to look at various forms of computing using computers.
Monolithic Computing
In the simplest form of computing, a single computer, such as a personal com-
puter (PC), is used for computing. The computer is not connected to any net-
work, and thus it may use only those resources within its immediate access. This
form of computing may be called monolithic computing. In the most basic
monolithic computing, the computer is used by a single user at a time. The user
runs applications on the system with no access to resources beyond those avail-
able with the system. When you use applications such as a word processing pro-
gram or a spreadsheet on a PC, you are practicing this form of computing,
which may be called single-user monolithic computing.
Multiple users can engage in monolithic computing. This form of computing
(ee Figure 1.3a), where the resources of a single computer can be shared by con-
current users using a technique known as timesharing, was popular in the
1970s and 1980s. The computer that provides the centralized resource is usually
called a mainframe to differentiate it from smaller computers such as minicom-
puters and microcomputers. Through devices known as terminals, users (who
may be geographically dispersed) can be connected to the mainframe computer
and interact with it during a terminal session. Some widely used mainframe
computers include the IBM 360 series and the Univac 1100 series. Applications
using this form of computing are typically separate programs designed to per-
form a single function, such as payroll or billing for a firm or a university.
Distributed Computing
In contrast, distributed computing involves computing performed among
multiple network-connected computers, each of which has its own processor(s)
and other resources (see Figure 1.3b). A user, using a workstation, has full use of
the resources on the local computer to which its workstation is connected. In26
CHAPTER 1 Distributed Computing, An Introduction
addition, through the interaction of the local computer and the remote com-
puters, the user may access resources on the remote computers. The World Wide
Web is an excellent example of this type of computing. When you use a browser
to visit a Web site, a program such as Netscape or Internet Explorer runs on your
local system and interacts with a program (known as a Web server) running on
a remote system to fetch a file that may reside on yet another remote system.
Terminal Mainframe computes Workstation
=" =
oer /
ng =
—e
Figure 1.3 Centralized computing (a) versus distributed computing (b).
Parallel Computing
Similar to but distinct from distributed computing is a form of
known as parallel computing or parallel processing, which uses more than
one processor simultaneously to execute a single program. “Ideally, parallel pto-
cessing makes a program run faster because there are more engines (CPUs) run-
ning it. In practice, it is often difficult to divide a program in such a way that
separate CPUs can execute different portions without interfering with each
other” (Koniges, 9]. Parallel computing is typically performed on a single com-
puter that has multiple CPUs, but, according to Koniges, it is also possible to
“perform parallel processing by connecting the computers in a network.
However, this type of parallel processing requires very sophisticated software
called distributed processing software” [Koniges, 9].
Using parallel computing, one can solve problems that are otherwise impossible
to solve on one computer or solve computing-intensive problems that are oth-
erwise economically untenable. Today, parallel computing is primarily used in
large-scale scientific computing in areas such as biology, aerospace, weather
forecasting, and semiconductor design. Although a fascinating subject, parallel
computing is not within the scope of this book.Where It Goes
by Joseph Menn
1.3 Different Forms of Computing 27
(From Los Angeles Times, Los Angeles, Calif., Dec. 2, 1999, Joseph Menn.
Copyright © 1995, Los Angeles Times.)
Reprinted with permission,
EBay users rarely think about the bidding process—
DD Bidder at home registers and submits an elec-
tronic bid from a personal computer.
1 The bid travels from the consumer's internet
service provider, through switches and routers, to
the ISP company’s servers.
EJ The bid is sent through the internet backbone.
Gl The bid travels to one of EBay’s ISPs, most likely
Sprint or UUNet, and through pipes to EBay.
Gi The bid passes through EBay’s Cisco switches
and routers.
TG the Information reaches one of about 200
front-line Compaq servers running on Windows NT.
‘The servers are mirrored, so that if any one fails, the
‘others pick up the slack.
EA The bid is passed along to one of Sun
Starfire servers, named Bull and Bear,
‘that mirror each other.
FD the bid is added to two information-storage
funning Oracle software, where it is
matched with the seller's 4
ED The information flow is reversed back out of
EBay, into e-mails sent to both the seller and poten-
‘tial buyers who are outbid. Confirmation is also sent
to the bidder.
TD) From Bull, the bid amount and other detais
are sent to another Starfire server, ‘Anaconda,
‘and recorded on mirrored storage disks.
Sources: Times staff, EBay
EBay is planning to add another Starfire attached to
the final data disks, mirroring Anaconda.28 CHAPTER 1 Distributed Computing, An Introduction
An interested computer
‘owner will download a
free piece of software
(for example, a screen
saver) from SETI@home.
Then, when his or her
computer is idle while
online, the software
downloads a data file
from an Internet site for
analysis on his or her
computer. The results of
the analysis are sent
back to the Internet site
where they are com-
bined with those con-
tributed by other
SETI@home participants
and used to help in the
search for extraterrestrial
signals
Cooperative Computing
Recently, the term distributed computing has also been applied to cooperative com-
puting projects such as the Search for Extraterrestrial Intelligence (SETI) {setiath-
ome.ssl.berkeley.edu, 10] and distributed.net [distributed.net, 33]. These are
projects that parcel out large-scale computing to workstations on Internet hosts,
making use of surplus CPU cycles, as described in the sidebar. (Note: Further dis-
cussion of this type of computing is not within the scope of this book.)
1.4 The Strengths and Weaknesses of -
Distributed Computing
Prior to the appearance of the World Wide Web, monolithic computing, such as
business applications running on a mainframe computer, or a single user using
a personal computer to perform word processing or spreadsheet functions, was
the dominant form of computing. Thomas Watson, the founder of IBM, was
said to have made the following statement in 1943: “I think there is a world
market for maybe five computers.” Since the 1980s, however, distributed com-
puting has become as important as—if not more important than—monolithic
computing.
‘There are a number of reasons for the popularity of distributed computing:
® The affordability of computers and availability of network access. Today's
personal computer has computing power superior to that of the mainframe
computers of the early days, at a fraction of the size and the cost. Coupled
with the fact that connectivity to the Internet has become universally avail-
able and generally affordable, the large number of interconnected computers
makes for an ideal community for distributed computing.
® Resource sharing. The architecture of distributed computing mirrors the
computing architecture of modern organizations. Each organization inde-
pendently maintains computers and resources that are local to the organiza-
tion while sharing resources over the network. Using distributed computing,
organizations can pool their resources very effectively. The Web, for example,
is a powerful platform for sharing documents and other resources within and
among organizations.
® Scalability. With monolithic computing, the available resources are limited
to the capacity of one computer. By contrast, distributed computing provides
scalability in that increasing demand for resources can be addressed effec-
tively with additional resources. For example, more computers providing a
service such as email can be added to the network to satisfy an increase in the
demand for that service.
® Fault tolerance. Compared to monolithic computing, distributed computing
provides the opportunity for fault tolerance in that a resource can be repli-1.4 The Strengths and Weaknesses of Distributed Computing 29
cated (or mirrored) to sustain its availability in the presence of failures. For
example, backup copies of a database can be maintained on different systems
on the network, so that when one system fails, other copies can be accessed
without disrupting the service. Although it is not possible to build a distrib-
uted system that is completely reliable in the presence of failures (Fischer,
Lynch, and Paterson, 30], it is the responsibility of a developer, when design-
ing and implementing such a system, to maximize its fault tolerance. Fault
tolerance in distributed computing is a complex topic that has received
extensive attention in the research community. Interested readers may want
to refer to sources such as the work of Pankaj Jalote Jalote, 31}.
In any form of computing, there is always a trade-off between advantages and
disadvantages. The advantages already mentioned are offset by disadvantages.
Some of the most significant ones are:
® Multiple points of failure. There are more points of failure in distributed
computing. Since multiple computers are involved, all of which depend on
the network for communication, the failure of one or more computers, or
one or more network links, can spell trouble for a distributed computing sys-
tem. There is a popular quote, attributed to noted computer scientist Leslie
‘Lamport, which says that “a distributed system is one in which the failure of
a computer you didn’t even know existed can render your own computer
unusable.”
& Security concerns. In a distributed system, there are more opportunities for
unauthorized attack. Whereas in a centralized system all the computers and
resources are typically under the control of a single administration, in a dis-
tributed system management is decentralized, often involving a large num-
ber of independent organizations. The decentralization makes it difficult to
implement and enforce security policies; hence distributed computing is vul-
nerable to security breaches and unauthorized access, which unfortunately
can affect all participants on the system. This problem is clearly illustrated by
well-known attacks on the Internet, such as worms and viruses [Eichen and
Rochlis, 21; Zetter, 22}.
Because of its importance, computer security is a widely researched and
studied topic, and successful techniques have been developed for writing and
deploying secure applications. Such techniques include encryption, keys,
certificates, digital signatures, sandboxes, authentication, and authorization.
Security is a broad topic that is beyond the scope of this book. Readers are
encouraged to pursue the topic in references such as (Oaks, 32].
Now that we have clarified the objective of this book, let’s next look at some of
the basic concepts in three related disciplines in computer science: operating
systems, networks, and software engineering. Although no in-depth knowledge
of these disciplines is required as a prerequisite for this course, this book does
refer to some concepts and terminologies associated with these disciplines. In
the rest of this chapter we will introduce these concepts and terminologies:CHAPTER 1 Distributed Computing, An Introduction
‘by Matt Richtel and Sara Robinson (NYT), Feb. 11, 2000
(reprinted with permission of the New York Times)
SAN FRANCISCO, Feb. 10—Computer
said
attack on multiple would
have been able to muster large “copy cat”
assaults on other sites.
‘And while the Inte
for leads,
experts said that it would be difficult even
to
the attacks, let alone find the responsible
CERT, a federally computer
iH
Hi
i
z
i
i
g
} |
3
z
5
;
!
Also
major Web sites than had been previously
i
zité
il
i
day of assaults. Those included
See tank eoclent Gen ae
attacked early Wednesday evening
|
At least two other major ct
companies were hit with attacks on
Wednesday, according to IFsec, a com-
‘puter security firm in New York, though it
the companies,
security officer for Stanford, sai
type of data included in the packets1.4 The Strengths and Weaknesses of Distributed Computing
i
i
?
in
HI
fea
i!
i
Ess
ig :
gaz &
i
i
the problem, even as the assault is goingCHAPTER 1 Distributed Computing, An introduction
1.5 Basics of Operating Systems
Distributed computing involves programs running on multiple computers. Let’s
look at some of the concepts involved with the execution of programs in mod-
ern-day computers.
Computer Programs and Processes
A software program is an artifact constructed by a software developer using
some form of programming language. Typically, the language is a high-level one
that requires a compiler or an interpreter to translate it into machine language.
When a program is “run,” or executed, on a computer, it is represented as a
process. On modern computers, a process consists of an executing program, its
current values, state information, and the resources used by the operating sys-
tem to manage the execution of the program. In other words, a process is a
dynamic entity that exists only when a program is run.
Figure 1.4 illustrates the state transitions during the lifetime of a process. A
process enters a ready state when a program is at the start of its execution, when
it is placed in a queue by the operating system, along with other programs that
are to be executed. When system resources (such as the CPU) are available for
its actual execution, thie process is dispatched, at which point it enters the run-
ning state. It continues to execute until the process must wait for the occurrence
of an event (such as the completion of some input/output operation), at which
time it enters a blocked state. Once the anticipated event occurs, the process will
be placed on the execution queue and await its turn to execute once again. The
process repeats the ready-running-blocked cycle for as many times as necessary
until the execution of the process is completed, at which time the process is said
to be terminated.
In this book, we will use Java programs, or fragments of them, as code examples.
There are three types of Java programs: applications (Figure 1.5), applets (Figure
1.6), and serviets (Figure 1.7). Regardless of which type of program you are writ-
ing, each one is written as a Java class. A Java application program has a main
Figure 1.4 A simplified state transition diagram of a process.1.5 Basics of Operating Systems 33
method, and it is run as an independent (stand-alone) process. On the other
hand, an applet does not have a main method, and it is run using a browser or
the appletviewer. A servlet is similar to an applet in that it does not have a main
method, and it is run in the context of a Web server. We will have occasion to
see examples of all three types of programs and program fragments in this book,
with applications being the form of programs most frequently employed.
A Java program is compiled into bytecode, a universal object code. When run,
bytecode is translated by the Java Virtual Machine (JVM) to the machine code
native to the computer, following the state transitions that we have studied ear-
‘Astand-alone Java application is run on a local machine.
Computer
Java object
Java Virtual Machine
[eensentaneanennsenenennessnesansnnenensnsanensnnnennney
+ A sample of a simple Java application.
+ M. Lin 1/8/02 ss
atetenersenasestaennecnesnassacensestntenenanseneeeas/
import java.io.*;
class MyProgram(
public static void main(string{ } args)
‘throws TOException{
BufferedReader keyboard = new
BufferedReader (new InputStreamReader(System.in));
String theName;
Byetem.out.printin(*what ie your nane?*);
theNiame = keyboard.readLine( );
System.out.print("Hello “ + theName);
System.out.printin(* - welcome to CSC369.\n");
> // end main
> /fend class
Figure 1.5 A stand-alone Java application (top) and the code that activates it (bottom).34 — CHAPTER 1 Distributed Computing, An Introduction
‘An applet is an object downloaded (transferred) from a remote machine
{and then run on a local machine.
Figure 1.6 An applet (top) and the Web page (bottom) that activates it.
Copyrighted material1.5 Basics of Operating Systems 35
A servlet is an object that runs on a remote machine and interacts -
‘with a local process using @ protocol.
Figure 1.7 A servlet (top) and the code that activates it (bottom).
Copyrighted material36 CHAPTER 1 Distributed Computing, An Introduction
lier. Because the bytecode is an intermediate code that is the same regardless of
machine types and is translated to the specific machine code at run time, Java
programs are therefore said to be platform-independent, meaning that the
same program can be run on any machine type that supports the JVM.
In this book, it is assumed that you have knowledge of basic Java programming,
to the extent that you can compile and execute a stand-alone application or
applet. A stand-alone program is one that executes on its own, without
exchanging messages with another program. .
Concurrent Programming
Distributed computing involves concurrent programming, which is
ming that involves the simultaneous execution of processes. In the
paragraphs we look at three kinds of concurrent programming.
© Concurrent processes executed on multiple computers. Much of the mate-
rial in this book deals with separate processes running concurrently on sepa-
rate, independent computers interconnected via a network. The processes
interact with each other by exchanging data over the network, but their exe-
cution is otherwise completely independent. When you access a Web page
using a browser, a process of the browser program, running on your machine,
interacts with a process running on the Web server machine.
Concurrent programming involving multiple machines requires program-
ming support; that is, the software for the participating program must be
written to contain logic to support the interaction between processes. How
this logic can be expressed in the programs is a main theme of this book.
= Concurrent processes executed on a single computer. Modern computers
are supported by multitasking operating systems, which allow multiple tasks,
or processes, to be executed concurrently. The concurrency may be real or vir-
tual. True concurrent multitasking on a single computer is feasible only if the
computer has multiple CPUs, so that each CPU can execute a separate
process. On a computer that has only one CPU, timesharing (see Figure 1.8),
or time-slicing, is used to allow processes to take turns being executed, creat-
ing the illusion that they are being executed in parallel.
Processes
Timesharing of a resource
Figure 1.8 Timesharing on a computer.1.5 Basics of Operating Systems
Since multitasking is a functionality of the operating system, no pro-
gramming is needed for this type of concurrent programming. No special
software logic needs to be contained in a program to initiate multitasking.
® Concurrent programming in a process. In addition to concurrent pro-
gramming in separate processes, it is often necessary for a single program to
initiate tasks that are to be executed concurrently. For example, it may be
necessary for a program to perform other tasks while waiting indefinitely for
user input in one user interface window. It may also be desirable for a pro-
gram to execute tasks in parallel, for performance reasons. Concurrent pro-
gramming within a process is performed using two types of facilities
provided by the operating system.
Parent and Child Processes
At run time, a process may spawn subordinate processes, or child processes.
Through real or virtual multitasking, the original process, called the parent
Process, continues to run simultaneously with the child processes (see Figure
1,9). A child process is a complete process, consisting of an executing program,
its own current values, and state information, some of which is inherited from
the parent process. A parent process can be notified when a child process has
terminated.
A parent process may spawn child processes. | | A process may spawn child threads.
Parent process Approcess
Main thread
Child thread 1
Child processes [Child thread 2
& ()
Figure 1.9 Concurrent processing within a process.
Threads
In lieu of child processes, a process may spawn threads, also known as light-
weight processes. Threads carry a minimum of state information, but other-
wise behave the same as processes. Since they incur less overhead, threads are
preferred over child processes.
3738
CHAPTER 1 Distributed Computing, An Introduction
The spawning and coordination of child threads requires programming support.
‘The software for the program must be written to-contain logic to support the
spawning of the threads and to coordinate, or synchronize, the execution of the
family of threads spawned by the parent thread.
‘The concurrent execution of threads may result in a race condition. A race con-
dition occurs when a series of commands in a program are executed in parallel,
in an arbitrarily interleaved fashion, yielding nondeterministic execution out-
come. Figure 1.10 illustrates such a situation. Suppose counter is a variable shared
among two concurrent threads. Execution sequence 1, in which the instructions
of the two processes are executed serially, will result in the counter value being
incremented to 2. On the other hand, in execution sequence 2, in which the two:
sets of instructions are interleaved, the counter will only be incremented. to 1.
Race conditions can be avoided if mutual exclusion is provided to a code seg-
ment to ensure that the commands in the segment can only be executed by one
thread at a time. Such a code segment is called a critical region. For our exam-
ple, the critical region comprises the code where the counter variable is accessed
and incremented.
Programming using threads is called multi-threaded programming, or
threaded programming for short. A multi-threaded program that is written to
guard against race conditions is said to be thread-safe. The development of a
complex. thread-safe program requires advanced programming skills.
fetch value in counter and load into a register | __ fetch value in counter and load into a register
increment value in register Teich value in counter and load into a register
store value in register to counter increment value in register
Tetch value in counter and load into a register] Encrement value in register,
store value in register to counter
Sorevawe nregerto counter]
‘This execution results in the ‘This execution results in the
value 2 in the counter, value 1 in the counter.
[7 instruction executed in concurrent process or thread 1
[Instruction executed in concurrent process or thread 2
Figure 1.10 A race condition resulting from unsynchronized concurrent processes or threads.1.5 Basics of Operating Systems 39
Fortunately, in this book we will seldom have to use threads explicitly, as
threaded programming is often provided behind the scenes by the toolkits that
Support network applications.
Java Threads
‘The Java Virtual Machine enables an application to have multiple threads of
execution running concurrently. When a Java Virtual Machine starts up, there
- is usually a single thread (although in some systems a program may start with
more than one thread) that calls the method named main of some des-
pendently and in parallel with other threads until it terminates.
To support threading in a program, Java provides a class named Thread as well
as an interface named Runnable interface.
From within a Java program, there are two ways to create a new thread of exe-
1. Declare a class to be a subclass of Thread. This subclass should override the
tun method of class Thread. When an instance of the subclass is allocated
a sate Pieced ae raat ssid Benet oe crenaty en
pen ee eee implements
the run method of the interface. When an instance of the class is allocated
and started, the code in the run method is executed concurrently with the
main thread.
Figure 1.11 illustrates the use of the first means of a new thread of exe-
cution, while Figure 1.12 illustrates the use of the second way.Figure 1.12 Sample application that spawns three threads using an implementation of the
Runnable interface.
static is
ally exclusive. For the example shown in Figure 1.10, the code for
the counter variable should be enclosed in a synchronized static method so that
the increments to the counter can only be made by one thread at a time. A Java
code sample illustrating the use of threads and a synchronized static method
can be found in Exercise 2(4.) at the end of this chapter.
In subsequent chapters, we will use the terms process and thread frequently. If
you are not familiar with threading, there are-some exercises at the end of this
chapter that allow you to practice threading using Java programming.
1.6 Network Basics
Having looked at some key concepts in operating systems that are relevant to
distributed computing, next we will do the same with network basics.
Protocols
In the context of communications, a protocol is a set of rules that must be
herve by mescanayi ro-ace ing, Ieee olay ieaectrly
follow an on eye contact, gestures.
This stipulates that only one person speaks at a time while the others
listen. In a phone conversation, one party initiates the call, and then, after the
call is answered, the parties at the two ends take turns speaking, using pauses or1.6 Network Basics 41
questions to signify when it is the other party's turn to talk.
In communications involving computers, protocols must be formally defined
and precisely implemented. For each protocol, there must be rules that specify
the following:
® How is the data exchange encoded?
® How are events (sending, receiving) synchronized (ordered) so that the par-
ticipants can send and receive in a coordinated manner?
‘The concept of protocols will become more concrete when we study a number
of protocols in the rest of this book.
It should be emphasized that a protocol is a set of rules. The specification of a
protocol does not dictate how the rules are to be implemented. For example,
Hypertext Transfer Protocol (HTTP) specifies the rules that must be observed
between a Web browser process and a Web server process. Any Web server pro-
gram written in conformance to these rules satisfies the protocol, regardless of
what programming language or syntax is employed. Therefore, you should
understand that a protocol (such as HTTP) is distinct from its implementations
(such as the varieties of Web browsers, including Netscape and Internet
Explorer).
Asan analogy, the rules for a sport, say basketball, are specified by some author-
ity, say the National Basketball Association (NBA), but it is up to each individ-
ual team and then each player to execute or implement the game while
observing those rules.
Network Architecture
In the textbooks for data networks, the functionalities of a network are fre-
quently presented using a network architecture (Figure 1.13). The classic net-
work architecture, called the Open System Interconnect (OSI) architecture,
divides the complex functionalities of a network into seven layers. All or part of
these functionalities must be present on a computer that participates in data
communication and hence also in distributed computing. If you are interested
in the specifics of the OSI model, you should be able to find them in textbooks
‘on networks such as (Tanenbaum, 35]. For the purposes of this book, a simpli-
fied architecture that is appropriate for the Internet will be presented.
The network architecture for the Internet is illustrated in Figure 1.14, where
there are four layers: physical, Internet, transport, and application. The physi-
cal layer provides the functionalities for the transmission of signals, represent-
ing a stream of data, from one computer to another. The Internet layer allows
a packet of data to be addressed to a remote computer and delivered to that
‘computer. The transport layer provides the functionalities for data packets to
bbe delivered to a specific process running on a remote computer. Finally, the
application layer allows messages to be exchanged between programs in sup-
port of an application such as the World Wide Web.
The syntax of a pro-
gramming language is
the set of language
rules, including spelling
and grammar, of the
language.
SI stands for Open
System Interconnect,
the name given to a
‘model of network archi-
tecture promoted by an
organization called the
International
Organization for
Standardization (!S0).42. CHAPTER 1 Distributed Computing, An Introduction
Figure 1.13 The OSI seven-layer network architecture.
Figure 1.14 The Intemet network architecture.
The division of the layers is conceptual: the implementation of these function-
Copyrighted material1.6 Network Basics 43
Secondly, the layered architecture allows the details of the network's function-
alities to be abstracted, or hidden. When writing an application, it is helpful
when one does not have to be concerned with the details of data communica-
tion but can instead concentrate on the application protocol at hand. A layered
architecture allows a program to be written as if data can be exchanged directly
(see the dashed lines in Figures 1.13 and 1.14). In actuality and behind the
scenes, a message sent from one application must be processed by all the func-
tionalities in the lower layers of the network architecture (see the dotted lines).
Eventually, the stream of data signals representing the message is transmitted
over the physical link interconnecting the computers (see the solid lines). Upon
arriving at the receiving computer, the data signals are then processed by the
functionalities of the network architecture in the reverse order, until eventually
the data is reassembled into a message and delivered to the appropriate process.
Network Architecture Protocols
Let’s now look at some of the specific protocols for the Internet architecture,
The protocol for the Internet layer is named, aptly enough, the Internet
Protocol. This protocol uses a particular naming scheme, which we will soon
study, for identifying computers on the network and for routing the data. At the
transport layer, there are two widely used protocols: The Transmission. Control
Protocol (TCP) provides connection-oriented communication, while the User
Datagram Protocol (UDP) supports connectionless communication. (A discus-
sion of connection-orierited versus connectionless communication will be
introduced in the next section and then discussed further in Chapter 2.) Finally,
at the application layer, protocols such as File Transfer Protocol (FTP), Simple
Network Mail Protocol (SNMP), and Hypertext Transmission Protocol (HTTP)
are specified for network applications. The well-known Transmission Control
Protocol/Internet Protocol (TCP/IP) is a set of protocols encompassing the
Internet and transport layers of this architecture; these protocols are universally
employed for data communication over the Internet. An Internet application
therefore must be run on a computer that implements this portion of the
Internet architecture, colloquially termed the TCP/IP stack.
Readers who are interested in the protocols at the lower layers may want to con-
sult textbooks such as (Stallings, 12; Tanenbaum, 13). This book is devoted to
the study of protocols at the application layer. We will start by looking at some
of the popular application protocols, such as those just mentioned in the pre-
vious paragraph. We will then go on to study how such applications are sup-
ported using distributed computing.
Connection-Oriented versus Connectionless Communication
Although connection-oriented and connectionless communication are more
properly a topic for data networks discussion, we will have occasion to make a
distinction between the two in our discussions.44 — CHAPTER 1 Distributed Computing, An introduction
A data network trans-
mits data; a voice net-
work transmits voice.
Modern networks trans-
mit both data and voice.
In connection-oriented communication, a connection—which may be physical
(Le,, tangible, provided using hardware such as cables, modems, and receivers)
or logical (ie., abstract or virtual, using software that emulates a connection) —
is established between two parties, the caller and the callee. Such is the case
when you (the caller) dial a number to make a phone call to a friend (the callee).
Once a connection is established, data (voice, in the case of a phone call) can be
sent repeatedly over the connection continuously until the session is over, such
as when you hang up the phone at the end of a conversation, at which point
the connection is severed. Note that in this mode of communication there is no
need to address an individual data packet explicitly while a connection is in use.
As the name implies, connectionless communication involves no connection.
Instead, data is sent a packet at a time. Each packet must be explicitly addressed
by the sender to the receiver. An example of connectionless communication is
when you correspond with a friend using rounds of email messages or letters.
Each email or letter you send, containing a message, must be addressed to your
friend. In reply, your friend sends an email or letter addressed to you. The
exchange continues until the correspondence, ot session, is over.
On a data network, connectionless communication is simpler to provide, since
there is no need to maintain separate connections. Yet the lack of a connection
can result in data packets being lost during delivery or being delivered out of
order. For example, if you send multiple emails or letters to your friend in suc-
cession, each one containing part of a message, it is entirely possible for your
friend to receive the emails or letters in a scrambled order, since each email/let-
ter is delivered independently.
On the other hand, connection-oriented communication can ensure that data
packets are delivered safely and in order along an established connection, at the
cost of additional processing overhead. This is another example of trade-offs,
Figure 1.15 graphically illustrates the difference between these two forms of
communication. In exercise 3 at the end of this chapter, you will be guided
through a simplified analysis of the trade-offs between the two forms.
At any layer of a network architecture, communication can be carried out using
a connection-oriented facility or a connectionless facility. At the transport layer
of the TCP/IP suite, the User Datagram Protocol (UDP) is a connectionless pro-
tocol, while the Transmission Control Protocol (TCP) is a connection-oriented
protocol. A facility or protocol that uses UDP to transmit data is said to be con-
nectionless at the transport layer, while one that uses TCP is said to be connec-
tion-oriented at the same layer, Note that it is possible for a communication
facility to be connection-oriented at one layer but connectionless at another.
For example, a Web application uses HTTP, a connection-oriented protocol, at
the application layer, but actual data transmitted to and from the application
may use UDP at the transport layer.
‘Table 1.1 compares the two modes of communication.1.6 Network Basics 45
Connection-oriented communication
Connectionless communicé
Bm A data packet
Figure 1.15 Connection-oriented versus connectionless communication.
Table 1.1 Comparisons of Connection-Oriented and Connectionless interprocess
‘Communication (IPC).
‘Connection-Oriented Connectionless
‘Addressing ‘Specified at connection ‘Addressing is specified with |
time; there is no need each operation.
to re-specify with each
subsequent operation
(end or receive).
Connection overhead | There is overhead Not applicable.
for establishing a
connection.
‘Addressing overhead | There is no addressing ‘Overhead is incurred with
overhead with each ‘each operation.
individual operation.
Data delivery order | The connection abstraction
allows the IPC mechanism
to maintain the order of IPC facility to maintain
delivering data packets. delivery order.
(continued on next page)46 CHAPTER 1 Distributed Computing, An Introduction
‘An Internet host is a
‘computer that imple-
ments the Internet pro-
tocol architecture and
hence is capable of par
ticipating in internet
‘communications.
Arouter is a computer
that specializes in for-
warding data between
networks. On the
Internet, @ router imple-
ments the functionalities
of the Internet layer.
Table 1.1 Comparisons of Connection-Oriented and Connectionless Interprocess
Communication (IPC). (continued)
‘Connection-Oriented Connectionless
Protocols This mode of communica- | This mode of communica-
tion is appropriate for tion is appropriate for
protocols that require protocols that exchange a
exchange of a large stream | small amount of data in a
of data and/or a large limited number of rounds of
number of rounds of exchange.
exchange.
Network Resources
Throughout this book you will often encounter the term network resources. By
network resources we are referring to resources that are available to the partici-
pants of a distributed computing community. For example, on the Internet the
network resources include hardware such as computers (including Internet
hosts and routers) and equipment (printers, facsimile machines, cameras, etc.),
and software such as processes, email mailboxes, files, and Web documents. An
important class of network resources is network services, such as the World
Wide Web (WWW) and file transfer service, which are provided by specific
processes running on computers.
Although the idea may seem simple, one of the key challenges in distributed
computing is the unique identification of resources that are available on the
network. In the next section, we will look at how resource identification is
accomplished on the Internet.
Host Identification and Internet Protocol Addresses
Physically, the Internet is a gigantic mesh of network links and computers.
Conceptually (see Figure 1.16), the main arteries of the Internet are a set of high-
bandwidth network links that constitute the “backbone” of the network.
Connected to the backbone are individual networks, each of which has a unique
identifier. Computers with TCP/IP support, called Internet hosts, are linked to
individual networks. Through this system of “information highways,” data can
be transmitted from a host H, on network N; to another host H on network No.
To transfer data from within a program, it must be possible to uniquely identify
the process that is to receive the data, similar to addressing the recipient of a let-
ter delivered by the postal service.
As you recall, a process is a run-time representation of a program when the pro-
gram is executed on a computer. Further, recall that on the Internet a computer,
or a host, is linked to a network. In order to identify a process, it is therefore
necessary to name the network, the host linked to that network, and then the
particular process running on the host.1.6 Network Basics
FE) aninternet host
Networks
(J Tree Internet backbone
Figure 1.16 The internet topology.
In the Internet architecture, host identification is part of the Internet protocol
(IP), which, as you recall, is the protocol at the Internet layer of the TCP/IP
suite. The discussion that follows refers to the host identification scheme spec-
ified in version 4 of IP, ot IPv4. Although the scheme has been modified in IP
version 6 (IPv6) to accommodate more Internet addresses, the principles of the
scheme are unchanged in the two versions, and IPv4 is chosen here for its rela-
tive simplicity. In the context of this book, the distinctions between the two
versions are not significant.
In IPv4, each host on the Internet is identified by a unique 32-bit string. Given
a length of 32 bits, the total number of addresses allowable is 2:2. Put another
way, the address space of IPv4 accommodates 252 (4,294,967,296 or over 4 bil-
lion) addresses in total.
Each IP address must identify both the network on which a host resides and
then the particular host on that network. The IPv4 addressing scheme does so
as follows:
The address space is divided into five classes, A through E. As illustrated in.
Figure 1.17, each class has a unique prefix. Class A starts with a bit 0, Class B
starts with a bit sequence of 10, Class C with 110, and so forth. The remaining
bits in each address are used for identifying the network and the host on a par-
ticular network. Thus a Class A address has 31 bits for network-host identifica-
tion, a Class B address 30 bits, and so forth. This means that a total of 23! (about
2 billion) Class A addresses are available, while a maximum of 2% (about 1 bil-
lion) Class B addresses are available. The maximum number of addresses in
Class C, D, or E can be calculated similarly. It should be noted that within each
class a small number of addresses (such as all Os and all 1s) are reserved for spe-
cial purposes.
a748 CHAPTER 1 Distributed Computing, An Introduction
Byteo Byte Byte2 —_Byte3
Class A address
Class B address =
Class C address
Multicast address|1] 1] 70 Malas Group EEE Host porton
Reserved address | 1/11 |1 Reeved
Figure 1.17 The IPv4 address scheme.
You may wonder why it is necessary to have different classes of addresses. This,
has to do with the number of computers that each individual network can
accommodate. Consider a Class A address (see Figure 1,17): the 7 bits immedi-
ately following the prefix 0 are allotted for network identification, with the rest
of the 32-8 = 24 bits devoted to the identification of hosts within a network.
Therefore, each class A network can support 224 (roughly 16 million) hosts,
although there can be no more than 27, or 128, such networks. Using the same
analysis, you can see that each of the 2! (16,384) class B network addresses can
accommodate up to 2'6 (65,536) hosts. Likewise, there are far more Class C net-
works than there are Class B networks, but each Class C network can support far
fewer hosts.
‘As has already been mentioned, we will seldom have occasion to identify IP
hosts using the 32-bit address string. On the rare occasions when we do use a
numerical network address, we most likely will use the so-called dotted decimal
notation instead. The dotted decimal notation of an IP address uses a decimal
value for each of the 4 bytes in the IP address.
For example, suppose the dotted-decimal notation for a particular Internet
address is 129.65.24.50. The 32-bit binary expansion of the notation is as follows:
129.652450
maa | \
Since the leading bit sequence is 10, the address is a Class B address. Within the
class, the network portion is identified by the remaining bits in the first 2 bytes,
that is, 0000101000001, and the host portion consists of the values in the last
2 bytes, or 0001100000110010. For convenience, the binary prefix for class
identification is often included as part of the network portion of the address, so
we would say that this particular address is at network 129.65 and at host
address 24.50 on that network.1.6 Network Basics 49
Here is another example. Given the address 224.0.0.1, one can expand it as follows:
224.0.0.1
ZN
The binary prefix 1110 signifies that this is a class D, or multicast, address. Data
packets sent to this address should therefore be delivered to the multicast group
9000000000000000000000000001.
An IP network address is assigned by an authority, known as the Internet
Assigned Numbers Authority (IANA) (community-ml.org, 25] to an organiza-
tion such as a university or an Internet Service Provider (ISP). (Note: The assign-
ment of the authority is dynamic. See http://www.wia.org/pub/iana.html for a
history of the evolution of this authority.) Within each network, the assignment.
of the host portion is internal to the organization. Typically, an organization
makes use of this portion of the address to subdivide its network into a hierar-
chy of subnets, with a unique host number assigned to each computer attached
to a subnet. For instance, the administrator of class B network 129.65 may
choose to designate byte 2 (that is, the leftmost 8 bits of the host portion) in the
address as a subnet identifier. Under this subnet scheme, the IP address
129.65.32.3 identifies a host of ID 3 on a subnet of ID 32 on this particular net-
work.
Since the 1990s, the demand for IP addresses has skyrocketed, to the point that
the address space has been exhausted. The static addressing scheme we have just
described has since been augmented with numerous changes in response to
ever-increasing demand for addresses, including the dynamic addressing
scheme that is popular with Internet Service Providers (ISPs) such as America
Online (AOL). Using dynamic addressing, an ISP or large organization can
extend the address space for a given IP network by pooling the addresses. For
example, a static class B network address may accommodate up to 2'6 or 65,536
static hosts. By pooling the approximately 65 thousand addresses and allocating
each to an active session on an as-needed basis, it is possible to support millions
of IP hosts, assuming that no more than 65 thousand are active at the same
time. For this reason, when you access the Internet through an ISP, the IP
address of your computer may vary from one logon session to the next.
Most of us have problems memorizing a 32-bit string, even with the aid of the
dotted decimal notation. Hence, a symbolic name for identifying a host is
preferable. That is why the Domain Name System (DNS) was adopted by the
Internet community. The acronym DNS also expands to Domain Name Service,
which refers to the service provided by a Domain Name System. Every time you
use email or browse a Web page, you identify an Internet host using a domain
name based on the DNS protocol.50 CHAPTER 1 Distributed Computing, An Introduction
Every domain name contains two or more components separated by dots. In an
address such as acme.com, the last component, com in this case, is called the
top-level domain. To the left of the dot in that name, acme in this case, is what
is called the second-level domain. It is also possible to have subdomains, such
as marketing.acme.com. Domain names are not case-sensitive; that is, there is
no distinction between the uppercase and the lowercase of the same character
when a name is spelled.
Currently, the top-level domains are classified as shown in Table 1.2 (Brain, 15].
Table 1.2 Top-level Domain Names
com For commercial entities, which anyone, anywhere in the world, can
register.
snet Originally designated for organizations directly involved in Internet,
‘operations. This domain is increasingly being used by businesses when
the desired name under .com is already registered by another organiza-
tion. Today anyone can register a name in the .net domain.
07g For miscellaneous organizations, including nonprofits.
edu For four-year accredited institutions of higher learning.
gov For U.S. federal government entities.
mil For the U.S. military.
Country For individual countries based on the International Standards
codes Organization; for example, .ca for Canada, and .jp for Japan. See
(Connolly, 18] if you are interested in a list of the country codes.
The second-level domain combined with the first-level domain (eg.,
calpoly.edu) typically, but not necessarily, maps to the network portion of an IP
address while the rest of the domain name (e.g., www.csc) serves to identify the
subnet, if any, and the host name. See Figure 1.18 for a pictorial depiction.
Each domain name is mapped to a corresponding IP address, although the map-
ping may not be permanent. For example, the domain name ebay.com currently
maps to the IP address 216.32.120.133. The mapping of a domain name to its
current corresponding IP address, and vice versa, can be performed using a net-
work service known as DNS naming resolution. Exercise 4 shows you a way to
experiment with this service.
Finally, the domain name localhost can be used to refer to the computer on
which the process is run. The name is always mapped to the IP address 127.0.0.1
and simply addresses “this computer.”
Once a host is located using either an IP address or domain name, we can then
identify individual resources on that host. In the following paragraphs we will1.6 Network Basics 51
(Root domain)
So ce
Figure 1.18 Domain name hierarchy.
look at three examples of such schemes for identifying a process, an email recip-
ient, and Web documents, respectively.
Identifying Processes with Protocol Ports
Specifying the correct domain name or its corresponding IP address allows us to
locate a computer or host on the Internet. But in network applications, data
needs to be delivered to a specific process running on a computer. Thus we need.
a naming scheme to allow us to uniquely identify such a process. There are any
number of possible schemes to do so. For example, one possibility is to make
use of a unique process identifier (PID) assigned to the process by the operat-
ing system (see exercise 4). On the Internet, the protocol for process identifica-1.6 Network Basics 53
‘The Uniform Resource Name (URN) is a scheme specified by RFC2141 and
related documents, intended to serve as persistent, location-independent,
resource identifiers. A URN provides persistent names within a namespace, thus
allowing a permanent object to be mirrored over several known sites; if a site is
unavailable, the object could be found/resolved at another site. Several propos-
als for URNs exist, but none of them has been widely adopted yet (aboutdo-
mains.com, 16].
Although informal, the URL is by far the best known of these terms. A URL pro-
vides a nonpersistent (that is, not necessarily permanent) means to uniquely
identify an object within a namespace. A namespace, in the context of a nam-
ing system, refers to the set of names that the system provides. In its most gen-
eral form, the format of a URL is
//:@:/
where
is the exact but case-insensitive name of the application-layer
protocol you wish to use to access the resource; for example, HTTP if you are
attempting to access a Web browser;
: is for access authorization, if required by the protocol;
is the domain name or dotted-decimal IP address of the host that
provides the service allowing you to access the protocol; for example,
www.calpoly.edu;
is the transport-layer protocol port for the process that pro-
vides the service on the remote host; for example, 80 (by default) for HTTP
or Web servers;
specifies where in the file system of the remote host the
resource can be located; for example, ~mliu/csc102/index.html.
When you are entering a URL in a browser, you may skip the specifications of the
protocol (in which case HTTP is assumed), the user:password (not used in HTTP),
the port-number (80 by default), and the directory path (the root of the docu-
ment directory hierarchy is assumed). For example, the URL www.csc.
calpoly.edu entered to Netscape specifies the home page of the California
Polytechnic State University at San Luis Obispo, to be fetched from the host
with the domain name www.csc.calpoly.edu, running on port 80.
A shortened form of a URL, termed a relative URL, can be used at times.
During a session when a document (say http://www.csc.calpoly.edu/index.
html) is accessed, you can use a relative URL to name another file in the same
directory, to be fetched via the same Web server. For example, if another file
exists in that same directory called courses.html, then the URL courses.html
can be named in that document in lieu of the full URL, http://www.csc
calpoly.edu/courses.html.54 CHAPTER 1 Distributed Computing, An Introduction
Unicode is a standard
for representing charac-
ters, According to the
Unicode Home Page,
“Unicode provides a
unique [numerical repre-
sentation] for every
character, no matter
what the platform, no
matter what the pro-
gram, no matter what
the language”
[unicode.org, 29).
Extensible Name Service
Extensible Name Service (XNS) is an Internet naming service managed by the
XNS Public Trust Organization (XNSORG), an independent, open-forum organ-
ization. The service supports a naming scheme that allows a single, universal
address to be used by a user to perform “communications of all types—email,
phone, fax, Web pages, instant messaging, even postal mail. . . . As a naming
and addressing service, XNS operates at a higher level than DNS. DNS is
designed to resolve a name into the address of an Internet host computer. XNS
is designed to resolve a universal address into any type of address on any type
of communications network. You could say that XNS is to DNS what DNS is to
a phone number (and, in fact, XNS uses DNS to resolve the Internet address of
an XNS agency)” {omg.org, 27]. An XNS is a character string. There are three
types of XNS names: personal names, business names, and general names, each
of which starts with a unique leading character (=, @, and +, respectively) and
each of which can contain up to 64 Unicode characters.
Name Resolution
Whenever a symbolic name is used to identify a resource, the name must be
translated to the corresponding physical address in order to locate the resource.
‘We have already seen that a domain name such as
someComputer.someDivision.someCompany.com
for an Internet host must be translated to the numerical address, say
129.65.123.7, of that particular computer. The process of the translation is
called name resolution, or more simply, name lookup.
To perform name resolution, a database (also called a directory or a registry)
must exist containing the mappings between symbolic names and physical
names, If the namespace of a naming scheme is of a limited size, then it is pos
sible to perform name resolution manually. In the case of the DNS or XNS, a
manual process is out of the question; instead, a network service has to be pro-
vided to support online name resolution.
For the DNS, the name lookup service is provided by machines that are called
DNS servers. A central authority maintains the name database and sees to it
that the database is distributed throughout the Internet to the DNS servers.
When a domain name is specified—whether entered into a browser or coded in
a program being executed—the name is submitted to the nearest DNS server for
resolution. If the nearest server does not have the mapping, that server forwards
the request to another DNS server. The propagation of the request continues
until the name is resolved, at which time the mapping is sent back to the
process that originated the request.
In later chapters of this book we will have many occasions to work with nam-
ing schemes and their supporting facilities.1.7 Software Engineering Basics
1.7 Software Engineering Basics
Software engineering is a discipline in computer science that covers the process,
of developing applications. Although this book provides the technical back-
ground for building network applications, it is not intended to cover the process
of developing s such applications. At the same time, some of the basic concepts
from the discipline of software engineering will be relevant to our i
These concepts are introduced in this section.
Procedural versus Object-Oriented Programming
In building network applications, there are two main classes of programming
languages: procedural language and object-oriented language. (Although there
are other classes of languages, such as functional language, they are not widely
used in network applications.)
Procedural languages—the C language being the primary example—use proce-
dures to break down the complexity of the tasks of an application. For example,
an application may be coded using a procedure (also called a function,
although in some contexts the term procedure is used for a void function) to per-
form the input, another procedure to perform the computation, and a third pro-
cedure for generating the output.
Object-oriented languages, exemplified by Java, the language chosen for this
book, use objects to encapsulate the details. Each object simulates an object in
real life, carrying state data as well as behaviors. State data is represented as
instance data (in Java) or data members (in C++). Behaviors are represented as
methods,
The Unified Modeling Language
An important step in software engineering is the production of artifacts, or doc-
uments, to record the conceptual design of the application being developed. For
readability, these documents should be written using a universal set of notations
and languages. The Unified Modeling Language (UML), developed by the
‘Object Management Group [omg.org, 27], is such a facility. UML provides a
common set of language and notations “for specifying, visualizing, construct-
ing, and documenting the artifacts of software systems” [omg.org, 27].
OMG-UML provides a rich set of tools for all aspects of software engineering,
the coverage of which belongs in software engineering courses. In this book we
will occasionally make use of one of the notations: UML class diagrams (and
only a subset of them), for documenting the relationships of some of the Java
classes that appear in our presentation, Figure 1.19 presents the subset of class
diagrams you will see in'this book.56 CHAPTER 1 Distributed Computing, An Introduction
A.dass or interface is ‘A dass that implements a java interface
represented as follows: is represented as follows:
Interface Name
Cass B depends (less Bimlements lass B inherits
on Class A. from Class A.
‘Note: The style of the lines and the shape of arrowheads are significant.
Figure 1.19 A subset of UML class diagrams.
The Architecture of Distributed Applications
‘The idea of using a multilayer architecture to organize the functionalities of a
data network can be applied to distributed applications. Figure 1.20 presents an
example of such an architecture.
Figure 1.20 The architecture of distributed applications.1.7 Software Engineering Basics
Using this architecture, the functionalities of a distributed application can be
classified in three layers:
= The presentation layer provides the user interface. For example, if the appli-
cation is a shopping cart, this layer generates the set of Web pages that are
viewable by a shopper using a browser.
© The application logic layer provides the computation for the application.
This layer is also called the business logic layer for enterprise applications.
Ina shopping cart application, this layer is responsible for such tasks as credit
verification and computing the dollar amounts of the orders, sales tax, and
delivery cost.
™ The service layer provides the underlying services needed to support the
functionalities of the top two layers. Services may include data access facili-
ties (such as a database management system), directory services for name
lookups (such as the Domain Name Service), and interprocess communica-
tion (which allows data to be exchanged among processes).
Of these layers, the service layer will be the focus of this book. The other two
layers belong in the realm of software engineering.
Toolkits, Frameworks, and Components
Toolkits, frameworks, and components are terms associated with software engi-
neering for enterprise systems (that is, large-scale commercial applications).
In the context of software development, a toolkit or framework is a collection
of classes, tools, and programming samples. As examples, the Java Development
Tookit (IDK) is such a collection for developing Java programs, while Microsoft's
NET framework is meant for building Web-based applications. It is assumed
that you are proficient with the JDK for developing Java programs; other tool-
kits for distributed computing (for example, the Java Socket Toolkit) will be cov-
ered later in this book.
‘Component-based software development is an approach to building enterprise
software systems. Using this approach, software is developed and evolved by
assembling selected pre-engineered, pretested, and reusable software compo-
nents. This approach promotes software reuse and has the potential to signifi-
cantly reduce the cost and errors of development (Pour, 37]. The Enterprise Java
Bean (EJB) and Microsoft's Component Object Model (COM) are platforms that
support component-based applications. Although these platforms are important
to enterprise distributed computing, their coverage is beyond the scope of this,
book.
5758 CHAPTER 1 Distributed Computing, An Introduction
Summary
In this introductory chapter, we have discussed the following topics:
| What is meant by distributed computing and how it is related to or differ-
ent from terms such as distributed system and parallel computing.
The basic concepts of operating systems that are important to our study.
Such concepts include processes and threads.
™ Basic concepts in data communication that are relevant to this book. Such
topics include
‘© Network architectures: the OSI model and the Internet model
© Connkction-oriented communication versus connectionless communication
© Naming schemes for network resources, including
—The Domain Name System (DNS)
—The Extensible Name System (XNS)
— Protocol port numbers
— Uniform Resource Mentifier (URI) and Uniform Resource Locator
— Email addresses
Basic concepts in software engineering that are important to our study.
Such concepts include
© Procedural programming compared to object-oriented programming
© Class diagram using the notations ofthe Unified Modeling Language
(UML)
© The three-layered architecture of distributed applications consisting of (1)
the presentation layer, (ji) the application or business logic layer, and (iti)
the service layer
© The terms toolkit, framework, and component in the context of software
engineeringExercises
1. Distributed Computing
a. Consider distributed computing as defined in this chapter. For each of the
following activities, determine and explain whether it is an example of
distributed computing:
i, Using Excel on a stand-alone personal computer
i, Web surfing
ili, Instant messaging
iv. Compiling and testing a Cobol program on a department machine that
has no network connection
vy. Using the electronic mail on your department's computer to send a
message to yourself Napster.com is a digital
vi. Using Napster.com to download music music service.
b. In this exercise we will use a simplified mathematical model to analyze AudioGalaxy and KaZaA
failures in a distributed system. Explain your answers. fier sieviler services.
Suppose each computer in this question has a probability p of failing at
any time, p< 1.
i. If m computers are interconnected and the availability of each com-
puter is needed to maintain a service provided using distributed com-
puting involving these computers,
a. What is the probability p that the service will not be available at any
time, assuming that no other components in the distributed system
will fail? Express p as a mathematical function of n and p.
b. Based on your answer for part a, what is the probability p when the
computing is not distributed at all, that is, for the case of 1 = 1?
c. Based on your answer for part a, use p = 0.2 and n = 3 to compute
the probability p. How does that probability compare with the fail-
ure probability if the same computing is performed using mono-
lithic computing, that is, on one computer only?
li, Now suppose the service provided using distributed computing
requires only one of the three computers, with the other two comput-
ers serving as backups (that is, each of the three computers, on its own,
is capable of providing the service). What is the probability that the
service will not be available at any time, assuming that no other com-
ponents in the distributed system will fail? How does the failure prob-
ability of this system compare with the failure probability if the same
computing is performed using monolithic computing, that is, on one
computer only?60 CHAPTER 1 Distributed Computing, An introduction
¢. Do research on either the Internet worm [Eichin and Rochlis, 21] or a virus
attack such as the I-Love-You virus [Zetter, 22] and summarize what each
one is and how it happened. Why are such occurrences significant in dis-
tebuted ‘computing? Can you think of some measures to avoid these prob-
lems
4d. Do research on “distributed computing” (or, more accurately, collaborative
computing) projects such as seti@home (setiathome.ssl.berkeley.edu, 10]
and genome@home [genomeathome.stanford.edu, 23]. Choose one of
them. Write a report to (i) explain the objective of the project, (ii) explain
how the computing is performed in a distributed system, and (iii) explain
what you have to do to participate in the project.
e. Do research on the early days of the Internet (see sources
[vimp.museophile.com, 1), [zakon.org, 2], [silkroad.com, 3), or [Hafner and
Lyon, 4], for example) and write a short report on one of the key organiza-
tions and one of the prominent figures in the history of the Internet.
2. Concurrent Programming
a. Look up the online API specification for Java [java.sun.com, 20].
Choose the link for the Runnable interface and then the Thread class.
Browse each carefully, reading the specifications for the methods of each.
i. According to the specifications, which of the two, Runnable interface or
Thread class, is preferred if you only intend to implement the run
method? Why?
ii. What does the Thread class method sleep do? Write the Java state-
ment(s) that appears in the code for a thread to suspend the execution
of the thread for 5 seconds.
iii, What does the Thread class method activeCount do? What should the
method return in a program where three threads are spawned?
iv. The Thread class method stop is said to be deprecated. What is meant by
a deprecated method?
v. How many methods are there in the Runmable interface? Name each.
vi. How do you use the Runmable interface to create a thread? Explain.
b. Compile and run the Java class files shown in Figure 1.11 and provided in
the program sample folder. What is the outcome? Capture the output of
the run and write a paragraph to explain the output, paying special atten-
tion to the order of the lines of output.
¢. Compile and run the Java class files shown in Figure 1.12. What is the out-
come? Capture the output of the run and write a paragraph to explain the
output, paying special attention to the order of the lines of output. Also,
how does the output compare with the output from part b (the second
part)?d. Consider the following Java classes:
i. What do you expect the outcome to be when RunThread3 is executed?
Compile and run it.
ii. Comment out the word synchronized in the heading of the method
update. Compile and run RunThread3 again. What is the outcome?
Copyrighted material62 CHAPTER 1 Distributed Computing, An Introduction
3. Connection-Oriented versus Connectionless Communication
In this exercise we will use a simplified mathematical model to analyze the
trade-off between connection-oriented communication and connectionless
communication. Explain your answer.
On a certain network both forms of communication are provided:
© Using connection-oriented communication, it takes 50 seconds to estab-
lish a connection, after which a packet of up to 10 characters can be sent
in 1.0 seconds over the connection, in either direction.
| Using connectionless communication, a packet of up to 10 characters can
be sent in 1.2 seconds (the sending of each packet takes slightly longer
than in the connection-oriented case, since each packet must find its way
to the receiver).
Suppose processes A and B exchange messages on this network. A initiates
the communication and sends to B a message of 100 characters, which are
Partitioned into 10 packets. In reply, B sends a message of SO characters,
which are partitioned into 5 packets.
Assuming that there is no delay other than the time it takes for establish-
ing a connection (in the connection-oriented case) and for packet transmis-
sion:
a. How long does the session between A and B last, using connection-ori-
ented communication? Explain,
b, How long does the session between A and B last, using connectionless
communication? Explain.
¢. How much data (in number of characters) must be exchanged between A
and B in order for the connection-oriented communication to yield a
shorter session than the connectionless communication? Explain.
4. Naming
a, What is the size of the address space (that is, the total number of addresses
allowable) in each of the five classes of the IPv4 addresses? Show your
computation.
b. Find out the IP network address assigned to your organization. What class
(A thorugh E) is it?
c. Find out the domain name of the Web server host of your organization.
What is its IP address?
4. A network program, nslookup, can be used to obtain DNS name-lookup
service. You can invoke this program in at least three ways:
= Ona UNIX system, enter nslookup at the system prompt.
© On a Windows system, enter nslookup at the prompt in a command
prompt window.
= Browse to the site http://cc-www.utia.ac.be/ds/nslookup.html.
Use the service to complete the following table:IP Address Domain Name
127.0.0.1
ie.technion.ac.il
204.198,135.62
129.65.2.119
. Complete the following table:
Net ID (in Host ID (in.
Domain Class of dotted decimal dotted decim:
IP Address Name Address (A-E) notation) notation)
18.181.0.31 |
129.65.2.119 |
204,198,135.62 |
224.0.1.24 |
. Using the country code top-level domains listed by the Internet Assigned
Numbers Authority fiana.org, 19} to help, find out the domain name
country code for the following nations:
Armenia, Brazil, Canada, Cuba, Germany, Spain, France, Guatemala,
India, Mexico, Qatar, Singapore, Sweden, El Salvador, Turkey.
Identify the nation for each of the following country codes:
Td, tv, zw, nz, ph, pk, eg, bt, ao.
. Consider this URI: http://www.someSite.org:808 1 /foo/index.htm.
i, What is the protocol specified?
ii, What is the host name of the service?
iii, What is the port number of the process that provides the service?
iv. Where is the document located?
. Look up the well-known port number assignments by browsing the page
hittp://www.iana.org/assignments/port-numbers.
i, Which port number is assigned to each of these services: (i) FTP, (ii)
telnet, (iii) SMTP-and (iv) World Wide Web HTTP? Are these services
available using TCP, UDP, or both?
Exercises64 — CHAPTER 1 Distributed Computing, An Introduction
fi, What services are assigned to ports 13 and 17, respectively?
iii, On a UNIX system, or from the command-prompt window of a
Windows system, one way that you can access a network service is by
issuing a command such as the following:
telnetcspace>
For example, the command telnet foo.com 13 will access the service
provided by the process running on port 13 on the Internet host
telnet.foo.
‘Try using this method to access the services offered on port 13 on a
machine you know. Describe the outcome.
i. Instead of using the Internet scheme of using a protocol port number as
part of an address to deliver data to a process on a given host, consider an
alternative scheme where the process is located using a unique process ID
(PID), such as that assigned to each active process by the Unix operating
system. Note that a PID is assigned dynamically to a process when the
process is started, so that it is not possible to know ahead of a program's
execution what the ID will be at run time, The range of values for PIDs also
varies from system to system. What, if any, is the problem with such an
addressing scheme?
j. A naming scheme is said to allow location transparency {community-
mLorg, 25] if the scheme allows objects to be addressed without explicit
knowledge of their physical location. For example, the U.S. phone num-
ber system is location transparent, since a caller does not need to know the
whereabouts of the callee when dialing up. The U.S. Postal Service address
system, on the other hand, does not allow location transparency, since
you must address the recipient with his/her physical address (excluding
post office box numbers, that is).
Consider each of the following naming schemes. For each, determine
whether it is location transparent. Justify your answer.
i. The Domain Name System (DNS).
.. Uniform Resource Locator (URL)
Uniform Resource Name (URN)
iv. Extensible Name Service (XNS)
5. UML Class Diagrams
a. Using the notations shown in Figure 1.19, draw the class diagram for the
classes shown in Figure 1.11.
b. Using the notations shown in Figure 1.19, draw the class diagram for the
classes shown in Figure 1.12.References
1, The Virtual Museum of Computing, http://vimp.anuseophile.com/computing html
2, Hobbes’ Internet Timeline—the definitive ARPAnet & Internet history,
‘http://www.zakon.org/robert/internet/timeline/
3. The Silk Road Group, Ltd, A Brief History of Networking,
‘http://www silkroad.com/net-history.htm
4. Katie Hafner and Matthew Lyon. Where Wizards Stay Up Late: The Origins of the
Internet, New York, NY: Simon & Schuster, 1996,
5, Internet REC/STD/FYU/BCP, Archives, http://www. fags.org/fes/
6, Todd Campbell, “The first email message,” PRETEXT Magazine, 1998,
‘http://www pretext. com/mar98 /features/story2.htm
7. nttp://www.sun.com/jini/overview/, September 2000.
8, webopedia, ittp://webopedia. internet.com
9. Alice E. Koniges. Industrial Strength Parallel Computing. San Francisco, CA: Morgan
Kaufman Publishers, 2001.
10, SETI@home, the Search for Extraterrestrial Intelligence, http://setiathome.sst.
berkeley.edu/
11, Java Resources, http://www. Anoperation
— Execution flow
~~» Suspended period
Figure 2.8 Asynchronous send and asynchronous receive.
2.3 Timeouts and Threading
Although blocking provides the necessary synchronization for IPG, it is gener-
ally unacceptable to allow a process to be suspended indefinitely. There are two
measures to address this issue. First, timeouts may be used to set a maxitnum .
time period for blocking. Timeouts are provided by the IPC facility and may'be
specified in a program with an operation. Second, a program may spawn a child:
Process or a thread to issue a blocking operation, allowing the main thread or
Parent process of the program to proceed with other processing while the child
process or child thread is suspended. Figure 2.9 illustrates this use of a thread.
‘Timeouts are important if the execution of a synchronous operation has the
potential of resulting in indefinite blocking. For example, a blocking conmect
Tequest can result in the requesting process being suspended indefinitely if the
the requesting process will be unblocked, allowing it to resume processing.2.4 Deadlocks and Timeouts 77
@ An operation
Main thread = | —— Thread
executing
Thread
blocked
New thread issues a blocking
IPC operation
Main thread continues it
Mat earns hread is blocked
Thread is unblocked after the
‘operation is fulfilled
Figure 2.9 Using a thread for a blocking operation.
2.4 Deadlocks and Timeouts
Indefinite blocking may also be caused by a deadlock. In IPC, a deadlock can
result from operations that were issued improperly, perhaps owing to a misun-
derstanding of a protocol, or owing to programming errors. Figure 2.10 illus-
trates such a case. In process 1, a blocking receive operation is issued to receive
Process 1 Process 2
“receive from process 2" issued; — |
process 1 blocked pending data
from process 2
— “receive from process 1” issued;
@ An operation process 2 blocked pending data
— process from process 1
executing,
—~ Process
blocked
Figure 2.10 A deadiock caused by blocking operations.78 CHAPTER 2 interprocess Communications
Analog is the opposite
of digital: it refers to
something mechanical,
as opposed to some-
thing represented using
data, Analog signal pro-
cessing is a topic in net-
worki
A binary stream is a
stream of bits (0 and 1),
such as
00101...1010111.
Heterogeneous hosts
are computers that have
different hardware and
hence different repre-
sentations of data.
data from process 2. Concurrently, process 2 issues a blocking receive operation
where a send operation was intended. As a result, both processes are blocked
awaiting data sent from the other, which can never occur (since each process is
now blocked). As a result, each process will be suspended indefinitely until a
timeout occurs, or until the operating systems abort the processes.
2.5 Data Representation
At the physical layer (that is, the lowest layer, as opposed to the application
layer, which is the highest) of the network architecture, data is transmitted as
analog signals, which represent a binary stream. At the application layer, a
more complex representation of transmitted data is needed in order to support
data types and data structures provided in programming languages, such as
character strings, integers, floating point values, arrays, records, and objects.
Consider the simple case of two processes, process 1 on Host A and process 2 on.
Host B, engaged in a protocol that calls for the exchange of an integer value
determined at run time. Process 1 computes the value and issues a send opera-
tion to transmit it to process 2, which issues a receive operation to accept the
value, based on which process 2 performs further processing of the protocol.
Consider the integer value that needs to be sent by process 1. This value is rep-
resented in the integer representation of Host A, which is a 32-bit machine that
uses the “big-endian” representation for multi-byte data types. (The terms big
endian and little endian refer to which bytes are most significant in multi-byte
data types. In big-endian architectures, the leftmost bytes [those with a lower
address} are most significant. In little-endian architectures, the rightmost bytes
are most significant.)
Host B, on the other hand, is a 16-bit machine that uses the “little-endian” rep-
resentation. Suppose the value is sent as a 32-bit strearn directly from process 1's
memory storage and placed into process 2's memory location. Then (1) 16 bits
of the value sent will need to be truncated, since an integer value only occupies
16 bits on host B, and (2) the byte order of the integer representation must be
swapped in order for the value to be interpreted correctly by process 2.
As can be seen from the example, when heterogeneous hosts are involved in
IPC, it is not enough to transmit data values or structures using raw bit streams
unless the participating processes take measures to package and interpret the
data appropriately. For our example, there are three schemes for doing so:
1. Prior to issuing the send operation, process 1 converts the value of the inte-
ger to the 16-bit, little-endian data representation of process 2.
2. Process 1 sends the data ir. 32-bit, big-endian representation. Upon receiving
the data, process 2 converts it to its 16-bit, little-endian representation.
3. A third scheme is for the processes to exchange the data in an external rep-
resentation: data will be sent using this representation, and the data received
will be interpreted using the external representation and converted to the
native representation.2.6 Data Encoding 79
As another example, suppose process 1, running on host A, wishes to send a sin-
gle character a to process 2, running on host B. The program for process 1 uses
ASCII representation for characters, while the program for process 2 uses
Unicode representation. Scheme 1 will call for process 1 to convert the a to
Unicode before sending. Scheme 2 will call for process 2 to receive the data,
then convert it from ASCII representation to the corresponding Unicode repre-
sentation. Scheme 3 will call for the two sides to agree on an external represen-
tation, say ASN.1 (Abstract Syntax Notation Number 1), so that process 1 will
convert the @ to its ASN.1 representation before sending, while process 2 will
convert the data received from the ASN.1 representation to the Unicode repre-
sentation for a. (The ASN.1 data representation will be explained in section 2.6.)
Consider another case where the transmission of a data structure, such as a list
of values, is called for. In addition to the need for an external representation of
data values, there is now a need to “flatten” or serialize the data structure at the
sender's end and to unpack the data at the other end to reconstruct the data
structure,
‘The term data marshaling is used in the context of IPC to refer to the process-
ing necessary to transmit data values and data structures. Data marshaling is
needed for all IPC and includes necessary steps for conditioning the data to be
transmitted: (1) serializing the data structures, and (2) converting the data val-
ues to an external representation. Figure 2.11 illustrates the data marshaling
concept.
For network applications written in object-oriented programming languages
such as Java, an important data structure that requires special attention for data
marshaling is an object. Unlike static data structures such as arrays or records of
data, an object encapsulates both data (representing the state of an object) and
methods (representing the behavior of an object). If an object is to be transmit-
ted using IPC, it is necessary for the data marshaling (again, flattening and
encoding) to cover both the data and the representation of the methods—
including the execution state—so that an object, once un-marshalled by the
receiving process, can function as an object in the execution space of the receiv-
ing process, Because of the complexity involved, data marshaling of objects
poses a bigger challenge than data marshaling of other data structures, and has
been given a special term: object serialization [java.sun.com, 11]. In Java,
“(O)bject serialization supports the encoding of objects, and the objects reach-
able from them, into a stream of bytes; and it supports the complementary
reconstruction of the object ... from the stream” [Harold, 12].
2.6 Data Encoding
Although customized programs can be written to perform IPC using any mutu-
ally agreed upon scheme of data marshaling, general-purpose distributed appli-
cations require a universal, platform-independent scheme for encoding the
exchanged data. Hence there exist network data encoding standards.
ASCII stands for
‘American Standard
Code for Information
Interchange, an encod-
ing scheme for mapping
2 character used in the
English language to a
numeric value in the
range of 0 to 127.
Unicode is a complex
‘encoding scheme for
mapping a character,
riot limited to those
used in the English tart
‘guage, to a numeric
value in the range of 0
to 65,535, For a precise
definition of the
scheme, see
hittp://www.unicode.orgCHAPTER 2 Interprocess Communications
“This isa test” 12 | 73 [as
[ 1. Flatening of structured data items
Hosta Marshaling ——— 9. Converting data to external (network)
representation
110011 ... 10000100
[7. Converting data to internal representation
2. Rebuilding data structures
External to internal representation and vice versa
is not required
iif the two sides are of the same host type; or
1 ifthe two sides negotiate at connection,
Host 6
Figure 2.11 Data marshaling.
As illustrated in Figure 2.12, data encoding standards are available at varying
levels of abstraction.
At the simplest level, an encoding scheme such as External Data
Representation (XDR) ietf.org, 1] allows a selected set of programming data
types and data structures to be specified with IPC operations. The data mar-
shaling and unmarshaling are performed automatically by the IPC facilities on
the two ends, transparent to the programmer.
At a higher level of abstraction (that is, detail hiding; more on this concept in
the next chapter), standards such as ASN.1 (Abstract Syntax Notation Number
1) {oss.com, 2] exist. ASN.1 is an Open Systems Interconnection (OSI) standard
that specifies a transfer syntax for representing network data. The standard cov-
ers a wide range of data structures (such as sets and sequences) and data types
(such as integer, Boolean, and characters) and supports the concept of data tag-
ging. Each data item transmitted is encoded using syntax that specifies its type,
its length, its value, and optionally a tag to identify a specific way of interpret
ing the syntax.
At an even higher level of abstraction, the Extensible Markup Language
(XML) [w3.org, 9] has emerged as a data description language for data sharing2.7 Text-Based Protocols 81
Level of Abstraction Data Encoding Schemes Sample Standards
Application-specific data encoding language XML (Extensible Markup Language)
General data encoding language ‘ASN.1 (Abstract Syntax Notation)
Network data encoding standard ‘Sun XDR (Extemal Data Representation)
Figure 2.12 Network data representation standards.
among applications, primarily Internet applications, using syntax similar to the
Hypertext Markup Language (HTML), which is the language used for compos-
ing Web pages. XML goes one step beyond ASN.1 in that it allows a user to use
customized tags (such as the tags , , and in the example in
Figure 2.13) to specify a unit of data content. XML can be used to facilitate data
interchange among heterogeneous systems, to segregate the data content of a
‘Web page (written in XML) from the display syntax (written in HTML), and to
allow the data to be shared among applications. Since its introduction in 1998,
XML has gained considerable attention and is now widely employed in com-
puter applications.
MaryJ@BigU.edu
Interprocess Communications
IPC is the backbone of distributed computing ...
Figure 2.13 A sample XML file.
2.7 Text-Based Protocols
Data marshaling is at its simplest when the data exchanged is a stream of char-
acters or text encoded using a representation such as ASCII. Exchanging data in
text has the additional advantage that the data can be easily parsed in a program
and displayed for human perusal. Hence it is a popular practice for protocols to
exchange requests and responses in the form of character strings. Such proto-
cols are said to be text-based, Many popular network protocols, including FTP
ile Transfer Protocol), HTTP, and SMTP (Simple Mail Transfer Protocol), are
text-based. You will have a chance to investigate and experiment with these pro-
tocols in the exercises at the end of this chapter, and we will study some of these
protocols in detail in subsequent chapters.82
CHAPTER 2 Interprocess Communications
2.8 Request-Response Protocols
An important type of protocol is the request-response protocol. In this proto-
col, one side issues a request and awaits a response from the other side.
Subsequently, another request may be issued, which in turn elicits another
response. The protocol proceeds in an iteration of request-response, until the
desired task is completed. The popular network protocols FTP, HTTP, and SMTP
are all request-response protocols.
2.9 Event Diagram and Sequence Diagram
An event diagram, introduced in Section 2.2, is a diagram that can be used to
document the detailed sequence of events and blocking during the execution of
a protocol. Figure 2.14 is an event diagram for a request-response protocol
involving two concurrent processes, A and B. The execution of each process
with respect to time is represented using a vertical line, with time increasing
downward. A solid line interval along the execution line represents a time
period during which the process is active. A broken line interval represents
when the process is blocked. In the example, both processes are initially active.
Process B issues a blocked receive operation in anticipation of request 1 from
Process B
Process A Time
t request 1
response 1
°
request 2
> interprocess communication
——— Execution flow
response 2
Process blocked —
Figure 2.14 An event diagram.2.9 Event Diagram and Sequence Diagram
process A. Process A meanwhile issues the awaited request 1 using a nonblock-
ing send operation, then subsequently a blocking receive operation in ant
tion of process B's response. The arrival of request 1 reactivates process B, which
processes the request before issuing a send operation to transmit response 1 to
process A. Process B then issues a blocking receive for request 2 from process A.
‘The arrival of response 1 unblocks process A, which resumes execution to work
on the response and to issue request 2, which unblocks Process B. A similar
sequence of events follows.
Note that each round of request-response entails two pairs of send and receive
‘operations to exchange two messages. The protocol can extend to any number
of rounds of exchange using this pattern.
Note also that it is essential that the programs implementing the protocol must
be written to issue the send and receive operations in the order prescribed,
otherwise one or both of the participating processes may wait for a request or a
response that never arrives, and the processes may become blocked indefinitely.
Figure 2.15 uses an event diagram to describe basic HYIP. In its basic form,
HTTP isa text-based, request-response protocol that calls for only one round of
exchange of messages. A Web server process is a process that constantly listens
for incoming requests from Web browser processes. A Web browser process
makes a connection to the server, then issues a request in a format dictated by
the protocol. The server processes the request and dispatches a response com-
posed of a status line, header information, and the document requested by the
browser process. Upon receiving the response, the browser process parses the
response and displays the document. (We will study the client-server model and
HTTP in further detail in Chapter 5.)
Web Server Web Browser
Request is a message in three parts
eau —
— an optional header
— optional data for CGI data using post method
response Response is a message consisting of three parts:
— a status line of the format
— header information, which may span several lines
— the document itself
Figure 2.15 Event diagram for an HTTP session.84 — CHAPTER 2 Interprocess Communications
An event diagram is a useful device for illustrating the synchronization of
events. It is, however, too detailed for documenting complex protocols. A sim-
plified form of diagram, known as a sequence diagram and part of the UML
notations, is more commonly used to document interprocess communications.
In a sequence diagram, the execution flow of each participant of a protocol is
represented as a dashed line and does not differentiate between the states of
blocked and executing. Each message exchanged between the two sides is
shown using a directed line between the two dashed lines, with a descriptive
label above the directed line, as illustrated in Figure 2.16.
‘The sequence diagram for the basic HTTP is shown in Figure 2.17.
Process A Process B
Figure 2.16 A sequence diagram.
Web Browser Web Server
HTTP request
eae
1 HTTP response j
oon"
Figure 2.17 The sequence diagram for HTTP.2.10 Connection-Oriented versus Connectionless IPC 85.
Figure 2.18 captures the text of the messages exchanged during a sample HTTP.
session. Using a telnet client (telnet is a protocol normally used for a terminal
session to a remote machine), itis possible to make a connection to a Web server
process and enter the text of an HTIP request by hand. (Using telnet to com-
municate with a process in the way described here allows you to experiment
with IPC without having to write a program; be aware that it is not the normal
way to interact with such a process. In subsequent chapters we will learn about
how to use programming to do the same thing.) In this case, the Web server
process runs on port 80 of the host named www.csc.calpoly.cdu. The request
GET /-mliu/ HTTP/1.0 is keyed in. The response from the Web server process
then follows. In Chapter 9 we will study the meanings of the requests and the
responses when we explore HTTP in detail.
Script started on Tue Oct 10 21:49:28 2000
9:49pm telnet wew.csc.calpoly.edu 80
‘Trying 129.65.241.20...
Connected to tiedye2-srv.csc.calpoly.edu.
Escape character is ‘~)'.
GET /-mliu/ HTTP/1.0 <———__—______——_ HTTP request
ETTP/1.1 200 OK
Date: Wed, 11 Oct 2000 04:51:18 cur <<——~——_ HTTP response status line
Server: Apache/1.3.9 (Unix) ApacheJServ/1.0~«—— HTTP response header
Last-Modified: Tue, 10 Oct 2000 16:51:54 GuT
ETag: “Iddle~e27-39e3492a"
Accept-Ranges: bytes
Content-Length: 3623
Connection: close
Content-Type: text/html
]
|
Mei-Ling L. Liu's Home Page |
——— Document content
Figure 2.18 The dialog during an HTTP session.
2.10 Connection-Oriented versus Connectionless IPC
In Chapter 1 we introduced the distinction between connection-oriented and
connectionless communication. We can now apply that distinction to IPC.
Using a connection-oriented IPC facility, two processes establish a connection
(which, as a reminder, may be logical—that is, implemented in software—rather
than physical), then exchange data by inserting data to and extracting data86 CHAPTER 2 Interprocess Communications
Serial data transfer
refers to transmitting
data one bit at a time.
The opposite of serial
data Lransfer is parallel
data transfer, in which
several bits are transmit-
ted concurrently.
Socket is a term bor-
rowed from the early
days of telephone com-
munications, when an
operator had to manu-
from the connection. Once a connection is established, there is no need to iden-
tify the sender or the receiver,
Using a connectionless IPC facility, data is exchanged in independent packets,
cach of which needs to be addressed specifically to the receiver.
When we study the socket APL in Chapter 4, we will look at how connection-
oriented and connectionless IPCs are provided at the application layer.
2.11 The Evolution of Paradigms for Interprocess
Communications
Now that we have explored the concept of IPC, we will next look at the differ-
ent models, or paradigms, through which IPC can be provided to a programmer
who wishes to make use of IPC in a program. Earlier in this chapter we have
seen that data encoding schemes exist at different levels of abstraction, The
same can be said of paradigms for IPC, as illustrated in Figure 2.19.
At the least abstract, IPC involves the transmission of a binary stream over a
connection, using low-level serial or parallel data transfer. This IPC paradigm
may be appropriate for network driver software, for instance. IPC of this form
falls in the realm of network or operating system programming and will not be
covered in this book.
At the next level is a well-known paradigm called the socket application pro-
gram interface (the socket API). Using the socket paradigm, two processes
exchange data using a logical construct called a socket, one of which Is estab-
lished at either end. Data to be sent is written to the socket. At the other end, a
receiving process reads or extracts data from its socket. We will study the socket
API in the Java language in the next chapter.
ally establish aconnec- The remote procedure call or remote method invocation paradigm provides
tion for two parties by further abstraction by allowing a process }o make procedure calls or method
inserting the two ends invocations to a remote process, with data transferred between the two
of a cable into the cor- processes as arguments and return values. We will study one implementation of
rect sockets, this paradigm, the Java remote method invocation, in Chapter 8.
Level of
Abstraction IPC Paradigms Example IPC Implementations
Remote Procedure Call (RPC),
Remote procedure/method Java Remote Method Invocation (RMI)
Socket API UNIX socket API, Winsock
Data transmission Serial/paraiiel communication
Figure 2.19 IPC paradigms.Summary
Interprocess communications (IPC) form the backbone of distributed comput-
ing. In this chapter we have looked at the principles of IPC, including the fol-
lowing:
& Interprocess communications (IPC): the ability for separate, independent
processes to communicate among themselves to collaborate on a task. When
‘communication is from one process to a single other process, the IPC is said
to be a unicast. When communication is from one process to a group of
processes, the IPC is said to be a multicast.
© A basic API that facilitates IPC must provide the following:
© Primitive operations: send, receive, connect, an¢ disconnect.
® Event synchronization, which allows the processes involved to execute
independently, with no knowledge of what takes place at the other end.
‘The simplest way for an IPC facility to provide for event synchronization
is by using blocking. Operations that are blocking are also called syn-
chronous operations, while operations that are nonblocking are also
called asynchronous operations. Deadlocks can result from blocking
operations. Threading or process forking can be used by a program to
perform separate tasks while awaiting the fulfillment of a blocked opera-
tion.
© Data marshaling, which includes these steps needed to condition the
data to be transmitted: (i) serializing the data structures, and (ii) convert-
ing the data values to an external or network data representation,
® Different network data representation schemes exist at different levels of
abstraction. Some well-known schemes are the Sun XDR (External Data
Representation), ASN.1 (Abstract Syntax Notation Number 1), and the XML
(Extensible Markup Language).
= Data marshaling is at its simplest when the data exchanged is a stream of
characters or text encoded using a representation such as ASCII. Protocols
that use text encoding are called text-based protocols.
= Request-response protocols are protocols that proceed in an iteration of
request-response until the desired tasks are completed.
© An event diagram can be used to document the detailed sequence of events
and blocking in a protocol. A solid interval along the execution line repre-
sents a time period during which the process is active. A broken line interval
represents when the process is blocked.
© A sequence diagram is part of the UML notations and is used to document
complex interprocess communications. In a sequence diagram, the execution
flow of each participant of a protocol is represented as a dashed line and does
not differentiate between the states of blocking and executing.
Summary 8788 © CHAPTER 2 Interprocess Communications
© IPC facilities can be connection-oriented or connectionless:
© Using a connection-oriented IPC facility, two processes establish a logical
connection, then exchange data by inserting data to and extracting data
from the connection. Once a connection is established, there is no need
to identify the sender or the receiver.
© Using a connectionless IPC facility, data is exchanged in independent
packets, each of which needs to be addressed specifically to the receiver.
® IPC facilities can be classified according to their levels of abstraction, ranging
from serial or parallel data transfer at the lowest level, to socket API at the
next level, to remote procedure or method call at the highest level.Exercises
1. Consider interhuman communications.
a. Classify each of the following scenarios in terms of unicast or multicast:
i. A student speaking to a friend on a wireless phone
ii, An executive speaking on a conference phone with managers in differ-
ent cities
ili, A teacher lecturing in a classroom
iv. A child playing with another child using a “walkie-talkie”
v. The president addressing the nation on television
b. How are event synchronization and data representation handled during a
session of face-to-face conversation, such as when you speak to someone
seated next to you?
¢. How are event synchronization and data representation handled during a
session of remote conversation, such as when you speak to someone over
the phone?
d. How are event synchronization and data representation handled during a
meeting of two heads of nations who speak different languages?
2. Process A sends a single message to process B using connectionless IPC. To do
$0, A issues a send operation (specifying the message as an argument) some-
time during its execution, and B issues a receive operation. Suppose the send
operation is asynchronous (nonblocking) and the receive operation is syn-
chronous (blocking). Draw an event diagram (not a sequence diagram) for
each of the following scenarios:
a. Process A issues its send operation prior to process B issuing its receive
operation.
b. Process B issues its receive operation prior to process A issuing its send
operation.
3. Repeat the last question. This time both operations (send, receive) are blocking,
4. Consider the following interprocess communication API:
Using this API, messages are sent to and received from mailboxes. A
can communicate with another process using a mailbox shared by the two
processes. For example, if process A wishes to communicate with processes B
and G, it has to share mailbox 1 with B, and another mailbox, mailbox 2,
with C. Messages between A and B are deposited into and retrieved from
mailbox 1, while messages between B and C are deposited into and retrieved
from mailbox 2. See Figure 2.20.
The send and receive operations are defined as follows:
| send (n, message): send a message to mailbox n, blocking (that is, the
sender will be suspended indefinitely until a response arrives in the shared
mailbox)90 CHAPTER 2 Interprocess Communications
® receive (n, message): examines mailbox n in anticipation of receiving a
message; this is a blocking operation, meaning that the receiving process
will be suspended until the message arrives in the named mailbox
A process blocked waiting for a message coming to one mailbox will not be
able to receive any message arriving at any other mailbox.
a. Suppose a process P expects to receive two messages, one from mailbox 1
and one from mailbox 2. It is not known ahead of time which message will
arrive first. What sequence of send and receive, if any, can it execute to
make sure that process P does not block forever?
b. What sequence of send and receive, if any, should process P execute if it
wants to wait for a message either from mailbox 1 or from mailbox 2 (or
from both)? Again, it is not known ahead of time which message will
arrive first. Also, your sequence should not cause indefinite blocking.
(Note: Your answer should use the given operations only; you should not use
threading or other operating system support.)
Process A Process B
Mailbox 1
> Message flow
Mailbox 2
Figure 2.20 An IPC application program interface for Exercise 4.
5. Is it possible for a deadlock to occur during interprocess communications
(involving send/receive operations)
a. on a communications system that provides a blocking send operation and
a blocking receive operation?
b. on a communications system that provides a nonblocking send operation
and a blocking receive operation?Justify your answers. If the answer is yes, give an example. If the answer is no,
give a brief argument.
. Consider the provision of an API for multicasting built on an existing unicast
API. The unicast API provides send and receive operations between two
processes. The multicast API provides operations for (1) sending to a group of
processes, and (2) receiving from a multicasting process. Describe how you
would provide the multicast operations using the unicast operations only.
(Note: Your answer should use the given operations only; you should not use
threading or other operating system support.)
. Ina distributed system, three processes P1, P2, and P3 are engaged in inter-
process communication. Suppose the following sequence of events occurred:
At time 1, P3 issues a receive from P2.
At time 2, P1 sends mi to P2.
At time 3, P2 issues a receive from P1.
At time 4, P2 receives ml.
At time 5, P2 sends message m1 to P3.
At time 6, P3 receives ml; P1 issues a receive from P2.
At time 7, P2 issues a receive from P3.
At time 8, P3 sends m2 to P2.
At time 9, P2 receives m2.
At time 10, P2 sends m2 to P1.
At time 11, PI receives m2.
a. For each of the following scenarios, draw an event diagram to show the
sequence of events and the blocking and unblocking of each process:
i, ona communication system that provides blocking for the send opera-
tions and blocking receive operations,
fi, on a communication system that provides nonblocking send opera-
tions and blocking receive operations.
b. Draw a sequence diagram to document the interprocess communication
between Pi, P2, and P3.
. This is an exercise on data marshaling,
a. In the context of IPC:
i. What is meant by data marshaling? There are two components to data
marshaling; name and describe each. Why is data marshaling necessary?
i, What is meant by object serialization?
iii, How do the two components of data marshaling apply to (i) an array
of integers, and (ii) an object? Describe in general terms what needs to
be done with the data.
b. Process A sends to process B a single data item, a date. Process A uses the
American date format: // (for example, 01/31/2001).
Process B uses the European date format: /cmonth>/ (for exam-
ple, 31/01/2001).
192 CHAPTER 2 Interprocess Communications
°
10.
uu.
12.
13.
i, Suppose no external data representation has been agreed upon.
a. How can A send the date to B so that A does not have to do any
conversion?
b. How can A send the date to B so that B does not have to do any
conversion?
Ji, Suppose the same date has to be communicated to process C, which
uses a date format of -- (for example, 2001-01-
31).
How can A send the date to both B and C:so that A does not have
to do any conversion?
ili, Describe an external representation of the date so that any sending
process may convert a date of its local representation to the external
representation prior to sending, and any receiving process may con-
vert the date received from this representation to its native represen-
tation.
It may be of interest for you to read reference [saqqara.demon.co.uk, 10].
Use telnet to interact with a Daytime [RFC 867, 4] server process on a
machine that you have access to. Daytime server processes reside on port 13
of an IP host. From a console screen on a command prompt screen, enter
telnet 13
Example: telnet somehost.sonel.edu 13
Provide a script of the session and describe your observations.
Draw a sequence diagram for the Daytime protocol.
Is it possible for a Daytime client to be blocked indefinitely? Explain.
Use telnet to interact with an echo [RFC 862, 6] server process on a machine
to which you have access. By default, Echo server processes reside on port 7
of an IP host.
a. Draw a time event diagram for the echo protocol.
b.Is it possible for an echo client to be blocked indefinitely? Explain.
Consider the FTP (File Transfer Protocol) [RFC 959, 5].
Note that this protocol uses two connections: one for transmitting
requests and responses, the other for transmitting data of the files being
sent/received.
a. Use telnet to connect to an FIP server that you have access to. Then issue
a command to list the contents of the root directory on the server.
b. Use an event diagram to describe the interactions among the participat-
ing processes.
c. What is the format of each request?
4d. What is the format of each response?
e, Consider the MODE command of the protocol: it allows a client to spec-
ify what type of file (text or binary) is to be transferred. What is the data
representation for different modes of files?Exercises 93
14. Consider the Simple Mail Transfer Protocol (SMTP) (RFC 821, 3]. An excerpt
from the REC for this protocol provides the following sample session:
Ri 220 USC-ISI.ARPA Simple Mail Transfer Service Ready
St HELO LBL-UNIX.ARPA
Rr 250 USC-ISI.ARPA
St MAIL PROM:
Rr 250 OK
St RCPT TO:
R: OK
St DATA
Rr 354 Start mail input; end with .
S: Blah blah blah.
St ...ete. ete.
st.
Rr 250 OK
Ss: QuiT
Rr 221 USC-ISI.ARPA Service closing transmission channel
a, Use a sequence diagram to describe the interactions among the partici-
pating processes.
b. What is the format of each request?
c. What is the format of each response?
d.Use telnet to connect to a system on which you have an SMTP email
account, and then send yourself an email. Log onto the system and check
that the email indeed arrived.
Provide a script of the session and describe your observations.
References
(Note: All Requests for Comments [RFCs] can be browsed online at this archival
site: IETF RFC Page, http://www.ietf.org/rfc.htm!)
1. RFC 1014, External Data Representation.
2. “ASN.1 Overview,” http://www.oss.com/asn I/overview htm!
3. RFC 821, SMTP.
4. RFC 867, Daytime Protocol.
5. RFC 959, FTP Protocol.
6. RFC 862, Echo Protocol.
7. John Shapley Gray. Interprocess Communications in UNIX. Upper Saddle River, NJ:
Prentice Hall, 1997.
8. RFC 742, Finger protocol.
9. Extensible Markup Language (XML), http://www.w3.0rg/XML/
10. International Date Format Campaign, http://www.saqgara.demon.co.uk/datefmt.htm
11. Java Object Serialization, http://java.sun.com/}2se/1.3/docs/guide/serialization/
12: Elliotte Rusty Harold. Java 1/0, Sebastopol, CA: O’Rellly Press, 1999.CHAPTER
Distributed Computing
Paradigms
Distributed computing is one of the most vibrant areas in computer science. There
is an ongoing evolution of new technologies for supporting network applications,
bringing with them new conceptual models and terminologies. Seemingly a new
buzzword, another acronym, or yet one more groundbreaking technology surfaces
everyday. To a casual observer or a beginning student, sorting out the terminolo-
gies and technologies proves a daunting task.
This chapter presents a classification of the various paradigms for distributed
applications, as well as an introduction to some of the existing well-known tools
and protocols based on these paradigms. In subsequent chapters we will explore
some of the paradigms, tools, and protocols in detail.
3.1 Paradigms and Abstraction
‘The terms paradigms and abstraction have already been used in previous chap-
ters. Here we will examine them closely.
Abstraction
Arguably the most fundamental concept in computer science, abstraction is the
idea of encapsulation, or detail hiding. To quote David J. Barnes (Barnes, 1]:
9596 CHAPTER 3 Distributed Computing Paradigms
We often use abstraction when it is not necessary to know the exact details of
how something works or is represented, because we can still make use of it in its
simplified form. Getting involved with the detail often tends to obscure what we
are trying to understand, rather than illuminate it. . . . Abstraction plays a very
important role in programming because we often want to model, in software,
simplified versions of things that exist in the real world . . . without having to
build the real things.
In software engineering, abstraction is realized with the provision of tools or
facilities that allow software to be built without the developer having to be cog-
nizant of some of the underlying complexities. It is not an overstatement to say
that the tools for abstraction are the force behind modern-day software, and they
exist in every aspect of application development. As examples, we make use of
compilers to abstract the detail of the machine languages, and Java programmers
use the Abstract Window Toolkit (AWT) to rapidly develop graphic displays.
In the area of distributed applications, there has been an explosion of tools and
facilities based on a wide variety of paradigms that offer varying degrees of
abstraction.
Paradigms
Webster's Dictionary defines the word paradigm as “a pattern, example, or
model.” In the study of any subject of great complexity, it is useful to identify
the basic patterns or models and classify the details according to these models.
‘This chapter aims to present a classification of the paradigms for distributed
applications. The paradigms will be presented in the order of their level of
abstraction, as shown in Figure 3.1. At the lowest level of abstraction is message
passing, which encapsulates the least amount of detail. Object space occupies
the other extreme of the spectrum, as it is the most abstract of all the paradigms.
Level of Abstraction
High
‘Object space, collaborative applications
Network services, object request broker, mobile agent.
Remote procedure call, remote method invocation
Client-server, peer-to-peer
Message passing
Low
Figure 3.1 Distributed computing paradigms and their level of abstraction.3.3 Paradigms for Distributed Applications
3.2 An Example Application
Throughout the discussion that follows, a common application will be used to
illustrate how each paradigm may be applied.
‘The example application is an online auctioning system. (Note that the imple-
mentations described in this chapter intentionally skip the details (such as user
interface and data storage] for an actual application. The example implementa-
tions are meant to serve as a common thread in this chapter. Using these
implementations, you may compare and contrast the differences and the effects
of the abstractions provided by the different paradigms.) We will simplify the
system to one that handles only one auctioned item per session. During each
auctioning session, an item is open for bids placed by the auction participants.
At the end of a session, the auctioneer announces the outcome. In the descrip-
tion of the various implementations, we will confine our attention to the dis-
tributed computing aspects of the service layer (that is, within a three-layered
architecture of distributed applications, the layer that provides the underlying
services in support of the upper layers) of the application’s architecture.
3.3 Paradigms for Distributed Applications
Message Passing
‘The basic approach to interprocess communications is message passing. In this
paradigm, data representing messages are exchanged between two processes, a
sender and a receiver.
Message passing is the most fundamental paradigm for distributed applications.
A process sends a message representing a request. The message is delivered to a
receiver, which processes the request and sends a message in response. In turn,
the reply may trigger a further request, which leads to a subsequent reply, and
so forth. Figure 3.2 illustrates the message-passing paradigm.
‘The basic operations required to support the message-passing paradigm are send
and receive. For connection-oriented communication, the operations connect
and disconnect are also required. (The details of operations, such as arguments
and return values, will be given with specific tools or facilities in later chapters.)
With the abstraction provided by this model, the interconnected processes per-
form input and output to each other, in a manner similar to file input and out-
put (1/0). As with file I/O, the operations serve to encapsulate the detail of
network communication at the operating system level, so that a programmer
may make use of the operations to send and receive message without having to
deal with the underlying detail
‘The socket application program interface (which will be studied in Chapter 4)
is based on this paradigm. Using a socket, a logical construct, two processes may
exchange data as follows: A sender writes or inserts a message into the socket;
at the other end, a receiver reads or extracts a message from the socket.
7Distributed Computing
Principles and Applications
M. L. Liu
California Polytechnic State University,
Ren etiene str
PS NT om Oe Le Me ema ee ee ele eel ee ed)
Pena ek atch ecu hCcune Coma scien noe ited
Pucca ue yee eel ee cn enue eon
The book covers computing paradigms, protocols, and application program interfaces (APIs),
Rotts mun seach uM Eee Cun LU) ecuumentemacc tract
bole Cg eel cet eee aM cule Blu oun ec uretR Ue Olea eo)
Gateway Interface (CGI), and Simple Object Access Protocol (SOAP). Each chapter
introduces a paradigm and/or protocol and then presents the use of an API that illustrates the
ete Teen cl meee Coca mek eee le eee)
Pen ee
Cee gee aU es RCE Nace aes ult Ch acer eure n cree tee Cet)
Dae ale eee) ee Mee ea eee
a |