Module 3 - Distributed Objects & Remote Invocation
Lecture Notes by Steve Donald and Noura Limam Modified by Maxwell Young
Last Class
- Defining a "protocol"
- OSI model (layered protocols), message format for protocol stack
- Connection-less and connection-oriented protocols
- Discussion of TCP, UDP
- Criteria for applications: data loss, bandwidth, timing
- Interface between Transport and Application layers (sockets)
- Hardware infrastructure of the Internet
- Circuit switching vs. packet switching, delay
Course Modules
- Module 1: Architectures/Models of Distributed Systems
- Module 2: Overview of Computer Networks
- Module 3: Distributed Objects and Remote Invocation
- Module 4: Distributed Naming
- Module 5: Distributed File Systems
- Module 6: Synchronization
- Module 7: Data Replication
- Module 8: Fault Tolerance
- Module 9: Security
CS454/654
Remote Procedure Call (RPC)
- Extension of the conventional procedure call model
- Allows client programs to call procedures in server programs
Remote Method Invocation (RMI)
- Object-oriented
- Extension of local method invocation
- Allows an object in one process to invoke the methods of an object in another process
Message-based communication
- Allows sender and receiver to communicate without blocking and without the receiver having to be running
Message-Based Communication
- This is the responsibility of the networks and the request-reply protocol
- Interfaces: processes have to be able to understand what each other is sending
Interface
Defines what each module can do in a distributed system
- Service interface (in RPC model): specification of procedures offered by a server
- In the distributed object model: specification of methods of an object that can be invoked by objects in other processes
- Extension of the familiar procedure call semantics to distributed systems (with a few twists)
- Allows programs to call procedures on other machines
- Process on machine A calls a procedure on machine B:
  - The calling process on A is suspended, and execution of the called procedure takes place on B
  - Information can be transported from the caller to the callee in the parameters, and information can come back in the procedure result
  - No message passing or I/O at all is visible to the programmer
- Subtle problems:
  - Calling and called procedures execute in different address spaces
  - Parameters and results have to be passed, sometimes between different machines
  - Both machines can crash
The read routine is extracted from the library by the linker and inserted into the object program. It puts the parameters in registers and issues a READ system call by trapping to the kernel.
From Tanenbaum and van Steen, Distributed Systems: Principles and Paradigms Prentice-Hall, Inc. 2002
RPC Operation
- Objective: make interprocess communication among remote processes similar to local ones (i.e., make distribution of processes transparent)
- RPC achieves its transparency in a way analogous to the read operation in a traditional single-processor system
- When read is a remote procedure (that will run on the file server's machine), a different version of read, called the client stub, is put into the library
- Like the original read, the client stub is called using the calling sequence described on the previous slide
- Also like the original read, the client stub traps to the kernel
- Unlike the original read, it does not put the parameters in registers and ask the kernel to give it data
- Instead, it packs the parameters into a message and asks the kernel to send the message to the server
- Following the call to send, the client stub calls receive, blocking itself until the reply comes back
- When the reply returns, the stub extracts the results and returns to the caller
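[Figure: steps of a remote procedure call. The client procedure calls the client stub (1), which hands the message to the local communication layer (OS) (2); call packet(s) travel to the server (3) and reach the server stub (4), which calls the server procedure (5). The result returns along the same path (6, 7), as result packet(s) (8), back to the client.]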
1. Client procedure calls the stub as a local call
2. Client stub builds the message and calls the local OS
3. Client OS sends the message to the server OS
4. Server OS hands over the message to the server stub
5. Server stub unpacks the parameters and calls the server procedure
6. Server procedure does its work and returns results to the server stub
7. Server stub packs the results in a message and calls its local OS
8. Server OS sends the message to the client OS
9. Client OS hands over the message to the client stub
10. Client stub unpacks the result and returns from the procedure call
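The ten steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a local socket pair stands in for the two operating systems and the network, JSON with a 4-byte length prefix is an arbitrary message format, and all names (send_msg, recv_msg, server_stub, client_stub_add, demo) are hypothetical.

```python
import json
import socket
import struct
import threading


def send_msg(sock, payload):
    """Send one message: 4-byte big-endian length, then the payload bytes."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, since recv() may return fewer than requested."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data += chunk
    return data


def recv_msg(sock):
    """Receive one length-prefixed message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def add(a, b):
    """Server procedure (steps 5-6): ordinary local code."""
    return a + b


PROCEDURES = {"add": add}


def server_stub(sock):
    """Steps 4-7: unpack the request, call the procedure, pack the reply."""
    request = json.loads(recv_msg(sock))
    result = PROCEDURES[request["proc"]](*request["args"])
    send_msg(sock, json.dumps({"result": result}).encode())


def client_stub_add(sock, a, b):
    """Steps 1-2 and 9-10: to the caller this looks like a local add()."""
    send_msg(sock, json.dumps({"proc": "add", "args": [a, b]}).encode())
    reply = json.loads(recv_msg(sock))  # blocks until the reply arrives
    return reply["result"]


def demo():
    """Run one call with the server stub in a separate thread."""
    client_sock, server_sock = socket.socketpair()
    worker = threading.Thread(target=server_stub, args=(server_sock,))
    worker.start()
    try:
        return client_stub_add(client_sock, 4, 7)
    finally:
        worker.join()
        client_sock.close()
        server_sock.close()
```

Calling demo() returns the sum computed by the server thread; the caller of client_stub_add never sees a send or receive.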
Parameter Passing
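Passing value parameters
- Compose messages that consist of structures for values
- Problem: multiple machine types in a distributed system, and each often has its own representation for data
- Solution: agree on a data representation
Passing reference parameters
- Difficult
- Use copy/restore
[Figure: stubs; a message carrying the procedure name sum and the parameters 4 and 7]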
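Copy/restore can be illustrated with a short Python sketch. The function names are hypothetical, and a direct function call stands in for the network round trip: the stub copies the array into the request, the server works on its own copy, and the stub writes the returned values back into the caller's array.

```python
def server_increment_all(values):
    """Server procedure: modifies its own private copy of the array."""
    for i in range(len(values)):
        values[i] += 1
    return values


def client_stub_increment_all(buffer):
    """Hypothetical client stub using copy/restore for a reference parameter."""
    message = list(buffer)                        # copy: array is sent by value
    reply = server_increment_all(list(message))   # stands in for the round trip
    buffer[:] = reply                             # restore: overwrite caller's array


data = [1, 2, 3]
client_stub_increment_all(data)
# data now holds the values computed on the "server" side
```

The effect for the caller is the same as call-by-reference in this simple case, although the two can differ when the same variable is passed twice or is modified concurrently.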
Data Representation
- Not a problem when processes are on the same machine
- Different machines have different representations
- Examples:
  - Characters: IBM mainframes use the EBCDIC character code and IBM PCs use ASCII. A character cannot be passed from an IBM PC client to an IBM mainframe server without conversion.
  - Integers: different computers may store the bytes of an integer in a different order in memory (e.g., little endian, big endian), called the host byte order
[Figure: byte-by-byte memory layout of the same data on a little-endian machine (80x86, Intel 486) and on a big-endian machine (IBM 370, Sun SPARC)]
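The effect of byte order and character code can be checked with Python's standard struct module and its built-in codecs. This is a small sketch; the value 1 and the letter J are arbitrary choices, and cp037 is used here as one EBCDIC code page.

```python
import struct

value = 1

# The same 32-bit integer laid out in the two byte orders.
little = struct.pack("<i", value)   # least significant byte first
big = struct.pack(">i", value)      # most significant byte first

# A big-endian receiver that reads little-endian bytes without conversion
# sees a completely different number (2**24 instead of 1).
misread = struct.unpack(">i", little)[0]

# The same character has different codes in ASCII and in EBCDIC.
ascii_j = "J".encode("ascii")
ebcdic_j = "J".encode("cp037")
```

Without an agreed representation, the receiver has no way to tell that the bytes it read mean 1 rather than 16777216.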
- Having the type information of parameters makes it possible to make the necessary conversions
- This information is available, since the items in the message correspond to the procedure identifier and the parameters
- Once a standard has been agreed upon for representing each of the basic data types, given a parameter list and a message, it is possible to deduce which bytes belong to which parameter, and thus to solve the problem
- Given these rules, the requesting process knows that it must use this format, and the receiving process knows that the incoming message will have this format
Option 1: Devise a canonical form for data types and require all senders to convert their internal representation to this form while marshalling.
- Example: use ASCII for characters, 0 (false) and 1 (true) for Booleans, etc., with everything stored in little endian
- Consequence: the receiver no longer needs to worry about the byte ordering employed by the sender
- Sometimes inefficient: if both processes use big endian, there is an unnecessary conversion at both ends
Option 2: Sender uses its own native format and indicates in the first byte of the message which format is used. The receiver stub examines the first byte to see what the client is. Everyone can convert from everyone else's format.
- Everyone has to know everyone else's format
- Insertion of a new machine with another format is expensive (luckily, this does not happen too often)
- CORBA uses this approach
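The two options can be compared in a minimal Python sketch. The tag values LITTLE and BIG, the function names, and the restriction to lists of 32-bit integers are assumptions made for illustration; real systems define richer encodings.

```python
import struct

LITTLE, BIG = 0, 1                    # hypothetical one-byte format tags
PREFIX = {LITTLE: "<", BIG: ">"}      # struct byte-order prefixes


def marshal_canonical(values):
    """Option 1: every sender converts to one canonical order (little endian)."""
    return struct.pack(f"<{len(values)}i", *values)


def unmarshal_canonical(message):
    """Option 1 receiver: always decodes little endian."""
    count = len(message) // 4
    return list(struct.unpack(f"<{count}i", message))


def marshal_native(values, sender_order):
    """Option 2: keep the sender's own order and announce it in the first byte."""
    body = struct.pack(f"{PREFIX[sender_order]}{len(values)}i", *values)
    return bytes([sender_order]) + body


def unmarshal_tagged(message):
    """Option 2 receiver: read the tag, then convert only if needed."""
    order = message[0]
    count = (len(message) - 1) // 4
    return list(struct.unpack(f"{PREFIX[order]}{count}i", message[1:]))
```

With Option 2, two big-endian machines exchange data without any byte swapping, at the price of every receiver having to understand every tag.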
First grad asks: How have you been?
Second: Great! I got married and I have three daughters.
First: How old are they?
Second: The product of their ages is 72, and the sum of their ages is the same as the number on that building over there.
First: Right, ok... oh wait... hmm, I still don't know.
Second: Oh sorry, the oldest one just started to play the piano.
First: Wonderful! My oldest is the same age!
Failures that can occur in an RPC system:
1. The client is unable to locate the server
2. The request message from the client to the server is lost
3. The reply message from the server to the client is lost
4. The server crashes after receiving a request
5. The client crashes after sending a request
Client cannot locate the server
Examples:
- Server might be down
- Server evolves (new version of the interface installed and new stubs generated) while the client is compiled with an older version of the client stub
Possible solutions:
- Use a special code, such as -1, as the return value of the procedure to indicate failure. In Unix, add a new error type and assign the corresponding value to the global variable errno.
  - Problem: -1 can be a legal value to be returned, e.g., sum(7, -8)
- Have the error raise an exception (like in Ada) or a signal (like in C).
  - Problem: not every language has exceptions/signals (e.g., Pascal)
  - Problem: writing an exception/signal handler destroys the transparency
Lost request message
- If the timer expires before a reply or ACK comes back: the kernel retransmits
- If the message was truly lost: the server cannot differentiate between the original and the retransmission, so everything works fine
- If many requests are lost: the kernel gives up and falsely concludes that the server is down, so we are back to "Cannot locate server"
Lost reply message
- Problem: the client cannot be sure why there is no reply (reply lost, request lost, or server slow)
- Problem: what if the request is not idempotent, e.g., a money transfer?
- Way out: the client's kernel assigns sequence numbers to requests, which allows the server's kernel to differentiate retransmissions from originals
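A minimal sketch of the sequence-number idea, assuming a hypothetical server that remembers the reply it sent for each (client, sequence number) pair. The class name, the starting balance, and the withdraw operation are invented for illustration.

```python
class TransferServer:
    """Hypothetical server that filters retransmissions by sequence number."""

    def __init__(self, balance):
        self.balance = balance
        self.replies = {}   # (client id, sequence number) -> reply already sent

    def withdraw(self, client_id, seq, amount):
        key = (client_id, seq)
        if key in self.replies:
            # Retransmission of a request that was already executed:
            # resend the saved reply instead of executing again.
            return self.replies[key]
        self.balance -= amount          # non-idempotent operation
        self.replies[key] = self.balance
        return self.balance


server = TransferServer(100)
first = server.withdraw("client-A", 1, 30)    # original request
retry = server.withdraw("client-A", 1, 30)    # retransmission after a lost reply
second = server.withdraw("client-A", 2, 30)   # genuinely new request
```

The retransmitted request returns the saved reply and leaves the balance untouched, while the request with a new sequence number is executed.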
Server crashes
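[Figure: three scenarios for a server receiving a request (REQ).
(a) Normal case: receive, execute, reply (REP).
(b) Crash after execution: receive, execute, crash; no reply.
(c) Crash before execution: receive, crash; no reply.]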
Possible approaches after a server crash:
- Wait until the server reboots and try the operation again. Guarantees that the RPC has been executed at least one time (at-least-once semantics).
- Give up immediately and report back failure. Guarantees that the RPC has been carried out at most one time (at-most-once semantics).
- Client gets no help. Guarantees nothing (the RPC may have been carried out anywhere from 0 to a large number of times). Easy to implement.
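The difference between the first two approaches can be made concrete with a simulated server that executes a request but fails before replying. Everything here is a hypothetical sketch: CrashingServer, the use of TimeoutError to model a missing reply, and the retry limit are assumptions.

```python
class CrashingServer:
    """Simulated server: executes each call but loses the first few replies."""

    def __init__(self, lost_replies):
        self.lost_replies = lost_replies
        self.executions = 0

    def call(self):
        self.executions += 1              # the work is done...
        if self.lost_replies > 0:
            self.lost_replies -= 1
            raise TimeoutError("no reply")   # ...but the client never hears back
        return "done"


def at_least_once(server, max_tries=5):
    """Keep retrying until a reply arrives; the call may execute several times."""
    for _ in range(max_tries):
        try:
            return server.call()
        except TimeoutError:
            continue
    raise TimeoutError("server unreachable")


def at_most_once(server):
    """Try exactly once; a missing reply is reported to the caller."""
    return server.call()
```

With two lost replies, at_least_once succeeds but the operation runs three times; at_most_once never runs it more than once but may leave the caller with an error.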
Client Crashes
Client sends a request and crashes before the server replies: a computation is active and no parent is waiting for the result (orphan)
- Orphans waste CPU cycles and can lock files or tie up valuable resources
- Orphans can cause confusion (client reboots and does the RPC again, but the reply from the orphan comes back immediately afterwards)
Possible solutions
- Extermination: before a client stub sends an RPC, it makes a log entry (in safe storage) telling what it is about to do. After a reboot, the log is checked and the orphan is explicitly killed off.
  - Drawbacks: the expense of writing a disk record for every RPC; orphans may do RPCs themselves, creating grandorphans that are impossible to locate; orphans cannot be killed if the network is partitioned due to failure
- Reincarnation: divide time up into sequentially numbered epochs. When a client reboots, it broadcasts a message declaring the start of a new epoch. When the broadcast comes in, all remote computations are killed.
  - Solves the problem without the need to write disk records
  - If the network is partitioned, some orphans may survive. But when they report back, they are easily detected given their obsolete epoch number
- Gentle reincarnation: a variant of the previous one, but less Draconian
  - When an epoch broadcast comes in, each machine that has remote computations tries to locate their owner. A computation is killed only if the owner cannot be found
- Expiration: each RPC is given a standard amount of time, T, to do the job. If it cannot finish, it must explicitly ask for another quantum.
  - Choosing a reasonable value of T in the face of RPCs with wildly differing requirements is difficult
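Reincarnation can be sketched as follows. This is an assumption-laden illustration: the EpochServer class and its methods are hypothetical, and the sketch kills only the computations of the client that announced the new epoch, which is one possible reading of the rule.

```python
class EpochServer:
    """Hypothetical server that tracks remote computations by client epoch."""

    def __init__(self):
        self.current_epoch = {}   # client id -> newest epoch announced
        self.running = {}         # computation id -> (client id, epoch)

    def start(self, comp_id, client, epoch):
        """Record a computation started on behalf of a client in some epoch."""
        self.running[comp_id] = (client, epoch)

    def on_epoch_broadcast(self, client, new_epoch):
        """Kill every computation the client started before new_epoch."""
        self.current_epoch[client] = new_epoch
        orphans = [cid for cid, (owner, epoch) in self.running.items()
                   if owner == client and epoch < new_epoch]
        for cid in orphans:
            del self.running[cid]
        return orphans


def reply_is_stale(reply_epoch, client_epoch):
    """A reply from a surviving orphan carries an obsolete epoch number."""
    return reply_epoch < client_epoch


server = EpochServer()
server.start("c1", "A", 1)
server.start("c2", "B", 1)
killed = server.on_epoch_broadcast("A", 2)   # client A rebooted
```

After the broadcast, the computation started for client A in epoch 1 is gone, while client B's computation keeps running; reply_is_stale shows how a rebooted client can discard late replies from orphans that survived a partition.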
Asynchronous RPC
From Tanenbaum and van Steen, Distributed Systems: Principles and Paradigms Prentice-Hall, Inc. 2002
- In many systems, stub procedures are generated automatically by a special stub compiler
- Given a specification of the server procedure and the encoding rules, the message format is uniquely determined
- It is thus possible to have a compiler read the server specification and generate a client stub that packs its parameters into the officially approved message format
- Similarly, the compiler can also produce a server stub that unpacks the parameters and calls the server
- Having both stub procedures generated from a single formal specification of the server not only makes life easier for the programmers, but also reduces the chance of error and makes the system transparent with respect to differences in the internal representation of data
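Stub generation from a single specification can be imitated in Python with closures. This is a toy sketch, not a real stub compiler: the SPEC table, its format strings (taken from Python's struct module), and the function names are all assumptions, and the transport is simply a function call.

```python
import struct

# Hypothetical interface specification:
# procedure name -> (procedure id, parameter format, result format)
SPEC = {"sum": (1, "ii", "i")}


def make_client_stub(name, transport):
    """Generate a client stub that packs its arguments as the spec dictates."""
    proc_id, params, result = SPEC[name]

    def stub(*args):
        request = struct.pack("!H" + params, proc_id, *args)
        reply = transport(request)
        return struct.unpack("!" + result, reply)[0]

    return stub


def make_server_stub(implementations):
    """Generate a dispatcher that unpacks requests and calls the procedures."""
    by_id = {SPEC[name][0]: (SPEC[name], fn)
             for name, fn in implementations.items()}

    def dispatch(request):
        (proc_id,) = struct.unpack_from("!H", request)
        (_, params, result), fn = by_id[proc_id]
        args = struct.unpack_from("!" + params, request, 2)
        return struct.pack("!" + result, fn(*args))

    return dispatch


server_stub = make_server_stub({"sum": lambda a, b: a + b})
remote_sum = make_client_stub("sum", server_stub)   # direct call replaces the network
```

Because both stubs are derived from the same SPEC entry, they cannot disagree about the message layout, which is the point made in the last bullet.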
From Tanenbaum and van Steen, Distributed Systems: Principles and Paradigms Prentice-Hall, Inc. 2002
Dynamic Binding
- When the server begins executing, the call to initialize outside the main loop exports the server interface
- This means that the server sends a message to a program called a binder, to make its existence known. This process is referred to as registering
- To register, the server gives the binder its name, its version number, a unique identifier (32 bits), and a handle used to locate it
- The handle is system dependent (e.g., Ethernet address, IP address, an X.500 address, ...)

Call        Input                              Output
Register    Name, version, handle, unique id
Deregister  Name, version, unique id
When the client calls one of the remote procedures for the first time, say, read:
- The client stub sees that it is not yet bound to a server, so it sends a message to the binder asking to import version x of the server interface
- The binder checks to see if one or more servers have already exported an interface with this name and version number
  - If no currently running server is willing to support this interface, the read call fails
  - If a suitable server exists, the binder gives its handle and unique identifier to the client stub
- The client stub uses the handle as the address to send the request message to. The message contains the parameters and a unique identifier, which the server's kernel uses to direct the incoming message to the correct server

Call        Input            Output
Look up     Name, version    Handle, unique id
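The binder's three calls map naturally onto a small table keyed by name and version. The sketch below is hypothetical: the Binder class, the use of LookupError for a failed import, and the random choice among several servers are assumptions, not part of any particular RPC system.

```python
import random


class Binder:
    """Hypothetical binder: maps (name, version) to registered servers."""

    def __init__(self):
        self._table = {}   # (name, version) -> list of (handle, unique id)

    def register(self, name, version, handle, unique_id):
        """Server exports its interface."""
        self._table.setdefault((name, version), []).append((handle, unique_id))

    def deregister(self, name, version, unique_id):
        """Server withdraws its interface."""
        entries = self._table.get((name, version), [])
        self._table[(name, version)] = [e for e in entries if e[1] != unique_id]

    def lookup(self, name, version):
        """Client imports an interface; fails if no server exports it."""
        entries = self._table.get((name, version))
        if not entries:
            raise LookupError(f"no server exports {name} version {version}")
        return random.choice(entries)   # pick one of the matching servers


binder = Binder()
binder.register("file_server", 2, "10.0.0.5:9000", 42)   # made-up handle and id
handle, unique_id = binder.lookup("file_server", 2)
```

A lookup with a different version number raises LookupError, which corresponds to the failed read call described above.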
Advantages
- Flexibility
- Can support multiple servers that support the same interface, e.g.:
  - Binder can spread the clients randomly over the servers to even out the load
  - Binder can poll the servers periodically, automatically deregistering the servers that fail to respond, to achieve a degree of fault tolerance
  - Binder can assist in authentication: for example, a server specifies a list of users that can use it; the binder will refuse to tell users not on the list about the server
- The binder can verify that both client and server are using the same version of the interface
Disadvantages
- The extra overhead of exporting/importing interfaces costs time
- The binder may become a bottleneck in a large distributed system
Bad news:
- Additional complications
  - Distribution problems
  - Object-orientation problems
Good news:
- Computation is distributed among objects; now assume the objects are located on different computers
- Object-orientation features
  - Encapsulation
  - Abstraction
  - Extensible type system
  - Code reuse (inheritance)