
Module 2


BCS515D : Distributed Systems

Module - 2
DISTRIBUTED FILE SYSTEMS: Introduction, File service architecture.
NAME SERVICES: Introduction, Name services and the Domain Name System, Directory
services.
Textbook: Chapter- 12.1,12.2, 13.1-13.3

Characterization of Distributed Systems

Introduction

A file system is responsible for the organization, storage, retrieval, naming, sharing, and protection of
files. File systems provide directory services, which convert a file name (possibly a hierarchical one)
into an internal identifier (e.g. inode, FAT index). They contain a representation of the file data itself
and methods for accessing it (read/write). The file system is responsible for controlling access to the
data and for performing low-level operations such as buffering frequently used data and issuing disk
I/O requests.
A distributed file system should present certain degrees of transparency to the user and the
system:
Access transparency: Clients are unaware that files are distributed and can access them in the
same way as local files are accessed.
Location transparency: A consistent name space exists encompassing local as well as remote
files. The name of a file does not reveal its location.
Concurrency transparency: All clients have the same view of the state of the file system.
This means that if one process is modifying a file, any other processes on the same system or
remote systems that are accessing the files will see the modifications in a coherent manner.
Failure transparency: The client and client programs should operate correctly after a server
failure.
Heterogeneity: File service should be provided across different hardware and operating
system platforms.
Scalability: The file system should work well in small environments (1 machine, a dozen
machines) and also scale gracefully to huge ones (hundreds through tens of thousands of
systems).
Replication transparency: To support scalability, we may wish to replicate files across
multiple servers. Clients should be unaware of this.

SUSHMITHA S, DEPT. OF CSE-AIML, RNSIT



Migration transparency: Files should be able to move around without the client's knowledge.
Support fine-grained distribution of data: To optimize performance, we may wish to locate
individual objects near the processes that use them.
Tolerance for network partitioning: The entire network or certain segments of it may be
unavailable to a client during certain periods (e.g. disconnected operation of a laptop). The file
system should be tolerant of this.

File service types


To provide a remote system with file service, we will have to select one of two models of
operation. One of these is the upload/download model. In this model, there are two fundamental
operations: read file transfers an entire file from the server to the requesting client, and write
file copies the file back to the server. It is a simple model and efficient in that it provides local
access to the file when it is being used. Three problems are evident. It can be wasteful if the
client needs access to only a small amount of the file data. It can be problematic if the client
doesn't have enough space to cache the entire file. Finally, what happens if others need to
modify the same file?
The second model is a remote access model. The file service provides remote operations such
as open, close, read bytes, write bytes, get attributes, etc. The file system itself runs on servers.
The drawback in this approach is the servers are accessed for the duration of file access rather
than once to download the file and again to upload it.
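
The trade-off between the two models can be illustrated with a minimal sketch. The `FileServer` class, its method names and the byte counter are all hypothetical; they stand in for a real server only to show how much data each model moves when a client needs a few bytes of a large file.

```python
class FileServer:
    """Toy in-memory server that counts the bytes it sends to clients."""

    def __init__(self, files):
        self.files = dict(files)  # file name -> bytes
        self.bytes_sent = 0

    def read_file(self, name):
        # Upload/download model: the whole file crosses the network.
        data = self.files[name]
        self.bytes_sent += len(data)
        return data

    def write_file(self, name, data):
        # Upload/download model: the whole file is copied back.
        self.files[name] = bytes(data)

    def read_bytes(self, name, offset, count):
        # Remote access model: only the requested range is sent.
        data = self.files[name][offset:offset + count]
        self.bytes_sent += len(data)
        return data


def first_bytes_upload_download(server, name, count):
    local_copy = server.read_file(name)  # fetch everything, then work locally
    return local_copy[:count]


def first_bytes_remote_access(server, name, count):
    return server.read_bytes(name, 0, count)  # one small remote request
```

Reading 16 bytes of a 10,000-byte file costs 10,000 bytes of transfer in the first function and 16 in the second. The remote access model, however, goes back to the server for every later read, which is the drawback noted above.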
Another important distinction in providing file service is that of understanding the difference
between directory service and file service. A directory service, in the context of file systems,
maps human-friendly textual names for files to their internal locations, which can be used by
the file service. The file service itself provides the file interface (this is mentioned above).
Another component of distributed file systems is the client module. This is the client-side
interface for file and directory service. It provides a local file system interface to client software
(for example, the vnode file system layer of a UNIX kernel).

Introduction
• File systems were originally developed for centralized computer systems and desktop
computers.
• The file system was an operating system facility providing a convenient programming
interface to disk storage.


• Distributed file systems support the sharing of information in the form of files and
hardware resources.
• With the advent of distributed object systems (CORBA, Java) and the web, the picture
has become more complex.
• Figure 1 provides an overview of types of storage system.

Figure 1. Storage systems and their properties

Figure 2 shows a typical layered module structure for the implementation of a non distributed
file system in a conventional operating system.

Figure 2. File system modules

• File systems are responsible for the organization, storage, retrieval, naming, sharing and
protection of files.
• Files contain both data and attributes.
• A typical attribute record structure is illustrated in Figure 3.


Figure 3. File attribute record structure

Figure 4 summarizes the main operations on files that are available to applications in UNIX
systems.

Figure 4. UNIX file system operations

Distributed file system requirements


Many of the requirements and potential pitfalls in the design of distributed services were first
observed in the early development of DFS. Initial offerings were access transparency and
location transparency. Later on, came the performance, scalability, concurrency control, fault
tolerance and security requirements.
Transparency
• The file service is usually the most heavily loaded service in an intranet, so its
functionality and performance are critical.
• The design of the file service should support many of the transparency requirements
identified for distributed systems, including the following.


Access transparency
• Client programs should be unaware of the distribution of files.
• A single set of operations is provided for access to local and remote files.
• Programs written to operate on local files are able to access remote files without
modification.
Location transparency: Client programs should see a uniform file name space. Files or
groups of files may be relocated without changing their pathnames, and user programs see
the same name space wherever they are executed.
Mobility transparency: Neither client programs nor system administration tables in client
nodes need to be changed when files are moved. This allows file mobility – files or, more
commonly, sets or volumes of files may be moved, either by system administrators or
automatically.
Performance transparency: Client programs should continue to perform satisfactorily while
the load on the service varies within a specified range.
Scaling transparency: The service can be expanded by incremental growth to deal with a
wide range of loads and network sizes.

Concurrent file updates – Protected – Record locking


Changes to a file by one client should not interfere with the operation of other clients
simultaneously accessing or changing the same file. This is the well-known issue of
concurrency control, discussed in detail in Chapter 16. The need for concurrency control for
access to shared data in many applications is widely accepted and techniques are known for its
implementation, but they are costly. Most current file services follow modern UNIX standards
in providing advisory or mandatory file- or record-level locking.

File Replication – performance


In a file service that supports replication, a file may be represented by several copies of its
contents at different locations. This has two benefits – it enables multiple servers to share the
load of providing a service to clients accessing the same set of files, enhancing the scalability
of the service, and it enhances fault tolerance by enabling clients to locate another server that
holds a copy of the file when one has failed.


Hardware and Operating System heterogeneity.


The service interfaces should be defined so that client and server software can be implemented
for different operating systems and computers. This requirement is an important aspect of
openness.

Fault tolerance.
The central role of the file service in distributed systems makes it essential that the service
continue to operate in the face of client and server failures. Fortunately, a moderately fault-
tolerant design is straightforward for simple servers. To cope with transient communication
failures, the design can be based on at-most-once invocation semantics or it can use the simpler
at-least-once semantics with a server protocol designed in terms of idempotent operations,
ensuring that duplicated requests do not result in invalid updates to files. The servers can be
stateless, so that they can be restarted and the service restored after a failure without any need
to recover previous state.
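
A minimal sketch of why idempotent operations make at-least-once semantics safe. The `FlatFile` class and the `deliver_at_least_once` helper are hypothetical names used only for illustration; the helper simulates a client that retransmits a request because the reply was lost.

```python
class FlatFile:
    """Toy file whose contents live in memory."""

    def __init__(self):
        self.data = bytearray()

    def write_at(self, position, payload):
        """Idempotent: the request itself names the position to write."""
        end = position + len(payload)
        if len(self.data) < end:
            self.data.extend(b"\0" * (end - len(self.data)))
        self.data[position:end] = payload

    def append(self, payload):
        """Not idempotent: the effect depends on the current file length."""
        self.data.extend(payload)


def deliver_at_least_once(operation, *args, copies=2):
    # Simulate duplicated delivery of the same request.
    for _ in range(copies):
        operation(*args)
```

Delivering `write_at(0, b"abc")` twice leaves the file as `abc`, whereas delivering `append(b"abc")` twice leaves `abcabc`. This is why a stateless server protocol would carry an explicit position in every read and write request.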

Consistency – UNIX uses one-copy update semantics. This may be difficult to achieve in a
DFS.
Conventional file systems such as that provided in UNIX offer one-copy update semantics.
This refers to a model for concurrent access to files in which the file contents seen by all of the
processes accessing or updating a given file are those that they would see if only a single copy
of the file contents existed. When files are replicated or cached at different sites, there is an
inevitable delay in the propagation of modifications made at one site to all of the other sites
that hold copies, and this may result in some deviation from one-copy semantics.

Security.
Virtually all file systems provide access-control mechanisms based on the use of access control
lists. In distributed file systems, there is a need to authenticate client requests so that access
control at the server is based on correct user identities and to protect the contents of request
and reply messages with digital signatures and (optionally) encryption of secret data.

Efficiency. A distributed file service should offer facilities that are of at least the same power
and generality as those found in conventional file systems and should achieve a comparable
level of performance.


Case studies
File service architecture
This is an abstract architectural model that underpins both NFS and AFS. It is based upon a
division of responsibilities between three modules – a client module that emulates a
conventional file system interface for application programs, and server modules, that perform
operations for clients on directories and on files. The architecture is designed to enable a
stateless implementation of the server module.

SUN NFS
Sun Microsystems’s Network File System (NFS) has been widely adopted in industry and in
academic environments since its introduction in 1985. The design and development of NFS
were undertaken by staff at Sun Microsystems in 1984. Although several distributed file
services had already been developed and used in universities and research laboratories, NFS
was the first file service that was designed as a product. The design and implementation of NFS
have achieved success both technically and commercially.

Andrew File System


Andrew is a distributed computing environment developed at Carnegie Mellon University
(CMU) for use as a campus computing and information system. The design of the Andrew File
System (henceforth abbreviated AFS) reflects an intention to support information sharing on a
large scale by minimizing client-server communication. This is achieved by transferring whole
files between server and client computers and caching them at clients until the server receives
a more up-to-date version.

File Service Architecture


An architecture that offers a clear separation of the main concerns in providing access to files
is obtained by structuring the file service as three components:
• A flat file service
• A directory service
• A client module.
The relevant modules and their relationships are shown in Figure 5.


Figure 5. File service architecture

The client module implements the interfaces exported by the flat file and directory services on
the server side.
Responsibilities of various modules can be defined as follows:
Flat file service:
Concerned with the implementation of operations on the contents of files. Unique File
Identifiers (UFIDs) are used to refer to files in all requests for flat file service operations. UFIDs
are long sequences of bits chosen so that each file has a UFID that is unique among all of the
files in a distributed system.
Directory service:
Provides a mapping between text names for files and their UFIDs. Clients may obtain the
UFID of a file by quoting its text name to the directory service. The directory service supports
the functions needed to generate directories and to add new files to directories.
Client module:
It runs on each computer and provides integrated service (flat file and directory) as a single
API to application programs. For example, in UNIX hosts, a client module emulates the full
set of Unix file operations. It holds information about the network locations of flat-file and
directory server processes, and achieves better performance through implementation of a cache
of recently used file blocks at the client.
Flat file service interface:
Figure 6 contains a definition of the interface to a flat file service.


Figure 6. Flat file service interface
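
Since the figure itself is not reproduced here, the following is only a rough in-memory sketch of what a flat file service might look like. The operation names (`create`, `read`, `write`, `delete`, `get_attributes`) and the 128-bit random UFID format are assumptions made for illustration, not the interface defined in the figure.

```python
import secrets


class FlatFileService:
    """Toy flat file service: files are addressed only by UFID."""

    def __init__(self):
        self._contents = {}    # UFID -> bytearray
        self._attributes = {}  # UFID -> dict of attributes

    def create(self):
        ufid = secrets.token_hex(16)  # assumed UFID format: 128 random bits
        self._contents[ufid] = bytearray()
        self._attributes[ufid] = {"length": 0}
        return ufid

    def read(self, ufid, position, count):
        return bytes(self._contents[ufid][position:position + count])

    def write(self, ufid, position, data):
        contents = self._contents[ufid]
        end = position + len(data)
        if len(contents) < end:
            contents.extend(b"\0" * (end - len(contents)))
        contents[position:end] = data
        self._attributes[ufid]["length"] = len(contents)

    def delete(self, ufid):
        del self._contents[ufid]
        del self._attributes[ufid]

    def get_attributes(self, ufid):
        return dict(self._attributes[ufid])
```

Note that no operation takes a text name: mapping names to UFIDs is left entirely to the directory service. Every `read` and `write` carries its own position, so the server holds no per-client state such as an open-file pointer.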

Access control
In distributed implementations, access rights checks have to be performed at the server because
the server RPC interface is an otherwise unprotected point of access to files.
Directory service interface
Figure 7 contains a definition of the RPC interface to a directory service.

Figure 7. Directory service interface
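
As the figure is not reproduced here, the sketch below only suggests the shape of a directory service. The method names (`make_directory`, `add_name`, `lookup`, `un_name`, `get_names`) and the use of plain strings as directory identifiers and UFIDs are assumptions for illustration.

```python
import fnmatch


class DirectoryService:
    """Toy directory service: maps text names to UFIDs, per directory."""

    def __init__(self):
        self._dirs = {}  # directory id -> {text name: UFID}

    def make_directory(self, dir_id):
        self._dirs[dir_id] = {}

    def add_name(self, dir_id, name, ufid):
        entries = self._dirs[dir_id]
        if name in entries:
            raise FileExistsError(name)
        entries[name] = ufid

    def lookup(self, dir_id, name):
        try:
            return self._dirs[dir_id][name]
        except KeyError:
            raise FileNotFoundError(name) from None

    def un_name(self, dir_id, name):
        if name not in self._dirs[dir_id]:
            raise FileNotFoundError(name)
        del self._dirs[dir_id][name]

    def get_names(self, dir_id, pattern="*"):
        names = self._dirs[dir_id]
        return sorted(n for n in names if fnmatch.fnmatchcase(n, pattern))
```

A client module would call `lookup` to translate a text name into a UFID and then pass that UFID to the flat file service.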

Hierarchic file system


A hierarchic file system such as the one that UNIX provides consists of a number of directories
arranged in a tree structure.

File Group
A file group is a collection of files that can be located on any server or moved between servers
while maintaining the same names.
– A similar construct is used in a UNIX file system.
– It helps with distributing the load of file serving between several servers.


– File groups have identifiers which are unique throughout the system (and hence for an open
system, they must be globally unique).
To construct a globally unique identifier, we use some unique attribute of the machine on which
the file group is created, e.g. its IP address, even though the file group may subsequently move.
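
One possible way to build such an identifier is sketched below. The notes only say that a machine attribute such as the IP address is used; combining a 32-bit IPv4 address with a 16-bit per-host value (a creation date or counter, called `sequence` here) is an assumption, and the function name is hypothetical.

```python
import ipaddress


def make_file_group_id(creator_ip, sequence):
    """Combine a 32-bit IPv4 address with a 16-bit per-host value."""
    if not 0 <= sequence < 2 ** 16:
        raise ValueError("sequence must fit in 16 bits")
    host_bits = int(ipaddress.IPv4Address(creator_ip))
    return (host_bits << 16) | sequence
```

The identifier records where the group was created, not where it currently lives, so it remains valid after the group moves to another server.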


NAME SERVICES
Which one is easier for humans, and which for machines? Why?
74.125.237.83 or google.com
128.250.1.22 or distributed systems website
128.250.1.25 or Prof. Buyya
Disk 4, Sector 2, block 5 OR /usr/raj/hello.c

Introduction
• In a distributed system, names are used to refer to a wide variety of resources such as:
o Computers, services, remote objects, and files, as well as users.
• Naming is a fundamental issue in DS design as it facilitates communication and resource
sharing.
o A name in the form of URL is needed to access a specific web page.
o Processes cannot share particular resources managed by a computer system
unless they can name them consistently
o Users cannot communicate with one another via a DS unless they can name
one another, for example with email addresses.
• Names are not the only useful means of identification: descriptive attributes are another.

Names, addresses and other attributes


Any process that requires access to a specific resource must possess a name or an identifier for
it.
Examples of human-readable names are file names such as /etc/passwd, URLs such as
http://www.cdk5.net/ and Internet domain names such as www.cdk5.net.
The term identifier is sometimes used to refer to names that are interpreted only by programs.
Remote object references and NFS file handles are examples of identifiers. Identifiers are
chosen for the efficiency with which they can be looked up and stored by software.


Needham [1993] makes the distinction between a pure name and other names. Pure names are
simply uninterpreted bit patterns. Non-pure names contain information about the object that
they name; in particular, they may contain information about the location of the object. Pure
names always have to be looked up before they can be of any use. At the other extreme from a
pure name is an object’s address: a value that identifies the location of the object rather than
the object itself.
Addresses are efficient for accessing objects, but objects can sometimes be relocated, so
addresses are inadequate as a means of identification. For example, users’ email addresses
usually have to change when they move between organizations or Internet service providers;
they are not in themselves guaranteed to refer to a specific individual over time.
The association between a name and an object is called a binding. In general, names are bound
to attributes of the named objects, rather than the implementation of the objects themselves.
An attribute is the value of a property associated with an object. A key attribute of an entity that
is usually relevant in a distributed system is its address. For example:
• The DNS maps domain names to the attributes of a host computer: its IP address, the
type of entry (for example, a reference to a mail server or another host) and, for
example, the length of time the host’s entry will remain valid.
• The X.500 directory service can be used to map a person’s name onto attributes
including their email address and telephone number.

The figure below shows the domain name portion of a URL resolved first via the DNS into an IP
address and then, at the final hop of Internet routing, via ARP to an Ethernet address for the
web server. The last part of the URL is resolved by the file system on the web server to locate
the relevant file.
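
The chain of bindings described above can be sketched as three successive lookups. The tables below are hypothetical stand-ins for DNS, ARP and the web server's file system; the IP address is from a documentation-only range and the Ethernet address and file name are made up.

```python
from urllib.parse import urlsplit

# Hypothetical lookup tables; none of these values are real.
DNS_TABLE = {"www.cdk5.net": "192.0.2.10"}       # domain name -> IP address
ARP_TABLE = {"192.0.2.10": "02:00:00:00:00:0a"}  # IP address -> Ethernet address
WEB_ROOT = {"/": "index.html"}                   # URL path -> file on the server


def resolve_url(url):
    """Resolve each part of a URL with the service responsible for it."""
    parts = urlsplit(url)
    ip_address = DNS_TABLE[parts.hostname]  # resolved by the DNS
    ethernet = ARP_TABLE[ip_address]        # resolved by ARP at the final hop
    filename = WEB_ROOT[parts.path or "/"]  # resolved by the server's file system
    return ip_address, ethernet, filename
```

Each stage maps a name in one name space onto an attribute that serves as the name for the next stage.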


1. Names and service


Many of the names used in a distributed system are specific to some particular service.
For example, users of the social networking web site twitter.com have names such as
@magmapoetry that no other service resolves. Also, a client may use a service-specific
name when requesting a service to perform an operation upon a named object or
resource that it manages. For example, a file name is given to the file service when
requesting that the file be deleted, and a process identifier is presented to the process
management service when requesting that it be sent a signal.

2. Uniform Resource Identifiers


Uniform Resource Identifiers (URIs) [Berners-Lee et al. 2005] came about from the
need to identify resources on the Web, and other Internet resources such as electronic
mailboxes. An important goal was to identify resources in a coherent way, so that they
could all be processed by common software such as browsers.
URIs are ‘uniform’ in that their syntax incorporates that of indefinitely many individual
types of resource identifiers (that is, URI schemes), and there are procedures for
managing the global namespace of schemes.
The advantage of uniformity is that it eases the process of introducing new types of
identifier, as well as using existing types of identifier in new contexts, without
disrupting existing usage.

3. Uniform Resource Locators


Some URIs contain information that can be used to locate and access a resource; others
are pure resource names. The familiar term Uniform Resource Locator (URL) is often
used for URIs that provide location information and specify the method for accessing
the resource, such as ‘http’ URLs.
For example, http://www.cdk5.net/ identifies a web page at the given path (‘/’) on the
host www.cdk5.net, and specifies that the HTTP protocol be used to access it. Another
example is a ‘mailto’ URL, such as mailto:fred@flintstone.org, which identifies the
mailbox at the given address.
URLs are efficient identifiers for accessing resources. But they suffer from the
disadvantage that if a resource is deleted or if it moves, say from one web site to another,
there may be dangling links to the resource containing the old URL. If a user clicks on
a dangling link to a web resource, then the web server will either respond that the
resource is not found or – worse, perhaps – supply a different resource that now
occupies the same location.

4. Uniform Resource Names


Uniform Resource Names (URNs) are URIs that are used as pure resource names rather
than locators. For example, the URI:
mid:0E4FC272-5C02-11D9-B115-000A95B55BC8@hpl.hp.com
is a URN that identifies the email message containing it in its ‘Message-Id’ field. The
URI distinguishes that message from any other email message. But it does not provide
the message’s address in any store, so a lookup operation is needed to find it.
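
The uniform syntax of URIs means that one parser can split any of the examples above into a scheme and a scheme-specific part. A small sketch using Python's standard `urllib.parse.urlsplit` follows; the `describe` helper is a hypothetical name.

```python
from urllib.parse import urlsplit


def describe(uri):
    """Return the scheme, the network location (if any) and the path."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path
```

For http://www.cdk5.net/ the network location is www.cdk5.net, which tells a client where to go. For the ‘mailto’ and ‘mid’ examples the network location is empty, so software must hand the remainder to a scheme-specific lookup.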

Name services and the Domain Name System


A name service stores information about a collection of textual names, in the form of bindings
between the names and the attributes of the entities they denote, such as users, computers,
services and objects.
The collection is often subdivided into one or more naming contexts: individual subsets of the
bindings that are managed as a unit.
The major operation that a name service supports is to resolve a name – that is, to look
up attributes from a given name.
Name management is separated from other services largely because of the openness of
distributed systems, which brings the following motivations:
Unification: It is often convenient for resources managed by different services to use the same
naming scheme. URIs are a good example of this.
Integration: It is not always possible to predict the scope of sharing in a distributed system. It
may become necessary to share and therefore name resources that were created in different
administrative domains. Without a common name service, the administrative domains may use
entirely different naming conventions.

General name service requirements


To handle an essentially arbitrary number of names and to serve an arbitrary number of
administrative organizations:


For example, the system should be capable of handling the names of all the
documents in the world.
A long lifetime: Many changes will occur in the organization of the set of names and in the
components that implement the service during its lifetime.
High availability: Most other systems depend upon the name service; they can’t work when it
is broken.
Fault isolation: Local failures should not cause the entire service to fail.
Tolerance of mistrust: A large open system cannot have any component that is trusted by all
of the clients in the system.

Name Space
A name space is the collection of all valid names recognized by a particular service. The service
will attempt to look up a valid name, even though that name may prove not to correspond to
any object – i.e., to be unbound. Name spaces require a syntactic definition to separate valid
names from invalid names.
For example, ‘...’ is not acceptable as the DNS name of a computer, whereas
www.cdk99.net is valid.
Names may have an internal structure that represents their position in a hierarchic name space
such as pathnames in a file system, or in an organizational hierarchy such as Internet domain
names; or they may be chosen from a flat set of numeric or symbolic identifiers.
One important advantage of a hierarchy is that it makes large name spaces more manageable.
/etc/passwd is a hierarchic name with two components. The first, ‘etc’, is resolved relative to
the context ‘/’, or root, and the second part, ‘passwd’, is relative to the context ‘/etc’.
The name /oldetc/passwd can have a different meaning because its second component is
resolved in a different context. Similarly, the same name /etc/passwd may resolve to different
files in the contexts of two different computers.
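
Resolving a hierarchic name one component at a time can be sketched with nested dictionaries acting as naming contexts. The `red` and `blue` contexts and their contents are hypothetical.

```python
def resolve(root, pathname):
    """Resolve each component relative to the context found so far."""
    context = root
    for component in pathname.strip("/").split("/"):
        if not isinstance(context, dict) or component not in context:
            raise KeyError(f"{pathname} is unbound")
        context = context[component]
    return context


# Two hypothetical computers, each with its own root context.
red = {"etc": {"passwd": "red's password file"},
       "oldetc": {"passwd": "an older copy"}}
blue = {"etc": {"passwd": "blue's password file"}}
```

Here `resolve(red, "/etc/passwd")` and `resolve(red, "/oldetc/passwd")` give different results because ‘passwd’ is looked up in different contexts. The same name `/etc/passwd` also gives different results on `red` and `blue`.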

Aliases
An alias is a name defined to denote the same information as another name, similar to a
symbolic link between file path names. Aliases allow more convenient names to be substituted
for relatively complicated ones, and allow alternative names to be used by different people for
the same entity.


An example is the common use of URL shorteners, often used in Twitter posts and other
situations where space is at a premium.
For example, using web redirection, http://bit.ly/ctqjvH refers to
http://cdk5.net/additional/rmi/programCode/ShapeListClient.java.
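
A minimal sketch of alias resolution, assuming aliases are stored in a simple table. The `resolve_alias` function and the hop limit are illustrative choices; the limit guards against aliases that refer to each other in a cycle.

```python
def resolve_alias(name, aliases, max_hops=8):
    """Follow alias bindings until a name that is not an alias is reached."""
    for _ in range(max_hops + 1):
        if name not in aliases:
            return name
        name = aliases[name]
    raise RuntimeError("too many alias hops (possible cycle)")


# The shortened URL from the example above, stored as an alias binding.
ALIASES = {
    "http://bit.ly/ctqjvH":
        "http://cdk5.net/additional/rmi/programCode/ShapeListClient.java",
}
```

A name that is not in the table is returned unchanged, so the same function handles both aliases and ordinary names.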

Naming Domains
A naming domain is a name space for which there exists a single overall administrative
authority responsible for assigning names within it. This authority is in overall control of
which names may be bound within the domain, but it is free to delegate this task.
Responsibility for a naming domain normally goes hand in hand with responsibility for
managing and keeping up-to-date the corresponding part of the database stored in an
authoritative name server and used by the name service. Naming data belonging to different
naming domains are in general stored by distinct name servers managed by the corresponding
authorities.

Combining and customizing name spaces


The DNS provides a global and homogeneous name space in which a given name refers to the
same entity, no matter which process on which computer looks up the name. By contrast, some
name services allow distinct name spaces (sometimes heterogeneous ones) to be
embedded into them, and some name services allow the name space to be customized to suit
the needs of individual groups, users or even processes.

Merging
The practice of mounting file systems in UNIX and NFS provides an example in which a part
of one name space is conveniently embedded in another. But consider how to merge the entire
UNIX file systems of two (or more) computers called red and blue. Each computer has its own
root, with overlapping file names.
For example, /etc/passwd refers to one file on red and a different file on blue.
The obvious way to merge the file systems is to replace each computer’s root with a ‘super
root’ and mount each computer’s file system in this super root, say as /red and /blue.
But the new naming convention by itself would cause programs on the two computers that still
use the old name /etc/passwd to malfunction.


A solution is to leave the old root contents on each computer and embed the mounted file
systems /red and /blue of both computers (assuming that this does not produce name clashes
with the old root contents). Users and programs can then refer to /red/etc/passwd and
/blue/etc/passwd, while existing programs that use /etc/passwd keep working.
The moral is that we can always merge name spaces by creating a higher-level root context,
but this may raise a problem of backward-compatibility.
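
The backward-compatible merge can be sketched with dictionaries as naming contexts. The roots, their contents and the helper names are hypothetical.

```python
red_root = {"etc": {"passwd": "red users"}}
blue_root = {"etc": {"passwd": "blue users"}}


def merged_view(local_root, mounts):
    """Keep the old root contents and embed the mounted roots alongside them."""
    clashes = set(local_root) & set(mounts)
    if clashes:
        raise ValueError(f"name clash with old root contents: {sorted(clashes)}")
    return {**local_root, **mounts}


def lookup(root, pathname):
    """Walk a pathname through nested contexts."""
    node = root
    for component in pathname.strip("/").split("/"):
        node = node[component]
    return node
```

With `merged_view(red_root, {"red": red_root, "blue": blue_root})` as the view on red, the old name `/etc/passwd` still resolves to red's file, while `/blue/etc/passwd` reaches blue's file. The clash check corresponds to the assumption that /red and /blue do not collide with existing root entries.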

Heterogeneity: The Distributed Computing Environment (DCE) name space [OSF 1997]
allows heterogeneous name spaces to be embedded within it. DCE names may contain
junctions, which are similar to mount points in NFS and UNIX.

Customization: We saw in the example of embedding NFS-mounted file systems above that
sometimes users prefer to construct their name spaces independently rather than sharing a
single name space.
File system mounting enables users to import files that are stored on servers and shared, while
the other names continue to refer to local, unshared files and can be administered
autonomously. But the same files accessed from two different computers may be mounted at
different points and thus have different names.

Name resolution
Name resolution is an iterative or recursive process whereby a name is repeatedly presented to
naming contexts in order to look up the attributes to which it refers. A naming context either
maps a given name onto a set of primitive attributes (such as those of a user) directly, or maps
it onto a further naming context and a derived name to be presented to that context.
Name servers and navigation: Any name service, such as DNS, that stores a very large
database and is used by a large population will not store all of its naming information on a
single server computer. Such a server would be a bottleneck and a critical point of failure. Any
heavily used name service should use replication to achieve high availability. We shall see that
DNS specifies that each subset of its database is replicated in at least two failure-independent
servers. The process of locating naming data from more than one name server in order to
resolve a name is called navigation.


Iterative navigation:

To resolve a name, a client presents the name to the local name server, which attempts to resolve
it. If the local name server has the name, it returns the result immediately. If it does not, it will
suggest another server that will be able to help. Resolution proceeds at the new server, with
further navigation as necessary until the name is located or is discovered to be unbound.
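
Iterative navigation can be sketched as a loop run by the client. The `NameServer` class, its suffix-based referral table and the three possible reply kinds are assumptions made for illustration; real name servers use richer protocols.

```python
class NameServer:
    """Toy name server that either answers, refers, or reports unbound."""

    def __init__(self, bindings=None, referrals=None):
        self.bindings = bindings or {}    # name -> attributes held here
        self.referrals = referrals or {}  # name suffix -> another NameServer

    def query(self, name):
        if name in self.bindings:
            return "answer", self.bindings[name]
        for suffix, server in self.referrals.items():
            if name == suffix or name.endswith("." + suffix):
                return "referral", server  # suggest a server that can help
        return "unbound", None


def iterative_resolve(local_server, name, max_steps=10):
    """The client itself contacts each suggested server in turn."""
    server, contacted = local_server, 0
    for _ in range(max_steps):
        contacted += 1
        kind, value = server.query(name)
        if kind == "answer":
            return value, contacted
        if kind == "unbound":
            raise KeyError(name)
        server = value  # follow the referral
    raise RuntimeError("too many referrals")
```

The function returns the attributes together with the number of servers the client had to contact, which makes the cost of navigation visible.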
As DNS is designed to hold entries for millions of domains and is accessed by vast numbers of
clients, it would not be feasible to have all queries starting at a root server, even if it were
replicated heavily. The DNS database is partitioned between servers in such a way as to allow
many queries to be satisfied locally and others to be satisfied without needing to resolve each
part of the name separately.
In multicast navigation, a client multicasts the name to be resolved and the required object
type to the group of name servers. Only the server that holds the named attributes responds to
the request. Unfortunately, however, if the name proves to be unbound, the request is greeted
with silence. Cheriton and Mann [1989] describe a multicast-based navigation scheme in which
a separate server is included in the group to respond when the required name is unbound.

Non-recursive and recursive server-controlled navigation


Under non-recursive server-controlled navigation, any name server may be chosen by the
client. This server communicates by multicast or iteratively with its peers in the style described
above, as though it were a client. Under recursive server-controlled navigation, the client once
more contacts a single server. If this server does not store the name, the server contacts a peer
storing a (larger) prefix of the name, which in turn attempts to resolve it. This procedure
continues recursively until the name is resolved.
If a name service spans distinct administrative domains, then clients executing in one
administrative domain may be prohibited from accessing name servers belonging to another
such domain. Moreover, even name servers may be prohibited from discovering the disposition
of naming data across name servers in another administrative domain. Then, both client-
controlled and non-recursive server-controlled navigation are inappropriate, and recursive
server-controlled navigation must be used. Authorized name servers request name service data
from designated name servers managed by different administrations, which return the attributes
without revealing where the different parts of the naming database are stored.

Caching
In DNS and other name services, client name resolution software and servers maintain a cache
of the results of previous name resolutions. When a client requests a name lookup, the name
resolution software consults its cache. If it holds a recent result from a previous lookup for the
name, it returns it to the client; otherwise, it sets about finding it from a server. That server, in
turn, may return data cached from other servers.
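A minimal sketch of such a cache follows. The ResolutionCache class, the fixed lifetime of 60 seconds and the lookup callback are assumptions made for illustration; how "recent" a cached result must be is decided by each name service.

```python
import time

class ResolutionCache:
    """Cache of name -> (value, expiry time); entries expire after ttl seconds."""

    def __init__(self, lookup, ttl=60.0, clock=time.monotonic):
        self.lookup = lookup    # function that asks a server for the name
        self.ttl = ttl          # assumed lifetime of a cached result
        self.clock = clock
        self.entries = {}

    def resolve(self, name):
        now = self.clock()
        hit = self.entries.get(name)
        if hit is not None and hit[1] > now:
            return hit[0]                      # recent result: answer from cache
        value = self.lookup(name)              # otherwise go to a server
        self.entries[name] = (value, now + self.ttl)
        return value
```

Passing the clock in as a parameter is only a convenience for experimenting: it lets the expiry behaviour be exercised without waiting for real time to pass.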

The Domain Name System


The Domain Name System is a name service design whose main naming database is used
across the Internet. DNS replaced the original Internet naming scheme, in which all host names
and addresses were held in a single central master file and downloaded by FTP to all computers
that required them.
This original scheme was soon seen to suffer from three major shortcomings:
• It did not scale to large numbers of computers.
• Local organizations wished to administer their own naming systems.
• A general name service was needed – not one that serves only for looking up computer
addresses.

Domain names
The DNS is designed for use in multiple implementations, each of which may have its own
name space. In practice, however, only one is in widespread use, and that is the one used for
naming across the Internet. The Internet DNS name space is partitioned both organizationally
and according to geography. The names are written with the highest-level domain on the right.
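The original top-level organizational domains (also called generic domains) in use across the
Internet were:
• com – Commercial organizations
• edu – Universities and other educational institutions
• gov – US governmental agencies
• mil – US military organizations
• net – Major network support centres
• org – Organizations not mentioned above
• int – International organizations
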
New top-level domains such as biz and mobi have been added since the early 2000s.
In addition, every country has its own domains, for example:
• us – United States
• uk – United Kingdom
• fr – France

Countries, particularly those other than the US, often use their own subdomains to distinguish
their organizations. The UK, for example, has domains co.uk and ac.uk, which correspond to
com and edu respectively (ac stands for ‘academic community’).

DNS queries
The Internet DNS is primarily used for simple host name resolution and for looking up
electronic mail hosts, as follows:
Host name resolution: In general, applications use the DNS to resolve host names into IP
addresses.
For example, when a web browser is given a URL containing the domain name
www.dcs.qmul.ac.uk, it makes a DNS enquiry and obtains the corresponding IP address. As
was pointed out in Chapter 4, browsers then use HTTP to communicate with the web server at
the given IP address, using a reserved port number if none is specified in the URL. FTP and
SMTP services work in a similar way; for example, an FTP program may be given the domain
name ftp.dcs.qmul.ac.uk and can make a DNS enquiry to get its IP address and then use TCP
to communicate with it at the reserved port number.
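From an application's point of view, host name resolution is usually a single library call. The sketch below uses Python's standard socket.getaddrinfo, which hands the query to the system resolver; the helper name resolve_host and the default port are assumptions for illustration.

```python
import socket

def resolve_host(name, port=80):
    """Return the sorted set of IP addresses the system resolver gives for name."""
    infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

With network access, a call such as resolve_host("www.dcs.qmul.ac.uk") would return whatever addresses are currently registered for that name, or raise socket.gaierror if the name is unbound.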

Mail host location: Electronic mail software uses the DNS to resolve domain names into the
IP addresses of mail hosts – i.e., computers that will accept mail for those domains.
For example, when the address tom@dcs.rnx.ac.uk is to be resolved, the DNS is queried with
the address dcs.rnx.ac.uk and the type designation ‘mail’. It returns a list of domain names of
hosts that can accept mail for dcs.rnx.ac.uk, if such exist (and, optionally, the corresponding IP
addresses). The DNS may return more than one domain name so that the mail software can try
alternatives if the main mail host is unreachable for some reason. The DNS returns an integer
preference value for each mail host, indicating the order in which the mail hosts should be tried.
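The use of preference values can be sketched as follows. The host names are hypothetical, and the sketch relies on the standard DNS convention that a lower preference value means the host is tried earlier.

```python
def order_mail_hosts(mx_records):
    """Sort (preference, host) pairs so the most preferred host comes first."""
    return [host for _, host in sorted(mx_records)]

def deliver(mx_records, try_host):
    """Try each mail host in preference order; return the first that accepts."""
    for host in order_mail_hosts(mx_records):
        if try_host(host):
            return host
    return None  # no mail host could be reached

# Hypothetical reply to a 'mail' query for some domain.
records = [(20, "backup.example.org"), (10, "mail.example.org")]
```

Here try_host stands for whatever the mail software does to attempt delivery; if the main host is unreachable, the loop simply moves on to the next alternative.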

Some other types of query that are implemented in some installations but are less frequently
used than those just given are:
Reverse resolution: Some software requires a domain name to be returned given an IP address.
This is just the reverse of the normal host name query, but the name server receiving the query
replies only if the IP address is in its own domain.
Host information: The DNS can store the machine architecture type and operating system
with the domain names of hosts. It has been suggested that this option should not be used in
public, because it provides useful information for those attempting to gain unauthorized access
to computers.
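For reverse resolution, the text above does not say how the query is phrased. In the Internet DNS, the IP address is conventionally rewritten as a domain name under in-addr.arpa (for IPv4) and that name is looked up. Python's standard ipaddress module can build this name; the helper name reverse_query_name is an assumption.

```python
import ipaddress

def reverse_query_name(ip):
    """Build the domain name that is looked up to map an IP address to a name."""
    # For IPv4 the octets are reversed and 'in-addr.arpa' is appended.
    return ipaddress.ip_address(ip).reverse_pointer
```

The octets appear in reverse order so that the most significant part of the address ends up on the right, matching the way domain names are written.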

DNS name servers


The problems of scale are treated by a combination of partitioning the naming database and
replicating and caching parts of it close to the points of need. The DNS database is distributed
across a logical network of servers. Each server holds part of the naming database – primarily
data for the local domain. Queries concerning computers in the local domain are satisfied by
servers within that domain. However, each server records the domain names and addresses of
other name servers, so that queries pertaining to objects outside the domain can be satisfied.
The DNS naming data are divided into zones. A zone contains the following data:
• Attribute data for names in a domain, less any subdomains administered by lower-level
authorities. For example, a zone could contain data for Queen Mary, University
of London – qmul.ac.uk – less the data held by departments (for example the
Department of Computer Science – dcs.qmul.ac.uk).
• The names and addresses of at least two name servers that provide authoritative data
for the zone. These are versions of zone data that can be relied upon as being reasonably
up-to-date.
• The names of name servers that hold authoritative data for delegated subdomains; and
‘glue’ data giving the IP addresses of these servers.
• Zone-management parameters, such as those governing the caching and replication of
zone data.
System administrators enter the data for a zone into a master file, which is the source of
authoritative data for the zone. There are two types of server that are considered to provide
authoritative data. A primary or master server reads zone data directly from a local master file.
Secondary servers download zone data from a primary server. They communicate periodically
with the primary server to check whether their stored version matches that held by the primary
server. If a secondary’s copy is out of date, the primary sends it the latest version. The frequency
of the secondary’s check is set by administrators as a zone parameter, and its value is typically
once or twice a day.
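The periodic check made by a secondary server can be sketched as a version comparison. The Primary and Secondary classes and the integer version field are simplifying assumptions; a real server transfers zone data over the network.

```python
class Primary:
    def __init__(self, version, records):
        self.version, self.records = version, dict(records)

class Secondary:
    def __init__(self):
        self.version, self.records = None, {}

    def check(self, primary):
        """Periodic check: copy the zone only if the primary's version is newer."""
        if self.version is None or primary.version > self.version:
            self.version, self.records = primary.version, dict(primary.records)
            return True    # zone data was transferred
        return False       # stored copy is already current
```

An administrator would arrange for check to run at the interval given by the zone parameter, for example once or twice a day.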
The figure below shows the arrangement of some of the DNS database as it stood in the year 2001.

Navigation and query processing


A DNS client is called a resolver. It is normally implemented as library software. It accepts
queries, formats them into messages in the form expected under the DNS protocol and
communicates with one or more name servers in order to satisfy the queries. A simple request-
reply protocol is used, typically using UDP packets on the Internet (DNS servers use a well-
known port number). The resolver times out and resends its query if necessary. The resolver
can be configured to contact a list of initial name servers in order of preference in case one or
more are unavailable.
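To make the resolver's job concrete, the sketch below formats a single-question query in the DNS wire format and sends it over UDP to port 53, resending after a timeout. The function names, the fixed query identifier and the retry settings are assumptions; a real resolver would use a random identifier, parse the reply and work through its list of servers.

```python
import socket
import struct

def build_query(name, query_id=0x1234, qtype=1, qclass=1):
    """Encode a one-question DNS query (qtype 1 = A, qclass 1 = IN)."""
    flags = 0x0100  # standard query with the 'recursion desired' bit set
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # The name is sent as length-prefixed labels ending with a zero byte.
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, qclass)

def send_query(message, server, retries=3, timeout=2.0):
    """Send the query by UDP, resending on timeout; return raw reply or None."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for _ in range(retries):
            sock.sendto(message, (server, 53))  # 53 is the well-known DNS port
            try:
                reply, _ = sock.recvfrom(512)
                return reply
            except socket.timeout:
                continue  # no reply in time: resend the query
    return None
```

Because UDP gives no delivery guarantee, the timeout-and-resend loop is what makes the simple request-reply exchange usable in practice.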

Resource records
Zone data are stored by name servers in files in one of several fixed types of resource record.
For the Internet database, these include the types given in the figure below. Each record refers to
a domain name, which is not shown. The entries in the table refer to items already mentioned,
except that AAAA records store IPv6 addresses whereas A records store IPv4 addresses, and
TXT entries are included to allow arbitrary other information to be stored along with domain
names.

The data for a zone starts with an SOA-type record, which contains the zone parameters that
specify, for example, the version number and how often secondaries should refresh their copies.
This is followed by a list of records of type NS specifying the name servers for the domain and
a list of records of type MX giving the domain names of mail hosts, each prefixed by a number
expressing its preference. For example, part of the database for the domain dcs.qmul.ac.uk at
one point is shown in the figure below, where the time to live 1D means 1 day.

Further records of type A later in the database give the IP addresses for the two name servers
dns0 and dns1. The IP addresses of the mail hosts and the third name server are given in the
databases corresponding to their domains.

Load sharing by name servers: At some sites, heavily used services such as the Web and FTP
are supported by a group of computers on the same network. In this case, the same domain
name is used for each member of the group. When a domain name is shared by several
computers, there is one record for each computer in the group, giving its IP address. By default,
the name server responds to queries for which multiple records match the requested name by
returning the IP addresses according to a round-robin schedule. Successive clients are given
access to different servers so that the servers can share the workload. Caching has a potential
for spoiling this scheme, for once a non-authoritative name server or a client has the server’s
address in its cache it will continue to use it. To counteract this effect, the records are given a
short time to live.
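The round-robin behaviour can be sketched with a rotating queue. The RoundRobinRecords class and the addresses are hypothetical, and real name servers may order their replies differently.

```python
from collections import deque

class RoundRobinRecords:
    """Hand out the same set of addresses in a rotated order on each query."""

    def __init__(self, addresses):
        self.addresses = deque(addresses)

    def answer(self):
        reply = list(self.addresses)
        self.addresses.rotate(-1)   # next query starts with the next address
        return reply

# Three hypothetical machines sharing one domain name.
group = RoundRobinRecords(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
```

Clients that use the first address in each reply are spread across the group, provided that cached replies expire quickly enough for new queries to reach the name server.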

The BIND implementation of the DNS


The Berkeley Internet Name Domain (BIND) is an implementation of the DNS for computers
running UNIX. Client programs link in library software as the resolver. DNS name server
computers run the named daemon.
BIND allows for three categories of name server: primary servers, secondary servers and
caching-only servers. The named program implements just one of these types, according to the
contents of a configuration file. The first two categories are as described above. Caching-only
servers read in from a configuration file sufficient names and addresses of authoritative servers
to resolve any name. Thereafter, they only store this data and data that they learn by resolving
names for clients.
A typical organization has one primary server, with one or more secondary servers that provide
name serving on different local area networks at the site. Additionally, individual computers
often run their own caching-only server, to reduce network traffic and speed up response times
still further.

Directory services
A service that stores collections of bindings between names and attributes and that looks up
entries that match attribute-based specifications is called a directory service. Examples are
Microsoft’s Active Directory Services, X.500 and its cousin LDAP, Univers [Bowman et al.
1990] and Profile [Peterson 1988].
Directory services are sometimes called yellow pages services, and conventional name services
are correspondingly called white pages services, in an analogy with the traditional types of
telephone directory. Directory services are also sometimes known as attribute-based name
services.
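The difference between the two kinds of lookup can be shown with a small sketch. The entries and attribute names are invented for illustration and do not correspond to any particular directory product.

```python
# Hypothetical directory: each name is bound to a set of attributes.
ENTRIES = {
    "alice": {"department": "CSE", "role": "lecturer", "room": "101"},
    "bob": {"department": "CSE", "role": "student", "room": "204"},
    "carol": {"department": "ECE", "role": "lecturer", "room": "310"},
}

def lookup(entries, name):
    """White pages style: the name is known, the attributes are returned."""
    return entries.get(name)

def search(entries, **wanted):
    """Yellow pages style: return names whose attributes match every wanted value."""
    return [
        name
        for name, attrs in entries.items()
        if all(attrs.get(key) == value for key, value in wanted.items())
    ]
```

For instance, search(ENTRIES, department="CSE", role="lecturer") finds an entry from a description of it, whereas lookup needs the name in advance.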
