Distributed Systems CSE
DISTRIBUTED FILE SYSTEMS
• A file system performs the organization, storage, retrieval, sharing, and protection of files.
• A distributed file system (DFS) is a resource management component of a distributed operating system.
• It provides storage and retrieval of files in a distributed environment.
• The users and storage devices of a DFS are physically dispersed.
Two main purposes of using files:
1. Permanent storage of information on secondary storage media.
2. Sharing of information between applications.
A file system is a subsystem of the operating system that performs file management activities such
as organization, storing, retrieval, naming, sharing, and protection of files.
A file system frees the programmer from concerns about the details of space allocation and layout of
the secondary storage device.
The design and implementation of a distributed file system is more complex than a conventional file
system due to the fact that the users and storage devices are physically dispersed.
In addition to the functions of the file system of a single-processor system, the distributed file system
supports the following:
1. Remote information sharing: Any node, irrespective of the physical location of a file, can
access the file.
2. User mobility: A user should be permitted to work on different nodes.
3. Availability: For better fault-tolerance, files should be available for use even in the event of
temporary failure of one or more nodes of the system. Thus the system should maintain multiple
copies of the files, the existence of which should be transparent to the user.
DISTRIBUTED FILE SYSTEM SERVICES
A distributed file system provides the following types of services:
1. Storage service: Allocation and management of space on a secondary storage device thus
providing a logical view of the storage system.
2. True file service: Includes file-sharing semantics, file-caching mechanism, file replication
mechanism, concurrency control, multiple copy update protocol etc.
3. Name/Directory service: Responsible for directory related activities such as creation and
deletion of directories, adding a new file to a directory, deleting a file from a directory, changing the
name of a file, moving a file from one directory to another etc.
SUTHOJU GIRIJA RANI, Assistant Professor, CSE, NGIT 1
Desirable characteristics of a distributed file system
1. Transparency
- Structure transparency
Clients should not need to know the number or locations of file servers and storage devices. Note: multiple
file servers are provided for performance, scalability, and reliability.
- Access transparency
Both local and remote files should be accessible in the same way. The file system should automatically
locate an accessed file and transport it to the client’s site.
- Naming transparency
The name of the file should give no hint as to the location of the file. The name of the file must not be
changed when moving from one node to another.
- Replication transparency
If a file is replicated on multiple nodes, both the existence of multiple copies and their locations should
be hidden from the clients.
2. User mobility: The system should automatically bring the user’s environment (e.g. the user’s home
directory) to the node where the user logs in.
3. Performance: Performance is measured as the average amount of time needed to satisfy client
requests. This time includes CPU time + time for accessing secondary storage + network access time. It
is desirable that the performance of a distributed file system be comparable to that of a centralized
file system.
4. Simplicity and ease of use: The user interface to the file system should be simple, and the number
of commands should be as small as possible.
5. Data integrity: Concurrent access requests from multiple users who are competing to access the file
must be properly synchronized by the use of some form of concurrency control mechanism. Atomic
transactions can also be provided.
6. High availability: A distributed file system should continue to function in the face of partial failures
such as a link failure, a node failure, or a storage device crash.
7. High reliability: Probability of loss of stored data should be minimized. System should automatically
generate backup copies of critical files.
8. Scalability: Growth of nodes and users should not seriously disrupt service.
• A highly reliable and scalable distributed file system should have multiple and independent file
servers controlling multiple and independent storage devices.
9. Security: Users should be confident of the privacy of their data.
10. Heterogeneity: There should be easy access to shared data on diverse platforms (e.g. Unix
workstations, Wintel platforms, etc.).
GOALS OF DISTRIBUTED FILE SYSTEMS
• DFS has two important goals:
1. Network transparency: Users are not aware of the location of files.
2. High availability: System failures or disruptions of regular activities should not result in
unavailability of files.
NETWORK FILE SYSTEM (NFS)
• NFS is organized as a client-server architecture.
• Network File System (NFS) is a distributed file system developed by Sun Microsystems. It is
very popular.
• The model underlying NFS and similar systems is that of a remote file service.
• The idea behind NFS is that each file server provides a standardized view of its local file system.
Architecture of Distributed File System
Client-Server Architecture
Goal: Try to make a file system transparently available to remote
clients.
(a) The remote access model. (b) The upload/download model
The basic NFS architecture for UNIX systems
File System Operations
Cluster-Based Distributed File Systems
(a) distributing whole files across several servers
(b) striping files for parallel access
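As a rough illustration of option (b), the sketch below splits a file's bytes into fixed-size stripe units and assigns them to servers in round-robin order, so that consecutive units can be fetched from different servers in parallel. The function names `stripe` and `reassemble`, the unit size, and the server count are assumptions made for this example and do not describe any particular system.

```python
def stripe(data: bytes, num_servers: int, unit: int) -> list:
    """Assign fixed-size stripe units of `data` to servers in round-robin order."""
    servers = [[] for _ in range(num_servers)]
    for offset in range(0, len(data), unit):
        # Unit number (offset // unit) decides which server stores this unit.
        servers[(offset // unit) % num_servers].append(data[offset:offset + unit])
    return servers


def reassemble(servers: list) -> bytes:
    """Read the units back in round-robin order to rebuild the original file."""
    out = []
    longest = max(len(units) for units in servers)
    for row in range(longest):
        for units in servers:
            if row < len(units):
                out.append(units[row])
    return b"".join(out)


layout = stripe(b"abcdefghij", num_servers=3, unit=2)
restored = reassemble(layout)
```

Here `layout` places units 0 and 3 on the first server, units 1 and 4 on the second, and unit 2 on the third.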
The organization of a Google cluster of servers
Ivy Distributed File Systems
DHash: Computing look-up keys: content-based or public key based
RPC calls in NFS
Communication
(a) Reading data from a file in NFS version 3.
(b) Reading data using a compound procedure in version 4.
Coda RPC2 Subsystem
Side effects in Coda’s RPC2 system allow application-specific protocols to be used during communication.
When a file is modified, all outdated copies need to be invalidated:
(a) Sending invalidation messages one at a time.
(b) Multicasting: sending invalidation messages in parallel.
Files associated with a single TCP connection
Naming & Name Service in NFS
• Naming is a mapping between logical and physical objects. For example, users refer to a file by
a textual name, which is mapped to disk blocks.
• Path name: Files are named by some combination of machine (host) name and path name.
This form may be used on the server side.
• Mount service: mounts remote directories onto local directories.
A name service is the principal mechanism used in distributed systems for referring to objects
by a name that identifies them. Examples of names are file names, domain names, and so on.
The association between a name and an object is called a binding.
A name service is a collection of naming contexts. Its main operation is name resolution.
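Name resolution over nested naming contexts can be sketched as follows. Each context is modelled as a dictionary of bindings, where a name is bound either to an object or to another context. The directory layout, the file identifiers, and the function name `resolve` are hypothetical and chosen only for illustration.

```python
# A naming context maps each name either to an object or to another context.
root = {
    "home": {"alice": {"notes.txt": "file-id-17"}},
    "etc": {"hosts": "file-id-3"},
}


def resolve(context: dict, path: str):
    """Resolve a slash-separated path name, following one binding per component."""
    node = context
    for name in path.strip("/").split("/"):
        if not isinstance(node, dict) or name not in node:
            raise KeyError(f"no binding for {name!r} while resolving {path!r}")
        node = node[name]
    return node


file_id = resolve(root, "/home/alice/notes.txt")
```

Mounting fits the same picture: binding a name in a local context to a context exported by a remote server makes the remote files resolvable through local path names.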
Mounting, Synchronization, File Sharing and locking
Mounting (part of) a remote file system in NFS.
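A mount can be of three types:
• Soft mount: time-bound mounting. If the server does not respond within the timeout and retry limit,
the request fails with an error.
• Hard mount: no time bound. The client keeps retrying until the server responds.
• Automount: on-demand mounting.
Automounting (implemented on Linux by autofs, for example) is a client-side service that automatically
mounts file systems when they are accessed and unmounts them when they are no longer in use.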
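The difference between soft and hard mounts comes down to how the client reacts to timeouts. The sketch below is a simplified model and not the NFS client implementation: `remote_call`, `flaky_server`, and `ServerUnreachable` are hypothetical names, and the retry limit of 3 is an arbitrary assumption.

```python
class ServerUnreachable(Exception):
    """Raised when a soft-mounted request gives up (hypothetical name)."""


def remote_call(send, soft: bool, retries: int = 3):
    """Call send() until it succeeds; a soft mount gives up after `retries` timeouts."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return send()
        except TimeoutError:
            if soft and attempts >= retries:
                raise ServerUnreachable("soft mount: server not responding")
            # Hard mount: fall through and retry until the server answers.


def flaky_server(fail_times: int):
    """Return a send() stand-in that times out `fail_times` times, then succeeds."""
    state = {"calls": 0}

    def send():
        state["calls"] += 1
        if state["calls"] <= fail_times:
            raise TimeoutError
        return "data"

    return send
```

With a server that times out five times, `remote_call(flaky_server(5), soft=False)` eventually returns the data, whereas the same call with `soft=True` raises `ServerUnreachable` after the third timeout.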
A simple automounter for NFS
Synchronization - Semantics of File Sharing
On a single processor, when a read follows a write, the value returned by the read is the value just written.
In a distributed system with caching, obsolete values may be returned.
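A small simulation shows how an obsolete value can be returned. It assumes a deliberately naive client cache with write-through and no invalidation; the classes `Server` and `CachingClient` are hypothetical and exist only for this example.

```python
class Server:
    def __init__(self):
        self.files = {}


class CachingClient:
    """Client that caches on first read and never revalidates (naive on purpose)."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, name):
        if name not in self.cache:
            self.cache[name] = self.server.files[name]
        return self.cache[name]

    def write(self, name, value):
        self.cache[name] = value
        self.server.files[name] = value  # write-through to the server


server = Server()
server.files["f"] = "v1"
a, b = CachingClient(server), CachingClient(server)

first = a.read("f")   # client a caches "v1"
b.write("f", "v2")    # client b updates the file on the server
stale = a.read("f")   # client a is still served "v1" from its own cache
```

After these steps the server holds "v2" while client `a` still reads "v1", which is exactly the situation that file-sharing semantics have to address.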
Four ways of dealing with the shared files in a distributed system.
File Locking
NFSv4 operations related to file locking
File Sharing in Coda
The transactional behavior in sharing files in Coda
Caching and Replication in Distributed File System
• Client-side caching
• Caching in NFS
• Caching in Coda
• Server-side replication
• Server replication in Coda
NFS Client-Side Caching
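NFSv4 uses a callback mechanism to recall a file delegation.
File delegation: the server temporarily hands certain rights over a file to a client, so that the client
can handle operations such as open and close locally without contacting the server.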
Client-Side Caching in Coda
The use of local copies when opening a session in Coda.
Server Replication in Coda
Coda uses a variant of the replicated-write protocol ROWA (read-one, write-all).
Two clients with a different AVSG for the same replicated file.
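The basic read-one, write-all idea can be sketched as below. This is plain ROWA under the assumption that every replica is reachable. It does not model Coda's variant, in which a client writes only to the replicas it can currently reach (its AVSG). The names `Replica`, `write_all`, and `read_one` are hypothetical.

```python
import random


class Replica:
    def __init__(self):
        self.store = {}


def write_all(replicas, name, value):
    """Apply the update to every replica before the write counts as done."""
    for replica in replicas:
        replica.store[name] = value


def read_one(replicas, name):
    """Any single replica can serve a read, since all of them hold the latest value."""
    return random.choice(replicas).store[name]


replicas = [Replica() for _ in range(3)]
write_all(replicas, "f", "v1")
value = read_one(replicas, "f")
```

Reads are cheap because they touch one replica, while writes pay the cost of updating all of them.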
Handling Byzantine Failures
At least 3k + 1 replicas are needed to tolerate k Byzantine-faulty replicas.
A Byzantine failure in a distributed system occurs when a node provides incorrect or misleading
information to other nodes. The problem of reaching agreement in the presence of such failures is
known as the Byzantine generals problem or the Byzantine agreement problem.
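The 3k + 1 bound can be turned into a few lines of arithmetic. The quorum size of 2k + 1 used in `quorum_overlap` is an assumption borrowed from quorum-based Byzantine fault tolerance protocols and is not stated in these notes. The function names are hypothetical.

```python
def min_replicas(k: int) -> int:
    """Smallest number of replicas that tolerates k Byzantine-faulty ones."""
    return 3 * k + 1


def max_faults(n: int) -> int:
    """Largest k such that n >= 3k + 1."""
    return (n - 1) // 3


def quorum_overlap(k: int) -> int:
    """Minimum overlap of two quorums of size 2k + 1 among 3k + 1 replicas."""
    n, q = min_replicas(k), 2 * k + 1
    # 2q - n = k + 1, so any two quorums share at least one non-faulty replica.
    return 2 * q - n
```

For example, tolerating one faulty replica requires four replicas, and a group of six replicas still tolerates only one fault.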
The different phases in Byzantine fault tolerance
Security in NFS - The NFS security architecture
Secure RPCs : In NFS-V4
Access Control
The various kinds of users and processes distinguished by NFS with respect to access control.
Secure Collaborative Storage
Storage claims in the peer-to-peer system
Google File System (GFS)
• Each GFS cluster consists of a single master along with multiple chunk servers.
• Each GFS file is divided into chunks of 64 MB each, and these chunks are distributed
across the chunk servers.
• An important observation is that the GFS master is contacted only for metadata.
• In particular, a GFS client passes a file name and chunk index to the master and expects a
contact address for the chunk in return.
• The contact address contains all the information needed to access the correct chunk server and
obtain the required file chunk.
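The lookup described above can be sketched as follows. The client derives the chunk index from a byte offset and asks the master for a contact address. The `Master` class, its table layout, the chunk handle strings, and the server addresses are all hypothetical. Only the 64 MB chunk size and the (file name, chunk index) request come from the text.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB per chunk


def chunk_index(byte_offset: int) -> int:
    """Index of the chunk that contains the given byte offset of a file."""
    return byte_offset // CHUNK_SIZE


class Master:
    """Holds metadata only: (file name, chunk index) -> (chunk handle, servers)."""

    def __init__(self):
        self.table = {}

    def lookup(self, file_name: str, index: int):
        return self.table[(file_name, index)]


master = Master()
master.table[("/logs/web.log", 0)] = ("handle-a1", ["cs1:7000", "cs2:7000"])
master.table[("/logs/web.log", 2)] = ("handle-c3", ["cs3:7000", "cs1:7000"])

index = chunk_index(200_000_000)              # byte 200,000,000 lies in chunk 2
handle, servers = master.lookup("/logs/web.log", index)
```

The file data itself would then be fetched from one of the returned chunk servers, without involving the master again.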
Cluster-Based Distributed File Systems - The organization of a Google cluster of servers
Design Considerations
Sun NFS, VFS & AFS
Sun NFS: developed by Sun Microsystems.
It allows a remote client to access a file system over a network.
It is a client-server application and uses RPC to route requests between client and server.
Virtual File System:
File handle: the file identifier used in NFS is called a file handle.
VFS is used to distinguish between local and remote files.
• The Andrew File System (AFS) is a distributed file system in which clients access files through a
set of remote servers.
• AFS uses a local cache to reduce server workload and increase the performance of the distributed
computing environment.
CODA
• CONTENT DELIVERY ARCHITECTURE
• COMMON DATA AVAILABILITY
The Coda architecture is based on the AFS architecture.
Coda is a file system for a large-scale distributed computing environment.
Coda optimizes for availability and performance, and aims for the highest degree of consistency attainable.
It provides resiliency to server and network failures through two mechanisms: server replication and
disconnected operation.
Security in CODA
The Coda security architecture consists of two parts, which deal with:
• Setting up a secure channel between a client and a server using RPC system-level authentication
• Controlling access to files
Utilization of CODA
• Organizations such as Carnegie Mellon University (CMU) use Coda on campus and are making a
serious effort to improve it in the following areas:
• Reliability and performance
• Ports to important platforms
• Documentation, mailing groups
• Extensions in functionality
DISTRIBUTED WEB BASED SYSTEMS
• The World Wide Web (WWW) can be viewed as a huge distributed system consisting of millions of
clients and servers for accessing linked documents.
• Servers maintain collections of documents, while clients provide users with an easy-to-use interface
for presenting and accessing these documents.
Traditional Web-Based Systems
The overall organization of a traditional Web site
Six top-level MIME types: Text, Audio, Video, Image, Application, and Multipart.
Processes and Communication in Distributed Web-based Systems
The logical components of a Web browser
The principle of using a server cluster in combination with a front end to implement a Web service
A scalable content-aware cluster of Web servers
Web Proxy Caching, Replication & CDNs
• Web proxy caching in a distributed system refers to the method of using proxy servers to store
and manage cached web content across multiple locations within a network
• In a distributed system, each proxy server caches copies of frequently accessed web content.
This means that when multiple users request the same content, the proxy server can deliver it
from its cache rather than fetching it from the original web server every time.
Components
• Clients: These are the end-user devices (computers, smartphones, tablets) that make HTTP
GET requests for web content.
• Web Proxy: Acts as an intermediary between clients and web servers. It handles client
requests, retrieves content from its local cache if available, or forwards the request if
necessary.
• Cache: Storage within the proxy server where cached web content is saved.
• Neighboring Proxy Caches: Other proxy servers in the network that can be queried if the
requested content is not found locally.
• Web Server: The original server that hosts the requested web content.
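The interaction between these components can be sketched as a lookup chain: local cache first, then neighboring proxy caches, then the origin web server. The `Proxy` class below is a hypothetical in-memory model. A dictionary stands in for the web server, and no expiry or validation of cached entries is modelled.

```python
class Proxy:
    def __init__(self, origin, neighbors=None):
        self.origin = origin              # dict: URL -> content (stands in for the web server)
        self.neighbors = neighbors or []  # other Proxy objects that can be queried
        self.cache = {}                   # this proxy's local cache

    def get(self, url):
        """Return (content, source) where source says where the content came from."""
        if url in self.cache:
            return self.cache[url], "local-hit"
        for neighbor in self.neighbors:
            if url in neighbor.cache:
                self.cache[url] = neighbor.cache[url]
                return self.cache[url], "neighbor-hit"
        self.cache[url] = self.origin[url]  # forward the request to the web server
        return self.cache[url], "origin"


origin = {"/index.html": "<html>home</html>"}
p1 = Proxy(origin)
p2 = Proxy(origin, neighbors=[p1])

r1 = p1.get("/index.html")  # fetched from the origin server, then cached at p1
r2 = p2.get("/index.html")  # found in the neighboring cache p1
r3 = p2.get("/index.html")  # now served from p2's own cache
```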
CDN: Content Delivery Network
Content providers are the customers of CDN services.
A CDN has two levels of load balancing: local and global.
HTTP, SOAP, SOA, REST & Web Services
Communication can take place over HTTP (Hypertext Transfer Protocol) and SOAP (Simple Object Access
Protocol).
HTTP is a client-server protocol; communication between clients and servers on the Web is based on it.
HTTP comprises HTTP connections, HTTP methods, and HTTP messages.
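To make the notion of an HTTP message concrete, the sketch below builds the text of a GET request and parses the status line of a response. The helper names `build_get_request` and `parse_status_line` are hypothetical, the host `example.org` is a placeholder, and nothing is sent over the network.

```python
def build_get_request(host: str, path: str) -> str:
    """Build an HTTP/1.1 GET request: request line, headers, then a blank line."""
    lines = [
        f"GET {path} HTTP/1.1",  # method, request target, protocol version
        f"Host: {host}",
        "Connection: close",
        "",                       # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines)


def parse_status_line(response: str):
    """Split the first line of a response into version, status code, and reason."""
    status_line = response.split("\r\n", 1)[0]
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


request = build_get_request("example.org", "/index.html")
status = parse_status_line("HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n")
```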
Service-Oriented Architecture (SOA)
• Service-Oriented Architecture (SOA) is a stage in the evolution of application development
and/or integration. It defines a way to make software components reusable through their
interfaces.
SOA is different from microservice architecture.
• SOA allows users to combine a large number of facilities from existing services to form
applications.
• SOA encompasses a set of design principles that structure system development and provide
means for integrating components into a coherent and decentralized system.
• SOA-based computing packages functionalities into a set of interoperable services, which can
be integrated into different software systems belonging to separate business domains.
Characteristics of SOA
• Provides interoperability between the services.
• Provides methods for service encapsulation, service discovery, service composition, service
reusability and service integration.
• Facilitates QoS (Quality of Service) through service contracts based on Service
Level Agreements (SLAs).
• Provides loosely coupled services.
• Provides location transparency with better scalability and availability.
• Eases maintenance and reduces the cost of application development and deployment.
In SOA, a component can take the role of both service provider and service consumer, as required.
REST
• REST (REpresentational State Transfer) is an architectural style for developing web
services and systems that can easily communicate with each other. REST is popular due to its
simplicity and the fact that it builds upon existing systems and features of the
internet's HTTP to achieve its objectives, as opposed to creating new standards, frameworks
and technologies.
• It is popularly believed that REST is a protocol or standard. However, it is neither. REST is an
architectural style that is commonly adopted for building web-based application programming
interfaces (APIs).
Advantages of REST
• Resource-based. REST enforces statelessness through resources rather than commands,
improving reliability, performance and scalability.
• Simple interface. In REST, each resource involved in client-server interactions is identified and
is uniformly represented in the server response to define a consistent and simple interface for
all interactions.
• Familiar constructs. REST interactions are based on constructs that are familiar to anyone
accustomed to using HTTP, including operations (GET, POST, DELETE, etc.) and URIs. That said,
REST and HTTP are not the same and developers must note the differences when
implementing and using REST.
• Communication. The status of REST-based interactions between the server and clients is
communicated through numerical HTTP status codes.
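The points above can be illustrated with a toy request handler in which resources are identified by URIs, manipulated through a small uniform set of HTTP methods, and the outcome is reported as a numerical status code. This is a minimal in-memory sketch and not a web framework. The `handle` function, the `resources` store, and the `/notes/1` URI are hypothetical.

```python
# Hypothetical in-memory resource store; URIs are the keys.
resources = {}


def handle(method: str, uri: str, body=None):
    """Return (status_code, representation) for a request on a resource."""
    if method == "GET":
        if uri in resources:
            return 200, resources[uri]   # OK
        return 404, None                 # Not Found
    if method == "PUT":
        created = uri not in resources
        resources[uri] = body
        return (201 if created else 200), body  # Created, or OK on update
    if method == "DELETE":
        if uri in resources:
            del resources[uri]
            return 204, None             # No Content
        return 404, None
    return 405, None                     # Method Not Allowed
```

Each call carries everything the handler needs (method, URI, body), so no per-client session state is kept between requests.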
WEB SERVICES: Services made available over the web are web services. A web service is described using
WSDL, the Web Services Description Language.
The principle of a Web service