Distributed Computing


Unit 6

Distributed File Systems


DFS
• A file system is a subsystem of an operating system that performs file
management activities such as organization, storing, retrieval,
naming, sharing, and protection of files.
• The two main purposes of using files are as follows:
• 1. Permanent storage of information. This is achieved by storing a file on a
secondary storage media such as a magnetic disk.
• 2. Sharing of information. Files provide a natural and easy means of
information sharing. That is, a file can be created by one application and then
shared with different applications at a later time
DESIRABLE FEATURES OF A GOOD
DISTRIBUTED FILE SYSTEM
1. Transparency. The following four types of transparencies are
desirable:
1. Structure transparency.
2. Access transparency.
3. Naming transparency.
4. Replication transparency.
2. User mobility
3. Performance
4. Simplicity and ease of use
5. Scalability.
6. High Availability.
7. High reliability.
8. Data Integrity.
9. Security.
10. Heterogeneity.
FILE MODELS
• Unstructured and Structured Files –
• According to the simplest model, a file is an unstructured sequence of data.
In this model, there is no substructure known to the file server, and the
contents of each file appear to the file server as an uninterpreted sequence
of bytes. The operating system is not interested in the information stored in
the files; the interpretation of the meaning and structure of the data stored
in the files is entirely up to the application programs.
• Most modern operating systems use the unstructured file model, mainly
because sharing of a file by different applications is easier with the
unstructured file model than with the structured file model. Since a file has
no structure in the unstructured model, different applications can interpret
the contents of a file in different ways.
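The point above can be illustrated with a minimal Python sketch; the byte string and the two "applications" are hypothetical stand-ins for a file held by a server that knows nothing of its contents:

```python
# The same uninterpreted byte sequence, read two different ways.
# The file server sees only bytes; neither interpretation is known to it.
raw = b"10,20,30"

# Hypothetical application A treats the bytes as comma-separated integers.
as_numbers = [int(x) for x in raw.split(b",")]

# Hypothetical application B treats the very same bytes as plain text.
as_text = raw.decode("ascii")
```

Both views are valid simultaneously, which is exactly why sharing is easier under the unstructured model.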
• In the structured file model, a file appears to the file server as an ordered
sequence of records. Records of different files of the same file system can
be of different sizes; therefore, many types of files exist in a file system, each
having different properties. In this model, a record is the smallest unit of file
data that can be accessed, and the file system's read and write operations are
carried out on a set of records.
• Structured files are again of two types: files with nonindexed records and
files with indexed records. In the former model, a file record is accessed by
specifying its position within the file, for example, the fifth record from the
beginning of the file or the second record from the end of the file.
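Access by position can be sketched in Python, assuming a hypothetical fixed-size record format of two little-endian 32-bit integers:

```python
import struct

RECORD_SIZE = 8  # hypothetical format: two 32-bit integers per record

# Build an in-memory "file" of five fixed-size records: (i, i*i).
records = [(i, i * i) for i in range(5)]
data = b"".join(struct.pack("<ii", a, b) for a, b in records)

def read_record(buf, position):
    """Nonindexed access: fetch a record by its position within the file."""
    offset = position * RECORD_SIZE
    return struct.unpack("<ii", buf[offset:offset + RECORD_SIZE])

fifth = read_record(data, 4)                      # fifth record from the beginning
second_last = read_record(data, len(records) - 2) # second record from the end
```

With fixed-size records the position maps directly to a byte offset, which is why nonindexed access needs no auxiliary index structure.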
• Mutable and Immutable Files
• According to the modifiability criteria, files are of two types: mutable and
immutable. Most existing operating systems use the mutable file model. In
this model, an update performed on a file overwrites its old contents to
produce the new contents. That is, a file is represented as a single stored
sequence that is altered by each update operation.
• On the other hand, in the immutable model, a file cannot be modified once
it has been created except to be deleted. The file versioning approach is
normally used to implement file updates, and each file is represented by a
history of immutable versions. That is, rather than updating the same file, a
new version of the file is created each time a change is made to the file
contents and the old version is retained unchanged. In practice, the use of
storage space may be reduced by keeping only a record of the differences
between the old and new versions rather than creating the entire file once
again.
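The versioning approach might be sketched as follows; the `ImmutableFile` class and its method names are illustrative, not any particular system's API:

```python
class ImmutableFile:
    """Sketch of the immutable file model: each update creates a new
    version, and existing versions are never modified."""

    def __init__(self, contents):
        self.versions = [contents]        # version history, oldest first

    def update(self, new_contents):
        # Rather than overwriting, append a new immutable version.
        self.versions.append(new_contents)
        return len(self.versions) - 1     # version number of the new contents

    def read(self, version=-1):
        # Default: the latest version; older versions remain readable.
        return self.versions[version]

f = ImmutableFile(b"v1 contents")
f.update(b"v2 contents")
```

A real system would keep only the differences between versions to save space, as the text notes; this sketch stores each version whole for clarity.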
• It is much easier to support file caching and replication in a
distributed system with the immutable file model because it
eliminates all the problems associated with keeping multiple copies of
a file consistent. However, due to the need to keep multiple versions
of a file, the immutable file model suffers from two potential
problems: increased use of disk space and increased disk allocation
activity.
FILE-ACCESSING MODELS
• Accessing Remote Files.
• 1. Remote service model - In this model, the processing of the client's request
is performed at the server's node. That is, the client's request for file access is
delivered to the server, the server machine performs the access request, and
finally the result is forwarded back to the client.
• 2. Data-caching model. In the remote service model, every remote file access
request results in network traffic. The data-caching model attempts to reduce
the amount of network traffic by taking advantage of the locality feature
found in file accesses. As compared to the remote service model, the data-
caching model offers the possibility of increased performance and greater
system scalability because it reduces network traffic, contention for the
network, and contention for the file servers. Therefore, almost all existing
distributed file systems implement some form of caching.
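The difference between the two models can be sketched in Python; `remote_read`, the file path, and the hit counter are hypothetical stand-ins for a real client-server protocol:

```python
# Hypothetical server-side store standing in for a remote file server.
server_files = {"/etc/motd": b"hello"}
server_requests = 0  # counts requests that actually reach the server

def remote_read(path):
    """Remote service model: every access goes over the network."""
    global server_requests
    server_requests += 1
    return server_files[path]

cache = {}

def cached_read(path):
    """Data-caching model: contact the server only on a cache miss."""
    if path not in cache:
        cache[path] = remote_read(path)  # miss: one unit of network traffic
    return cache[path]                   # hit: served locally

cached_read("/etc/motd")
cached_read("/etc/motd")  # second access is served from the cache
```

Two accesses cost only one server request, which is the locality effect the text describes.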
• Unit of Data Transfer
• 1. File-level transfer model. In this model, when an operation requires file data to be
transferred across the network in either direction between a client and a server, the
whole file is moved.
• 2. Block-level transfer model. In this model, file data transfers across the network
between a client and a server take place in units of file blocks. A file block is a
contiguous portion of a file and is usually fixed in length. For file systems in which
block size is equal to virtual memory page size, this model is also called a page-level
transfer model.
• 3. Byte-level transfer model. In this model, file data transfers across the network
between a client and a server take place in units of bytes. This model provides
maximum flexibility because it allows storage and retrieval of an arbitrary sequential
subrange of a file, specified by an offset within a file, and a length.
• 4. Record-level transfer model. The three file data transfer models described above
are commonly used with unstructured file models. The record-level transfer model is
suitable for use with those file models in which file contents are structured in the
form of records
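The block-level and byte-level models can be contrasted in a short Python sketch, assuming a hypothetical 4-byte block size and an in-memory stand-in for the server's copy of a file:

```python
BLOCK_SIZE = 4  # hypothetical fixed block length

data = b"abcdefghij"  # a 10-byte "file" held at the server

def read_block(file_data, block_no):
    """Block-level transfer: data moves in fixed-size block units."""
    start = block_no * BLOCK_SIZE
    return file_data[start:start + BLOCK_SIZE]

def read_bytes(file_data, offset, length):
    """Byte-level transfer: an arbitrary subrange given an offset and a length."""
    return file_data[offset:offset + length]
```

Note that `read_bytes` can return any subrange, while `read_block` is constrained to block boundaries; that flexibility is exactly the trade-off the text names.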
FILE-CACHING SCHEMES
• The idea in file caching in these systems is to retain recently accessed
file data in main memory, so that repeated accesses to the same
information can be handled without additional disk transfers.
• In addition to these issues, a file-caching scheme for a distributed file
system should also address the following key decisions:
• 1. Cache location
• 2. Modification propagation
• 3. Cache validation
Cache location
• Cache location refers to the place where the cached data is stored.
Assuming that the original location of a file is on its server's disk,
there are three possible cache locations in a distributed file system:
the server's main memory, the client's disk, and the client's main memory.
• Client's disk –
• The second option is to have the cache in a client's disk. A cache
located in a client's disk eliminates network access cost but requires
disk access cost on a cache hit. A cache on a disk has several
advantages. The first is reliability. Modifications to cached data are
lost in a crash if the cache is kept in volatile memory.
• The second advantage is large storage capacity. As compared to a
main-memory cache, a disk cache has plenty of storage space.
Therefore, more data can be cached, resulting in a higher hit ratio.
Furthermore, several distributed file systems use the file-level data
transfer model in which a file is always cached in its entirety.
• Client's main memory –
• The third alternative is to have the cache in a client's main memory. A
cache located in a client's main memory eliminates both network
access cost and disk access cost. Therefore, it provides maximum
performance gain on a cache hit. It also permits workstations to be
diskless. Like a client's disk cache, a client's main memory cache also
contributes to scalability and reliability because on a cache hit the
access request can be serviced locally without the need to contact the
server. However, a client's main-memory cache is not preferable to a
client's disk cache when large cache size and increased reliability of
cached data are desired.
Modification Propagation
• When the caches of all these nodes contain exactly the same copies of
the file data, we say that the caches are consistent. It is possible for
the caches to become inconsistent when the file data is changed by
one of the clients and the corresponding data cached at other nodes
is not changed or discarded.
• A distributed file system may use one of the modification propagation
schemes
• Write-through Scheme
• Delayed-Write Scheme
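The two schemes might be contrasted as follows; the dictionaries standing in for the server copy, the client cache, and the pending-write queue are all hypothetical:

```python
server_copy = {"f": b"old"}
cache = {"f": b"old"}
pending = {}  # cached modifications not yet sent to the server

def write_through(name, data):
    """Write-through: every write updates the server copy immediately."""
    cache[name] = data
    server_copy[name] = data

def delayed_write(name, data):
    """Delayed-write: update the cache now, propagate to the server later."""
    cache[name] = data
    pending[name] = data

def flush():
    """Propagate all pending delayed writes to the server."""
    server_copy.update(pending)
    pending.clear()

delayed_write("f", b"new")
stale = server_copy["f"]   # the server still holds the old contents
flush()
fresh = server_copy["f"]   # consistent again after propagation
```

The window between `delayed_write` and `flush`, during which `stale` differs from the cache, is precisely where delayed-write trades consistency for reduced network traffic.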
Cache Validation Schemes
• There are basically two approaches to verify the validity of cached
data: the client-initiated approach and the server-initiated approach.
• 1. Client-initiated approach –
In this approach, a client contacts the server and checks whether its locally
cached data is consistent with the master copy. The file-sharing semantics
depends on the frequency of the validity check. One of the following
approaches may be used:
a. Checking before every access
b. Periodic checking
c. Check on file open.
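Checking before every access (approach a) could be sketched like this, assuming a hypothetical version number attached to each master copy:

```python
# Hypothetical master copy at the server: contents plus a version number.
server = {"notes.txt": ("contents-v2", 2)}

# Client cache: data plus the version it was fetched at (now stale).
client_cache = {"notes.txt": ("contents-v1", 1)}

def is_valid(path):
    """Client-initiated check: does the cached version match the master's?"""
    return client_cache[path][1] == server[path][1]

def read(path):
    # Check before every access; refetch from the server if stale.
    if not is_valid(path):
        client_cache[path] = server[path]
    return client_cache[path][0]

result = read("notes.txt")
```

Periodic checking (approach b) and check-on-open (approach c) would call the same `is_valid` routine, only less often, trading stricter sharing semantics for fewer validation messages.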
• 2. Server-Initiated Approach –
• In this method, a client informs the file server when opening a file, indicating
whether the file is being opened for reading, writing, or both. The file server
keeps a record of which client has which file open and in what mode.
• In this manner, the server keeps monitoring the file usage modes being used
by different clients and reacts whenever it detects a potential for
inconsistency. A potential for inconsistency occurs when two or more clients
try to open a file in conflicting modes. For example, if a file is open for
reading, other clients may be allowed to open it for reading without any
problem, but opening it for writing cannot be allowed.
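The server's bookkeeping can be sketched as follows, with hypothetical client names and a simplified conflict rule (a write conflicts with any other open, and any open conflicts with an existing writer):

```python
READ, WRITE = "r", "w"

open_files = {}  # filename -> list of (client, mode) currently holding it open

def open_file(client, name, mode):
    """Server-initiated approach: the server records each open and
    refuses opens whose mode conflicts with the existing holders."""
    holders = open_files.setdefault(name, [])
    # A conflict exists if anyone holds the file for writing,
    # or if this request is a write while anyone holds the file at all.
    conflict = any(m == WRITE or mode == WRITE for _, m in holders)
    if conflict:
        return False
    holders.append((client, mode))
    return True

first_reader = open_file("c1", "f", READ)    # allowed
second_reader = open_file("c2", "f", READ)   # concurrent read: allowed
writer = open_file("c3", "f", WRITE)         # write while readers exist: refused
```

A real server would also react on `close` and could revoke cached copies instead of refusing the open outright; this sketch only shows the conflict detection step.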
FILE REPLICATION
• A replicated file is a file that has multiple copies, with each copy located
on a separate file server. Each copy of the set of copies that comprises a
replicated file is referred to as a replica of the replicated file.
• Difference between Replication and Caching:
• 1. A replica is associated with a server, whereas a cached copy is normally
associated with a client.
• 2. The existence of a cached copy is primarily dependent on the locality in file
access patterns, whereas the existence of a replica normally depends on
availability and performance requirements.
• 3. As compared to a cached copy, a replica is more persistent, widely known,
secure, available, complete, and accurate.
• 4. A cached copy is contingent upon a replica.
Advantages of Replication

• Increased availability.
• Increased reliability.
• Improved Response Time.
• Reduced network traffic.
• Improved system throughput.
• Better Scalability.
• Autonomous operation.
