This document discusses distributed file systems (DFS). It covers desirable features of a DFS such as transparency, user mobility, performance, and scalability. It also discusses file models: unstructured versus structured files and mutable versus immutable files. File-accessing models are covered, including the remote service model, the data-caching model, and the different units of data transfer. Finally, it discusses file-caching schemes, including cache location, modification propagation, and cache validation approaches.
DFS
• A file system is a subsystem of an operating system that performs file management activities such as organization, storage, retrieval, naming, sharing, and protection of files.
• The two main purposes of using files are as follows:
1. Permanent storage of information. This is achieved by storing a file on a secondary storage medium such as a magnetic disk.
2. Sharing of information. Files provide a natural and easy means of information sharing. That is, a file can be created by one application and then shared with different applications at a later time.

DESIRABLE FEATURES OF A GOOD DISTRIBUTED FILE SYSTEM
1. Transparency. The following four types of transparency are desirable:
   1. Structure transparency.
   2. Access transparency.
   3. Naming transparency.
   4. Replication transparency.
2. User mobility.
3. Performance.
4. Simplicity and ease of use.
5. Scalability.
6. High availability.
7. High reliability.
8. Data integrity.
9. Security.
10. Heterogeneity.

FILE MODELS
Unstructured and Structured Files
• According to the simplest model, a file is an unstructured sequence of data. In this model, there is no substructure known to the file server, and the contents of each file appear to the file server as an uninterpreted sequence of bytes. The operating system is not interested in the information stored in the files. Hence, the interpretation of the meaning and structure of the data stored in the files is entirely up to the application programs.
• Most modern operating systems use the unstructured file model. This is mainly because sharing a file among different applications is easier with the unstructured file model than with the structured file model.
Since a file has no structure in the unstructured model, different applications can interpret the contents of a file in different ways.
• In the structured file model, a file appears to the file server as an ordered sequence of records. Records of different files of the same file system can be of different sizes. Therefore, many types of files exist in a file system, each having different properties. In this model, a record is the smallest unit of file data that can be accessed, and the file system read or write operations are carried out on a set of records.
• Structured files are in turn of two types: files with nonindexed records and files with indexed records. In the former model, a file record is accessed by specifying its position within the file, for example, the fifth record from the beginning of the file or the second record from the end of the file. In the latter model, records have one or more key fields and are accessed by specifying the values of those key fields.

Mutable and Immutable Files
• According to the modifiability criterion, files are of two types: mutable and immutable. Most existing operating systems use the mutable file model. In this model, an update performed on a file overwrites its old contents to produce the new contents. That is, a file is represented as a single stored sequence that is altered by each update operation.
• In the immutable model, on the other hand, a file cannot be modified once it has been created; it can only be deleted. The file versioning approach is normally used to implement file updates, and each file is represented by a history of immutable versions. That is, rather than updating the same file, a new version of the file is created each time a change is made to the file contents, and the old version is retained unchanged. In practice, the use of storage space may be reduced by keeping only a record of the differences between the old and new versions rather than storing the entire file again.
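The file versioning approach described above can be illustrated with a minimal sketch. The class name `ImmutableVersionedFile` and its methods are hypothetical, chosen only for illustration; the sketch also assumes full copies of every version are kept in memory, whereas a real system might store only the differences between versions.

```python
from typing import List, Optional


class ImmutableVersionedFile:
    """Hypothetical immutable-file model: updates add versions, never overwrite."""

    def __init__(self, initial_contents: bytes = b"") -> None:
        # Version 0 holds the contents the file was created with.
        self._versions: List[bytes] = [bytes(initial_contents)]

    def update(self, new_contents: bytes) -> int:
        """Store new_contents as a new version and return its version number."""
        self._versions.append(bytes(new_contents))
        return len(self._versions) - 1

    def read(self, version: Optional[int] = None) -> bytes:
        """Return the latest version, or a specific retained older version."""
        if version is None:
            version = len(self._versions) - 1
        if not 0 <= version < len(self._versions):
            raise ValueError(f"no such version: {version}")
        return self._versions[version]


# Usage: the old version stays readable after an "update".
report = ImmutableVersionedFile(b"draft")
latest = report.update(b"final")
print(latest, report.read(), report.read(0))
```

Because no stored version ever changes, any cached or replicated copy of a given version stays valid, at the cost of extra storage for each retained version.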
• It is much easier to support file caching and replication in a distributed system with the immutable file model because it eliminates all the problems associated with keeping multiple copies of a file consistent. However, due to the need to keep multiple versions of a file, the immutable file model suffers from two potential problems: increased use of disk space and increased disk allocation activity.

FILE-ACCESSING MODELS
Accessing Remote Files
1. Remote service model. In this model, the processing of the client's request is performed at the server's node. That is, the client's request for file access is delivered to the server, the server machine performs the access request, and finally the result is forwarded back to the client.
2. Data-caching model. In the remote service model, every remote file access request results in network traffic. The data-caching model attempts to reduce the amount of network traffic by caching data at the client's node, taking advantage of the locality found in file accesses. Compared with the remote service model, the data-caching model offers the possibility of increased performance and greater system scalability because it reduces network traffic, contention for the network, and contention for the file servers. Therefore, almost all existing distributed file systems implement some form of caching.

Unit of Data Transfer
1. File-level transfer model. In this model, when an operation requires file data to be transferred across the network in either direction between a client and a server, the whole file is moved.
2. Block-level transfer model. In this model, file data transfers across the network between a client and a server take place in units of file blocks. A file block is a contiguous portion of a file and is usually fixed in length. For file systems in which the block size is equal to the virtual memory page size, this model is also called a page-level transfer model.
3. Byte-level transfer model. In this model, file data transfers across the network between a client and a server take place in units of bytes. This model provides maximum flexibility because it allows storage and retrieval of an arbitrary sequential subrange of a file, specified by an offset within the file and a length.
4. Record-level transfer model. The three file data transfer models described above are commonly used with unstructured file models. The record-level transfer model is suitable for use with those file models in which file contents are structured in the form of records.

FILE-CACHING SCHEMES
• The idea of file caching is to retain recently accessed file data in main memory, so that repeated accesses to the same information can be handled without additional disk transfers.
• A file-caching scheme for a distributed file system should also address the following key decisions:
1. Cache location
2. Modification propagation
3. Cache validation

Cache location
• Cache location refers to the place where the cached data is stored. Assuming that the original location of a file is on its server's disk, there are three possible cache locations in a distributed file system.
• Server's main memory. The first option is to have the cache in the server's main memory. A cache located there eliminates the disk access cost on a cache hit but still incurs the network access cost.
• Client's disk. The second option is to have the cache on a client's disk. A cache located on a client's disk eliminates the network access cost but requires a disk access on a cache hit. A cache on a disk has several advantages. The first is reliability. Modifications to cached data are lost in a crash if the cache is kept in volatile memory, whereas a disk cache survives the crash.
• The second advantage is large storage capacity. Compared with a main-memory cache, a disk cache has plenty of storage space. Therefore, more data can be cached, resulting in a higher hit ratio. Furthermore, several distributed file systems use the file-level data transfer model, in which a file is always cached in its entirety.
• Client's main memory. The third alternative is to have the cache in a client's main memory. A cache located in a client's main memory eliminates both the network access cost and the disk access cost. Therefore, it provides the maximum performance gain on a cache hit. It also permits workstations to be diskless. Like a client's disk cache, a client's main-memory cache also contributes to scalability and reliability because on a cache hit the access request can be serviced locally without the need to contact the server. However, a client's main-memory cache is not preferable to a client's disk cache when a large cache size and increased reliability of cached data are desired.

Modification Propagation
• When caches are located at client nodes, a file's data may be cached at several nodes simultaneously. When the caches of all these nodes contain exactly the same copies of the file data, we say that the caches are consistent. The caches can become inconsistent when the file data is changed by one of the clients and the corresponding data cached at other nodes is not changed or discarded.
• A distributed file system may use one of the following modification propagation schemes:
• Write-through scheme: a modification to cached data is sent to the server immediately.
• Delayed-write scheme: a modification is made only in the cache at first and is sent to the server at some later time.

Cache Validation Schemes
• There are basically two approaches to verifying the validity of cached data: the client-initiated approach and the server-initiated approach.
1. Client-initiated approach. In this approach, a client contacts the server and checks whether its locally cached data is consistent with the master copy. The file-sharing semantics depend on the frequency of the validity check. One of the following approaches may be used:
   a. Checking before every access
   b. Periodic checking
   c. Checking on file open
2. Server-initiated approach.
• In this method, a client informs the file server when opening a file, indicating whether the file is being opened for reading, writing, or both. The file server keeps a record of which client has which file open and in what mode.
• In this manner, the server keeps monitoring the file usage modes being used by different clients and reacts whenever it detects a potential for inconsistency. A potential for inconsistency occurs when two or more clients try to open a file in conflicting modes. For example, if a file is open for reading, other clients may be allowed to open it for reading without any problem, but opening it for writing cannot be allowed.

FILE REPLICATION
• A replicated file is a file that has multiple copies, with each copy located on a separate file server. Each copy in the set of copies that makes up a replicated file is referred to as a replica of the replicated file.
• Difference between Replication and Caching
1. A replica is associated with a server, whereas a cached copy is normally associated with a client.
2. The existence of a cached copy is primarily dependent on the locality in file access patterns, whereas the existence of a replica normally depends on availability and performance requirements.
3. As compared to a cached copy, a replica is more persistent, widely known, secure, available, complete, and accurate.
4. A cached copy is contingent upon a replica. Only

Advantages of Replication