OS Unit 5 Notes
DEFINITION OF A FILE
A file is a named collection of related information that is stored on secondary storage devices
such as hard disks or SSDs. It acts as the smallest unit of logical storage that the operating system
can manage.
Data can be written to secondary storage only as part of a file; raw data cannot be
written directly.
A file may contain program code or data.
Data files can be of different types: numeric, alphanumeric, or binary.
The creator of the file decides its purpose and what kind of content it holds.
A file can store a wide range of content including:
o Source code or executable programs
o Text or numeric data
o Photos, music, and video
Files allow structured and permanent storage of information on secondary storage. Without files,
the operating system would have no organized way of accessing, naming, or protecting stored data.
The nature of data helps define the file structure and its behavior during read/write operations.
FILE TYPES
FILE ATTRIBUTES
The operating system stores a number of attributes to manage and access files properly. These
include:
The directory maintains the structure and access path for files on secondary storage:
A directory entry includes the file’s name and its unique identifier.
This identifier helps locate other file attributes efficiently.
Storing all directory information may require more than a kilobyte per file, especially in
large systems.
FILE OPERATIONS
The operating system supports several fundamental operations on files. These are implemented
using system calls:
1. Creating a File
o System call: create()
o Checks for available space and creates a new directory entry if successful.
2. Repositioning Within a File (Seek)
o Updates the current-file-position pointer to a specified value.
o Does not require actual I/O; only internal pointer movement.
3. Deleting a File
o System call: delete()
o Searches for the file in the directory, removes its entry, and releases the occupied
storage space.
4. Truncating a File
o Erases all the file contents but retains file attributes.
o Resets the file length to zero.
5. Writing to a File
o System call: write()
o Needs the file name and data to write.
o The system locates the file and performs the write at the current position.
6. Reading from a File
o System call: read()
o Needs the file name and location to store data.
o Locates the file and reads from it.
To access a file, it must first be opened using open() and then closed after use with close().
1. Per-Process Table
o Maintained individually for each process.
o Tracks files opened by the process.
o Each entry points to the system-wide open-file table.
2. System-Wide Open-File Table
o Maintains info that is independent of specific processes.
o Stores data like file location on disk, access permissions, file size, and last
modification time.
o Once a file is opened by any process, an entry is made here.
o If another process opens the same file, it adds an entry to its per-process table
pointing to this system-wide table.
File Pointer
o Unique for each process.
o Tracks the current I/O position.
o Stored separately from file attributes on disk.
File-Open Count
o Counts how many processes have the file open.
o Decrements with each close() call.
o When it reaches zero, the file’s entry is removed from the open-file table.
Disk Location of File
o Maintained in memory for faster access during I/O operations.
o Avoids repetitive disk reads.
Access Rights
o Specifies the allowed operations (read, write, execute) for the process.
o Stored in the per-process table.
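A minimal sketch of how the two tables and the file-open count could fit together. The names (sys_entry, proc_entry, sim_open, sim_close) and the fixed-size array are assumptions made for illustration; a real kernel keeps far more state.

```c
#include <string.h>

#define MAX_SYS 8   /* slots in the simulated system-wide table */

/* System-wide entry: process-independent information about an open file. */
struct sys_entry {
    char name[32];
    int  open_count;            /* how many per-process entries point here */
};

/* Per-process entry: private file pointer and access rights. */
struct proc_entry {
    struct sys_entry *sys;      /* link into the system-wide table */
    long position;              /* current I/O position */
    int  rights;                /* e.g. 4 = read, 2 = write */
};

static struct sys_entry sys_table[MAX_SYS];

/* Open a file: share the system-wide entry if one exists, else create it. */
int sim_open(struct proc_entry *pe, const char *name, int rights) {
    struct sys_entry *slot = NULL;
    for (int i = 0; i < MAX_SYS; i++) {
        if (sys_table[i].open_count > 0 && strcmp(sys_table[i].name, name) == 0) {
            slot = &sys_table[i];      /* file already open: share its entry */
            break;
        }
        if (slot == NULL && sys_table[i].open_count == 0)
            slot = &sys_table[i];      /* remember the first unused slot */
    }
    if (slot == NULL) return -1;       /* table full */
    if (slot->open_count == 0) {       /* first open of this file */
        strncpy(slot->name, name, sizeof slot->name - 1);
        slot->name[sizeof slot->name - 1] = '\0';
    }
    slot->open_count++;
    pe->sys = slot;
    pe->position = 0;
    pe->rights = rights;
    return 0;
}

/* Close: decrement the count; a count of zero frees the system-wide slot. */
int sim_close(struct proc_entry *pe) {
    int remaining = --pe->sys->open_count;
    pe->sys = NULL;
    return remaining;
}
```

Two calls to sim_open() for the same name share one system-wide entry, but each caller keeps its own position field, which mirrors the per-process file pointer described above.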
Internal File Structure
Managing the internal organization of files is an important function of the operating system. This
involves how data is stored, accessed, and aligned with the physical hardware characteristics such
as disk blocks.
Locating a particular offset (position) inside a file can be a challenging task for the
operating system.
This complexity arises because files are logically divided into records, but the disk
hardware accesses data in fixed-sized units called blocks.
Every disk system has a block size that is well defined and is typically the size of a sector.
Disk input/output operations are always performed in units of one block, also called a
physical record.
All disk blocks are the same fixed size.
This means all reading and writing to disk happens in chunks of this block size.
Logical records represent the meaningful units of data from the user’s perspective, and
their size may vary.
The physical block size, however, is fixed by the hardware.
Often, the size of logical records does not match the size of physical disk blocks exactly.
This mismatch creates a challenge for storing data efficiently and for accessing logical
records correctly.
To address the mismatch, packing is used to fit several logical records inside one physical
block.
This way, the system can store multiple smaller logical records together inside a fixed-size
block.
Packing improves the use of disk space and aligns logical data storage with the physical
storage constraints.
Example: UNIX File System Model
UNIX treats all files as a stream of bytes, rather than fixed-size records.
Each byte can be accessed individually via its offset from the beginning of the file.
This simple model allows flexible handling of data and avoids complications caused by
fixed record sizes.
The number of logical records stored in each physical block depends on several factors:
1. Logical Record Size – Size of each individual data unit as defined by the user or
application.
2. Physical Block Size – The fixed size of blocks used by the disk hardware for I/O.
3. Packing Technique – The method or algorithm used to combine logical records into
physical blocks.
Packing can be done by the user’s application program, which arranges records before
writing.
Alternatively, the operating system can handle packing automatically.
Regardless of who performs packing, the goal is to efficiently utilize disk space and align
logical data storage with physical disk blocks.
Internal fragmentation refers to wasted space inside an allocated block that is not used for
actual data.
All file systems experience some level of internal fragmentation because logical records
rarely fill blocks exactly.
The larger the block size, the more internal fragmentation occurs, since it is more likely
that some space in the block remains unused.
Therefore, choosing block size involves balancing efficient disk I/O and minimizing
wasted space.
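The packing arithmetic can be sketched as follows, assuming fixed-size records that never span two blocks. The function names are hypothetical.

```c
#include <stddef.h>

/* Number of whole fixed-size records that fit in one block (no spanning). */
size_t records_per_block(size_t block_size, size_t record_size) {
    return record_size == 0 ? 0 : block_size / record_size;
}

/* Bytes left unused at the end of each fully packed block. */
size_t wasted_per_block(size_t block_size, size_t record_size) {
    return block_size - records_per_block(block_size, record_size) * record_size;
}

/* Blocks needed to store n records, rounding up. */
size_t blocks_needed(size_t n, size_t block_size, size_t record_size) {
    size_t per = records_per_block(block_size, record_size);
    return per == 0 ? 0 : (n + per - 1) / per;
}
```

With 512-byte blocks and 100-byte records, five records fit per block and 12 bytes of each block are lost to internal fragmentation.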
FILE ACCESS METHODS
Access Methods
When a file is accessed to read or write data, different methods can be used based on how the data
needs to be retrieved or stored. The three main file access methods are:
1. Sequential Access
2. Direct Access
3. Indexed Access
Each method has its own way of locating and retrieving information from a file, with specific
advantages and disadvantages.
Sequential Access
In sequential access, data in the file is processed one record after another in a fixed order.
How it works:
o The file pointer starts at the beginning and moves forward as data is read or written
sequentially.
o Typical operations include:
read_next(): Reads the next record and moves the pointer forward.
write_next(): Appends data at the end and advances the pointer.
Common Uses:
o Used in applications like text editors and compilers where files are processed in a
linear fashion.
Advantages
Disadvantages
Direct Access
Direct access views the file as a collection of fixed-length records or blocks that can be accessed
in any order.
How it works:
o Files are divided into numbered blocks.
o Any block can be read or written directly using its block number (relative block
number).
o Operations:
read(n): Reads block number n.
write(n): Writes to block number n.
o Common in disk-based file systems where random access is possible.
Example Use Case:
o Airline reservation systems where records for specific flights or customers are
accessed directly.
Advantages
Disadvantages
Requires fixed-length records, which may lead to wasted space if records vary in size.
More complex file management compared to sequential access.
User must know or compute the correct block number for access.
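A small sketch of read(n) and write(n) on top of the C standard I/O library: the byte offset of relative block n is simply n times the block size. BLOCK_SIZE and the function names are assumptions chosen for the example.

```c
#include <stdio.h>

#define BLOCK_SIZE 16  /* assumed block size for the sketch */

/* read(n): copy relative block n of the file into buf. Returns 0 on success. */
int read_block(FILE *f, long n, char buf[BLOCK_SIZE]) {
    if (fseek(f, n * BLOCK_SIZE, SEEK_SET) != 0) return -1;
    return fread(buf, 1, BLOCK_SIZE, f) == BLOCK_SIZE ? 0 : -1;
}

/* write(n): overwrite relative block n with buf. Returns 0 on success. */
int write_block(FILE *f, long n, const char buf[BLOCK_SIZE]) {
    if (fseek(f, n * BLOCK_SIZE, SEEK_SET) != 0) return -1;
    return fwrite(buf, 1, BLOCK_SIZE, f) == BLOCK_SIZE ? 0 : -1;
}
```

Because every call repositions the stream first, blocks can be read or written in any order without touching the blocks in between.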
Indexed Access
Indexed access uses an index structure to map keys or record identifiers to their physical location
in the file.
How it works:
o An index file contains pointers to blocks in the main data file.
o Searching the index (often by binary search) identifies the block containing the
desired record.
o The data block is then accessed directly.
Handling Large Files:
o For very large files, multi-level indexes are used.
o A primary index points to secondary indexes, which point to actual data blocks.
Advantages
Disadvantages
Maintaining the index adds overhead, especially for insertions and deletions.
Index files themselves can become large, requiring additional storage and memory.
Slightly more complex to implement and manage compared to other methods.
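The index lookup can be sketched as a binary search over a sorted, in-memory index. The layout below (one entry per data block, holding the smallest key in that block) is an assumption made for illustration, as are the names index_entry and find_block.

```c
#include <stddef.h>

/* One index entry: the smallest key stored in a data block, and that block's number. */
struct index_entry {
    int  first_key;
    long block;
};

/* Binary search for the last entry whose first_key <= key.
   Returns the block that would hold the key, or -1 if the key precedes all blocks. */
long find_block(const struct index_entry *idx, size_t n, int key) {
    size_t lo = 0, hi = n;          /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (idx[mid].first_key <= key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? -1 : idx[lo - 1].block;
}
```

Only the chosen data block then needs to be read from disk, which is where indexed access saves I/O compared with scanning the file.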
DIRECTORY STRUCTURE
Files are stored on random-access storage devices such as hard disks, optical disks, and
solid-state drives (SSDs). These storage devices form the physical basis for file systems.
A storage device can be divided into smaller parts called partitions to allow more precise control
and management. For example, a hard disk can be divided into four quarters, with each quarter
able to hold a separate file system.
A volume is any entity (such as a partition or a whole device) that contains a file system. Each
volume acts like a virtual disk to the operating system.
Volumes can store multiple operating systems, enabling a computer to boot and run
different OSs from the same physical device.
Each volume maintains a file system that manages all the files stored within it.
The file system keeps information about all files in the volume through a directory structure.
The directory or volume table of contents records details about each file, including:
o File name
o Location of the file on the device
o Size of the file
o File type
The directory acts like an index or a catalog of files on the volume, enabling the
operating system to quickly locate and manage files.
Operations on Directory
A directory plays a crucial role in managing files within a file system. Various operations can be
performed on a directory to maintain and organize files efficiently. These operations include:
Searching for a File
This operation involves looking through the directory structure to find an entry
corresponding to a specific file.
The search can locate files by matching names exactly or by matching a pattern (such as
wildcard characters).
It is essential for file access, as the directory provides the link between the file name and
its metadata.
Creating a File
When a new file is created, an entry for that file is added to the directory.
This entry contains important details such as the file name, location, size, and type.
Creating a file involves updating the directory to reflect the existence of the new file.
Deleting a File
When a file is no longer required, its entry can be removed from the directory.
Deleting the file frees up the space used by the file on the storage device and removes its
metadata from the directory.
Proper deletion ensures that the file system remains clean and does not waste resources.
Listing a Directory
Renaming a File
Traversing the File System
Traversal involves accessing every directory and every file within the directory hierarchy.
This operation is useful for tasks like backups, file searches across directories, or
displaying a full directory tree structure.
DIRECTORY STRUCTURE
Directories organize and manage files on a storage system. Different directory structures have been
designed to address various needs of users and systems. The common directory structures are:
1. Single-Level Directory
2. Two-Level Directory
3. Tree-Structured Directory
4. Acyclic-Graph Directory
1. Single-Level Directory
Every file shares the same directory space and must have a unique name.
This structure is simple but has major limitations when the number of files grows or
multiple users share the system.
Users must remember unique filenames across the entire system, which becomes difficult
as file count increases.
Typically, a user may have hundreds of files, making management challenging.
Advantages
Disadvantages
Example
2. Two-Level Directory
Each user has their own User File Directory (UFD) containing only their files.
The system maintains a Master File Directory (MFD) indexing each user's UFD by
username or account number.
When a user logs in, only their UFD is searched, so multiple users can have files with the
same name without conflicts.
Advantages
Disadvantages
Isolates users completely, which is a problem if users want to share or cooperate on files.
No direct sharing of files between users is possible.
Tree-Structured Directories
The Tree-Structured Directory allows directories to contain files and subdirectories, creating a
hierarchical tree.
The root directory is at the top, and every file or subdirectory has a unique path name from
the root.
A directory entry marks whether it points to a file or a subdirectory.
Each process has a current directory to simplify file access.
Users can specify absolute or relative path names to access files.
Absolute vs Relative Pathnames
Absolute Pathname: Starts at the root and specifies the complete path.
Relative Pathname: Starts from the current directory.
Directory Deletion
Advantages
Disadvantages
Acyclic-Graph Directories
An Acyclic-Graph Directory extends the tree structure by allowing files and subdirectories to be
shared between directories.
Unlike trees, this structure allows multiple directory entries to point to the same file or
subdirectory.
Sharing is enabled through links, which are pointers to files or directories.
Sharing means changes in one place reflect everywhere the file or directory is linked.
Advantages
Supports file and directory sharing, improving collaboration and storage efficiency.
Changes in shared files are reflected system-wide instantly.
Disadvantages
UNIX and Linux implement links (hard and symbolic links) supporting acyclic
graph directories.
FILE PROTECTION
Protection in operating systems controls access to files and directories to prevent unauthorized
users from reading, modifying, or executing them. UNIX-like systems implement protection by
assigning permission bits that specify allowed operations.
User Categories
UNIX stores the basic permissions in 9 bits: one read, one write, and one execute bit for each of
the three user categories. Within each group of three, the bits carry the octal values 4, 2, and 1,
so the permissions of one category add up to a single digit from 0 to 7:
Permission   Octal value   Meaning
Read         4             Permission to read the file or list the directory contents
Write        2             Permission to modify the file or add/remove files in the directory
Execute      1             Permission to run the file as a program or access the directory
Read (value 4): Covers reading file content or, for a directory, listing its contents.
Write (value 2): Covers modifying file contents or, for a directory, changing its entries.
Execute (value 1): Covers running the file or, for a directory, entering it.
For example, 7 = 4 + 2 + 1 grants read, write, and execute (rwx), while 5 = 4 + 1 grants read and
execute only (r-x).
The first character in the permission string shows the type of the object:
Symbol Meaning
- Regular File
d Directory
drwxr-xr--
Breaking it down:
Position Description
Bit-Level Representation
Let's analyze the permission bits for owner, group, and universe, considering the breakdown:
FILE ALLOCATION METHODS
To store files on a disk, an operating system must allocate disk space efficiently and reliably. Three
main allocation strategies are widely used:
1. Contiguous Allocation
2. Linked Allocation
3. Indexed Allocation
1. Contiguous Allocation
Concept:
Files are stored in contiguous blocks on the disk — all blocks of a file are sequentially
placed one after another.
The directory entry stores:
o The starting block address (e.g., block number b).
o The length of the file in blocks (e.g., n blocks).
Example: If a file is 5 blocks long starting at block 100, the file occupies blocks 100, 101,
102, 103, and 104.
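With contiguous allocation, finding the physical block for logical block i is a single addition. A minimal sketch, with a hypothetical contig_file struct standing in for the directory entry:

```c
/* Directory entry for a contiguously allocated file: start block b, length n. */
struct contig_file {
    long start;
    long length;
};

/* Physical block holding logical block i, or -1 if i is outside the file. */
long contig_block(const struct contig_file *f, long i) {
    if (i < 0 || i >= f->length) return -1;
    return f->start + i;
}
```

For the 5-block file starting at block 100, logical block 4 maps to physical block 104, and logical block 5 is rejected as out of range.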
Access Method:
Disadvantages:
External Fragmentation: Over time, as files are created and deleted, free space becomes
fragmented into small chunks scattered across the disk.
Finding a large enough contiguous space for a new file or file extension becomes difficult.
If space is insufficient or fragmented, file creation or extension fails unless compaction
(rearranging files) is performed.
Compaction is costly, especially on large disks, as it involves moving files to create a large
free block.
The user or program must often estimate the file size beforehand: overestimating wastes
space, while underestimating causes file growth to fail.
OS Implementations:
Simple file systems in early operating systems and some embedded systems.
Early versions of FAT file systems use this concept partially.
2. Linked Allocation
Concept:
Files are stored as a linked list of disk blocks scattered anywhere on the disk.
Each block contains:
o Data portion.
o Pointer to the next block of the file.
Directory entry stores the pointer to the first block of the file.
No contiguous allocation needed — blocks can be anywhere.
Access Method:
Sequential Access: Follow pointers from the first block through the chain.
Direct Access: Inefficient, because to reach the ith block, OS must traverse through i-1
pointers sequentially.
Advantages:
Disadvantages:
Pointer overhead: Each block stores a pointer which uses some space (e.g., 4 bytes out of
512 bytes per block).
Inefficient for direct/random access due to sequential traversal of pointers.
Reliability issues: Corrupted or lost pointers may lead to loss of data or cross-linking errors.
Disk head may need to move frequently, causing poor performance in random access.
Mitigation:
Clustering: Allocate multiple blocks together as a cluster (e.g., 4 blocks). This reduces
pointer overhead and improves throughput but increases internal fragmentation (wasted
space within clusters).
OS Implementations:
MS-DOS and early Windows FAT file systems use File Allocation Table (FAT) which is
a variant of linked allocation.
Simple file systems in embedded devices.
File Allocation Table (FAT)
FAT stores the linked list of blocks in a table kept at the start of the volume (and usually
cached in memory) instead of inside the data blocks.
The FAT contains one entry per disk block.
Directory entry points to the first block.
FAT entries contain the address of the next block.
End-of-file is marked with a special value.
Allocating a block updates the FAT instead of modifying pointers on disk.
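A sketch of following a FAT chain, with the table modelled as a plain int array. The end-of-file marker value and the function names are assumptions for the example; real FAT variants use specific reserved values.

```c
#define FAT_EOF  (-1)   /* assumed end-of-file marker for this sketch */

/* Follow the chain from first_block and return the block holding logical
   block i, or FAT_EOF if the file is shorter than that. */
int fat_lookup(const int *fat, int first_block, int i) {
    int block = first_block;
    while (i > 0 && block != FAT_EOF) {
        block = fat[block];   /* each entry names the next block of the file */
        i--;
    }
    return block;
}

/* Count the blocks in a file by walking its chain to the end marker. */
int fat_length(const int *fat, int first_block) {
    int n = 0;
    for (int b = first_block; b != FAT_EOF; b = fat[b])
        n++;
    return n;
}
```

The traversal still takes i steps to reach block i, but every step is a table lookup rather than a read of a data block.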
Advantages of FAT:
Disadvantages of FAT:
3. Indexed Allocation
Concept:
Each file has an index block (or inode in UNIX) that contains an array of pointers.
Each pointer in the index block points directly to a data block of the file.
Directory entry points to the index block.
Supports direct access efficiently because OS can directly use the index block to get any
data block.
Access Method:
Direct access by reading the index block and then the desired data block.
Sequential access by traversing pointers in the index block.
Advantages:
Disadvantages:
Wasted space if the index block is large but file is small (pointer overhead).
Fixed-size index blocks limit maximum file size unless advanced techniques are used.
Accessing large files requires complex indexing schemes.
b) Multilevel Indexing
OS Implementations:
A file system is a method used by an operating system to organize, store, retrieve, and manage
files on storage devices like hard disks, SSDs, or USB drives.
To manage files efficiently, the file system is structured in layers — each layer has a specific role
in handling file operations.
The file system works like a pipeline — each layer performs its part and passes the request down
to the next layer.
Here are the layers from top (user level) to bottom (hardware level):
1. Application Program
What it is: This is the program written or used by the user (like Notepad, Word, C/C++
programs).
What it does: Requests file operations like read, write, open, delete.
Example: A C program writing output to result.txt.
2. Logical File System
What it does:
o Checks if the requested file exists.
o Collects the logical block number (used to locate the file inside the file system).
o Handles file name to inode mapping, access control, and protection.
Example: Finds that result.txt starts at logical block 105.
Acts like the file manager who finds where the file lives.
3. File-Organization Module
What it does:
o Translates the logical block number to a physical block number (exact location
on the disk).
o Handles block allocation, free space management, and keeps track of file data
blocks.
Example: Maps logical block 105 → physical block 3200 on the disk.
Like converting a library's catalog number into the actual shelf location.
4. Basic File System
What it does:
o Issues commands to I/O control (like read/write).
o Handles raw block I/O operations using the physical block number.
Example: Issues command “read block 3200”.
5. I/O Control
What it does:
o Receives the I/O commands (read/write/print).
o Converts them into device-specific instructions.
o Uses device drivers to talk to hardware.
Example: Sends "read block 3200" to the disk controller using the device driver.
6. Devices
What it is:
o The actual hardware that stores the files.
o Can be hard drives (HDD), solid-state drives (SSD), USBs, etc.
What it does:
o Executes the operation (like read/write) on the physical storage.
FREE-SPACE MANAGEMENT
Disk space is a limited resource. As files are created, modified, and deleted, disk blocks get
allocated and freed dynamically. Efficient management of free space on the disk is critical to
ensure smooth operation and maximum utilization of storage.
When a file is deleted or truncated, the disk blocks it occupied become free.
These free blocks must be tracked so they can be reused for new files or for file extensions.
Without proper free-space management, the system would not know which disk blocks are
available, leading to wasted space or errors when allocating new files.
The Free-Space List is a data structure maintained by the operating system to keep track
of all free disk blocks.
It records all disk blocks not currently allocated to any file or directory.
Whenever a new file is created or extended, the OS refers to the free-space list to find and
allocate the required blocks.
When a file is deleted, the freed blocks are returned back to the free-space list for future
reuse.
Methods of Free-Space Management
There are several ways to manage the free space efficiently, depending on the size of the disk and
the OS design:
1. Bit Vector
The free-space list is represented as a bit vector (bit map), a simple array of bits.
Each bit corresponds to one disk block.
If the bit is 1, the corresponding disk block is free.
If the bit is 0, the block is allocated.
Example:
Suppose the disk blocks 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26, and 27 are free, and the rest
are allocated. The free-space bit vector might look like:
Block#: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 ... 27
Bits: 0 0 1 1 1 1 0 0 1 1 1 1 1 1 ... 1
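A minimal sketch of a bit-vector free list for a tiny 32-block disk, using the same convention as above (1 = free, 0 = allocated). The fixed size and the function names are assumptions for illustration.

```c
#include <stdint.h>

#define NBLOCKS 32

/* Bit i of the map is 1 when block i is free, 0 when it is allocated. */
static uint32_t free_map;

void mark_free(int block)      { free_map |=  (UINT32_C(1) << block); }
void mark_allocated(int block) { free_map &= ~(UINT32_C(1) << block); }
int  is_free(int block)        { return (int)((free_map >> block) & 1u); }

/* Return the lowest-numbered free block, or -1 if none is free. */
int first_free(void) {
    for (int b = 0; b < NBLOCKS; b++)
        if (is_free(b)) return b;
    return -1;
}

/* Allocate the first free block and mark it as used. */
int allocate_block(void) {
    int b = first_free();
    if (b >= 0) mark_allocated(b);
    return b;
}
```

After marking the free blocks from the example, first_free() finds block 2, and allocating it moves the first free block to 3.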
Advantages:
Disadvantages:
For very large disks, the bit vector can become huge.
For example, a 1 TB disk with 4 KB blocks has 2^40 / 2^12 = 2^28 blocks, about 268 million.
The bit vector needs one bit per block: 2^28 bits = 2^25 bytes = 32 MB of memory.
Keeping such large bit vectors in main memory is expensive or impractical.
2. Linked List
Given free blocks 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26, and 27:
Advantages:
Disadvantages:
To traverse the entire free space, the OS needs to read each block sequentially.
This causes many disk I/O operations, which is slow.
Not efficient for quickly finding large amounts of free space.
3. Grouping
How it works:
The OS can find multiple free blocks at once by reading one block.
Reduces disk I/O compared to simple linked lists.
Advantages:
4. Counting
Used especially when free space is allocated in contiguous blocks (e.g., contiguous
allocation or clustering).
Instead of listing all free blocks individually, this method records:
o The starting block address of a free space segment.
o The number of contiguous free blocks following it.
Example:
If blocks 100 to 110 are free contiguously, this is stored as (start = 100, count = 11).
Similarly, (start = 200, count = 5) for another group of 5 contiguous free blocks.
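The counting representation can be built from a sorted list of free block numbers, as in this sketch. The extent struct and build_extents function are hypothetical names.

```c
#include <stddef.h>

/* One free extent: first block and number of contiguous free blocks. */
struct extent {
    long start;
    long count;
};

/* Collapse a sorted list of free block numbers into (start, count) extents.
   Writes at most max_out extents and returns how many were produced. */
size_t build_extents(const long *blocks, size_t n, struct extent *out, size_t max_out) {
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        if (used > 0 && out[used - 1].start + out[used - 1].count == blocks[i]) {
            out[used - 1].count++;          /* extends the current run */
        } else {
            if (used == max_out) break;     /* output array is full */
            out[used].start = blocks[i];
            out[used].count = 1;
            used++;
        }
    }
    return used;
}
```

Fifteen free blocks 2-5, 8-13, 17-18, and 25-27 collapse into just four entries: (2, 4), (8, 6), (17, 2), and (25, 3).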
Advantages:
Disadvantages:
Not suitable for fragmented disks with many small scattered free blocks.
5. Space Maps
In an Operating System, a system call is a way for a program (usually written in C/C++) to request
services from the kernel. These services could be related to:
System calls work as a bridge between user-level applications and the core of the OS (kernel).
When you write a C program to open a file using open() or create a directory using mkdir(),
your program can't directly communicate with the hardware. So, it makes a system call, which is
a secure and predefined way to ask the kernel to perform the task.
Purpose: file system calls manage the contents inside files, while directory system calls manage
file structures and navigation.
System calls are the fundamental interface between a user application and the operating system
(kernel). They allow user programs to request services from the OS such as file handling, process
control, and device management. File and Directory System Calls specifically handle operations
like creating, opening, modifying, deleting, reading, and writing to files and directories.
System calls execute in kernel mode, which provides privileged access to system resources. These
calls ensure controlled, secure access to hardware and file systems.
<unistd.h> – For basic I/O calls like read, write, close, lseek
<fcntl.h> – For file status flags like O_CREAT, O_RDWR
<sys/types.h> and <sys/stat.h> – For defining data types and file metadata
structure
<dirent.h> – For directory handling functions like opendir, readdir, etc.
FILE MANAGEMENT SYSTEM CALLS
1. creat()
Parameters:
o pathname: Full or relative path of the file to be created.
o mode: Permission bits (e.g., 0644) set at creation, affected by current umask().
Return: File descriptor on success; -1 on failure.
Example:
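A minimal sketch of creat() wrapped in a helper. The helper name make_empty_file and the mode 0644 are choices made for this example only.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create (or truncate) path, requesting rw-r--r--. Returns 0 on success, -1 on failure. */
int make_empty_file(const char *path) {
    int fd = creat(path, 0644);   /* same as open(path, O_WRONLY|O_CREAT|O_TRUNC, 0644) */
    if (fd == -1)
        return -1;                /* errno says why, e.g. EACCES or ENOENT */
    return close(fd);
}
```

Calling make_empty_file("result.txt") leaves an empty file behind; if the file already existed, its previous contents are discarded.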
2. open()
Flags:
o O_RDONLY, O_WRONLY, O_RDWR — Access modes
o O_CREAT — Create file if not exists
o O_TRUNC — Truncate file to 0 length
o O_APPEND — Append data to end of file
Returns: File descriptor or -1 on failure.
Example:
3. close()
4. read()
Parameters:
o fd: File descriptor
o buf: Pointer to memory buffer
o count: Number of bytes to read
Returns: Number of bytes read, 0 on EOF, -1 on error.
5. write()
Example:
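A sketch of read() and write() working together to copy a file. The helper name copy_file and the 4096-byte buffer are assumptions; the inner loop is there because write() may transfer fewer bytes than requested.

```c
#include <fcntl.h>
#include <unistd.h>

/* Copy src to dst with read()/write(); returns bytes copied or -1 on error. */
long copy_file(const char *src, const char *dst) {
    char buf[4096];
    long total = 0;
    int in = open(src, O_RDONLY);
    if (in == -1) return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out == -1) { close(in); return -1; }

    ssize_t n;
    while ((n = read(in, buf, sizeof buf)) > 0) {   /* 0 means end of file */
        ssize_t done = 0;
        while (done < n) {                          /* write() may be partial */
            ssize_t w = write(out, buf + done, (size_t)(n - done));
            if (w == -1) { close(in); close(out); return -1; }
            done += w;
        }
        total += n;
    }
    close(in);
    close(out);
    return n == -1 ? -1 : total;                    /* -1 if read() failed */
}
```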
6. lseek()
Parameters:
o fd: File descriptor
o offset: Number of bytes to move
o whence: Starting point:
SEEK_SET: From beginning
SEEK_CUR: From current position
SEEK_END: From file end
Returns: New offset value or -1 on error.
Usage Examples:
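Two typical uses of lseek(), sketched with hypothetical helper names: finding a file's size with SEEK_END, and reading the last bytes of a file with a negative offset.

```c
#include <fcntl.h>
#include <unistd.h>

/* Size of an open file, found by seeking to its end; the old offset is restored. */
off_t file_size(int fd) {
    off_t here = lseek(fd, 0, SEEK_CUR);   /* remember current position */
    if (here == (off_t)-1) return -1;
    off_t end = lseek(fd, 0, SEEK_END);    /* offset of the end == size in bytes */
    if (end == (off_t)-1) return -1;
    if (lseek(fd, here, SEEK_SET) == (off_t)-1) return -1;
    return end;
}

/* Read the last n bytes of the file into buf; returns bytes read or -1. */
ssize_t read_tail(int fd, char *buf, size_t n) {
    if (lseek(fd, -(off_t)n, SEEK_END) == (off_t)-1) return -1;
    return read(fd, buf, n);
}
```

Asking read_tail() for more bytes than the file holds fails, because the resulting offset would be negative and lseek() rejects it.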
Properties (hard link):
o Same inode for both paths
o Reference count increases
o File deleted only when last link is removed
Limitation: Must be on the same filesystem
Properties (symbolic link):
o Different inode
o Can span across filesystems
o Stores path as string
o Breaks if target file is removed
Cross-filesystem? Hard link: No. Symbolic link: Yes.
1. stat()
Purpose:
The stat() system call retrieves detailed information (metadata) about a file or
directory given its pathname.
It follows symbolic links, meaning if the pathname points to a symlink, stat() returns
info about the actual file it points to.
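Syntax:
int stat(const char *pathname, struct stat *statbuf);
Parameters:
o pathname: Path of the file or directory to examine.
o statbuf: Pointer to a struct stat that receives the metadata.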
What is returned?
Returns 0 on success.
Returns -1 on failure (e.g., if the file does not exist or permission denied).
Example Usage:
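#include <stdio.h>
#include <sys/stat.h>

int main(void) {
    struct stat fileStat;

    /* stat() follows symbolic links and fills fileStat on success */
    if (stat("myfile.txt", &fileStat) == 0) {
        /* st_size is an off_t; cast so the format specifier matches */
        printf("File size: %lld bytes\n", (long long)fileStat.st_size);
        /* keep only the nine permission bits and print them in octal */
        printf("File permissions: %o\n", (unsigned)(fileStat.st_mode & 0777));
    } else {
        perror("stat error");
    }
    return 0;
}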
2. lstat()
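Purpose:
lstat() works like stat(), but if the pathname is a symbolic link it returns information
about the link itself rather than the file it points to.
Syntax:
int lstat(const char *pathname, struct stat *statbuf);
Parameters: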
Same as stat().
Return:
0 on success.
-1 on failure.
When you want to inspect the properties of a symbolic link itself, such as its length,
permissions, or timestamps.
To distinguish if a file is a symbolic link before deciding how to handle it.
Example:
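#include <stdio.h>
#include <sys/stat.h>
int main(void) {
    struct stat fileStat;
    /* lstat() examines the link itself rather than its target */
    if (lstat("mylink", &fileStat) == 0 && S_ISLNK(fileStat.st_mode)) {
        printf("It is a symbolic link.\n");
    }
    return 0;
}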
3. fstat()
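Purpose:
fstat() retrieves the same metadata as stat(), but for a file that is already open,
identified by its file descriptor.
Syntax:
int fstat(int fd, struct stat *statbuf);
Parameters:
o fd: File descriptor of the open file.
o statbuf: Pointer to a struct stat that receives the metadata.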
Return:
0 on success.
-1 on failure.
When file is already open and you want info without needing its path.
More efficient in some cases since you avoid resolving pathnames again.
Example:
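A minimal sketch of fstat() on an open descriptor. The helper name open_file_size is hypothetical.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Size in bytes of an already open file, or -1 on error. No pathname is needed. */
long long open_file_size(int fd) {
    struct stat st;
    if (fstat(fd, &st) == -1) return -1;   /* e.g. EBADF for an invalid descriptor */
    return (long long)st.st_size;
}
```

Because only the descriptor is used, this keeps working even if the file has been renamed or unlinked since it was opened.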
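System Call   Input Type        Follows Symlink?   Info Returned For          Typical Use Case
stat()        Pathname          Yes                Target of the path         General file metadata lookup
lstat()       Pathname          No                 The symbolic link itself   Inspecting or detecting symbolic links
fstat()       File descriptor   Not applicable     The already open file      Metadata of an open file without a path lookup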
11. chmod()
Syntax:
Example:
12. chown()
Syntax:
1. opendir()
2. readdir()
Structure:
struct dirent {
    ino_t d_ino;       // Inode number
    char  d_name[256]; // Filename
};
3. closedir()
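The three directory calls are normally used together in one loop. A minimal sketch, with a hypothetical helper count_entries that skips the "." and ".." entries every directory contains:

```c
#include <dirent.h>
#include <string.h>

/* Count the entries in a directory, skipping "." and "..". Returns -1 on error. */
int count_entries(const char *path) {
    DIR *dir = opendir(path);
    if (dir == NULL) return -1;
    int n = 0;
    struct dirent *e;
    while ((e = readdir(dir)) != NULL) {      /* NULL marks the end of the directory */
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        n++;
    }
    closedir(dir);
    return n;
}
```

Printing e->d_name inside the loop instead of counting would give a basic directory listing.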
4. mkdir()
Example:
mkdir("myfolder", 0755);
5. rmdir()
6. umask()
Example:
umask(022);
creat("new.txt", 0666); // Final permission: 0644
chmod() vs umask()
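A sketch contrasting the two calls, with hypothetical helper names: umask() only filters the permissions requested when a file is created, while chmod() sets the permission bits of an existing file exactly as given.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create path requesting 0666 under the given umask.
   Returns the resulting permission bits, or -1 on error. */
int perms_after_create(const char *path, mode_t mask) {
    struct stat st;
    mode_t old = umask(mask);            /* umask() returns the previous mask */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
    umask(old);                          /* restore the caller's mask */
    if (fd == -1) return -1;
    close(fd);
    if (stat(path, &st) == -1) return -1;
    return (int)(st.st_mode & 0777);
}

/* chmod() sets the bits exactly as given; the umask is not applied. */
int perms_after_chmod(const char *path, mode_t mode) {
    struct stat st;
    if (chmod(path, mode) == -1 || stat(path, &st) == -1) return -1;
    return (int)(st.st_mode & 0777);
}
```

With a mask of 022, a file requested as 0666 ends up as 0644, whereas a later chmod to 0666 really does yield 0666.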