Chapter 14: File System
Implementation
Outline
• File-System Structure
• File-System Operations
• Directory Implementation
• Allocation Methods
• Free-Space Management
• Efficiency and Performance
• Recovery
• Example: WAFL File System
Objectives
• Describe the details of implementing
local file systems and directory
structures
• Discuss block allocation and free-
block algorithms and trade-offs
• Explore file system efficiency and
performance issues
• Look at recovery from file system
failures
• Describe the WAFL file system as a
concrete example
File-System Structure
• File structure
– Logical storage unit
– Collection of related information
• File system resides on secondary
storage (disks)
– Provided user interface to storage,
mapping logical to physical
– Provides efficient and convenient access
to disk by allowing data to be stored,
located retrieved easily
• Disk provides in-place rewrite and
random access
Layered File System
File System Layers
• Device drivers manage I/O devices at the I/O control layer
Given commands like
read drive1, cylinder 72, track 2, sector 10, into memory location 1060
Outputs low-level hardware specific commands to
hardware controller
• Basic file system given command like “retrieve block 123”
translates to device driver
• Also manages memory buffers and caches (allocation,
freeing, replacement)
– Buffers hold data in transit
– Caches hold frequently used data
• File organization module understands files, logical
address, and physical blocks
• Translates logical block # to physical block #
• Manages free space, disk allocation
File System Layers (Cont.)
• Logical file system manages metadata
information
– Translates file name into file number, file
handle, location by maintaining file control
blocks (inodes in UNIX)
– Directory management
– Protection
• Layering useful for reducing complexity and
redundancy, but adds overhead and can
decrease performance
• Logical layers can be implemented by any
coding method according to OS designer
File System Layers (Cont.)
• Many file systems, sometimes more
than one within an operating system
– Each with its own format:
– CD-ROM is ISO 9660;
– Unix has UFS, FFS;
– Windows has FAT, FAT32, NTFS as well
as floppy, CD, DVD Blu-ray,
– Linux has more than 130 types, with
extended file system ext3 and ext4
leading; plus distributed file systems, etc.)
– New ones still arriving – ZFS, GoogleFS,
FUSE
File-System Operations
• We have system calls at the API level, but
how do we implement their functions?
– On-disk and in-memory structures
• Boot control block contains info needed by
system to boot OS from that volume
– Needed if volume contains OS, usually first block
of volume
• Volume control block (superblock, master
file table) contains volume details
– Total # of blocks, # of free blocks, block size, free
block pointers or array
• Directory structure organizes the files
– Names and inode numbers, master file table
File Control Block (FCB)
• OS maintains FCB per file, which
contains many details about the file
– Typically, inode number, permissions,
size, dates
– Example
In-Memory File System Structures
• Mount table contains information
about each mounted volume.
• System-wide open-file table
contains a copy of the FCB of
each file and other info
• Per-process open-file table
contains pointers to appropriate
entries in system-wide open-file
table as well as other info
In-Memory File System Structures (Cont.)
• Figure 12-3(a) refers to opening a file
• Figure 12-3(b) refers to reading a file
Directory Implementation
• Linear list of file names with pointer to the data
blocks
– Simple to program
– Time-consuming to execute
• Linear search time
• Could keep ordered alphabetically via linked list
or use B+ tree
• Hash Table – linear list with hash data structure
– Decreases directory search time
– Collisions – situations where two file names hash to
the same location
– Only good if entries are fixed size, or use chained-
overflow method
Allocation Method
• An allocation method refers to
how disk blocks are allocated for
files:
– Contiguous
– Linked
– File Allocation Table (FAT)
Contiguous Allocation Method
• Each file occupies set of contiguous
blocks
– Best performance in most cases
– Simple – only starting location (block #) and
length (number of blocks) are required
– Problems include:
• Finding space on the disk for a file,
• Knowing file size,
• External fragmentation, need for compaction
off-line (downtime) or on-line
Contiguous Allocation (Cont.)
• Mapping from logical
to physical (block
size =512 bytes)
Q
LA/512
R
• Block to be accessed
= starting address +
Q
• Displacement into
block = R
Linked Allocation
• Each file is a linked list of blocks
• File ends at nil pointer
• No external fragmentation
• Each block contains pointer to next block
• No compaction, external fragmentation
• Free space management system called when
new block needed
• Improve efficiency by clustering blocks into
groups but increases internal fragmentation
• Reliability can be a problem
• Locating a block can take many I/Os and disk
seeks
Linked Allocation Example
• Each file is a linked list of disk blocks:
blocks may be scattered anywhere on
the disk
• Scheme
Linked Allocation (Cont.)
• Mapping
Q
LA/511
R
• Block to be accessed is the Qth
block in the linked chain of blocks
representing the file.
• Displacement into block = R + 1
File-Allocation Table
Indexed Allocation Method
• Each file has its own index block(s) of
pointers to its data blocks
• Logical view
index table
Example of Indexed Allocation
Indexed Allocation – Small Files
• Need index table
• Random access
• Dynamic access without external fragmentation,
but have overhead of index block
• Mapping from logical to physical in a file of
maximum size of 256K bytes and block size of
512 bytes. We need only 1 block for index table
Q
LA/512
R
• Calculation:
– Q = displacement into index table
– R = displacement into block
Indexed Allocation – Large Files
• Mapping from logical to physical in
a file of unbounded length (block
size of 512 words)
– Linked scheme – Link blocks of index
table (no limit on size)
– Multi-level indexing
Indexed Allocation – Two-level Scheme
• Two-level index (4K blocks could store 1,024 four-
byte pointers in outer index -> 1,048,567 data
blocks and file size of up to 4GB)
Q1
LA / (512 x 512)
R1
• Mapping scheme for outer-index:
– Q1 = displacement into outer-index
– R1 is used as follows:
Q2
R1 / 512
R2
– Mapping scheme for index level:
• Q2 = displacement into block of index table
• R2 displacement into block of file
Indexed Allocation – Two-Level Scheme
Free-Space Management
• File system maintains free-space list to track
available blocks/clusters
– (Using term “block” for simplicity)
• Bit vector or bit map (n blocks)
01 2 n-1
…
1 block[i] free
bit[i] =
0 block[i] occupied
Block number calculation
(number of bits per word) *
(number of 0-value words) +
offset of first 1 bit
Example : Free-Space Management
• Number of bits per word = 32 (i.e., each word
is 32 bits long).
• Number of 0-value words = 3 (meaning the
first three words are filled entirely with 0s).
• Offset of the first 1-bit = 7 (meaning the first
1-bit occurs at the 7th position in the next
word, relative to that word).
• Now, applying the formula:
• Block Number = 32×3+7=96+7=103
• So, the block number would be 103
Free-Space Management
• File system maintains free-space list to
track available blocks
• Bit vector or bit map (n blocks)
• Bit map requires extra space
– Example:
block size = 4KB = 212 bytes
disk size = 240 bytes (1 terabyte)
n = 240/212 = 228 bits (or 32MB)
if clusters of 4 blocks -> 8MB of
memory
• Easy to get contiguous files
Linked Free Space List on Disk
Linked list (free list)
• Cannot get
contiguous space
easily
• No waste. Linked
Free Space List on
Disk of space
• No need to
traverse the entire
list (if # free blocks
recorded)
Free-Space Management (Cont.)
• Grouping
– Modify linked list to store address of next
n-1 free blocks in first free block, plus a
pointer to next block that contains free-
block-pointers (like this one)
• Counting
– Because space is frequently contiguously
used and freed, with contiguous-
allocation, extents, or clustering
• Keep address of first free block and count of
following free blocks
• Free space list then has entries containing
addresses and counts
TRIMing Unused Blocks
• HDDS overwrite in place so need only free list
• Blocks not treated specially when freed
– Keeps its data but without any file pointers to it,
until overwritten
• Storage devices not allowing overwrite (like
NVM) suffer badly with same algorithm
– Must be erased before written, erases made in
large chunks (blocks, composed of pages) and are
slow
– TRIM is a newer mechanism for the file system to
inform the NVM storage device that a page is free
• Can be garbage collected or if block is free, now
block can be erased
Efficiency and Performance
• Efficiency dependent on:
– Disk allocation and directory algorithms
– Types of data kept in file’s directory entry
– Pre-allocation or as-needed allocation of
metadata structures
– Fixed-size or varying-size data structures
Efficiency and Performance (Cont.)
• Performance
– Keeping data and metadata close together
– Buffer cache – separate section of main memory
for frequently used blocks
– Synchronous writes sometimes requested by
apps or needed by OS
• No buffering / caching – writes must hit disk before
acknowledgement
• Asynchronous writes more common, buffer-able,
faster
– Free-behind and read-ahead – techniques to
optimize sequential access
– Reads frequently slower than writes
Page Cache
• A page cache caches pages
rather than disk blocks using
virtual memory techniques and
addresses
• Memory-mapped I/O uses a page
cache
• Routine I/O through the file
system uses the buffer (disk)
cache
• This leads to the following figure
I/O Without a Unified Buffer Cache
Unified Buffer Cache
• A unified buffer cache uses
the same page cache to cache
both memory-mapped pages
and ordinary file system I/O to
avoid double caching
But which caches get priority, and
what replacement algorithms to use?
I/O Using a Unified Buffer Cache
Recovery
• Consistency checking – compares
data in directory structure with data
blocks on disk, and tries to fix
inconsistencies
– Can be slow and sometimes fails
• Use system programs to back up data
from disk to another storage device
(magnetic tape, other magnetic disk,
optical)
• Recover lost file or disk by restoring
data from backup
Example: WAFL File System
• Used on Network Appliance “Filers” –
distributed file system appliances
• “Write-anywhere file layout”
• Serves up NFS, CIFS, http, ftp
• Random I/O optimized, write optimized
– NVRAM for write caching
• Similar to Berkeley Fast File System,
with extensive modifications
The WAFL File Layout
Snapshots in WAFL