Os 2 All Assignments2
The system call and library interface represent the border between user programs and
the kernel. System calls look like ordinary function calls in C programs. Assembly
language programs may invoke system calls directly without a system call library.
The libraries are linked with the programs at compile time.
The set of system calls is partitioned into those that interact with the file subsystem and
those that interact with the process control subsystem. The file subsystem manages
files, allocating file space, administering free space, controlling access to files, and
retrieving data for users.
Processes interact with the file subsystem via a specific set of system calls, such as
open (to open a file for reading or writing), close, read, write, stat (query the
attributes of a file), chown (change the record of who owns the file), and chmod
(change the access permissions of a file).
The file subsystem accesses file data using a buffering mechanism that regulates data
flow between the kernel and secondary storage devices. The buffering mechanism
interacts with block I/O device drivers to initiate data transfer to and from the kernel.
Device drivers are the kernel modules that control the operation of peripheral devices.
The file subsystem also interacts directly with “raw” I/O device drivers without the
intervention of the buffering mechanism. Finally, the hardware control is responsible
for handling interrupts and for communicating with the machine. Devices such as
disks or terminals may interrupt the CPU while a process is executing. If so, the
kernel may resume execution of the interrupted process after servicing the interrupt.
Interrupts are not serviced by special processes but by special functions in the kernel,
called in the context of the currently running process.
Difference between Unix and Linux
Linux is essentially a clone of Unix, but some basic differences are shown below:

Linux:
• The source code of Linux is freely available to its users.
• Common command-line shells on Linux are Bash, Zsh, and Tcsh.

Unix:
• The source code of Unix is not freely available to the general public.
• Common command-line shells on Unix are Bourne, Korn, C shell, and Zsh.
Advantages of UNIX:
1. Stability: UNIX is known for its stability and reliability. It can run for long periods of
time without requiring a reboot, which makes it ideal for critical systems that need to
run continuously.
2. Security: UNIX has a robust security model that includes file permissions, user
accounts, and network security features. This makes it a popular choice for systems
that require high levels of security.
3. Scalability: UNIX can be scaled up to handle large workloads and can be used on a
variety of hardware platforms.
4. Flexibility: UNIX is highly customizable and can be configured to suit a wide range
of needs. It can be used for everything from simple desktop systems to complex
server environments.
5. Command-line interface: UNIX’s command-line interface allows for powerful and
efficient interaction with the system.
Disadvantages of UNIX:
1. Complexity: UNIX can be complex and difficult to learn for users who are used to
graphical user interfaces (GUIs).
2. Cost: Some UNIX systems can be expensive, especially when compared to open-
source alternatives like Linux.
3. Lack of standardization: There are many different versions of UNIX, which can make
it difficult to ensure compatibility between different systems.
4. Limited software availability: Some specialized software may not be available for
UNIX systems.
5. Steep learning curve: UNIX requires a certain level of technical knowledge and
expertise, which can make it challenging for novice users.
Figure 2.1 gives a block diagram of the kernel, showing various modules and their relationships
to each other. In particular, it shows the file subsystem on the left and the process control
subsystem on the right, the two major components of the kernel.
The diagram serves as a useful logical view of the kernel, although in practice the kernel deviates
from the model because some modules interact with the internal operations of others.
Figure 2.1 shows three levels: user, kernel, and hardware. The system call and library interface
represent the border between user programs and the kernel depicted in Figure 2.1.
System calls look like ordinary function calls in C programs, and libraries map these function
calls to the primitives needed to enter the operating system.
Assembly language programs may invoke system calls directly without a system call library,
however. Programs frequently use other libraries such as the standard I/O library to provide a
more sophisticated use of the system calls. The libraries are linked with the programs at compile
time.
The figure partitions the set of system calls into those that interact with the file subsystem and
those that interact with the process control subsystem. The file subsystem manages files,
allocating file space, administering free space, controlling access to files, and retrieving data for
users.
Processes interact with the file subsystem via a specific set of system calls, such as open (to open
a file for reading or writing), close, read, write, stat (query the attributes of a file).
The file subsystem accesses file data using a buffering mechanism that regulates data flow
between the kernel and secondary storage devices.
Device drivers are the kernel modules that control the operation of peripheral devices. Block I/O
devices are random-access storage devices; alternatively, their device drivers make them appear
to be random-access storage devices to the rest of the system.
The scheduler module allocates the CPU to processes. It schedules them to run in turn until they
voluntarily relinquish the CPU while awaiting a resource or until the kernel preempts them when
their recent run time exceeds a time quantum.
Explain with example: Building block primitives.
Unix enables writing small, modular programs that can be used as building blocks to build more
complex programs. There are two primitives:
1. I/O redirection
2. Pipe(|)
I/O REDIRECTION
Most Unix commands read input, such as a file or another attribute for the command, and write
output. By default, input is being given with the keyboard, and output is displayed on your
screen. Your keyboard is your standard input (stdin) device, and the screen or a particular
terminal window is the standard output (stdout) device.
Sometimes you will want to put output of a command in a file, or you may want to issue another
command on the output of one command. This is known as redirecting output. Redirection is
done using either the “>” (greater-than symbol), or using the “|” (pipe) operator which sends the
standard output of one command to another command as standard input. The file descriptor 1 is
assigned to the standard output file.
The cat command concatenates files and puts them all together to the standard output. By
redirecting this output to a file, that file will be created, or overwritten if it already exists,
so take care.
Input redirection
In another case, you may want a file to be the input for a command that normally wouldn't accept
a file as an option. This redirecting of input is done using the “<” (less-than symbol) operator.
The file descriptor 0 is assigned to the standard input file.
Here, cat will take each line of the file test1 as input and display it on the monitor.
Error Redirection
When an invalid command is typed at the $ prompt, the shell displays an error message on the
monitor. Thus the monitor is referred to as the standard error file. The file descriptor 2 is
assigned to the standard error file.
PIPES
Unix has a feature by which filters and other commands can be combined in such a way that the
standard output of one filter or command can be sent as the standard input to another.
$ ls -l | wc -l
Command to display the number of lines in which the word 'and' occurs in a file test:
$ grep and test | wc -l
During system initialization, the kernel allocates space for a number of buffers, configurable
according to memory size and system performance constraints. A buffer consists of two parts: a
memory array that contains data from the disk and a buffer header that identifies the buffer.
Because there is a one to one mapping of buffer headers to data arrays, the ensuing text will
frequently refer to both parts as a "buffer," and the context should make clear which part is being
discussed.
The data in a buffer corresponds to the data in a logical disk block on a file system, and the
kernel identifies the buffer contents by examining identifier fields in the buffer header. The
buffer is the in-memory copy of the disk block; the contents of the disk block map into the
buffer, but the mapping is temporary until the kernel decides to map another disk block into the
buffer. A disk block can never map into more than one buffer at a time. If two buffers were to
contain data for one disk block, the kernel would not know which buffer contained the current
data and could write incorrect data back to disk. For example, suppose a disk block maps into
two buffers, A and B. If the kernel writes data first into buffer A and then into buffer B, the disk
block should contain the contents of buffer B if all write operations completely fill the buffer.
However, if the kernel reverses the order when it copies the buffers to disk, the disk block will
contain incorrect data.
The buffer header (Figure 3.1) contains a device number field and a block number field that
specify the file system and block number of the data on disk and uniquely identify the buffer.
The device number is the logical file system number (see Section 2.2.1), not a physical
device (disk) unit number. (Note that the buffer cache is a software structure that should not
be confused with hardware caches that speed memory references.) The buffer header also contains a
pointer to a data array for the buffer, whose size must be at least as big as the size of a disk
block, and a status field that summarizes the current status of the buffer. The status of a buffer is
a combination of the following conditions:
• The buffer is currently locked (the terms "locked" and "busy" will be used interchangeably, as
will "free" and "unlocked"),
• The kernel must write the buffer contents to disk before reassigning the buffer; this condition is
known as "delayed-write,"
• The kernel is currently reading or writing the contents of the buffer to disk,
The buffer header also contains two sets of pointers, used by the buffer allocation algorithms to
maintain the overall structure of the buffer pool, as explained in the next section.
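The buffer header described above can be sketched as a C struct. The field and flag names here are illustrative, not taken from any particular kernel:

```c
/* status bits for a buffer (names are illustrative) */
enum { B_BUSY = 1, B_DELWRI = 2, B_RW = 4, B_VALID = 8 };

struct buf {
    int  b_dev;     /* logical file system (device) number */
    int  b_blkno;   /* block number of the data on disk */
    int  b_status;  /* combination of the status bits above */
    char *b_data;   /* data array, at least as big as a disk block */
    struct buf *b_hash_next, *b_hash_prev;  /* hash queue pointers */
    struct buf *b_free_next, *b_free_prev;  /* free list pointers */
};
```

The two sets of list pointers reflect the text above: a buffer is always on a hash queue (keyed by device and block number) and, when unlocked, also on the free list.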
Internal representation of a file is given by an inode, which contains a description of the disk
layout of the file data and other information such as the file owner, access permissions, and
access times. The term inode is a contraction of the term index node. Every file has one inode,
but it may have several names, all of which map into the inode. Each name is called a link. When
a process creates a new file, the kernel assigns it an unused inode.
Inodes are stored in the file system, but the kernel reads them into an in-core (in-memory) inode
table when manipulating files. The kernel contains two other data structures, the file table and
the user file descriptor table. The file table is a global kernel structure, but the user file
descriptor table is allocated per process. When a process opens or creats a file, the kernel
allocates an entry from each table, corresponding to the file's inode. Entries in the three
structures -- user file descriptor table, file table, and inode table -- maintain the state of the file
and the user's access to it. The file table keeps track of the byte offset in the file where the user's
next read or write will start, and the access rights allowed to the opening process. The user file
descriptor table identifies all open files for a process.
The kernel returns a file descriptor for the open and creat system calls, which is an index into the
user file descriptor table. These three tables also enable sharing of files.
On a logical level, the kernel deals with file systems rather than with disks. It treats each file
system as a logical device identified by a logical device number.
A file system consists of a sequence of logical blocks, each containing 512, 1024, 2048, or any
convenient multiple of 512 bytes. The size of a logical block is homogeneous within a file system
but may vary between different file systems in a system configuration.
In this text, the size of a "block" is assumed to be 1K, unless stated otherwise.
The boot block occupies the beginning of a file system, typically the first sector, and may
contain the bootstrap code that is read into the machine to boot, or initialize, the
operating system. Although only one boot block is needed to boot the system, every file
system has a (possibly empty) boot block.
The super block describes the state of a file system -- how large it is, how many files it
can store, where to find free space on the file system, and other information.
The inode list is a list of inodes that follows the super block in the file system.
Administrators specify the size of the inode list when configuring a file system. The
kernel references inodes by index into the inode list. One inode is the root inode of the
file system: it is the inode by which the directory structure of the file system is accessible
after execution of the mount system call.
The data blocks start at the end of the inode list and contain file data and administrative
data. An allocated data block can belong to one and only one file in the file system.
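Given this layout (boot block, super block, inode list, data blocks), the block index at which the data blocks start follows from the size of the inode list. A small sketch, assuming block 0 holds the boot block and block 1 the super block (the layout and names are illustrative):

```c
/* First data block index, assuming:
 *   block 0 - boot block
 *   block 1 - super block
 *   blocks 2.. - inode list, then data blocks.
 */
unsigned data_start(unsigned inode_count, unsigned inodes_per_block) {
    /* round the inode list up to whole blocks */
    unsigned inode_blocks = (inode_count + inodes_per_block - 1) / inodes_per_block;
    return 2 + inode_blocks;
}
```

For example, with 64 inodes and 8 inodes per block, the inode list occupies 8 blocks and the data blocks start at block 10.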
Processes
A process is the execution of a program and consists of a pattern of bytes that the CPU interprets
as machine instructions (called text), data, and stack. Processes communicate with other
processes and with the rest of the world via system calls.
A process on a UNIX system is the entity that is created by the fork system call. Every process
except process 0 is created when another process executes the fork system call. The process
which invoked fork system call is called the parent process and the newly created process is
called the child process. A process can have only one parent process but it can have many child
processes. The kernel identifies each process by its process number, called the process ID (PID).
Process 0 is a special process that is created "by hand" when the system boots; after forking a
child process (process 1), process 0 becomes the swapper process. Process 1, known as init, is
the ancestor of every other process.
#include <stdio.h>

char buffer[2048];
int version = 1;

int main(void) {
    printf("Hello, world!\n");
    return 0;
}
In the code given above, the initialized data is the variable version and the uninitialized data
(i.e., bss) is the array buffer.
The kernel loads an executable file into memory during an exec system call, and the loaded
process consists of at least three parts, called regions: text, data, and stack. The text and data
regions correspond to the text and data-bss sections of the executable file, but the stack region is
automatically created and its size is dynamically adjusted by the kernel at runtime. The stack
consists of logical stack frames that are pushed when calling a function and popped when
returning; a special register called the stack pointer indicates the current stack depth. A stack
frame consists of parameters to a function, its local variables and the data necessary to recover
the previous stack frame, including the value of the program counter and stack pointer at the time
of the function call.
Because a process in the UNIX system can execute in two modes, kernel or user, it uses a
separate stack for each mode. When a system call is made, a trap instruction is executed which
causes an interrupt which makes the hardware switch to kernel mode. The kernel stack of a
process is null when the process executes in user mode.
Every process has an entry in the kernel process table, and each process is allocated a u area ("u"
stands for "user") that contains private data manipulated only by the kernel. The process table
contains (or points to) a per process region table, whose entries point to entries in a region table.
A region is a contiguous area of a process's address space, such as text, data, and stack. Region
table entries describe the attributes of the region, such as whether it contains text or data, whether
it is shared or private, and where the "data" of the region is located in memory. The extra level of
indirection (from the per process region table to the region table) allows independent processes
to share regions.
Fields in the process table entry include:
a state field
identifiers indicating the user who owns the process (user IDs, or UIDs)
an event descriptor set when a process is suspended (in the sleep state)
The u area contains information that needs to be accessible only when the process is executing.
Important fields in the u area are a pointer to the process table slot of the currently
executing process, parameters of the current system call, file descriptors for open files,
current directory and current root, and process and file size limits.
The kernel internally uses a structure variable u which points to the u area of the currently
executing process. When another process executes, the kernel rearranges its virtual address space
so that u refers to the u area of the new process.
Context of a process
The context of a process consists of its:
text region
values of global variables and data structures
values of machine registers
values stored in its process table slot
u area
contents of user and kernel stacks
The text of the operating system and its global data structures are shared by all processes but do
not constitute part of the context of a process.
When the kernel decides that it should execute another process, it does a context switch, so that
the system executes in the context of the other process.
The kernel services the interrupts in the context of the interrupted process even though it may not
have caused the interrupt. Interrupts are served in kernel mode.
(fragment of algorithm getblk)
        if (there are no buffers on free list)      /* scenario 4 */
        {
            sleep (event any buffer becomes free);
            continue;       /* back to while loop */
        }
        remove buffer from free list;
        if (buffer marked for delayed write)        /* scenario 3 */
        {
            asynchronous write buffer to disk;
            continue;       /* back to while loop */
        }
        /* scenario 2 -- found a free buffer */
        remove buffer from old hash queue;
        put buffer onto new hash queue;
        return buffer;

Figure 3.5. Scenario 1 in Finding a Buffer: Buffer on Hash Queue

Figure 3.6. Algorithm for Releasing a Buffer:
    algorithm brelse
    input: locked buffer
    output: none
    {
        wakeup all procs: event, waiting for any buffer to become free;
        wakeup all procs: event, waiting for this buffer to become free;
        raise processor execution level to block interrupts;
        if (buffer contents valid and buffer not old)
            enqueue buffer at end of free list
        else
            enqueue buffer at beginning of free list;
        lower processor execution level to allow interrupts;
        unlock(buffer);
    }
Before continuing to the other scenarios, let us consider what happens to a buffer after it is
allocated. The kernel may read data from the disk to the buffer and manipulate it or write data to
the buffer and possibly to the disk. The kernel leaves the buffer marked busy; no other process
can access it and change its contents while it is busy, thus preserving the integrity of the data in
the buffer. When the kernel finishes using the buffer, it releases the buffer according to algorithm
brelse (Figure 3.6). It wakes up processes that had fallen asleep because the buffer was busy and
processes that had fallen asleep because no buffers remained on the free list. In both cases,
release of a buffer means that the buffer is available for use by the sleeping processes, although
the first process that gets the buffer locks it and prevents the other processes from getting it
(recall Section 2.2.2.4). The kernel places the buffer at the end of the free list, unless an I/O error
occurred or unless it specifically marked the buffer "old," as will be seen later in this chapter; in
the latter cases, it places the buffer at the beginning of the free list. The buffer is now free for
another process to claim it.
Just as the kernel invokes algorithm brelse when a process has no more need for a buffer, it also
invokes the algorithm when handling a disk interrupt to release buffers used for asynchronous
I/O to and from the disk, as will be seen in Section 3.4. The kernel raises the processor execution
level to prevent disk interrupts while manipulating the free list, thereby preventing corruption of
the buffer pointers that could result from a nested call to brelse. Similar bad effects could happen
if an interrupt handler invoked brelse while a process was executing getblk, so the kernel raises
the processor execution level at strategic places in getblk, too. The exercises explore these cases
in greater detail.
In the second scenario in algorithm getblk, the kernel searches the hash queue that should contain
the block but fails to find it there. Since the block cannot be on another hash queue because it
cannot "hash" elsewhere, it is not in the buffer cache.

Figure 3.7. Second Scenario for Buffer Allocation: (a) Search for Block 18 - Not in Cache;
(b) Remove First Block from Free List, Assign to 18

So the kernel removes the first buffer from the free list instead; that buffer
had been allocated to another disk block and is also on a hash queue. If the buffer has not been
marked for a delayed write (as will be described later), the kernel marks the buffer busy, removes
it from the hash queue where it currently resides, reassigns the buffer header's device and block
number to that of the disk block for which the process is searching, and places the buffer on the
correct hash queue. The kernel uses the buffer but has no record that the buffer formerly
contained data for another disk block. A process searching for the old disk block will not find it
in the pool and will have to allocate a new buffer for it from the free list, exactly as outlined here.
When the kernel finishes with the buffer, it releases it as described above. In Figure 3.7, for
example, the kernel searches for block 18 but does not find it on the hash queue marked "blkno 2
mod 4." It therefore removes the first buffer from the free list (block 3), assigns it to block 18,
and places it on the appropriate hash queue.
In the third scenario in algorithm getblk, the kernel also has to allocate a buffer from the free list.
However, it discovers that the buffer it removes from the free list has been marked for "delayed
write," so it must write the contents of the buffer to disk before using the buffer. The kernel starts
an asynchronous write to disk and tries to allocate another buffer from the free list. When the
asynchronous write completes, the kernel releases the buffer and places it at the head of the free
list. The buffer had started at the end of the free list and had traveled to the head of the free list.
If, after the asynchronous write, the kernel were to place the buffer at the end of the free list, the
buffer would get a free trip through the free list, working against the least recently used
algorithm. For example, in Figure 3.8, the kernel cannot find block 18, but when it attempts to
allocate the first two buffers (one at a time) on the free list, it finds them marked for delayed
write. The kernel removes them from the free list, starts write operations to disk for the blocks,
and allocates the third buffer on the free list, block 4. It reassigns the buffer's device and block
number fields appropriately and places the buffer, now marked block 18, on its new hash queue.
In the fourth scenario (Figure 3.9), the kernel, acting for process A, cannot find the disk block on
its hash queue, so it attempts to allocate a new buffer from the free list, as in the second scenario.
However, no buffers are available on the free list, so process A goes to sleep until another
process executes algorithm brelse, freeing a buffer. When the kernel schedules process A, it must
search the hash queue again for the block. It cannot allocate a buffer immediately from the free
list, because it is possible that several processes were waiting for a free buffer and that one of
them allocated a newly freed buffer for the target block sought by process A. Thus, searching for
the block again ensures that only one buffer contains the disk block. Figure 3.10 depicts the
contention between two processes for a free buffer.
The final scenario (Figure 3.11) is complicated, because it involves complex relationships
between several processes. Suppose the kernel, acting for process A, searches for a disk block
and allocates a buffer but goes to sleep before freeing the buffer.

Figure 3.8. Third Scenario for Buffer Allocation
Write and explain algorithms: 1. Buffer allocation (getblk), 2. Releasing a buffer (brelse), 3.
Reading disk block (bread), 4. Block read ahead (breada)
1. Buffer Allocation (getblk):
Purpose: The getblk algorithm is used to allocate a buffer for a disk block in the
buffer cache.
Algorithm:
1. Check if the requested block is already in the buffer cache. If so, return the
corresponding buffer.
2. If the requested block is not in the buffer cache, search for a free buffer.
3. If a free buffer is found, allocate it and mark it as busy.
4. If no free buffer is available, use a buffer replacement algorithm (e.g.,
LRU) to select a victim buffer for replacement.
5. Write back the contents of the victim buffer if it is dirty.
6. Re-initialize the victim buffer with the new block number and mark it as
busy.
7. Return the allocated buffer.
2. Releasing a Buffer (brelse):
Purpose: The brelse algorithm is used to release a buffer after its use.
Algorithm:
1. If the buffer is marked as busy, wait until it becomes free.
2. Mark the buffer as not busy.
3. If the buffer is marked for delayed write, initiate the write operation to
disk asynchronously.
4. If the buffer is marked for synchronous write, wait for the write operation
to complete.
5. Clear the buffer's dirty flag.
6. Release the buffer for reuse.
3. Reading Disk Block (bread):
Purpose: The bread algorithm is used to read a disk block into a buffer.
Algorithm:
1. Allocate a buffer using the getblk algorithm for the specified disk block.
2. Check if the buffer contains the required data. If so, return the buffer.
3. If the buffer does not contain the required data, read the block from disk
into the buffer.
4. Update the buffer's metadata to reflect the new block number and mark it
as not dirty.
5. Return the buffer.
4. Block Read Ahead (breada):
Purpose: The breada algorithm is used to read multiple disk blocks ahead of time
to improve I/O performance.
Algorithm:
1. Determine the number of blocks to read ahead (e.g., based on the disk
access pattern or heuristic).
2. For each block to be read ahead: a. Allocate a buffer using the getblk
algorithm for the specified disk block. b. Check if the buffer contains the
required data. If so, skip to the next block. c. If the buffer does not contain
the required data, read the block from disk into the buffer. d. Update the
buffer's metadata to reflect the new block number and mark it as not dirty.
3. Return the buffers containing the read-ahead blocks.
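The getblk/brelse/bread steps above can be sketched with a tiny in-memory pool. This sketch ignores hash queues, sleeping, and delayed write (scenario 3), and the "disk read" is simulated with memset, so it only illustrates the control flow, not a real kernel implementation:

```c
#include <string.h>
#include <stddef.h>

#define NBUF 4

struct buf { int valid, busy, blkno; char data[1024]; };
static struct buf pool[NBUF];

/* getblk: return the buffer for blkno, reusing one if not cached */
static struct buf *getblk(int blkno) {
    for (int i = 0; i < NBUF; i++)      /* scenario 1: cache hit */
        if (pool[i].valid && pool[i].blkno == blkno)
            return &pool[i];            /* (a real kernel would sleep if busy) */
    for (int i = 0; i < NBUF; i++)      /* scenario 2: take a free buffer */
        if (!pool[i].busy) {
            pool[i].blkno = blkno;
            pool[i].valid = 0;          /* old contents are discarded */
            pool[i].busy = 1;
            return &pool[i];
        }
    return NULL;                        /* scenario 4: no free buffer */
}

/* bread: read a block, going to "disk" only on a cache miss */
static struct buf *bread(int blkno) {
    struct buf *bp = getblk(blkno);
    if (bp && !bp->valid) {
        memset(bp->data, blkno, sizeof bp->data);  /* stand-in for disk I/O */
        bp->valid = 1;
    }
    return bp;
}

/* brelse: release the buffer so another process can claim it */
static void brelse(struct buf *bp) { bp->busy = 0; }
```

A second bread of the same block returns the same buffer without touching the "disk," which is the whole point of the cache.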
These algorithms are essential components of buffer cache management in the UNIX kernel,
responsible for efficiently handling disk I/O operations and improving overall system
performance.
Advantages of the Buffer Cache:
1. Improved Performance: The buffer cache improves system performance by reducing the
number of disk accesses required for read and write operations. This reduces the overall disk
access time and improves the system's response time.
2. Reduced I/O Operations: The buffer cache reduces the number of I/O operations required to
access frequently accessed data. This reduces the load on the disk and improves its lifespan.
3. Caching Mechanism: The buffer cache uses a caching mechanism that stores frequently
accessed data in memory. This reduces the need to read data from the disk, which can be slow
and time-consuming.
4. Sharing Data Between Processes: The buffer cache can be shared between multiple
processes, reducing the need for each process to have its own copy of frequently accessed data.
Disadvantages of the Buffer Cache:
1. Memory Usage: The buffer cache requires a significant amount of memory to store the
cached data. This can reduce the amount of available memory for other processes and
applications.
2. Cache Coherency Issues: The buffer cache can cause cache coherency issues, where multiple
caches containing the same data become inconsistent. This can lead to data corruption and other
issues.
3. Stale Data: The buffer cache can contain stale data that has not been updated on the disk. This
can cause issues if the data is accessed and modified by another process.
4. Cache Thrashing: The buffer cache can suffer from cache thrashing, where the cache is
continually filled and flushed with data. This can reduce the effectiveness of the cache and cause
performance issues.
An Inode is a data structure in UNIX operating system that contains important information
pertaining to files within a file system. When a file system is created in UNIX, a set amount of
inodes is created as well. Usually, about 1 percent of the file system disk space is allocated to
the inode table.
The inode contains the following information: the file owner identifier, the file type, the
file access permissions, the file access times, the number of links to the file, the table of
contents for the disk addresses of data in the file, and the file size.
Number of disk block addresses that can be stored in one disk block = (disk block size /
disk block address size).
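This arithmetic can be checked in C. Assuming a 1K block and a 4-byte block address, one indirect block holds 256 addresses; the classic scheme of 10 direct entries plus one single, one double, and one triple indirect entry then bounds the file size (the counts are assumptions for illustration):

```c
/* addresses that fit in one disk block
   (e.g. 1024-byte block, 4-byte address -> 256 addresses) */
unsigned long addrs_per_block(unsigned long block_size, unsigned long addr_size) {
    return block_size / addr_size;
}

/* maximum file size in bytes for `ndirect` direct entries plus
   one single, one double, and one triple indirect entry */
unsigned long long max_file_bytes(unsigned long bsize, unsigned long asize,
                                  unsigned long ndirect) {
    unsigned long long n = addrs_per_block(bsize, asize);
    return (ndirect + n + n * n + n * n * n) * (unsigned long long)bsize;
}
```

With these assumptions the single indirect block covers 256K, the double indirect 64M, and the triple indirect 16G of file data.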
Small files need only direct blocks, so there is little waste in space or extra disk
reads in those cases. Medium-sized files may use indirect blocks. Only large files
use double or triple indirect blocks, which is reasonable since those files are large
anyway. The disk is now broken into two different types of blocks: Inode and Data
Blocks.
There must be some way to determine where the Inodes are, and to keep track of
free Inodes and disk blocks. This is done by the Superblock, which is located
at a fixed position in the file system. The Superblock is usually replicated on the
disk to avoid catastrophic failure in case of corruption of the main Superblock.
Index allocation schemes suffer from some of the same performance problems as linked
allocation does. For example, the index blocks can be cached in memory, but the data blocks
may be spread all over a partition.
Inode Structure :
The fields of an in-core inode can vary depending on the specific file system implementation and
the features it supports. However, typically, an in-core inode contains metadata about the file or
directory it represents. Here are some common fields found in an in-core inode:
1. File Type: Indicates whether the inode represents a regular file, a directory, a symbolic
link, a device file, or some other type of file.
2. File Permissions: Stores information about who can read, write, or execute the file,
typically represented as a set of permission bits.
3. Owner and Group: Stores the user identifier (UID) and group identifier (GID) of the
file's owner and group.
4. File Size: Records the size of the file in bytes.
5. Pointers to Data Blocks: Contains pointers or references to the data blocks that store the
actual content of the file. For large files, these pointers may point to indirect blocks,
double indirect blocks, or even triple indirect blocks to accommodate the file's size.
6. Timestamps: Keeps track of various timestamps associated with the file, such as the time
of creation, last access, and last modification.
7. Link Count: Tracks the number of hard links pointing to the inode. When this count
drops to zero, the file system knows that the file or directory can be safely removed.
8. File System Identifier and Inode Number: Identifies the file system to which the inode
belongs and its unique inode number within that file system.
9. File Attributes: Stores additional attributes or metadata associated with the file, such as
extended file attributes or Access Control Lists (ACLs).
These are some of the key fields typically found in an in-core inode. The exact
structure and set of fields vary with the file system implementation.
Explain the structure of the Regular file. (Draw dig and explain)
Internal Representation of Files
Every file in a UNIX system has a unique inode. Processes interact with files using well-defined
system calls. A user specifies a file with a character string, its path name, and the
system maps the path name to the inode of the corresponding file.
The algorithms described below are above the layer of buffer cache. Diagrammatically, it can be
shown like this:
Inodes
Inodes exist in a static form on the disk. The kernel reads them into in-core inodes and modifies
them.
Owner information: ownership is divided into a user and a group of users. Root user has
access to all the files.
File type: it states whether a file is a normal file, a directory, a block or character special
file, or a device file.
File access permissions: there are three classes of access permissions: owner, group,
and others, with separate permissions for reading, writing, and executing. Since
execute permission does not apply to a directory as such, execute permission on a
directory gives the right to search inside the directory.
Access times: the times at which the file was last accessed and last modified, and the time
at which the inode itself was last changed
Number of links: number of places from which the file is being referred.
Array of disk blocks: even if the users get a logically sequential representation of data in
files, the actual data is scattered across the disk. This array keeps the addresses of the disk
blocks on which the data is scattered.
File size: the addressing of the file begins from location 0 relative to the starting
location, and the size of the file is the maximum offset of the file + 1. For example, if a
user creates a file and writes a byte at offset 999, the size of the file is 1000.
The inode does not specify the pathname/pathnames that access the file.
The in-core inode contains the following fields in addition to the fields of the disk inode:
the status of the in-core inode (whether it is locked, whether a process is waiting for it
to become unlocked, and whether it differs from the disk copy), the logical device number
of the file system, the inode number, pointers linking it to the free list and to a hash
queue, and a reference count.
The most striking difference between an in-core inode and a buffer header is the reference count.
The reference count increases when a process allocates the inode, for example, by opening a
file. An inode is on the free list if and only if its reference count is 0; if it is greater,
some process is still accessing the inode. This means that an inode can be on a hash queue and
not on the free list even though it is not locked. That is not the case with a buffer: a buffer
is always on the free list if it is not locked.
Accessing Inodes
The algorithm iget allocates an in-core copy of an inode. If the inode is not found on a hash
queue, it allocates an inode from the free list and reads the disk copy into the in-core inode. It
already knows the inode number and device number. It calculates the logical block number on
which the disk inode resides according to how many inodes fit into one disk block. The formula
for calculating the logical block number is:
block number = ((inode number - 1) / number of inodes per block) + start block of inode list
where the division operation returns the integer part of the quotient.
To find the byte offset of the inode in that block, this following formula is used:
byte offset = ((inode number - 1) % number of inodes per block) * size of disk inode
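As a sketch, the two formulas can be checked in C with hypothetical layout constants (the block size, disk-inode size, and start block of the inode list below are assumed for illustration, not taken from the text):

```c
#include <assert.h>

/* Assumed layout: 1K blocks, 64-byte disk inodes, inode list
 * starting at block 2 -- illustrative values only. */
#define BLKSZ            1024
#define INODE_SIZE       64
#define INODES_PER_BLOCK (BLKSZ / INODE_SIZE)   /* 16 */
#define INODE_LIST_START 2

/* Logical block holding a given inode (inode numbers start at 1);
 * integer division discards the fractional part, as the text says. */
int inode_block(int inumber) {
    return (inumber - 1) / INODES_PER_BLOCK + INODE_LIST_START;
}

/* Byte offset of that inode within its block. */
int inode_offset(int inumber) {
    return ((inumber - 1) % INODES_PER_BLOCK) * INODE_SIZE;
}
```

With 16 inodes per block, inodes 1 through 16 live in block 2 and inode 17 starts block 3 at offset 0.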
The algorithm iget is given below:
/* Algorithm: iget
 * Input:  file system inode number
 * Output: locked inode
 */
{
    while (not done)
    {
        if (inode in inode cache)
        {
            if (inode locked)
            {
                sleep (event: inode becomes unlocked);
                continue;
            }
            // special processing for mount points, covered later
            if (inode on free list)
                remove inode from free list;
            increment reference count of the inode;
            return inode;
        }
        // inode not in the cache
        if (free list is empty)
            return error;
        remove inode from free list;
        reset inode number and file system;
        remove inode from old hash queue and place it on the new hash queue;
        read inode from disk (algorithm: bread);
        initialize inode;
        return inode;
    }
}
The kernel manipulates inode lock and reference count independently. It locks an inode
when it is being accessed or modified. An inode is never locked across system calls. But the
reference count remains set across system calls. The kernel can lock and unlock an inode
independent of the value of the reference count.
The algorithm iget is very similar to getblk, but there is one big difference. In getblk, a process
sleeps if the free list is empty; in iget, an error is returned. The reason for this difference is
that processes have control over inodes at user level through the open and close system calls. If a
process opens a file and never closes it, the inode will have a reference count of at least 1 and
will never be on the free list until the process closes the file. This is not the case with buffers:
processes have no user-level control over buffers. Therefore, buffers are guaranteed to become free
eventually, but inodes are not.
Releasing Inodes
/* Algorithm: iput
 * Input:  in-core inode
 * Output: none
 */
{
    lock inode if not already locked;
    decrement inode reference count;
    if (reference count == 0)
    {
        if (inode link count == 0)
        {
            free disk blocks for file (algorithm: free); // free is described later
            set file type to 0;
            free inode (algorithm: ifree); // ifree is described later
        }
        if (file accessed or inode changed or file changed)
            update disk inode;
        put inode on free list;
    }
    release inode lock;
}
To keep the inode size constant and yet allow large files, indirect addressing is used. The inode
has an array of 13 entries for storing block numbers, although the number of entries in the array
is independent of the storage strategy. The first 10 entries are "direct addresses": they hold the
block numbers of actual data blocks. The 11th entry is "single indirect": it holds the block number
of a block that contains direct addresses. The 12th entry is "double indirect": it holds the block
number of a single indirect block. And the 13th entry is "triple indirect": it holds the block
number of a double indirect block. This strategy could be extended to "quadruple" or "quintuple"
indirect addressing.
If a logical block on the file system holds 1K bytes and a block number is addressable by a
32-bit integer, then a block can hold up to 256 block numbers. The maximum file size with the
13-entry array is:
    10 direct blocks with 1K bytes each                  =  10K bytes
    1 single indirect block with 256 direct blocks       = 256K bytes
    1 double indirect block with 256 single indirect blocks =  64M bytes
    1 triple indirect block with 256 double indirect blocks =  16G bytes
or a little more than 16G bytes in total; in practice, the 32-bit file offset caps the usable
file size at 4G bytes.
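The arithmetic for the maximum file size can be sketched as a small C function (the 1K block size and 256 addresses per block are the assumptions stated above):

```c
#include <assert.h>

/* Maximum file size under the 13-entry scheme, for a block size of
 * blksz bytes and naddr block numbers per indirect block. */
long long max_file_size(long long blksz, long long naddr) {
    return 10 * blksz                      /* 10 direct blocks */
         + naddr * blksz                   /* single indirect  */
         + naddr * naddr * blksz           /* double indirect  */
         + naddr * naddr * naddr * blksz;  /* triple indirect  */
}
```

For blksz = 1024 and naddr = 256 this evaluates to 17,247,250,432 bytes, a little over 16G bytes.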
The algorithm bmap is used to convert logical byte offset of a file to a physical disk block:
/* Algorithm: bmap
 * Input:  inode
 *         byte offset
 * Output: block number in file system
 *         byte offset into block
 *         bytes of I/O in block
 *         read ahead block number
 */
{
    calculate logical block number in file from byte offset;
    calculate start byte in block for I/O;            // output 2
    calculate number of bytes to copy to user;        // output 3
    check if read-ahead applicable, mark inode;       // output 4
    determine level of indirection;
    while (not at necessary level of indirection)
    {
        calculate index into inode or indirect block from logical block number in file;
        get disk block number from inode or indirect block;
        release buffer from previous disk read, if any (algorithm: brelse);
        if (no more levels of indirection)
            return (block number);
        read indirect disk block (algorithm: bread);
        adjust logical block number in file according to level of indirection;
    }
}
Some examples of conversions (refer to the figure):
1. To access byte offset 9000: The first 10 direct blocks hold 10K bytes, so offset 9000
falls within them. 9000 / 1024 = 8, so it is in the 8th direct block (counting from 0),
and 9000 % 1024 = 808, so the byte offset into that block is 808 bytes. (Block 367
in the figure.)
2. To access byte offset 350000: The first 10 direct blocks hold 10K bytes (350000 - 10240 =
339760), and the single indirect block covers another 256K bytes (339760 - 262144 = 77616),
so the double indirect block (block 9156 in the figure) must be used. Each single indirect
block referenced from the double indirect block covers 256K bytes, so the data lies in the
0th single indirect block (block 331 in the figure). Each entry of the single indirect block
addresses one 1K-byte data block (77616 / 1024 = 75), so the data is in the 75th (counting
from 0) direct block (block 3333 in the figure), and the byte offset into it is 77616 % 1024
= 816.
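The double indirect case of these conversions can be sketched in C (the 1K block size, 256 addresses per block, and 10 direct entries are the assumptions used in the examples above; this is one case of bmap, not the full algorithm):

```c
#include <assert.h>

#define BLKSZ 1024L   /* assumed 1K logical blocks           */
#define NADDR 256L    /* block numbers per 1K indirect block */
#define NDIR  10L     /* direct entries in the inode         */

/* For a byte offset that falls in the double indirect range, compute
 * the entry index in the double indirect block, the entry index in
 * the chosen single indirect block, and the byte offset into the
 * final data block. */
void double_indirect_index(long off, long *dbl, long *single, long *byte) {
    long rest = off - NDIR * BLKSZ        /* skip the direct blocks   */
                    - NADDR * BLKSZ;      /* skip the single indirect */
    *dbl    = rest / (NADDR * BLKSZ);     /* entry in double indirect block */
    rest   %= NADDR * BLKSZ;
    *single = rest / BLKSZ;               /* entry in single indirect block */
    *byte   = rest % BLKSZ;               /* offset into the data block     */
}
```

For byte offset 350000 this yields entry 0 of the double indirect block, entry 75 of that single indirect block, and byte offset 816, matching example 2.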
Directories
A directory file contains entries for the files and subdirectories that reside in it. Each
entry maps a file name to its inode number and occupies 16 bytes: 14 bytes for the name of
the file and 2 bytes for the inode number. For example:
The entry . has the inode number of the directory file itself, and .. has the inode number of
the parent directory. For the root directory, . and .. are both the inode number of the root
directory itself. Entries that have the inode number 0 are empty (i.e. deleted files).
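The 16-byte layout described above can be sketched as a C structure (the field names below follow the historical convention but are illustrative, not taken from the text):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Classic 16-byte directory entry: a 2-byte inode number followed by
 * a 14-byte file name. An inode number of 0 marks a deleted entry. */
struct direct {
    uint16_t d_ino;       /* inode number (0 = free/deleted slot)    */
    char     d_name[14];  /* name; not necessarily NUL-terminated    */
};
```

The kernel searches a directory by scanning these fixed-size records and comparing the name field of each entry.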
The kernel alone has the right to write to a directory. The access permissions for directories
mean different things: read permission allows reading the contents of the directory, write
permission allows creating files and directories in that directory, and execute permission
gives the right to search in that directory.
Algorithm namei (name to inode) converts a path name to an inode. The kernel parses the path,
accessing the inode of each component in turn, and finally returns the inode of the required
file. Every process has a current directory. The current directory of process 0 is the root
directory; for every other process, it is inherited from the parent process. A process can
later change its current directory with the chdir system call. The inode of the current
directory is stored in the u-area. Searches start from the current directory unless the path
begins with the / component, indicating that the search should start from the root directory.
The inode number of the root is kept in a global variable; if the current root has been changed
with chroot (more on this later), the inode of the current root is stored in the u-area.
/* Algorithm: namei
 * Input:  path name
 * Output: locked inode
 */
{
    if (path name starts from root)
        working inode = root inode (algorithm: iget);
    else
        working inode = current directory inode (algorithm: iget);
    while (there is more path name)
    {
        read next path name component;
        verify that working inode is a directory and access permissions are OK;
        read directory (working inode) (algorithms: bmap, bread, brelse);
        if (component matches an entry in the directory)
        {
            get inode number of matched component;
            release working inode (algorithm: iput);
            working inode = inode of matched component (algorithm: iget);
        }
        else
            return (no inode);    // component not in directory
    }
    return (working inode);
}
Superblock
The kernel periodically writes the super block to the disk if it has been modified, so that it
remains consistent with the data on the disk. The algorithm ialloc, given below, assigns a disk
inode to a newly created file:
/* Algorithm: ialloc
 * Input:  file system
 * Output: locked inode
 */
{
    while (not done)
    {
        if (super block locked)
        {
            sleep (event: super block becomes free);
            continue;
        }
        if (inode list in super block is empty)
        {
            lock super block;
            get remembered inode for free inode search;
            search disk for free inodes until super block full, or no more
                free inodes (algorithms: bread and brelse);
            unlock super block;
            wake up (event: super block becomes free);
            if (no free inodes found on disk)
                return (no inode);
            set remembered inode for next free inode search;
        }
        // there are inodes in super block inode list
        get inode number from super block inode list;
        get inode (algorithm: iget);
        if (inode not free after all)
        {
            write inode to disk;
            release inode (algorithm: iput);
            continue;
        }
        // inode is free
        initialize inode;
        write inode to disk;
        decrement file system free inode count;
        return inode;
    }
}
If the list of inode numbers in the super block is not empty, the kernel assigns the next inode
number, allocates a free in-core inode for the newly assigned disk inode using algorithm iget,
copies the disk inode to the in-core copy, initializes the fields in the inode, and returns the
locked inode. It updates the inode on disk to indicate that the inode is in use.
If the super block list of free inodes is empty, the kernel searches the disk and places as many
free inode numbers as possible into the super block. The kernel reads the inode list on disk, block
by block, and fills the super block list of inode numbers to capacity, remembering the highest-
numbered inode that it finds ("remembered" inode). It is the last one saved in the super block.
The next time the kernel searches the disk for free inodes, it uses the remembered inode as its
starting point, thereby assuring that it wastes no time reading disk blocks where no free inodes
should exist.
An example:
Super block lists are maintained such that the last inode it dispenses from the list is the
remembered inode.
/* Algorithm: ifree
 * Input:  file system inode number
 * Output: none
 */
{
    increment file system free inode count;
    if (super block locked)
        return;
    if (inode list full)
    {
        if (inode number less than remembered inode for search)
            set remembered inode for search = input inode number;
    }
    else
        store inode number in inode list;
    return;
}
If the super block is locked, the algorithm returns without recording the inode number in the
free list of inode numbers; the inode can still be found later when the disk blocks are
searched. If the inode list is full and the freed inode number is less than the remembered
inode number, the kernel replaces the remembered inode number with the input inode number, so
that the next search will start from that inode.
Ideally, there should never be free inodes whose inode number is less than the remembered inode
number, but exceptions are possible: if an inode is freed while the super block is locked, a
free inode can end up with a number lower than the remembered inode number.
An example:
In ialloc, there is a race condition where an inode can be in use even if we get it from the list of
free inode numbers in the super block. Following is an example where such condition is possible.
Assume that there are 3 processes: A, B, and C. Process A assigns inode I but goes to sleep
before it copies the disk inode into the in-core copy. Algorithms iget (invoked by ialloc)
and bread (invoked by iget) give ample opportunity to go to sleep. If, while process A sleeps,
process B tries to acquire an inode but finds that the list of free inode numbers is empty. So it
starts searching (assume that it starts searching at a lower inode number than inode I) the disk
blocks for free inodes and finds inode I, as inode I is still marked free. Process B completes the
search, fills up the list and departs taking an inode number. Process A wakes up and completes
the assignment of inode I. Now suppose process C later requests an inode and happens to pick
inode I from the super block free list. When it gets the in-core copy of the inode, it will find its
file type set, implying that the inode was already assigned. That is why that check is required
in ialloc.
/* Algorithm: alloc
 * Input:  file system number
 * Output: buffer for new block
 */
{
    while (super block locked)
        sleep (event: super block not locked);
    remove block from super block free list;
    if (removed last block from free list)
    {
        lock super block;
        read block just taken from free list (algorithm: bread);
        copy block numbers in block into super block;
        release block buffer (algorithm: brelse);
        unlock super block;
        wake up processes (event: super block not locked);
    }
    get buffer for block removed from super block list (algorithm: getblk);
    zero buffer contents;
    decrement total count of free blocks;
    mark super block modified;
    return buffer;
}
The program mkfs tries to organize the original linked list of free block numbers so that block
numbers dispensed to a file are near each other. This helps performance, because it reduces disk
seek time and latency when a process reads a file sequentially. The kernel makes no attempt to
sort block numbers on the free list.
The algorithm free for freeing a block is the reverse of the one for allocating a block. If the super
block list is not full, the block number of the newly freed block is placed on the super block list.
If, however, the super block list is full, the newly freed block becomes a link block; the kernel
writes the super block list into the block and writes the block to disk. It then places the block
number of the newly freed block in the super block list: That block number is the only member
of the list.
There is a difference in how free inode numbers and free disk block numbers are managed. All
free disk block numbers are stored as a linked list of arrays, but only part of the free inode
numbers is cached: once the super block's list of free inode numbers is full, additional free
inode numbers are not recorded anywhere. There are 3 reasons for this different treatment:
1. The kernel can determine whether an inode is free by inspecting its file type. However,
there is no way to know whether a disk block is free by looking at the data in it.
2. Disk blocks lend themselves to the use of linked lists: a disk block easily holds large
lists of free block numbers. But inodes have no convenient place for bulk storage of large
lists of free inode numbers.
3. Users tend to consume free disk blocks faster than free inodes, so the performance lag in
searching would be more noticeable for disk blocks than for inodes.
The UNIX system supports two other file types: pipes and special files. A pipe, sometimes called
a fifo (first-in-first-out), differs from a regular file in that its data is transient; once data is read
from a pipe, it cannot be read again. Also, the data is read in the order that it was written to the
pipe, and the system allows no deviation from that order. The kernel stores data in a pipe the
same way it stores data in an ordinary file, except that it uses only the direct blocks, not the
indirect blocks. Pipes are described later.
The other file types in the UNIX system are special files, including block device special files and character
device special files. Both types specify devices, and therefore the file inodes do not reference any
data. Instead, the inode contains two numbers known as the major and minor device numbers.
The major number indicates a device type such as terminal or disk, and the minor number
indicates the unit number of the device. Device files are covered later.
Example:
/ (Root Directory)
├── home
│   ├── user1
│   │   ├── file1.txt
│   │   ├── file2.txt
│   │   └── Documents
│   │       ├── doc1.pdf
│   │       └── doc2.txt
│   └── user2
│       ├── image.jpg
│       └── Music
│           ├── song1.mp3
│           └── song2.mp3
└── etc
    ├── passwd
    └── hosts
In this example:
Each file and directory in the example above is associated with an inode, which stores its
metadata. The directory entries map filenames to inode numbers, allowing the operating system
to locate and access files efficiently.
This directory layout provides a structured organization of files and directories, making it easier
for users and applications to navigate and manage data in the file system.
These algorithms are essential for managing the allocation and deallocation of resources within a
file system. They ensure efficient utilization of storage space while maintaining the integrity and
consistency of the file system structure. Proper implementation of these algorithms is crucial for
the overall performance and reliability of the file system.
Syntax:
#include <unistd.h>
int dup(int oldfd);
Parameters:
oldfd: an open file descriptor to be duplicated.
Return Value:
If successful, dup() returns a new file descriptor that is a duplicate of oldfd. If an error
occurs, it returns -1, and errno is set to indicate the error.
1. Naming:
Named pipes: As the name suggests, named pipes have a name associated with
them and can be created using the mkfifo command or by simply creating a
special type of file in the filesystem. These named pipes exist persistently in the
file system, and multiple processes can access them using their name.
Unnamed pipes: Unnamed pipes, also known as anonymous pipes, do not have a
name. They are created using the pipe() system call and exist only within the
context of the processes that created them. They are typically used for
communication between a parent process and its child processes.
2. Persistence:
Named pipes: Named pipes persist even after the processes using them have
finished execution. They are akin to regular files in the filesystem.
Unnamed pipes: Unnamed pipes exist only as long as the processes using them
are alive. Once the processes close their ends of the pipe, the pipe is destroyed
and its resources are deallocated by the operating system.
3. Access:
Named pipes: Named pipes are accessed by their name, much like regular files.
Any process with appropriate permissions can open a named pipe using its name
and communicate through it.
Unnamed pipes: Unnamed pipes are typically used for communication between
related processes, such as a parent process and its child processes. They are
created by one process and the file descriptors referring to the ends of the pipe are
passed to the related processes through inheritance during forking.
4. Use cases:
Named pipes: Named pipes are useful for communication between unrelated
processes that may not share a parent-child relationship. They are often used in
scenarios where multiple processes need to communicate with each other
indirectly.
Unnamed pipes: Unnamed pipes are typically used for communication between a
parent process and its child processes, where the pipe is set up before forking to
establish communication channels between the parent and child processes.
Syntax:
#include <unistd.h>
ssize_t read(int fd, void *buf, size_t count);
Parameters:
fd: the file descriptor to read from; buf: the buffer that receives the data;
count: the maximum number of bytes to read.
Return Value:
If successful, read() returns the number of bytes read. If the end of the file (EOF) is
reached, it returns 0. If an error occurs, it returns -1, and errno is set to indicate the error.
Example: Suppose we have a file named example.txt with the following contents:
Hello, world!
In Unix-like operating systems, the cd command is used to change the current working directory.
When you open a terminal or command prompt, you typically start in a specific directory known
as the "home directory" of the user. Using the cd command, you can navigate to different
directories in the file system hierarchy.
Syntax: cd [directory_path]
If you provide a directory path as an argument to cd, it changes the current working
directory to the specified directory.
If you don't provide any arguments, cd changes the current working directory to the
user's home directory.
Examples:
Example:
chroot /mnt/newroot /bin/bash
In this example, /mnt/newroot is set as the new root directory, and /bin/bash is executed within
this new environment.
Short Note:
Change Directory (cd): cd is a command used to navigate the file system hierarchy in
Unix-like operating systems. It allows users to switch between directories conveniently.
Change Root (chroot): chroot is a command used to create a restricted environment by
changing the root directory for a process. It isolates the process and its children from the
rest of the file system, providing a controlled environment for running applications or
performing system maintenance tasks.
1. Input Parameters:
File Path: The path to the file to be created.
File Mode: The permissions to be set on the newly created file. This typically
includes read, write, and execute permissions for the owner, group, and others.
2. Permission Checking:
Before creating the file, the operating system checks the permissions of the
calling process to ensure it has the necessary permissions to create a file in the
specified directory.
3. File Creation:
The operating system performs the following steps to create the file:
It searches the directory specified in the file path to ensure that a file with
the same name does not already exist. If a file with the same name exists
and the O_CREAT flag is not specified, the creat system call returns an
error.
If the file does not exist or the O_CREAT flag is specified, the operating
system creates a new file entry in the directory's inode table.
It allocates an inode (index node) for the file and initializes its metadata,
such as file type, permissions, owner, group, and timestamps.
The file's data blocks are allocated on the disk to store the file's content.
For small files, direct blocks may be used, while larger files may require
indirect or doubly indirect blocks.
If the O_TRUNC flag is specified, the operating system truncates the file,
discarding its previous contents. Otherwise, the file is created with zero
size.
4. File Descriptor:
After successfully creating the file, the creat system call returns a file descriptor,
which is an integer representing the newly created file. This file descriptor can be
used for subsequent read and write operations on the file.
5. Error Handling:
If any errors occur during the file creation process, such as insufficient
permissions, disk full, or file system errors, the creat system call returns an error
code to the calling process, indicating the nature of the error.
6. Permissions Setting:
The operating system sets the permissions of the newly created file according to
the specified file mode or permissions. These permissions determine who can
read, write, or execute the file.
7. Completion:
Finally, the creat system call returns control to the calling process, indicating
whether the file creation was successful or if an error occurred.
In summary, the creat system call creates a new file in the file system by allocating resources,
initializing metadata, setting permissions, and returning a file descriptor to the calling process. It
provides a fundamental mechanism for file creation in Unix-like operating systems.
Draw the file system before and after executing the following mount system call:
Mount("/dev/dsk1/", "/user", 0)
Explain algorithm for mounting the file.
Mounting a file system involves making the contents of a storage device or another file system
available at a specific directory location within an existing file system hierarchy. Here's a
simplified explanation of the algorithm for mounting a file system:
1. Check Permissions and Privileges:
Ensure that the user performing the mount operation has the necessary
permissions and privileges to mount file systems. Typically, this requires
superuser (root) privileges.
2. Specify Mount Point:
Determine the directory where the file system will be mounted, known as the
mount point. This directory should exist within the existing file system hierarchy.
3. Identify File System:
Identify the file system to be mounted, which could be on a physical disk
partition, a logical volume, a network share, or another storage device. The file
system type (e.g., ext4, NTFS, NFS) must be known.
4. Check File System Integrity:
Optionally, perform a file system check (e.g., fsck command) to ensure the
integrity of the file system structure. This step is particularly important if the file
system is coming from an unreliable source or has not been cleanly unmounted in
the past.
5. Mount File System:
Use the mount command to mount the file system onto the specified mount point.
The general syntax is:
mount -t <file_system_type> <source> <mount_point>
<file_system_type>: The type of file system being mounted (e.g., ext4,
NTFS, NFS).
<source>: The device or file representing the file system to be mounted
(e.g., /dev/sdb1, /mnt/network_share).
<mount_point>: The directory where the file system will be mounted
within the existing file system hierarchy.
6. Update Mount Table:
Update the system's mount table (e.g., /etc/fstab on Linux) to ensure that the file
system is automatically mounted upon system boot, if desired. This step involves
adding an entry specifying the file system type, source, mount point, and mount
options.
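Step 6's mount-table entry might look like the following hypothetical /etc/fstab line (the device name and mount point below are examples, not from the text):

```
# <source>   <mount_point>  <type>  <options>  <dump>  <pass>
/dev/sdb1    /mnt/data      ext4    defaults   0       2
```

At boot, the system reads this table and mounts each listed file system automatically.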
7. Verify Mount:
Verify that the file system has been successfully mounted by listing the contents
of the mount point directory (ls <mount_point>) or checking the output of the
mount command to see if the file system appears in the list of mounted file
systems.
8. Unmounting (Optional):
To unmount the file system and detach it from the mount point, use the umount
command:
umount <mount_point>
This step is necessary when the file system is no longer needed or when
performing system maintenance.
By following this algorithm, administrators can mount file systems effectively, ensuring that the
desired data becomes accessible within the file system hierarchy of the operating system.
Read Operation:
Write Operation:
The kernel has a process table where it stores the state of the process and other information about
the process. The information in the process table entry and in the u-area of the process, combined,
forms the context of the process.
The process enters the created state when the parent process executes the fork system call,
and eventually moves into a state where it is ready to run (3 or 5). The scheduler will eventually
pick the process, and the process enters the state kernel running, where it completes its part
of the fork system call. After the completion of the system call, it may move to user running. When
interrupts occur (such as system call), it again enters the state kernel running. After the servicing
of the interrupt the kernel may decide to schedule another process to execute, so the first process
enters the state preempted. The state preempted is really the same as the state ready to run in
memory, but they are depicted separately to stress that a process executing in kernel mode can be
preempted only when it is about to return to user mode. Consequently, the kernel could swap out a
process from the state preempted if necessary. Eventually, it will return to user running again.
When a process executes a system call, it leaves the state user running and enters the state kernel
running. If, in kernel mode, the process needs to sleep for some reason (such as waiting for I/O),
it enters the state asleep in memory. When the event on which it has slept occurs, the
interrupt handler awakens the process, and it enters the state ready to run in memory.
Suppose the system is executing many processes that do not fit simultaneously into main
memory, then the swapper (process 0) swaps out a process to make room for another process that
is in the state ready to run swapped. When evicted from main memory, the process enters the
state ready to run swapped. Eventually, swapper chooses the process as most eligible to run and
it re-enters the state ready to run in memory. And then when it is scheduled, it will enter the
state kernel running. When a process completes, it invokes the exit system call, entering the
state kernel running and, finally, the zombie state.
Some state transitions can be controlled by the users, but not all. User can create a process. But
the user has no control over when a process transitions from sleeping in memory to sleeping in
swap, or from ready to run in memory to ready to run in swap, etc. A process can make a system call
to transition itself to kernel running state. But it has no control over when it will return from
kernel mode. Finally, a process can exit whenever it wants, but that is not the only reason
for exit to be called.
The process transitions follow a rigid model encoded in the kernel, reacting to events in a
predictable way according to formulated rules (studied later). For example, no process can
preempt another process executing in the kernel.
Two kernel data structures describe the state of a process: the process table entry and the u-area.
The process table entry contains information that must be accessible to the kernel, and the u-area
contains information that need be accessible only to the process when it is running. The kernel
allocates space for a u-area only when creating a process; it does not need u-areas for process table
entries that do not have processes. For example, the process table entries that contain the
information about the kernel context, do not have processes.
The u-area contains these fields (some are covered previously as well) :
A pointer in the process table identifies the entry that corresponds to the u-area.
The real and effective user IDs determine various privileges allowed the process, such as
file access rights.
Timer fields record the time the process spent executing in user mode and in kernel
mode.
An array indicates how the process wishes to react to signals.
The control terminal field identifies the "login terminal" associated with the process, if
one exists.
An error field records errors encountered during a system call.
A return value field contains the result of system calls.
I/O parameters describe the amount of data to transfer, the address of the source (or
target) data array in user space, file offsets for I/O, and so on.
The current directory and current root describe the file system environment of the
process.
The user file descriptor table records the files the process has open.
Limit fields restrict the size of a process and the size of a file it can write.
A permission modes field masks mode settings on files the process creates.
Physical memory is addressable from offset 0 up to the amount of physical
memory installed. A process in UNIX contains three sections: text, data, and stack. The text section contains
the instructions. Those instructions could refer to other addresses, for example, addresses of
different subroutines, addresses of global variables in the data section, or the addresses of local
data structures on the stack. If the addresses generated by the compiler were to be treated as
physical addresses, it would be impossible to run more than one process at a time, because the
addresses could overlap. Even if the compiler tries to address this problem by using heuristics, it
would be difficult and impractical.
To solve this problem, the kernel treats the addresses given by the compiler as virtual addresses.
And when the program starts executing, the memory management unit translates the virtual
addresses to physical addresses. The compiler doesn't need to know which physical addresses the
process will get. For example, two instances of the same program could be executing in memory
using the same virtual addresses but different physical addresses. The subsystems of the kernel
and the hardware that cooperate to translate virtual to physical addresses comprise the memory
management subsystem.
Regions
The UNIX system divides its virtual address space into logically separated regions. A region
is a contiguous area of virtual address space, a logically distinct object which can be
shared. The text, data, and stack usually form separate regions. It is common to share the text
region among instances of the same program.
The region table entries record the physical locations at which the region's contents reside. Each
process contains a private per process regions table, called a pregion. The pregion entry contains
a pointer to an entry in the region table, and contains starting virtual address of the
region. pregion entries are stored in the process table, the u-area, or a separately allocated area of
memory, depending on the implementation. The pregion entries contain the access permissions: read-only,
read-write, or read-execute. The pregion and region structures are analogous to the file table and
the in-core inode table. But since pregions are specific to a process, the pregion table is private to a
process, whereas the file table is global. Regions can be shared amongst processes.
An example of regions:
In a memory model based on pages, physical memory is divided into equal-sized blocks
called pages. Page sizes usually range from 512 bytes to 4K bytes and are defined by the
hardware. Every memory location can be addressed by a "page number" and a "byte offset" within the
page. For example, if a machine with 2^32 bytes of memory has pages of size 1K bytes (2^10),
it will have 2^22 pages.
When the kernel assigns physical pages of memory to a region, it need not assign the pages
contiguously or in any particular order, just as disk blocks are not assigned contiguously, in order to
avoid fragmentation.
The kernel maintains a mapping of logical to physical page numbers in a table which looks like
this:
These tables are called page tables. Region table entry has pointers to page tables. Since logical
address space is contiguous, it is just the index into an array of physical page numbers. The page
tables also contain hardware dependent information such as permissions for pages. Modern
machines have special hardware for address translation, because a software implementation of
such translation would be too slow. Hence, when a process starts executing, the kernel tells the
hardware where its page tables reside.
To remain hardware independent, let us assume that the hardware has register triples (in
abundance) which the kernel uses for memory management. The first register in the triple
contains the address of the page table, the second register contains the first virtual address
mapped by the page table, and the third register contains control information such as number of
pages in page tables and page access permissions. When executing a process, the kernel loads
such register triples with the data in the pregion entries.
If a process accesses an address outside its virtual address space, an exception is generated.
Suppose a process has 0 to 16K bytes of address space and accesses the virtual
address 20K; an exception will be generated and caught by the operating system. Similarly, if a
process tries to access a page without sufficient permission, an exception will be generated.
In such cases, the process normally exits.
Even if the kernel executes in the context of a process, its virtual address space is independent of
processes. When the system boots up, the kernel is loaded into memory and necessary page
tables and registers are loaded with appropriate data. Many hardware systems slice a process'
address space into many sections, user and kernel being two of them. When in user mode, access
to kernel page tables is prohibited. When the process switches to kernel mode, only then it can
access kernel page tables. Some system implementations try to allocate the same physical pages
to the kernel, making the translation function an identity function.
Example:
User level context consists of the process text, data, user stack and shared memory that is in the
virtual address space of the process. The part which resides on swap space is also part of the user
level context.
The system level context has a "static part" and a "dynamic part". A process has one static part
throughout its lifetime. But it can have a variable number of dynamic parts.
The kernel stack contains the stack frames of the kernel functions. Even though all processes
share the kernel text and data, the kernel stack needs to be different for each process, as every
process might be in a different state depending on the system calls it executes. The
pointer to the kernel stack is usually stored in the u-area, but this differs across system
implementations. The kernel stack is empty when the process executes in user mode.
The dynamic part of the system level context consists of a set of layers, visualized as a
last-in-first-out stack. Each system-level context layer contains information necessary to
recover the previous layer, including register context of the previous layer.
The kernel pushes a context layer when an interrupt occurs, when a process makes a system call,
or when a process does a context switch. It pops a context layer when it returns from an interrupt
handler, returns from a system call, or when a context switch happens. Thus, a context switch
entails a push-pop operation: The kernel pushes the context of the old process and pops the
context layer of the new process. The process table entry stores the necessary information to
recover the current context layer.
The following figure shows the components that form the context of a process:
The right side of the figure shows the dynamic portion of the context. It consists of several stack
frames where each stack frame contains saved register context of the previous layer and the
kernel stack as it executes in that layer. System context layer 0 is a dummy layer that represents
the user-level context; growth of the stack here is in the user address space and the kernel stack is
null. The process table entry contains the information needed to recover the current layer of the
process (shown by the arrow).
The maximum number of context layers depends on the number of interrupt levels the system
supports. Suppose the system supports 5 interrupt levels (software, terminal, disk, peripheral
devices, clock); then 7 context layers are enough: 1 for the user-level context, 1 for a system call, and 5
for the different interrupt levels. As the kernel blocks all lower-level interrupts when a higher-
level interrupt occurs, there can be at most 1 interrupt of each level. Therefore, in the worst
case (interrupts occurring in the sequence of level 1 to 5), 7 context layers will be used. Although
the kernel executes in the context of a process, the logical function it performs does not necessarily
pertain to that process. For instance, interrupt handlers do not generally modify the static parts of
the process.
The system is responsible for handling interrupts and exceptions. If the system is executing at a
lower processor execution level when an interrupt occurs, the kernel accepts the interrupt
before decoding the next instruction, and then raises the processor execution level to block other
interrupts of that or a lower level. It handles the interrupt by performing the following sequence of
operations:
1. It saves the current register context of the executing process and creates (pushes) a new
context layer.
2. The kernel determines the source (cause) of the interrupt, and if applicable, unit number
(such as which drive caused the interrupt). When the system receives an interrupt, it gets
a number. It uses that number as an index into the interrupt vector, which stores the
actions to be taken (interrupt handlers) when interrupts occur. Example of interrupt
vector:
3. The kernel invokes the interrupt handler. The kernel stack of the new context layer is
logically distinct from the kernel stack of the previous context layer. Some
implementations use the process's kernel stack to store the stack frame of an interrupt
handler, while others use a global interrupt stack for interrupt
handlers that are guaranteed to return without a context switch.
4. The kernel returns from the interrupt handler and executes a set of hardware instructions
which restore the previous context. The interrupt handler may affect the behavior of the
process as it might modify the kernel data structures. But usually, the process resumes
execution as if the interrupt never occurred.
/* Algorithm: inthand
* Input: none
* Output: none
*/
{
save (push) current context layer;
determine interrupt source;
find interrupt vector;
call interrupt handler;
restore (pop) previous context layer;
}
Example of state of the context layer as stack as system call is executed and interrupts occur:
The library functions such as open, read, etc. in the standard C library are not actually system
calls. They are normal functions, and normal functions cannot change the execution mode of
the system. These functions invoke a special instruction which makes the system change its
execution mode to kernel mode and start executing the system call code. The instruction is called
an operating system trap. The system calls are a special case of interrupt handling. The library
routines pass a number, unique to each system call, as a parameter to the operating system
trap through a specific register or on the stack. From that number, the kernel determines
which system call to execute.
/* Algorithm: syscall
* Input: system call number
* Output: result of system call
*/
{
find entry in the system call table corresponding to the system call number;
determine number of parameters to the system call;
copy parameters from the user address space to u-area;
save current context for abortive return; // studied later
invoke system call code in kernel;
if (error during execution of system call)
{
set register 0 in user saved register context to error number;
turn on carry bit in PS register in user saved register context;
}
else
set register 0, 1 in user saved register context to return values from system
call;
}
Registers 0 and 1 are used to exchange information between user mode and kernel mode. To
remain machine independent, the terms register 0 and 1 are used. The kernel calculates the address
of the parameters by adding (or subtracting, according to the direction of growth of the stack) an
offset to the user stack pointer, corresponding to the number of parameters to the system call.
The setting of carry bit indicates that there was an error in the system call execution. After
execution of this algorithm, the library function determines the return value from registers 0 and
1 and returns it to the user.
Consider the following code, which calls the creat function of the C library, and the assembly
code generated by the compiler (on a Motorola 68000):
Consider the stack configuration for the above program:
The code for main begins at address 58. It copies the parameters 0666 and the variable "name"
onto the user stack. The order in which the parameters are copied is machine dependent. Then it
calls the C library function creat at address 64. The function is at address 7a. The return address
from the function call is 6a, and the process pushes this number on the stack. The function places
the value 8 (number of the creat system call) in register 0 and then executes the trap instruction
which makes the system switch to kernel mode. The kernel checks the number of the system call
and determines that the routine for creat is to be executed. It determines the number of
parameters to the system call and copies them to the u-area from the user stack. When the system
call returns, the interrupt handler checks whether the u-area error field is set, and if it is, it sets the
carry bit, places the error code in register 0, and returns. The library routine then checks if the
carry bit is set, if it is, it jumps to address 13c. It moves the error code from register 0 to a global
variable errno at address 20e and moves -1 in the data register 0, and returns to address 86. If the
carry bit was not set, the creat library function will return and further execution will start after
the call at address 64. Register 0 contains the return value from the system call.
Context Switch
The procedure followed for a context switch is similar to that for system calls and interrupts,
except that the context layer of a different process is restored instead of the previous layer of the
same process. The choice of which process to schedule is a policy decision, independent
of the mechanism of the context switch.
The code that implements the context switch on UNIX systems is usually the most difficult to
understand in the operating system, because function calls give the appearance of not returning
on some occasions and materializing from nowhere on the others. This is because the kernel, in
many implementations, saves the process context at one point in the code but proceeds to execute
the context switch and scheduling algorithms in the context of the "old" process. When it later
restores the context of the process, it resumes execution according to the previously saved
context. To differentiate between the case where the kernel resumes the context of a new process
and the case where it continues to execute in the old context after having saved it, the return
values of critical functions may vary, or the program counter where the kernel executes may be
set artificially.
Situations arise when the kernel needs to abort the current execution sequence and restore a
previously saved context. Such situations are discussed later in the sections
where sleep and signals are described. The mechanism for such a context restore is discussed here.
For saving the context, the algorithm setjmp is used and the algorithm longjmp is used to restore
the previous context. setjmp is similar to save_context except that save_context pushes a new
context layer whereas setjmp saves the current context in the u-area and continues to execute in
the old context layer. When the kernel wishes to restore the context saved in setjmp, it does
a longjmp, restoring the context saved in the u-area and returning 1 from setjmp.
A process executes in kernel mode or in user mode, but system calls need to copy data
from user to kernel address space and vice versa. Addresses in kernel address space are, of
course, not accessible from user address space. The kernel copies data from user address space to
kernel address space and vice versa. When accessing addresses in user address
space, the kernel must ascertain that the address really is in user space; otherwise it could inadvertently write over
some kernel data structures and corrupt the system. Therefore, copying data between kernel and
user address space is an expensive operation, requiring more than one instruction.
This is the VAX code for moving one character from user to kernel address space:
The prober instruction checks if one byte at the address argument pointer register + 4 (indicated
by 4(ap)) could be read in user mode (mode 3). If not, the kernel branches to address eret,
stores -1 in register 0, and returns; the move failed. Otherwise, the kernel copies one byte from
the user given address to register 0 and returns the value to the caller. The procedure requires 5
instructions (with the call to fubyte) to move 1 character; the process is expensive.
A region table entry contains:
The inode of the file from which the region was initially loaded.
The type of the region (text, shared memory, private data, or stack).
The size of the region.
The location of the region in physical memory.
The state of the region:
o locked
o in demand
o being loaded into memory
o valid, loaded into memory
The reference count, giving the number of processes that reference the region
The operations on regions are:
lock a region
unlock a region
allocate a region
attach a region to the memory space of a process
change the size of a region
load a region from a file into the memory space of a process
free a region
detach a region from the memory space of a process, and duplicate the contents of a
region
The kernel can lock and unlock a region independently of the operations to allocate and free a
region, just as the locking-unlocking mechanism of inodes is independent of allocating
and releasing (iget and iput) inodes.
Allocating a Region
The kernel allocates a region in the fork, exec, and shmget (shared memory get) system calls. Just
as with inodes, the kernel maintains a free list of regions; when a region is to be allocated, the kernel
picks the first region from the free list and places it on the active list. The inode is used by the
kernel to enable other processes to share the region. The kernel increments the inode reference
count to prevent other processes from removing its contents when unlinking it.
/* Algorithm: allocreg
* Input: inode pointer
* region type
* Output: locked region
*/
{
remove region from linked list of free regions;
assign region type;
assign region inode pointer;
if (inode pointer not null)
increment inode reference count;
place region on linked list of active regions;
return (locked region);
}
We check if the inode pointer is not null because there are a few exceptions where a region is not
associated with an inode.
The kernel attaches a region to a process's address space with the attachreg algorithm. It is
used in fork, exec, and shmat. The region being attached might be newly allocated or might be an
already allocated region which is to be shared.
/* Algorithm: attachreg
* Input: pointer to (locked) region being attached
* process to which the region is being attached
* virtual address in process where region will be attached
* region type
* Output: per process region table entry
*/
{
allocate per process region table entry for process;
initialize per process region table entry;
set pointer to region being attached;
set type field;
set virtual address field;
check legality of virtual address, region size;
increment region reference count;
increment process size according to attached region;
initialize new hardware register triple for process;
return (per process region table entry);
}
The stack region of a process grows and shrinks automatically according to the nesting of calls.
It uses the growreg algorithm. The data region of a process can be extended with the sbrk system
call, which also internally calls growreg. Both of these regions are private to a process. Shared
regions cannot be extended, so growreg need not worry about side-effects of sharing. Text regions
also cannot grow.
/* Algorithm: growreg
* Input: pointer to per process region table entry
* change in size of region (positive or negative)
* Output: none
*/
{
if (region size increasing)
{
check legality of new region size;
allocate auxiliary tables (page tables);
if (not system supporting demand paging)
{
allocate physical memory;
initialize auxiliary tables, as necessary;
}
}
else // region size decreasing
{
free physical memory, as appropriate;
free auxiliary tables, as appropriate;
}
}
Loading a Region
In a system supporting demand paging, the kernel can "map" a file into process address space
during the exec system call, arranging to read individual physical pages later on demand (studied
later). If the kernel does not support demand paging, it must copy the executable file into
memory, loading the process regions at virtual addresses specified in the executable file. It may
attach a region at a different virtual address from where it loads the contents of the file, creating
a gap in the page table. For example, this feature is used to cause memory faults when user
programs access address 0 illegally. Programs with pointer variables sometimes use them
erroneously without checking that their value is 0 and, hence, that they are illegal for use as a
pointer reference. By protecting the page containing address 0 appropriately, processes that
errantly access address 0 incur a fault and abort, allowing programmers to discover such bugs
more quickly.
/* Algorithm: loadreg
* Input: pointer to per process region table entry
* virtual address to load region
* inode pointer of file for loading region
* byte offset in file for start of region
* byte count for amount of data to load
* Output: none
*/
{
increase region size according to eventual size of region (algorithm: growreg);
mark region state: being loaded into memory;
unlock region;
set up u-area parameters for reading file:
target virtual address where data is read to,
start offset value for reading file,
count of bytes to read from file;
read file into region (internal variant of read algorithm);
lock region;
mark region state: completely loaded into memory;
awaken all processes waiting for region to be loaded;
}
For example, suppose the kernel wants to load text of size 7K into a region that is attached at virtual
address 0 of a process but wants to leave a gap of 1K bytes at the beginning of the region. By this
time, the kernel will have allocated a region table entry and will have attached the region at
address 0 using algorithms allocreg and attachreg. Now it invokes loadreg, which
invokes growreg twice -- first, to account for the 1K byte gap at the beginning of the region, and
second, to allocate storage for the contents of the region -- and growreg allocates a page table for
the region. The kernel then sets up fields in the u-area to read the file: it reads 7K bytes from a
specified byte offset in the file (supplied as a parameter by the kernel) into virtual address 1K of
the process. This is shown in the following diagram:
Freeing a Region
When the kernel no longer needs a region, it frees the region and places it on the free list again.
The algorithm freereg is given below:
/* Algorithm: freereg
* Input: pointer to a (locked) region
* Output: none
*/
{
if (region reference count non zero)
{
// some process still using region
release region lock;
if (region has an associated inode)
release inode lock;
return;
}
if (region has associated inode)
release inode (algorithm: iput);
free physical memory still associated with region;
free auxiliary tables associated with region;
clear region fields;
place region on region free list;
unlock region;
}
The kernel detaches regions in the exec, exit, and shmdt system calls. It updates the pregion entry
and cuts the connection to physical memory by invalidating the associated memory management
register triple. The address translation mechanisms thus invalidated apply specifically to
the process, not to the region (as in the algorithm freereg). The kernel decrements the region
reference count. If the region reference count drops to 0 and there is no reason to keep the
region in memory (studied later), the kernel frees the region with algorithm freereg. Otherwise, it
only releases the region and inode locks.
/* Algorithm: detachreg
* Input: pointer to per process region table entry
* Output: none
*/
{
get auxiliary memory management tables for process, release as appropriate;
decrement process size;
decrement region reference count;
if (region reference count is 0 and region sticky bit not set) // studied later
free region (algorithm: freereg);
else // either reference count non-0 or region sticky bit on
{
free inode lock, if applicable (inode associated with region);
free region lock;
}
}
Duplicating a Region
In the fork system call, the kernel needs to duplicate the regions of a process. If the region is
shared, the kernel just increments the reference count of the region. If it is not shared, the kernel
has to physically copy it, so it allocates a new region table entry, page table, and physical
memory for the region. The algorithm dupreg is given below:
/* Algorithm: dupreg
* Input: pointer to region table entry
* Output: pointer to a region that looks identical to input region
*/
{
if (region type shared)
// caller will increment region reference count with subsequent attachreg call
return (input region pointer);
allocate new region (algorithm: allocreg);
set up auxiliary memory management structures, as currently exist in input region;
allocate physical memory for region contents;
"copy" region contents from input region to newly allocated region;
return (pointer to allocated region);
}
The state of regions when process A forks process B:
Sleep
Processes sleep inside system calls while awaiting a particular resource, or when a page fault
occurs. In such cases, they push a context layer and do a context switch. The context layers of a
sleeping process are shown below:
Sleep Event and Addresses
As seen previously, processes sleep on a particular event (for example, unlocking of a buffer).
When it is said that processes sleep on an event, they actually sleep on an address. This
mechanism has a couple of disadvantages. If there are many processes sleeping on a buffer to
become unlocked, the kernel wakes up all the processes when the buffer gets unlocked, even
though many of them will go back to sleep after entering the kernel mode.
Another disadvantage is that, multiple events can map to a single address. As shown in the figure
below:
Many processes are waiting on addr A, but some are waiting for the buffer while another process is
awaiting I/O completion. When either of these two events occurs, all the processes will be woken up,
since they are sleeping on the same address. It would have been better if there were a one-to-one
mapping, but practically, such clashes are rare and system performance is not affected.
/* Algorithm: sleep
* Input: sleep address
* priority
* Output: 1 if a process awakened as a result of a signal that process catches,
* longjmp if the process is awakened as a result of a signal it does not catch,
* 0 otherwise
*/
{
raise processor execution level to block all the interrupts;
set process state to sleep;
put process on sleep hash queue, based on sleep address;
save sleep address in process table slot;
set process priority level to input priority;
if (process sleep is not interruptible)
{
do context switch;
// process resumes execution here when it wakes up
reset processor priority level to allow interrupts as when process went to
sleep;
return (0);
}
}
/* Algorithm: wakeup
* Input: sleep address
* Output: none
*/
{
raise processor execution level to block all the interrupts;
find sleep hash queue for sleep address;
for (every process asleep on sleep address)
{
remove process from hash queue;
mark process as "ready to run";
put process on scheduler list of processes ready to run;
clear field in process table entry for sleep address;
if (process not loaded in memory)
wake up swapper process (0);
else if (awakened process is more eligible to run than currently running
process)
set scheduler flag;
}
restore processor execution level to original level;
}
If a process that is woken up is not loaded in memory, the kernel wakes up the swapper
process to swap it into memory. Otherwise, if the awakened process is more eligible to
run than the currently executing process, the kernel sets a scheduler flag so that it will go through
the scheduling algorithm after the process enters user mode.
wakeup does not cause the process to be scheduled immediately; it only makes the process
eligible to get scheduled.
Processes usually sleep on events that are guaranteed to occur, such as the completion of I/O on a
buffer, or an inode or buffer becoming unlocked. But processes might also sleep on events that are
not guaranteed to happen. In such cases, there must be a mechanism through which processes can
wake up and regain control. The kernel can send a signal to such processes (signals are studied
later) and wake them up. A signal can be sent selectively to a process, and as a result, it wakes up
and can recognize that a signal has been received.
If the sleep priority is above a threshold value, a process will not wake up on receiving a signal,
but will sleep until the event occurs. But if the priority is below the threshold value, it will wake
up immediately on receipt of a signal.
In the case where a signal has already arrived when a process enters the sleep algorithm, the
process will sleep if its sleep priority is above the threshold value; but if the priority is below
the threshold, it never sleeps and responds to the signal as if it had arrived while it was sleeping.
If it had slept instead, the signal might never arrive again and the process might never wake up.
If a process wakes up (or never sleeps as described above) on reception of a signal, the process
may do a longjmp depending on the event that it had slept on. The kernel does a longjmp if, after
reception of a signal, there is no way to complete the system call. For instance, if a process
is reading from a terminal and that terminal is switched off, there is no way for read to complete,
and it should return with an error. This holds for all system calls that can be interrupted while
they sleep. In the algorithm syscall, the kernel saves the previous context using setjmp in
anticipation of the need for a later longjmp.
There are situations where the kernel wants to wake up on receipt of a signal but does not want to
do a longjmp. The kernel invokes the sleep algorithm with a special priority parameter that
suppresses execution of the longjmp and causes the sleep algorithm to return the value 1. The
purpose is to allow the kernel to clean up local data structures. For example, a device driver
may allocate private data structures and then sleep at an interruptible priority; if awakened by a
signal, it can free the allocated data structures before returning.
1. Input:
The algorithm takes as input the size of the region to be allocated (size) and any
additional parameters or constraints relevant to the memory management system.
2. Search for Free Space:
The first step is to search for a suitable region of memory that can accommodate
the requested size.
This can involve traversing through memory data structures such as free lists,
memory maps, or other data structures maintained by the memory manager.
The algorithm should consider fragmentation, memory alignment requirements,
and any other constraints specified by the memory management policy.
3. Select a Region:
Once a suitable free space is identified, the algorithm selects the region for
allocation.
The selection criteria may vary depending on the memory management policy.
For example, it may prioritize minimizing fragmentation, maximizing contiguous
allocation, or considering other factors such as memory locality.
4. Allocate Region:
The selected region is then marked as allocated in the memory management data
structures.
This typically involves updating metadata associated with the memory region to
indicate that it is now in use.
If necessary, adjustments may be made to account for alignment requirements or
other constraints.
5. Return Pointer to Allocated Region:
Finally, the algorithm returns a pointer or reference to the allocated region, which
can be used by the requesting process to access the allocated memory.
6. Error Handling:
If a suitable region cannot be found (e.g., due to insufficient free space), the
algorithm may return an error code or NULL pointer to indicate failure.
Error handling mechanisms should be in place to handle such scenarios gracefully
and prevent system instability or crashes.
Explanation:
Efficiency: The algorithm aims to efficiently allocate memory regions while adhering to
the constraints and policies defined by the memory management system.
Optimization: Depending on the system requirements, the algorithm may employ
various optimizations such as best-fit, first-fit, or next-fit strategies to minimize
fragmentation and improve memory utilization.
Concurrency: In multi-threaded or multi-process environments, the algorithm should
ensure thread safety and prevent race conditions by employing appropriate
synchronization mechanisms.
Scalability: The algorithm should be scalable to handle a large number of allocation
requests efficiently without significant performance degradation.
Flexibility: The algorithm should be flexible enough to accommodate different memory
management policies and adapt to varying workload characteristics.
One of the main duties of the Operating System is to complete each given process within the
stipulated time. The term process is therefore central to the subject of Operating Systems.
Now, let us learn about the term process in depth.
Definition of Process
Basically, a process is a program in execution.
An active program currently running on the Operating System is known as a
process. The process is the basis of all computing. Although a process is closely
related to the program code, it is not the same as the code itself: a process is an
"active" entity, in contrast to the program, which is sometimes thought of as a
"passive" entity. The properties that the process holds include the state of
the hardware, the RAM, the CPU, and other attributes.
In other words, we create computer programs as text files that, when executed, create
processes that carry out all of the tasks listed in the program.
When a program is loaded into memory, it may be divided into the four components
stack, heap, text, and data to form a process. The simplified depiction of a process in the
main memory is shown in the diagram below.
Stack
The process stack stores temporary information such as method or function arguments,
the return address, and local variables.
Heap
The heap is used for dynamic memory allocation during the run time of the process.
Text
This consists of the information stored in the processor's registers as well as the most
recent activity indicated by the program counter's value.
Data
This section contains the global and static variables of the program.
Program
A program is a set of instructions that are executed to complete a certain task.
Programs are usually written in a programming language such as C, C++, Python,
Java, R, or C# (C sharp).
S. No. | Process | Program
4 | A process is created during execution and is loaded directly into main memory. | A program already exists and is stored in secondary memory.
5 | A process has its own control block, known as the Process Control Block. | A program has no control block; it is simply invoked when specified, and then the whole program executes.
6 | A process changes from time to time by itself. | A program cannot change on its own; it must be changed by the programmer.
7 | A process needs extra data, in addition to the program data, for its management and execution. | A program is basically divided into two parts: a code part and a data part.
Now, let us understand the Process Control Block with the help of the components
present in the Process Control Block.
1. Process ID
2. Process State
3. Program Counter
4. CPU Registers
5. CPU Scheduling Information
6. Accounting and Business Information
7. Memory Management Information
8. Input Output Status Information
Now, let us understand about each and every component in detail now.
1) Process ID
It is a Identification mark which is present for the Process. This is very useful for finding
the process. It is also very useful for identifying the process also.
2) Process State
Now, let us look at each of the process states in detail.
i) New State
A program that the Operating System is about to take up directly into main
memory is said to be in the New state.
ii) Ready State
The ready state, in which a process waits for the CPU to be assigned, is the first state it
enters after being created. The operating system pulls new processes from secondary
memory and places them in main memory.
The term "ready state processes" refers to processes that are in main memory and
are prepared for execution. Numerous such processes may exist at any moment.
iii) Running State
A process is in the running state when the CPU has been assigned to it and its
instructions are being executed.
iv) Block or Wait State
The OS switches a process to the block or wait state, and allots the CPU to other
processes, while the process waits for a specific resource to be allocated or for user input.
v) Terminated State
A process enters the termination state once it has completed its execution. The
operating system will end the process and erase the whole context of the process
(Process Control Block).
3) Program Counter
The address of the following instruction to be executed from memory is stored in a CPU
register called a program counter (PC) in the computer processor. It is a digital counter
required for both task execution speed and for monitoring the present stage of
execution.
4) CPU Registers
When the process is in a running state, here is where the contents of the processor
registers are kept. Accumulators, index and general-purpose registers, instruction
registers, and condition code registers are the many categories of CPU registers.
6) Accounting and Business Information
This section includes information such as CPU usage, the amount of real time a
process has used, the number of jobs or processes, etc.
8) Input Output Status Information
This section consists of input/output-related information, including the I/O status
of the process, the list of I/O devices allocated to it, etc.
Explain with example mapping of process virtual address to physical memory address.
Memory Translations
The physical address space is your system RAM, the memory modules
inside your ESXi hosts, also referred to as the global system memory.
When talking about virtual memory, we are talking about the memory
that is controlled by an operating system, or a hypervisor like vSphere
ESXi. Whenever workloads access data in memory, the system needs
to look up the physical memory address that matches the virtual
address. This is what we refer to as memory translations or mappings.
TLB in Detail
The TLB acts as a cache for the MMU that is used to reduce the time
taken to access physical memory. The TLB is a part of the MMU.
Depending on the make and model of a CPU, there may be more than one
TLB, or even multiple levels of TLB (as with memory caches), to avoid
TLB misses and keep memory latency as low as possible.
TLB hit
So, the offset is directly mapped to the physical memory layer, and the
virtual page number matches a tag already in the TLB. The MMU now
immediately knows what physical memory page to access without the
need to look into the global memory.
In the example provided in the above diagram, the virtual page number
is found in the TLB, and immediately translated to the physical page
number.
TLB miss
What happens when a virtual page number is not found in the TLB, also
referred to as a TLB miss? The TLB needs to consult the system’s
global memory to understand what physical page number is used.
Reaching out to physical memory means higher latency compared to a
TLB hit. If the TLB is full and a TLB miss occurs, the least recently used TLB
entry is evicted, and the new entry is placed in its stead. In the
following example, the virtual page number is not found in the TLB,
and the TLB needs to look into memory to get the page number.
A TLB miss is not ideal, but the worst-case scenario is data that is not
residing in memory but on storage media (flash or disk). Where we are
talking nanoseconds to retrieve data in caches or global memory,
getting data from storage media will quickly run into milliseconds or
seconds depending on the media used.
1. The virtual address is dissected in the virtual page number and
the page offset.
2. The page offset is passed through as it is not translated.
3. The virtual page number is looked up in the TLB, looking for a tag
with a corresponding number. In this example, the TLB does not
yet have a valid entry.
4. The TLB reaches out to memory to find page number 0 (because of
the tag, derived from the virtual page number). Page number 0 is
retrieved from memory, but it turns out the data does not reside in
memory but on storage. A page fault is triggered, because we
cannot translate memory pages for data that is not in memory.
We need to wait for the data from storage.
The U Area
Even though every process has a u-area, the kernel accesses it through its u variable. It needs to
access only one u-area at a time: that of the currently executing process. The kernel knows where
the page table entry of the u-area is located; therefore, when a process is scheduled, the physical
address of its u-area is loaded into the kernel page tables.
Example:
The first two register triples point to kernel text and data, and the third triple refers to the u-area
of the currently executing process (in this case, process D). When a context switch happens, the
entry in this field changes and points to the u-area of the newly scheduled process. Entries 1 and 2
do not change, as all processes share the kernel text and data.
In Unix-like operating systems, the U area, also known as the "User Area" or "User Structure," is
a data structure maintained by the kernel for each running process. The U area contains various
fields and information related to the process, its execution context, and its interaction with the
operating system. While the exact structure of the U area can vary between different Unix
variants, here are some common fields found in the U area:
1. Process ID (PID):
The unique identifier assigned to the process by the kernel. It is used to
distinguish between different processes in the system.
2. Process State:
Indicates the current state of the process, such as "Running," "Sleeping,"
"Waiting," or "Zombie." This field helps the kernel manage process scheduling
and resource allocation.
3. User ID (UID) and Group ID (GID):
The user and group identifiers associated with the process. These fields determine
the permissions and access rights granted to the process when interacting with
system resources.
4. File Descriptor Table:
An array or table containing entries for open files, sockets, and other I/O
resources associated with the process. Each entry in the table contains metadata
about the opened resource, such as its file descriptor number, file mode, and file
position.
5. Signal Handling Information:
Information about the process's signal handlers, including registered signal
handlers, pending signals, and signal mask (which signals are currently blocked
from delivery).
6. Resource Usage Information:
Statistics and accounting information related to the process's resource usage, such
as CPU time, memory usage, number of system calls executed, and number of
page faults.
7. Process Priority and Scheduling Information:
Parameters and attributes used by the kernel for scheduling the process, such as
scheduling priority, scheduling policy (e.g., real-time or time-sharing), and
scheduling class.
8. Kernel Stack:
A region of memory allocated by the kernel to store the process's execution stack.
This stack is used for storing local variables, function call frames, and other
execution context during the execution of system calls and kernel routines.
9. Process Environment:
Environment variables and settings inherited by the process from its parent
process. These variables influence the behavior of the process and the execution
of user-level programs.
10. Process Credentials:
Information about the process's credentials, including its effective user ID, real
user ID, effective group ID, and supplementary group IDs. These credentials
determine the process's permissions and access rights when interacting with
system resources.
These are some of the common fields found in the U area of a Unix-like operating system. The U
area provides the kernel with essential information about each process, allowing it to manage
process execution, resource allocation, and system interactions effectively.
Discuss mapping between per process region table and page table.
In operating systems that use virtual memory, both per-process region tables and page
tables play crucial roles in managing memory. Let's discuss each of them separately and
then explore how they are mapped to each other.
Now, let's discuss the mapping between the per-process region table and the page
table:
The per-process region table describes the entire virtual address space of a
process and the attributes of each region within that address space.
The page table, on the other hand, provides the mapping between virtual pages
and physical frames.
The mapping between these two tables is established during the process of
address translation, which occurs when the CPU generates a virtual address and
needs to map it to a physical address.
When a process attempts to access memory, the MMU uses the virtual address to
look up the corresponding entry in the page table to find the physical frame
where the desired memory resides.
However, before the MMU can perform this translation, it needs to know which
page table to use for the current process. This information is typically obtained
from the per-process region table, which stores a reference to the page table
associated with the process.
In essence, the per-process region table points to the page table(s) that are
relevant for translating virtual addresses of that particular process.
Therefore, the per-process region table and the page table are linked indirectly,
with the former guiding the MMU to the appropriate page table for address
translation.
Overall, the per-process region table and the page table are essential components of
memory management in modern operating systems, and their coordination ensures
efficient and secure memory access for processes.
Overall, the clock interrupt handler is a fundamental component of the operating system's kernel,
responsible for managing timekeeping, process scheduling, and system responsiveness. Its
efficient operation is crucial for the overall performance and stability of the system.
Explain system calls for time. What is the use of the fork() system call? Explain the
sequence of operations the kernel executes for fork.
In the Linux operating system, signals are a fundamental mechanism for inter-process
communication and for handling various asynchronous events. They allow the kernel, other
processes, or even the terminal to interrupt a running process and notify it of specific events or
conditions. Signals play a vital role in process management, error handling, and user interaction.
Here's an explanation of their uses and types in Linux:
1. Process Termination:
Signals can be used to request the termination of a process. For example, the
SIGTERM signal is often sent to gracefully terminate a process, allowing it to
perform cleanup operations before exiting.
Additionally, the SIGKILL signal can be used to forcefully terminate a process
without giving it a chance to clean up.
2. Handling Errors:
Signals are used to indicate errors or exceptional conditions that occur during
program execution. For instance, the SIGSEGV signal indicates a segmentation
fault, usually caused by invalid memory access.
Other signals like SIGFPE (floating-point exception) and SIGILL (illegal
instruction) signify arithmetic errors and invalid machine instructions,
respectively.
3. User Interaction:
Signals facilitate communication between the user and running processes. For
example, the SIGINT signal (typically generated by pressing Ctrl+C) is sent to
interrupt a process and request its termination.
Similarly, the SIGTSTP and SIGCONT signals are used to suspend and resume
processes, respectively, allowing users to pause and resume background
processes.
4. Asynchronous Events:
Signals are used to handle various asynchronous events, such as hardware
interrupts, timer expirations, or I/O events. For instance, the SIGCHLD signal is
sent to a parent process when a child process terminates, allowing the parent to
perform cleanup tasks.
Signals like SIGALRM (alarm) can be used to set timers and receive
notifications when the timer expires.
1. Standard Signals:
Linux defines a set of standard signals (denoted by numbers) with specific default
actions associated with them.
Examples include SIGTERM (terminate), SIGINT (interrupt), SIGSEGV
(segmentation fault), SIGILL (illegal instruction), and SIGALRM (alarm).
2. Real-time Signals:
Linux also supports real-time signals, which are used for high-priority
asynchronous event notification.
Real-time signals have numbers in the range of SIGRTMIN to SIGRTMAX,
allowing for greater flexibility and customization compared to standard signals.
3. User-defined Signals:
Applications can define and use their own signals for specific purposes.
User-defined signals typically use SIGUSR1 and SIGUSR2
and provide a means for applications to define custom signal handlers.
In summary, signals in Linux are a versatile mechanism for process management, error handling,
and user interaction. Understanding how signals work and how to handle them is essential for
developing robust and responsive software applications in Linux.
Two related but distinct concepts, systemd and the init process, are explained below:
1. systemd: Systemd is a system and service manager for Unix-like operating systems. It is a
replacement for the traditional SysV init system and Upstart. Systemd is responsible for
managing the system's services, daemons, devices, and other aspects of the system's operation.
Key features of systemd include parallel service startup, on-demand (socket and D-Bus)
activation of daemons, dependency-based service management, process tracking via control
groups (cgroups), and integrated logging through journald.
Systemd is widely adopted in modern Linux distributions, such as Fedora, Red Hat Enterprise
Linux, CentOS, Ubuntu, and Debian.
2. init process: The init process (short for initialization) is the first process started by the kernel
during the boot process of Unix-like operating systems. Its primary role is to initialize the system
and start other essential processes and services. The init process typically has process ID (PID) 1
and acts as the ancestor of all other processes in the system.
Traditionally, Unix systems used System V init (SysV init) as the init system. SysV init follows a
sequential boot process, where it executes a series of startup scripts (init scripts) located in
specific directories (e.g., /etc/init.d/) to bring the system to a functional state.
While SysV init served as the standard init system for many years, it has limitations, such as
slow boot times and lack of parallelism. As a result, newer init systems like systemd have been
developed to address these shortcomings.
In summary, systemd is a modern system and service manager widely used in Unix-like
operating systems, while the init process (usually managed by an init system like SysV init or
systemd) is responsible for initializing the system and starting essential processes during the boot
process.
There exists a strong relationship between user-level threads (ULTs) and kernel-level threads (KLTs).
Dependencies between ULT and KLT:
When subtasks are performed concurrently while sharing the code section, data inconsistency may
result. Ultimately, suitable synchronization techniques are required to control access to the
shared data (the critical section).
In a multithreaded process, synchronization is achieved using four different models:
1. Mutex Locks – This allows only one thread at a time to access the shared resource.
2. Read/Write Locks – This allows exclusive writes and concurrent read of a shared
resource.
3. Counting Semaphore – The count refers to the number of instances of a shared resource
that can be accessed simultaneously. Once the count limit is reached, the remaining
threads are blocked.
4. Condition Variables – This blocks the thread until the condition is satisfied (busy
waiting).
All these synchronization models are implemented within each process using the thread
library. The memory space for the lock variables is allocated in the user address
space; thus, no kernel intervention is required.
1. Scheduling :
During thread creation, the application developer sets the priority and scheduling policy of
each ULT using the thread library. When the program executes, scheduling is performed by the
thread library based on the defined attributes. In this case, the system scheduler has no control
over thread scheduling, as the kernel is unaware of the ULTs.
2. Context Switching :
Switching from one ULT to another within the same process is fast, as each thread has its own
thread control block, registers, and stack; only the registers need to be saved and restored, and
no change of address space is required. The entire switch takes place within the user address
space under the control of the thread library.
3. Asynchronous I/O :
After an I/O request, a ULT remains in the blocked state until it receives the
acknowledgment (ack) from the receiver. Although the I/O is asynchronous, it presents a
synchronous environment to the application user. This is because the thread library itself
schedules another ULT to execute until the blocked thread delivers sigpoll as an ack to the
thread library; only then does the thread library reschedule the blocked thread.
For example, consider a program that copies (reads) the content of one file and pastes (writes) it
into another file, with a pop-up that displays the percentage of progress completed.
This process contains three subtasks, each allocated to a ULT:
Thread A – Reads the content from the source file and stores it in a global variable X
within the process address space.
Thread B – Writes the content of X to the destination file.
Thread C – Displays the percentage of progress completed in the pop-up.
Here, the application developer schedules the multiple flows of control within the program
using the thread library.
Order of execution: begins with Thread A, then Thread B, then Thread C.
Thread A and Thread B share the global variable X. Thread B can read X only after Thread A
has written to it, so synchronization must be adopted on the shared variable to prevent Thread B
from reading stale data. Context switching from Thread A to Thread B and then to Thread C takes
place within the process address space. Each thread saves and restores its registers in its own
thread control block (TCB). Thread C remains in the blocked state until Thread B starts its first
write operation on the destination file. This is why the graphical indication of 100% pops up a
few seconds after the process has actually completed.
Dependency between ULT and KLT :
The one major dependency between KLTs and ULTs arises when a ULT needs kernel
resources. Every ULT is associated with a virtual processor called a light-weight process
(LWP). An LWP is created and bound to a ULT by the thread library according to the
application's needs. Whenever a system call is invoked, a kernel-level thread is created and
scheduled onto the LWP by the system scheduler. These KLTs are scheduled to access kernel
resources by the system scheduler, which is unaware of the ULTs; the KLT, in contrast, is aware
of each ULT associated with it via its LWP.
1. The system scheduler may schedule a process whose threads are low-priority
or idle. This leads to starvation of high-priority threads, which in turn
reduces the efficiency of the system.
2. When a single thread gets blocked, the entire process gets blocked, so CPU
utilization, even on a multicore system, becomes much lower. Though there may
exist executable threads, the kernel considers every process as a single-threaded
process and allocates only one core at a time.
3. The system scheduler may provide a single time slice irrespective of the number of
threads within a process. A single-threaded process and a process with 1000 threads
given the same time slice make the system more inefficient.
Here are some advantages and disadvantages of user-level threads and kernel-level
threads:
A Process Scheduler schedules different processes to be assigned to the CPU based on particular
scheduling algorithms. There are six popular process scheduling algorithms which we are going
to discuss in this chapter −
Process | Wait Time : Service Time - Arrival Time
P0 | 0 - 0 = 0
P1 | 5 - 1 = 4
P2 | 8 - 2 = 6
P3 | 16 - 3 = 13
Average Wait Time: (0 + 4 + 6 + 13) / 4 = 5.75
Process | Arrival Time | Execution Time | Service Time
P0 | 0 | 5 | 0
P1 | 1 | 3 | 5
P2 | 2 | 8 | 14
P3 | 3 | 6 | 8
Process | Wait Time : Service Time - Arrival Time
P0 | 0 - 0 = 0
P1 | 5 - 1 = 4
P2 | 14 - 2 = 12
P3 | 8 - 3 = 5
Average Wait Time: (0 + 4 + 12 + 5) / 4 = 5.25
Given: Table of processes, and their Arrival time, Execution time, and priority. Here we are
considering 1 is the lowest priority.
Process | Arrival Time | Execution Time | Priority | Service Time
P0 | 0 | 5 | 1 | 0
P1 | 1 | 3 | 2 | 11
P2 | 2 | 8 | 1 | 14
P3 | 3 | 6 | 3 | 5
Process | Wait Time : Service Time - Arrival Time
P0 | 0 - 0 = 0
P1 | 11 - 1 = 10
P2 | 14 - 2 = 12
P3 | 5 - 3 = 2
Average Wait Time: (0 + 10 + 12 + 2) / 4 = 6.00
Shortest remaining time (SRT) is the preemptive version of the SJN algorithm.
The processor is allocated to the job closest to completion but it can be preempted by a
newer ready job with shorter time to completion.
It is impossible to implement in interactive systems, where the required CPU time is not known.
It is often used in batch environments where short jobs need to be given preference.
Process | Wait Time : Service Time - Arrival Time
P0 | (0 - 0) + (12 - 3) = 9
P1 | (3 - 1) = 2
P3 | (9 - 3) + (17 - 12) = 11
Multiple-level queues are not an independent scheduling algorithm. They make use of other
existing algorithms to group and schedule jobs with common characteristics.
For example, CPU-bound jobs can be scheduled in one queue and all I/O-bound jobs in another
queue. The Process Scheduler then alternately selects jobs from each queue and assigns them to
the CPU based on the algorithm assigned to the queue.
Explain how kernel prevents a process from monopolizing the use of CPU in Unix System V
In Unix System V and similar operating systems, preventing a process from monopolizing the
CPU involves implementing various scheduling policies and mechanisms to ensure fair
allocation of CPU resources among competing processes. Here's how the kernel achieves this:
1. Time Sharing:
Unix System V implements time-sharing scheduling, where CPU time is divided
among active processes in small time slices or quantum.
Each process is allocated a time slice during which it can execute on the CPU.
Once the time slice expires, the kernel preemptively switches to another process,
ensuring fairness among competing processes.
2. Priority-based Scheduling:
System V Unix employs priority-based scheduling to assign priorities to processes
based on various factors such as CPU usage, I/O activity, and process
characteristics.
Higher-priority processes are given preferential treatment in CPU allocation, but
the system ensures that lower-priority processes are not starved of CPU time.
3. Round-robin Scheduling:
Unix System V may use round-robin scheduling as a component of its scheduling
policy. In round-robin scheduling, processes are scheduled in a circular queue,
and each process is allowed to execute for a fixed time slice before being
preempted.
This mechanism ensures that each process gets a fair share of CPU time and
prevents any single process from monopolizing the CPU indefinitely.
4. Quantum and Preemption:
The kernel in Unix System V sets a predefined time quantum for each process,
after which the process is preempted and placed back in the ready queue.
This ensures that no process can monopolize the CPU for an extended period.
Even if a process is CPU-bound, it will be periodically interrupted to allow other
processes to execute.
5. Dynamic Prioritization:
System V Unix may adjust the priority of processes dynamically based on their
behavior and resource usage. For example, processes that consume excessive
CPU time may have their priorities lowered to prevent them from monopolizing
the CPU.
6. Fairness and Scheduling Policies:
The kernel in Unix System V implements fairness in CPU allocation by adhering
to various scheduling policies and algorithms, such as the Completely Fair
Scheduler (CFS) in modern Linux kernels.
These policies ensure that all processes, regardless of their characteristics or
resource demands, are treated fairly in CPU allocation.
In summary, Unix System V prevents processes from monopolizing the CPU by implementing
time-sharing scheduling, priority-based scheduling, round-robin scheduling, preemption, and
dynamic prioritization. These mechanisms ensure fair allocation of CPU resources among
competing processes and maintain system responsiveness and stability.
1. Types of Profiling:
CPU Profiling: Measures the amount of CPU time consumed by different functions or
sections of code during program execution.
Memory Profiling: Analyzes the memory usage patterns of a program, including
memory allocation, deallocation, and memory leaks.
Function-Level Profiling: Tracks the frequency of function calls, execution time of each
function, and call stack information.
I/O Profiling: Examines the input/output operations performed by the program,
including disk reads/writes, network operations, and file accesses.
Concurrency Profiling: Investigates how a program utilizes multiple threads or
processes, identifying potential concurrency issues such as race conditions or deadlocks.
2. Profiling Techniques:
Profiling Libraries: Libraries such as gperftools (Google Performance Tools) and Intel
VTune provide APIs for instrumenting code and collecting runtime data.
Profiler Commands: Tools like gprof, perf, valgrind, and oprofile offer command-line
interfaces for CPU, memory, and I/O profiling.
Integrated Development Environments (IDEs): Many IDEs, including Visual Studio,
Eclipse, and JetBrains IntelliJ IDEA, offer built-in profiling tools for analyzing and
optimizing code.
Specialized Profiling Tools: Tools like strace, ltrace, and dtrace provide system-level
tracing capabilities for analyzing system calls, library calls, and kernel-level events.
Web Profilers: For web applications, tools like Chrome DevTools, Firefox Developer
Tools, and New Relic offer web-based profilers for analyzing client-side and server-side
performance.
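The instrumentation idea behind these tools can be illustrated with a tiny hand-rolled function-level profiler (a sketch of the concept only, not how gprof or perf actually work internally):

```python
import time
from collections import defaultdict

call_stats = defaultdict(lambda: [0, 0.0])   # name -> [call count, total seconds]

def profiled(fn):
    """Minimal function-level profiler: counts calls and accumulates wall time."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:                              # record even if fn raises
            stats = call_stats[fn.__name__]
            stats[0] += 1
            stats[1] += time.perf_counter() - start
    return wrapper

@profiled
def busy(n):
    return sum(i * i for i in range(n))

for _ in range(3):
    busy(10_000)
print(call_stats["busy"][0])   # 3 calls recorded
```

Real profilers add call-stack attribution and either sampling or compiler-inserted hooks, but the data they collect is of this same shape: per-function call counts and accumulated time.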
4. Profiling Workflow:
Identify Performance Goals: Define specific performance metrics and goals for the
profiling process, such as reducing CPU usage, optimizing memory usage, or improving
response time.
Select Profiling Tools: Choose the appropriate profiling tools based on the type of
analysis needed (e.g., CPU profiling, memory profiling) and the programming language
and platform being used.
Instrument or Sample: Depending on the chosen profiling technique, instrument the
code with profiling hooks or run the program under a profiler to collect runtime data.
Analyze Results: Analyze the profiling data to identify performance bottlenecks,
resource usage patterns, and areas for optimization. This may involve examining call
graphs, memory allocation traces, or CPU utilization reports.
Optimize and Iterate: Based on the profiling results, make optimizations to the code,
data structures, algorithms, or system configuration to improve performance. Iterate the
profiling process as needed to validate optimizations and further improve performance.
The operating system's demand paging mechanism follows these steps in its
operation.
Program Execution: Upon launching a program, the operating system allocates a
certain amount of memory to the program and establishes a process for it.
Creating page tables: To keep track of which program pages are currently in
memory and which are on disk, the operating system creates page tables for each
process.
Handling Page Fault: When a program tries to access a page that is not in memory
at the moment, a page fault happens. The operating system pauses the
application and consults the page tables to determine whether the necessary
page is on disk.
Page Fetch: If the necessary page is on disk, the operating system retrieves it
and loads it into memory.
The page's new location in memory is then recorded in the page table.
Resuming the program: Once the necessary pages are loaded into memory, the
operating system resumes the program where it left off.
Page replacement: If there is not enough free memory to hold all the pages a
program needs, the operating system may need to replace one or more pages
currently in memory with pages stored on the disk. The page
replacement algorithm used by the operating system determines which pages are
selected for replacement.
Page cleanup: When a process terminates, the operating system frees the memory
allocated to the process and cleans up the corresponding entries in the page tables.
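The steps above can be sketched as a small simulation. The FIFO replacement policy and the three-frame memory are illustrative assumptions, not what any particular kernel uses:

```python
def access(page, page_table, memory, capacity):
    """One demand-paging step: return 'hit' or 'fault'. On a fault the page is
    fetched; if memory is full, the oldest resident page is evicted (FIFO)."""
    if page_table.get(page, False):        # valid bit set: page is resident
        return "hit"
    if len(memory) >= capacity:            # no free frame: replacement needed
        victim = memory.pop(0)             # FIFO page-replacement policy
        page_table[victim] = False         # clear the victim's valid bit
    memory.append(page)                    # "fetch" the page from disk
    page_table[page] = True                # update page table, set valid bit
    return "fault"

page_table, memory = {}, []
refs = [1, 2, 1, 3, 4, 1]                  # reference string; 3 frames available
faults = sum(access(p, page_table, memory, capacity=3) == "fault" for p in refs)
print(faults)                              # 5 of the 6 references fault
```

Note how the second reference to page 1 is a hit, but the last one faults again because FIFO evicted page 1 to make room for page 4.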
Advantages of Demand Paging
The Demand Paging technique provides several benefits that improve the efficiency of the
operating system.
Efficient use of physical memory: Demand paging allows physical memory to be
used more efficiently because only the necessary pages are loaded into memory at
any given time.
Support for larger programs: Programs can be larger than the physical memory
available on the system because only the necessary pages will be loaded into
memory.
Faster program start: Because only part of a program is initially loaded into
memory, programs can start faster than if the entire program were loaded at once.
Reduced memory usage: Demand paging can help reduce the amount of memory a
program needs, which can improve system performance by reducing the amount of
disk I/O required.
Disadvantages of Demand Paging
Page fault overhead: The process of swapping pages between memory and disk
can cause a performance overhead, especially if the program frequently accesses
pages that are not currently in memory.
Degraded performance: If a program frequently accesses pages that are not
currently in memory, the system spends a lot of time swapping out pages, which
degrades performance.
Fragmentation: Demand paging can cause physical memory fragmentation,
degrading system performance over time.
Complexity: Implementing demand paging in an operating system can be complex,
requiring sophisticated algorithms and data structures to manage page tables and
swap space.
Overall, demand paging, together with the page stealer process that reclaims page frames, is a
vital component of virtual memory management, ensuring efficient utilization of physical
memory while transparently moving data between RAM and disk to accommodate the
demands of running processes.
Explain page fault. Explain handling of validity page fault.
A page fault behaves much like an error. A page fault happens when a program tries to access a
piece of memory that does not exist in physical memory (main memory). The fault directs the
operating system to locate the data through its virtual memory management and relocate it
from secondary storage, such as a hard disk, into primary memory.
A page fault trap occurs if the requested page is not loaded into memory. The page fault
raises an exception, which notifies the operating system that it must retrieve
the "pages" from virtual memory to continue operation. Once all of the data has been placed
into physical memory, the program resumes normal operation. The page fault process occurs in
the background, so the user is unaware of it.
1. The computer's hardware traps to the kernel, and the program counter is saved on
the stack. The CPU registers hold information about the state of the current instruction.
2. An assembly-language routine is started, which saves the general registers and other
volatile data to prevent the operating system from destroying them.
A page fault happens when you access a page that has been marked as invalid. While
translating the address through the page table, the paging hardware notices that the invalid bit
is set and causes a trap to the operating system. The trap occurs because the needed page has
not yet been loaded into memory.
Now, let's understand the procedure of page fault handling in the OS:
1. First, the operating system checks an internal table for this process to determine whether
the reference was a valid or an invalid memory access.
2. If the reference was invalid, the process is terminated. Otherwise, the page must be
paged in.
3. After that, a free frame is found from the free-frame list.
4. Now, a disk operation is scheduled to read the required page into that frame.
5. When the I/O operation is completed, the process's page table is updated with the new
frame number, and the invalid bit is changed. The reference is now a valid page reference.
6. Finally, the instruction that caused the page fault is restarted.
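A minimal sketch of this procedure in code, with a hypothetical 4 KiB page size and a per-process list of valid address ranges standing in for the internal table:

```python
PAGE_SIZE = 4096   # illustrative 4 KiB pages

def handle_page_fault(addr, valid_ranges, page_table, free_frames):
    """Validity-fault handling sketch: check whether the reference is legal,
    then take a free frame, 'read' the page in, and fix up the page table."""
    page = addr // PAGE_SIZE
    legal = any(lo <= addr < hi for lo, hi in valid_ranges)
    if not legal:
        return "terminate"            # step 2: invalid reference, kill process
    frame = free_frames.pop()         # step 3: take a frame off the free list
    # steps 4-5 would schedule the disk read here; we just record the result:
    page_table[page] = (frame, True)  # new frame number, valid bit now set
    return "restart"                  # step 6: re-execute faulting instruction

free = [7, 8, 9]
table = {}
print(handle_page_fault(4100, [(0, 8192)], table, free))    # restart
print(handle_page_fault(999999, [(0, 8192)], table, free))  # terminate
```

The two calls show both branches: a legal address triggers a page-in and a restart, while an address outside the process's valid ranges leads to termination.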
There are various page fault terminologies in the operating system. Some terminologies of page
fault are as follows:
1. Page Hit
When the CPU attempts to obtain a needed page from main memory and the page exists in main
memory (RAM), it is referred to as a "PAGE HIT".
2. Page Miss
If the needed page is not present in the main memory (RAM), it is known as a "PAGE MISS".
3. Page Fault Time
The time it takes to get a page from secondary memory and recover from the fault after the
required page is loaded into main memory is known as the "PAGE FAULT TIME".
4. Page Fault Rate
The rate at which threads incur page faults is referred to as the "PAGE FAULT RATE". The
page fault rate is measured per second.
5. Hard Page Fault
If the required page must be fetched from the hard disk's page file, it is referred to as a
"HARD PAGE FAULT".
6. Soft Page Fault
If the required page is not located on the hard disk but is found somewhere else in memory, it
is referred to as a "SOFT PAGE FAULT".
7. Minor Page Fault
If a process needs data and that data exists in memory but is being allotted to another process
at the same moment, it is referred to as a "MINOR PAGE FAULT".
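The page fault rate feeds directly into the classic effective-access-time estimate. The 100 ns memory access and 8 ms fault-service time below are illustrative numbers only:

```python
def page_fault_rate(faults, total_refs):
    """Page-fault rate as the fraction of references that fault."""
    return faults / total_refs

def effective_access_time(rate, mem_ns, fault_ns):
    """Classic EAT formula: (1 - p) * memory time + p * fault-service time."""
    return (1 - rate) * mem_ns + rate * fault_ns

p = page_fault_rate(1, 1000)   # one fault per thousand references
# Even a 0.1% fault rate dominates the average: ~8100 ns vs a 100 ns access.
print(effective_access_time(p, mem_ns=100, fault_ns=8_000_000))
```

This is why keeping the page fault rate very low matters: the fault-service time is tens of thousands of times larger than a memory access, so a tiny rate still inflates the average access time dramatically.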
Explain in detail allocation of space on swap device.
Swapping is a memory management technique used in multi-programming to increase the
number of processes sharing the CPU. It is a technique of removing a process from the main
memory and storing it into secondary memory, and then bringing it back into the main memory
for continued execution. This action of moving a process out from main memory to secondary
memory is called Swap Out and the action of moving a process out from secondary memory to
main memory is called Swap In.
Swap-space management is a technique used by operating systems to optimize memory usage
and improve system performance. Here are some advantages and disadvantages of swap-space
management:
Advantages:
Higher degree of multiprogramming: more processes can be kept in the system than physical
memory alone would allow.
Support for larger processes: a process larger than main memory can still run, since parts of it
can reside in swap space.
Better memory utilization: idle or low-priority processes can be moved out so that active
processes get physical memory.
Disadvantages:
Slower access times: Accessing data from disk is slower than accessing data from physical
memory, which can result in slower system performance if too much swapping is required.
Increased disk usage: Swap-space management requires disk space to be reserved for use as
virtual memory, which can reduce the amount of available space for other data storage
purposes.
Risk of data loss: In some cases, if there is a problem with the swap file, such as a disk error or
corruption, data may be lost or corrupted.
Overall, swap-space management is a useful technique for optimizing memory usage and
improving system performance. However, it is important to carefully manage swap space
allocation and monitor system performance to ensure that excessive swapping does not
negatively impact system performance.
Swap-Space :
The area on the disk where the swapped-out processes are stored is called swap space.
Swap-Space Management :
Swap-space management is another low-level task of the operating system. Virtual memory
uses disk space as an extension of main memory. Since disk access is much slower than
memory access, using swap space significantly decreases system performance. The goal of the
swap-space implementation is therefore to give the virtual memory the best possible
throughput. In this article, we discuss how swap space is used, where swap space is located on
disk, and how swap space is managed.
Swap-Space Use :
Swap space is used by different operating systems in various ways. Systems that implement
swapping may use swap space to hold an entire process image, including its code and data
segments. Paging systems may simply store pages that have been pushed out of main memory.
The amount of swap space a system needs can vary from a few megabytes to gigabytes,
depending on the amount of physical memory, the amount of virtual memory it is backing,
and the way the virtual memory is used.
It is safer to overestimate than to underestimate the amount of swap space required, because if
a system runs out of swap space it may be forced to abort processes or may crash entirely.
Overestimation wastes disk space that could otherwise be used for files, but it does no other
harm.
The following table shows the amount of swap space used by different systems:
Explain the swapping of processes between swap space and main memory.
To increase CPU utilization in multiprogramming, a memory management scheme known as
swapping can be used. Swapping is the process of bringing a process into memory and, after
it has run for a while, temporarily copying it to disk. The purpose of swapping in an
operating system is to access data on the hard disk and bring it into RAM so that application
programs can use it. It is important to remember that swapping is used only when data is not
available in RAM. Although the swapping process degrades system performance, it allows
larger and multiple processes to run concurrently. Because of this, swapping is also referred
to as memory compaction. The CPU scheduler determines which processes are swapped in
and which are swapped out. Consider a multiprogramming environment that employs a
priority-based scheduling algorithm: when a high-priority process enters the input queue, a
low-priority process is swapped out so the high-priority process can be loaded and executed.
When this process terminates, the low-priority process is swapped back into memory to
continue its execution. The figure below shows the swapping process in an operating system:
Swapping is subdivided into two concepts: swap-in and swap-out.
Swap-out is the technique of moving a process from RAM to the hard disk.
Swap-in is the technique of transferring a program from the hard disk to main memory, or
RAM.
Advantages
If there is low main memory so some processes may has to wait for much long but
by using swapping process do not have to wait long for execution on CPU.
It utilize the main memory.
Using only single main memory, multiple process can be run by CPU using swap
partition.
The concept of virtual memory start from here and it utilize it in better way.
This concept can be useful in priority based scheduling to optimize the swapping
process.
Disadvantages
If main memory is low, the user is executing too many processes, and the system
power suddenly goes off, data belonging to the processes that took part in
swapping may be lost.
The number of page faults may increase.
Processing performance may be reduced.
In computing, a stream refers to a sequence of data elements that are made available over time.
Streams are fundamental abstractions used for input and output operations in programming
languages and operating systems. They provide a flexible and efficient way to handle data
communication between processes, files, devices, and networks. Common operations on a
stream include:
1. Reading: Reading from a stream involves retrieving data elements from the stream
sequentially. This can include reading bytes from a file, characters from a network
socket, or structured data from a data source.
2. Writing: Writing to a stream involves sending data elements to the stream sequentially.
This can include writing bytes to a file, sending messages over a network socket, or
storing structured data in a data sink.
3. Flushing: Flushing a stream involves forcing any buffered data to be written to the
underlying data destination immediately. This is typically done to ensure that data is not
held in memory for an extended period and is instead written out promptly.
4. Closing: Closing a stream involves releasing any associated system resources and
terminating communication with the underlying data source or destination. This is
important for resource management and ensuring that system resources are not leaked.
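The four operations above map directly onto the standard file-stream API; this sketch uses a temporary file as the underlying data destination:

```python
import os
import tempfile

# Write, flush, and close an output byte stream, then read it back.
path = os.path.join(tempfile.mkdtemp(), "demo.bin")

out = open(path, "wb")          # open a writable byte stream
out.write(b"hello ")            # writes may sit in a user-space buffer...
out.flush()                     # ...until flushed to the underlying file
out.write(b"stream")
out.close()                     # closing releases the descriptor and flushes

inp = open(path, "rb")          # open a readable stream over the same data
data = inp.read()               # sequential read of the whole stream
inp.close()
print(data)                     # b'hello stream'
```

The same read/write/flush/close pattern applies whether the stream is backed by a file, a pipe, or a network socket; only the object behind the stream changes.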
In summary, streams are versatile and powerful abstractions for handling input and output
operations in software development. They provide a unified interface for interacting with various
types of data sources and destinations, offering flexibility, efficiency, and ease of use in
managing data communication tasks.