Set1 Processes and Threads
Set1 Processes and Threads
2
Dr. Suleiman Abu Kharmeh
Processes
• Definition
• Creation
• Management
• PCB
• States
• Interaction with the OS
• Threads
• Process and thread scheduling
3
Dr. Suleiman Abu Kharmeh
Process Management
• This lecture begins a series of topics on processes, and
threads.
– this is perhaps the most important part of the class
• Today: processes and process management
– what are the OS units of execution?
– how are they represented inside the OS?
– how is the CPU scheduled across processes?
– what are the possible execution states of a process?
4
Dr. Suleiman Abu Kharmeh
Running Programs
• One basic function of an OS is to execute and
manage code (programs) dynamically, e.g.:
– A command issued at a command line terminal
– An icon double clicked from the desktop
– Jobs/tasks run as part of a batch system
(sequential execution of many commands)
5
Dr. Suleiman Abu Kharmeh
Programs and Processes
Process
The running
instantiation of a
program, stored in
RAM
Program One-to-many
An executable relationship
file in long-term between program
storage and processes
6
Dr. Suleiman Abu Kharmeh
What’s in a Process?
• A process consists of (at least):
– an address space
– the code for the running program
– the data for the running program
– an execution stack and stack pointer (SP)
• traces state of procedure calls made
– the program counter (PC), indicating the next instruction
– a set of general-purpose processor registers and their values
– a set of OS resources
• open files, network connections, sound channels, …
• The process is a container for all of this state
– a process is named by a process ID (PID)
• just an integer (actually, typically a short)
7
Dr. Suleiman Abu Kharmeh
A process’s address space
0xFFFFFFFF
stack
(dynamic allocated mem)
SP
8
Dr. Suleiman Abu Kharmeh
The Program Loader
• OS functionality that loads programs
into memory, creates processes
Memory
– Places segments into memory
ESP
– Loads necessary dynamic libraries Stack
– Performs relocation
– Allocates the initial stack frame
– Sets EIP to the program’s entry point
Heap
.data
EIP .text
9
Dr. Suleiman Abu Kharmeh
Single-Process Address Apace
• The stack is used for local variables and
Memory
function calls
– Grows downwards Stack
• Heap is allocated dynamically
(malloc/new)
– Grows upwards
• When the stack and heap meet, there is
no more memory left in the process
Heap
– Process will probably crash
.data
• Static data and global variables are fixed .text
at compile time 10
Dr. Suleiman Abu Kharmeh
Process states
• Each process has an execution state, which
indicates what it is currently doing
– ready: waiting to be assigned to CPU
• could run, but another process has the CPU
– running: executing on the CPU
• is the process that currently controls the CPU
• how many processes can be running simultaneously?
– Waiting/blocked: waiting for an event, e.g. I/O completion
• cannot make progress until event happens
• As a process executes, it moves from state to state
– UNIX: run ps, STAT column shows current state
• In which state a process spends most of the time?
• Other process states: New, Terminated. 11
Dr. Suleiman Abu Kharmeh
Process states and state transitions
exit
terminated
running
dispatch / interrupt
schedule (unschedule)
13
Dr. Suleiman Abu Kharmeh
PCB
• The PCB is a data structure with many, many fields:
– process ID (PID)
– execution state
– program counter, stack pointer, registers
– memory management info
– UNIX username of owner
– scheduling priority
– accounting info
– pointers into state queues
• In linux:
– defined in task_struct (include/linux/sched.h)
– over 95 fields and 350 lines of code.
14
Dr. Suleiman Abu Kharmeh
PCBs and Hardware State
• When a process is running, its hardware state is inside the CPU
– PC, SP, registers
– CPU contains current values
• When the OS stops running a process (puts it in the waiting/ready
state), it saves the registers’ values in the PCB
– when the OS puts the process in the running state, it loads the hardware
registers from the values in that process’ PCB
• The act of switching the CPU from one process to another is called a
context switch
– timesharing systems may do 1000s of switches per second
• On a 2.6GHz CPU, results in 2.6MHz per process, 2.6 Million
instruction per process assuming 1 instruction per cycle
– The context switch itself takes about 1 microseconds on today’s hardware
• On a 2.6GHz CPU, assuming 1 instruction per cycle, this
translates to 2600 instructions per 1 microsecond.
16
Dr. Suleiman Abu Kharmeh
Context Switching
• Context switching
– Saves state of a process before a switching to another process
– Restores original process state when switching back
17
Dr. Suleiman Abu Kharmeh
Threads
• So far, process has a single thread of execution
• Consider having multiple program counters per
process
– Multiple locations can execute at once
• Multiple threads of control -> threads
• Must then have storage for thread details,
multiple program counters in PCB
• Explore in detail later
19
Dr. Suleiman Abu Kharmeh
Process creation (fork system call)
• One process can create another process
– creator is called the parent
– created process is called the child
– UNIX: do ps, look for PPID field
• ps axo stat,euid,ruid,tty,tpgid,sess,pgrp,ppid,pid,pcpu,comm
20
Dr. Suleiman Abu Kharmeh
UNIX process creation
• UNIX process creation through fork()
system call
– creates and initializes a new PCB
– creates a new address space
– initializes new address space with a copy of the entire
contents of the address space of the parent
– initializes kernel resources of new process with resources
of parent (e.g. open files)
– places new PCB on the ready queue
• the fork() system call returns twice
– once into the parent, and once into the child
– In the parent, it returns the child’s PID
– In the child, it returns 0 21
Dr. Suleiman Abu Kharmeh
Parent similar, but different Child
PCB in key ways PCB
22
Dr. Suleiman Abu Kharmeh
testfork – use of fork( )
#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>
23
Dr. Suleiman Abu Kharmeh
testfork output
CMD% gcc -o testfork testfork.c
CMD% ./testfork
My child is 486
Child of testfork is 0
CMD% ./testfork
Child of testfork is 0
My child is 571
24
Dr. Suleiman Abu Kharmeh
Fork and exec
• So how do we start a new program, instead of
just forking the old program?
– the exec() system call!
• exec(‘program’, …)
– stops the current process
– loads ‘program’ into the address space
– initializes hardware context, args for new program
– places PCB onto ready queue
– note: does not create a new process!
25
Dr. Suleiman Abu Kharmeh
Starting New Programs
• Exec System Call:
– Many versions of exec() are used from the program in C (Linux),
for example:
• int execv(const char *prog, char *const argv[]);
– Check privileges and file type of prog
– Loads program at path prog into address space
• Replacing previous contents!
• Execution starts at main()
– Initializes context – e.g. passes arguments
• *argv
– Place PCB on ready queue
– Preserves, pipes, open files, privileges, etc.
• Running a new program in Unix/Linux from C:
– fork() followed by exec()
– Creates a new process as clone of previous one
– First thing that clone does is to replace itself with new program
• What does it mean for exec to return?
Child Process
pid = fork();
if (pid == 0) main() {
exec(…); …
else }
wait(pid);
pid = 0
28
Dr. Suleiman Abu Kharmeh
Process Termination
• Typically, a process will call waitpid(pid, …) until
its child process(es) complete
– wait(NULL) waits for one of the children to finish.
– If you want to wait for all children to finish:
• while(wait(NULL) != -1);
• abort() can be used to immediately end the
current process (similar to exit system call)
– noreturn void abort(void);
• How can we immediately terminate a child
process?
– kill(pid, SIGKILL);
29
Dr. Suleiman Abu Kharmeh
Fork + Exec
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/wait.h>
31
Dr. Suleiman Abu Kharmeh
Cooperating Processes
• Concurrent Processes can be
– Independent processes
» cannot affect or be affected by the execution of another
process.
– Cooperating processes
» can affect or be affected by the execution of another process.
• Advantages of process cooperation:
» Information sharing
» Computation speedup
» Modularity
» Convenience(e.g. editing, printing, compiling)
• Concurrent execution requires
» process communication and process synchronization
int main()
{
const int SIZE = 4096; /* the size (in bytes) of shared memory object */
const char *name = "OS"; /* name of the shared memory object */
const char *message_0 = "Hello"; /* strings written to shared memory */
const char *message_1 = "World!";
return 0;
}
Dr. Suleiman Abu Kharmeh
#include <sys/mman.h>
IPC Consumer
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
int main() {
const int SIZE = 4096; /* the size (in bytes) of shared memory object */
const char *name = "OS"; /* name of the shared memory object */
int fd; /* shared memory file descriptor */
char *ptr; /* pointer to shared memory object */
return 0;
}
44
Dr. Suleiman Abu Kharmeh
Processes and Parallel Programming
• A process includes many things:
– an address space (all the code and data pages)
• protection boundary
– OS resources (e.g., open files) and accounting info
– hardware execution state (PC, SP, regs)
• Creating a new process is costly, because of all the data
structures that must be allocated/initialized
creating address space and copying old process
45
Dr. Suleiman Abu Kharmeh
Processes and Parallel Programing (2)
• Imagine a web server, which forks off copies of itself to
handle multiple simultaneous tasks
– or, imagine we have any parallel program on a
multiprocessor
• To execute these, we need to:
– create several processes that execute in parallel
– cause each to map to the same address space to share data
• see the shm_open
• have the OS schedule them in parallel
• multiprogramming or true parallel processing on an SMP:
shared-memory symmetric multiprocessing
• This is inefficient
– space: PCB, page tables –discussed later-, etc.
– time: creating OS structures, fork and copy addr space, etc.
46
Dr. Suleiman Abu Kharmeh
Can we do better?
• What’s similar in these processes?
– they all share the same code and data (address space)
– they all share the same privileges
– they all share the same resources (files, sockets, etc.)
• What’s different?
– each has its own hardware execution state
• PC, registers, stack pointer, and stack
• Key idea:
– a process (address space, etc.)
– a “thread of control” is a minimal execution state (PC, SP, …etc)
• this execution state is usually called a thread, or sometimes, a
lightweight process
47
Dr. Suleiman Abu Kharmeh
Thread Design Space
older
MS/DOS UNIXes
thread
Java Mach, NT,
Chorus,
many threads/process many threads/process Linux, …
one process many processes
48
Dr. Suleiman Abu Kharmeh
(old) Process address space
0xFFFFFFFF stack
(dynamic allocated mem)
SP
49
Dr. Suleiman Abu Kharmeh
(new) Address space with threads
thread 1 stack
SP (T1)
0xFFFFFFFF
thread 2 stack
SP SP (T2)
thread 3 stack
SP (T3)
address space
heap
(dynamic allocated mem)
static data
(data segment)
0x00000000 PC PC (T2)
code
PC (T1)
(text segment) PC (T3)
50
Dr. Suleiman Abu Kharmeh
Threads and processes
• Most modern OS’s (Mach, Chorus, NT, modern Unix)
therefore support two entities:
– the process, which defines the address space and general
process attributes (such as open files, etc.)
– the thread, which defines a sequential execution stream within
a process
• A thread is bound to a single process
– processes, however, can have multiple threads executing within
them
– sharing data between threads is cheap: all see same address
space
• Threads become the unit of scheduling
– processes are just containers in which threads execute
51
Dr. Suleiman Abu Kharmeh
Process/Thread Separation
• Separating threads and processes makes it easier to
support multi-threaded applications
– creating concurrency does not require creating new
processes
• Reasons to create programs with multiple threads:
– improving program structure (the Java argument)
• and potentially the performance
– handling concurrent events (e.g., web servers)
– building parallel programs (e.g., raytracer)
• So, multithreading is useful even on a uniprocessor
– even though only one thread can run at a time
52
Dr. Suleiman Abu Kharmeh
Types of Threads
• User-level threads
• Kernel-supported threads
• Hybrid approach implements both user-level
and kernel-supported threads.
• Examples
– Linux and Windows operating systems
56
Dr. Suleiman Abu Kharmeh
Kernel thread vs user-level threads
• Who is responsible for creating/managing threads?
– Two answers, in general:
• the OS (kernel threads)
– thread creation and management requires system calls
• the user-level process (user-level threads)
– a library linked into the program manages the threads
– thread creation and management requires procedure calls
• Why is user-level thread management possible?
– threads share the same address space
• therefore the thread manager doesn’t need to manipulate
address spaces
– threads only differ in hardware contexts (roughly)
• PC, SP, registers
• these can be manipulated by the user-level process itself!
57
Dr. Suleiman Abu Kharmeh
Processes vs. Threads
• Threads are better if:
– You need to create new ones quickly, on-the-fly
– You need to share lots of state/data
59
Dr. Suleiman Abu Kharmeh
Pthread
• POSIX.1 specifies a set of interfaces (functions, header
files) for threaded programming commonly known as
Pthreads. A single process can contain multiple threads,
all of which are executing the same program.
• It is a widely used Linux library.
• You must compile with the –pthread option to GCC.
• For more information check the man page:
– https://man7.org/linux/man-pages/man7/pthreads.7.html
60
Dr. Suleiman Abu Kharmeh
Pthread API
• pthread_create() – create a new thread (similar to fork for processes)
– Create independent threads each of which will execute a function
• pthread_exit() – exit the current thread (similar to exit or abort for
processes)
• pthread_join() – wait for another thread to exit (similar to wait/waitpid for
processes)
– Wait till threads are complete before main continues.
– If we do not “join”, then the main process might execute an “exit” which
will terminate the process and all threads before the threads have
completed.
61
Dr. Suleiman Abu Kharmeh
#include <stdio.h>
Pthread Basic Example:
#include <stdlib.h>
#include <pthread.h>
// check that iret1 == 0 and iret2 == 0, i.e. pthread_create function finished creating the threads successfully.
printf("Thread 1 returns: %d\n", iret1);
printf("Thread 2 returns: %d\n", iret2);
exit(0);
}
62
Dr. Suleiman Abu Kharmeh
Thread Oddities
• What happens if you fork() a process that has
multiple threads?
– You get a child process with exactly one thread
– Whichever thread called fork() survives
• What happens if you run exec() in a multi-
threaded process?
– All but one threads are killed
– exec() gets run normally (*unless exec fails)
63
Dr. Suleiman Abu Kharmeh
Summary
• You really want multiple threads per address space
• Kernel threads are much more efficient than processes,
but they’re still not cheap
– all operations require a kernel call and parameter validation
• User-level threads are:
– really fast/cheap
– great for common-case operations
• creation, synchronization, destruction
– can suffer in uncommon cases due to kernel obliviousness
• I/O
• preemption of a lock-holder
• Scheduler activations are an answer
– pretty subtle though (can map user threads over kernel
threads)
64
Dr. Suleiman Abu Kharmeh