
Golang Scheduler Deep Dive

Introduction
Golang introduces an innovative approach to concurrency: goroutines. A goroutine is a
lightweight thread of execution, managed not by the operating system but by the Go
runtime itself. This management style offers a significant departure from traditional
thread handling, providing both efficiency and reduced overhead.

Goroutines operate in user space, offering a more efficient and manageable
approach to concurrency. Goroutines are much more lightweight than traditional
threads: they start with a small stack (typically around 2 KB) that grows dynamically as
needed. This makes it practical to spawn thousands or even millions of goroutines
concurrently, which would be impractical with system threads.
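To make the "lightweight" claim concrete, here is a minimal sketch that spawns a large number of goroutines and waits for them all. The helper name `spawnAndCount` and the count of 100,000 are illustrative choices, not from the original text.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// spawnAndCount launches n goroutines, waits for all of them to finish,
// and returns how many ran. Creating this many OS threads would exhaust
// most systems; with goroutines it is routine.
func spawnAndCount(n int) int64 {
	var wg sync.WaitGroup
	var count int64
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			atomic.AddInt64(&count, 1)
		}()
	}
	wg.Wait()
	return count
}

func main() {
	fmt.Println("completed:", spawnAndCount(100000)) // completed: 100000
}
```

Each goroutine here does trivial work, so the scheduler multiplexes all of them over a handful of OS threads.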

OS-level threads reside in the kernel space, governed by the operating system's
scheduling and management mechanisms. Multiple OS threads can be executed
concurrently, even on a single CPU (core), by quickly switching between them (time-
sharing).

CPU cores, the underlying hardware layer, execute the threads or goroutines scheduled
by the respective managing entities.
Go Scheduler (M:N Scheduling)

The Go scheduler is an integral part of the Go runtime. It is responsible for orchestrating
the execution of goroutines and is where the magic of concurrency in Go really happens.
Here's a closer look at what the scheduler does:

• Managing Goroutines: The scheduler decides which goroutines should run and
when.
• Multiplexing on OS Threads: Go uses a model often referred to as M:N
multiplexing: the scheduler takes multiple goroutines and efficiently runs them
on a smaller number of OS threads. This is a key reason goroutines are more
lightweight than traditional threads.
• Pre-emptive Scheduling: Earlier versions of Go's scheduler used a cooperative
model; the latest scheduler includes a form of pre-emptive scheduling to handle
long-running goroutines, ensuring no single goroutine can monopolize the CPU
for extended periods.

Summary:
- Increased flexibility
- The number of G's is typically much greater than the number of M's
- The user-space scheduler multiplexes G's over the available M's
Early Go Scheduler Model

Global run queue

Scheduler maintained a single global run queue for all goroutines that were ready to
execute. When a goroutine was created or scheduled to run, it would be placed in this
global queue. When the scheduler needed to pick a goroutine to run, it would dequeue
one from this global runqueue.

• Issue: The global queue had to be accessed by all OS threads, which meant that
locking was necessary to ensure that only one thread could access the queue at
a time. This led to a significant amount of contention when many OS threads
were trying to access the global queue concurrently.

Distributed (Local) Run Queues

Improvements in the Go runtime scheduler solved many of these issues by introducing
multiple local run queues, removing the reliance on a single global runqueue and
reducing the need for locks.
- A local runqueue does not need to take a lock to get the next goroutine to run
- Each local runqueue has a fixed capacity of 256 goroutines
Later Go Scheduler Model
The Processor
The Go runtime introduced the concept of P (processor), which represents an abstract
logical CPU used to execute goroutines. Each P corresponds to a logical processor
within the Go runtime, and each P has its own local run queue.

The GOMAXPROCS variable limits the number of OS threads running user-level code
simultaneously. The number of processors (P's) is equal to GOMAXPROCS.
Different scenarios: how a processor gets a goroutine to execute

1. Check the local runqueue:

When the running goroutine completes and exits, the processor looks into its local
runqueue; if goroutines are available in the queue, it assigns one to run on the M.
2. What happens if the local runqueue is empty?

P0 will check the global runqueue and steal the fair share of goroutines it is supposed
to get, which is gqlen/GOMAXPROCS + 1, and push them into its local runqueue. Now P
has goroutines in its local runqueue and starts execution.
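The fair-share formula above can be sketched as a plain function. Note this is an illustrative model only: `fairShare` is a hypothetical helper name, not the runtime's actual code, and the real scheduler also caps the grab at half the local queue capacity.

```go
package main

import "fmt"

// fairShare mirrors the formula described above: when a P's local run
// queue is empty, it grabs gqlen/GOMAXPROCS + 1 goroutines from the
// global queue.
func fairShare(gqlen, gomaxprocs int) int {
	n := gqlen/gomaxprocs + 1
	if n > gqlen {
		n = gqlen // cannot take more than what is actually queued
	}
	return n
}

func main() {
	fmt.Println(fairShare(100, 4)) // 26
	fmt.Println(fairShare(3, 8))   // 1
}
```

The `+1` guarantees forward progress even when the global queue holds fewer goroutines than there are P's.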
3. What happens when both the local and global runqueues are empty?

P will check the netpoller. The netpoller allows Go to efficiently multiplex I/O operations
across multiple sockets or file descriptors, avoiding the need to block on each
individual I/O operation.

If any goroutine waiting on the netpoller is ready to run, P will pick it up and start
executing it.
4. What happens when the netpoller too is empty? (work stealing)

P0 will select a random P and steal half of its G's.
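The stealing step can be sketched with ordinary slices. This is purely illustrative: `stealHalf` is a made-up helper, and the real runtime operates on per-P ring buffers with atomic operations rather than slices.

```go
package main

import "fmt"

// stealHalf models the work-stealing step described above: an idle P
// picks a victim P and takes half of the victim's local run queue,
// leaving the rest behind.
func stealHalf(victim []int) (stolen, remaining []int) {
	n := len(victim) / 2
	return victim[:n], victim[n:]
}

func main() {
	victim := []int{1, 2, 3, 4, 5, 6}
	stolen, remaining := stealHalf(victim)
	fmt.Println(stolen, remaining) // [1 2 3] [4 5 6]
}
```

Taking half (rather than one goroutine at a time) amortizes the cost of stealing and quickly rebalances load across P's.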

Non-cooperative Pre-emption
Goroutines are Preempted by the Scheduler: Go uses a non-cooperative preemption
mechanism for goroutines. The runtime schedules goroutines to run on a single or
multiple threads, and it can interrupt a running goroutine at various points (e.g., during
function calls or system calls) to switch to another goroutine.

Goroutines Don't Explicitly Yield: Unlike cooperative systems, Go’s goroutines do not
need to explicitly yield control. The Go runtime does this automatically when needed.
This means that a long-running goroutine, such as one performing a computation
without interacting with I/O, can still be preempted by the scheduler to allow other
goroutines to execute.

Preemption Points: Go's runtime chooses appropriate preemption points, such as:

• After every function call.
• When a goroutine makes a blocking system call (e.g., reading from a network
socket).
• During goroutine synchronization events, like acquiring and releasing mutexes.

Time-Slice Preemption: Go has a form of preemption that is somewhat time-slice-based.
For example, the runtime will preempt goroutines when they exceed certain
thresholds (e.g., running for too long without performing I/O or yielding), and switch to
another goroutine.
Benefits of Non-Cooperative Preemption

1. Fairness: Ensures that no single task can monopolize the CPU, promoting
fairness in a multi-tasking system.
2. Responsiveness: Helps improve system responsiveness, particularly in
environments with high concurrency or real-time systems.
3. Simplicity for Developers: Developers don't have to worry about manually
yielding control to the scheduler or managing concurrency explicitly. Go’s
runtime handles this for them.
4. Improved Efficiency: The OS or runtime can better manage resource allocation,
preempting tasks that are blocking or taking too long, allowing others to run.
5. Avoid Starvation: Since processes or threads can be preempted without
needing to voluntarily yield, it prevents starvation (where a process never gets
the CPU time it needs) in systems with many competing tasks.

- Each goroutine is given a 10ms time slice, after which pre-emption is
attempted
- Pre-emption occurs by sending a user-space signal to the thread running the
goroutine that needs to be pre-empted
- The SIGURG signal is sent to the thread whose goroutine needs to be pre-empted
- sysmon is a daemon thread that runs without a P and handles many other
runtime duties
- sysmon issues pre-emption requests for long-running goroutines
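The effect of this signal-based preemption can be observed with a small demo. Everything here is illustrative: `hogThenCount` is a made-up helper, and exact tick counts vary by machine; the point is only that the counter goroutine makes progress even while a CPU-bound loop holds the single P.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// hogThenCount starts a background goroutine that periodically increments
// a counter, then runs a CPU-bound loop for ~100ms. With non-cooperative
// preemption (Go 1.14+), the scheduler interrupts the busy loop so the
// counter goroutine still gets CPU time.
func hogThenCount() int64 {
	var ticks int64
	done := make(chan struct{})
	go func() {
		for {
			select {
			case <-done:
				return
			default:
				atomic.AddInt64(&ticks, 1)
				time.Sleep(time.Millisecond)
			}
		}
	}()

	// CPU-bound busy loop for roughly 100ms.
	deadline := time.Now().Add(100 * time.Millisecond)
	for time.Now().Before(deadline) {
	}
	close(done)
	return atomic.LoadInt64(&ticks)
}

func main() {
	runtime.GOMAXPROCS(1) // single P: the hog and the counter must share it
	fmt.Println("counter advanced while hog ran:", hogThenCount() > 0)
}
```

Before Go 1.14, a loop like this with no preemption points could starve the counter goroutine indefinitely on a single P.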

Where does the pre-empted goroutine go?

The pre-empted goroutine gets put into the global runqueue.

When a goroutine spawns a new goroutine, where will this new goroutine go?

- The Go scheduler decides where to place the new goroutine. It will attempt to
place it in the local run queue of an available processor (P).
- If there are no free processors (all P's are busy), the runtime will either:
o Perform work stealing: idle processors (P's) with no goroutines may
steal goroutines from other busy processors' local run queues.
o Schedule on an existing M: if there is an available OS-level thread (M), it
may schedule the goroutine immediately, depending on how the Go
runtime's scheduling policies are set (e.g., the GOMAXPROCS setting).
Fairness & Locality
- FIFO is good for fairness and bad for locality
- LIFO is good for locality and bad for fairness

Let's look at a specific practice and see whether commonly used patterns can be optimized.

Channels are commonly used for synchronization or communication between
goroutines.

package main

import "fmt"

func sender(ch chan int) {
	for i := 0; i < 10; i++ {
		ch <- i
	}
	close(ch)
}

func receiver(ch chan int) {
	for val := range ch {
		process(val)
	}
}

func process(val int) {
	fmt.Println("Received:", val)
}

func main() {
	ch := make(chan int)
	go sender(ch)
	receiver(ch)
}

From the above example, as shown in the figure:

- Suppose the sender is waiting in the queue and the receiver is running: the receiver
will block on the channel.

- The sender is then scheduled and unblocks the receiver.

Let's analyze the performance impact:

- This sending and receiving is a prolonged process: if it happens on every exchange,
each goroutine must wait for the other to complete or be pre-empted.

- If they are long-running, they are pre-empted every ~10ms.

- The local runqueue has a fixed length of 256.

- If all slots hold running goroutines, in the worst case each of them must wait ~255 * 10ms.

- This is an issue of poor locality.

- Can we combine LIFO and FIFO to achieve better locality?


Improving Locality
Whenever a goroutine is spawned, it is put at the head of the local runqueue rather than
the tail.

The issue now: two goroutines that keep re-spawning each other can starve the
remaining goroutines in the queue.

The Go scheduler solves this problem with something known as time slice inheritance.

Time Slice Inheritance

- The spawned goroutine that is put at the head of the queue inherits the
remaining time slice of the goroutine that spawned it.
- This effectively gives a single 10ms time slice to the spawner/spawnee pair,
after which one of them will be pre-empted and put into the global runqueue.

This improved BenchmarkPingPongHog by ~99.88%.

Things look good!

But this could lead to starvation of goroutines in the global runqueue:
- Right now, our only chance of polling the global runqueue is when we look for a
goroutine to run, after verifying that the local runqueue is empty.
- If our local runqueues are always a source of work, we would never poll the global
runqueue.
- To address this corner case, the Go scheduler polls the global queue
occasionally:

if schedTick % 61 == 0 {
	getFromGlobal()
} else {
	doThingsAsBefore()
}
What happens if a thread blocks in a system call?
We have seen how goroutines effectively end up running on threads, but what happens if
the thread itself blocks in something like a syscall?

- When a goroutine blocks in a syscall, the P releases the thread.

- The P, now not attached to any thread, will acquire or create a new OS thread and
start servicing the remaining goroutines.

Handoff can be expensive

- Especially when you have to create a new thread.
- Some syscalls do not block for a prolonged period of time, and doing a
handoff for every syscall might be significantly expensive.
- To optimize this, the scheduler performs the handoff in a slightly more
intelligent manner:
- Do the handoff immediately only for syscalls that are expected to block, not for all.
- In other cases, let the P block as well.

If sysmon sees that a P has been in the executing-syscall state for too long, it initiates a
handoff.
What happens when the syscall returns?
- The scheduler tries to schedule this goroutine on its old P, the one it was on
before entering the syscall.
- If that is not possible, it tries to get an available idle P and schedule the goroutine
on that.
- If no idle P is available, the scheduler puts this goroutine on the global queue.
- The thread that was in the syscall is then parked.

Conclusion
The Go scheduler provides a highly efficient and abstracted way of managing
concurrency within Go programs. It enables lightweight goroutines to be scheduled and
executed across available OS threads, making Go programs scalable and performant
even with high levels of concurrency. Key features include:

- M:N scheduling model, where M is the number of OS threads and N is the
number of goroutines.
- P-local queues for efficient load balancing and work-stealing.
- Preemptive scheduling to ensure fair CPU time allocation among goroutines.
- Efficient handling of blocked goroutines, allowing other goroutines to continue
execution.

By abstracting away the underlying thread management, Go's scheduler makes
concurrent programming simpler and more efficient, especially for programs with many
short-lived tasks or a large number of concurrent operations.
