
Golang Scheduler Deep Dive

Introduction
Golang introduces an innovative approach to concurrency: goroutines. A goroutine is a
lightweight thread of execution, managed not by the operating system but by the Go
runtime itself. This management style offers a significant departure from traditional
thread handling, providing both efficiency and reduced overhead.

Goroutines operate in user space, offering a more efficient and manageable
approach to concurrency. Goroutines are much more lightweight than traditional
threads: they start with a small stack (typically around 2 KB) that grows dynamically as
needed. This makes it practical to spawn thousands or even millions of goroutines
concurrently, which would be impractical with system threads.
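To make the "lightweight" claim concrete, here is a minimal sketch that spawns a large number of goroutines and waits for them all. The helper name `spawnAndCount` and the count of 100,000 are illustrative choices, not from the original text.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// spawnAndCount launches n goroutines, waits for all of them to finish,
// and returns how many ran. Creating this many OS threads would exhaust
// most systems; with goroutines it is routine.
func spawnAndCount(n int) int64 {
	var wg sync.WaitGroup
	var count int64
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			atomic.AddInt64(&count, 1)
		}()
	}
	wg.Wait()
	return count
}

func main() {
	fmt.Println("completed:", spawnAndCount(100000)) // completed: 100000
}
```

Each goroutine here does trivial work, so the scheduler multiplexes all of them over a handful of OS threads.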

OS-level threads reside in the kernel space, governed by the operating system's
scheduling and management mechanisms. Multiple OS threads can be executed
concurrently, even on a single CPU (core), by quickly switching between them (time-
sharing).

CPU cores, the underlying hardware layer, execute the threads or goroutines scheduled
by the respective managing entities.
Go Scheduler (M:N Scheduling)

The Go scheduler is an integral part of the Go runtime. It is responsible for orchestrating
the execution of goroutines and is where the magic of concurrency in Go really happens.
Here's a closer look at what the scheduler does:

• Managing Goroutines: The scheduler decides which goroutines should run and
when.
• Multiplexing on OS Threads: Go uses a model often referred to as M:N
multiplexing: the scheduler takes multiple goroutines and efficiently runs them
on a smaller number of OS threads. This is a key reason goroutines are more
lightweight than traditional threads.
• Pre-emptive Scheduling: Earlier versions of Go's scheduler used a cooperative
model; the latest scheduler includes a form of pre-emptive scheduling to handle
long-running goroutines, ensuring no single goroutine can monopolize the CPU
for extended periods.

Summary:
- Increased flexibility
- The number of G's is typically much greater than the number of M's
- The user-space scheduler multiplexes G's over the available M's
Early Go Scheduler Model

Global run queue

Scheduler maintained a single global run queue for all goroutines that were ready to
execute. When a goroutine was created or scheduled to run, it would be placed in this
global queue. When the scheduler needed to pick a goroutine to run, it would dequeue
one from this global runqueue.

• Issue: The global queue had to be accessed by all OS threads, which meant that
locking was necessary to ensure that only one thread could access the queue at
a time. This led to a significant amount of contention when many OS threads
were trying to access the global queue concurrently.

Distributed (Local) Run Queues

Improvements in the Go runtime scheduler solved many of these issues by introducing
multiple local run queues, removing the reliance on a single global runqueue and
reducing the need for locks.
- A local runqueue does not need to take a lock to get the next goroutine to run
- Each local runqueue has a fixed capacity of 256 goroutines
Later Go Scheduler Model
The Processor
The Go runtime introduced the concept of P (processor), which represents an abstract
logical CPU used to execute goroutines. Each P corresponds to a logical processor
within the Go runtime, and each P has its own local run queue.

The GOMAXPROCS variable limits the number of OS threads running user-level code
simultaneously. The number of processors (P's) is equal to GOMAXPROCS.
Different scenarios: how a processor gets a goroutine to execute

1. Check the local runqueue:

When the running goroutine completes and exits, the processor looks into its local
runqueue; if goroutines are available in the queue, it assigns one to run on the M.
2. What happens if the local runqueue is empty?

P0 will check the global runqueue and steal the fair share of goroutines it is supposed
to get, which is gqlen/GOMAXPROCS + 1, and push them into its local runqueue. Now P
has goroutines in its local runqueue and starts execution.
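The fair-share formula above can be sketched as a plain function. Note this is an illustrative model only: `fairShare` is a hypothetical helper name, not the runtime's actual code, and the real scheduler also caps the grab at half the local queue capacity.

```go
package main

import "fmt"

// fairShare mirrors the formula described above: when a P's local run
// queue is empty, it grabs gqlen/GOMAXPROCS + 1 goroutines from the
// global queue.
func fairShare(gqlen, gomaxprocs int) int {
	n := gqlen/gomaxprocs + 1
	if n > gqlen {
		n = gqlen // cannot take more than what is actually queued
	}
	return n
}

func main() {
	fmt.Println(fairShare(100, 4)) // 26
	fmt.Println(fairShare(3, 8))   // 1
}
```

The `+1` guarantees forward progress even when the global queue holds fewer goroutines than there are P's.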
3. What happens when both the local and global runqueues are empty?

P will check the netpoller. The netpoller allows Go to efficiently multiplex I/O operations
across multiple sockets or file descriptors, avoiding the need to block on each
individual I/O operation.

If any goroutine waiting on the netpoller is ready to run, P will pick it up and start
executing it.
4. What happens when the netpoller too is empty? (work stealing)

P0 will select a random P and steal half of its G's.
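The stealing step can be sketched with ordinary slices. This is purely illustrative: `stealHalf` is a made-up helper, and the real runtime operates on per-P ring buffers with atomic operations rather than slices.

```go
package main

import "fmt"

// stealHalf models the work-stealing step described above: an idle P
// picks a victim P and takes half of the victim's local run queue,
// leaving the rest behind.
func stealHalf(victim []int) (stolen, remaining []int) {
	n := len(victim) / 2
	return victim[:n], victim[n:]
}

func main() {
	victim := []int{1, 2, 3, 4, 5, 6}
	stolen, remaining := stealHalf(victim)
	fmt.Println(stolen, remaining) // [1 2 3] [4 5 6]
}
```

Taking half (rather than one goroutine at a time) amortizes the cost of stealing and quickly rebalances load across P's.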

Non-cooperative Pre-emption
Goroutines are Preempted by the Scheduler: Go uses a non-cooperative preemption
mechanism for goroutines. The runtime schedules goroutines to run on a single or
multiple threads, and it can interrupt a running goroutine at various points (e.g., during
function calls or system calls) to switch to another goroutine.

Goroutines Don't Explicitly Yield: Unlike cooperative systems, Go’s goroutines do not
need to explicitly yield control. The Go runtime does this automatically when needed.
This means that a long-running goroutine, such as one performing a computation
without interacting with I/O, can still be preempted by the scheduler to allow other
goroutines to execute.

Preemption Points: Go's runtime chooses appropriate preemption points, such as:

• After every function call.
• When a goroutine makes a blocking system call (e.g., reading from a network
socket).
• During goroutine synchronization events, like acquiring and releasing mutexes.

Time-Slice Preemption: Go has a form of preemption that is somewhat time-slice-based.
For example, the runtime will preempt goroutines when they exceed certain
thresholds (e.g., running for too long without performing I/O or yielding), and switch to
another goroutine.
Benefits of Non-Cooperative Preemption

1. Fairness: Ensures that no single task can monopolize the CPU, promoting
fairness in a multi-tasking system.
2. Responsiveness: Helps improve system responsiveness, particularly in
environments with high concurrency or real-time systems.
3. Simplicity for Developers: Developers don't have to worry about manually
yielding control to the scheduler or managing concurrency explicitly. Go’s
runtime handles this for them.
4. Improved Efficiency: The OS or runtime can better manage resource allocation,
preempting tasks that are blocking or taking too long, allowing others to run.
5. Avoid Starvation: Since processes or threads can be preempted without
needing to voluntarily yield, it prevents starvation (where a process never gets
the CPU time it needs) in systems with many competing tasks.

- Each goroutine is given a 10ms time slice, after which pre-emption is
attempted
- Pre-emption occurs by sending a user-space signal to the thread running the
goroutine that needs to be pre-empted
- The SIGURG signal is sent to the thread whose goroutine needs to be pre-empted
- sysmon is a daemon thread that runs without a P and handles many other
runtime duties
- sysmon issues pre-emption requests for long-running goroutines
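The effect of this signal-based preemption can be observed with a small demo. Everything here is illustrative: `hogThenCount` is a made-up helper, and exact tick counts vary by machine; the point is only that the counter goroutine makes progress even while a CPU-bound loop holds the single P.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// hogThenCount starts a background goroutine that periodically increments
// a counter, then runs a CPU-bound loop for ~100ms. With non-cooperative
// preemption (Go 1.14+), the scheduler interrupts the busy loop so the
// counter goroutine still gets CPU time.
func hogThenCount() int64 {
	var ticks int64
	done := make(chan struct{})
	go func() {
		for {
			select {
			case <-done:
				return
			default:
				atomic.AddInt64(&ticks, 1)
				time.Sleep(time.Millisecond)
			}
		}
	}()

	// CPU-bound busy loop for roughly 100ms.
	deadline := time.Now().Add(100 * time.Millisecond)
	for time.Now().Before(deadline) {
	}
	close(done)
	return atomic.LoadInt64(&ticks)
}

func main() {
	runtime.GOMAXPROCS(1) // single P: the hog and the counter must share it
	fmt.Println("counter advanced while hog ran:", hogThenCount() > 0)
}
```

Before Go 1.14, a loop like this with no preemption points could starve the counter goroutine indefinitely on a single P.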

Where does the pre-empted goroutine go?

The pre-empted goroutine gets put into the global runqueue.

When a goroutine spawns a new goroutine, where will this new goroutine go?

- The Go scheduler decides where to place the new goroutine. It will attempt to
place it in the local run queue of an available processor (P).
- If there are no free processors (all P's are busy), the runtime will either:
o Perform work stealing: idle processors (P's) with no goroutines may
steal goroutines from other busy processors' local run queues.
o Schedule on an existing M: if there is an available OS-level thread (M), it
may schedule the goroutine immediately, depending on how the Go
runtime's scheduling policies are set (e.g., the GOMAXPROCS setting).
Fairness & Locality
- FIFO is good for fairness and bad for locality
- LIFO is good for locality and bad for fairness

Let's look at a specific practice and see whether commonly used patterns can be optimized.

Channels are commonly used for synchronization or communication between
goroutines.

package main

import "fmt"

func sender(ch chan int) {
	for i := 0; i < 10; i++ {
		ch <- i
	}
	close(ch)
}

func receiver(ch chan int) {
	for val := range ch {
		process(val)
	}
}

func process(val int) {
	fmt.Println("Received:", val)
}

func main() {
	ch := make(chan int)
	go sender(ch)
	receiver(ch)
}

From the above example, as shown in the figure:

- Suppose the sender is waiting in the queue and the receiver is running: the receiver
will block on the channel.

- The sender is then scheduled and unblocks the receiver.

Let's analyze the performance impact:

- This sending and receiving is a prolonged process: if it happens on every exchange,
each goroutine must wait for the other to complete or be pre-empted.

- If they are long-running, they are pre-empted every ~10ms.

- The local runqueue has a fixed length of 256.

- If all slots hold running goroutines, in the worst case each of them must wait ~255 * 10ms.

- This is an issue of poor locality.

- Can we combine LIFO and FIFO to achieve better locality?


Improving Locality
Whenever a goroutine is spawned, it is put at the head of the local runqueue rather than
the tail.

The issue now: two goroutines that keep re-spawning each other can starve the
remaining goroutines in the queue.

The Go scheduler solves this problem with something known as time slice inheritance.

Time Slice Inheritance

- The spawned goroutine that is put at the head of the queue inherits the
remaining time slice of the goroutine that spawned it.
- This effectively gives a single 10ms time slice to the spawner/spawnee pair,
after which one of them will be pre-empted and put into the global runqueue.

This improved BenchmarkPingPongHog by ~99.88%.

Things look good!

But this could lead to starvation of goroutines in the global runqueue:
- Right now, our only chance of polling the global runqueue is when we look for a
goroutine to run, after verifying that the local runqueue is empty.
- If our local runqueues are always a source of work, we would never poll the global
runqueue.
- To address this corner case, the Go scheduler polls the global queue
occasionally:

if schedTick % 61 == 0 {
	getFromGlobal()
} else {
	doThingsAsBefore()
}
What happens if a thread blocks in a system call?
We have seen how goroutines effectively end up running on threads, but what happens if
the thread itself blocks in something like a syscall?

- When a goroutine blocks in a syscall, the P releases the thread.

- The P, now not attached to any thread, will acquire or create a new OS thread and
start servicing the remaining goroutines.

Handoff can be expensive

- Especially when you have to create a new thread.
- Some syscalls do not block for a prolonged period of time, and doing a
handoff for every syscall might be significantly expensive.
- To optimize this, the scheduler performs the handoff in a slightly more
intelligent manner:
- Do the handoff immediately only for syscalls that are expected to block, not for all.
- In other cases, let the P block as well.

If sysmon sees that a P has been in the executing-syscall state for too long, it initiates a
handoff.
What happens when the syscall returns?
- The scheduler tries to schedule this goroutine on its old P, the one it was on
before entering the syscall.
- If that is not possible, it tries to get an available idle P and schedule the goroutine
on that.
- If no idle P is available, the scheduler puts this goroutine on the global queue.
- The thread that was in the syscall is then parked.

Conclusion
The Go scheduler provides a highly efficient and abstracted way of managing
concurrency within Go programs. It enables lightweight goroutines to be scheduled and
executed across available OS threads, making Go programs scalable and performant
even with high levels of concurrency. Key features include:

- M:N scheduling model, where M is the number of OS threads and N is the
number of goroutines.
- P-local queues for efficient load balancing and work-stealing.
- Preemptive scheduling to ensure fair CPU time allocation among goroutines.
- Efficient handling of blocked goroutines, allowing other goroutines to continue
execution.

By abstracting away the underlying thread management, Go's scheduler makes
concurrent programming simpler and more efficient, especially for programs with many
short-lived tasks or a large number of concurrent operations.
