Memory Model and Ordering in C++

C++

Uploaded by

2193maddy

Memory Model and Enforced Ordering

Both the compiler and the CPU can reorder memory accesses, meaning that they can happen in a
different order from what's specified in the code. These reorderings happen for optimization
purposes and we don't really care about them in single-threaded applications. However, they can
become problematic in multithreaded applications where multiple threads may need to read from
and write to shared memory simultaneously.

There are three different memory ordering models each offering different levels of restrictions and
rules. They determine the degree of freedom that the CPU and the compiler have to reorder the
operations in one thread, and to propagate changes between multiple threads. Here is a brief
description of each model:

Sequentially Consistent: This is the strictest and most straightforward ordering model. It
guarantees that all the threads will agree on the same order of events, meaning that there is
a single total order of all atomic operations that are tagged with
memory_order_seq_cst.
Relaxed: This ordering poses no constraints on the CPU and the compiler. The only
guarantee with this ordering is that the operations will be carried out atomically. This
ordering applies to operations tagged with memory_order_relaxed .
Acquire/Release: If the result of a release atomic store ( atomicVar.store(value,
std::memory_order_release) ) is read by an acquire atomic load
( atomicVar.load(std::memory_order_acquire) ) in another thread, the second thread is
guaranteed to see everything that happened before the atomic store. This ordering model is a
middle ground between the other two in terms of constraints and it has better performance
than Sequentially Consistent ordering.

Before diving deep into each of these models, let's go over some formal definitions and examples.

Sequenced-before
When evaluation A happens before evaluation B in the same thread, A is sequenced-before B.

// A is sequenced-before B
int num = 10; // A
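num = 30; // B

Synchronizes-with
An atomic store tagged with memory_order_release synchronizes-with an atomic load tagged with
memory_order_acquire in another thread when that load reads the value written by the store.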

// Thread 1
atomicFlag.store(true, std::memory_order_release);

// Thread 2
while (!atomicFlag.load(std::memory_order_acquire));

In this example, the atomic store in Thread 1 synchronizes-with the atomic load in Thread 2 once
the load reads true. The use case of this will become clear when we go over the Acquire/Release
ordering in more detail.

Inter-thread happens-before
Evaluation A inter-thread happens-before B in the following situations:

A synchronizes-with B

// Thread 1
atomicNum.store(20, std::memory_order_release); // A

// Thread 2
while (atomicNum.load(std::memory_order_acquire) != 20); // B

A synchronizes-with X and X is sequenced-before B

// Thread 1
atomicFlag.store(true, std::memory_order_release); // A

// Thread 2
while (!atomicFlag.load(std::memory_order_acquire)); // X
std::cout << "Hello"; // B

A is sequenced-before X, and X inter-thread happens-before B

// Thread 1
atomicNum.store(20, std::memory_order_relaxed); // A
atomicFlag.store(true, std::memory_order_release); // X

// Thread 2
while (!atomicFlag.load(std::memory_order_acquire));
int value = atomicNum.load(std::memory_order_relaxed); // B

A inter-thread happens-before X, and X inter-thread happens-before B (transitive property).

// Thread 1
atomicNum.store(20, std::memory_order_relaxed); // A
flag1.store(true, std::memory_order_release);

// Thread 2
while (!flag1.load(std::memory_order_acquire)); // X
flag2.store(true, std::memory_order_release);

// Thread 3
while (!flag2.load(std::memory_order_acquire));
int value = atomicNum.load(std::memory_order_relaxed); // B
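
The three-thread chain above can be turned into a small runnable sketch. The function name run_transitive_chain and the use of lambdas with std::thread are assumptions made for illustration; they are not part of the original example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the three-thread chain once and returns the
// value that Thread 3 reads from atomicNum.
int run_transitive_chain() {
    std::atomic<int> atomicNum{0};
    std::atomic<bool> flag1{false};
    std::atomic<bool> flag2{false};
    int value = -1;

    std::thread t1([&] {
        atomicNum.store(20, std::memory_order_relaxed);      // A
        flag1.store(true, std::memory_order_release);
    });
    std::thread t2([&] {
        while (!flag1.load(std::memory_order_acquire)) {}    // X
        flag2.store(true, std::memory_order_release);
    });
    std::thread t3([&] {
        while (!flag2.load(std::memory_order_acquire)) {}
        value = atomicNum.load(std::memory_order_relaxed);   // B
    });

    t1.join();
    t2.join();
    t3.join();
    return value;  // safe to read: join() orders t3's write before this line
}
```

Calling run_transitive_chain() from main should return 20 on every run, because A inter-thread happens-before B through the two release/acquire pairs, even though Thread 1 and Thread 3 never touch the same flag.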

Happens-before
We say A happens-before B if either A is sequenced-before B or A inter-thread happens-before B.

Sequencing Models
Sequentially Consistent Ordering
In the absence of any explicit ordering, memory_order_seq_cst is the default. Sequentially
consistent ordering is the most restrictive of all the options, and it's the most intuitive
one. It guarantees that all the threads will agree on a single order of the atomic operations,
and that an atomic store synchronizes-with a subsequent atomic load that reads the stored value.

There is only one downside to this approach: the restriction and the guarantee of ordering
come at the cost of performance. In multicore systems, different cores may need to perform
additional expensive synchronization instructions in order to guarantee sequential consistency
between each other. One such instruction is XCHG, which is discussed later on in this article.

Let's go over an example of what sequentially consistent ordering means:

// flag initially holds false

// Thread 1
atomicNumber.store(20, std::memory_order_seq_cst); // 1
flag.store(true, std::memory_order_seq_cst); // 2

// Thread 2
while (!flag.load(std::memory_order_seq_cst)); // 3
assert(atomicNumber.load() == 20); // 4

In this example, the assertion on line 4 will never fail. This is because the atomic store on line 2
in Thread 1 synchronizes-with the atomic load on line 3 in Thread 2, and in Thread 1 the
atomic store to atomicNumber is sequenced-before the atomic store to flag. Due to the transitive
property of the happens-before relationship, line 1 happens-before line 4 and therefore the
assertion will always succeed.

This example is very intuitive and is what we would expect to happen. This is exactly why
memory_order_seq_cst is the default mode for all the atomic operations. Unless we have a good
reason not to, we can usually stick with this memory ordering model. As we will see in a bit, the
assertion above may fail if we used memory_order_relaxed instead.
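
The "single order that all threads agree on" guarantee is easier to see with two independent writers and two readers. The sketch below is a well-known textbook pattern rather than something from this article; the names run_seq_cst_once, x, y and z are assumptions chosen for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: returns how many of the two readers saw both flags set.
int run_seq_cst_once() {
    std::atomic<bool> x{false};
    std::atomic<bool> y{false};
    std::atomic<int> z{0};

    std::thread writeX([&] { x.store(true, std::memory_order_seq_cst); });
    std::thread writeY([&] { y.store(true, std::memory_order_seq_cst); });

    std::thread readXThenY([&] {
        while (!x.load(std::memory_order_seq_cst)) {}
        if (y.load(std::memory_order_seq_cst)) ++z;
    });
    std::thread readYThenX([&] {
        while (!y.load(std::memory_order_seq_cst)) {}
        if (x.load(std::memory_order_seq_cst)) ++z;
    });

    writeX.join();
    writeY.join();
    readXThenY.join();
    readYThenX.join();
    return z.load();
}
```

With memory_order_seq_cst the result is 1 or 2 but never 0: in the single total order, one of the two stores comes first, so the reader that waits on the other store must also see it. If every operation used acquire/release instead, the two readers would be allowed to disagree about which store came first and 0 would become a permitted outcome.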

Relaxed Ordering
memory_order_relaxed sits at the opposite end of the spectrum. Little to no guarantee is made
about the ordering of the atomic operations used with this model. Another interesting thing with
relaxed ordering is that different threads may not agree on the same order of events, meaning that
two threads reading the same atomic variable at the same moment may see two different values.

Let's see an example of what's totally valid but unexpected when memory_order_relaxed is used:

// Shared
std::atomic<int> num(0);
std::atomic<bool> flag(false);

// Thread 1
num.store(10, std::memory_order_relaxed);
num.store(20, std::memory_order_relaxed);
num.store(30, std::memory_order_relaxed);
num.store(40, std::memory_order_relaxed);
num.store(50, std::memory_order_relaxed);
flag.store(true, std::memory_order_relaxed);

// Thread 2
while (!flag.load(std::memory_order_relaxed));
int value = num.load(std::memory_order_relaxed);

In Thread 2, value can be 0, 10, 20, 30, 40, or 50. All of these are valid outcomes because the
relaxed store to flag does not order the earlier stores to num, so Thread 2 may see flag set to
true before it sees any particular store to num (or none of them, which leaves the initial value 0).
Some rules still apply, even with relaxed ordering. For instance, let's assume that Thread 2 sees
the value 30 the first time it calls num.load(std::memory_order_relaxed). In this situation,
subsequent calls to num.load(std::memory_order_relaxed) in Thread 2 cannot see any of the earlier
atomic stores on num, so they will either still see 30, or they will see the later atomic stores
in Thread 1, which are 40 and 50.
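
The "cannot go back to an earlier store" rule can be checked with a short sketch. The helper name reads_never_go_backwards and the polling loop are assumptions for illustration; the writer performs the same five stores as the example above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: returns true if the reader never saw num move back
// to a value that was stored earlier than one it had already observed.
bool reads_never_go_backwards() {
    std::atomic<int> num{0};
    bool monotonic = true;

    std::thread writer([&] {
        for (int v = 10; v <= 50; v += 10) {
            num.store(v, std::memory_order_relaxed);
        }
    });
    std::thread reader([&] {
        int last = 0;
        while (last != 50) {
            int now = num.load(std::memory_order_relaxed);
            if (now < last) monotonic = false;  // would be an older store
            last = now;
        }
    });

    writer.join();
    reader.join();
    return monotonic;
}
```

The reader may skip values (for example it might only ever see 0, 30 and 50), but because all stores to a single atomic variable form one modification order, the sequence it observes never decreases here.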

The benefit of using relaxed ordering is that it has the best performance, but it's also the
trickiest to get right. Due to the looseness of the rules enforced by memory_order_relaxed, it
shouldn't be used often unless performance is of utmost importance. A common application of
memory_order_relaxed is incrementing a counter, because we don't really care about the order in
which the counter gets incremented, as long as each increment happens eventually and atomically.
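
A minimal sketch of such a counter is shown below. The function name count_with_relaxed and its parameters are hypothetical; only std::atomic, fetch_add and std::thread come from the standard library.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: several threads bump one shared counter using relaxed
// increments, and the final total is returned after all threads have joined.
int count_with_relaxed(int threads, int increments_per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;

    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // Atomic read-modify-write: no increment is lost, but no
                // ordering is imposed on surrounding memory accesses.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter.load(std::memory_order_relaxed);
}
```

For example, count_with_relaxed(4, 10000) returns 40000. The relaxed tag is enough because the only thing the program relies on is the atomicity of each increment; the final read happens after join(), which already orders it after every worker's increments.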

Acquire/Release Ordering
The Acquire/Release ordering model is a middle ground between the sequentially consistent and
relaxed models. It has better performance than sequentially consistent, and it provides useful
synchronization guarantees in multithreaded programs.

If an atomic load tagged with memory_order_acquire in Thread B reads a value that was stored
in the same atomic variable by an atomic store tagged with memory_order_release in Thread A,
then the store in Thread A synchronizes-with the load in Thread B.
This means that all atomic and non-atomic memory writes made before the release store are visible
after the acquire load. In terms of the definitions we had earlier, every memory access that
appears before the store tagged with memory_order_release happens-before the instructions that
appear after the load tagged with memory_order_acquire.

This synchronization is only established between the threads releasing and acquiring the same
atomic variable. Memory accesses may still propagate in a different order to other threads that
don't perform an acquire operation on the same atomic variable.

// Thread 1
nonAtomicNum = 20;
atomicNum.store(40, std::memory_order_relaxed);
flag.store(true, std::memory_order_release); // A

// Thread 2
while (!flag.load(std::memory_order_acquire)); // B
assert(nonAtomicNum == 20); // Never fires
assert(atomicNum.load(std::memory_order_relaxed) == 40); // Never fires

In the example above, Thread 1 synchronizes with Thread 2 on flag, which is an atomic boolean.
Therefore, all the writes before the store tagged with memory_order_release are going to be
visible in Thread 2 after the synchronization. As a result, both the asserts are always guaranteed
to succeed.
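
For readers who want to run it, the example above can be wrapped into a complete program fragment. The struct Observed and the function run_release_acquire are names invented for this sketch; the variable names follow the example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical result type: what Thread 2 observed after the acquire load.
struct Observed {
    int nonAtomic;
    int atomic;
};

// Hypothetical helper: runs the release/acquire handoff once.
Observed run_release_acquire() {
    int nonAtomicNum = 0;                 // plain, non-atomic variable
    std::atomic<int> atomicNum{0};
    std::atomic<bool> flag{false};
    Observed seen{-1, -1};

    std::thread t1([&] {
        nonAtomicNum = 20;
        atomicNum.store(40, std::memory_order_relaxed);
        flag.store(true, std::memory_order_release);       // A
    });
    std::thread t2([&] {
        while (!flag.load(std::memory_order_acquire)) {}   // B
        // Both writes made before A are guaranteed to be visible here.
        seen.nonAtomic = nonAtomicNum;
        seen.atomic = atomicNum.load(std::memory_order_relaxed);
    });

    t1.join();
    t2.join();
    return seen;
}
```

Every call should return {20, 40}. Reading nonAtomicNum in Thread 2 is not a data race precisely because the write in Thread 1 happens-before the read; replacing either tag on flag with memory_order_relaxed would remove that guarantee and make the plain read undefined behavior.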

Behind the Scenes

x86 vs ARM
In strongly ordered architectures like x86, release-acquire is the default and it is the minimum
amount of constraint imposed by the CPU on the ordering of the instructions. In these
architectures, no additional instruction is generated for memory_order_release and
memory_order_acquire compared with memory_order_relaxed.

atomicNum.store(10, std::memory_order_release); // A
atomicNum.store(10, std::memory_order_relaxed); // B

Lines A and B in the example above produce the exact same assembly code. The mov instruction
guarantees release-acquire ordering in the x86 architecture (compiled with x86 GCC 13.2):

mov DWORD PTR atomicNum[rip], 10

In weakly ordered systems like ARM, however, the CPU may reorder ordinary memory accesses
around a plain store. In these systems, additional instructions are required to enforce
release-acquire ordering. Line B in the example above (the relaxed store) generates the
following ARM assembly (compiled with ARM GCC 13.2.0):

; counter.store(10, std::memory_order_relaxed)
movs r0, #10

whereas Line A (the release store) generates the following:

; counter.store(10, std::memory_order_release)
dmb ish
movs r2, #10

dmb stands for Data Memory Barrier which, according to the Arm documentation, "ensures that all
explicit memory accesses that appear in program order before the DMB instruction are observed
before any explicit memory accesses that appear in program order after the DMB instruction". This
is exactly the behavior we would expect from memory_order_release. For operations that are
tagged with memory_order_acquire, the data memory barrier (dmb) is placed after the main
instruction, and with memory_order_seq_cst, the main instruction is wrapped with two data
memory barriers:

; counter.store(10, std::memory_order_seq_cst);
dmb ish
movs r2, #10
dmb ish

To ensure sequential consistency in x86 architectures, the compiler either uses mov + mfence or
mov + xchg. The mov by itself is not enough since it only has release semantics. xchg with a
memory operand has an implicit lock prefix, which makes it a full memory barrier. I got the
following assembly when compiling with x86 GCC 13.2:

; counter.store(10, std::memory_order_seq_cst)
mov eax, 10
xchg eax, DWORD PTR counter[rip]

These additional instructions explain the performance difference between different memory models
and why they work the way they do.
