8000 Describe GC for free-threaded build in the GC design doc (#1263) · python/devguide@fe3722d · GitHub
[go: up one dir, main page]

Skip to content

Commit fe3722d

Browse files
Describe GC for free-threaded build in the GC design doc (#1263)
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
1 parent bdec818 commit fe3722d

File tree

1 file changed

+134
-33
lines changed

1 file changed

+134
-33
lines changed

internals/garbage-collector.rst

Lines changed: 134 additions & 33 deletions
Original file line numberDiff line numberDiff line change
@@ -53,9 +53,29 @@ is needed to clean these reference cycles between objects once they become
5353
unreachable. This is the cyclic garbage collector, usually called just Garbage
5454
Collector (GC), even though reference counting is also a form of garbage collection.
5555

56+
Starting in version 3.13, CPython contains two GC implementations:
57+
58+
* The default build implementation relies on the :term:`global interpreter
59+
lock` for thread safety.
60+
* The free-threaded build implementation pauses other executing threads when
61+
performing a collection for thread safety.
62+
63+
Both implementations use the same basic algorithms, but operate on different
64+
data structures. The :ref:`gc-differences` section summarizes the
65+
differences between the two GC implementations.
66+
67+
5668
Memory layout and object structure
5769
==================================
5870

71+
The garbage collector requires additional fields in Python objects to support
72+
garbage collection. These extra fields are different in the default and the
73+
free-threaded builds.
74+
75+
76+
GC for the default build
77+
------------------------
78+
5979
Normally the C structure supporting a regular Python object looks as follows:
6080

6181
.. code-block:: none
@@ -107,6 +127,44 @@ isn't running at all!), and merging partitions, all with a small constant number
107127
With care, they also support iterating over a partition while objects are being added to - and
108128
removed from - it, which is frequently required while GC is running.
109129

130+
GC for the free-threaded build
131+
------------------------------
132+
133+
In the free-threaded build, Python objects contain a 1-byte field
134+
``ob_gc_bits`` that is used to track garbage collection related state. The
135+
field exists in all objects, including ones that do not support cyclic
136+
garbage collection. The field is used to identify objects that are tracked
137+
by the collector, ensure that finalizers are called only once per object,
138+
and, during garbage collection, differentiate reachable vs. unreachable objects.
139+
140+
.. code-block:: none
141+
142+
object -----> +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ \
143+
| ob_tid | |
144+
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ |
145+
| pad | ob_mutex | ob_gc_bits | ob_ref_local | |
146+
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ | PyObject_HEAD
147+
| ob_ref_shared | |
148+
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ |
149+
| *ob_type | |
150+
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ /
151+
| ... |
152+
153+
154+
Note that not all fields are to scale. ``pad`` is two bytes, ``ob_mutex`` and
155+
``ob_gc_bits`` are each one byte, and ``ob_ref_local`` is four bytes. The
156+
other fields, ``ob_tid``, ``ob_ref_shared``, and ``ob_type``, are all
157+
pointer-sized (i.e., eight bytes on a 64-bit platform).
158+
159+
160+
The garbage collector also temporarily repurposes the ``ob_tid`` (thread ID)
161+
and ``ob_ref_local`` (local reference count) fields for other purposes during
162+
collections.
163+
164+
165+
C APIs
166+
------
167+
110168
Specific APIs are offered to allocate, deallocate, initialize, track, and untrack
111169
objects with GC support. These APIs can be found in the `Garbage Collector C API
112170
documentation <https://docs.python.org/3.8/c-api/gcsupport.html>`_.
@@ -139,14 +197,11 @@ the interpreter create cycles everywhere. Some notable examples:
139197
* When representing data structures like graphs, it is very typical for them to
140198
have internal links to themselves.
141199

142-
To correctly dispose of these objects once they become unreachable, they need to be
143-
identified first. Inside the function that identifies cycles, two doubly linked
144-
lists are maintained: one list contains all objects to be scanned, and the other will
145-
contain all objects "tentatively" unreachable.
146-
147-
To understand how the algorithm works, let’s take the case of a circular linked list
148-
which has one link referenced by a variable ``A``, and one self-referencing object which
149-
is completely unreachable:
200+
To correctly dispose of these objects once they become unreachable, they need
201+
to be identified first. To understand how the algorithm works, let’s take
202+
the case of a circular linked list which has one link referenced by a
203+
variable ``A``, and one self-referencing object which is completely
204+
unreachable:
150205

151206
.. code-block:: python
152207
@@ -171,10 +226,17 @@ is completely unreachable:
171226
>>> gc.collect()
172227
2
173228
174-
When the GC starts, it has all the container objects it wants to scan
175-
on the first linked list. The objective is to move all the unreachable
176-
objects. Since most objects turn out to be reachable, it is much more
177-
efficient to move the unreachable as this involves fewer pointer updates.
229+
The GC starts with a set of candidate objects it wants to scan. In the
230+
default build, these "objects to scan" might be all container objects or a
231+
smaller subset (or "generation"). In the free-threaded build, the collector
232+
always operates scans all container objects.
233+
234+
The objective is to identify all the unreachable objects. The collector does
235+
this by identifying reachable objects; the remaining objects must be
236+
unreachable. The first step is to identify all of the "to scan" objects that
237+
are **directly** reachable from outside the set of candidate objects. These
238+
objects have a refcount larger than the number of incoming references from
239+
within the candidate set.
178240

179241
Every object that supports garbage collection will have an extra reference
180242
count field initialized to the reference count (``gc_ref`` in the figures)
@@ -273,23 +335,20 @@ Once the GC knows the list of unreachable objects, a very delicate process start
273335
with the objective of completely destroying these objects. Roughly, the process
274336
follows these steps in order:
275337

276-
1. Handle and clean weak references (if any). If an object that is in the unreachable
277-
set is going to be destroyed and has weak references with callbacks, these
278-
callbacks need to be honored. This process is **very** delicate as any error can
279-
cause objects that will be in an inconsistent state to be resurrected or reached
280-
by some Python functions invoked from the callbacks. In addition, weak references
281-
that also are part of the unreachable set (the object and its weak reference
282-
are in cycles that are unreachable) need to be cleaned
283-
immediately, without executing the callback. Otherwise it will be triggered later,
284-
when the ``tp_clear`` slot is called, causing havoc. Ignoring the weak reference's
285-
callback is fine because both the object and the weakref are going away, so it's
286-
legitimate to say the weak reference is going away first.
287-
288-
2. If an object has legacy finalizers (``tp_del`` slot) move them to the
338+
1. Handle and clear weak references (if any). Weak references to unreachable objects
339+
are set to ``None``. If the weak reference has an associated callback, the callback
340+
is enqueued to be called once the clearing of weak references is finished. We only
341+
invoke callbacks for weak references that are themselves reachable. If both the weak
342+
reference and the pointed-to object are unreachable we do not execute the callback.
343+
This is partly for historical reasons: the callback could resurrect an unreachable
344+
object and support for weak references predates support for object resurrection.
345+
Ignoring the weak reference's callback is fine because both the object and the weakref
346+
are going away, so it's legitimate to say the weak reference is going away first.
347+
2. If an object has legacy finalizers (``tp_del`` slot) move it to the
289348
``gc.garbage`` list.
290349
3. Call the finalizers (``tp_finalize`` slot) and mark the objects as already
291-
finalized to avoid calling them twice if they resurrect or if other finalizers
292-
have removed the object first.
350+
finalized to avoid calling finalizers twice if the objects are resurrected or
351+
if other finalizers have removed the object first.
293352
4. Deal with resurrected objects. If some objects have been resurrected, the GC
294353
finds the new subset of objects that are still unreachable by running the cycle
295354
detection algorithm again and continues with them.
@@ -300,12 +359,12 @@ follows these steps in order:
300359
Optimization: generations
301360
=========================
302361

303-
In order to limit the time each garbage collection takes, the GC uses a popular
304-
optimization: generations. The main idea behind this concept is the assumption that
305-
most objects have a very short lifespan and can thus be collected shortly after their
306-
creation. This has proven to be very close to the reality of many Python programs as
307-
many temporary objects are created and destroyed very fast. The older an object is
308-
the less likely it is that it will become unreachable.
362+
In order to limit the time each garbage collection takes, the GC
363+
implementation for the default build uses a popular optimization:
364+
generations. The main idea behind this concept is the assumption that most
365+
objects have a very short lifespan and can thus be collected soon after their
366+
creation. This has proven to be very close to the reality of many Python
367+
programs as many temporary objects are created and destroyed very quickly.
309368

310369
To take advantage of this fact, all container objects are segregated into
311370
three spaces/generations. Every new
@@ -317,6 +376,9 @@ the same object survives another GC round in this new generation (generation 1)
317376
it will be moved to the last generation (generation 2) where it will be
318377
surveyed the least often.
319378

379+
The GC implementation for the free-threaded build does not use multiple
380+
generations. Every collection operates on the entire heap.
381+
320382
In order to decide when to run, the collector keeps track of the number of object
321383
allocations and deallocations since the last collection. When the number of
322384
allocations minus the number of deallocations exceeds ``threshold_0``,
@@ -497,6 +559,45 @@ tracking status of the object.
497559
True
498560
499561
562+
.. _gc-differences:
563+
564+
Differences between GC implementations
565+
======================================
566+
567+
This section summarizes the differences between the GC implementation in the
568+
default build and the implementation in the free-threaded build.
569+
570+
The default build implementation makes extensive use of the ``PyGC_Head`` data
571+
structure, while the free-threaded build implementation does not use that
572+
data structure.
573+
574+
* The default build implementation stores all tracked objects in a doubly
575+
linked list using ``PyGC_Head``. The free-threaded build implementation
576+
instead relies on the embedded mimalloc memory allocator to scan the heap
577+
for tracked objects.
578+
* The default build implementation uses ``PyGC_Head`` for the unreachable
579+
object list. The free-threaded build implementation repurposes the
580+
``ob_tid`` field to store a unreachable objects linked list.
581+
* The default build implementation stores flags in the ``_gc_prev`` field of
582+
``PyGC_Head``. The free-threaded build implementation stores these flags
583+
in ``ob_gc_bits``.
584+
585+
586+
The default build implementation relies on the :term:`global interpreter lock`
587+
for thread safety. The free-threaded build implementation has two "stop the
588+
world" pauses, in which all other executing threads are temporarily paused so
589+
that the GC can safely access reference counts and object attributes.
590+
591+
The default build implementation is a generational collector. The
592+
free-threaded build is non-generational; each collection scans the entire
593+
heap.
594+
595+
* Keeping track of object generations is simple and inexpensive in the default
596+
build. The free-threaded build relies on mimalloc for finding tracked
597+
objects; identifying "young" objects without scanning the entire heap would
598+
be more difficult.
599+
600+
500601
.. admonition:: Document History
501602
:class: note
502603

0 commit comments

Comments
 (0)
0