@@ -108,7 +108,7 @@ As is explained later in the
108
108
[ Optimization: reusing fields to save memory] ( #optimization-reusing-fields-to-save-memory )
109
109
section, these two extra fields are normally used to keep doubly linked lists of all the
110
110
objects tracked by the garbage collector (these lists are the GC generations, more on
111
- that in the [ Optimization: generations ] ( #Optimization-generations ) section), but
111
+ that in the [ Optimization: incremental collection ] ( #Optimization-incremental-collection ) section), but
112
112
they are also reused to fulfill other purposes when the full doubly linked list
113
113
structure is not needed as a memory optimization.
114
114
@@ -351,38 +351,90 @@ follows these steps in order:
351
351
the reference counts fall to 0, triggering the destruction of all unreachable
352
352
objects.
353
353
354
- Optimization: generations
355
- =========================
354
+ Optimization: incremental collection
355
+ ====================================
356
356
357
- In order to limit the time each garbage collection takes, the GC
358
- implementation for the default build uses a popular optimization:
359
- generations. The main idea behind this concept is the assumption that most
360
- objects have a very short lifespan and can thus be collected soon after their
361
- creation. This has proven to be very close to the reality of many Python
357
+ In order to bound the length of each garbage collection pause, the GC implementation
358
+ for the default build uses incremental collection with two generations.
359
+
360
+ Generational garbage collection takes advantage of what is known as the weak
361
+ generational hypothesis: Most objects die young.
362
+ This has proven to be very close to the reality of many Python
362
363
programs as many temporary objects are created and destroyed very quickly.
363
364
364
365
To take advantage of this fact, all container objects are segregated into
365
- three spaces/generations. Every new
366
- object starts in the first generation (generation 0). The previous algorithm is
367
- executed only over the objects of a particular generation and if an object
368
- survives a collection of its generation it will be moved to the next one
369
- (generation 1), where it will be surveyed for collection less often. If
370
- the same object survives another GC round in this new generation (generation 1)
371
- it will be moved to the last generation (generation 2) where it will be
372
- surveyed the least often.
373
-
374
- The GC implementation for the free-threaded build does not use multiple
375
- generations. Every collection operates on the entire heap.
366
+ two generations: young and old. Every new object starts in the young generation.
367
+ Each garbage collection scans the entire young generation and part of the old generation.
368
+
369
+ The time taken to scan the young generation can be controlled by controlling its
370
+ size, but the size of the old generation cannot be controlled.
371
+ In order to keep pause times down, scanning of the old generation of the heap
372
+ occurs in increments.
373
+
374
+ To keep track of what has been scanned, the old generation contains two lists:
375
+
376
+ * Those objects that have not yet been scanned, referred to as the ` pending ` list.
377
+ * Those objects that have been scanned, referred to as the ` visited ` list.
378
+
379
+ To detect and collect all unreachable objects in the heap, the garbage collector
380
+ must scan the whole heap. This whole heap scan is called a full scavenge.
381
+
382
+ Increments
383
+ ----------
384
+
385
+ Each full scavenge is performed in a series of increments.
386
+ For each full scavenge, the combined increments will cover the whole heap.
387
+
388
+ Each increment is made up of:
389
+
390
+ * The young generation
391
+ * The old generation's least recently scanned objects
392
+ * All objects reachable from those objects that have not yet been scanned this full scavenge
393
+
394
+ The surviving objects (those that are not collected) are moved to the back of the
395
+ ` visited ` list in the old generation.
396
+
397
+ When a full scavenge starts, no objects in the heap are considered to have been scanned,
398
+ so all objects in the old generation must be in the ` pending ` space.
399
+ When all objects in the heap have been scanned a cycle ends, and all objects are moved
400
+ to the ` pending ` list again. To avoid having to traverse the entire list, which list is
401
+ ` pending ` and which is ` visited ` is determined by a field in the ` GCState ` struct.
402
+ The ` visited ` and ` pending ` lists can be swapped by toggling this bit.
403
+
404
+ Correctness
405
+ -----------
406
+
407
+ The [ algorithm for identifying cycles] ( #Identifying-reference-cycles ) will find all
408
+ unreachable cycles in a list of objects, but will not find any cycles that are
409
+ even partly outside of that list.
410
+ Therefore, to be guaranteed that a full scavenge will find all unreachable cycles,
411
+ each cycle must be fully contained within a single increment.
412
+
413
+ To make sure that no partial cycles are included in the increment we perform a
414
+ [ transitive closure] ( https://en.wikipedia.org/wiki/Transitive_closure )
415
+ over reachable, unscanned objects from the initial increment.
416
+ Since the transitive closure of objects reachable from an object must be a (non-strict)
417
+ superset of any unreachable cycle including that object, we are guaranteed that a
418
+ transitive closure cannot contain any partial cycles.
419
+ We can exclude scanned objects, as they must have been reachable when scanned.
420
+ If a scanned object becomes part of an unreachable cycle after being scanned, it will
421
+ not be collected this at this time, but it will be collected in the next full scavenge.
422
+
423
+ > [ !NOTE]
424
+ > The GC implementation for the free-threaded build does not use incremental collection.
425
+ > Every collection operates on the entire heap.
376
426
377
427
In order to decide when to run, the collector keeps track of the number of object
378
428
allocations and deallocations since the last collection. When the number of
379
- allocations minus the number of deallocations exceeds ` threshold_0 ` ,
380
- collection starts. Initially only generation 0 is examined. If generation 0 has
381
- been examined more than ` threshold_1 ` times since generation 1 has been
382
- examined, then generation 1 is examined as well. With generation 2,
383
- things are a bit more complicated; see
384
- [ Collecting the oldest generation] ( #Collecting-the-oldest-generation ) for
385
- more information. These thresholds can be examined using the
429
+ allocations minus the number of deallocations exceeds ` threshold0 ` ,
430
+ collection starts. ` threshold1 ` determines the fraction of the old
431
+ collection that is included in the increment.
432
+ The fraction is inversely proportional to ` threshold1 ` ,
433
+ as historically a larger ` threshold1 ` meant that old generation
434
+ collections were performed less frequently.
435
+ ` threshold2 ` is ignored.
436
+
437
+ These thresholds can be examined using the
386
438
[ ` gc.get_threshold() ` ] ( https://docs.python.org/3/library/gc.html#gc.get_threshold )
387
439
function:
388
440
@@ -402,8 +454,8 @@ specifically in a generation by calling `gc.collect(generation=NUM)`.
402
454
... pass
403
455
...
404
456
405
- # Move everything to the last generation so it's easier to inspect
406
- # the younger generations .
457
+ # Move everything to the old generation so it's easier to inspect
458
+ # the young generation .
407
459
408
460
>>> gc.collect()
409
461
0
@@ -413,40 +465,24 @@ specifically in a generation by calling `gc.collect(generation=NUM)`.
413
465
>>> x = MyObj()
414
466
>>> x.self = x
415
467
416
- # Initially the object is in the youngest generation.
468
+ # Initially the object is in the young generation.
417
469
418
470
>>> gc.get_objects(generation=0)
419
471
[..., <__main__.MyObj object at 0x7fbcc12a3400>, ...]
420
472
421
473
# After a collection of the youngest generation the object
422
- # moves to the next generation.
474
+ # moves to the old generation.
423
475
424
476
>>> gc.collect(generation=0)
425
477
0
426
478
>>> gc.get_objects(generation=0)
427
479
[]
428
480
>>> gc.get_objects(generation=1)
481
+ []
482
+ >>> gc.get_objects(generation=2)
429
483
[..., <__main__.MyObj object at 0x7fbcc12a3400>, ...]
430
484
```
431
485
432
- Collecting the oldest generation
433
- --------------------------------
434
-
435
- In addition to the various configurable thresholds, the GC only triggers a full
436
- collection of the oldest generation if the ratio ` long_lived_pending / long_lived_total `
437
- is above a given value (hardwired to 25%). The reason is that, while "non-full"
438
- collections (that is, collections of the young and middle generations) will always
439
- examine roughly the same number of objects (determined by the aforementioned
440
- thresholds) the cost of a full collection is proportional to the total
441
- number of long-lived objects, which is virtually unbounded. Indeed, it has
442
- been remarked that doing a full collection every <constant number > of object
443
- creations entails a dramatic performance degradation in workloads which consist
444
- of creating and storing lots of long-lived objects (for example, building a large list
445
- of GC-tracked objects would show quadratic performance, instead of linear as
446
- expected). Using the above ratio, instead, yields amortized linear performance
447
- in the total number of objects (the effect of which can be summarized thusly:
448
- "each full garbage collection is more and more costly as the number of objects
449
- grows, but we do fewer and fewer of them").
450
486
451
487
Optimization: reusing fields to save memory
452
488
===========================================
@@ -588,9 +624,9 @@ heap.
588
624
be more difficult.
589
625
590
626
591
- > [ !NOTE]
627
+ > [ !NOTE]
592
628
> ** Document history**
593
- >
629
+ >
594
630
> Pablo Galindo Salgado - Original author
595
- >
631
+ >
596
632
> Irit Katriel - Convert to Markdown
0 commit comments