From 10ee4fd8bdbe620cc21c59e58b29ce1960714c76 Mon Sep 17 00:00:00 2001
From: Pablo Galindo
Date: Mon, 20 Jan 2020 18:08:23 +0000
Subject: [PATCH 01/28] Add section about the design of CPython's garbage collector

---
 appendix.rst                           |   3 +-
 garbage_collector.rst                  | 386 +++++++++++++++++++++++++
 images/python-cyclic-gc-1-new-page.png | Bin 0 -> 4415 bytes
 images/python-cyclic-gc-2-new-page.png | Bin 0 -> 4337 bytes
 images/python-cyclic-gc-3-new-page.png | Bin 0 -> 4876 bytes
 images/python-cyclic-gc-4-new-page.png | Bin 0 -> 4863 bytes
 images/python-cyclic-gc-5-new-page.png | Bin 0 -> 5712 bytes
 index.rst                              |   2 +
 8 files changed, 390 insertions(+), 1 deletion(-)
 create mode 100644 garbage_collector.rst
 create mode 100644 images/python-cyclic-gc-1-new-page.png
 create mode 100644 images/python-cyclic-gc-2-new-page.png
 create mode 100644 images/python-cyclic-gc-3-new-page.png
 create mode 100644 images/python-cyclic-gc-4-new-page.png
 create mode 100644 images/python-cyclic-gc-5-new-page.png

diff --git a/appendix.rst b/appendix.rst
index f2db2ac2e1..793f615c83 100644
--- a/appendix.rst
+++ b/appendix.rst
@@ -51,6 +51,7 @@ Language development in depth
 * :doc:`exploring`
 * :doc:`grammar`
 * :doc:`compiler`
+* :doc:`garbage_collector`
 * :doc:`stdlibchanges`
 * :doc:`langchanges`
 * :doc:`porting`
@@ -64,4 +65,4 @@ Testing and continuous integration
 * :doc:`buildbots`
 * :doc:`buildworker`
 * :doc:`coverity`
-
\ No newline at end of file
+
+
diff --git a/garbage_collector.rst b/garbage_collector.rst
new file mode 100644
index 0000000000..ed5bdc3465
--- /dev/null
+++ b/garbage_collector.rst
@@ -0,0 +1,386 @@
+.. _gc:
+
+Design of CPython's Garbage Collector
+=====================================
+
+.. highlight:: none
+
+Abstract
+--------
+
+The main garbage collection mechanism of CPython is reference counting. The basic idea is
+that CPython counts how many different places there are that have a reference to an
+object.
+Such a place could be another object, or a global (or static) C variable, or
+a local variable in some C function. When an object’s reference count becomes zero,
+the object is deallocated. If it contains references to other objects, their
+reference count is decremented. Those other objects may be deallocated in turn, if
+this decrement makes their reference count become zero, and so on. The reference
+count field can be examined using the ``sys.getrefcount`` function (notice that the
+value returned by this function is always 1 more, as the function also holds a reference
+to the object while it is being called):
+
+.. code-block:: python
+
+    >>> import sys
+    >>> x = object()
+    >>> sys.getrefcount(x)
+    2
+    >>> y = x
+    >>> sys.getrefcount(x)
+    3
+    >>> del y
+    >>> sys.getrefcount(x)
+    2
+
+The main problem with the reference counting scheme is that it does not
+handle reference cycles. For instance, consider this code:
+
+.. code-block:: python
+
+    >>> container = []
+    >>> container.append(container)
+
+    >>> del container
+
+In this example, ``container`` holds a reference to itself, so even when we remove
+our reference to it (the variable ``container``) the reference count never falls to 0
+because it still has its own internal reference, and therefore it will never be
+cleaned up by simple reference counting alone. For this reason some additional machinery
+is needed to clean up these reference cycles between objects once they become
+unreachable. We normally refer to this additional machinery as the Garbage Collector,
+but technically reference counting is also a form of garbage collection.
+
+Memory layout and object structure
+----------------------------------
+
+Normally the C structure supporting a regular Python object looks as follows:
+
+.. code-block:: none
+
+    object -----> +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ \
+                  |                    ob_refcnt                  | |
+                  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ | PyObject_HEAD
+                  |                    *ob_type                   | |
+                  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ /
+                  |                      ...                      |
+
+
+In order to support the garbage collector, the memory layout of objects is altered
+to accommodate extra information **before** the normal layout:
+
+.. code-block:: none
+
+                  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ \
+                  |                    *_gc_next                  | |
+                  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ | PyGC_Head
+                  |                    *_gc_prev                  | |
+    object -----> +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ /
+                  |                    ob_refcnt                  | \
+                  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ | PyObject_HEAD
+                  |                    *ob_type                   | |
+                  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ /
+                  |                      ...                      |
+
+
+In this way the object can be treated as a normal Python object, and when the extra
+information associated with the GC is needed, the preceding fields can be accessed by a
+simple type cast from the original object: :code:`((PyGC_Head *)(the_object)-1)`.
+
+As is explained later in the `Optimization: reusing fields to save memory`_ section,
+these two extra fields are normally used to keep a doubly linked list of all the
+objects tracked by the garbage collector (these lists are the GC generations, more on
+that in the `Optimization: generations`_ section), but they are also
+reused to fulfill other purposes when the full doubly linked list structure is not
+needed, as a memory optimization.
+
+Specific APIs are offered to allocate, deallocate, initialize, track and untrack
+objects with GC support. These APIs can be found in the `Garbage Collector C API
+documentation `_.
+
+Apart from this object structure, the type object of objects supporting garbage
+collection must include the ``Py_TPFLAGS_HAVE_GC`` flag in its ``tp_flags`` slot and
+provide an implementation of the ``tp_traverse`` handler.
+Unless it can be proven
+that the objects cannot form reference cycles with only objects of its type, or unless the
+type is immutable, a ``tp_clear`` implementation must also be provided.
+
+
+Identifying reference cycles
+----------------------------
+
+The algorithm that CPython uses to detect those reference cycles is
+implemented in the ``gc`` module. The garbage collector **only focuses**
+on cleaning container objects (i.e. objects that can contain a reference
+to one or more objects). These can be arrays, dictionaries, lists, custom
+class instances, classes in extension modules, etc. One could think that
+cycles are uncommon, but the truth is that many internal references needed by
+the interpreter create cycles everywhere. Some notable examples:
+
+    * Exceptions contain traceback objects that contain a list of frames that
+      contain the exception itself.
+    * Instances have references to their class and the class to the module, which
+      contains references to everything that is inside (and maybe other modules)
+      and this can lead back to the original instance.
+    * When representing data structures like graphs, it is very typical for them to
+      have internal links to themselves.
+
+To correctly dispose of these objects once they become unreachable, they need to
+be identified first. This is done in the `deduce_unreachable() `__
+function. Inside this component, two doubly linked lists are maintained: one list contains
+all objects to be scanned, and the other will contain all objects "tentatively" unreachable.
+
+To understand how the algorithm works, let’s take the case of a circular linked list which has
+one link referenced by a variable ``A``, and one self-referencing object which is completely
+unreachable:
+
+.. code-block:: python
+
+    import gc
+
+    class Link:
+        def __init__(self, next_link=None):
+            self.next_link = next_link
+
+    # A circular list of three links, kept alive only through the variable A.
+    link_3 = Link()
+    link_2 = Link(link_3)
+    link_1 = Link(link_2)
+    link_3.next_link = link_1
+    A = link_1
+    del link_1, link_2, link_3
+
+    # A self-referencing link that becomes completely unreachable.
+    link_4 = Link()
+    link_4.next_link = link_4
+    del link_4
+
+    gc.collect()
+
+When the GC starts, it has all the container objects it wants to scan
+on the first linked list. The objective is to move all the unreachable
+objects to the second list. As most objects generally turn out to be reachable, it is much more
+efficient to move the unreachable ones, as this involves fewer pointer updates.
+
+Every object that supports garbage collection will have an extra reference
+count field initialized to the reference count (``gc_ref`` in the figures)
+of that object when the algorithm starts. This is because the algorithm needs
+to modify the reference count to do the computations, and in this way the
+interpreter will not modify the real reference count field.
+
+.. figure:: images/python-cyclic-gc-1-new-page.png
+
+The GC then iterates over all containers in the first list and decrements by one the
+``gc_ref`` field of any other object that the container references. To do
+this it makes use of the ``tp_traverse`` slot in the container class (implemented
+using the C API or inherited from a superclass) to know what objects are referred to by
+each container. After all the objects have been scanned, only the objects that have
+references from outside the “objects to scan” list will have ``gc_ref > 0``.
+
+.. figure:: images/python-cyclic-gc-2-new-page.png
+
+Notice that having ``gc_refs == 0`` does not imply that the object is unreachable.
+This is because another object that is reachable from the outside (``gc_refs > 0``)
+can still have references to it. For instance, the ``link_2`` object in our example
+ended up having ``gc_refs == 0`` but is still referenced by the ``link_1`` object that
+is reachable from the outside.
+To obtain the set of objects that are really
+unreachable, the garbage collector scans the container objects again using the
+``tp_traverse`` slot with a different traverse function that marks objects with
+``gc_refs == 0`` as "tentatively unreachable" and then moves them to the
+tentatively unreachable list. The following image depicts the state of the lists at the
+moment when the GC has processed the ``link 3`` and ``link 4`` objects but hasn’t
+processed ``link 1`` and ``link 2`` yet.
+
+.. figure:: images/python-cyclic-gc-3-new-page.png
+
+The GC then scans the ``link 1`` object. Because it has ``gc_refs == 1``,
+the GC does not do anything special, because it knows the object has to be reachable (and it is
+already in what will become the reachable list):
+
+.. figure:: images/python-cyclic-gc-4-new-page.png
+
+When the GC encounters an object which is reachable (``gc_refs > 0``), it traverses
+its references using the ``tp_traverse`` slot to find all the objects that are
+reachable from it, moving them to the end of the list of reachable objects (where
+they started originally) and setting their ``gc_refs`` field to 1. This is what happens
+to ``link 2`` and ``link 3`` below, as they are reachable from ``link 1``. From the
+state in the previous image and after examining the objects referred to by ``link 1``,
+the GC knows that ``link 3`` is reachable after all, so it is moved back to the
+original list and its ``gc_refs`` field is set to 1, so that if the GC visits it again, it
+knows that it is reachable. To avoid visiting an object twice, the GC marks all
+objects that have not been visited yet with a flag, and once an object is processed it is unmarked so
+that the GC does not process it twice.
+
+.. figure:: images/python-cyclic-gc-5-new-page.png
+
+Notice that once an object that was marked as "tentatively unreachable" is later
+moved back to the reachable list, it will be visited again by the garbage collector,
+as now all the references that that object has need to be processed as well.
+This
+process is really a breadth first search over the object graph. Once all the objects
+are scanned, the GC knows that all container objects in the tentatively unreachable
+list are really unreachable and can thus be garbage collected.
+
+Destroying unreachable objects
+------------------------------
+
+Once the GC knows the list of unreachable objects, a very delicate process starts
+with the objective of completely destroying these objects. Roughly, the process
+follows these steps in order:
+
+1. Handle and clean weak references (if any). If an object that is in the unreachable
+   set is going to be destroyed and has weak references with callbacks, these
+   callbacks need to be honored. This process is **very** delicate, as any error can
+   cause objects that will be in an inconsistent state to be resurrected or reached
+   by some Python functions invoked from the callbacks. To avoid this, weak references
+   that are also part of the unreachable set (the object and its weak reference
+   are in cycles that are unreachable) need to be cleared
+   immediately, and their callbacks must not be executed, so that they do not trigger later
+   when the ``tp_clear`` slot is called, causing havoc. This is fine because both
+   the object and the weakref are going away, so it's legitimate to pretend the
+   weakref is going away first so the callback is never executed.
+
+2. If an object has a legacy finalizer (``tp_del`` slot), move it to the
+   ``gc.garbage`` list.
+3. Call the finalizers (``tp_finalize`` slot) and mark the objects as already
+   finalized to avoid calling them twice if they resurrect or if other finalizers
+   have removed the object first.
+4. Deal with resurrected objects. If some objects have been resurrected, the GC
+   finds the new subset of objects that are still unreachable by running the cycle
+   detection algorithm again and continues with them.
+5. Call the ``tp_clear`` slot of every object so all internal links are broken and
+   the reference counts fall to 0, triggering the destruction of all unreachable
+   objects.
+
+Optimization: generations
+-------------------------
+
+In order to limit the time each garbage collection takes, the GC uses a popular
+optimization: generations. The main idea behind this concept is the assumption that
+most objects have a very short lifespan and can thus be collected shortly after their
+creation. This has proven to be very close to the reality of many Python programs, as
+many temporary objects are created and destroyed very fast. The older an object is,
+the less likely it is to become unreachable.
+
+To take advantage of this fact, all container objects are segregated across
+three spaces or "generations" (CPython currently uses 3 generations). Every new
+object starts in the first generation (generation 0). The previous algorithm is
+executed only over the objects of a particular generation, and if an object
+survives a collection of its generation it will be moved to the next one
+(generation 1), where it will be surveyed for collection less often. If
+the same object survives another GC round in this new generation (generation 1),
+it will be moved to the last generation (generation 2), where it will be
+surveyed the least often.
+
+Generations are collected when the number of objects that they contain reaches some
+predefined threshold, which is specific to each generation and is lower the older the
+generation is. These thresholds can be examined using the ``gc.get_threshold``
+function:
+
+.. code-block:: python
+
+    >>> import gc
+    >>> gc.get_threshold()
+    (700, 10, 10)
+
+
+The content of these generations can be examined using the
+``gc.get_objects(generation=NUM)`` function, and collections can be triggered
+specifically in a generation by calling ``gc.collect(generation=NUM)``.
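
As a rough sketch of the functions mentioned above, the following snippet follows one
object through a collection of the youngest generation. It assumes CPython 3.8 or later
(where ``gc.get_objects`` accepts the ``generation`` argument); ``Node`` and
``generation_of`` are hypothetical helpers introduced only for this illustration, and
the generation reported after the collection may differ between interpreter versions.

```python
import gc


class Node:
    """A plain container object; its instances are tracked by the GC."""


def generation_of(obj):
    """Return the index of the generation currently holding obj, or None."""
    for generation in range(3):
        tracked = gc.get_objects(generation=generation)
        if any(candidate is obj for candidate in tracked):
            return generation
    return None


gc.disable()  # keep automatic collections from interfering with the demo
try:
    node = Node()
    before = generation_of(node)   # a new object starts in the youngest generation
    gc.collect(generation=0)       # collect only the youngest generation
    after = generation_of(node)    # the survivor has moved to an older generation
finally:
    gc.enable()

print("thresholds:", gc.get_threshold())
print("generation before:", before, "after:", after)
```

Automatic collection is disabled around the experiment only so that no unrelated
collection promotes the object between the two measurements.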
+
+Collecting the oldest generation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In addition to the various configurable thresholds, the GC only triggers a full
+collection of the oldest generation if the ratio ``long_lived_pending / long_lived_total``
+is above a given value (hardwired to 25%). The reason is that, while "non-full"
+collections (i.e., collections of the young and middle generations) will always
+examine roughly the same number of objects (determined by the aforementioned
+thresholds), the cost of a full collection is proportional to the total
+number of long-lived objects, which is virtually unbounded. Indeed, it has
+been remarked that doing a full collection every fixed number of object
+creations entails a dramatic performance degradation in workloads which consist
+of creating and storing lots of long-lived objects (e.g. building a large list
+of GC-tracked objects would show quadratic performance, instead of linear as
+expected). Using the above ratio, instead, yields amortized linear performance
+in the total number of objects (the effect of which can be summarized thusly:
+"each full garbage collection is more and more costly as the number of objects
+grows, but we do fewer and fewer of them").
+
+Optimization: reusing fields to save memory
+-------------------------------------------
+
+In order to save memory, the two linked list pointers in every object with GC
+support are reused for several purposes. This is a common optimization known
+as "fat pointers" or "tagged pointers": pointers that carry additional data,
+"folded" into the pointer, meaning stored inline in the data representing the
+address, taking advantage of certain properties of memory addressing. This is
+possible because, on most architectures, certain types of data are aligned
+to the size of the data, often a word or a multiple thereof.
+This discrepancy
+leaves a few of the least significant bits of the pointer unused, which can be
+used for tags or to keep other information (most often as a bit field, with each
+bit a separate tag), as long as code that uses the pointer masks out these
+bits before accessing memory. E.g., on a 32-bit architecture (for both
+addresses and word size), a word is 32 bits = 4 bytes, so word-aligned
+addresses are always a multiple of 4, hence end in 00, leaving the last 2 bits
+available; while on a 64-bit architecture, a word is 64 bits = 8 bytes, so
+word-aligned addresses end in 000, leaving the last 3 bits available.
+
+The CPython GC makes use of two fat pointers:
+
+* Between collections, the ``_gc_prev`` field is used as the "previous"
+  pointer to maintain the doubly linked list, but its lowest two bits are used
+  to keep some flags like ``PREV_MASK_COLLECTING``. During collections, ``_gc_prev``
+  is temporarily used to store a copy of the reference count
+  (``gc_refs``), and the GC linked list becomes a singly linked list until
+  ``_gc_prev`` is restored.
+
+* The ``_gc_next`` field is used as the "next" pointer to maintain the doubly
+  linked list, but during collection its lowest bit is used to keep the
+  ``NEXT_MASK_UNREACHABLE`` flag that indicates if an object is tentatively
+  unreachable during the cycle detection algorithm.
+
+Optimization: delay tracking containers
+---------------------------------------
+
+Certain types of container cannot participate in a reference cycle, and so do
+not need to be tracked by the garbage collector. Untracking these objects
+reduces the cost of garbage collections. However, determining which objects may
+be untracked is not free, and the costs must be weighed against the benefits
+for garbage collection. There are two possible strategies for when to untrack
+a container:
+
+1. When the container is created.
+2. When the container is examined by the garbage collector.
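
The second strategy can be observed from Python with ``gc.is_tracked``. The snippet
below is only a sketch: it assumes a CPython version in which tuples are tracked on
creation and untracked lazily by the collector, so the two tuple results are printed
rather than stated, since they can vary between interpreter versions. The variable
names are chosen only for this illustration.

```python
import gc

# Atomic objects are never tracked, while lists always are.
int_tracked = gc.is_tracked(42)
list_tracked = gc.is_tracked([])
print("int:", int_tracked, "list:", list_tracked)

# Build the tuple at run time so it is a fresh object, not a compiled constant.
numbers = tuple(range(3))
tracked_at_creation = gc.is_tracked(numbers)

# A collection gives the GC the chance to examine the tuple and untrack it.
gc.collect()
tracked_after_collection = gc.is_tracked(numbers)

print("tuple tracked at creation:     ", tracked_at_creation)
print("tuple tracked after collection:", tracked_after_collection)
```

If the interpreter untracks tuples lazily, the first tuple result is ``True`` and the
second one ``False``; an interpreter that decides at creation time would report the
same value twice.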
+
+As a general rule, instances of atomic types aren't tracked and instances of
+non-atomic types (containers, user-defined objects...) are. However, some
+type-specific optimizations can be present in order to suppress the garbage
+collector footprint of simple instances. Some examples of native types that
+benefit from delayed tracking:
+
+* Tuples containing only immutable objects (integers, strings, etc.,
+  and recursively, tuples of immutable objects) do not need to be tracked. The
+  interpreter creates a large number of tuples, many of which will not survive
+  until garbage collection. It is therefore not worthwhile to untrack eligible
+  tuples at creation time. Instead, all tuples except the empty tuple are tracked
+  when created. During garbage collection it is determined whether any surviving
+  tuples can be untracked. A tuple can be untracked if all of its contents are
+  already not tracked. Tuples are examined for untracking in all garbage collection
+  cycles. It may take more than one cycle to untrack a tuple.
+
+* Dictionaries containing only immutable objects also do not need to be tracked.
+  Dictionaries are untracked when created. If a tracked item is inserted into a
+  dictionary (either as a key or value), the dictionary becomes tracked. During a
+  full garbage collection (all generations), the collector will untrack any dictionaries
+  whose contents are not tracked.
+
+The garbage collector module provides the Python function ``is_tracked(obj)``, which returns
+the current tracking status of the object. Subsequent garbage collections may change the
+tracking status of the object.
+
+.. code-block:: python
+
+    >>> import gc
+    >>> gc.is_tracked(0)
+    False
+    >>> gc.is_tracked("a")
+    False
+    >>> gc.is_tracked([])
+    True
+    >>> gc.is_tracked({})
+    False
+    >>> gc.is_tracked({"a": 1})
+    False
+    >>> gc.is_tracked({"a": []})
+    True
diff --git a/images/python-cyclic-gc-1-new-page.png b/images/python-cyclic-gc-1-new-page.png
new file mode 100644
index 0000000000000000000000000000000000000000..2ddac50f4b5575888d8d19cd9b6f2e3863132776
GIT binary patch
literal 4415
[binary image data omitted]

diff --git a/images/python-cyclic-gc-2-new-page.png b/images/python-cyclic-gc-2-new-page.png
new file mode 100644
index 0000000000000000000000000000000000000000..159aeeb05024a3247381e1c3aec2b986035b5e44
GIT binary patch
literal 4337
[binary image data omitted]

diff --git a/images/python-cyclic-gc-3-new-page.png b/images/python-cyclic-gc-3-new-page.png
new file mode 100644
index 0000000000000000000000000000000000000000..29fab0498e5b106f813e08aa2799602269cd74d1
GIT binary patch
literal 4876
[binary image data omitted]

diff --git a/images/python-cyclic-gc-4-new-page.png b/images/python-cyclic-gc-4-new-page.png
new file mode 100644
index 0000000000000000000000000000000000000000..51a2b1065ea64eaf009a96ce6fecd3132424ea0f
GIT binary patch
literal 4863
[binary image data omitted]

diff --git a/images/python-cyclic-gc-5-new-page.png b/images/python-cyclic-gc-5-new-page.png
new file mode 100644
index 0000000000000000000000000000000000000000..fe67a6896fe4b07c03291ca8b21781fde8785757
GIT binary patch
literal 5712
[binary image data omitted]
z$%=%a6D$|Jc`{WN-=Ab=TF6BYj*Iv5@|h?aoC^P%ZYh6Q5<*BxDNStAkgiQ6%jsn; z43n`B?5w;uN}-rrR$(N6#|amW_a{5G)2(_Kz?|{Ot-H~veS;#3@d$0tIAkJ{~2Vz3@gi9(ZmU^{1ER9l+K3i-@yi%yla|zcZc~=i&FwoAB+n4 zw=6qr%{$K})dX_{S;s&rZ_Jc&!v|9c!fg-B)~5$kLG4x-Lbs0D(3ab}cMjH;3f10B zYMJN8IIQ3!w?hDA&&o(*a!JmKMX5D7t@LE~mf*qaiLPe_!R=ilJQ~L*`S|lu^z%92 z&4)G;UL7-{c%J|D>%;7EDio*hWlQ7%pXlDq%k8U?WNAtl+{DTS5yvlio_D8i@?z(0 zh&M{U;dnf0wV7WpioWyAnTONlGo2;^QLcDl?qzIpgZ$B;r{v}FR|kP_6axu$5P9wQ z`C>iR;W`CV^$90M&WaN%E`-g~O1`Iuz%Y$3>zs{~?BWx<^(jeT$yFM9PtFIP(Lcn# zR+z*hb#`jfC58ob-sM+z?s=86m5mVQKkK(Cg8QP+K~v^ue{ECAqpL3p4k$fex407? zF7=h=F!#pqdryz)32_M@M-=BM>9KItk-5T&$$H)SzZlP>^4{{udc^H@9C);>vKb?i zV%-($3Kx*T!Cv+Zf2PZ%kUv))BI>LEGtjJQ&Ut4(V@72b4u#h871(rK!uSlidkFxLm` zXMOZoW>>!>o^Sd_oMg%K+>Rf;1F4kVHvbL7KC=KPUEtYbh{jT4$y5p$cfA!Ss(cW3 z$`~6CcR;emUYMGkr@mp6K8NE(||!$-fpFm?WiRI|*PvIber#KT}&&dSB8 zNAulb_tW(4rOsBZnT4m|>KgWa&dqEjmi@X|UaX9TZ?_#R34@Vk*X$kgL~}inTBOx} ze=PO(-B?)DzNLk)F&Zi^0V8m>DL2Y|)p45L|9G(2grFL-%kgGMxKu+J9r+o&7XI0j zcY-}6sFosQliv)SRyST*#Dr_UMgkoT^3SV6cT_=_2q@$JVG*y;1GhsTphp2p{gxEq zlAV_QUBYwoGoyf(!CxEkQyD6L?qQQwZjlTqjj~V?dMaM-X@~E2MDkC)i_Ie+|4jw= zsHdrEfd{s77Qfb|m^gY=*?Pzc{r0yvu^a?n-J{Jb3Qty!H;Ohd8lfykfNMCRb4VFY z0qm~3?YP*D2Ur&*x05Zekd{_?0=bf*M(cA-&>aDrguolooRGx)7u3e(HxR60B!Zb6$wJIiCd_@55hyfgfL zdQ4m2RaHyuxwA;VmRh8`@wjgHg{c&ceh7i6iEo6`1zu#SIclVys@w-XRpfGZb72kv zp;e03rHPH>MSH-#X%VxIB`W4i>#Nw9)AqQ$C2qPdiZelwCt>Y+`jeG+8MLaGCcY^twGpkw`gmPh zEp7G%$wmp!+B_be#6zcgy&C%?E%w=K$%BPB)TN*O2kDaydcoBjW331M@wpRINIq1#+HfGkO z<_?VTl5I%ZMRm651X8Ox5j{m`D8nh-J?%%5Ap7pvdkM?;AQ(S38QtZ^=@*dwD?m8= z^kUl$FHXAi_VWm+ZNsSh)L27dYW?UvU+qzZLMX1H_=lXZs^wOslx5J%LmHOjVl{i) zXrQuwG(p2NUR>di4&%u`#;Z;^(ss2+C{*#^skD##>VjSof4q8~;(PpGfzqD0YB`F=+QB)vGaF0-SA9(v2m zD7)v%!zd0)X~7LYu?QzickxFz^{mHK4O-X&%1yp;=HD{l4fM9qP!_7@rE*x@id4Pr zi~S}No$u&z;|BF)LW@*PTqxH>%LjLPsqjbfoYKb##ZC%a7=~koIYM zmop%$v>&!H6j>ts>Fro#j5h2piQNFh+hXu41F>}BO1;FV0y1*BV+8w|&LD6jir zy*q@`a7%17vnRQ1U@I=&Qi)>W?v#xcrA~ZoyOiM%yzP@lu-zQ~IfD(-gHFUg!K=F% 
z=G4B$4$2xTfDWBTlW@W#^wjFh-*^+2-S65>^U?bpXLYksU(;ZeEF51c&PZ-IGU?cP z*hJAvNswxe(}6Tl-j-=WEfvohUnv}&Wv8gK4D$vIt?K}H%dTk4JC8ysU_W-Kmw)h4 zuTDO{+s_JYsKKjPmL}U$#KsX#-$7+(mM~BM;Sg7m%9PI1^+dhVqsL8BUqJ&N3<997 zy2<4P@O>ugy9e`?kjkLdd!;*#Ds^|K%T}?{itlqZDf)f$HMOl)nkTMwteuvk_!&)A zoi2|S4%^-*tvcO-L<$vtDzy?9O1BRKsp;_wnkDT9QI*yGY=kLPwPXW~8zNuZ?je0Y z5|S}gY@jM$kG^$CrIXZ^@=?Y+z6KIqg>dE2f;`oEzb-BJY^d_$<;ms-Vx`H&H{kK! z#Y5Rv%1V)4VfVY1hfwNT z<%(24Jfird_~p8}*k?aLcISR( zfzGM#J5P2=RWy~~xPcaud|-@NBe2KYrVfhtTwezL=fu_epP{7nBmfqLFxC36^5HbD9{^+0UkD5UyMIH=sPI4l(<6B?wq-}UIHOP zz%^XE3oF_^2M<+yRIVcz-16yE!62_qYSrLwN zA{pp^6;dgKOq`t_{xbXrhFmoz0-hro7$hD~PcqIdzPZ?A3cNfF&X|dNDPR2&$diC( z2jvi4a^)@v2u3{hhc%N8(CaRU#;*y}*Dq&1D|{Wd|0L&FU4NowN}Xq4nq06 zh0RUVXHjD^m8I-Nd4Ss49k~m)mT)cIE`IidI$@YSt2QR^A@(&gNXfW>gTCd41;HmdEn~#_NihACS>I&lmmz4Qi;myqczM4#*1lG3RG~EIHVNI_Y z_;0ig=AO_+DVApk+=!PYw~QeXDwlQ8!DFJ3uGtwTOM`>epL+F3@@EWRrRp*P)Q$jn z0+0Ha_56+h{u6%vONICVFL=NStYO~X7=$JXW!#ln`Zt>VTmFWl{Y3|ef%~pM z3N_a7Ojdi8VCqWny+FXDeZufefoQ0yYpJSft7Ulni@p(`R@Nn d2oCiP^t=B5CtNi{nllq1mZmnRtBk#G{s-d0T~Yu5 literal 0 HcmV?d00001 diff --git a/index.rst b/index.rst index f5feda0083..03dcbeb8b2 100644 --- a/index.rst +++ b/index.rst @@ -260,6 +260,7 @@ Additional Resources * :doc:`exploring` * :doc:`grammar` * :doc:`compiler` + * :doc:`garbage_collector` * Tool support * :doc:`gdb` * :doc:`clang` @@ -317,6 +318,7 @@ Full Table of Contents exploring grammar compiler + garbage_collector extensions coverity clang From 8aac4855004f1ceda3a9db1dd7cca67c6ad440e8 Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Mon, 20 Jan 2020 21:06:03 +0000 Subject: [PATCH 02/28] Fix several typos --- garbage_collector.rst | 133 +++++++++++++++++++++++++++++++----------- 1 file changed, 98 insertions(+), 35 deletions(-) diff --git a/garbage_collector.rst b/garbage_collector.rst index ed5bdc3465..3bc3690fe3 100644 --- a/garbage_collector.rst +++ b/garbage_collector.rst @@ -36,10 +36,11 @@ handle reference cycles. For instance, consider this code .. 
code-block:: python - >>> container = [] - >>> container.append(container) - - >>> del container + >>> container = [] + >>> container.append(container) + >>> sys.getrefcount(container) + 3 + >>> del container In this example, ``container`` holds a reference to itself, so even when we remove our reference to it (the variable "container") the reference count never falls to 0 @@ -65,7 +66,7 @@ Normally the C structure supporting a regular Python object looks as follows: In order to support the garbage collector, the memory layout of objects is altered -to acomodate extra information **before** the normal layout: +to accommodate extra information **before** the normal layout: .. code-block:: none @@ -111,7 +112,7 @@ implemented in the ``gc`` module. The garbage collector **only focuses** on cleaning container objects (i.e. objects that can contain a reference to one or more objects). These can be arrays, dictionaries, lists, custom class instances, classes in extension modules, etc. One could think that -cycles are uncommon but the thuth is that many internal references needed by +cycles are uncommon but the truth is that many internal references needed by the interpreter create cycles everywhere. Some notable examples: * Exceptions contain traceback objects that contain a list of frames that @@ -127,36 +128,38 @@ be identified first. This is done in the `deduce_unreachable() >> import gc - class Link: - def __init__(self, next_link=None): - self.next_link = next_link + >>> class Link: + ... def __init__(self, next_link=None): + ... 
self.next_link = next_link - link_3 = Link() - link_2 = Link(link3) - link_1 = Link(link2) - link_3.next_link = link_1 + >>> link_3 = Link() + >>> link_2 = Link(link3) + >>> link_1 = Link(link2) + >>> link_3.next_link = link_1 - link_4 = Link() - link_4.next_link = link_4 + >>> link_4 = Link() + >>> link_4.next_link = link_4 - import gc - gc.collect() + >>> del link_4 + >>> gc.collect() + 2 When the GC starts, it has all the container objects it wants to scan -on a the first linked list. The objective is to move all the unreachable +on the first linked list. The objective is to move all the unreachable objects. As generally most objects turn out to be reachable, is much more efficient to move the unreachable as this involves fewer pointer updates. Every object that supports garbage collection will have a extra reference count field initialized to the reference count (``gc_ref`` in the figures) -of that object when the algorithm starts. This is because the algorith needs +of that object when the algorithm starts. This is because the algorithm needs to modify the reference count to do the computations and in this way the interpreter will not modify the real reference count field. @@ -165,7 +168,7 @@ interpreter will not modify the real reference count field. The GC then iterates over all containers in the first list and decrements by one the ``gc_ref`` field of any other object that container it is referencing. For doing this it makes use of the ``tp_traverse`` slot in the container class (implemented -using the C API or inherited by a superclass) to know what objects are refered by +using the C API or inherited by a superclass) to know what objects are referenced by each container. After all the objects have been scanned, only the objects that have references from outside the “objects to scan” list will have ``gc_ref > 0``. @@ -176,11 +179,11 @@ This is because another object that is reachable from the outside (``gc_refs > 0 can still have references to it. 
For instance, the ``link_2`` object in our example ended having ``gc_refs == 0`` but is referenced still by the ``link_1`` object that is reachable from the outside. To obtain the set of objects that are really -unreachanle, the garbage collector scans again the container objects using the -``tp_traverse`` slot with a diferent traverse function that marks objects with +unreachable, the garbage collector scans again the container objects using the +``tp_traverse`` slot with a different traverse function that marks objects with ``gc_refs == 0`` as "tentatively unreachable" and then moves them to the tentatively unreachable list. The following image depicts the state of the lists in a -moment when the GC processed the ``link 3`` and ``link 4`` objects but hasn’t +moment when the GC processed the ``link 3`` and ``link 4`` objects but has not processed ``link 1`` and ``link 2`` yet. .. figure:: images/python-cyclic-gc-3-new-page.png @@ -193,12 +196,12 @@ already in what will become the reachable list): When the GC encounters an object which is reachable (``gc_refs > 0``), it traverses its references using the ``tp_traverse`` slot to find all the objects that are -reachable from it, marking moving them to the end oflist of reachable objects (where +reachable from it, marking moving them to the end of the list of reachable objects (where they started originally) and setting its ``gc_refs`` field to 1. This is what happens to ``link 2`` and ``link 3`` below as they are reachable from ``link 1``. From the state in the previous image and after examining the objects referred to by ``link1`` the GC knows that ``link 3`` is reachable after all, so it is moved back to the -original list and its ``gc_refs`` field is set to one so if the GC vitis it again, it +original list and its ``gc_refs`` field is set to one so if the GC visits it again, it does not that is reachable. 
To avoid visiting a object twice, the GC marks all objects that are not visited yet with and once an object is processed is unmarked so the GC does not process it twice. @@ -212,24 +215,49 @@ process in really a breadth first search over the object graph. Once all the obj are scanned, the GC knows that all container objects in the tentatively unreachable list are really unreachable and can thus be garbage collected. +Why moving unreachable objects is better +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +It sounds logical to move the unreachable objects under the premise that most object +are usually reachable, until you think about it: the reason it pays isn't actually +obvious. + +Suppose we create objects A, B, C in that order. They appear in the young generation +in the same order. If B points to A, and C to B, and C is reachable from outside, +then the adjusted refcounts after the first step of the algorith runs will be 0, 0, +and 1 respectively because the only reachable object from the outside is C. + +When the next step of the algorithm finds A, A is moved to the unreachable list. The +same for B when it's first encountered. Then C is traversed, B is moved *back* to +the reachable list. B is eventually traversed, and then A is moved back to the reachable +list. + +So instead of not moving at all, the reachable objects B and A are moved twice each. +Why is this a win? A straightforward algorithm to move the reachable objects instead +would move A, B, and C once each. The key is that this dance leaves the objects in +order C, B, A - it's reversed from the original order. On all *subsequent* scans, +none of them will move. Since most objects aren't in cycles, this can save an +unbounded number of moves across an unbounded number of later collections. The only +time the cost can be higher is the first time the chain is scanned. 
+ Destroying unreachable objects ------------------------------ Once the GC knows the list of unreachable objects, a very delicate process starts -with the objective of completely destroying these objects. Roughtly, the process +with the objective of completely destroying these objects. Roughly, the process follows these steps in order: 1. Handle and clean weak references (if any). If an object that is in the unreachable set is going to be destroyed and has weak references with callbacks, these - callbacks need to be honored. This proces is **very** delicate as any error can + callbacks need to be honored. This process is **very** delicate as any error can cause objects that will be in an inconsistent state to be resurrected or reached by some python functions invoked from the callbacks. To avoid this weak references that also are part of the unreachable set (the object and its weak reference are in a cycles that are unreachable) then the weak reference needs to be clean - inmediately and the callback must not be executed so it does not trigger later - when the ``tp_clear`` slot is called, causing havok. This is fine because both + immediately and the callback must not be executed so it does not trigger later + when the ``tp_clear`` slot is called, causing havoc. This is fine because both the object and the weakref are going away, so it's legitimate to pretend the - weakref is going away first so the callback is never executed. + weak reference is going away first so the callback is never executed. 2. If an object has legacy finalizers (``tp_del`` slot) move them to the ``gc.garbage`` list. @@ -240,7 +268,7 @@ follows these steps in order: finds the new subset of objects that are still unreachable by running the cycle detection algorithm again and continues with them. 5. 
Call the ``tp_clear`` slot of every object so all internal links are broken and - the reference counts fall to 0, triggering the destruction of all unreachabke + the reference counts fall to 0, triggering the destruction of all unreachable objects. Optimization: generations @@ -253,9 +281,9 @@ creation. This has proven to be very close to the reality of many Python program many temporary objects are created and destroyed very fast. The older an object is the less likely is to become unreachable. -To take advatange of this fact, all container objects are segregated across +To take advantage of this fact, all container objects are segregated across three spaces or "generations" (CPython currently uses 3 generations). Every new -object starts in the firstgeneration (generation 0). The previous algorithm is +object starts in the first generation (generation 0). The previous algorithm is executed only over the objects of a particular generation and if an object survives a collection of its generation it will be moved to the next one (generation 1), where it will it will be surveyed for collection less often. If @@ -279,6 +307,41 @@ The content of these generations can be examined using the ``gc.get_objects(generation=NUM)`` function and collections can be triggered specifically in a generation by calling ``gc.collect(generation=NUM)``. +.. code-block:: python + + >>> import gc + >>> class MyObj: + ... pass + ... + + # Move everything to the last generation so its easier to inspect + # the younger generations. + + >>> gc.collect() + 0 + + # Create a reference cycle + + >>> x = MyObj() + >>> x.self = x + + # Initially the object is in the younguest generation + + >>> gc.get_objects(generation=0) + [..., <__main__.MyObj object at 0x7fbcc12a3400>, ...] 
+ + # After a collection of the younguest generation the object + # moves to the next generation + + >>> gc.collect(generation=0) + 0 + >>> gc.get_objects(generation=0) + [] + >>> gc.get_objects(generation=1) + [..., <__main__.MyObj object at 0x7fbcc12a3400>, ...] + + + Collecting the oldest generation ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -302,7 +365,7 @@ Optimization: reusing fields to save memory ------------------------------------------- In order to save memory, the two linked list pointers in every object with gc -support are reused for several pourposes. This is a common optimization known +support are reused for several purposes. This is a common optimization known as "fat pointers" or "tagged pointers": pointers that carry additional data, "folded" into the pointer, meaning stored inline in the data representing the address, taking advantage of certain properties of memory addressing. This is From 7776e140d6ad7574ea905f0c05d9d13ffab4fd19 Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Mon, 20 Jan 2020 22:24:40 +0000 Subject: [PATCH 03/28] Update garbage_collector.rst Co-Authored-By: Tim Peters --- garbage_collector.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/garbage_collector.rst b/garbage_collector.rst index 3bc3690fe3..43a1ce7ff8 100644 --- a/garbage_collector.rst +++ b/garbage_collector.rst @@ -87,7 +87,7 @@ information associated to the GC is needed the previous fields can be accessed b simple type cast from the original object: :code:`((PyGC_Head *)(the_object)-1)`. 
As is explained later in the `Optimization: reusing fields to save memory`_ section, -these two extra fields are normally used to keep a double linked list of all the +these two extra fields are normally used to keep doubly linked lists of all the objects tracked by the garbage collector (these lists are the GC generations, more on that in the `Optimization: reusing fields to save memory`_ section) but they are also reused to fullfill other pourposes when the full double linked list structure is not From c72ce11f9acf333581c584aa4e97ca135470a22b Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Mon, 20 Jan 2020 22:29:52 +0000 Subject: [PATCH 04/28] Fix more typos --- garbage_collector.rst | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/garbage_collector.rst b/garbage_collector.rst index 43a1ce7ff8..e24212efa8 100644 --- a/garbage_collector.rst +++ b/garbage_collector.rst @@ -123,14 +123,14 @@ the interpreter create cycles everywhere. Some notable examples: * When representing data structures like graphs is very typical for them to have internal links to themselves. -To correctly dispose of these objects once they become unreachable, they need to -be identified first. This is done in the `deduce_unreachable() `__ -function. Inside this component, two double-linked lists are maintained: one list contains -all objects to be scanned, and the other will contain all objects "tentatively" unreachable. +To correctly dispose of these objects once they become unreachable, they need to be +identified first. Inside the function that identifies cycles, two double-linked +lists are maintained: one list contains all objects to be scanned, and the other will +contain all objects "tentatively" unreachable. 
-To understand how the algorithm works, Let’s take the case of a circular linked list which has -one link referenced by a variable A, and one self-referencing object which is completely -unreachable +To understand how the algorithm works, Let’s take the case of a circular linked list +which has one link referenced by a variable A, and one self-referencing object which +is completely unreachable .. code-block:: python @@ -202,7 +202,7 @@ to ``link 2`` and ``link 3`` below as they are reachable from ``link 1``. From state in the previous image and after examining the objects referred to by ``link1`` the GC knows that ``link 3`` is reachable after all, so it is moved back to the original list and its ``gc_refs`` field is set to one so if the GC visits it again, it -does not that is reachable. To avoid visiting a object twice, the GC marks all +does know that is reachable. To avoid visiting a object twice, the GC marks all objects that are not visited yet with and once an object is processed is unmarked so the GC does not process it twice. From 410487bb5cf22221474f62068adf27e1e8548143 Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Mon, 20 Jan 2020 22:30:11 +0000 Subject: [PATCH 05/28] Update garbage_collector.rst Co-Authored-By: Tim Peters --- garbage_collector.rst | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/garbage_collector.rst b/garbage_collector.rst index e24212efa8..3c766f2a77 100644 --- a/garbage_collector.rst +++ b/garbage_collector.rst @@ -93,6 +93,14 @@ that in the `Optimization: reusing fields to save memory`_ section) but they are reused to fullfill other pourposes when the full double linked list structure is not needed as a memory optimization. +Doubly linked lists are used because they efficiently support most frequently required operations. In +general, the collection of all objects tracked by GC are partitioned into disjoint sets, each in its own +doubly linked list. 
Between collections, objects are partitioned into "generations", reflecting how +often they're survived collection attempts. During collections, the generations(s) being collected +are further partitioned into, e.g., sets of reachable and unreachable objects. Doubly linked lists +support moving an object from one partition to another, adding a new object, removing an object +entirely (objects tracked by GC are most often reclaimed by the refcounting system when GC +isn't running at all!), and merging partitions, all with a small constant number of pointer updates. Specific APIs are offered to allocate, deallocate, initialize, track and untrack objects with GC support. These APIs can be found in the `Garbage Collector C API documentation `_. From b18da79a4ca5769669e14780308f738f2b05dd07 Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Mon, 20 Jan 2020 22:36:13 +0000 Subject: [PATCH 06/28] Update garbage_collector.rst Co-Authored-By: Tim Peters --- garbage_collector.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/garbage_collector.rst b/garbage_collector.rst index 3c766f2a77..686c212cb8 100644 --- a/garbage_collector.rst +++ b/garbage_collector.rst @@ -101,6 +101,7 @@ are further partitioned into, e.g., sets of reachable and unreachable objects. support moving an object from one partition to another, adding a new object, removing an object entirely (objects tracked by GC are most often reclaimed by the refcounting system when GC isn't running at all!), and merging partitions, all with a small constant number of pointer updates. + Specific APIs are offered to allocate, deallocate, initialize, track and untrack objects with GC support. These APIs can be found in the `Garbage Collector C API documentation `_. 
From 25d75551e612a73a622110897f21f4be910e35bb Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Mon, 20 Jan 2020 22:53:15 +0000 Subject: [PATCH 07/28] Apply suggestions from code review Co-Authored-By: Brett Cannon <54418+brettcannon@users.noreply.github.com> --- garbage_collector.rst | 58 +++++++++++++++++++++---------------------- 1 file changed, 29 insertions(+), 29 deletions(-) diff --git a/garbage_collector.rst b/garbage_collector.rst index 686c212cb8..8ad08c84a0 100644 --- a/garbage_collector.rst +++ b/garbage_collector.rst @@ -32,7 +32,7 @@ to the object when called): 2 The main problem present with the reference count schema is that reference count does not -handle reference cycles. For instance, consider this code +handle reference cycles. For instance, consider this code: .. code-block:: python @@ -89,7 +89,7 @@ simple type cast from the original object: :code:`((PyGC_Head *)(the_object)-1)` As is explained later in the `Optimization: reusing fields to save memory`_ section, these two extra fields are normally used to keep doubly linked lists of all the objects tracked by the garbage collector (these lists are the GC generations, more on -that in the `Optimization: reusing fields to save memory`_ section) but they are also +that in the `Optimization: reusing fields to save memory`_ section), but they are also reused to fullfill other pourposes when the full double linked list structure is not needed as a memory optimization. @@ -102,21 +102,21 @@ support moving an object from one partition to another, adding a new object, re entirely (objects tracked by GC are most often reclaimed by the refcounting system when GC isn't running at all!), and merging partitions, all with a small constant number of pointer updates. -Specific APIs are offered to allocate, deallocate, initialize, track and untrack +Specific APIs are offered to allocate, deallocate, initialize, track, and untrack objects with GC support. 
These APIs can be found in the `Garbage Collector C API documentation `_. -Appart from this object structure, the type object of objects supporting garbage +Apart from this object structure, the type object for objects supporting garbage collection must include the ``Py_TPFLAGS_HAVE_GC`` in its ``tp_flags`` slot and provide an implementation of the ``tp_traverse`` handler. Unless it can be proven that the objects cannot form reference cycles with only objects of its type or if the -type are mutable, a ``tp_clear`` implementation must also be provided. +type is immutable, a ``tp_clear`` implementation must also be provided. Identifiying reference cycles reference cycles ---------------------------------------------- -The algorithm that why CPython uses to detect those reference cycles is +The algorithm that CPython uses to detect those reference cycles is implemented in the ``gc`` module. The garbage collector **only focuses** on cleaning container objects (i.e. objects that can contain a reference to one or more objects). These can be arrays, dictionaries, lists, custom @@ -126,10 +126,10 @@ the interpreter create cycles everywhere. Some notable examples: * Exceptions contain traceback objects that contain a list of frames that contain the exception itself. - * Instances have references to their class and the class to the module, which + * Instances have references to their class which itself references its module, and the module contains references to everything that is inside (and maybe other modules) and this can lead back to the original instance. - * When representing data structures like graphs is very typical for them to + * When representing data structures like graphs, it is very typical for them to have internal links to themselves. 
To correctly dispose of these objects once they become unreachable, they need to be @@ -163,10 +163,10 @@ is completely unreachable When the GC starts, it has all the container objects it wants to scan on the first linked list. The objective is to move all the unreachable -objects. As generally most objects turn out to be reachable, is much more +objects. Since most objects turn out to be reachable, it is much more efficient to move the unreachable as this involves fewer pointer updates. -Every object that supports garbage collection will have a extra reference +Every object that supports garbage collection will have an extra reference count field initialized to the reference count (``gc_ref`` in the figures) of that object when the algorithm starts. This is because the algorithm needs to modify the reference count to do the computations and in this way the @@ -175,8 +175,8 @@ interpreter will not modify the real reference count field. .. figure:: images/python-cyclic-gc-1-new-page.png The GC then iterates over all containers in the first list and decrements by one the -``gc_ref`` field of any other object that container it is referencing. For doing -this it makes use of the ``tp_traverse`` slot in the container class (implemented +``gc_ref`` field of any other object that container is referencing. Doing +this makes use of the ``tp_traverse`` slot in the container class (implemented using the C API or inherited by a superclass) to know what objects are referenced by each container. After all the objects have been scanned, only the objects that have references from outside the “objects to scan” list will have ``gc_ref > 0``. @@ -197,7 +197,7 @@ processed ``link 1`` and ``link 2`` yet. .. figure:: images/python-cyclic-gc-3-new-page.png -Then the GC scans next the ``link 1`` object. Because its has ``gc_refs == 1`` +Then the GC scans the next ``link 1`` object. 
Because its has ``gc_refs == 1`` the gc does not do anything special because it knows it has to be reachable (and is already in what will become the reachable list): @@ -205,7 +205,7 @@ already in what will become the reachable list): When the GC encounters an object which is reachable (``gc_refs > 0``), it traverses its references using the ``tp_traverse`` slot to find all the objects that are -reachable from it, marking moving them to the end of the list of reachable objects (where +reachable from it, moving them to the end of the list of reachable objects (where they started originally) and setting its ``gc_refs`` field to 1. This is what happens to ``link 2`` and ``link 3`` below as they are reachable from ``link 1``. From the state in the previous image and after examining the objects referred to by ``link1`` @@ -227,7 +227,7 @@ list are really unreachable and can thus be garbage collected. Why moving unreachable objects is better ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -It sounds logical to move the unreachable objects under the premise that most object +It sounds logical to move the unreachable objects under the premise that most objects are usually reachable, until you think about it: the reason it pays isn't actually obvious. @@ -241,7 +241,7 @@ same for B when it's first encountered. Then C is traversed, B is moved *back* t the reachable list. B is eventually traversed, and then A is moved back to the reachable list. -So instead of not moving at all, the reachable objects B and A are moved twice each. +So instead of not moving at all, the reachable objects B and A are each moved twice. Why is this a win? A straightforward algorithm to move the reachable objects instead would move A, B, and C once each. The key is that this dance leaves the objects in order C, B, A - it's reversed from the original order. On all *subsequent* scans, @@ -291,18 +291,18 @@ many temporary objects are created and destroyed very fast. 
The older an object the less likely is to become unreachable. To take advantage of this fact, all container objects are segregated across -three spaces or "generations" (CPython currently uses 3 generations). Every new +three spaces/generations. Every new object starts in the first generation (generation 0). The previous algorithm is executed only over the objects of a particular generation and if an object survives a collection of its generation it will be moved to the next one -(generation 1), where it will it will be surveyed for collection less often. If +(generation 1), where it will be surveyed for collection less often. If the same object survives another GC round in this new generation (generation 1) it will be moved to the last generation (generation 2) where it will be surveyed the least often. Generations are collected when the number of objects that they contain reach some -predefined threshold which is unique of each generation and is lower the older the -generation is. These thresholds can be examined using the ``gc.get_threshold`` +predefined threshold which is unique of each generation and is lower than the older +generations are. These thresholds can be examined using the ``gc.get_threshold`` function: .. code-block:: python @@ -334,13 +334,13 @@ specifically in a generation by calling ``gc.collect(generation=NUM)``. >>> x = MyObj() >>> x.self = x - # Initially the object is in the younguest generation + # Initially the object is in the younguest generation. >>> gc.get_objects(generation=0) [..., <__main__.MyObj object at 0x7fbcc12a3400>, ...] # After a collection of the younguest generation the object - # moves to the next generation + # moves to the next generation. >>> gc.collect(generation=0) 0 @@ -354,8 +354,8 @@ specifically in a generation by calling ``gc.collect(generation=NUM)``. 
Collecting the oldest generation ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -In addition to the various configurable thresholds, the GC only trigger a full -collection of the oldest if the ratio ``long_lived_pending / long_lived_total`` +In addition to the various configurable thresholds, the GC only triggers a full +collection of the oldest generation if the ratio ``long_lived_pending / long_lived_total`` is above a given value (hardwired to 25%). The reason is that, while "non-full" collections (i.e., collections of the young and middle generations) will always examine roughly the same number of objects (determined by the aforementioned @@ -363,7 +363,7 @@ thresholds) the cost of a full collection is proportional to the total number of long-lived objects, which is virtually unbounded. Indeed, it has been remarked that doing a full collection every of object creations entails a dramatic performance degradation in workloads which consist -in creating and storing lots of long-lived objects (e.g. building a large list +of creating and storing lots of long-lived objects (e.g. building a large list of GC-tracked objects would show quadratic performance, instead of linear as expected). Using the above ratio, instead, yields amortized linear performance in the total number of objects (the effect of which can be summarized thusly: @@ -373,7 +373,7 @@ grows, but we do fewer and fewer of them"). Optimization: reusing fields to save memory ------------------------------------------- -In order to save memory, the two linked list pointers in every object with gc +In order to save memory, the two linked list pointers in every object with GC support are reused for several purposes. 
This is a common optimization known as "fat pointers" or "tagged pointers": pointers that carry additional data, "folded" into the pointer, meaning stored inline in the data representing the @@ -385,9 +385,9 @@ used for tags or to keep other information – most often as a bit field (each bit a separate tag) – as long as code that uses the pointer masks out these bits before accessing memory. E.g., on a 32-bit architecture (for both addresses and word size), a word is 32 bits = 4 bytes, so word-aligned -addresses are always a multiple of 4, hence end in 00, leaving the last 2 bits +addresses are always a multiple of 4, hence end in ``00``, leaving the last 2 bits available; while on a 64-bit architecture, a word is 64 bits word = 8 bytes, so -word-aligned addresses end in 000, leaving the last 3 bits available. +word-aligned addresses end in ``000``, leaving the last 3 bits available. The CPython GC makes use of two fat pointers: @@ -406,7 +406,7 @@ The CPython GC makes use of two fat pointers: Optimization: delay tracking containers --------------------------------------- -Certain types of container cannot participate in a reference cycle, and so do +Certain types of containers cannot participate in a reference cycle, and so do not need to be tracked by the garbage collector. Untracking these objects reduces the cost of garbage collections. 
However, determining which objects may be untracked is not free, and the costs must be weighed against the benefits From 811235f29657d2ae553e62d3447d65868076dc01 Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Mon, 20 Jan 2020 22:54:22 +0000 Subject: [PATCH 08/28] Fix more typos --- garbage_collector.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/garbage_collector.rst b/garbage_collector.rst index 8ad08c84a0..caeb57db67 100644 --- a/garbage_collector.rst +++ b/garbage_collector.rst @@ -31,8 +31,8 @@ to the object when called): >>> sys.getrefcount(x) 2 -The main problem present with the reference count schema is that reference count does not -handle reference cycles. For instance, consider this code: +The main problem present with the reference count schema is that reference counting +does not handle reference cycles. For instance, consider this code: .. code-block:: python @@ -90,7 +90,7 @@ As is explained later in the `Optimization: reusing fields to save memory`_ sect these two extra fields are normally used to keep doubly linked lists of all the objects tracked by the garbage collector (these lists are the GC generations, more on that in the `Optimization: reusing fields to save memory`_ section), but they are also -reused to fullfill other pourposes when the full double linked list structure is not +reused to fullfill other pourposes when the full doubly linked list structure is not needed as a memory optimization. Doubly linked lists are used because they efficiently support most frequently required operations. 
In From 8b72c715ef5af2a06192f2bee74e80a21ee70fa9 Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Mon, 20 Jan 2020 22:57:02 +0000 Subject: [PATCH 09/28] Update garbage_collector.rst Co-Authored-By: Tim Peters --- garbage_collector.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/garbage_collector.rst b/garbage_collector.rst index caeb57db67..a268f50ee3 100644 --- a/garbage_collector.rst +++ b/garbage_collector.rst @@ -101,6 +101,8 @@ are further partitioned into, e.g., sets of reachable and unreachable objects. support moving an object from one partition to another, adding a new object, removing an object entirely (objects tracked by GC are most often reclaimed by the refcounting system when GC isn't running at all!), and merging partitions, all with a small constant number of pointer updates. +With care, they also support iterating over a partition while objects are being added to - and +removed from - it, which is frequently required while GC is running. Specific APIs are offered to allocate, deallocate, initialize, track, and untrack objects with GC support. These APIs can be found in the `Garbage Collector C API From 3b84282d2ef976aa6c493abf1e5fb04390368110 Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Mon, 20 Jan 2020 23:19:32 +0000 Subject: [PATCH 10/28] Update garbage_collector.rst Co-Authored-By: Tim Peters --- garbage_collector.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/garbage_collector.rst b/garbage_collector.rst index a268f50ee3..74e6c281f7 100644 --- a/garbage_collector.rst +++ b/garbage_collector.rst @@ -235,7 +235,7 @@ obvious. Suppose we create objects A, B, C in that order. They appear in the young generation in the same order. 
If B points to A, and C to B, and C is reachable from outside, -then the adjusted refcounts after the first step of the algorith runs will be 0, 0, +then the adjusted refcounts after the first step of the algorithm runs will be 0, 0, and 1 respectively because the only reachable object from the outside is C. When the next step of the algorithm finds A, A is moved to the unreachable list. The From 0d3f3226c14d05a1a46782b74d096682a9b4ea26 Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Tue, 21 Jan 2020 00:08:53 +0000 Subject: [PATCH 11/28] Apply suggestions from code review Co-Authored-By: Terry Jan Reedy --- garbage_collector.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/garbage_collector.rst b/garbage_collector.rst index 74e6c281f7..fe5424a3c9 100644 --- a/garbage_collector.rst +++ b/garbage_collector.rst @@ -264,7 +264,7 @@ follows these steps in order: cause objects that will be in an inconsistent state to be resurrected or reached by some python functions invoked from the callbacks. To avoid this weak references that also are part of the unreachable set (the object and its weak reference - are in a cycles that are unreachable) then the weak reference needs to be clean + are in a cycles that are unreachable) then the weak reference needs to be cleaned immediately and the callback must not be executed so it does not trigger later when the ``tp_clear`` slot is called, causing havoc. This is fine because both the object and the weakref are going away, so it's legitimate to pretend the @@ -290,7 +290,7 @@ optimization: generations. The main idea behind this concept is the assumption t most objects have a very short lifespan and can thus be collected shortly after their creation. This has proven to be very close to the reality of many Python programs as many temporary objects are created and destroyed very fast. The older an object is -the less likely is to become unreachable. +the less likely it is to become unreachable. 
To take advantage of this fact, all container objects are segregated across three spaces/generations. Every new @@ -380,7 +380,7 @@ support are reused for several purposes. This is a common optimization known as "fat pointers" or "tagged pointers": pointers that carry additional data, "folded" into the pointer, meaning stored inline in the data representing the address, taking advantage of certain properties of memory addressing. This is -possible as most architectures are certain types of data will often be aligned +possible as most architectures align certain types of data to the size of the data, often a word or multiple thereof. This discrepancy leaves a few of the least significant bits of the pointer unused, which can be used for tags or to keep other information – most often as a bit field (each @@ -401,7 +401,7 @@ The CPython GC makes use of two fat pointers: ``_gc_prev`` is restored. * The ``_gc_next`` field is used as the "next" pointer to maintain the doubly - linked list but during its lowest bit is used to keep the + linked list but during collection its lowest bit is used to keep the ``NEXT_MASK_UNREACHABLE`` flag that indicates if an object is tentatively unreachable during the cycle detection algorithm. From d84a267128d448d7b47da8736f9af745b0eb2cc3 Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Tue, 21 Jan 2020 01:35:00 +0000 Subject: [PATCH 12/28] Apply suggestions from code review Co-Authored-By: Tim Peters --- garbage_collector.rst | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/garbage_collector.rst b/garbage_collector.rst index fe5424a3c9..c2a1132296 100644 --- a/garbage_collector.rst +++ b/garbage_collector.rst @@ -31,7 +31,7 @@ to the object when called): >>> sys.getrefcount(x) 2 -The main problem present with the reference count schema is that reference counting +The main problem with the reference count schema is that reference counting does not handle reference cycles. For instance, consider this code: .. 
code-block:: python @@ -115,7 +115,7 @@ that the objects cannot form reference cycles with only objects of its type or i type is immutable, a ``tp_clear`` implementation must also be provided. -Identifiying reference cycles reference cycles +Identifiying reference cycles ---------------------------------------------- The algorithm that CPython uses to detect those reference cycles is @@ -128,6 +128,8 @@ the interpreter create cycles everywhere. Some notable examples: * Exceptions contain traceback objects that contain a list of frames that contain the exception itself. + * Module-level functions reference the module's dict (which is needed to resolve globals), + which in turn contains an entry for the module-level function. * Instances have references to their class which itself references its module, and the module contains references to everything that is inside (and maybe other modules) and this can lead back to the original instance. @@ -181,7 +183,7 @@ The GC then iterates over all containers in the first list and decrements by one this makes use of the ``tp_traverse`` slot in the container class (implemented using the C API or inherited by a superclass) to know what objects are referenced by each container. After all the objects have been scanned, only the objects that have -references from outside the “objects to scan” list will have ``gc_ref > 0``. +references from outside the “objects to scan” list will have ``gc_refs > 0``. .. figure:: images/python-cyclic-gc-2-new-page.png @@ -226,6 +228,8 @@ process in really a breadth first search over the object graph. Once all the obj are scanned, the GC knows that all container objects in the tentatively unreachable list are really unreachable and can thus be garbage collected. +Pragmatically, it's important to note that no recursion is required by any of this, and neither does it in any other way require additional memory proportional to the number of objects, number of pointers, or the lengths of pointer chains. 
Apart from ``O(1)`` storage for internal C needs, the objects themselves contain all the storage the GC algorithms require. + Why moving unreachable objects is better ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -303,7 +307,7 @@ it will be moved to the last generation (generation 2) where it will be surveyed the least often. Generations are collected when the number of objects that they contain reach some -predefined threshold which is unique of each generation and is lower than the older +predefined threshold which is unique for each generation and is lower than the older generations are. These thresholds can be examined using the ``gc.get_threshold`` function: From 5ca46f78247c3125e489203f47616b0875eda0e1 Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Tue, 21 Jan 2020 01:38:41 +0000 Subject: [PATCH 13/28] Fix indentation and rework incomplete sentence --- garbage_collector.rst | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/garbage_collector.rst b/garbage_collector.rst index c2a1132296..a34298214d 100644 --- a/garbage_collector.rst +++ b/garbage_collector.rst @@ -129,7 +129,7 @@ the interpreter create cycles everywhere. Some notable examples: * Exceptions contain traceback objects that contain a list of frames that contain the exception itself. * Module-level functions reference the module's dict (which is needed to resolve globals), - which in turn contains an entry for the module-level function. + which in turn contains an entry for the module-level function. * Instances have references to their class which itself references its module, and the module contains references to everything that is inside (and maybe other modules) and this can lead back to the original instance. 
@@ -216,8 +216,9 @@ state in the previous image and after examining the objects referred to by ``lin the GC knows that ``link 3`` is reachable after all, so it is moved back to the original list and its ``gc_refs`` field is set to one so if the GC visits it again, it does know that is reachable. To avoid visiting a object twice, the GC marks all -objects that are not visited yet with and once an object is processed is unmarked so -the GC does not process it twice. +objects that are already visited once (by unsetting the ``PREV_MASK_COLLECTING`` flag) +so if an object that has already been processed is referred by some other object, the +GC does not process it twice. .. figure:: images/python-cyclic-gc-5-new-page.png From 404fcc5c3937618489a5983fa758b13139f1339a Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Tue, 21 Jan 2020 01:41:17 +0000 Subject: [PATCH 14/28] Fix more indentation --- garbage_collector.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/garbage_collector.rst b/garbage_collector.rst index a34298214d..9c3b883400 100644 --- a/garbage_collector.rst +++ b/garbage_collector.rst @@ -128,8 +128,8 @@ the interpreter create cycles everywhere. Some notable examples: * Exceptions contain traceback objects that contain a list of frames that contain the exception itself. - * Module-level functions reference the module's dict (which is needed to resolve globals), - which in turn contains an entry for the module-level function. + * Module-level functions reference the module's dict (which is needed to resolve globals), + which in turn contains an entry for the module-level function. * Instances have references to their class which itself references its module, and the module contains references to everything that is inside (and maybe other modules) and this can lead back to the original instance. 
From 8fc0547adbee399bf588494510ea11af6d43d0f8 Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Tue, 21 Jan 2020 01:48:57 +0000 Subject: [PATCH 15/28] Rework sentence about _gc_prev --- garbage_collector.rst | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/garbage_collector.rst b/garbage_collector.rst index 9c3b883400..ff21376e86 100644 --- a/garbage_collector.rst +++ b/garbage_collector.rst @@ -398,12 +398,13 @@ word-aligned addresses end in ``000``, leaving the last 3 bits available. The CPython GC makes use of two fat pointers: -* Between collections, the ``_gc_prev``` field is used as the "previous" - pointer to maintain the doubly linked list but the lowest two bits of are used - to keep some flags like `PREV_MASK_COLLECTING`. During collections ``_gc_prev`` - is temporary used for storing the temporary copy of the reference count - (``gc_refs``) , and the GC linked list becomes a singly linked list until - ``_gc_prev`` is restored. +* The ``_gc_prev``` field is normally used as the "previous" pointer to maintain the + doubly linked list but the lowest two bits of are used to keep some flags like + `PREV_MASK_COLLECTING` and `_PyGC_PREV_MASK_FINALIZED`. Between collections, the + only flag that can be present is `_PyGC_PREV_MASK_FINALIZED` that indicates if an + object has been already finalized. During collections ``_gc_prev`` is temporary + used for storing the temporary copy of the reference count (``gc_refs``), and the + GC linked list becomes a singly linked list until ``_gc_prev`` is restored. 
* The ``_gc_next`` field is used as the "next" pointer to maintain the doubly linked list but during collection its lowest bit is used to keep the From e956478573438f4ab826547dfb35b6a36cf5b836 Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Tue, 21 Jan 2020 01:53:49 +0000 Subject: [PATCH 16/28] Update garbage_collector.rst Co-Authored-By: Tim Peters --- garbage_collector.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/garbage_collector.rst b/garbage_collector.rst index ff21376e86..f11e88e441 100644 --- a/garbage_collector.rst +++ b/garbage_collector.rst @@ -409,7 +409,7 @@ The CPython GC makes use of two fat pointers: * The ``_gc_next`` field is used as the "next" pointer to maintain the doubly linked list but during collection its lowest bit is used to keep the ``NEXT_MASK_UNREACHABLE`` flag that indicates if an object is tentatively - unreachable during the cycle detection algorithm. + unreachable during the cycle detection algorithm. This is a drawback to using only doubly linked lists to implement partitions: while most needed operations are constant-time, there is no efficient way to determine which partition an object is currently in. Instead, when that's needed, ad hoc tricks (like the ``NEXT_MASK_UNREACHABLE`` flag) are employed. Optimization: delay tracking containers --------------------------------------- From 47190cd8c09e4a2749f95c67624da9288dfe1505 Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Tue, 21 Jan 2020 01:56:26 +0000 Subject: [PATCH 17/28] Fix indentation --- garbage_collector.rst | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/garbage_collector.rst b/garbage_collector.rst index f11e88e441..18d22dff1e 100644 --- a/garbage_collector.rst +++ b/garbage_collector.rst @@ -406,10 +406,14 @@ The CPython GC makes use of two fat pointers: used for storing the temporary copy of the reference count (``gc_refs``), and the GC linked list becomes a singly linked list until ``_gc_prev`` is restored. 
-* The ``_gc_next`` field is used as the "next" pointer to maintain the doubly - linked list but during collection its lowest bit is used to keep the +* The ``_gc_next`` field is used as the "next" pointer to maintain the doubly linked + list but during collection its lowest bit is used to keep the ``NEXT_MASK_UNREACHABLE`` flag that indicates if an object is tentatively - unreachable during the cycle detection algorithm. This is a drawback to using only doubly linked lists to implement partitions: while most needed operations are constant-time, there is no efficient way to determine which partition an object is currently in. Instead, when that's needed, ad hoc tricks (like the ``NEXT_MASK_UNREACHABLE`` flag) are employed. + unreachable during the cycle detection algorithm. This is a drawback to using only + doubly linked lists to implement partitions: while most needed operations are + constant-time, there is no efficient way to determine which partition an object is + currently in. Instead, when that's needed, ad hoc tricks (like the + ``NEXT_MASK_UNREACHABLE`` flag) are employed. Optimization: delay tracking containers --------------------------------------- From 722a99c7c9ea18482b64b76443b11f23137847bb Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Tue, 21 Jan 2020 01:58:26 +0000 Subject: [PATCH 18/28] Fix quotes --- garbage_collector.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/garbage_collector.rst b/garbage_collector.rst index 18d22dff1e..a9382f78e6 100644 --- a/garbage_collector.rst +++ b/garbage_collector.rst @@ -400,8 +400,8 @@ The CPython GC makes use of two fat pointers: * The ``_gc_prev``` field is normally used as the "previous" pointer to maintain the doubly linked list but the lowest two bits of are used to keep some flags like - `PREV_MASK_COLLECTING` and `_PyGC_PREV_MASK_FINALIZED`. 
Between collections, the - only flag that can be present is `_PyGC_PREV_MASK_FINALIZED` that indicates if an + ``PREV_MASK_COLLECTING`` and ``_PyGC_PREV_MASK_FINALIZED``. Between collections, the + only flag that can be present is ``_PyGC_PREV_MASK_FINALIZED`` that indicates if an object has been already finalized. During collections ``_gc_prev`` is temporary used for storing the temporary copy of the reference count (``gc_refs``), and the GC linked list becomes a singly linked list until ``_gc_prev`` is restored. From b40b030ba7e4fe744a98cfc8cbdb88267a1189cc Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Tue, 21 Jan 2020 01:59:17 +0000 Subject: [PATCH 19/28] Fix typo and indentation --- garbage_collector.rst | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/garbage_collector.rst b/garbage_collector.rst index a9382f78e6..9c825e5070 100644 --- a/garbage_collector.rst +++ b/garbage_collector.rst @@ -399,12 +399,12 @@ word-aligned addresses end in ``000``, leaving the last 3 bits available. The CPython GC makes use of two fat pointers: * The ``_gc_prev``` field is normally used as the "previous" pointer to maintain the - doubly linked list but the lowest two bits of are used to keep some flags like - ``PREV_MASK_COLLECTING`` and ``_PyGC_PREV_MASK_FINALIZED``. Between collections, the - only flag that can be present is ``_PyGC_PREV_MASK_FINALIZED`` that indicates if an - object has been already finalized. During collections ``_gc_prev`` is temporary - used for storing the temporary copy of the reference count (``gc_refs``), and the - GC linked list becomes a singly linked list until ``_gc_prev`` is restored. + doubly linked list but its lowest two bits are used to keep some flags like + ``PREV_MASK_COLLECTING`` and ``_PyGC_PREV_MASK_FINALIZED``. Between collections, + the only flag that can be present is ``_PyGC_PREV_MASK_FINALIZED`` that indicates + if an object has been already finalized. 
During collections ``_gc_prev`` is + temporary used for storing the temporary copy of the reference count (``gc_refs``), + and the GC linked list becomes a singly linked list until ``_gc_prev`` is restored. * The ``_gc_next`` field is used as the "next" pointer to maintain the doubly linked list but during collection its lowest bit is used to keep the From 111b2bbd2e7c0d25069c58e909dde6fd0c9b9864 Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Tue, 21 Jan 2020 02:01:22 +0000 Subject: [PATCH 20/28] Update garbage_collector.rst --- garbage_collector.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/garbage_collector.rst b/garbage_collector.rst index 9c825e5070..91ed132ba7 100644 --- a/garbage_collector.rst +++ b/garbage_collector.rst @@ -294,7 +294,7 @@ In order to limit the time each garbage collection takes, the GC is uses a popul optimization: generations. The main idea behind this concept is the assumption that most objects have a very short lifespan and can thus be collected shortly after their creation. This has proven to be very close to the reality of many Python programs as -many temporary objects are created and destroyed very fast. The older an object is +many temporarily objects are created and destroyed very fast. The older an object is the less likely it is to become unreachable. To take advantage of this fact, all container objects are segregated across @@ -399,11 +399,11 @@ word-aligned addresses end in ``000``, leaving the last 3 bits available. The CPython GC makes use of two fat pointers: * The ``_gc_prev``` field is normally used as the "previous" pointer to maintain the - doubly linked list but its lowest two bits are used to keep some flags like + doubly linked list but its lowest two bits are used to keep the flags ``PREV_MASK_COLLECTING`` and ``_PyGC_PREV_MASK_FINALIZED``. Between collections, the only flag that can be present is ``_PyGC_PREV_MASK_FINALIZED`` that indicates if an object has been already finalized. 
During collections ``_gc_prev`` is - temporary used for storing the temporary copy of the reference count (``gc_refs``), + temporarily used for storing the temporarily copy of the reference count (``gc_refs``), and the GC linked list becomes a singly linked list until ``_gc_prev`` is restored. * The ``_gc_next`` field is used as the "next" pointer to maintain the doubly linked From 2ec50024e9e632dab20b5f9209fa07e02bc72254 Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Tue, 21 Jan 2020 02:03:40 +0000 Subject: [PATCH 21/28] Update garbage_collector.rst --- garbage_collector.rst | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/garbage_collector.rst b/garbage_collector.rst index 91ed132ba7..bc4b1ca5f4 100644 --- a/garbage_collector.rst +++ b/garbage_collector.rst @@ -403,8 +403,9 @@ The CPython GC makes use of two fat pointers: ``PREV_MASK_COLLECTING`` and ``_PyGC_PREV_MASK_FINALIZED``. Between collections, the only flag that can be present is ``_PyGC_PREV_MASK_FINALIZED`` that indicates if an object has been already finalized. During collections ``_gc_prev`` is - temporarily used for storing the temporarily copy of the reference count (``gc_refs``), - and the GC linked list becomes a singly linked list until ``_gc_prev`` is restored. + temporarily used for storing a copy of the reference count (``gc_refs``), in + addition to two flags, and the GC linked list becomes a singly linked list until + ``_gc_prev`` is restored. 
* The ``_gc_next`` field is used as the "next" pointer to maintain the doubly linked list but during collection its lowest bit is used to keep the From 7c8e02ae3c1f30ff7c57b305cc55b0767b0161bf Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Tue, 21 Jan 2020 02:12:12 +0000 Subject: [PATCH 22/28] Add author section --- garbage_collector.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/garbage_collector.rst b/garbage_collector.rst index bc4b1ca5f4..15c295fc71 100644 --- a/garbage_collector.rst +++ b/garbage_collector.rst @@ -3,6 +3,8 @@ Design of CPython's Garbage Collector ===================================== +:Author: Pablo Galindo Salgado + .. highlight:: none Abstract From e51d354ad0563d82073df47cb81b0848a954e584 Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Tue, 21 Jan 2020 02:32:45 +0000 Subject: [PATCH 23/28] Add a warning section regarding tagged pointers --- garbage_collector.rst | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/garbage_collector.rst b/garbage_collector.rst index 15c295fc71..a4a302aba1 100644 --- a/garbage_collector.rst +++ b/garbage_collector.rst @@ -400,6 +400,15 @@ word-aligned addresses end in ``000``, leaving the last 3 bits available. The CPython GC makes use of two fat pointers: + .. warning:: + + Because the presence of extra information, "tagged" or "fat" pointers cannot be + dereferenced directly and the extra information must be stripped off before to + obtain the real memory address. Special care needs to be taken with functions that + directly manipulate the linked lists, as these functions normally asume the + pointers in them are in a consistent state. + + * The ``_gc_prev``` field is normally used as the "previous" pointer to maintain the doubly linked list but its lowest two bits are used to keep the flags ``PREV_MASK_COLLECTING`` and ``_PyGC_PREV_MASK_FINALIZED``. 
Between collections, From 411037f23fc6fa73d13d87569a1e02851f88459f Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Tue, 21 Jan 2020 02:41:00 +0000 Subject: [PATCH 24/28] Add reference to the memory layout --- garbage_collector.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/garbage_collector.rst b/garbage_collector.rst index a4a302aba1..8b45138e99 100644 --- a/garbage_collector.rst +++ b/garbage_collector.rst @@ -398,7 +398,8 @@ addresses are always a multiple of 4, hence end in ``00``, leaving the last 2 bi available; while on a 64-bit architecture, a word is 64 bits word = 8 bytes, so word-aligned addresses end in ``000``, leaving the last 3 bits available. -The CPython GC makes use of two fat pointers: +The CPython GC makes use of two fat pointers that corresponds to the extra fields +of ``PyGC_Head`` discussed in the `Memory layout and object structure`_ section: .. warning:: From 40816075013570f74c8d14321fb7bfb6a538c70c Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Tue, 21 Jan 2020 02:43:53 +0000 Subject: [PATCH 25/28] Fix link to the generation section --- garbage_collector.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/garbage_collector.rst b/garbage_collector.rst index 8b45138e99..b0f031768d 100644 --- a/garbage_collector.rst +++ b/garbage_collector.rst @@ -91,7 +91,7 @@ simple type cast from the original object: :code:`((PyGC_Head *)(the_object)-1)` As is explained later in the `Optimization: reusing fields to save memory`_ section, these two extra fields are normally used to keep doubly linked lists of all the objects tracked by the garbage collector (these lists are the GC generations, more on -that in the `Optimization: reusing fields to save memory`_ section), but they are also +that in the `Optimization: generations`_ section), but they are also reused to fullfill other pourposes when the full doubly linked list structure is not needed as a memory optimization. 
From 03178a03c647b4618e7f27f4f0c3bc1830ffafea Mon Sep 17 00:00:00 2001
From: Pablo Galindo
Date: Tue, 21 Jan 2020 09:50:54 +0000
Subject: [PATCH 26/28] Apply suggestions from code review

Co-Authored-By: Tim Peters
---
 garbage_collector.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/garbage_collector.rst b/garbage_collector.rst
index b0f031768d..ddc6ef204f 100644
--- a/garbage_collector.rst
+++ b/garbage_collector.rst
@@ -98,7 +98,7 @@ needed as a memory optimization.
 Doubly linked lists are used because they efficiently support most frequently required operations. In
 general, the collection of all objects tracked by GC are partitioned into disjoint sets, each in its own
 doubly linked list. Between collections, objects are partitioned into "generations", reflecting how
-often they're survived collection attempts. During collections, the generations(s) being collected
+often they've survived collection attempts. During collections, the generations(s) being collected
 are further partitioned into, e.g., sets of reachable and unreachable objects. Doubly linked lists
 support moving an object from one partition to another, adding a new object, removing an object
 entirely (objects tracked by GC are most often reclaimed by the refcounting system when GC
@@ -131,7 +131,7 @@ the interpreter create cycles everywhere. Some notable examples:
     * Exceptions contain traceback objects that contain a list of frames that
       contain the exception itself.
     * Module-level functions reference the module's dict (which is needed to resolve globals),
-      which in turn contains an entry for the module-level function.
+      which in turn contains entries for the module-level functions.
     * Instances have references to their class which itself references its module, and the module
       contains references to everything that is inside (and maybe other modules)
       and this can lead back to the original instance.
From 6d01e46ee5d45dd6adcc46df30a607b5044c087d Mon Sep 17 00:00:00 2001
From: Pablo Galindo
Date: Tue, 21 Jan 2020 12:14:31 +0000
Subject: [PATCH 27/28] Fix more typos

---
 garbage_collector.rst | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/garbage_collector.rst b/garbage_collector.rst
index ddc6ef204f..bad5fd8285 100644
--- a/garbage_collector.rst
+++ b/garbage_collector.rst
@@ -113,8 +113,8 @@ documentation `_.
 Apart from this object structure, the type object for objects supporting garbage
 collection must include the ``Py_TPFLAGS_HAVE_GC`` in its ``tp_flags`` slot and
 provide an implementation of the ``tp_traverse`` handler. Unless it can be proven
-that the objects cannot form reference cycles with only objects of its type or if the
-type is immutable, a ``tp_clear`` implementation must also be provided.
+that the objects cannot form reference cycles with only objects of its type or unless
+the type is immutable, a ``tp_clear`` implementation must also be provided.
 
 
 Identifiying reference cycles
@@ -227,7 +227,7 @@ GC does not process it twice.
 Notice that once a object that was marked as "tentatively unreachable" and later is
 moved back to the reachable list, it will be visited again by the garbage collector
 as now all the references that that objects has need to be processed as well. This
-process in really a breadth first search over the object graph. Once all the objects
+process is really a breadth first search over the object graph. Once all the objects
 are scanned, the GC knows that all container objects in the tentatively unreachable
 list are really unreachable and can thus be garbage collected.
 
@@ -404,10 +404,10 @@ of ``PyGC_Head`` discussed in the `Memory layout and object structure`_ section:
   .. warning::
 
     Because the presence of extra information, "tagged" or "fat" pointers cannot be
-    dereferenced directly and the extra information must be stripped off before to
-    obtain the real memory address.
Special care needs to be taken with functions that
-    directly manipulate the linked lists, as these functions normally asume the
-    pointers in them are in a consistent state.
+    dereferenced directly and the extra information must be stripped off before
+    obtaining the real memory address. Special care needs to be taken with
+    functions that directly manipulate the linked lists, as these functions
+    normally asume the pointers inside the lists are in a consistent state.
 
 
 * The ``_gc_prev``` field is normally used as the "previous" pointer to maintain the

From 4321593baa7da0eadc06be5b13cccf206cc524ca Mon Sep 17 00:00:00 2001
From: Pablo Galindo
Date: Tue, 21 Jan 2020 12:16:59 +0000
Subject: [PATCH 28/28] Address Petr feedback and fix more typos

---
 garbage_collector.rst | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/garbage_collector.rst b/garbage_collector.rst
index bad5fd8285..98ceff1784 100644
--- a/garbage_collector.rst
+++ b/garbage_collector.rst
@@ -49,8 +49,8 @@ our reference to it (the variable "container") the reference count never falls t
 because it still has its own internal reference and therefore it will never be
 cleaned just by simple reference counting. For this reason some additional machinery
 is needed to clean these reference cycles between objects once they become
-unreachable. We normally refer to this additional machinery as the Garbage Collector,
-but technically reference counting is also a form of garbage collection.
+unreachable. This is the cyclic garbage collector, usually called just Garbage
+Collector (GC), even though reference counting is also a form of garbage collection.
 
 Memory layout and object structure
 ----------------------------------
@@ -231,7 +231,11 @@ process is really a breadth first search over the object graph. Once all the obj
 are scanned, the GC knows that all container objects in the tentatively unreachable
 list are really unreachable and can thus be garbage collected.
 
-Pragmatically, it's important to note that no recursion is required by any of this, and neither does it in any other way require additional memory proportional to the number of objects, number of pointers, or the lengths of pointer chains. Apart from ``O(1)`` storage for internal C needs, the objects themselves contain all the storage the GC algorithms require.
+Pragmatically, it's important to note that no recursion is required by any of this,
+and neither does it in any other way require additional memory proportional to the
+number of objects, number of pointers, or the lengths of pointer chains. Apart from
+``O(1)`` storage for internal C needs, the objects themselves contain all the storage
+the GC algorithms require.
 
 Why moving unreachable objects is better
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -269,7 +273,7 @@ follows these steps in order:
    set is going to be destroyed and has weak references with callbacks, these
    callbacks need to be honored. This process is **very** delicate as any error can
    cause objects that will be in an inconsistent state to be resurrected or reached
-   by some python functions invoked from the callbacks. To avoid this weak references
+   by some python functions invoked from the callbacks. To avoid these weak references
    that also are part of the unreachable set (the object and its weak reference
    are in a cycles that are unreachable) then the weak reference needs to be cleaned
    immediately and the callback must not be executed so it does not trigger later