Minor editorial pass over garbage_collector.rst (#563) · python/devguide@9ac18cf
1 parent 5d00f7e commit 9ac18cf

1 file changed: garbage_collector.rst (26 additions & 25 deletions)
@@ -15,7 +15,7 @@ that CPython counts how many different places there are that have a reference to
 object. Such a place could be another object, or a global (or static) C variable, or
 a local variable in some C function. When an object’s reference count becomes zero,
 the object is deallocated. If it contains references to other objects, their
-reference count is decremented. Those other objects may be deallocated in turn, if
+reference counts are decremented. Those other objects may be deallocated in turn, if
 this decrement makes their reference count become zero, and so on. The reference
 count field can be examined using the ``sys.getrefcount`` function (notice that the
 value returned by this function is always 1 more as the function also has a reference
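
The reference-counting behaviour described in the hunk above can be observed with a short, hedged sketch. The ``Node`` class and the variable names are hypothetical and exist only for this illustration; the sketch compares counts relative to a baseline rather than asserting absolute values, since the absolute number depends on the interpreter and on how the code is run.

```python
import sys


class Node:
    """Hypothetical object used only to have something to count references to."""


obj = Node()

# Whatever the absolute value is, it already includes the temporary
# reference held by the call to sys.getrefcount itself.
baseline = sys.getrefcount(obj)

# Binding a second name adds exactly one reference.
alias = obj
with_alias = sys.getrefcount(obj)

# Removing that name takes the count back to the baseline.
del alias
after_del = sys.getrefcount(obj)

print(baseline, with_alias, after_del)
```

Only the differences between the three values are meaningful here; the baseline itself should not be relied upon.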
@@ -46,7 +46,7 @@ does not handle reference cycles. For instance, consider this code:

 In this example, ``container`` holds a reference to itself, so even when we remove
 our reference to it (the variable "container") the reference count never falls to 0
-because it still has its own internal reference and therefore it will never be
+because it still has its own internal reference. Therefore it would never be
 cleaned just by simple reference counting. For this reason some additional machinery
 is needed to clean these reference cycles between objects once they become
 unreachable. This is the cyclic garbage collector, usually called just Garbage
@@ -92,7 +92,7 @@ As is explained later in the `Optimization: reusing fields to save memory`_ sect
 these two extra fields are normally used to keep doubly linked lists of all the
 objects tracked by the garbage collector (these lists are the GC generations, more on
 that in the `Optimization: generations`_ section), but they are also
-reused to fullfill other pourposes when the full doubly linked list structure is not
+reused to fullfill other purposes when the full doubly linked list structure is not
 needed as a memory optimization.

 Doubly linked lists are used because they efficiently support most frequently required operations. In
@@ -144,8 +144,8 @@ lists are maintained: one list contains all objects to be scanned, and the other
 contain all objects "tentatively" unreachable.

 To understand how the algorithm works, Let’s take the case of a circular linked list
-which has one link referenced by a variable A, and one self-referencing object which
-is completely unreachable
+which has one link referenced by a variable ``A``, and one self-referencing object which
+is completely unreachable:

 .. code-block:: python

@@ -156,14 +156,15 @@ is completely unreachable
 ... self.next_link = next_link

 >>> link_3 = Link()
->>> link_2 = Link(link3)
->>> link_1 = Link(link2)
+>>> link_2 = Link(link_3)
+>>> link_1 = Link(link_2)
 >>> link_3.next_link = link_1
+>>> A = link_1
+>>> del link_1, link_2, link_3

 >>> link_4 = Link()
 >>> link_4.next_link = link_4

->>> del link_4
 >>> gc.collect()
 2

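
The scenario in the hunk above can be tried as a standalone script. This is a minimal sketch under stated assumptions: it keeps a ``del link_4`` statement (present on the removed side of the hunk) so that the self-referencing object really is unreachable when ``gc.collect()`` runs, and it adds a ``weakref`` probe, which is not part of the original example, to observe the object's fate. The number returned by ``gc.collect()`` can differ between CPython versions, so the sketch only records it.

```python
import gc
import weakref


class Link:
    def __init__(self, next_link=None):
        self.next_link = next_link


gc.disable()   # keep automatic collections from interfering with the demo
gc.collect()   # start from a clean state

# A ring of three links that stays reachable through the variable A.
link_3 = Link()
link_2 = Link(link_3)
link_1 = Link(link_2)
link_3.next_link = link_1
A = link_1
del link_1, link_2, link_3

# One self-referencing link with no outside references left.
link_4 = Link()
link_4.next_link = link_4
probe = weakref.ref(link_4)   # illustration-only observer, not in the original
del link_4

# Reference counting alone cannot free it: the cycle keeps the count above zero.
alive_before = probe() is not None

collected = gc.collect()      # the cyclic collector finds the unreachable cycle
alive_after = probe() is not None

# The ring reachable from A must be left untouched.
ring_intact = A.next_link.next_link.next_link is A

gc.enable()
print(alive_before, collected, alive_after, ring_intact)
```

The probe shows the object surviving the ``del`` and disappearing only after the explicit collection, while the ring referenced by ``A`` is unaffected.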
@@ -194,16 +195,16 @@ This is because another object that is reachable from the outside (``gc_refs > 0
 can still have references to it. For instance, the ``link_2`` object in our example
 ended having ``gc_refs == 0`` but is referenced still by the ``link_1`` object that
 is reachable from the outside. To obtain the set of objects that are really
-unreachable, the garbage collector scans again the container objects using the
-``tp_traverse`` slot with a different traverse function that marks objects with
+unreachable, the garbage collector re-scans the container objects using the
+``tp_traverse`` slot; this time with a different traverse function that marks objects with
 ``gc_refs == 0`` as "tentatively unreachable" and then moves them to the
 tentatively unreachable list. The following image depicts the state of the lists in a
-moment when the GC processed the ``link 3`` and ``link 4`` objects but has not
-processed ``link 1`` and ``link 2`` yet.
+moment when the GC processed the ``link_3`` and ``link_4`` objects but has not
+processed ``link_1`` and ``link_2`` yet.

 .. figure:: images/python-cyclic-gc-3-new-page.png

-Then the GC scans the next ``link 1`` object. Because its has ``gc_refs == 1``
+Then the GC scans the next ``link_1`` object. Because its has ``gc_refs == 1``
 the gc does not do anything special because it knows it has to be reachable (and is
 already in what will become the reachable list):

@@ -213,9 +214,9 @@ When the GC encounters an object which is reachable (``gc_refs > 0``), it traver
 its references using the ``tp_traverse`` slot to find all the objects that are
 reachable from it, moving them to the end of the list of reachable objects (where
 they started originally) and setting its ``gc_refs`` field to 1. This is what happens
-to ``link 2`` and ``link 3`` below as they are reachable from ``link 1``. From the
-state in the previous image and after examining the objects referred to by ``link1``
-the GC knows that ``link 3`` is reachable after all, so it is moved back to the
+to ``link_2`` and ``link_3`` below as they are reachable from ``link_1``. From the
+state in the previous image and after examining the objects referred to by ``link_1``
+the GC knows that ``link_3`` is reachable after all, so it is moved back to the
 original list and its ``gc_refs`` field is set to one so if the GC visits it again, it
 does know that is reachable. To avoid visiting a object twice, the GC marks all
 objects that are already visited once (by unsetting the ``PREV_MASK_COLLECTING`` flag)
@@ -273,13 +274,13 @@ follows these steps in order:
 set is going to be destroyed and has weak references with callbacks, these
 callbacks need to be honored. This process is **very** delicate as any error can
 cause objects that will be in an inconsistent state to be resurrected or reached
-by some python functions invoked from the callbacks. To avoid these weak references
+by some python functions invoked from the callbacks. In addition, weak references
 that also are part of the unreachable set (the object and its weak reference
-are in a cycles that are unreachable) then the weak reference needs to be cleaned
-immediately and the callback must not be executed so it does not trigger later
-when the ``tp_clear`` slot is called, causing havoc. This is fine because both
-the object and the weakref are going away, so it's legitimate to pretend the
-weak reference is going away first so the callback is never executed.
+are in a cycles that are unreachable) need to be cleaned
+immediately, without executing the callback. Otherwise it will be triggered later,
+when the ``tp_clear`` slot is called, causing havoc. Ignoring the weak reference's
+callback is fine because both the object and the weakref are going away, so it's
+legitimate to say the weak reference is going away first.

 2. If an object has legacy finalizers (``tp_del`` slot) move them to the
 ``gc.garbage`` list.
@@ -296,7 +297,7 @@ follows these steps in order:
 Optimization: generations
 -------------------------

-In order to limit the time each garbage collection takes, the GC is uses a popular
+In order to limit the time each garbage collection takes, the GC uses a popular
 optimization: generations. The main idea behind this concept is the assumption that
 most objects have a very short lifespan and can thus be collected shortly after their
 creation. This has proven to be very close to the reality of many Python programs as
@@ -314,7 +315,7 @@ it will be moved to the last generation (generation 2) where it will be
 surveyed the least often.

 Generations are collected when the number of objects that they contain reach some
-predefined threshold which is unique for each generation and is lower than the older
+predefined threshold, which is unique for each generation and is lower the older
 generations are. These thresholds can be examined using the ``gc.get_threshold``
 function:

@@ -411,7 +412,7 @@ of ``PyGC_Head`` discussed in the `Memory layout and object structure`_ section:
 dereferenced directly and the extra information must be stripped off before
 obtaining the real memory address. Special care needs to be taken with
 functions that directly manipulate the linked lists, as these functions
-normally asume the pointers inside the lists are in a consistent state.
+normally assume the pointers inside the lists are in a consistent state.


 * The ``_gc_prev``` field is normally used as the "previous" pointer to maintain the
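
The idea in the hunk above, that extra information stored inside a pointer must be stripped off before the real address is used, can be sketched with plain integers. This is an illustration only: the constant names, the flag values, and the helper functions are all hypothetical and are not CPython's actual macros. It assumes addresses are aligned to at least 4 bytes, so the two lowest bits are always zero and are free to carry flags.

```python
# Hypothetical flag layout, chosen for this illustration only.
FLAG_FINALIZED = 0b01
FLAG_COLLECTING = 0b10
FLAG_MASK = 0b11


def pack(address, flags):
    """Store flags in the two low bits of an aligned address."""
    if address & FLAG_MASK:
        raise ValueError("address must be aligned to at least 4 bytes")
    if flags & ~FLAG_MASK:
        raise ValueError("flags do not fit in the low bits")
    return address | flags


def address_of(word):
    """Strip the flag bits to recover the real address."""
    return word & ~FLAG_MASK


def flags_of(word):
    """Extract only the flag bits."""
    return word & FLAG_MASK


word = pack(0x1000, FLAG_COLLECTING)

# Using the tagged word directly as an address would be wrong.
tagged_differs = word != 0x1000
recovered = address_of(word)
flags = flags_of(word)

print(hex(word), hex(recovered), bin(flags))
```

Code that walks such a list without masking first would follow a wrong address, which is why functions manipulating the lists directly need the pointers to be in a consistent state.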
