10000 Fix several typos · python/devguide@8aac485 · GitHub
[go: up one dir, main page]

Skip to content

Commit 8aac485

Browse files
committed
Fix several typos
1 parent 10ee4fd commit 8aac485

File tree

1 file changed

+98
-35
lines changed

1 file changed

+98
-35
lines changed

garbage_collector.rst

Lines changed: 98 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -36,10 +36,11 @@ handle reference cycles. For instance, consider this code
3636

3737
.. code-block:: python
3838
39-
>>> container = []
40-
>>> container.append(container)
41-
42-
>>> del container
39+
>>> container = []
40+
>>> container.append(container)
41+
>>> sys.getrefcount(container)
42+
3
43+
>>> del container
4344
4445
In this example, ``container`` holds a reference to itself, so even when we remove
4546
our reference to it (the variable "container") the reference count never falls to 0
10000
@@ -65,7 +66,7 @@ Normally the C structure supporting a regular Python object looks as follows:
6566
6667
6768
In order to support the garbage collector, the memory layout of objects is altered
68-
to acomodate extra information **before** the normal layout:
69+
to accommodate extra information **before** the normal layout:
6970

7071
.. code-block:: none
7172
@@ -111,7 +112,7 @@ implemented in the ``gc`` module. The garbage collector **only focuses**
111112
on cleaning container objects (i.e. objects that can contain a reference
112113
to one or more objects). These can be arrays, dictionaries, lists, custom
113114
class instances, classes in extension modules, etc. One could think that
114-
cycles are uncommon but the thuth is that many internal references needed by
115+
cycles are uncommon but the truth is that many internal references needed by
115116
the interpreter create cycles everywhere. Some notable examples:
116117

117118
* Exceptions contain traceback objects that contain a list of frames that
@@ -127,36 +128,38 @@ be identified first. This is done in the `deduce_unreachable() <https://github.c
127128
function. Inside this component, two double-linked lists are maintained: one list contains
128129
all objects to be scanned, and the other will contain all objects "tentatively" unreachable.
129130

130-
To understand how the algorith works, Let’s take the case of a circular linked list which has
131+
To understand how the algorithm works, Let’s take the case of a circular linked list which has
131132
one link referenced by a variable A, and one self-referencing object which is completely
132133
unreachable
133134

134135
.. code-block:: python
135136
137+
>>> import gc
136138
137-
class Link:
138-
def __init__(self, next_link=None):
139-
self.next_link = next_link
139+
>>> class Link:
140+
... def __init__(self, next_link=None):
141+
... self.next_link = next_link
140142
141-
link_3 = Link()
142-
link_2 = Link(link3)
143-
link_1 = Link(link2)
144-
link_3.next_link = link_1
143+
>>> link_3 = Link()
144+
>>> link_2 = Link(link3)
145+
>>> link_1 = Link(link2)
146+
>>> link_3.next_link = link_1
145147
146-
link_4 = Link()
147-
link_4.next_link = link_4
148+
>>> link_4 = Link()
149+
>>> link_4.next_link = link_4
148150
149-
import gc
150-
gc.collect()
151+
>>> del link_4
152+
>>> gc.collect()
153+
2
151154
152155
When the GC starts, it has all the container objects it wants to scan
153-
on a the first linked list. The objective is to move all the unreachable
156+
on the first linked list. The objective is to move all the unreachable
154157
objects. As generally most objects turn out to be reachable, is much more
155158
efficient to move the unreachable as this involves fewer pointer updates.
156159

157160
Every object that supports garbage collection will have a extra reference
158161
count field initialized to the reference count (``gc_ref`` in the figures)
159-
of that object when the algorithm starts. This is because the algorith needs
162+
of that object when the algorithm starts. This is because the algorithm needs
160163
to modify the reference count to do the computations and in this way the
161164
interpreter will not modify the real reference count field.
162165

@@ -165,7 +168,7 @@ interpreter will not modify the real reference count field.
165168
The GC then iterates over all containers in the first list and decrements by one the
166169
``gc_ref`` field of any other object that container it is referencing. For doing
167170
this it makes use of the ``tp_traverse`` slot in the container class (implemented
168-
using the C API or inherited by a superclass) to know what objects are refered by
171+
using the C API or inherited by a superclass) to know what objects are referenced by
169172
each container. After all the objects have been scanned, only the objects that have
170173
references from outside the “objects to scan” list will have ``gc_ref > 0``.
171174

@@ -176,11 +179,11 @@ This is because another object that is reachable from the outside (``gc_refs > 0
176179
can still have references to it. For instance, the ``link_2`` object in our example
177180
ended having ``gc_refs == 0`` but is referenced still by the ``link_1`` object that
178181
is reachable from the outside. To obtain the set of objects that are really
179-
unreachanle, the garbage collector scans again the container objects using the
180-
``tp_traverse`` slot with a diferent traverse function that marks objects with
182+
unreachable, the garbage collector scans again the container objects using the
183+
``tp_traverse`` slot with a different traverse function that marks objects with
181184
``gc_refs == 0`` as "tentatively unreachable" and then moves them to the
182185
tentatively unreachable list. The following image depicts the state of the lists in a
183-
moment when the GC processed the ``link 3`` and ``link 4`` objects but hasn’t
186+
moment when the GC processed the ``link 3`` and ``link 4`` objects but has not
184187
processed ``link 1`` and ``link 2`` yet.
185188

186189
.. figure:: images/python-cyclic-gc-3-new-page.png
@@ -193,12 +196,12 @@ already in what will become the reachable list):
193196

194197
When the GC encounters an object which is reachable (``gc_refs > 0``), it traverses
195198
its references using the ``tp_traverse`` slot to find all the objects that are
196-
reachable from it, marking moving them to the end oflist of reachable objects (where
199+
reachable from it, marking moving them to the end of the list of reachable objects (where
197200
they started originally) and setting its ``gc_refs`` field to 1. This is what happens
198201
to ``link 2`` and ``link 3`` below as they are reachable from ``link 1``. From the
199202
state in the previous image and after examining the objects referred to by ``link1``
200203
the GC knows that ``link 3`` is reachable after all, so it is moved back to the
201-
original list and its ``gc_refs`` field is set to one so if the GC vitis it again, it
204+
original list and its ``gc_refs`` field is set to one so if the GC visits it again, it
202205
does not that is reachable. To avoid visiting a object twice, the GC marks all
203206
objects that are not visited yet with and once an object is processed is unmarked so
204207
the GC does not process it twice.
@@ -212,24 +215,49 @@ process in really a breadth first search over the object graph. Once all the obj
212215
are scanned, the GC knows that all container objects in the tentatively unreachable
213216
list are really unreachable and can thus be garbage collected.
214217

218+
Why moving unreachable objects is better
219+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
220+
221+
It sounds logical to move the unreachable objects under the premise that most object
222+
are usually reachable, until you think about it: the reason it pays isn't actually
223+
obvious.
224+
225+
Suppose we create objects A, B, C in that order. They appear in the young generation
226+
in the same order. If B points to A, and C to B, and C is reachable from outside,
227+
then the adjusted refcounts after the first step of the algorith runs will be 0, 0,
228+
and 1 respectively because the only reachable object from the outside is C.
229+
230+
When the next step of the algorithm finds A, A is moved to the unreachable list. The
231+
same for B when it's first encountered. Then C is traversed, B is moved *back* to
232+
the reachable list. B is eventually traversed, and then A is moved back to the reachable
233+
list.
234+
235+
So instead of not moving at all, the reachable objects B and A are moved twice each.
236+
Why is this a win? A straightforward algorithm to move the reachable objects instead
237+
would move A, B, and C once each. The key is that this dance leaves the objects in
238+
order C, B, A - it's reversed from the original order. On all *subsequent* scans,
239+
none of them will move. Since most objects aren't in cycles, this can save an
240+
unbounded number of moves across an unbounded number of later collections. The only
241+
time the cost can be higher is the first time the chain is scanned.
242+
215243
Destroying unreachable objects
216244
------------------------------
217245

218246
Once the GC knows the list of unreachable objects, a very delicate process starts
219-
with the objective of completely destroying these objects. Roughtly, the process
247+
with the objective of completely destroying these objects. Roughly, the process
220248
follows these steps in order:
221249

222250
1. Handle and clean weak references (if any). If an object that is in the unreachable
223251
set is going to be destroyed and has weak references with callbacks, these
224-
callbacks need to be honored. This proces is **very** delicate as any error can
252+
callbacks need to be honored. This process is **very** delicate as any error can
225253
cause objects that will be in an inconsistent state to be resurrected or reached
226254
by some python functions invoked from the callbacks. To avoid this weak references
227255
that also are part of the unreachable set (the object and its weak reference
228256
are in a cycles that are unreachable) then the weak reference needs to be clean
229-
inmediately and the callback must not be executed so it does not trigger later
230-
when the ``tp_clear`` slot is called, causing havok. This is fine because both
257+
immediately and the callback must not be executed so it does not trigger later
258+
when the ``tp_clear`` slot is called, causing havoc. This is fine because both
231259
the object and the weakref are going away, so it's legitimate to pretend the
232-
weakref is going away first so the callback is never executed.
260+
weak reference is going away first so the callback is never executed.
233261

234262
2. If an object has legacy finalizers (``tp_del`` slot) move them to the
235263
``gc.garbage`` list.
@@ -240,7 +268,7 @@ follows these steps in order:
240268
finds the new subset of objects that are still unreachable by running the cycle
241269
detection algorithm again and continues with them.
242270
5. Call the ``tp_clear`` slot of every object so all internal links are broken and
243-
the reference counts fall to 0, triggering the destruction of all unreachabke
271+
the reference counts fall to 0, triggering the destruction of all unreachable
244272
objects.
245273

246274
Optimization: generations
@@ -253,9 +281,9 @@ creation. This has proven to be very close to the reality of many Python program
253281
many temporary objects are created and destroyed very fast. The older an object is
254282
the less likely is to become unreachable.
255283

256-
To take advatange of this fact, all container objects are segregated across
284+
To take advantage of this fact, all container objects are segregated across
257285
three spaces or "generations" (CPython currently uses 3 generations). Every new
258-
object starts in the firstgeneration (generation 0). The previous algorithm is
286+
object starts in the first generation (generation 0). The previous algorithm is
259287
executed only over the objects of a particular generation and if an object
260288
survives a collection of its generation it will be moved to the next one
261289
(generation 1), where it will it will be surveyed for collection less often. If
@@ -279,6 +307,41 @@ The content of these generations can be examined using the
279307
``gc.get_objects(generation=NUM)`` function and collections can be triggered
280308
specifically in a generation by calling ``gc.collect(generation=NUM)``.
281309

310+
.. code-block:: python
311+
312+
>>> import gc
313+
>>> class MyObj:
314+
... pass
315+
...
316+
317+
# Move everything to the last generation so its easier to inspect
318+
# the younger generations.
319+
320+
>>> gc.collect()
321+
0
322+
323+
# Create a reference cycle
324+
325+
>>> x = MyObj()
326+
>>> x.self = x
327+
328+
# Initially the object is in the younguest generation
329+
330+
>>> gc.get_objects(generation=0)
331+
[..., <__main__.MyObj object at 0x7fbcc12a3400>, ...]
332+
333+
# After a collection of the younguest generation the object
334+
# moves to the next generation
335+
336+
>>> gc.collect(generation=0)
337+
0
338+
>>> gc.get_objects(generation=0)
339+
[]
340+
>>> gc.get_objects(generation=1)
341+
[..., <__main__.MyObj object at 0x7fbcc12a3400>, ...]
342+
343+
344+
282345
Collecting the oldest generation
283346
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
284347

@@ -302,7 +365,7 @@ Optimization: reusing fields to save memory
302365
-------------------------------------------
303366

304367
In order to save memory, the two linked list pointers in every object with gc
305-
support are reused for several pourposes. This is a common optimization known
368+
support are reused for several purposes. This is a common optimization known
306369
as "fat pointers" or "tagged pointers": pointers that carry additional data,
307370
"folded" into the pointer, meaning stored inline in the data representing the
308371
address, taking advantage of certain properties of memory addressing. This is

0 commit comments

Comments
 (0)
0