-
Notifications
You must be signed in to change notification settings - Fork 1.3k
[RFC] Primary Design of the GC #2908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I have setup a crate of linked list: https://crates.io/crates/cyclic_list. It does not use the tagged pointer, however. For the GC algorithm, I now prefer one in this paper: https://researcher.watson.ibm.com/researcher/files/us-bacon/Bacon01Concurrent.pdf. It fit the RC system well and have concurrent implementations which might be useful in RustPython. I'm going to start another crate to try some experiments. Once it is ready, I'll update this RFC and then open a PR. |
Just an idea based on the cpython gc, never work on gc before, maybe I am wrong. If we could assume:
Than we can safely gc via the snapped gc list without stw. But this may have more overhead:
|
@qingshi163 Yes, you are right. The algorithm described in this paper (https://researcher.watson.ibm.com/researcher/files/us-bacon/Bacon01Concurrent.pdf) does the things you mentioned. But, there might be no need to worry about references of out of the snapped gc list. It does not snapshot all objects, but only possible objects in cycles, detected by its coloring strategy. FYI, there have been some repos implemented this algorithm: |
There was a recent discussion about gc in CPython: https://discuss.python.org/t/switching-from-refcounting-to-libgc/1641/37 |
any progress update of this RFC ? |
@killme2008 I am sorry to miss your comment. No, we have nothing. |
a gc library written in rust https://github.com/mmtk/mmtk-core |
I'm sorry to say that I have no time to work on this (since my job has been changed). I'd rather close this issue. |
Found a newer paper(also written by Bacon e.t.c) https://link.springer.com/chapter/10.1007/978-3-540-31985-6_11, which based on taking a snapshot of ref count graph and based on their previous work. |
Uh oh!
There was an error while loading. Please reload this page.
Summary
This RFC is going to setup a minimal facilities for GC supports.
Here is the goal of this RFC:
Detailed Explanation
Background
As the design doc of CPython's GC (https://devguide.python.org/garbage_collector/), it is implemented by two pointers
https://github.com/python/cpython/blob/main/Include/internal/pycore_gc.h#L11-L20
The two pointers are not only dedicated pointers, but also coupled with flags in its lowest one or two bytes (in a 32bit machine, pointer addresses are typically aligned to 4 bytes, where the lowest two bits is
00
).Data Structures
The GC headers contains two pointers, so the first scaffolding might be
Let's talk about
next
andprev
one by one.The
next
pointerFor the
_gc_next
pointer, it is mainly used as a single linked-list (for generations lists and unreachable sets when doing collecting).However, the
next
pointer contains aNEXT_MASK_UNREACHABLE
mark on its lowest bit. So we might need a type likeTaggedPtr
:The
prev
pointerThe
_gc_prev
pointer is more complicated. Like thenext
pointer, it is a pointer with 2 kinds of tags. The tag class can be defined as:However, the
prev
pointer is also used as a GC reference counter during some stages of the GC algorithm. We can define a helper trait andimpl
for theprev
pointer:The only problem is that when the pointer is used as a counter, we must forbid it being used as a real pointer. Otherwise it might crash if we dereference the ill-formed pointer (which is exactly a pointer). It is one of the unresolved questions and will be talked later.
I have setup a simple playground for the pointers' data structures.
Algorithms
Having the data structures of the pointers, we can decribe the algorithms.
Refer to CPython's implementation, the main GC procedure can be described as
Drawbacks, Rationale, and Alternatives
The CPython's GC algorithm is straight-forward and not difficult to implement, thought there are a few works to do because the global allocator and deallocator might be revised to fit the GC.
However, there are some drawbacks:
Alternative GC algorithms are:
Maybe combining 1 and 2 in different generations is more reality because 3 requires too much in engineering and almost reproducing a yet-another JVM.
Anyway, the current GC algorithm CPython is a good way to start.
Unresolved Questions
There are still some unresolved questions.
1. Where should the GC header be put?
Currently, CPython puts the two pointers in front of the object header, and conversions between GC headers and object pointers are done by addition/subtraction of pointers. But in Rust, directly adding/subtracting pointers is not rather a good way.
One choice is to put the header in front of the object header, just like what CPython do. They are wrapped as:
Next, make
PyGcObject<T>
implementDeref
andDerefMut
withtaget = PyObject<T>
. Then the wrappedPyGcObject<T>
can be used like thePyObject<T>
before. But we need to redefine thetype PyObjectRef = PyRef<PyObject<Erased>>
totype PyGcObjectRef = PyRef<PyGcObject<Erased>>
. Also use theDeref
andDerefMut
trait to make it easier to use.In alternative, the GC header can be also put inside the
PyObject<T>
, like this:This needs less revision of the existing codes.
Another way is to put GC header at the beginning of the object payload. This needs additional pointer adding/subtracting when traversing on the GC list.
2. How to obtain the reference count of the object?
RustPython use Rust's
Rc
orArc
as its reference counting, which is not visible outside. Unless we pack the header together with a reference likewe can not easily get the reference count. However, packing the header with a
PyRef
can bring another problem: since theobj
is referred by thePyGcObject<T>
, the reference count will be implicitly added one.3. How to define the boundary of the safe and unsafe?
As the double-linked list is introduced, and the
prev
pointer is reused as GC reference count, it can be kind of unsafe if a counter is misused as a pointer and dereferencing might crash.I designed a counter based on mutable borrowing like this. If a pointer is borrowed (mutably) as a counter, it cannot be read or write as pointer, until the counter is dropped.
Mutably borrowing the pointer as a counter works fine when the pointers are used in a local scope:
But it seemed hard to work if there are many pointers that will be used as a counter in the same time. It is impossible to borrow each of them as a separate counter.
So much for this RFC. I will append something if more come into my mind. Any comments or suggestion is welcome!
The text was updated successfully, but these errors were encountered: