8000 [RFC] Primary Design of the GC · Issue #2908 · RustPython/RustPython · GitHub
[go: up one dir, main page]

Skip to content

[RFC] Primary Design of the GC #2908

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
frank-king opened this issue Aug 18, 2021 · 9 comments
Closed

[RFC] Primary Design of the GC #2908

frank-king opened this issue Aug 18, 2021 · 9 comments
Labels
RFC Request for comments

Comments

@frank-king
Copy link
Contributor
frank-king commented Aug 18, 2021

Summary

This RFC is going to setup a minimal facilities for GC supports.

Here is the goal of this RFC:

  • define the data structures and main algorithms of GC,
  • give an implementation of GC for "header-only" objects (with dummy payload), which is easy to verify and able to be assembled into the VM.

Detailed Explanation

Background

As the design doc of CPython's GC (https://devguide.python.org/garbage_collector/), it is implemented by two pointers
https://github.com/python/cpython/blob/main/Include/internal/pycore_gc.h#L11-L20

/* GC information is stored BEFORE the object structure. */
typedef struct {
    // Pointer to next object in the list.
    // 0 means the object is not tracked
    uintptr_t _gc_next;

    // Pointer to previous object in the list.
    // Lowest two bits are used for flags documented later.
    uintptr_t _gc_prev;
} PyGC_Head;

The two pointers are not only dedicated pointers, but also coupled with flags in its lowest one or two bytes (in a 32bit machine, pointer addresses are typically aligned to 4 bytes, where the lowest two bits is 00).

  • The _gc_prev field is normally used as the “previous” pointer to maintain the doubly linked list but its lowest two bits are used to keep the flags PREV_MASK_COLLECTING and _PyGC_PREV_MASK_FINALIZED. Between collections, the only flag that can be present is _PyGC_PREV_MASK_FINALIZED that indicates if an object has been already finalized. During collections _gc_prev is temporarily used for storing a copy of the reference count (gc_refs), in addition to two flags, and the GC linked list becomes a singly linked list until _gc_prev is restored.
  • The _gc_next field is used as the “next” pointer to maintain the doubly linked list but during collection its lowest bit is used to keep the NEXT_MASK_UNREACHABLE flag that indicates if an object is tentatively unreachable during the cycle detection algorithm. This is a drawback to using only doubly linked lists to implement partitions: while most needed operations are constant-time, there is no efficient way to determine which partition an object is currently in. Instead, when that’s needed, ad hoc tricks (like the NEXT_MASK_UNREACHABLE flag) are employed.

Data Structures

The GC headers contains two pointers, so the first scaffolding might be

struct PyGcObject<T> {
    header: PyGcHeader,
    obj: PyObject<T>,
}
struct PyGcHeader<T> {
    next: Option<NonNull<PyGcObject<T>>>,
    prev: Option<NonNull<PyGcObject<T>>>,
}

Let's talk about next and prev one by one.

The next pointer

For the _gc_next pointer, it is mainly used as a single linked-list (for generations lists and unreachable sets when doing collecting).

However, the next pointer contains a NEXT_MASK_UNREACHABLE mark on its lowest bit. So we might need a type like TaggedPtr:

struct TaggedPtr<T, TAG: Tag> {
    ptr: usize,
    _marker: PhantomData<*const T>,
    _tag: PhantomData<TAG>,
}

trait Tagged<T, TAG: Tag>: Sized {
    fn new(ptr: Option<NonNull<T>>, tags: TAG) -> Self;
    fn get_ptr(&self) -> Option<NonNull<T>>; 
    fn set_ptr(&mut self, ptr: NonNull<T>); 
    fn unset_ptr(&mut self);                
    fn set_tag(&mut self, tag: TAG);        // self.ptr |= tag;
    fn unset_tag(&mut self, tag: TAG);      // self.ptr &= !tag;
    fn clear_tags(&mut self);               // self.ptr &= PTR_MASK;
    fn test_tag(&self, tag: TAG) -> bool;   // (self.ptr & tag) as bool
    fn get_tags(&self) -> TAG;              // self.ptr & TAG_MASK
    fn get(&self) -> (Option<NonNull<T>>, TAG) {
        (self.get_ptr(), self.get_tags())
    }
}

trait Tag: From<usize> + Into<usize> + <
8000
span class="pl-smi">BitOr + PartialEq + Eq {
    fn none() -> Self {
        0.into()
    }
}

impl<T, TAG: Tag> Tagged<T, TAG> for TaggedPtr<T, TAG> {/* ... */}

struct TagNext {
    const NEXT_MASK_UNREACHABLE = 0b01;
}

The prev pointer

The _gc_prev pointer is more complicated. Like the next pointer, it is a pointer with 2 kinds of tags. The tag class can be defined as:

struct TagPrev {
   const PREV_MASK_COLLECTING = 0b01;
   const PREV_MASK_FINALIZED = 0b10;
}

However, the prev pointer is also used as a GC reference counter during some stages of the GC algorithm. We can define a helper trait and impl for the prev pointer:

trait Counter {
    fn get(&self) -> usize;
    fn set(&mut self, cnt: usize);
    fn inc(&mut self);
    fn dec(&mut self); 
}
impl Counter for ...

The only problem is that when the pointer is used as a counter, we must forbid it being used as a real pointer. Otherwise it might crash if we dereference the ill-formed pointer (which is exactly a pointer). It is one of the unresolved questions and will be talked later.

I have setup a simple playground for the pointers' data structures.

Algorithms

Having the data structures of the pointers, we can decribe the algorithms.

Refer to CPython's implementation, the main GC procedure can be described as

/// find all objects that is unreachable from outside the `head`, and move than into the `unreachable` list.
fn deduce_unreachable(PyGcObject<T> head, PyGcObject<T> unreachable) {
    // initialize the GC reference counts of all the object in list `head` as their object reference counts.
    init_gc_refs(head);
    // for each object in list `head`, visit each of its referent (not recursively) and decrement its GC reference count. 
    substract_gc_refs(head);
    // move all of the objects in list `head` whose GC reference count is 0, to list `unreachable`.
    // but if encountered an object that has non-zero GC reference count, move all of its referent back to list `head`.
    move_unreachable(head, unreachable);
    //
}

/// main logics of GC collecting. finalizations and wake reference processing is omitted here.
fn collect() {
    // find all unreachable objects and move to the list `unreachable`.
    deduce_unreachable(head, unreachable);
    // drop all of the objects in list `unreachable`
    drop_all_in(unreachable);
}

Drawbacks, Rationale, and Alternatives

The CPython's GC algorithm is straight-forward and not difficult to implement, thought there are a few works to do because the global allocator and deallocator might be revised to fit the GC.

However, there are some drawbacks:

  • it is single-threaded, and stop-the-world,
  • mixing RC and GC might be inefficient. Especially for multithreading, there are lots of atomic inc/dec in RC.

Alternative GC algorithms are:

  1. The tricolor mark-and-sweep algorithm,
  2. The copy-and-compaction algorithm, (2 times memory consumption)
  3. GC algorithms in JVM, like CMS, PS, G1, ZGC, etc., (too enormous in engineering)

Maybe combining 1 and 2 in different generations is more reality because 3 requires too much in engineering and almost reproducing a yet-another JVM.

Anyway, the current GC algorithm CPython is a good way to start.

Unresolved Questions

There are still some unresolved questions.

1. Where should the GC header be put?

Currently, CPython puts the two pointers in front of the object header, and conversions between GC headers and object pointers are done by addition/subtraction of pointers. But in Rust, directly adding/subtracting pointers is not rather a good way.

One choice is to put the header in front of the object header, just like what CPython do. They are wrapped as:

struct PyGcObject<T> {
    header: PyGcHeader<T>,
    obj: PyObject<T>,
}

Next, make PyGcObject<T> implement Deref and DerefMut with taget = PyObject<T>. Then the wrapped PyGcObject<T> can be used like the PyObject<T> before. But we need to redefine the type PyObjectRef = PyRef<PyObject<Erased>> to type PyGcObjectRef = PyRef<PyGcObject<Erased>>. Also use the Deref and DerefMut trait to make it easier to use.

In alternative, the GC header can be also put inside the PyObject<T>, like this:

struct PyObject<T> {
    gc: PyGcHeader<T>,
    inner: ManuallyDrop<PyInner<T>>,
}

This needs less revision of the existing codes.

Another way is to put GC header at the beginning of the object payload. This needs additional pointer adding/subtracting when traversing on the GC list.

2. How to obtain the reference count of the object?

RustPython use Rust's Rc or Arc as its reference counting, which is not visible outside. Unless we pack the header together with a reference like

struct PyGcObject<T> {
    header: PyGcHeader<T>,
    obj: PyRef<T>,
}

we can not easily get the reference count. However, packing the header with a PyRef can bring another problem: since the obj is referred by the PyGcObject<T>, the reference count will be implicitly added one.

3. How to define the boundary of the safe and unsafe?

As the double-linked list is introduced, and the prev pointer is reused as GC reference count, it can be kind of unsafe if a counter is misused as a pointer and dereferencing might crash.

I designed a counter based on mutable borrowing like this. If a pointer is borrowed (mutably) as a counter, it cannot be read or write as pointer, until the counter is dropped.

type PyGcPrev<T> = TaggedPtr<T, TagPrev>;

impl<T> PyGcPrev<T> {
    fn get_counter<'a>(&'a mut self) -> impl Counter + 'a {
        GcCounter(&mut self.ptr)
    }
}

impl Counter for GcCounter<'_> {/* ... */}

Mutably borrowing the pointer as a counter works fine when the pointers are used in a local scope:

{
    let mut counter = tagged_ptr.get_counter();
    counter.set(5);
    assert_eq!(counter.get(), 5);
    counter.inc();
    // assert_ne!(tagged_ptr.get_ptr(), ptr1); // should not compile
    assert_eq!(counter.get(), 6);
    counter.dec();
    assert_eq!(counter.get(), 5);
}

But it seemed hard to work if there are many pointers that will be used as a counter in the same time. It is impossible to borrow each of them as a separate counter.


So much for this RFC. I will append something if more come into my mind. Any comments or suggestion is welcome!

@frank-king frank-king added the RFC Request for comments label Aug 18, 2021
@frank-king frank-king changed the title [Draft] [RFC] Primary Design of the GC Header [Draft] [RFC] Primary Design of the GC Aug 21, 2021
@frank-king frank-king changed the title [Draft] [RFC] Primary Design of the GC [RFC] Primary Design of the GC Aug 21, 2021
@frank-king
Copy link
Contributor Author

I have setup a crate of linked list: https://crates.io/crates/cyclic_list. It does not use the tagged pointer, however.

For the GC algorithm, I now prefer one in this paper: https://researcher.watson.ibm.com/researcher/files/us-bacon/Bacon01Concurrent.pdf. It fit the RC system well and have concurrent implementations which might be useful in RustPython. I'm going to start another crate to try some experiments. Once it is ready, I'll update this RFC and then open a PR.

@qingshi163
Copy link
Contributor

Just an idea based on the cpython gc, never work on gc before, maybe I am wrong.

If we could assume:

  • on a snap of the gc list, if we find some objects should drop, these objects can never be used in the future;
  • on a snap of the gc list, if we find some objects should drop but not been dropped via reference count, means these objects can never drop from outside of the gc because cycling reference.

Than we can safely gc via the snapped gc list without stw.

But this may have more overhead:

  • when we treversing the container object, we may get the new child that is out of the snapped gc list, maybe a mark is needed;
  • we need to keep the reference in the snapped gc list.
  • when we treversing the container object, we need to get the snapped gc list node from the object.

@frank-king
Copy link
Contributor Author

@qingshi163 Yes, you are right. The algorithm described in this paper (https://researcher.watson.ibm.com/researcher/files/us-bacon/Bacon01Concurrent.pdf) does the things you mentioned.

But, there might be no need to worry about references of out of the snapped gc list. It does not snapshot all objects, but only possible objects in cycles, detected by its coloring strategy.

FYI, there have been some repos implemented this algorithm:
https://github.com/fitzgen/bacon-rajan-cc
https://github.com/jrmuizel/cc-mt

@youknowone
Copy link
Member

There was a recent discussion about gc in CPython: https://discuss.python.org/t/switching-from-refcounting-to-libgc/1641/37

@killme2008
Copy link
Contributor

any progress update of this RFC ?

@youknowone
Copy link
Member

@killme2008 I am sorry to miss your comment. No, we have nothing.

@youknowone
Copy link
Member

a gc library written in rust https://github.com/mmtk/mmtk-core

@frank-king
Copy link
Contributor Author

I'm sorry to say that I have no time to work on this (since my job has been changed). I'd rather close this issue.

@discord9
Copy link
Contributor

I have setup a crate of linked list: https://crates.io/crates/cyclic_list. It does not use the tagged pointer, however.

For the GC algorithm, I now prefer one in this paper: https://researcher.watson.ibm.com/researcher/files/us-bacon/Bacon01Concurrent.pdf. It fit the RC system well and have concurrent implementations which might be useful in RustPython. I'm going to start another crate to try some experiments. Once it is ready, I'll update this RFC and then open a PR.

Found a newer paper(also written by Bacon e.t.c) https://link.springer.com/chapter/10.1007/978-3-540-31985-6_11, which based on taking a snapshot of ref count graph and based on their previous work.
Also after read "Concurrent Cycle Collection in Reference Counted Systems", I also start a new crate https://github.com/discord9/cc_bacontrying to impl those algorithm.
But I found it seems only support single thread?(I mean one thread for main program and one for concurrent garbage collector) For example, if two ref cnt is dec at same time in different thread with strong count equals two, then it could lead to dangling pointer or even double free, depending on exec order, the only solution seems to be add a lock(or a multiple producer single consumer queue?) to serialize inc/dec ref cnt op. Which could impact on throughput through.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
RFC Request for comments
Projects
None yet
Development

No branches or pull requests

5 participants
0