8000 [RFC] GC: Both Stop-the-World and On-the-fly · Issue #4158 · RustPython/RustPython · GitHub
[go: up one dir, main page]

Skip to content

[RFC] GC: Both Stop-the-World and On-the-fly #4158

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
discord9 opened this issue Sep 13, 2022 · 4 comments
Open

[RFC] GC: Both Stop-the-World and On-the-fly #4158

discord9 opened this issue Sep 13, 2022 · 4 comments
Labels
RFC Request for comments

Comments

@discord9
Copy link
Contributor
discord9 commented Sep 13, 2022

Draft PR: #4180

Summary

I propose we implement a cycle collector for our rustPython RefCount garbage collection system based on Bacon etc.'s work in this two paper:

  • Concurrent Cycle Collection in Reference Counted Systems, this paper come up with a sync(stop the world) cycle collector and a concurrent version of it, although I suggest only impl the sync version of it, for concurrent version is harder to implement and still limited in preformance which is not as good as the second paper we are introducing
  • An efficient on-the-fly cycle collection, this paper based on the sync cycle collector in previous paper, adding a "sliding view" algorithm to it, which is like a "copy on write" for Object so during tracing the GC can see a fixed version of object dependence graph while mutator can still freely mutate object whatever they want.

Detailed Explanation

The synchronous cycle collector is easier to impl, I suggest we implement a synchronous verison of cycle collector for proof of Concept, then modfiy it to a On-the-Fly version.

Synchronous Cycle Collection

The algo in "Concurrent Cycle Collection in Reference Counted Systems" is O(N + E) wrost-case time for collection at most.

For an object T in its' header there need fields denoted buffered(T), color(T),and RC(T). Where:

  • buffer is used to place potential roots of cyclic garbage, buffered(T) denote whether a object is in the buffer.
  • color is explain in this table(Red and Orange is only used in concurrent version):
Color Meaning
Black In use or free
Gray Possible member of cycle
White Member of garbage cycle
Purple Possible root of cycle
Green Acyclic
Red Candidate cycle undergoing $\Sigma$-computation
Orange Candidate cycle awaiting epoch boundary
  • RC is just a reference count(which doesn't even need to be atomic for a sync version).

Pseudocode and Explanation

In this section I just ref to paper because it explain much better than me, the basic idea is to first scan for possible cycles, then check for actual cycles:

Our synchronous algorithm is similar to Lins’ algorithm: when reference counts are decremented, we place potential roots of cyclic garbage into abuffer called Roots. Periodically,we process this buffer and look for cycles by subtracting internal reference counts.

There are two major changes that makethe algorithm linear time: first of all, we add a buffered flag to every object, which isused to prevent the same object being added to the root buffer more than once per cycle collection. This in turn places alinear bound on the size of the buffer.

Secondly,we analyze the entire transitiveclosure of Roots as asingle graph, rather than as aset of graphs. This means that the complexity of the algorithm islimited by the size of that transitiveclosure, which in turn islimited by $N + E$ (since Roots is bounded by $N$ by the use of the buffered flag). Of course, in practice we hope that the transitiveclosure will be significantly smaller.

In practice we found that the first change (the use of the buffered flag) made almost no difference in the running time of the algorithm; however,the second change (analyzing the entire graph at once) made an enormous difference in run-time. When we applied Lins’ algorithm unmodified to large programs, garbage collection delays extended into minutes.

There is also concurrent cycle collection, but that is not the main point today. Because the on-the-fly version is better both in performance and ease to implement.

Efficient On-the-Fly Cycle Collection

The basic idea is to use a "sliding view" method(which is a bit like "Copy on write"), and create a stable snapshot by adding a "LogPointer" to record the original status of object to create a virtual snapshot of the heap for cycle collection.

A crucial problem with repeated scanning arises when concurrent program threads modify the objects graph during the scan. This means that the collector cannot trust a scan to repeat the very same structure that a previous scan has traversed. Furthermore, as modifications occur concurrently with the scan, each specific scan cannot be guaranteed to view a consistent snapshot of the objects graph at any specific point in time. Such problems are the source of the two drawbacks of Bacon and Rajan’s on-the-fly cycle collector: a practical drawback and a theoretical one.

Data Struct

We can use work in #2908 for object layout, and add two new feature:"gc" and "on_the_fly_cc" which depend on "gc" feature:

#![cfg(feature = "gc")]

struct PyGcObject {
    header: PyGcHeader,
    obj: PyObject, //or directly use PyInner<Erased>
}

struct PyGcHeader {
    ref_cnt: RefCount,
    color: Color,
    #[cfg(feature = "on_the_fly_cc")]
    log_ptr: Option<LogPointer> // for on-the-fly dirty bit and pointer
}

And add a GcTrace trait should be implement on every type require GC.
edited: in#4180 add a field to PyInner instead which is clearer and easier to impl.

pub trait GcTrace {
    /// call tracer_fn for every GcOjbect owned by a dyn GcTrace Object
    /// # API Contract
    /// must make sure that every owned object is called with tracer_fn, or garbage collect won't act correctly.
    fn trace(&self, tracer_fn: &mut TracerFn);
}

/// A `TracerFn` is a callback function that is invoked for each `PyGcObjectRef` owned
/// by an instance of something.
pub type TracerFn<'a> = dyn FnMut(&mut PyGcObjectRef) + 'a;

Drawbacks, Rationale, and Alternatives

Alternative: impl #2908's CPython GC.

@discord9 discord9 added the RFC Request for comments label Sep 13, 2022
@killme2008
Copy link
Contributor

Cool, i think a synchronous version is a good start.

@youknowone
Copy link
Member

I want to comment:

  • Any trial will be great.
  • If we can use any GC library to reduce our work, it will be even better.

@jopemachine
Copy link
Contributor

If we can use any GC library to reduce our work, it will be even better.

@discord9 What do you think about porting mmtk?
This lib seems to be able to reduce RustPython's work.

@discord9
Copy link
Contributor Author
discord9 commented Sep 22, 2022

If we can use any GC library to reduce our work, it will be even better.

@discord9 What do you think about porting mmtk? This lib seems to be able to reduce RustPython's work.

I think it's totally doable!

I have build a simple gc(stop-the-world version, a cycle collector that rely on ref count) without mmtk(https://github.com/discord9/RustPython based on https://github.com/fitzgen/bacon-rajan-cc, still a WIP), but migrate to mmtk sound like a great idea!

edit:current version basically replace RefCount with GcHeader in every occurrence of ref_count(which is like only about just 18 of them and all in object/core.rs), so the work here for now isn't much
edit_1: maybe combine a ref count cycle collector and a mark-sweep tracer gc in the future?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
RFC Request for comments
Projects
None yet
Development

No branches or pull requests

4 participants
0