(READY FOR REVIEW) Garbage collect: A stop-the-world cycle collector #4180
Conversation
Rust 1.64 came out a few hours ago, so there are some new compilation warnings and Clippy lints to deal with in a separate PR (#4182). Once that PR is merged, it might be a good idea to rebase this branch on top of a fresh copy of
Just realized I still need to make some changes to make the gc adapt to threading; need to think about it.
Bug: might need to store the number of PyWeak references to prevent premature dealloc in the heap and an access violation. Also, PyWeak itself stores a PyObjectRef, which keeps the object from dropping; some changes need to be made to change that.
CPython has a generational garbage collector with a total of three generations. The first generation has a threshold of 700 objects, and the next two have a threshold of 10 objects each. I imagine that bumping the current threshold from 100 up to 700 objects may mitigate some of the performance decrease you're currently observing.
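To make the threshold mechanics concrete, here is a toy model of a generational trigger using CPython's default thresholds (700, 10, 10). All names are invented for illustration and this is not how RustPython's collector is structured; survivors are modeled only as a counter bump in the next generation.

```rust
/// Toy model of CPython-style generational thresholds.
struct Generation {
    threshold: usize,
    count: usize,
}

struct GenGc {
    gens: [Generation; 3],
}

impl GenGc {
    fn new() -> Self {
        GenGc {
            gens: [
                Generation { threshold: 700, count: 0 },
                Generation { threshold: 10, count: 0 },
                Generation { threshold: 10, count: 0 },
            ],
        }
    }

    /// Called on each tracked allocation; returns the oldest generation
    /// that should be collected, if any.
    fn note_allocation(&mut self) -> Option<usize> {
        self.gens[0].count += 1;
        if self.gens[0].count <= self.gens[0].threshold {
            return None;
        }
        // CPython-style: collect the oldest generation over its threshold,
        // which also collects every younger generation
        let oldest = (0..self.gens.len())
            .rev()
            .find(|&i| self.gens[i].count > self.gens[i].threshold)
            .unwrap_or(0);
        for gen in &mut self.gens[..=oldest] {
            gen.count = 0;
        }
        if oldest + 1 < self.gens.len() {
            self.gens[oldest + 1].count += 1; // survivors promoted upward
        }
        Some(oldest)
    }
}

fn main() {
    let mut gc = GenGc::new();
    let mut first = None;
    for n in 1..=701u32 {
        if let Some(g) = gc.note_allocation() {
            first = Some((n, g));
        }
    }
    // the first collection fires on the 701st allocation, in generation 0
    assert_eq!(first, Some((701, 0)));
}
```

With a threshold of 100 instead, a collection would fire roughly seven times as often, which is the performance concern raised above.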
Sorry for the messy commit logs, will reword them later.
What happens when you compile this with the
Yes, just realized that, gonna fix it. Also still debugging the deadlock & double-dealloc issue; my code is still very faulty.
@coolreader18 do you have any advice?
Might have found a bug related to the old ref count system: #4238, still trying to find a fix.
Using PyRwLock is indeed very slow; might need to find a way to trace PyMutex properly.
I've slightly taken a peek at the paper this is based on; I'm guessing the two improvements it mentions have been implemented, right? I.e., using the buffered flag to not re-add it to the collector list, and checking the current items in the collector as a single graph and not as distinct components.
Yes, this is the main improvement here. I added a buffered field for every object and there is a roots vec; the method names actually correspond between the paper and the code.
Thank you so much for working on this! I started this a week or 2 ago and then it slipped my mind after I took a break about halfway through looking at stuff. I haven't looked at the actual gc code, and though I'd really like to, to be honest I'm not sure I'll have the bandwidth to do so. If no other maintainers feel able to either, worst case we mark it as a sort of experimental feature, but the test suite passing fine gives me confidence that it's working more or less as it should. It's not like there haven't already been multithreaded gc issues even without the tracing gc, lol.
stdlib/src/gc.rs
Outdated
Err(vm.new_not_implemented_error("".to_owned()))
#[cfg(feature = "gc")]
{
    rustpython_vm::object::disable();
It would be good for these to have more descriptive names, either a `gc_` prefix or a separate module. Also the _gc module could probably get moved to rustpython-vm proper.
> Also the `gc` module could probably get moved to rustpython-vm proper.
Indeed, in CPython, the `gc` module is compiled into the interpreter (rather than being dynamically linked):
Python 3.11.0 (main, Oct 25 2022, 13:57:33) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import gc
>>> gc.__file__
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'gc' has no attribute '__file__'. Did you mean: '__name__'?
> It would be good for these to have more descriptive names, either a `gc_` prefix or a separate module. Also the _gc module could probably get moved to rustpython-vm proper.
how about "gc_bacon" as the feature name (as Bacon is the paper's author)?
/// use on a struct with named fields like `struct A { x: i32, y: i32 }` to impl `Trace` for the datatype
///
/// use `#[notrace]` on fields you wish not to trace
#[proc_macro_attribute]
This could probably be a normal derive() macro, and then you wouldn't have to deal with removing the notrace attr
Yes, `#[notrace]` is only used four times, in the following; still considering whether or not to remove it and hand-write Trace:
#[pyclass(module = false, name = "enumerate")]
#[derive(Debug)]
#[pytrace]
pub struct PyEnumerate {
#[notrace]
counter: PyRwLock<BigInt>,
iterator: PyIter,
}
#[pyclass(module = false, name = "iterator")]
#[derive(Debug)]
#[pytrace]
pub struct PySequenceIterator {
// cached sequence methods
#[notrace]
seq_methods: &'static PySequenceMethods,
internal: PyMutex<PositionIterInternal<PyObjectRef>>,
}
#[pyclass(module = false, name = "zip")]
#[derive(Debug)]
#[pytrace]
pub struct PyZip {
iterators: Vec<PyIter>,
#[notrace]
strict: PyAtomic<bool>,
}
#[derive(Debug, Clone)]
#[pytrace]
pub struct ArgMapping {
obj: PyObjectRef,
#[notrace]
methods: &'static PyMappingMethods,
}
vm/src/object/core.rs
Outdated
}
std::alloc::dealloc(
    x.cast(),
    std::alloc::Layout::for_value(x.cast::<PyInner<T>>().as_ref().unwrap()),
- std::alloc::Layout::for_value(x.cast::<PyInner<T>>().as_ref().unwrap()),
+ std::alloc::Layout::for_value(&*x.cast::<PyInner<T>>()),
Maybe also could move this out of the function call and into a variable, just to be safe wrt stacked borrows (not sure if the reference is still "live" while dealloc() runs)
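The suggested change can be sketched standalone like this (a toy `u64` allocation, not the RustPython code): binding the `Layout` to a local before calling `dealloc` ensures the temporary shared reference created by `for_value` is no longer live during the deallocation call.

```rust
use std::alloc::{alloc, dealloc, Layout};

fn main() {
    unsafe {
        // allocate and initialize a value manually, via a raw pointer
        let layout = Layout::new::<u64>();
        let p = alloc(layout) as *mut u64;
        assert!(!p.is_null());
        p.write(42);
        assert_eq!(*p, 42);

        // compute the layout *before* calling dealloc, so the temporary
        // reference from `&*p` ends before the deallocation runs
        let layout_for_dealloc = Layout::for_value(&*p);
        assert_eq!(layout_for_dealloc, layout);

        std::ptr::drop_in_place(p); // no-op for u64, shown for shape
        dealloc(p.cast(), layout_for_dealloc);
    }
}
```

Whether the inline form is actually undefined behavior under stacked borrows is the open question above; hoisting the layout into a local sidesteps it either way.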
Will do. Yes, I also have doubts about some of my code's soundness, especially around dealloc & drop.
vm/src/object/core.rs
Outdated
}
)else*
};
}
Hmm... what if instead of this, PyObjectPayload required Trace (and stuck it in the PyObjVtable), but there's a DummyTrace trait that's safe and marks an object that isn't actually Trace. Like:
unsafe trait Trace {
const IS_TRACE: bool = true;
fn trace(&self, ...);
}
trait DummyTrace {} // or NoTrace or something
unsafe impl<T: DummyTrace> Trace for T {
const IS_TRACE: bool = false;
fn trace(&self, ...) {}
}
And `#[pyclass]` could by default add a DummyTrace impl, unless specifically asked to pytrace it.
Oh and instead of `TraceHelper::is_traceable`, the vtable has a `trace: Option<...>` that's None if IS_TRACE is false. Might be faster too: rather than checking 70 different TypeIds, it's just one virtual function call.
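Here is a compiling sketch of that vtable idea, with the tracer callback simplified to a `FnMut(usize)` over object ids. One adjustment worth noting: a blanket `impl<T: DummyTrace> Trace for T` conflicts with direct `Trace` impls under Rust's coherence rules (without specialization), so this sketch writes both impls by hand; in practice the `#[pyclass]` macro would generate the no-op one.

```rust
use std::any::Any;

unsafe trait Trace {
    const IS_TRACE: bool;
    fn trace(&self, tracer: &mut dyn FnMut(usize));
}

// Vtable entry: `None` for untraceable types, so the collector does a
// single null check instead of comparing many TypeIds.
struct Vtable {
    trace: Option<fn(&dyn Any, &mut dyn FnMut(usize))>,
}

fn vtable_for<T: Trace + 'static>() -> Vtable {
    let f: fn(&dyn Any, &mut dyn FnMut(usize)) = |obj, tracer| {
        obj.downcast_ref::<T>().expect("payload matches vtable").trace(tracer)
    };
    Vtable { trace: if T::IS_TRACE { Some(f) } else { None } }
}

struct PyInt(i64); // leaf payload: holds no object references
unsafe impl Trace for PyInt {
    const IS_TRACE: bool = false;
    fn trace(&self, _tracer: &mut dyn FnMut(usize)) {}
}

struct PyPair(usize, usize); // payload holding two object "refs" (ids here)
unsafe impl Trace for PyPair {
    const IS_TRACE: bool = true;
    fn trace(&self, tracer: &mut dyn FnMut(usize)) {
        tracer(self.0);
        tracer(self.1);
    }
}

fn main() {
    // untraceable type: vtable slot is None, one cheap check at runtime
    assert!(vtable_for::<PyInt>().trace.is_none());

    // traceable type: the fn pointer downcasts and visits children
    let pair = PyPair(3, 7);
    let vt = vtable_for::<PyPair>();
    let mut seen = Vec::new();
    (vt.trace.expect("traceable"))(&pair, &mut |id| seen.push(id));
    assert_eq!(seen, vec![3, 7]);
}
```

The `Option<fn>` slot is what replaces the TypeId comparison: the collector asks "is there a trace function?" instead of "is this one of the N traceable types?".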
Interesting idea, will try refactoring into that.
Might rename those to `MaybeTrace`/`try_trace` and make `is_trace` an associated const.
Maybe we can create a separate CI check that runs the GC with a select few Python test files? The idea is to at least have it in and running relevant tests while not slowing down the total runtime.
Is this PR still in review? @youknowone @discord9
I am really sorry for the late review.
I don't fully understand how this gc works, and I don't seem to have enough time to figure it out before finishing the review, but I'd like to merge it as an optional feature and a good starting point to improve the gc in the future.
But before merging this change, I hope to minimize changes to CPython-originated code, because changed CPython code will easily be forgotten by future developers, and it might even break when a future enhancement is correct but incompatible with this change.
let is_dead = self.is_dead.lock();
if *is_dead {
    return true;
}
How is this handled in CPython? Do we really need an additional `is_dead` field instead of reusing the `parent` field well?
vm/src/object/core.rs
Outdated
}
#[cfg(not(feature = "gc_bacon"))]
{
    self.0.ref_count.inc();
We'd better have a unified interface for ref_count handling for `#[cfg(not(feature = "gc_bacon"))]` in the future.
I am thinking about updating the documentation on how this GC works, perhaps even adding it to RustPython's docs?
That would be great in the future. I think separating the gc feature and reducing the effect on non-gc builds, keeping them as they were, will be good enough for now.
I rebased it on the main branch.
I will fix the CI.
vm/src/frame.rs
Outdated
gc_count += 1;
if gc_count > 1000 {
    #[cfg(feature = "gc_bacon")]
    {
        crate::object::try_gc();
    }
    gc_count = 0;
}
Idk much about gc, so I'd like to learn the reasoning:
is this the preferred place to put try_gc?
And how was the number 1000 decided? Couldn't it be a potential reason gc is too slow, because it is too small?
My bad, I should mimic CPython (https://github.com/python/cpython/blob/7703def37e4fa7d25c3d23756de8f527daa4e165/Python/ceval.c#L833); there, in handlePending, it does the necessary check of whether to gc or not. I should rewrite this part.
According to CPython:
/* Do periodic things, like check for signals and async I/O.
 * We need to do this reasonably frequently, but not too frequently.
 * All loops should include a check of the eval breaker.
 * We also check on return from any builtin function.
 */
Maybe only check for it after a certain number of bytecodes have been executed, or on return from a function?
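A periodic check in the spirit of CPython's eval breaker could look like the sketch below. All names are invented for illustration; the numbers come from this PR (a check every 1000 instructions as in the diff above, the 10007-element roots threshold, and the 100 ms minimum interval described in the PR summary).

```rust
use std::time::{Duration, Instant};

const CHECK_EVERY: u32 = 1000;        // instructions between checks
const ROOTS_THRESHOLD: usize = 10007; // possible-roots count from the PR
const MIN_GC_INTERVAL: Duration = Duration::from_millis(100);

struct Interp {
    instr_since_check: u32,
    last_gc: Option<Instant>, // None: no collection has run yet
    roots_len: usize,         // size of the possible-roots buffer
}

impl Interp {
    /// Called once per executed bytecode instruction; returns true when a
    /// cycle collection was triggered.
    fn tick(&mut self) -> bool {
        self.instr_since_check += 1;
        if self.instr_since_check < CHECK_EVERY {
            return false;
        }
        self.instr_since_check = 0;
        // collect only when enough possible roots piled up and the last
        // collection was not too recent
        let long_enough = self
            .last_gc
            .map_or(true, |t| t.elapsed() >= MIN_GC_INTERVAL);
        if self.roots_len > ROOTS_THRESHOLD && long_enough {
            self.last_gc = Some(Instant::now());
            self.roots_len = 0; // the collector would drain the buffer here
            return true;
        }
        false
    }
}

fn main() {
    let mut interp = Interp { instr_since_check: 0, last_gc: None, roots_len: 20_000 };
    let gcs: u32 = (0..1000).map(|_| interp.tick() as u32).sum();
    // exactly one collection: on the 1000th instruction
    assert_eq!(gcs, 1);
}
```

Keeping the cheap counter in the hot loop and the expensive condition behind it is the point of the eval-breaker pattern: 999 out of 1000 instructions pay only an increment and a compare.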
Lib/test/test_ordered_dict.py
Outdated
# TODO: RUSTPYTHON, need to fix this: somehow after `del A`, it takes two calls to `gc.collect()`
# for gc to realize a loop is there and collect it
Is this still a valid comment for this test?
de08f82 to 0c11636
feat: add double drop check in debug mode
fix: use Mutex instead of PyMutex in `ID2TYPE`
refactor: cfg cond for `Drop` trait instead
feat: add `Trace` trait
feat: trace RwLock right&trace tuple
feat: `Trace` for `PyDict`
feat: `Trace` on `PyIter`&`PyIterReturn`&`PyIterIter`
feat: `Trace` on PyEnumerate
feat: `Trace` on `ArgCallable` `ArgIterable` `ArgMapping` `ArgSequence`
feat: `Trace` on `IterStatus` `PySequenceIterator` `PyCallableIterator` `PositionIterInternal`
feat: `Trace` on `PyReverseSequenceIterator`
feat: `Trace` on `PyTuple` `PyTupleIterator` `PyTupleTyped`
feat: `Trace` on `PyFilter` `PyFunction` `PyBoundMethod`
feat: `Trace` on `PyCell`
feat: `Trace` on `PyList` `PyListIterator` `PyListReverseIterator`
feat: `Trace` on `PyMap` `PyMappingProxy` `MappingProxyInner`
feat: `Trace` on PyMemoryViewNewArgs, PyMemoryViewIterator
feat: `Trace` on PyProperty, PySet, PySetInner
feat: `Trace` on PySlice, PyStaticMethod
feat: `Trace` on FuncArgs, KwArgs, PosArgs, OptionalArg
feat: `Trace` on PySuper, PySuperNewArgs
feat: `Trace` on `PyTraceback`
feat: `Trace` for PyBaseException, PyType, PyUnion
feat: `Trace` on PyWeakProxy, PyZip, PyBuffer
feat: `Trace` on PyMapping, PyNumber, PySequence
feat: add `list_traceable` macro
fix: right lifetime for `TracerFn`
feat: `trace` PyObjectRef&PyRef
feat: garbage cycle collector
fix: put drop_only in different loop
feat: core algorithm of garbage collect
feat: add drop_only&dealloc_only to vtable
feat: modify core.rs to use gc
style: cargo fmt
feat: add `try_gc`& gc per frame
feat: add `collect` in gc module
fix: set black when safe_inc
fix: check if is gc-ing in `should_gc`
refactor: cfg(gc) for `Drop` trait
fix: not add to roots multiple times
fix: add judge for if dropped
doc: add TODO
fix: prevent dealloc cycle garbage early
fix: `partially_drop` header later
fix: add dealloc guard for deref
fix: run `__del__`&drop separately
feat: more lock to gc&drop check
feat: make gc less freq
fix: cfg compile&support attr in partially_drop
feat: `pytrace` macro
feat: use `#[pytrace]` in some types
feat: compact header
feat: change gc cond to 10007 obj cnts
fix: trace `PyRange`
fix: drop ref vtable before dealloc to prevent UB
fix: debug check&cfg cond&clippy
fix: add ref only after `__del__` is done
feat: trace(unsafely) PyMutex
feat: prevent trace PyMutex when not gc
feat: change `PyRwlock` back to `PyMutex`
fix: testcase test_reference_loop test_unique_composite
refactor: early exit of collect_cycles
fix: cfg cond
feat: gc pause warn msg when wait too long
fix: not run __del__ in cycles
fix: expected failure for test_unique_composite
fix: allow test_ioctl_signed_unsigned_code_param
feat: split `drop` to `del`&`weakref`
fix: lock cond
fix: pause cond so high freq gc not halt all
refactor: put impl Collector together
feat: drop weak_list later
feat: unlock gc pause lock fairly
feat: print progress for two long test
feat: adjust lock order&not panic
fix: let obj handle its weakref's dealloc
fix: check stack before pop
fix: not leak weakref
log: remove some false alarm
fix: cfg flag for cond compile
fix: cfg flag for no-default-feature
fix: use non-block incref
test: change gc to 1ms&exit all if wait too long
fix: INCREF done right&not gc until last gc done
feat: add `gc` feature to root crate
doc: TODO for PEP442 del in cycle
fix: temporaily add one more `gc.collect()`
test: add gc feature in CI
refactor: make `mark/scan_roots` associated fn
refactor: `free_cycles` fn
test: flush progress prompt
docs: add TODO for modified testcases
refactor: header state's type
feat: drop_only
log: gc info
clippy: drop_only allow unused
refactor: rename `gc` feature to `gc_bacon`
refactor: remove `allow(unused)`
feat: `MaybeTrace` trait
feat: add `trace` Meta for `pyclass` proc macro
feat: cfg cond flag for `MaybeTrace`
feat: add `trace` in vtable
fix: add `#[pyclass(trace)]` for manual impl trace
fix: elide null check in vtable&CR refactor
fix: change name in CI tests
feat: not use gc in wasm
refactor: accord to Code Review doc&fix: explain gc&fix macro
fix: test_sys_setprofile
@discord9 @youknowone do we have any blockers here? Or is it waiting for a rebase, and do we generally agree on merging it?
Currently, an old bug stemming from code predating the implementation of garbage collection is causing memory corruption because the garbage collector is indeed collecting that memory. I will open a new pull request once I have fixed those bugs and can reliably pass continuous integration.
@discord9 I am sorry I missed the comment. I wish we could finally merge it. If there is any PyTraverse update, could we merge that first? Regardless of how the GC is designed later, it will be useful.
That might be extended to an on-the-fly version. Based on #4158
Summary of GC Algorithm
The algorithm stores a buffer of objects that are "possible roots" of garbage cycles (i.e. if an object's refcount is decremented but not to zero, it is a possible root). Then, during gc, the algorithm has three main phases:
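The three phases follow Bacon and Rajan's trial-deletion scheme (mark gray, scan, collect white). Below is a toy sketch on an id-indexed graph; names and representation are invented here, and the PR's real collector works on `PyObject` headers rather than a `HashMap`, with collect-white done from the roots rather than a whole-heap sweep.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq)]
enum Color { Black, Gray, White }

struct Obj {
    rc: i64,              // reference count (includes external refs)
    color: Color,
    children: Vec<usize>, // outgoing strong references, by id
}

struct Heap {
    objs: HashMap<usize, Obj>,
}

impl Heap {
    // Phase 1: from each possible root, trial-delete internal references.
    fn mark_gray(&mut self, id: usize) {
        if self.objs[&id].color == Color::Gray {
            return;
        }
        self.objs.get_mut(&id).unwrap().color = Color::Gray;
        for c in self.objs[&id].children.clone() {
            self.objs.get_mut(&c).unwrap().rc -= 1;
            self.mark_gray(c);
        }
    }

    // Phase 2: anything still externally referenced is revived (black);
    // the rest is tentatively garbage (white).
    fn scan(&mut self, id: usize) {
        if self.objs[&id].color != Color::Gray {
            return;
        }
        if self.objs[&id].rc > 0 {
            self.scan_black(id);
        } else {
            self.objs.get_mut(&id).unwrap().color = Color::White;
            for c in self.objs[&id].children.clone() {
                self.scan(c);
            }
        }
    }

    fn scan_black(&mut self, id: usize) {
        self.objs.get_mut(&id).unwrap().color = Color::Black;
        for c in self.objs[&id].children.clone() {
            self.objs.get_mut(&c).unwrap().rc += 1; // undo trial deletion
            if self.objs[&c].color != Color::Black {
                self.scan_black(c);
            }
        }
    }

    // Phase 3: white objects form unreachable cycles; free them.
    fn collect_white(&mut self) -> Vec<usize> {
        let white: Vec<usize> = self.objs.iter()
            .filter(|(_, o)| o.color == Color::White)
            .map(|(&id, _)| id)
            .collect();
        for id in &white {
            self.objs.remove(id);
        }
        white
    }
}

fn main() {
    // cycle: 1 <-> 2, unreachable from outside; 3 kept alive externally
    let mut heap = Heap { objs: HashMap::new() };
    heap.objs.insert(1, Obj { rc: 1, color: Color::Black, children: vec![2] });
    heap.objs.insert(2, Obj { rc: 1, color: Color::Black, children: vec![1] });
    heap.objs.insert(3, Obj { rc: 1, color: Color::Black, children: vec![] });

    for root in [1, 3] { heap.mark_gray(root); }
    for root in [1, 3] { heap.scan(root); }
    let mut freed = heap.collect_white();
    freed.sort();
    assert_eq!(freed, vec![1, 2]);       // the dead cycle is collected
    assert!(heap.objs.contains_key(&3)); // the live object survives
}
```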
Progress

- `gc()` as a python function (in the stdlib); now invoking `gc()` in Python will force a cycle collection to happen, returning the number of cyclic garbage objects collected (and freed)
- `PyInner<T>`, but I drop `PyInner<Erased>`, need to correct that. (Fixed)
- `gc()` to instruction exec loop
- `PyObject` directly instead of dyn dispatch in gc
- `gc()` to its own module in stdlib
- `stdlib::io::_io::BufferedReader` (and other types in the _io module) keeps double freeing
- `header` later to prevent UB
- `__del__`, drop, dealloc separately to prevent UB
- `header`'s most fields into one byte

future features for GC
GC condition
gc will be triggered if the internal buffer (`roots`) has exceeded 10007 elements (just a random, reasonably large number) and at least 100 ms have passed since the last gc; that means after 10007 different objects get decref'd since the last gc, a new gc will be triggered. `gc()` is only called in the instruction-execution loop, but a `gc()` is also re-exported to the stdlib so one can call `gc()` from the Python side (which will force a gc regardless of the above condition). Stop-the-world is implemented by trying to block on a pausing `Mutex`
owned by the gc whenever trying to deref a `PyObjectRef`/`PyRef`/`Py`.

Basic ideas
cycle collector algorithm:
Data struct
Example
for code as below:
With and without gc, the memory usage is (run on a Win10 (19044.2006) machine with rustc 1.64.0; might vary between different runs):
currently `GcTrace`d types are: