-
-
Notifications
You must be signed in to change notification settings - Fork 32.3k
gh-135552: Make the GC clear weakrefs later. #136189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Clear the weakrefs to unreachable objects after finalizers are called.
I can confirm this PR fixes the gh-132413 issue as well. |
I think this fixes (or mostly fixes) gh-91636 as well. |
🤖 New build scheduled with the buildbot fleet by @nascheme for commit 12f0b5c 🤖 Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F136189%2Fmerge If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
This introduces refleaks, it seems. One of the leaking tests: |
This is the smallest leaking example I could make so far. Something with
|
Other leaking examples (on Windows): 1. test_logging:import logging
import logging.config
import logging.handlers
from multiprocessing import Queue, Manager
class ConfigDictTest(unittest.TestCase):
def test_multiprocessing_queues_XXX(self):
config = {
'version': 1,
'handlers' : {
'spam' : {
'class': 'logging.handlers.QueueHandler',
'queue': Manager().Queue() , # Leak
# 'queue': Manager().JoinableQueue() # Leak
# 'queue': Queue(), # No leak
},
},
'root': {'handlers': ['spam']}
}
logger = logging.getLogger()
logging.config.dictConfig(config)
while logger.handlers:
h = logger.handlers[0]
logger.removeHandler(h)
h.close() 2. test_interpreters.test_api:import contextlib
import threading
import types
from concurrent import interpreters
def func():
raise Exception('spam!')
@contextlib.contextmanager
def captured_thread_exception():
ctx = types.SimpleNamespace(caught=None)
def excepthook(args):
ctx.caught = args
orig_excepthook = threading.excepthook
threading.excepthook = excepthook
try:
yield ctx
finally:
threading.excepthook = orig_excepthook
class TestInterpreterCall(unittest.TestCase):
def test_call_in_thread_XXX(self):
interp = interpreters.create()
call = (interp._call, interp.call)[1] # 0: No leak, 1: Leak
with captured_thread_exception() as _:
t = threading.Thread(target=call, args=(interp, func, (), {}))
t.start()
t.join() UPDATE: import weakref, _interpreters
wd = weakref.WeakValueDictionary()
class TestInterpreterCall(unittest.TestCase):
def test_call_in_thread_XXX(self):
id = _interpreters.create()
wd[id] = type("", (), {})
_interpreters.destroy(id) |
The majority (maybe all) of these leaks are caused by the |
This avoids breaking tests while fixing the issue with tp_subclasses. In the long term, it would be better to defer the clear of all weakrefs, not just the ones referring to classes. However, that is a more distruptive change and would seem to have a higher chance of breaking user code. So, it would not be something to do in a bugfix release.
Nope, that doesn't fix all the leaks. And having to explicitly clean the weakrefs from the WeakValueDictionary really shouldn't be needed, I think. The For the purposes of having a fix that we can backport (should probably be backported to all maintained Python versions), a less disruptive fix would be better. To that end, I've changed this PR to only defer clearing weakrefs to class objects. That fixes the |
🤖 New build scheduled with the buildbot fleet by @nascheme for commit 2f3daba 🤖 Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F136189%2Fmerge If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
We need to clear those before executing the callback. Since this ensures they can't run a second time, we don't need _PyGC_SET_FINALIZED(). Revise comments.
Ah, the I revised this PR to be something that is potentially suitable for backporting. To minimize the behaviour change, I'm only deferring the clear of weakrefs that refer to types (in order to allow tp_subclasses to work) or with callbacks. I still have an extra clearing pass that gets done after the finalizers are called. That avoids bugs like #91636. If this gets merged, I think we should consider deferring clearing all weakrefs (without callbacks) until after finalizers are executed. I think that's more correct since it gives the finalizers a better opportunity to do their job. |
* *not* be cleared so that caches based on the type version are correctly | ||
* invalidated (see GH-135552 as a bug caused by this). | ||
*/ | ||
clear_weakrefs(&final_unreachable); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ough! Nice trick with final_unreachable
!
This is a bit trickly because the wrlist can be changing as we iterate over it.
I think this PR, as is, will no longer fix GH-132413. Only weakrefs with callbacks or weakrefs to classes are deferred from clearing. GH-132413 is caused by a weakref to a module object being cleared early. Depending on how serious we think that bug is, we could change this PR back to deferring the clear of all weakrefs without callbacks. Doing that doesn't break many unit tests (just a couple in |
Weakref can be deferred from clearing if a dummy callback is given, right? If so, the following result on the PR is unexpected to me: import itertools as mod
callback = lambda wr: None
def gen():
import weakref
wr = weakref.ref(mod, callback)
print(1, wr())
try:
yield
finally:
print(2, wr())
it = gen()
next(it)
print('//')
I guess gh-132413 could have another backporable workaround like this. |
Oops, I wrote that wrong. If there is a callback, the weakref is cleared early. There are callbacks that expect that and don't do the right think if the weakref is not cleared first. At least, the |
Then, would it be possible to internally introduce a special weakref that is guaranteed to be cleared later? |
I tried to use sentinel callback for this here - #136147. Idea was in using noop-callback for this and not changing weakref code. But this solution has some flaws with lifetime of callback. But in general case you can't distinguish what weakref should be cleared later (IIUC). |
+1 for all weakrefs, at least for ones to modules when the gc runs with |
IIUC, there maybe cases when IIUC, because PEP-442 landed we can remove weakrefs here Line 105 in 0c3e3da
and there cpython/Lib/importlib/_bootstrap.py Line 65 in 0c3e3da
and use a strong reference that will outlive the KeyedRefs in the dictionaries. I replaced the weakref with a strong reference in the _bootstrap.py and checked with refleak buildbot (#136345). No leaks were found. @nascheme WDYT? |
Weakref with a callback needs to be cleared early (as-is), accounting for the 3rd-party applications that have similar logic to |
I'm not against this. But IIUC (feel free to correct me) we can face a situation in which weakref to the |
The experiment #136345 seems currently invalid, which is based on main and not effective on this PR where I omitted calling |
If we use a strong reference, then it doesn't matter if we omit |
Before discussing, have you checked on this PR? I can see the leaks. |
Results
|
Let's move on to #136345. (The result above looks long here.) |
GH-136401 was merged. I think it should be backported as well since it should be pretty low risk. The main branch has been merged into this PR. I've updated the PR to defer clearing all weakrefs without callbacks (not just weakrefs to classes). This seems more correct to me (rather than special casing weakrefs to classes) and seems the best way to fix GH-132413. I think GH-132413 is actually more likely to be encountered in real code. For example, if you have some logging function in a |
// callback pointer intact. Obscure: it also changes *wrlist. | ||
_PyObject_ASSERT((PyObject *)wr, wr->wr_object == op); | ||
_PyWeakref_ClearRef(wr); | ||
_PyObject_ASSERT((PyObject *)wr, wr->wr_object == Py_None); | ||
} | ||
|
||
if (wr->wr_callback == NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth to move this check above and remove if (wr->wr_callback != NULL)
?
Looks good to me. Thanks for detailed comments! |
Fix two bugs related to the interaction of weakrefs and the garbage
collector. The weakrefs in the
tp_subclasses
dictionary are needed inorder to correctly invalidate type caches (for example, by calling
PyType_Modified()
). Clearing weakrefs before calling finalizers causesthe caches to not be correctly invalidated. That can cause crashes since the
caches can refer to invalid objects. This is fixed by deferring the
clearing of weakrefs to classes and without callbacks until after finalizers
are executed.
The second bug is caused by weakrefs created while running finalizers. Those
weakrefs can be outside the set of unreachable garbage and therefore survive
the
delete_garbage()
phase (wheretp_clear()
is called on objects).Those weakrefs can expose to Python-level code objects that have had
tp_clear()
called on them. See GH-91636 as an example of this kind ofbug. This is fixed be clearing all weakrefs to unreachable objects after
finalizers have been executed.
datetime.timedelta
(possibly from datetime's Cdelta_new
) #132413