gh-135552: Make the GC clear weakrefs later. #136189

nascheme · 2025-07-01T20:44:59Z

Fix two bugs related to the interaction of weakrefs and the garbage
collector. The weakrefs in the tp_subclasses dictionary are needed in
order to correctly invalidate type caches (for example, by calling
PyType_Modified()). Clearing weakrefs before calling finalizers causes
the caches to not be correctly invalidated. That can cause crashes since the
caches can refer to invalid objects. This is fixed by deferring the
clearing of weakrefs to classes and without callbacks until after finalizers
are executed.

The second bug is caused by weakrefs created while running finalizers. Those
weakrefs can be outside the set of unreachable garbage and therefore survive
the delete_garbage() phase (where tp_clear() is called on objects).
Those weakrefs can expose to Python-level code objects that have had
tp_clear() called on them. See GH-91636 as an example of this kind of
bug. This is fixed be clearing all weakrefs to unreachable objects after
finalizers have been executed.

Issue: Segmentation fault, possibly due to a GC issue (tp_subclasses) #135552

Clear the weakrefs to unreachable objects after finalizers are called.

neonene · 2025-07-01T21:26:01Z

I can confirm this PR fixes the gh-132413 issue as well.

nascheme · 2025-07-01T23:02:39Z

I think this fixes (or mostly fixes) gh-91636 as well.

bedevere-bot · 2025-07-01T23:34:44Z

🤖 New build scheduled with the buildbot fleet by @nascheme for commit 12f0b5c 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F136189%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

nascheme · 2025-07-02T04:24:50Z

This introduces refleaks, it seems. One of the leaking tests:
test.test_concurrent_futures.test_shutdown.ProcessPoolSpawnProcessPoolShutdownTest.test_shutdown_gh_132969_case_1
My unconfirmed suspicion is that a finalizer is now resurrecting an object via a weakref. Previously that weakref would be cleared before the finalizer is run. The multiprocessing finalizer logic seems very complicated. :-/

nascheme · 2025-07-02T06:42:49Z

This is the smallest leaking example I could make so far. Something with ProcessPoolExecutor leaking maybe.

class LeakTest(unittest.TestCase):
    @classmethod
    def _fail_task(cls, n):
        raise ValueError("failing task")

    def test_leak_case(self):
        # this leaks references
        executor = futures.ProcessPoolExecutor(
                max_workers=1,
                max_tasks_per_child=1,
                )
        f2 = executor.submit(LeakTest._fail_task, 0)
        try:
            f2.result()
        except ValueError:
            pass

        # Ensure that the executor cleans up after called
        # shutdown with wait=False
        executor_manager_thread = executor._executor_manager_thread
        executor.shutdown(wait=False)
        time.sleep(0.2)
        executor_manager_thread.join()

    def test_leak_case2(self):
        # this does not leak
        with futures.ProcessPoolExecutor(
                max_workers=1,
                max_tasks_per_child=1,
                ) as executor:
            f2 = executor.submit(LeakTest._fail_task, 0)
            try:
                f2.result()
            except ValueError:
                pass

neonene · 2025-07-02T12:52:14Z

Other leaking examples (on Windows):

1. test_logging:

import logging
import logging.config
import logging.handlers
from multiprocessing import Queue, Manager

class ConfigDictTest(unittest.TestCase):
    def test_multiprocessing_queues_XXX(self):
        config = {
            'version': 1,
            'handlers' : {
                'spam' : {
                    'class': 'logging.handlers.QueueHandler',
                    'queue': Manager().Queue()  ,         # Leak
                    # 'queue': Manager().JoinableQueue()  # Leak
                    # 'queue': Queue(),                   # No leak

                },
            },
            'root': {'handlers': ['spam']}
        }
        logger = logging.getLogger()
        logging.config.dictConfig(config)
        while logger.handlers:
            h = logger.handlers[0]
            logger.removeHandler(h)
            h.close()

2. test_interpreters.test_api:

import contextlib
import threading
import types
from concurrent import interpreters

def func():
    raise Exception('spam!')

@contextlib.contextmanager
def captured_thread_exception():
    ctx = types.SimpleNamespace(caught=None)
    def excepthook(args):
        ctx.caught = args
    orig_excepthook = threading.excepthook
    threading.excepthook = excepthook
    try:
        yield ctx
    finally:
        threading.excepthook = orig_excepthook

class TestInterpreterCall(unittest.TestCase):
    def test_call_in_thread_XXX(self):
        interp = interpreters.create()
        call = (interp._call, interp.call)[1]   # 0: No leak, 1: Leak
        with captured_thread_exception() as _:
            t = threading.Thread(target=call, args=(interp, func, (), {}))
            t.start()
            t.join()

UPDATE:

import weakref, _interpreters
wd = weakref.WeakValueDictionary()

class TestInterpreterCall(unittest.TestCase):
    def test_call_in_thread_XXX(self):
        id = _interpreters.create()
        wd[id] = type("", (), {})
        _interpreters.destroy(id)

nascheme · 2025-07-02T19:32:02Z

The majority (maybe all) of these leaks are caused by the WeakValueDictionary used as multiprocessing.util._afterfork_registry. That took some digging to find. I'm not yet sure of a good fix for this. Explicitly cleaning the dead weak references from the .data dict works but it not too elegant.

This avoids breaking tests while fixing the issue with tp_subclasses. In the long term, it would be better to defer the clear of all weakrefs, not just the ones referring to classes. However, that is a more distruptive change and would seem to have a higher chance of breaking user code. So, it would not be something to do in a bugfix release.

nascheme · 2025-07-03T01:01:27Z

The majority (maybe all) of these leaks are caused by the WeakValueDictionary used as multiprocessing.util._afterfork_registry. That took some digging to find. I'm not yet sure of a good fix for this. Explicitly cleaning the dead weak references from the .data dict works but it not too elegant.

Nope, that doesn't fix all the leaks. And having to explicitly clean the weakrefs from the WeakValueDictionary really shouldn't be needed, I think. The KeyedRef class uses a callback and so they should be cleaned from the dict when the referred value dies. So, I'm not exactly sure what's going on there.

For the purposes of having a fix that we can backport (should probably be backported to all maintained Python versions), a less disruptive fix would be better. To that end, I've changed this PR to only defer clearing weakrefs to class objects. That fixes the tp_subclasses bug but should be less likely to break currently working code.

bedevere-bot · 2025-07-03T01:27:04Z

🤖 New build scheduled with the buildbot fleet by @nascheme for commit 2f3daba 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F136189%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

We need to clear those before executing the callback. Since this ensures they can't run a second time, we don't need _PyGC_SET_FINALIZED(). Revise comments.

nascheme · 2025-07-03T06:08:53Z

The KeyedRef class uses a callback and so they should be cleaned from the dict when the referred value dies. So, I'm not exactly sure what's going on there.

Ah, the KeyedRef callback requires that the weakref is cleared when it is called, otherwise it was not deleting the item from the weak dictionary. So we need to clear the weakrefs with callbacks before executing them. That fixes the refleaks, I believe.

I revised this PR to be something that is potentially suitable for backporting. To minimize the behaviour change, I'm only deferring the clear of weakrefs that refer to types (in order to allow tp_subclasses to work) or with callbacks. I still have an extra clearing pass that gets done after the finalizers are called. That avoids bugs like #91636.

If this gets merged, I think we should consider deferring clearing all weakrefs (without callbacks) until after finalizers are executed. I think that's more correct since it gives the finalizers a better opportunity to do their job.

Python/gc.c

sergey-miryanov · 2025-07-03T07:05:13Z

Python/gc.c

+     * *not* be cleared so that caches based on the type version are correctly
+     * invalidated (see GH-135552 as a bug caused by this).
+     */
+    clear_weakrefs(&final_unreachable);


Ough! Nice trick with final_unreachable!

This is a bit trickly because the wrlist can be changing as we iterate over it.

efimov-mikhail · 2025-07-04T08:07:48Z

IMO, it's better to defer clearing of all weakrefs without callbacks. But it's up to release managers to decide which fixes we want to backport.

FYI, @Yhg1s, @hugovk

neonene · 2025-07-04T10:55:36Z

Only weakrefs with callbacks or weakrefs to classes are deferred from clearing

Weakref can be deferred from clearing if a dummy callback is given, right? If so, the following result on the PR is unexpected to me:

import itertools as mod
callback = lambda wr: None

def gen():
    import weakref
    wr = weakref.ref(mod, callback)
    print(1, wr())
    try:
        yield
    finally:
        print(2, wr())

it = gen()
next(it)
print('//')

1 <module 'itertools' (built-in)>
//
2 None

I guess gh-132413 could have another backporable workaround like this.

nascheme · 2025-07-04T13:00:50Z

Only weakrefs with callbacks or weakrefs to classes are deferred from clearing

Weakref can be deferred from clearing if a dummy callback is given, right?

Oops, I wrote that wrong. If there is a callback, the weakref is cleared early. There are callbacks that expect that and don't do the right think if the weakref is not cleared first. At least, the KeyedRef weakref subclass expects this and I would assume there are other callbacks that might expect it too.

neonene · 2025-07-04T14:02:37Z

Then, would it be possible to internally introduce a special weakref that is guaranteed to be cleared later?

sergey-miryanov · 2025-07-04T14:26:32Z

Then, would it be possible to internally introduce a special weakref that is guaranteed to be cleared later?

I tried to use sentinel callback for this here - #136147. Idea was in using noop-callback for this and not changing weakref code. But this solution has some flaws with lifetime of callback.

But in general case you can't distinguish what weakref should be cleared later (IIUC).

neonene · 2025-07-04T19:57:14Z

it's better to defer clearing of all weakrefs without callbacks. But it's up to release managers to decide which fixes we want to backport.

+1 for all weakrefs, at least for ones to modules when the gc runs with _Py_GC_REASON_SHUTDOWN. (Usually, modules are re-importable under other gc reasons.)

sergey-miryanov · 2025-07-04T20:18:23Z

@nascheme I have added tests for #135552. I believe we should add tests for other linked issues. I plan to look into it over the weekend.

sergey-miryanov · 2025-07-07T08:00:32Z

The KeyedRef class uses a callback and so they should be cleaned from the dict when the referred value dies. So, I'm not exactly sure what's going on there.

IIUC, there maybe cases when selfref or self_weakref already cleared by WeakValueDictionary or _WeakValueDictionary still have values should should be cleared to. But they will not be cleared because of condition if selfref() is not None and if self_weakref() is not None.

IIUC, because PEP-442 landed we can remove weakrefs here

cpython/Lib/weakref.py

Line 105 in 0c3e3da

def remove(wr, selfref=ref(self), _atomic_removal=_remove_dead_weakref):

and there

cpython/Lib/importlib/_bootstrap.py

Line 65 in 0c3e3da

self_weakref = _weakref.ref(self)

and use a strong reference that will outlive the KeyedRefs in the dictionaries. I replaced the weakref with a strong reference in the _bootstrap.py and checked with refleak buildbot (#136345). No leaks were found.

@nascheme WDYT?

neonene · 2025-07-07T09:50:06Z

Weakref with a callback needs to be cleared early (as-is), accounting for the 3rd-party applications that have similar logic to WeakValueDictionary.

sergey-miryanov · 2025-07-07T10:09:16Z

Weakref with a callback needs to be cleared early (as-is)

I'm not against this. But IIUC (feel free to correct me) we can face a situation in which weakref to the WeakValueDictionary is cleared in one gc collect call and KeyedRefs are cleared in another. And strong ref should handle this situation. We can teach users about this (I suppose).

neonene · 2025-07-07T12:03:42Z

we can face a situation

The experiment #136345 seems currently invalid, which is based on main and not effective on this PR where I omitted calling _PyWeakref_ClearRef in handle_weakref_callbacks.

sergey-miryanov · 2025-07-07T12:28:21Z

The experiment #136345 seems currently invalid, which is based on main and not effective on this PR where I omitted calling _PyWeakref_ClearRef in handle_weakref_callbacks.

If we use a strong reference, then it doesn't matter if we omit _PyWeakref_ClearRef or not. Am I wrong?

neonene · 2025-07-07T12:37:34Z

Before discussing, have you checked on this PR? I can see the leaks.

sergey-miryanov · 2025-07-07T13:36:19Z

Results

# clean test-weak-value-dict
➜ .\python.bat -m test test_logging -R :
Running Debug|x64 interpreter...
Using random seed: 3090750191
0:00:00 Run 1 test sequentially in a single process
0:00:00 [1/1] test_logging
beginning 9 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
12345:6789
XX... ....
0:06:26 [1/1] test_logging passed in 6 min 26 sec

== Tests result: SUCCESS ==

1 test OK.

Total duration: 6 min 26 sec
Total tests: run=272 skipped=13
Total test files: run=1/1
Result: SUCCESS

# clean gh-135552-wr-clear-later
➜ .\python.bat -m test test_logging -R :             
Running Debug|x64 interpreter...
Using random seed: 2520806690
0:00:00 Run 1 test sequentially in a single process
0:00:00 [1/1] test_logging
beginning 9 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
12345:6789
XX... ...2
test_logging leaked [0, 0, 0, 2] memory blocks, sum=2 (this is fine)
0:06:19 [1/1] test_logging passed in 6 min 19 sec

== Tests result: SUCCESS ==

1 test OK.

Total duration: 6 min 19 sec
Total tests: run=272 skipped=13
Total test files: run=1/1
Result: SUCCESS

# gh-135552-wr-clear-later + weakref removed from WeakValueDictionary
➜ .\python.bat -m test test_logging -R :
Running Debug|x64 interpreter...
Using random seed: 1096797289
0:00:00 Run 1 test sequentially in a single process
0:00:00 [1/1] test_logging
beginning 9 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
12345:6789
XX.2. ....
0:06:24 [1/1] test_logging passed in 6 min 24 sec

== Tests result: SUCCESS ==

1 test OK.

Total duration: 6 min 24 sec
Total tests: run=272 skipped=13
Total test files: run=1/1
Result: SUCCESS

# gh-135552-wr-clear-later + weakref removed from WeakValueDictionary + call _PyWeakref_ClearRef only for wr with callbacks
➜ .\python.bat -m test test_logging -R :
Running Debug|x64 interpreter...
Using random seed: 2599319069
0:00:00 Run 1 test sequentially in a single process
0:00:00 [1/1] test_logging
beginning 9 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
12345:6789
XX.2. ....
0:06:32 [1/1] test_logging passed in 6 min 32 sec

== Tests result: SUCCESS ==

1 test OK.

Total duration: 6 min 32 sec
Total tests: run=272 skipped=13
Total test files: run=1/1
Result: SUCCESS

# gh-135552-wr-clear-later + call _PyWeakref_ClearRef only for wr with callbacks
➜ .\python.bat -m test test_logging -R :
Running Debug|x64 interpreter...
Using random seed: 1493424824
0:00:00 Run 1 test sequentially in a single process
0:00:00 [1/1] test_logging
beginning 9 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
12345:6789
XX2.. ....
0:06:29 [1/1] test_logging passed in 6 min 29 sec

== Tests result: SUCCESS ==

1 test OK.

Total duration: 6 min 29 sec
Total tests: run=272 skipped=13
Total test files: run=1/1
Result: SUCCESS

neonene · 2025-07-07T14:29:12Z

Let's move on to #136345. (The result above looks long here.)

This fixes pythonGH-132413 and pythonGH-135552.

nascheme · 2025-07-08T22:02:13Z

GH-136401 was merged. I think it should be backported as well since it should be pretty low risk.

The main branch has been merged into this PR. I've updated the PR to defer clearing all weakrefs without callbacks (not just weakrefs to classes). This seems more correct to me (rather than special casing weakrefs to classes) and seems the best way to fix GH-132413.

I think GH-132413 is actually more likely to be encountered in real code. For example, if you have some logging function in a __del__ method that uses the datetime module. So, maybe this PR should be backported to 3.13 and 3.14 as well.

sergey-miryanov · 2025-07-09T05:44:55Z

Python/gc.c

+                // callback pointer intact.  Obscure: it also changes *wrlist.
+                _PyObject_ASSERT((PyObject *)wr, wr->wr_object == op);
+                _PyWeakref_ClearRef(wr);
+                _PyObject_ASSERT((PyObject *)wr, wr->wr_object == Py_None);
            }

            if (wr->wr_callback == NULL) {


Is it worth to move this check above and remove if (wr->wr_callback != NULL)?

Yeah that's probably a bit cleaner source code wise. The compiler should optimize to the same code anyhow though so I'll leave it as it for now. We still may decide that we only want to clear certain types of weakrefs and then the current structure is better.

sergey-miryanov · 2025-07-09T05:47:52Z

Looks good to me. Thanks for detailed comments!

neonene · 2025-07-09T19:16:17Z

Lib/test/test_finalization.py

-            # XXX is this desirable?
-            self.assertIs(wr(), None)
+            # This used to be None because weakrefs were cleared before
+            # calling finalizers.  Now they are cleared after.
+            self.assertIsNot(wr(), None)


The changes to tests look to be taken from the initial commit. I would like a comment or a test for the weakref with a callback.

neonene · 2025-07-09T19:16:26Z

Lib/test/test_finalization.py

+            for wr in wrs:
+                # These values used to be None because weakrefs were cleared
+                # before calling finalizers.  Now they are cleared after.
+                self.assertIsNotNone(wr())


The same as above.

neonene · 2025-07-09T19:16:37Z

Lib/test/test_gc.py

-        # if called, it means there is a bug in the GC.  The weakref should be
-        # cleared before Z dies.
+        # In older versions of Python, the weakref was cleared by the
+        # gc.  Now it is not cleared and so the callback is run.


The a.wz is now cleared by clear_weakrefs() rather than by handle_weakref_callbacks() in gc.c, so the callback will never be called?

neonene · 2025-07-09T19:16:42Z

Lib/test/test_weakref.py

@@ -1314,7 +1314,7 @@ def check_len_cycles(self, dict_type, cons):
        n2 = len(dct)
        # one item may be kept alive inside the iterator
        self.assertIn(n1, (0, 1))
-        self.assertEqual(n2, 0)
+        self.assertIn(n2, (0, 1))


The test_weak_valued_len_cycles() now passes the original assertion, so the change may not be necessary.

Make the GC clear weakrefs later.

42abb05

Clear the weakrefs to unreachable objects after finalizers are called.

bedevere-app bot mentioned this pull request Jul 1, 2025

Segmentation fault, possibly due to a GC issue (tp_subclasses) #135552

Open

Remove inaccurate comment.

17a4f9e

Run clear_weakrefs() with world stopped.

10000

12f0b5c

nascheme mentioned this pull request Jul 1, 2025

Assertion failure when func_repr is called on an already tp_clear-ed object #91636

Open

nascheme requested a review from pablogsal July 1, 2025 23:33

nascheme added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jul 1, 2025

bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jul 1, 2025

neonene mentioned this pull request Jul 2, 2025

gh-132413: Clear weakref to _datetime after modules are finalized #136152

Closed

nascheme added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jul 3, 2025

bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jul 3, 2025

Ensure weakrefs with callbacks are cleared early.

123bc25

We need to clear those before executing the callback. Since this ensures they can't run a second time, we don't need _PyGC_SET_FINALIZED(). Revise comments.

sergey-miryanov reviewed Jul 3, 2025

View reviewed changes

Python/gc.c Show resolved Hide resolved

sergey-miryanov reviewed Jul 3, 2025

View reviewed changes

Python/gc.c Show resolved Hide resolved

sergey-miryanov reviewed Jul 3, 2025

View reviewed changes

pablogsal self-assigned this Jul 3, 2025

nascheme added 3 commits July 3, 2025 06:04

Add NEWS.

496c0b1

Add comment about wrlist iteration.

8a553d1

This is a bit trickly because the wrlist can be changing as we iterate over it.

Merge 'origin/main' into pythongh-135552-wr-clear-later

01c882f

sergey-miryanov mentioned this pull request Jul 4, 2025

gh-135552: Add tests that check if weakref for tp_subclasses cleared after finalization #136304

Open

StanFromIreland mentioned this pull request Jul 5, 2025

gh-132413: Check if datetime current state is not NULL #136321

Closed

sergey-miryanov mentioned this pull request Jul 7, 2025

Replace weakref to the _WeakValueDictionary with strong ref to outlives keys #136345

Closed

nascheme mentioned this pull request Jul 8, 2025

GH-91636: Clear weakrefs created by finalizers. #136401

Merged

nascheme added 4 commits July 8, 2025 12:33

Merge 'origin/main' into pythongh-135552-wr-clear-later

0060602

Revise NEWS.

900022b

Defer clear for weakrefs without callbacks.

84bd123

This fixes pythonGH-132413 and pythonGH-135552.

Add unit test for pythonGH-132413.

bb29ea1

nascheme requested review from kumaraditya303, pganssle and abalkin as code owners July 8, 2025 21:54

sergey-miryanov reviewed Jul 9, 2025

View reviewed changes

neonene reviewed Jul 9, 2025

View reviewed changes

Uh oh!

gh-135552: Make the GC clear weakrefs later. #136189

Are you sure you want to change the base?

gh-135552: Make the GC clear weakrefs later. #136189

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!