8000 Refleaks on free-threaded builds · Issue #134557 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

Refleaks on free-threaded builds #134557

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
emmatyping opened this issue May 23, 2025 · 18 comments
Open

Refleaks on free-threaded builds #134557

emmatyping opened this issue May 23, 2025 · 18 comments
Labels
tests Tests in the Lib/test dir topic-subinterpreters type-bug An unexpected behavior, bug, or error

Comments

@emmatyping
Copy link
Member
emmatyping commented May 23, 2025

Bug report

Bug description:

The nogil refleak buildbots have started failing for main and 3.14 with leaks in test_sys, test_capi and a few others.

See e.g. https://buildbot.python.org/#/builders/1714/builds/90

I bisected this to 09e72cf, cc @ericsnowcurrently

CPython versions tested on:

CPython main branch, 3.14

Operating systems tested on:

Linux

Linked PRs

@emmatyping emmatyping added type-bug An unexpected behavior, bug, or error tests Tests in the Lib/test dir labels May 23, 2025
@emmatyping emmatyping changed the title refleaks on free-threaded builds Refleaks on free-threaded builds May 23, 2025
@ericsnowcurrently
Copy link
Member

Yeah, I'll be home soon and will take a look.

@ericsnowcurrently
Copy link
Member

Any ideas on why there would be leaks only on free-threaded builds?

Also, FWIW, the leaks may actually originate in gh-133482.

@ZeroIntensity
Copy link
Member

Hm, I don't see any obvious leaks on the PRs. I wouldn't be surprised if it only revealed an existing leak on free-threading.

@ZeroIntensity
Copy link
Member

Ah, @ericsnowcurrently, this looks like a leak:

PyObject *res = PyMemoryView_FromObject(obj);

obj needs to decref'd on the successful path. I'm not sure if that's the cause of the buildbots failures, though. No idea why it would only fail under free-threading (maybe it's because the object structure is bigger?)

@emmatyping
Copy link
Member Author

I tried adding a decref after that but the tests still show a refleak.

@ericsnowcurrently
Copy link
Member

FWIW, the results have been consistent for me across runs, so it doesn't seem like a race is involved.

@ericsnowcurrently
Copy link
Member

It may be due to interned strings. The only refleak-failing test in test_sys is test_subinterp_intern_dynamically_allocated.

(Thus this is more likely to be an existing bug than one introduced yesterday.)

@picnixz
Copy link
Member
picnixz commented May 23, 2025

While reviewing the new code, I saw that

PyObject *exc = _PyErr_GetRaisedException(tstate);
PyObject *cause = PyException_GetCause(exc);
if (cause != NULL) {
Py_DECREF(exc);
exc = cause;
}
else {
assert(PyException_GetContext(exc) == NULL);
}

should use Py_SETREF instead. I don't think it's the cause of the leak though

@ericsnowcurrently
Copy link
Member
ericsnowcurrently commented May 23, 2025

In test_capi (test_capi.test_misc), the leak is manifesting exclusively in test_isolated_subinterpreter (specifically the first two subtests).

@ericsnowcurrently
Copy link
Member

I minimized the leaky test in test_capi. The following leaks:

    def test_subinterp_leak(self):
        interpid = _interpreters.create()
        self.addCleanup(lambda: _interpreters.destroy(interpid))
        timeout = time.time() + 30
        _interpreters.run_string(interpid, f"""if True:
            import time
            time.time() < {timeout}
            """)

but the following does not:

    def test_subinterp_leak(self):
        interpid = _interpreters.create()
        self.addCleanup(lambda: _interpreters.destroy(interpid))
        timeout = 30000000000
        _interpreters.run_string(interpid, f"""if True:
            import time
            time.time() < {timeout}
            """)

nor does:

    def test_subinterp_leak(self):
        interpid = _interpreters.create()
        self.addCleanup(lambda: _interpreters.destroy(interpid))
        _interpreters.run_string(interpid, f"""if True:
            import time
            time.time() < 30000000000
            """)

@ericsnowcurrently
Copy link
Member

I'm reverting for now, but I suspect there's still an existing leak to be fixed.

@emmatyping
Copy link
Member Author

FWIW the valgrind report for your minimal reproducer is:

==210093== 249,175 bytes in 1 blocks are definitely lost in loss record 8 of 8
==210093==    at 0x484D953: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==210093==    by 0x2DB669: _PyMem_DebugRawAlloc (Objects/obmalloc.c:2785)
==210093==    by 0x2DB669: _PyMem_DebugRawCalloc (???:2830)
==210093==    by 0x4BB06F: alloc_interpreter (Python/pystate.c:468)
==210093==    by 0x4BB06F: _PyInterpreterState_New (???:656)
==210093==    by 0x4BB53D: PyInterpreterState_New (Python/pystate.c:702)
==210093==    by 0x49DD60: new_interpreter (Python/pylifecycle.c:2294)
==210093==    by 0x49DCB2: Py_NewInterpreterFromConfig (Python/pylifecycle.c:2397)
==210093==    by 0x4442EF: _PyXI_NewInterpreter (Python/crossinterp.c:2736)
==210093==    by 0x51A7A90: interp_create (_interpretersmodule.c:632)
==210093==    by 0x2B9FE0: cfunction_call (Objects/methodobject.c:565)
==210093==    by 0x230CAA: _PyObject_MakeTpCall (Objects/call.c:242)
==210093==    by 0x3E230F: _PyEval_EvalFrameDefault (generated_cases.c.h:1619)
==210093==    by 0x3DB20C: _PyEval_EvalFrame (pycore_ceval.h:119)
==210093==    by 0x3DB20C: _PyEval_Vector (???:1961)

So perhaps an interpreter is not getting freed?

@ericsnowcurrently
Copy link
Member

It should be freed during test cleanup.

ericsnowcurrently added a commit that referenced this issue May 23, 2025
This reverts commit 09e72cf, AKA gh-134511.

We are reverting due to refleaks on free-threaded builds.
ericsnowcurrently added a commit that referenced this issue May 23, 2025
…ta()" (gh-134600)

This reverts commit bbf8048, AKA gh-134515.

We are reverting due to refleaks on free-threaded builds.
@ericsnowcurrently
Copy link
Member

At least for test_sys and test_capi, only scripts built using the BUILD_STRING opcode appear to be affected.

@ZeroIntensity
Copy link
Member

Ugh, why is it that subinterpreter changes seem to cause undebuggable buildbot failures. I can't reproduce a leak when running that test manually with -Xshowrefcount:

import _interpreters
import time

interpid = _interpreters.create()
timeout = time.time() + 30
_interpreters.run_string(interpid, f"""if True:
    import time
    time.time() < {timeout}
    """)
_interpreters.destroy(interpid)

@ericsnowcurrently
Copy link
Member

Are you using a free-threading build?

@ZeroIntensity
Copy link
Member

Yeah, I'm getting [0 refs, 0 blocks] on both FT or GIL-ful builds.

@neonene
Copy link
Contributor
neonene commented May 24, 2025
  • _PyCode_Init() -> intern_one_constant(), which interns the float timeout:

    cpython/Objects/codeobject.c

    Lines 3088 to 3095 in 7b1010a

    #ifdef Py_REF_DEBUG
    Py_ssize_t refcnt = Py_REFCNT(op);
    if (refcnt != 1) {
    // Adjust the reftotal to account for the fact that we only
    // restore a single reference in _PyCode_Fini.
    _Py_AddRefTotal(_PyThreadState_GET(), -(refcnt - 1));
    }
    #endif

  • _PyCode_Fini():

    cpython/Objects/codeobject.c

    Lines 3245 to 3252 in 7b1010a

    _PyCode_Fini(PyInterpreterState *interp)
    {
    #ifdef Py_GIL_DISABLED
    // Free interned constants
    struct _py_code_state *state = &interp->code_state;
    if (state->constants) {
    _Py_hashtable_foreach(state->constants, &clear_containers, NULL);
    _Py_hashtable_destroy(state->constants);

  • _interpreters.run_string() -> _PyXI_Exit() -> _exit_session():

    cpython/Python/crossinterp.c

    Lines 2345 to 2351 in 7b1010a

    if (session->prev_tstate != session->init_tstate) {
    assert(session->own_init_tstate);
    session->own_init_tstate = 0;
    PyThreadState_Clear(tstate);
    PyThreadState_Swap(session->prev_tstate);
    PyThreadState_Delete(tstate);
    }

_PyCode_Fini() seems to follow _PyXI_Exit(), which would affect the stat (different thread)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests Tests in the Lib/test dir topic-subinterpreters type-bug An unexpected behavior, bug, or error
Projects
Status: Todo
Development

No branches or pull requests

5 participants
0