8000 gh-120501: Fix reference leak in JIT build by Eclips4 · Pull Request #120649 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

gh-120501: Fix reference leak in JIT build #120649

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

Eclips4
Copy link
Member
@Eclips4 Eclips4 commented Jun 17, 2024

@Eclips4
Copy link
Member Author
Eclips4 commented Jun 17, 2024

@JeffersGlass can you apply this patch locally and verify that this solves the refleak problem?
FYI there's a another one problem(😢) that was introduced after f6fab21, but firstly in this PR I would like to solve initial problem 🙂

@Eclips4
Copy link
Member Author
Eclips4 commented Jun 17, 2024

The second problem is bisected to 72867c9

@JeffersGlass
Copy link
Contributor

On Linux/X86_64, this seems to have fixed the refleak issue! 🎉

Building on Windows_x64, though, I get an error when starting the interpreter. Assertion failed: Py_SIZE(value), file D:\cpython\Python\executor_cases.c.h, line 405. Same error whether dropping into the REPL, or running with -c, or -m test, etc.

@vstinner vstinner changed the title gh-120501: Fix reference leak gh-120501: Fix reference leak in JIT build Jun 18, 2024
Copy link
Member
@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirm that the change fix the reference leak for test_colorsys (I picked on test to check the PR).

@Eclips4
Copy link
Member Author
Eclips4 commented Jun 24, 2024

So, here's the second problem:
Reproducer:

def foo():
    a = [1, 2, 3]
    exhit = iter(a)
    for _ in exhit:
        pass
    a.append("this should'be in exhit")
    print(f"got {list(exhit)}, should be []")

foo()
foo()
foo()
foo()
foo()
foo()

Output:

got [], should be []
got [], should be []
got [], should be []
got [], should be []
got [], should be []
got ["this should'be in exhit"], should be []

Obviously, the last line is incorrect.

Output with a PYTHON_LLTRACE=2 env:

got [], should be []
got [], should be []
got [], should be []
got [], should be []
got [], should be []
Optimizing foo (/home/eclips4/programming-languages/cpython/example.py:1) at byte offset 42
   1 ADD_TO_TRACE: _START_EXECUTOR (0, target=21, operand=0x7f4646e59
8000
832)
21: JUMP_BACKWARD(5)
   2 ADD_TO_TRACE: _CHECK_VALIDITY_AND_SET_IP (0, target=21, operand=0x7f4646e59832)
   3 ADD_TO_TRACE: _TIER2_RESUME_CHECK (0, target=21, operand=0)
18: FOR_ITER_LIST(3)
   4 ADD_TO_TRACE: _CHECK_VALIDITY_AND_SET_IP (0, target=18, operand=0x7f4646e5982c)
   5 ADD_TO_TRACE: _ITER_CHECK_LIST (3, target=18, operand=0)
   6 ADD_TO_TRACE: _GUARD_NOT_EXHAUSTED_LIST (3, target=18, operand=0)
   7 ADD_TO_TRACE: _ITER_NEXT_LIST (3, target=18, operand=0)
20: STORE_FAST(2)
   8 ADD_TO_TRACE: _CHECK_VALIDITY_AND_SET_IP (0, target=20, operand=0x7f4646e59830)
   9 ADD_TO_TRACE: _STORE_FAST (2, target=20, operand=0)
21: JUMP_BACKWARD(5)
  10 ADD_TO_TRACE: _CHECK_VALIDITY_AND_SET_IP (0, target=21, operand=0x7f4646e59832)
  11 ADD_TO_TRACE: _JUMP_TO_TOP (0, target=0, operand=0)
Created a proto-trace for foo (/home/eclips4/programming-languages/cpython/example.py:1) at byte offset 36 -- length 11
Optimized trace (length 10):
   0 OPTIMIZED: _START_EXECUTOR (0, jump_target=7, operand=0x7f4646e59e80)
   1 OPTIMIZED: _TIER2_RESUME_CHECK (0, jump_target=7, operand=0)
   2 OPTIMIZED: _ITER_CHECK_LIST (3, jump_target=8, operand=0)
   3 OPTIMIZED: _GUARD_NOT_EXHAUSTED_LIST (3, jump_target=9, operand=0)
   4 OPTIMIZED: _ITER_NEXT_LIST (3, target=18, operand=0)
   5 OPTIMIZED: _STORE_FAST_2 (2, target=20, operand=0)
   6 OPTIMIZED: _JUMP_TO_TOP (0, target=0, operand=0)
   7 OPTIMIZED: _DEOPT (0, target=21, operand=0)
   8 OPTIMIZED: _EXIT_TRACE (0, exit_index=0, operand=0)
   9 OPTIMIZED: _EXIT_TRACE (0, exit_index=1, operand=0)
got ["this should'be in exhit"], should be []

It's definitely related to this part of code:

cpython/Python/optimizer.c

Lines 1027 to 1033 in e4a97a7

if (is_for_iter_test[opcode]) {
/* Target the POP_TOP immediately after the END_FOR,
* leaving only the iterator on the stack. */
int extended_arg = inst->oparg > 255;
int32_t next_inst = target + 1 + INLINE_CACHE_ENTRIES_FOR_ITER + extended_arg;
jump_target = next_inst + inst->oparg + 1;
}

I guess the culprit is there. If we remove the _GUARD_NOT_EXHAUSTED_LIST from is_for_iter_test, the problem will go away (although it can still be reproduced in other ways using other (range, tuple) iterators).

@markshannon
Copy link
Member

So, here's the second problem: ...

This is unrelated to this PR, but it is a real bug.
Can you make a separate issue for it (and assign it to me)? Thanks.

@Eclips4
Copy link
Member Author
Eclips4 commented Jun 25, 2024

So, here's the second problem: ...

This is unrelated to this PR, but it is a real bug. Can you make a separate issue for it (and assign it to me)? Thanks.

Done, see #121012.

@Eclips4
Copy link
Member Author
Eclips4 commented Sep 6, 2024

FYI: I came back to this and found out that there is a new leak 😢 (maybe there are more than one leak... who knows..) which I bisected to 9621a7d.

@brandtbucher
Copy link
Member

Do you have a reproducer for the leak?

And just to clarify, it's a "real" leak (objects that aren't cleaned up at shutdown)? It's normal for the JIT to allocate some new objects during hot loops, but their reference counts and everything should still be correct.

@Eclips4
Copy link
Member Author
Eclips4 commented Sep 7, 2024

Do you have a reproducer for the leak?

And just to clarify, it's a "real" leak (objects that aren't cleaned up at shutdown)? It's normal for the JIT to allocate some new objects during hot loops, but their reference counts and everything should still be correct.

It's look like a real leak since ./python -X showrefcount IIRC reports the state of reference counter at the shutdown of the interpeter.
MRE:

cases = [
    tuple(i for i in range(3)) for _ in range(39)
]


for i, j, k in cases:
    for lo in range(4):
        for hi in range(3,8):
            pass
./python -X showrefcount example.py
[1 refs, 1 blocks]

@brandtbucher
Copy link
Member

More minimal reproducer. This creates a trace, and warms up a single side exit:

for _ in range(82):
    for _ in range(0):
        pass

The issue is that we aren't clearing "child" executors from side exits when deallocating an executor.

This quick fix seems to solve the issue for me locally. Haven't dug too deep into whether it's totally correct, but it seems right.

diff --git a/Python/optimizer.c b/Python/optimizer.c
index 9198e410627..8ba2782c2c3 100644
--- a/Python/optimizer.c
+++ b/Python/optimizer.c
@@ -257,6 +257,10 @@ uop_dealloc(_PyExecutorObject *self) {
     _PyObject_GC_UNTRACK(self);
     assert(self->vm_data.code == NULL);
     unlink_executor(self);
+    for (uint32_t i = 0; i < self->exit_count; i++) {
+        self->exits[i].temperature = initial_unreachable_backoff_counter();
+        Py_CLEAR(self->exits[i].executor);
+    }
 #ifdef _Py_JIT
     _PyJIT_Free(self);
 #endif

@brandtbucher
Copy link
Member

Thanks for digging this up @Eclips4!

@Eclips4
Copy link
Member Author
Eclips4 commented Sep 8, 2024

More minimal reproducer. This creates a trace, and warms up a single side exit:

for _ in range(82):
    for _ in range(0):
        pass

The issue is that we aren't clearing "child" executors from side exits when deallocating an executor.

This quick fix seems to solve the issue for me locally. Haven't dug too deep into whether it's totally correct, but it seems right.

diff --git a/Python/optimizer.c b/Python/optimizer.c
index 9198e410627..8ba2782c2c3 100644
--- a/Python/optimizer.c
+++ b/Python/optimizer.c
@@ -257,6 +257,10 @@ uop_dealloc(_PyExecutorObject *self) {
     _PyObject_GC_UNTRACK(self);
     assert(self->vm_data.code == NULL);
     unlink_executor(self);
+    for (uint32_t i = 0; i < self->exit_count; i++) {
+        self->exits[i].temperature = initial_unreachable_backoff_counter();
+        Py_CLEAR(self->exits[i].executor);
+    }
 #ifdef _Py_JIT
     _PyJIT_Free(self);
 #endif

Thanks, Brandt!
Don't you mind if I add this code to this PR?

@Eclips4
Copy link
Member Author
Eclips4 commented Sep 8, 2024

Another one problem, which seems to be introduced in f6fab21:

./python -m test -R 3:3 test_descr
Using random seed: 315532800
0:00:00 load avg: 38.54 Run 1 test sequentially
0:00:00 load avg: 38.54 [1/1] test_descr
beginning 6 repetitions
123456
.test test_descr failed -- Traceback (most recent call last):
  File "/home/eclips4/programming/programming-languages/cpython/Lib/test/test_descr.py", line 1294, in test_slots
    self.assertEqual(orig_objects, new_objects)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 16891 != 16892

test_descr failed (1 failure)

== Tests result: FAILURE ==

1 test failed:
    test_descr

Total duration: 1.5 sec
Total tests: run=156 failures=1 skipped=1
Total test files: run=1/1 failed=1
Result: FAILURE

Bisected to 7b21403

@brandtbucher
Copy link
Member
brandtbucher commented Sep 17, 2024

Don't you mind if I add this code to this PR?

Nope, go ahead (it should probably just be a Py_XDECREF, not a Py_CLEAR, and I don't think we need to touch the temperature).

@brandtbucher
Copy link
Member
brandtbucher commented Sep 17, 2024

Another one problem, which seems to be introduced in f6fab21:

This is expected: the JIT allocates a new executor (which is a GC-tracked object) in the loop immediately above this assertEqual call. I'm not sure that the test is correct anyways, since a GC could have happened in the loop, changing the number of GC objects in the process.

Basically, it seems like a buggy test. But in general the JIT needs to be able to perform allocations in places where maybe they didn't happen before... so our leak checks should be resilient to that.

Co-authored-by: Brandt Bucher <brandt@python.org>
@Eclips4
Copy link
Member Author
Eclips4 commented Oct 25, 2024

Okay.. let's continue digging into this.. 😄
so, there's another failure (you can get it by running ./python -m test -R 3:3 test_capi)

======================================================================
FAIL: test_guard_type_version_executor_invalidated (test.test_capi.test_opt.TestUopsOptimization.test_guard_type_version_executor_invalidated)
Verify that the executor is invalided on a type change.
----------------------------------------------------------------------
<
9E88
span class="pl-en">Traceback (most recent call last):
  File "/home/eclips4/programming/programming-languages/cpython/Lib/test/test_capi/test_opt.py", line 1458, in test_guard_type_version_executor_invalidated
    self.assertEqual(list(iter_opnames(ex)).count("_GUARD_TYPE_VERSION"), 1)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 2 != 1

======================================================================
FAIL: test_guard_type_version_removed (test.test_capi.test_opt.TestUopsOptimization.test_guard_type_version_removed)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/eclips4/programming/programming-languages/cpython/Lib/test/test_capi/test_opt.py", line 1353, in test_guard_type_version_removed
    self.assertEqual(guard_type_version_count, 1)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 2 != 1

======================================================================
FAIL: test_guard_type_version_removed_inlined (test.test_capi.test_opt.TestUopsOptimization.test_guard_type_version_removed_inlined)
Verify that the guard type version if we have an inlined function
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/eclips4/programming/programming-languages/cpython/Lib/test/test_capi/test_opt.py", line 1379, in test_guard_type_version_removed_inlined
    self.assertEqual(guard_type_version_count, 1)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 2 != 1

======================================================================
UNEXPECTED SUCCESS: test_guard_type_version_not_removed_escaping (test.test_capi.test_opt.TestUopsOptimization.test_guard_type_version_not_removed_escaping)
Verify that the guard type version is not removed if have an escaping function
----------------------------------------------------------------------
Ran 924 tests in 11.506s

FAILED (failures=3, skipped=4, unexpected successes=1)
test test_capi failed
test_capi failed (3 failures)

== Tests result: FAILURE ==

1 test failed:
    test_capi

Total duration: 23.5 sec
Total tests: run=924 failures=3 skipped=4
Total test files: run=1/1 failed=1
Result: FAILURE

@Eclips4
Copy link
Member Author
Eclips4 commented Oct 25, 2024

UPD: There are many more failures:

17 tests failed:
    test.test_asyncio.test_taskgroups test.test_inspect.test_inspect
    test.test_multiprocessing_forkserver.test_processes
    test.test_multiprocessing_spawn.test_processes test_argparse
    test_ast test_capi test_clinic test_collections test_ctypes
    test_descr test_difflib test_enum test_itertools test_monitoring
    test_tarfile test_zoneinfo

@brandtbucher
Copy link
Member

Okay.. let's continue digging into this.. 😄
so, there's another failure (you can get it by running ./python -m test -R 3:3 test_capi)

Let's keep this PR in scope. This new issue you found isn't a refleak. It's just a byproduct of the fact that when we run out of version numbers, we literally can't perform certain optimizations anymore.

@Eclips4
Copy link
Member Author
Eclips4 commented Nov 1, 2024

Okay.. let's continue digging into this.. 😄
so, there's another failure (you can get it by running ./python -m test -R 3:3 test_capi)

Let's keep this PR in scope. This new issue you found isn't a refleak. It's just a byproduct of the fact that when we run out of version numbers, we literally can't perform certain optimizations anymore.

Should I create an separate issue for that?

@Eclips4
Copy link
Member Author
Eclips4 commented Jan 20, 2025

So far, the latest observable leaks are memory block leaks:

eclips4@nixos ~/p/p/cpython (issue-120501)> ./python -m test -R 3:3 test_random
Using random seed: 315532800
0:00:00 load avg: 2.11 Run 1 test sequentially in a single process
0:00:00 load avg: 2.11 [1/1] test_random
beginning 6 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
123:456
XX. 121
test_random leaked [1, 2, 1] memory blocks, sum=4
test_random failed (reference leak)

== Tests result: FAILURE ==

1 test failed:
    test_random

Total duration: 13.5 sec
Total tests: run=103
Total test files: run=1/1 failed=1
Result: FAILURE

Though, for a larger number of repetitions it doesn't fail:

eclips4@nixos ~/p/p/cpython (issue-120501) [2]> ./python -m test -R 6:6 test_random
Using random seed: 315532800
0:00:00 load avg: 2.35 Run 1 test sequentially in a single process
0:00:00 load avg: 2.35 [1/1] test_random
beginning 12 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
123456:789012
XX.121 1.12..
test_random leaked [1, 0, 1, 2, 0, 0] memory blocks, sum=4 (this is fine)

== Tests result: SUCCESS ==

1 test OK.

Total duration: 27.5 sec
Total tests: run=103
Total test files: run=1/1
Result: SUCCESS

So.. it doesn't look like a real leaks, because ./python -Xshowrefcount always reports that there are no leaks at all:

./python -Xshowrefcount -m test -R 3:3 test_random
Using random seed: 315532800
0:00:00 load avg: 1.99 Run 1 test sequentially in a single process
0:00:00 load avg: 1.99 [1/1] test_random
beginning 6 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
123:456
XX. 121
test_random leaked [1, 2, 1] memory blocks, sum=4
test_random failed (reference leak)

== Tests result: FAILURE ==

1 test failed:
    test_random

Total duration: 13.6 sec
Total tests: run=103
Total test files: run=1/1 failed=1
Result: FAILURE
[0 refs, 0 blocks] 

I understand that with Tier 2, there could be allocations that didn't happen before. But I have no idea where they are. Any hints?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0