-
-
Notifications
You must be signed in to change notification settings - Fork 32k
gh-111520: Integrate the Tier 2 interpreter in the Tier 1 interpreter #111428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Use `GOTO_ERROR(error)` instead of `goto error`. This macro is defined differently in the tier two interpreter.
This replaces PYTHONUOPSDEBUG=N. The meaning of N is the same (for now): 0: no tracing 1: print when tier 2 trace created 2: print contents of tier 2 trace 3: print every uop executed 4: print optimization attempts and details 5: print tier 1 instructions and stack
With uops always enabled, only test_embed fails, but that's the same on main (see GH-111339). I added that to benchmark Tier 2 perf (who knows, the speedier Tier switching may make things faster there). |
@markshannon Can you review this? If you think this is roughly going in the right direction I will clean it up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks promising.
Python/ceval.c
Outdated
|
||
OPT_STAT_INC(traces_executed); | ||
_Py_CODEUNIT *ip_offset = (_Py_CODEUNIT *)_PyFrame_GetCode(frame)->co_code_adaptive; | ||
_PyUOpInstruction *next_uop = self->trace; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have been set by ENTER_EXECUTOR
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and no. It would mean that next_uop
would have to be a local in the top-level function scope. I am trying to reduce the number of locals there. For now let me keep it this way. I expect that it will be fine even when we start stitching, because each trace can only be entered at the top and we need the executor (so that we can decref it upon exiting the trace). The initial next_uop
is just the executor plus a fixed offset anyway.
); | ||
goto error_tier_two; | ||
|
||
pop_4_error_tier_two: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this be shared with tier 1? Unwinding should work the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alas, not quite. The debug output is different, the stats collection is different, but most importantly, we need to DECREF(self)
here before jumping to error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also we need to set next_instr = frame->instr_ptr
.
- Remove CHECK_EVAL_BREAKER() from top of Tier 2 loop - Make Tier 2 default case Py_UNREACHABLE() in non-debug mode - GOTO_TIER_TWO() macro instead of `goto enter_tier_two` - Inline ip_offset (Tier 2 LOAD_IP() is now a no-op) - Move #define/#under TIER_ONE/TIER_TWO into generated .c.h files - ceval_macros.h defines Tier 1 macros, Tier 2 is inlined in ceval.c
Python/ceval.c
Outdated
|
||
OPT_STAT_INC(traces_executed); | ||
_Py_CODEUNIT *ip_offset = (_Py_CODEUNIT *)_PyFrame_GetCode(frame)->co_code_adaptive; | ||
_PyUOpInstruction *next_uop = self->trace; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and no. It would mean that next_uop
would have to be a local in the top-level function scope. I am trying to reduce the number of locals there. For now let me keep it this way. I expect that it will be fine even when we start stitching, because each trace can only be entered at the top and we need the executor (so that we can decref it upon exiting the trace). The initial next_uop
is just the executor plus a fixed offset anyway.
); | ||
goto error_tier_two; | ||
|
||
pop_4_error_tier_two: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alas, not quite. The debug output is different, the stats collection is different, but most importantly, we need to DECREF(self)
here before jumping to error.
Python/ceval.c
Outdated
frame->return_offset = 0; // Don't leave this random | ||
_PyFrame_SetStackPointer(frame, stack_pointer); | ||
Py_DECREF(self); | ||
goto resume_with_error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may not need to reset return_offset
; we can also skip syncing stack_pointer
; so we could make this slightly faster as follows (but it doesn't matter since errors are presumed to be on the slow path):
frame->return_offset = 0; // Don't leave this random | |
_PyFrame_SetStackPointer(frame, stack_pointer); | |
Py_DECREF(self); | |
goto resume_with_error; | |
Py_DECREF(self); | |
next_instr = frame->instr_ptr; | |
goto error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I consider this sufficiently cleaned up to undraft it, and if @markshannon doesn't comment I'll merge tomorrow.
I could noodle endlessly with the code for dropping from tier 2 back into tier 1, with and without errors; but it may be better to put that off until a more careful review.
Whoops, back to draft, I accidentally left in the uops-forever cherrypick. Will fix later. |
(I accidentally kept this commit after pushing it experimentally. This means I've been testing with uops on all the time, which is actually pretty amazing.) This reverts commit e0e60ce.
Using `GOTO_TIER_ONE()` macro. This should make things simpler for Justin.
The tests run against debug builds, which will have very different stack usage patterns compared to optimised builds. I wouldn't worry too much about changing anything other than the default recursion limit (which is different for debug vs. release), and then do a buildbot run to check the optimised builds. Otherwise, the best way to reduce stack usage will be to reduce the size of local variables, possibly by refactoring into separate functions so that the variables are only allocated temporarily and don't remain on the stack as the Python code recurses. |
But the whole exercise is to merge two functions in one because we want to be able to transfer back and forth using I will see by how much the debug recursion limit needs to be adjusted to make the tests pass; I think I have only two extra local variables left. |
🤖 New build scheduled with the buildbot fleet by @gvanrossum for commit fdf1a2f 🤖 If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
If this now passes the tests and the buildbots don't freak out I am going to merge this. However, @markshannon, could you have a look at how I fixed |
Another fix would be to increase the value on this line: cpython/PCbuild/python.vcxproj Line 98 in 5d6db16
That's what determines how much stack is available, and running out is what causes the hard crash. As you can see, it's already 4x larger for debug builds than releases, but it's still only 8MB and so making it larger isn't really going to hurt anyone much. (I'll note that I haven't looked through the whole PR, so I don't know exactly what is causing it or whether the existing recursion limit comes into play at all. I'm just throwing out helpful pointers and trusting that you guys know what's actually going on :) ) |
Ah, thanks. That makes more sense. I don't know exactly what's going on either, but I know that I've added some local variables to the function containing the main interpreter loop, and that causes hard crashes in a bunch of tests that recurse deeply (at the C level). I'm now applying your fix instead of my previous hacks -- except I'm keeping the bit in I've just pushed that and if tests still pass I'll merge. |
…preter (python#111428) - There is no longer a separate Python/executor.c file. - Conventions in Python/bytecodes.c are slightly different -- don't use `goto error`, you must use `GOTO_ERROR(error)` (same for others like `unused_local_error`). - The `TIER_ONE` and `TIER_TWO` symbols are only valid in the generated (.c.h) files. - In Lib/test/support/__init__.py, `Py_C_RECURSION_LIMIT` is imported from `_testcapi`. - On Windows, in debug mode, stack allocation grows from 8MiB to 12MiB. - **Beware!** This changes the env vars to enable uops and their debugging to `PYTHON_UOPS` and `PYTHON_LLTRACE`.
…preter (python#111428) - There is no longer a separate Python/executor.c file. - Conventions in Python/bytecodes.c are slightly different -- don't use `goto error`, you must use `GOTO_ERROR(error)` (same for others like `unused_local_error`). - The `TIER_ONE` and `TIER_TWO` symbols are only valid in the generated (.c.h) files. - In Lib/test/support/__init__.py, `Py_C_RECURSION_LIMIT` is imported from `_testcapi`. - On Windows, in debug mode, stack allocation grows from 8MiB to 12MiB. - **Beware!** This changes the env vars to enable uops and their debugging to `PYTHON_UOPS` and `PYTHON_LLTRACE`.
…preter (python#111428) - There is no longer a separate Python/executor.c file. - Conventions in Python/bytecodes.c are slightly different -- don't use `goto error`, you must use `GOTO_ERROR(error)` (same for others like `unused_local_error`). - The `TIER_ONE` and `TIER_TWO` symbols are only valid in the generated (.c.h) files. - In Lib/test/support/__init__.py, `Py_C_RECURSION_LIMIT` is imported from `_testcapi`. - On Windows, in debug mode, stack allocation grows from 8MiB to 12MiB. - **Beware!** This changes the env vars to enable uops and their debugging to `PYTHON_UOPS` and `PYTHON_LLTRACE`.
See also faster-cpython/ideas#631
Beware! This changes the env vars to enable uops and their debugging to
PYTHON_UOPS
andPYTHON_LLTRACE
.