gh-111520: Integrate the Tier 2 interpreter in the Tier 1 interpreter #111428

gvanrossum · 2023-10-28T21:07:09Z

Beware! This changes the env vars to enable uops and their debugging to PYTHON_UOPS and PYTHON_LLTRACE.

Issue: Merge Tier 1 and 2 interpreters into a single function #111520

Use `GOTO_ERROR(error)` instead of `goto error`. This macro is defined differently in the tier two interpreter.

This replaces PYTHONUOPSDEBUG=N. The meaning of N is the same (for now):

- 0: no tracing
- 1: print when tier 2 trace created
- 2: print contents of tier 2 trace
- 3: print every uop executed
- 4: print optimization attempts and details
- 5: print tier 1 instructions and stack
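For readers who want to see how such a level could be consumed, here is a minimal sketch. The helper names `lltrace_level` and `lltrace_describe` are hypothetical, and the parsing rule (first character only, clamped to 5) is an assumption for illustration, not necessarily what CPython does with `PYTHON_LLTRACE`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: map the text of PYTHON_LLTRACE to a level 0..5.
   Assumption: only the first character is examined; unset, empty, or
   non-digit values mean "no tracing", and larger digits are clamped to 5. */
int lltrace_level(const char *value)
{
    if (value == NULL || value[0] < '0' || value[0] > '9') {
        return 0;
    }
    int level = value[0] - '0';
    return level > 5 ? 5 : level;
}

/* What each level prints, copied from the list in the PR description. */
const char *lltrace_describe(int level)
{
    assert(level >= 0 && level <= 5);
    switch (level) {
    case 0: return "no tracing";
    case 1: return "print when tier 2 trace created";
    case 2: return "print contents of tier 2 trace";
    case 3: return "print every uop executed";
    case 4: return "print optimization attempts and details";
    default: return "print tier 1 instructions and stack";
    }
}
```

A caller would pass in the result of `getenv("PYTHON_LLTRACE")` and compare the returned level against the thresholds above before printing anything.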

gvanrossum · 2023-10-29T00:46:01Z

With uops always enabled, only test_embed fails, but that's the same on main (see GH-111339). I added that to benchmark Tier 2 perf (who knows, the speedier Tier switching may make things faster there).

gvanrossum · 2023-10-29T15:29:21Z

@markshannon Can you review this? If you think this is roughly going in the right direction I will clean it up.

markshannon

Looks promising.

Python/bytecodes.c

Python/ceval.c

markshannon · 2023-10-30T15:59:46Z

Python/ceval.c

+
+    OPT_STAT_INC(traces_executed);
+    _Py_CODEUNIT *ip_offset = (_Py_CODEUNIT *)_PyFrame_GetCode(frame)->co_code_adaptive;
+    _PyUOpInstruction *next_uop = self->trace;


This should have been set by ENTER_EXECUTOR.

gvanrossum · 2023-10-31T22:51:01Z

Yes and no. It would mean that next_uop would have to be a local in the top-level function scope. I am trying to reduce the number of locals there. For now let me keep it this way. I expect that it will be fine even when we start stitching, because each trace can only be entered at the top and we need the executor (so that we can decref it upon exiting the trace). The initial next_uop is just the executor plus a fixed offset anyway.
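To illustrate the last point, here is a small sketch of why the first uop can be recovered from the executor alone. `UOp` and `Executor` are hypothetical stand-ins, not the real `_PyUOpInstruction` or executor layout; the only assumption that matters is that the trace is stored inline in the executor object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a micro-op. */
typedef struct {
    int opcode;
    int oparg;
} UOp;

/* Hypothetical executor: a small header followed by the trace stored inline.
   A fixed-size array keeps the sketch simple. */
typedef struct {
    int refcnt;
    int length;
    UOp trace[8];
} Executor;

/* The first uop lives at a compile-time-constant offset from the executor. */
size_t first_uop_offset(void)
{
    return offsetof(Executor, trace);
}

/* So the initial next_uop never has to be passed around separately. */
const UOp *first_uop(const Executor *self)
{
    assert(self != NULL);
    return self->trace;
}
```

Under that assumption, carrying the executor pointer into the trace is enough, and `next_uop` can stay a local of the tier 2 block.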

Python/ceval.c

markshannon · 2023-10-30T16:02:31Z

Python/ceval.c

+    );
+    goto error_tier_two;
+
+pop_4_error_tier_two:


Can't this be shared with tier 1? Unwinding should work the same.

gvanrossum · 2023-10-31T23:09:04Z

Alas, not quite. The debug output is different, the stats collection is different, but most importantly, we need to DECREF(self) here before jumping to error.

Also we need to set next_instr = frame->instr_ptr.
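The two requirements above can be sketched as a tier 2 error label that does its own cleanup and then joins the shared handler. Everything here is a toy: `Executor`, `Frame`, `run_trace`, and the integer "instruction pointer" are hypothetical, and the assumption that entering the trace takes a reference is made only so that the DECREF has something to balance.

```c
#include <assert.h>

typedef struct { int refcnt; } Executor;   /* hypothetical stand-in */
typedef struct { int instr_ptr; } Frame;   /* hypothetical stand-in */

/* Returns 0 on success, -1 on error. On error, *resume_at receives the
   tier 1 position that the shared error handler would work from. */
int run_trace(Frame *frame, Executor *self, int uop_fails, int *resume_at)
{
    int next_instr = -1;   /* tier 1 "register", not maintained in tier 2 */

    self->refcnt++;        /* assumption: a running trace holds a reference */
    if (uop_fails) {
        goto error_tier_two;
    }
    self->refcnt--;        /* a normal exit drops the reference too */
    return 0;

error_tier_two:
    assert(self->refcnt > 0);
    self->refcnt--;                  /* DECREF(self) before leaving tier 2 */
    next_instr = frame->instr_ptr;   /* restore the tier 1 position */
    goto error;                      /* only now join the shared handler */

error:
    *resume_at = next_instr;
    return -1;
}
```

The shared `error` label stays unaware of the executor, which is the reason the tier 2 label cannot simply be merged with it.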

- Remove CHECK_EVAL_BREAKER() from top of Tier 2 loop
- Make Tier 2 default case Py_UNREACHABLE() in non-debug mode
- GOTO_TIER_TWO() macro instead of `goto enter_tier_two`
- Inline ip_offset (Tier 2 LOAD_IP() is now a no-op)
- Move #define/#undef TIER_ONE/TIER_TWO into generated .c.h files
- ceval_macros.h defines Tier 1 macros, Tier 2 is inlined in ceval.c

Python/bytecodes.c

Python/ceval.c

gvanrossum · 2023-10-31T23:10:56Z

Python/ceval.c

+    frame->return_offset = 0;  // Don't leave this random
+    _PyFrame_SetStackPointer(frame, stack_pointer);
+    Py_DECREF(self);
+    goto resume_with_error;


We may not need to reset return_offset; we can also skip syncing stack_pointer; so we could make this slightly faster as follows (but it doesn't matter since errors are presumed to be on the slow path):

Suggested change

-    frame->return_offset = 0;  // Don't leave this random
-    _PyFrame_SetStackPointer(frame, stack_pointer);
-    Py_DECREF(self);
-    goto resume_with_error;
+    Py_DECREF(self);
+    next_instr = frame->instr_ptr;
+    goto error;

gvanrossum

I consider this sufficiently cleaned up to undraft it, and if @markshannon doesn't comment I'll merge tomorrow.

I could noodle endlessly with the code for dropping from tier 2 back into tier 1, with and without errors; but it may be better to put that off until a more careful review.

gvanrossum · 2023-11-01T00:42:59Z

Whoops, back to draft, I accidentally left in the uops-forever cherrypick. Will fix later.

(I accidentally kept this commit after pushing it experimentally. This means I've been testing with uops on all the time, which is actually pretty amazing.) This reverts commit e0e60ce.

Using `GOTO_TIER_ONE()` macro. This should make things simpler for Justin.

zooba · 2023-11-01T17:13:58Z

> Alas, we still run out of stack space on the recursion tests on Windows. I'll keep working on this.

The tests run against debug builds, which will have very different stack usage patterns compared to optimised builds. I wouldn't worry too much about changing anything other than the default recursion limit (which is different for debug vs. release), and then do a buildbot run to check the optimised builds.

Otherwise, the best way to reduce stack usage will be to reduce the size of local variables, possibly by refactoring into separate functions so that the variables are only allocated temporarily and don't remain on the stack as the Python code recurses.

gvanrossum · 2023-11-01T17:48:42Z

> Otherwise, the best way to reduce stack usage will be to reduce the size of local variables, possibly by refactoring into separate functions so that the variables are only allocated temporarily and don't remain on the stack as the Python code recurses.

But the whole exercise is to merge two functions in one because we want to be able to transfer back and forth using goto rather than calls!

I will see by how much the debug recursion limit needs to be adjusted to make the tests pass; I think I have only two extra local variables left.

bedevere-bot · 2023-11-01T17:49:07Z

🤖 New build scheduled with the buildbot fleet by @gvanrossum for commit fdf1a2f 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

gvanrossum · 2023-11-01T18:33:44Z

If this now passes the tests and the buildbots don't freak out I am going to merge this. However, @markshannon, could you have a look at how I fixed test_call? This feels like a band-aid.

zooba · 2023-11-01T18:54:22Z

Another fix would be to increase the value on this line:

cpython/PCbuild/python.vcxproj, line 98 in 5d6db16:

    <StackReserveSize Condition="$(Configuration) == 'Debug'">8000000</StackReserveSize>

That's what determines how much stack is available, and running out is what causes the hard crash. As you can see, it's already 4x larger for debug builds than releases, but it's still only 8MB and so making it larger isn't really going to hurt anyone much.

(I'll note that I haven't looked through the whole PR, so I don't know exactly what is causing it or whether the existing recursion limit comes into play at all. I'm just throwing out helpful pointers and trusting that you guys know what's actually going on :) )

gvanrossum · 2023-11-01T19:35:34Z

> Another fix would be to increase the value on this line:

Ah, thanks. That makes more sense. I don't know exactly what's going on either, but I know that I've added some local variables to the function containing the main interpreter loop, and that causes hard crashes in a bunch of tests that recurse deeply (at the C level). I'm now applying your fix instead of my previous hacks -- except I'm keeping the bit in test.support that dynamically pulls the value of Py_C_RECURSION_LIMIT from _testcapi (if it exists) rather than hardcoding a value that must be kept in sync with something defined in pystate.h. (It's technically still the case, but only if _testcapi can't be imported.)

I've just pushed that and if tests still pass I'll merge.

…preter (python#111428)

- There is no longer a separate Python/executor.c file.
- Conventions in Python/bytecodes.c are slightly different -- don't use `goto error`, you must use `GOTO_ERROR(error)` (same for others like `unused_local_error`).
- The `TIER_ONE` and `TIER_TWO` symbols are only valid in the generated (.c.h) files.
- In Lib/test/support/__init__.py, `Py_C_RECURSION_LIMIT` is imported from `_testcapi`.
- On Windows, in debug mode, stack allocation grows from 8MiB to 12MiB.
- **Beware!** This changes the env vars to enable uops and their debugging to `PYTHON_UOPS` and `PYTHON_LLTRACE`.

gvanrossum added 5 commits October 27, 2023 13:24

Make all labels in _PyUopExecute end in _tier_two

97984d3

Use `GOTO_ERROR(error)` instead of `goto error`. This macro is defined differently in the tier two interpreter.

Rename PYTHONUOPS to PYTHON_UOPS for consistency

d1b9c1b

Integrate Tier 2 into _PyEval_EvalFrameDefault

d805312

DO NOT MERGE: Always use -Xuops

e0e60ce

markshannon reviewed Oct 30, 2023

gvanrossum changed the title ~~Integrate the Tier 2 interpreter in the Tier 1 interpreter~~ gh-111520: Integrate the Tier 2 interpreter in the Tier 1 interpreter Oct 30, 2023

bedevere-app bot mentioned this pull request Oct 30, 2023

Merge Tier 1 and 2 interpreters into a single function #111520

Closed

gvanrossum added 4 commits October 30, 2023 13:22

Merge branch 'main' into mix-tiers

a720f1a

Merge branch 'main' into mix-tiers

b808f6d

Get rid of separate executor.c file

a0aed59

gvanrossum commented Oct 31, 2023

gvanrossum added 2 commits October 31, 2023 16:41

Fix test_generated_cases.py by stripping preprocessor prefix/suffix

5e84476

Eradicate executors.c from Windows build files

917b7a2

gvanrossum commented Nov 1, 2023

gvanrossum marked this pull request as ready for review November 1, 2023 00:23

gvanrossum requested a review from a team as a code owner November 1, 2023 00:23

bedevere-app bot added the awaiting core review label Nov 1, 2023

Rename deoptimize_tier_two back to deoptimize (for Justin)

9067eb0

gvanrossum marked this pull request as draft November 1, 2023 00:42

bedevere-app bot removed the awaiting core review label Nov 1, 2023

gvanrossum added 3 commits October 31, 2023 17:43

Fix whitespace

6a4e495

Revert "DO NOT MERGE: Always use -Xuops"

81f1883

(I accidentally kept this commit after pushing it experimentally. This means I've been testing with uops on all the time, which is actually pretty amazing.) This reverts commit e0e60ce.

Add blurb

ee27e73

gvanrossum marked this pull request as ready for review November 1, 2023 04:35

bedevere-app bot added the awaiting core review label Nov 1, 2023

gvanrossum added 3 commits November 1, 2023 08:04

Eliminate 'operand' local variable

a96ac7f

Rename self -> current_executor (TODO: eliminate it?)

e02409d

Move _EXIT_TRACE logic to a separate label

fdf1a2f

Using `GOTO_TIER_ONE()` macro. This should make things simpler for Justin.

gvanrossum added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Nov 1, 2023

bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Nov 1, 2023

gvanrossum added 4 commits November 1, 2023 11:06

Limit infinite recursion in test_typing

2a6450c

Limit infinite recursion in test_fileio

4783de3

Limit infinite recursion in test_xml_etree

b9516a1

Limit infinite recursion in test_call

33c3fae

gvanrossum requested review from JelleZijlstra and AlexWaygood as code owners November 1, 2023 18:28

AlexWaygood removed their request for review November 1, 2023 18:30

gvanrossum added 3 commits November 1, 2023 11:57

Fix test_call better: adjust Py_C_RECURSION_LIMIT in pystate.h

998e054

Revert unnecessary fixes to recursive tests

19d9d40

Even better fix -- increase stack space on Windows in debug mode

03de1bf

gvanrossum merged commit 7e135a4 into python:main Nov 1, 2023

bedevere-app bot removed the awaiting core review label Nov 1, 2023

gvanrossum deleted the mix-tiers branch November 1, 2023 20:13

mdboom mentioned this pull request Nov 6, 2023

PGO build broken on Windows #111786

Closed

brandtbucher mentioned this pull request Nov 7, 2023

GH-111520: Add back the operand local #111813

Merged

gvanrossum mentioned this pull request Mar 18, 2024

Move Tier 2 interpreter out of _PyEval_EvalFrameDefault #116970

Open
