8000 gh-114058: The Tier2 Optimizer by Fidget-Spinner · Pull Request #114059 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

gh-114058: The Tier2 Optimizer #114059

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
116 commits
Select commit Hold shift + click to select a range
29db898
abstract interp
Fidget-Spinner Jan 13, 2024
c1332cc
the abstract interpreter
Fidget-Spinner Jan 13, 2024
9d85c35
cleanup
Fidget-Spinner Jan 13, 2024
76cee0c
run black
Fidget-Spinner Jan 13, 2024
b71aa06
the optimizer
Fidget-Spinner Jan 14, 2024
f0e5dec
fix a whole bunch of bugs
Fidget-Spinner Jan 14, 2024
52e368f
properly handle runtime self_or_null
Fidget-Spinner Jan 14, 2024
a273a2f
fix faulty assertion
Fidget-Spinner Jan 14, 2024
60a1d79
fix build
Fidget-Spinner Jan 15, 2024
7077ad5
fix all tests except test_capi and maybe test_ctypes
Fidget-Spinner Jan 16, 2024
5169bf3
run black and re-enable tests
Fidget-Spinner Jan 16, 2024
0929bb8
📜🤖 Added by blurb_it.
blurb-it[bot] Jan 16, 2024
70ee73e
Merge remote-track 8000 ing branch 'upstream/main' into tier2_abstract_inte…
Fidget-Spinner Jan 18, 2024
7d66440
check for buffer overruns
Fidget-Spinner Jan 18, 2024
6b18e30
expand the ir
Fidget-Spinner Jan 18, 2024
4b1dff4
fix return path
Fidget-Spinner Jan 18, 2024
6a61b18
fix some refleaks
Fidget-Spinner Jan 19, 2024
9e32f75
basic value numbering
Fidget-Spinner Jan 19, 2024
119548e
force enable uops
Fidget-Spinner Jan 19, 2024
27ce303
allow more memory
Fidget-Spinner Jan 19, 2024
8915762
fix some refleaks, test value numbering
Fidget-Spinner Jan 20, 2024
0974dad
fix all the warnings
Fidget-Spinner Jan 20, 2024
c9cd8a8
fix test
Fidget-Spinner Jan 20, 2024
eb56a90
fix refleak tests
Fidget-Spinner Jan 20, 2024
c9c7854
lint
Fidget-Spinner Jan 20, 2024
3e64d1f
get rid of all compiler warnings
Fidget-Spinner Jan 20, 2024
df3b938
fix smelly
Fidget-Spinner Jan 20, 2024
81a859b
add comments, cleanup
Fidget-Spinner Jan 20, 2024
7280281
exclude abstract interp cases from C analyzer
Fidget-Spinner Jan 20, 2024
ac6e29f
fix c-globals check
Fidget-Spinner Jan 20, 2024
9e5ef68
fix refleak
Fidget-Spinner Jan 24, 2024
1f27abb
peepholing, make _CHECK_PEP_523 a guard
Fidget-Spinner Jan 24, 2024
c88a8c6
fix bug
Fidget-Spinner Jan 24, 2024
307c66f
make things const?
Fidget-Spinner Jan 24, 2024
19ce538
allow for more scratch space, but keep traces same
Fidget-Spinner Jan 24, 2024
c701581
Merge remote-tracking branch 'upstream/main' into tier2_abstract_inte…
Fidget-Spinner Jan 24, 2024
3353996
fix upstream changes
Fidget-Spinner Jan 24, 2024
543c827
apply same peephole from upstream
Fidget-Spinner Jan 24, 2024
6e55480
fix eval frame
Fidget-Spinner Jan 24, 2024
4e74c5b
use uops as IR
Fidget-Spinner Jan 25, 2024
6829844
fix error cases
Fidget-Spinner Jan 25, 2024
1de0ccb
fix leak, fix peepholer
Fidget-Spinner Jan 25, 2024
d2917b7
documentation, cleanup whitespace
Fidget-Spinner Jan 25, 2024
553ac53
reduce mem usage
Fidget-Spinner Jan 25, 2024
4555b0c
remove INIT_FAST add ignored
Fidget-Spinner Jan 25, 2024
4155d84
cleanup
Fidget-Spinner Jan 25, 2024
e48a794
Add back peephole for constants
Fidget-Spinner Jan 25, 2024
7e7ba2d
fix compiler warning
Fidget-Spinner Jan 25, 2024
9aa7ccd
cut the constant factor
Fidget-Spinner Jan 25, 2024
d3d1e60
loop peeling
Fidget-Spinner Jan 25, 2024
d8d82e0
loop unrolling done
Fidget-Spinner Jan 25, 2024
9692146
peel only a single loops
Fidget-Spinner Jan 25, 2024
fe648e2
reorder to fix macos
Fidget-Spinner Jan 25, 2024
2dc0d0c
please fix mac
Fidget-Spinner Jan 25, 2024
b132a87
fix test
Fidget-Spinner Jan 25, 2024
8a726e0
slightly reduce dispatch overhead
Fidget-Spinner Jan 25, 2024
0d9df64
fixx off-by-one error
Fidget-Spinner Jan 25, 2024
a72d6ef
peel less aggressively, clear peepholer
Fidget-Spinner Jan 26, 2024
9599113
undo loop peeling
Fidget-Spinner Jan 26, 2024
7de4b37
Revert "undo loop peeling"
Fidget-Spinner Jan 26, 2024
acc6490
add more memory for everything
Fidget-Spinner Jan 26, 2024
cf59bba
peephole on failure
Fidget-Spinner Jan 26, 2024
41882ce
add stats collection
Fidget-Spinner Jan 27, 2024
797bc1e
Revert "add more memory for everything"
Fidget-Spinner Jan 27, 2024
22da280
remove bad redefinition
Fidget-Spinner Jan 27, 2024
64fa51c
Partially address Guido's review
Fidget-Spinner Jan 28, 2024
262f978
remove sym_copy_type_number
Fidget-Spinner Jan 28, 2024
4a9e2b4
remove symbolic value, use symbolic type
Fidget-Spinner Jan 28, 2024
5868fb8
rename confusing name
Fidget-Spinner Jan 28, 2024
5a3f44f
Address Guido's review, 4x function cache
Fidget-Spinner Jan 28, 2024
f450646
loop peel more often, add stats for loop peeling
Fidget-Spinner Jan 28, 2024
d226882
Add `(void)found` to silence compiler warning on L802
gvanrossum Jan 28, 2024
9a3585a
convert frame and context to non-PyObjects
Fidget-Spinner Jan 29, 2024
882b48c
add test for freed functions
Fidget-Spinner Jan 29, 2024
551466f
cleanup
Fidget-Spinner Jan 29, 2024
17a989c
general cleanups
Fidget-Spinner Jan 29, 2024
fcdc84c
make static
Fidget-Spinner Jan 29, 2024
065b8a4
add comment by Guido
Fidget-Spinner Jan 30, 2024
5be3458
Merge remote-tracking branch 'upstream/main' into tier2_abstract_inte…
Fidget-Spinner Jan 30, 2024
580dd14
fix _JUMP_ABSOLUTE
Fidget-Spinner Jan 28, 2024
d603792
remove unsued var
Fidget-Spinner Jan 30, 2024
f206bd0
use iterative instead of recursive
Fidget-Spinner Jan 30, 2024
17b4ae3
fix bad test on aarch64 linux
Fidget-Spinner Jan 30, 2024
913d95b
remove non-compliant test
Fidget-Spinner Jan 30, 2024
425b40d
fix compiler warnings
Fidget-Spinner Jan 30, 2024
2c884e2
low hanging fruit in Guido's review
Fidget-Spinner Jan 31, 2024
d7d8e8c
cleanup more
Fidget-Spinner Jan 31, 2024
47ee732
Remove abstractframe_dealloc, and frame->prev
Fidget-Spinner Jan 31, 2024
01fb224
update documentation
Fidget-Spinner Jan 31, 2024
63f8abd
remove peephole pass in happy case!
Fidget-Spinner Jan 31, 2024
784d171
bail to tier 1 on failure, remove peepholer altogether
Fidget-Spinner Jan 31, 2024
79ba2be
change error codes
Fidget-Spinner Jan 31, 2024
07ba964
Merge remote-tracking branch 'upstream/main' into tier2_abstract_inte…
Fidget-Spinner Feb 2, 2024
50154ad
Merge remote-tracking branch 'upstream/main' into tier2_abstract_inte…
Fidget-Spinner Feb 2, 2024
a0b58b1
Integrate the constant promoter
Fidget-Spinner Feb 2, 2024
4573fca
fix test
Fidget-Spinner Feb 2, 2024
2096a8a
try fix memleak?
Fidget-Spinner Feb 2, 2024
4ad2804
Revert "try fix memleak?"
Fidget-Spinner Feb 2, 2024
224f6b4
fix memleak for real
Fidget-Spinner Feb 2, 2024
b784617
add more constant prop tests for promoted globals
Fidget-Spinner Feb 2, 2024
2983553
refactor
Fidget-Spinner Feb 2, 2024
6a6b11f
Simplify codegen -- either types or consts, not both
Fidget-Spinner Feb 4, 2024
cfe1de0
cleanup, reduce memory usage even more
Fidget-Spinner Feb 4, 2024
5ad6211
remove more memory allocations
Fidget-Spinner Feb 4, 2024
73daf91
zero dynamic memory allocation
Fidget-Spinner Feb 4, 2024
cac3616
forgot return
Fidget-Spinner Feb 4, 2024
e7df93a
streamline frame creation
Fidget-Spinner Feb 4, 2024
65fae9a
more cleanup
Fidget-Spinner Feb 4, 2024
3fec959
fix bug in return code
Fidget-Spinner Feb 4, 2024
dde7d12
fix on MSVC
Fidget-Spinner Feb 4, 2024
04c902b
Add back constant evaluation
Fidget-Spinner Feb 4, 2024
05e93c2
cleanup
Fidget-Spinner Feb 4, 2024
55f9bcb
add type annotation
Fidget-Spinner Feb 4, 2024
0726766
make const zapping more error-proof
Fidget-Spinner Feb 4, 2024
a8adada
fix stack metadata
Fidget-Spinner Feb 4, 2024
ab60387
address reviews
Fidget-Spinner Feb 6, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
fix leak, fix peepholer
  • Loading branch information
Fidget-Spinner committed Jan 25, 2024
commit 1de0ccb5979e719fa62c120bf6e43b997fe43d85
22 changes: 22 additions & 0 deletions Lib/test/test_capi/test_opt.py
Original file line number Diff line number Diff line change
Expand Up @@ -566,6 +566,28 @@ def testfunc(loops):
uops = {opname for opname, _, _ in ex}
self.assertNotIn("_SHRINK_STACK", uops)

def test_int_constant_propagation_many(self):
def testfunc(loops):
num = 0
for _ in range(loops):
x = 0
y = 1
a = x + y + x + y + x + y + x + y
return a

opt = _testinternalcapi.get_uop_optimizer()
res = None
with temporary_optimizer(opt):
res = testfunc(64)

ex = get_first_executor(testfunc)
self.assertIsNotNone(ex)
self.assertEqual(res, 4)
binop_count = [opname for opname, _, _ in ex if opname == "_BINARY_OP_ADD_INT"]
self.assertEqual(len(binop_count), 0)
uops = {opname for opname, _, _ in ex}
self.assertNotIn("_SHRINK_STACK", uops)

def test_int_type_propagation(self):
def testfunc(loops):
num = 0
Expand Down
62 changes: 46 additions & 16 deletions Python/optimizer_analysis.c
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,8 @@

#define OVERALLOCATE_FACTOR 3

#define PEEPHOLE_MAX_ATTEMPTS 10

#ifdef Py_DEBUG
static const char *const DEBUG_ENV = "PYTHON_OPT_DEBUG";
#define DPRINTF(level, ...) \
Expand Down Expand Up @@ -227,6 +229,7 @@ abstractinterp_dealloc(PyObject *o)
}
PyMem_Free(self->t_arena.arena);
PyMem_Free(self->s_arena.arena);
PyMem_Free(self->frame_info.arena);
Py_TYPE(self)->tp_free((PyObject *)self);
}

Expand Down Expand Up @@ -1207,7 +1210,7 @@ uop_abstract_interpret_single_inst(

}

int
static int
uop_abstract_interpret(
PyCodeObject *co,
_PyUOpInstruction *trace,
Expand Down Expand Up @@ -1323,34 +1326,56 @@ remove_unneeded_uops(_PyUOpInstruction *buffer, int buffer_size)
}
}

static void
static inline bool
op_is_zappable(int opcode)
{
switch(opcode) {
case _SET_IP:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is missing _NOP. It seems a fine opcode to back over in the peepholer. :-)

Copy link
Member
@gvanrossum gvanrossum Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, after I do this, I believe we won't need to run the peepholer more than once. (I put in assert(peephole_attempts == 0 || done); after the call and it didn't fail a thing.) SO that would simplify things a bit.

I also have a more radical idea: we can move the two things that the peepholer currently does (_SHRINK_STACK and _CHECK_PEP_523) to hand-written cases in uop_abstract_interpret_single_inst(). Then we don't need the peepholer at all any more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so that doesn't work for _SHRINK_STACK, because it's being generated by uop_abstract_interpret_single_inst(), not consumed. But it could probably be moved to emit_const() quite easily.

The advantage of doing it on the fly instead of in a separate peephole pass is that we reduce the risk that we'll run out of buffer space -- the current code occasionally emits more instructions than the input (namely whenever it emits _SHRINK_STACK), which potentially (if we get very close to the limit) could cause the optimizer to fail even if, using a longer buffer, it would have succeeded and produced a shorter result. There would still be worst-case scenarios where we'd end up with many non-zappable _SHRINK_STACK uops, but in most cases the instant zapping would free up output space.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now actually the peepholer only needs to run once. I just let it run a few times just to be sure.

My intuition why it can eliminate all _SHRINK_STACK on the first run is that since we are operating on a bytecode stack machine IR, which is operating forwards.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But see my counter-example of a walrus: A + (X := B*C).

Either way it seems we both agree that _SHRINK_STACK is a temporary crutch.

case _CHECK_VALIDITY:
case _LOAD_CONST_INLINE:
case _LOAD_CONST:
case _LOAD_FAST:
case _LOAD_CONST_INLINE_BORROW:
return true;
default:
return false;
}
}

static inline bool
op_is_load(int opcode)
{
return (opcode == _LOAD_CONST_INLINE ||
opcode == _LOAD_CONST ||
opcode == LOAD_FAST ||
opcode == _LOAD_CONST_INLINE_BORROW);
}

static int
peephole_optimizations(_PyUOpInstruction *buffer, int buffer_size)
{
bool done = true;
for (int i = 0; i < buffer_size; i++) {
_PyUOpInstruction *curr = buffer + i;
int oparg = curr->oparg;
switch(curr->opcode) {
case _SHRINK_STACK: {
// If all that precedes a _SHRINK_STACK is a bunch of LOAD_FAST,
// then we can safely eliminate that without side effects.
int load_fast_count = 0;
int load_count = 0;
_PyUOpInstruction *back = curr-1;
while((back->opcode == _SET_IP ||
back->opcode == _CHECK_VALIDITY ||
back->opcode == LOAD_FAST) &&
load_fast_count < oparg) {
load_fast_count += back->opcode == LOAD_FAST;
while(op_is_zappable(back->opcode) &&
load_count < oparg) {
load_count += op_is_load(back->opcode);
back--;
}
if (load_fast_count == oparg) { 8000
if (load_count == oparg) {
done = false;
curr->opcode = NOP;
back = curr-1;
load_fast_count = 0;
while((back->opcode == _SET_IP ||
back->opcode == _CHECK_VALIDITY ||
back->opcode == LOAD_FAST) &&
load_fast_count < oparg) {
load_fast_count += back->opcode == LOAD_FAST;
load_count = 0;
while(load_count < oparg) {
load_count += op_is_load(back->opcode);
back->opcode = NOP;
back--;
}
Expand All @@ -1368,6 +1393,7 @@ peephole_optimizations(_PyUOpInstruction *buffer, int buffer_size)
break;
}
}
return done;
}


Expand Down Expand Up @@ -1396,7 +1422,11 @@ _Py_uop_analyze_and_optimize(
goto error;
}

peephole_optimizations(temp_writebuffer, new_trace_len);
for (int peephole_attempts = 0; peephole_attempts < PEEPHOLE_MAX_ATTEMPTS &&
!peephole_optimizations(temp_writebuffer, new_trace_len);
peephole_attempts++) {

}

remove_unneeded_uops(temp_writebuffer, new_trace_len);

Expand Down
0