GH-112354: Initial implementation of warm up on exits and trace-stitching by markshannon · Pull Request #114142 · python/cpython · GitHub
GH-112354: Initial implementation of warm up on exits and trace-stitching #114142


Merged
merged 50 commits into from
Feb 20, 2024
1975b4c
Cold exits: Work in progress.
markshannon Jan 10, 2024
9fb97f7
Merge branch 'main' into cold-exits
markshannon Jan 11, 2024
f9aa235
Optimize on side exits
markshannon Jan 11, 2024
1288258
Merge branch 'main' into cold-exits
markshannon Jan 11, 2024
7c6267a
Modify internal interfaces
markshannon Jan 11, 2024
55c48e8
Merge branch 'main' into cold-exits
markshannon Jan 11, 2024
8b3c2e0
Jump to next executor without updating current_executor.
markshannon Jan 11, 2024
92a3b61
Support cycle GC for executors.
markshannon Jan 12, 2024
e3def48
Give cold exits their own class, to fix GC handling of exits
markshannon Jan 16, 2024
87e544b
Generate table of cold exits
markshannon Jan 16, 2024
2172d68
Treat EXIT_TRACE as a side exit
markshannon Jan 16, 2024
4448793
Treat most common guard failures as side exits
markshannon Jan 16, 2024
c70f12f
Tweak generated tble to help C analyzer
markshannon Jan 16, 2024
d73fe0a
Add some documentation about the tier 2 engine
markshannon Jan 16, 2024
5c8f0bd
Fix constness and rename hotness
markshannon Jan 17, 2024
140486b
Add new static objects to ignored file.
markshannon Jan 17, 2024
3362c93
Address review comments
markshannon Jan 18, 2024
63fe653
Transfer executor on thread-state and othe minor changes to be more j…
markshannon Feb 8, 2024
b0991a7
Merge branch 'main' into cold-exits
markshannon Feb 8, 2024
625bce2
Get side exits to build with jit enabled.
markshannon Feb 9, 2024
e191fd7
Initialize cold exits dynamically on demand
markshannon Feb 9, 2024
941a14c
Tidy tier 2 code a bit
markshannon Feb 9, 2024
cfd3285
Add Brandt's fixes
markshannon Feb 9, 2024
1025495
Free the correct amount of memory
markshannon Feb 9, 2024
171dad7
Merge branch 'main' into cold-exits
markshannon Feb 9, 2024
e6ca3fe
Remove unreachable code
markshannon Feb 9, 2024
308b2a7
Merge branch 'main' into cold-exits
markshannon Feb 9, 2024
518143e
Clear executors attached to exits when clearing executors
markshannon Feb 9, 2024
9d8cab8
Merge branch 'main' into cold-exits
markshannon Feb 9, 2024
bf07dad
Merge branch 'main' into cold-exits
markshannon Feb 9, 2024
19b6b84
Keep c-analyzer happy
markshannon Feb 9, 2024
c959e8f
Merge branch 'main' into cold-exits
markshannon Feb 14, 2024
f393ba5
Use threshold for side exits
markshannon Feb 14, 2024
bd66b01
Statically allocate cold exits
markshannon Feb 14, 2024
fe75484
Handle errors in JIT compile
markshannon Feb 14, 2024
3d0110c
Merge branch 'main' into cold-exits
markshannon Feb 14, 2024
de93130
Fix possible leak
markshannon Feb 14, 2024
77a6740
Fix refleak transfering from JIT to tier 1
markshannon Feb 14, 2024
0a61d29
Check that only one of EXIT_IF and DEOPT_IF is present
markshannon Feb 14, 2024
b3e306d
Address review comments
markshannon Feb 14, 2024
8f3aa33
Make exit_index 32 bits to avoid endianness issues in JIT
markshannon Feb 14, 2024
7c84967
Run black
markshannon Feb 15, 2024
8ee6710
Address code review
markshannon Feb 15, 2024
f37d7fc
Update comment
markshannon Feb 15, 2024
1f8967d
Address review comments
markshannon Feb 15, 2024
8e4c601
Fix compiler warning
markshannon Feb 15, 2024
4eb2cfc
Address review comments
markshannon Feb 15, 2024
ebe804f
Add missing brace
markshannon Feb 15, 2024
c38d4e8
Address review comments
markshannon Feb 15, 2024
830eb4e
Keep c-analyzer quiet
markshannon Feb 15, 2024
Address review comments
markshannon committed Jan 18, 2024
commit 3362c9322a3baa723442c75a121d33f63165f116
7 changes: 5 additions & 2 deletions Include/cpython/optimizer.h
@@ -32,7 +32,10 @@ typedef struct {
 typedef struct {
     uint16_t opcode;
     uint16_t oparg;
-    uint32_t target;
+    union {
+        uint32_t target;
+        uint16_t exit_index;
+    };
     uint64_t operand;  // A cache entry
 } _PyUOpInstruction;

@@ -57,7 +60,7 @@ typedef struct _cold_exit {
 } _PyColdExitObject;


-extern _PyColdExitObject Py_FatalErrorExecutor;
+extern _PyColdExitObject Py_NeverExecutedExecutor;

 typedef struct _PyOptimizerObject _PyOptimizerObject;

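The header change above overlays `exit_index` on `target`, so an instruction carries one or the other but never both. The following is a minimal sketch of that layout, not the CPython definition: the type name `uop_instruction` and the helper `set_exit_index` are hypothetical, and anonymous unions assume a C11 compiler. A later commit in this PR (8f3aa33, "Make exit_index 32 bits to avoid endianness issues in JIT") suggests the 16-bit overlay shown here was revisited.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative copy of the instruction layout from the diff above.
 * The name uop_instruction is hypothetical (the real type is
 * _PyUOpInstruction). Anonymous unions require C11. */
typedef struct {
    uint16_t opcode;
    uint16_t oparg;
    union {
        uint32_t target;      /* bytecode offset, used while building the trace */
        uint16_t exit_index;  /* index into the executor's exit table */
    };
    uint64_t operand;         /* a cache entry */
} uop_instruction;

/* The union adds no storage: both members start at the same offset. */
static_assert(offsetof(uop_instruction, target) ==
              offsetof(uop_instruction, exit_index),
              "target and exit_index share storage");

/* Hypothetical helper: replace the target with an exit index once the
 * target has been copied into the exit table. After this call only
 * exit_index should be read; target is no longer meaningful. */
static void
set_exit_index(uop_instruction *inst, uint16_t index)
{
    inst->exit_index = index;
}
```

Because the two members differ in width, which bytes of `target` are overwritten by a write to `exit_index` depends on the platform's byte order, so code should read back only the member it last wrote.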
11 changes: 6 additions & 5 deletions Python/bytecodes.c
@@ -2338,7 +2338,7 @@ dummy_func(
     ERROR_IF(optimized < 0, error);
     if (optimized) {
         assert(current_executor == NULL);
-        current_executor = (_PyExecutorObject *)&Py_FatalErrorExecutor;
+        current_executor = (_PyExecutorObject *)&Py_NeverExecutedExecutor;
         next_uop = executor->trace;
         GOTO_TIER_TWO();
     }
@@ -2374,7 +2374,7 @@ dummy_func(
     _PyExecutorObject *executor = code->co_executors->executors[oparg & 255];
     if (executor->vm_data.valid) {
         assert(current_executor == NULL);
-        current_executor = (_PyExecutorObject *)&Py_FatalErrorExecutor;
+        current_executor = (_PyExecutorObject *)&Py_NeverExecutedExecutor;
         Py_INCREF(executor);
         next_uop = executor->trace;
         GOTO_TIER_TWO();
@@ -4005,13 +4005,13 @@ dummy_func(

 inst(CACHE, (--)) {
     TIER_ONE_ONLY
-    assert(0);
+    assert(0 && "Executing a cache.");
     Py_FatalError("Executing a cache.");
 }

 inst(RESERVED, (--)) {
     TIER_ONE_ONLY
-    assert(0);
+    assert(0 && "Executing RESERVED instruction.");
     Py_FatalError("Executing RESERVED instruction.");
 }

@@ -4120,8 +4120,9 @@ dummy_func(

 op(_START_EXECUTOR, (executor/4 --)) {
     TIER_TWO_ONLY
-    Py_DECREF(current_executor);
+    _PyExecutorObject *old = current_executor;
     current_executor = (_PyExecutorObject*)executor;
+    Py_DECREF(old);
 }

 op(_FATAL_ERROR, (--)) {
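The `_START_EXECUTOR` change reorders the hand-over so that `current_executor` is updated before the old executor is released. The diff does not state the reason; one plausible motivation, assumed here, is that releasing an object may run a deallocator, so the shared pointer should never refer to an object that is being freed. The sketch below models this with a toy reference count. Every name in it (`toy_executor`, `toy_incref`, `toy_decref`, `begin_transfer`, `start_executor`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an executor object; all names here are hypothetical. */
typedef struct toy_executor {
    int refcount;
    int freed;          /* set when the count drops to zero */
} toy_executor;

static toy_executor *current_executor;  /* models the interpreter-local variable */

static void toy_incref(toy_executor *e) { e->refcount++; }

static void
toy_decref(toy_executor *e)
{
    if (--e->refcount == 0) {
        /* Assumption: a deallocator could run here, so the shared
         * pointer must already refer to a live object. */
        assert(current_executor != e);
        e->freed = 1;
    }
}

/* Caller side of a transfer: take a reference to the new executor
 * before jumping to it. */
static void
begin_transfer(toy_executor *next)
{
    toy_incref(next);
}

/* Mirrors the reordered _START_EXECUTOR body: save the old pointer,
 * publish the new one, then release the old one. */
static void
start_executor(toy_executor *executor)
{
    toy_executor *old = current_executor;
    current_executor = executor;
    toy_decref(old);
}
```

With the original ordering (decref first), the assertion inside `toy_decref` would fire whenever the old executor's last reference was the implicit one held while it was running.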
6 changes: 3 additions & 3 deletions Python/ceval.c
Original file line number Diff line number Diff line change
Expand Up @@ -1079,10 +1079,10 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int
side_exit:
OPT_HIST(trace_uop_execution_counter, trace_run_length_hist);
UOP_STAT_INC(uopcode, miss);
uint32_t exit_id = next_uop[-1].target;
_PyExitData *exit = &current_executor->exits[exit_id];
uint16_t exit_index = next_uop[-1].exit_index;
_PyExitData *exit = &current_executor->exits[exit_index];
DPRINTF(2, "SIDE EXIT: [UOp %d (%s), oparg %d, operand %" PRIu64 ", exit %u, temp %d, target %d -> %s]\n",
uopcode, _PyUOpName(uopcode), next_uop[-1].oparg, next_uop[-1].operand, exit_id, exit->temperature, exit->target,
uopcode, _PyUOpName(uopcode), next_uop[-1].oparg, next_uop[-1].operand, exit_index, exit->temperature, exit->target,
_PyOpcode_OpName[_PyCode_CODE(_PyFrame_GetCode(frame))[exit->target].op.code]);
Py_INCREF(exit->executor);
next_uop = exit->executor->trace;
Expand Down
3 changes: 2 additions & 1 deletion Python/executor_cases.c.h (generated file, diff not rendered)

8 changes: 4 additions & 4 deletions Python/generated_cases.c.h (generated file, diff not rendered)

6 changes: 3 additions & 3 deletions Python/optimizer.c
@@ -828,11 +828,11 @@ allocate_executor(int exit_count, int length)
     return res;
 }

-_PyColdExitObject Py_FatalErrorExecutor = {
+_PyColdExitObject Py_NeverExecutedExecutor = {
     .base = {
         PyVarObject_HEAD_INIT(&_ColdExit_Type, 0)
         .vm_data = { 0 },
-        .trace = &Py_FatalErrorExecutor.uop
+        .trace = &Py_NeverExecutedExecutor.uop
     },
     .uop.opcode = _FATAL_ERROR,
 };
@@ -875,7 +875,7 @@ make_executor_from_uops(_PyUOpInstruction *buffer, _PyBloomFilter *dependencies)
     }
     if (_PyUop_Flags[opcode] & HAS_EXIT_FLAG) {
         executor->exits[next_exit].target = buffer[i].target;
-        dest->target = next_exit;
+        dest->exit_index = next_exit;
         next_exit--;
     }
     /* Set the oparg to be the destination offset,
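The `make_executor_from_uops` hunk above moves each exiting instruction's `target` into the executor's exit table and leaves an index behind. The decrementing `next_exit` suggests the buffer is walked from the end, which is an assumption in the sketch below. The types `toy_uop` and `toy_exit`, the `has_exit` field, the bound `MAX_EXITS`, and the function `assign_exits` are all hypothetical simplifications, and unlike the original the sketch rewrites the trace in place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_EXITS 8   /* hypothetical bound for this sketch */

typedef struct {
    bool has_exit;        /* stands in for the HAS_EXIT_FLAG check */
    union {
        uint32_t target;
        uint16_t exit_index;
    };
} toy_uop;

typedef struct {
    uint32_t target;      /* where to resume if this exit is taken */
} toy_exit;

/* Walk the trace from the end, giving each exiting uop a slot in the
 * exit table. Counting down while walking backwards means the slots
 * end up in trace order. Returns the number of exits, or -1 if the
 * trace has more exits than MAX_EXITS. */
static int
assign_exits(toy_uop *trace, int length, toy_exit *exits)
{
    int exit_count = 0;
    for (int i = 0; i < length; i++) {
        if (trace[i].has_exit) {
            exit_count++;
        }
    }
    if (exit_count > MAX_EXITS) {
        return -1;
    }
    int next_exit = exit_count - 1;
    for (int i = length - 1; i >= 0; i--) {
        if (trace[i].has_exit) {
            /* Copy the target out before the index overwrites it. */
            exits[next_exit].target = trace[i].target;
            trace[i].exit_index = (uint16_t)next_exit;
            next_exit--;
        }
    }
    return exit_count;
}
```

After this pass, the side-exit path only needs the small index stored in the instruction to find the exit's target, temperature, and attached executor.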
59 changes: 41 additions & 18 deletions Python/tier2_engine.md
@@ -9,8 +9,8 @@ instead of the tier 1 bytecode.
 Since each executor must exit, we also track the "hotness" of those
 exits and attach new executors to those exits.

-That way we form a graph of executors that covers the most frequently
-executed parts of the program.
+As the program executes, and the hot parts of the program get optimized,
+a graph of executors will form.

 ## Superblocks and Executors

@@ -20,7 +20,7 @@ using information gathered by tier 1 to guide that projection to
 form a "superblock", a mostly linear sequence of micro-ops.
 Although mostly linear, it may include a single loop.

-We then optimize this superblock to from an optimized superblock,
+We then optimize this superblock to form an optimized superblock,
 which is equivalent but more efficient.

 A superblock is a representation of the code we want to execute,
@@ -30,13 +30,15 @@ The executable form is know as an executor.
 Executors are semantically equivalent to the superblock they are
 created from, but are in a form that can be efficiently executable.

-There are two execution engines for executors, adn two types of executors:
+There are two execution engines for executors, and two types of executors:
 * The hardware which runs machine code executors created by the JIT compiler.
 * The tier 2 interpreter runs bytecode executors.

-The choice of engine is a configuration option.
-We will not support both the tier 2 interpreter and JIT in a
-single executable.
+It would be very wasteful to support both a tier 2 interpreter and
+JIT compiler in the same process.
+For now, we will make the choice of engine a configuration option,
+but we could make it a command line option in the future if that would prove useful.
+

 ### Tier 2 Interpreter

@@ -64,6 +66,18 @@ Therefore, we want to make those transfers fast.

 ### Tier 2 to tier 2

+#### Cold exits
+
+All side exits start cold and most stay cold, but a few become
+hot. We want to keep the memory consumption small for the many
+cold exits, but those that become hot need to be fast.
+However we cannot know in advance, which will be which.
+
+So that tier 2 to tier 2 transfers are fast for hot exits,
+exits must be implemented as executors. In order to patch
+executor exits when they get hot, a pointer to the current
+executor must be passed to the exit executor.
+
 #### Handling reference counts

 There must be an implicit reference to the currently executing
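The "Cold exits" text added above describes exits that start out cheap and are patched once they become hot. A minimal sketch of that warm-up, under stated assumptions: the field names `target`, `temperature`, and `executor` follow the `exit->...` accesses in the `ceval.c` hunk, while `toy_exit`, `toy_executor`, `cold_stub`, `compile_trace`, `take_exit`, and the value of `WARM_THRESHOLD` are hypothetical. Returning the cold stub stands in for falling back to tier 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WARM_THRESHOLD 3   /* hypothetical; the diff does not give the real value */

typedef struct toy_executor {
    const char *name;
} toy_executor;

typedef struct {
    uint32_t target;          /* tier 1 offset to resume at */
    int16_t temperature;      /* how often this exit has been taken */
    toy_executor *executor;   /* cold stub at first, real executor once hot */
} toy_exit;

static toy_executor cold_stub = { "cold" };
static toy_executor hot_trace = { "hot" };

/* Hypothetical stand-in for projecting and optimizing a new trace
 * that starts at `target`. */
static toy_executor *
compile_trace(uint32_t target)
{
    (void)target;
    return &hot_trace;
}

/* Called each time the exit is taken; returns the executor to run.
 * While the exit is cold, only a counter is updated. Once it reaches
 * the threshold, the exit is patched to point at a real executor, so
 * later transfers go straight from executor to executor. */
static toy_executor *
take_exit(toy_exit *exit)
{
    if (exit->executor == &cold_stub) {
        exit->temperature++;
        if (exit->temperature >= WARM_THRESHOLD) {
            exit->executor = compile_trace(exit->target);
        }
    }
    return exit->executor;
}
```

The memory argument in the text shows up here as one shared `cold_stub` for every cold exit; only exits that actually warm up get an executor of their own.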
@@ -72,11 +86,17 @@ Consequently, we must increment the reference count of an
 executor just before executing it, and decrement it just after
 executing it.

-Note that since executors are objects, they can contain references to
-themselves, which means we do not need to pass a reference to an
-executor when we start to execute it.
+We want to minimize the amount of data that is passed from
+one executor to the next. In the JIT, this reduces the number
+of arguments in the tailcall, freeing up registers for other uses.
+It is less important in the interpreter, but follwing the same
+design as the JIT simplifies debugging and is good for performance.
+
+Provided that we incref the new executor before executing it, we
+can jump directly to the code of the executor, without needing
+to pass a reference to that executor object.
 However, we do need to pass a reference to the previous executor,
-so that it can be decref'd.
+so that it can be decref'd and for handling of cold exits.

 #### The interpreter

@@ -91,26 +111,29 @@ points to the currently live executor. When transfering from executor
 3. Start executing `B`

 We also make the first instruction in `B` do the following:
-1. Decrement the reference count of `A` (through `current_executor`)
-2. Set `current_executor` to point to `B`
+1. Set `current_executor` to point to `B`
+2. Decrement the reference count of `A`

 The net effect of the above is to safely decrement the refcount of `A`,
 increment the refcount of `B` and set `current_executor` to point to `B`.

 #### In the JIT

-Transfering control form one executor to another is done via tailcalls.
+Transfering control from one executor to another is done via tailcalls.

 The compiled executor should do the same, except that there is no local
[Review comment from a Member on the line above: "The same as what?"]
 variable `current_executor`, so that the old executor, `A`, must be passed
 as an additional parameter when tailcalling.

 ### Tier 1 to tier 2

-We create a single, immortal executor that cannot be executed.
-We can then perform a tier 1 to tier 2 transfer,
-by setting `current_executor` to this singleton, and then performing
-a tier 2 to tier 2 transfer as above.
+Since the executor doesn't know if the previous code was tier 1 or tier 2,
+we need to make a transfer from tier 1 to tier 2 look like a tier 2 to tier 2
+transfer to the executor.
+
+To do that, we create a single, immortal executor that cannot be executed.
+We can then perform a tier 1 to tier 2 transfer by setting `current_executor`
+to this singleton, and then performing a tier 2 to tier 2 transfer as above.

 ### Tier 2 to tier 1

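The "Tier 1 to tier 2" section above relies on an immortal placeholder so that entering from tier 1 looks like any other executor-to-executor transfer. The sketch below illustrates the idea with a toy model. It assumes that reference count changes on an immortal object are no-ops, and every name (`toy_executor`, `never_executed`, `enter_from_tier1`, and the helpers) is hypothetical rather than taken from CPython.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct toy_executor {
    int refcount;
    bool immortal;   /* immortal objects ignore refcount changes */
} toy_executor;

/* Hypothetical stand-in for the never-executed singleton. */
static toy_executor never_executed = { .refcount = 1, .immortal = true };

static toy_executor *current_executor = NULL;  /* NULL while running tier 1 */

static void toy_incref(toy_executor *e) { if (!e->immortal) e->refcount++; }
static void toy_decref(toy_executor *e) { if (!e->immortal) e->refcount--; }

/* First step of every executor: identical for both kinds of entry. */
static void
start_executor(toy_executor *self)
{
    toy_executor *old = current_executor;
    current_executor = self;
    toy_decref(old);
}

/* Tier 1 entry: pretend the singleton was running, then do an
 * ordinary tier 2 to tier 2 transfer. */
static void
enter_from_tier1(toy_executor *target)
{
    assert(current_executor == NULL);
    current_executor = &never_executed;
    toy_incref(target);
    start_executor(target);
}
```

Because the placeholder is immortal, the unconditional decref in `start_executor` is harmless on this path, so the executor needs no branch to distinguish a tier 1 entry from a tier 2 one.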