8000 gh-128045: Mark unknown opcodes as deopting to themselves by DinoV · Pull Request #128044 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

gh-128045: Mark unknown opcodes as deopting to themselves #128044

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 19, 2025

Conversation

DinoV
Copy link
Contributor
@DinoV DinoV commented Dec 17, 2024

When accessing co_code on a code object we'll first run through to do the de-opt https://github.com/python/cpython/blob/main/Objects/codeobject.c#L1706

This removes any unknown opcodes. Instead the deopt table should just recognize unknown opcodes as deopting to themselves, allowing the extensible interpreter loop to consume unknown opcodes.

@DinoV DinoV changed the title Mark unknown opcodes as deopting to themselves gh-128045: Mark unknown opcodes as deopting to themselves Dec 17, 2024
@DinoV DinoV marked this pull request as ready for review December 17, 2024 20:12
@DinoV DinoV requested a review from markshannon as a code owner December 17, 2024 20:12
Comment on lines 251 to 256
for name, deopt in sorted(deopts):
out.emit(f"[{name}] = {deopt},\n")
defined = set(analysis.opmap.values())
for i in range(256):
if i not in defined:
out.emit(f"[{i}] = {i},\n")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're not testing this at all in cpython, I'd suggest we at least add a couple of assertions to make sure this is correctly covering the range of opcodes:

Suggested change
for name, deopt in sorted(deopts):
out.emit(f"[{name}] = {deopt},\n")
defined = set(analysis.opmap.values())
for i in range(256):
if i not in defined:
out.emit(f"[{i}] = {i},\n")
defined = set(analysis.opmap.values())
for i in range(256):
if i not in defined:
deopts.append((f'{i}', f'{i}'))
assert len(deopts) == 256
assert len(set(x[0] for x in deopts)) == 256
for name, deopt in sorted(deopts):
out.emit(f"[{name}] = {deopt},\n")

facebook-github-bot pushed a commit to facebookincubator/cinder that referenced this pull request Dec 18, 2024
Summary:
When CPython hands out the bytecode it will first do a de-opt on it:

https://www.internalfb.com/code/fbsource/[241e0d0760d4c107a3c0ac2b4914524120b0c909]/third-party/python/3.12/Objects/codeobject.c?lines=1540&reveal=1540

This uses the de-opt table which only defines the known opcodes, meaning unknown opcodes get turned into 0's. We need CPython to at least define unknown opcodes to at least de-opt to themselves.

Upstream PR: python/cpython#128044

Reviewed By: jbower-fb

Differential Revision: D67350914

fbshipit-source-id: 0073efab52da1be775272e7dd9ae5a46468ccb10
@markshannon
Copy link
Member

I think we regard undefined instructions an error.
If you want to pass custom bytecodes to your custom interpreter, a more robust approach might be needed.
We can do the deopt thing for now, but it seems fragile.

For example, one thing I have considered is storing bytecodes in a compact format on disk: Instructions without an oparg would take 1 byte, those with an oparg would take 2. We would then combine the unmarshalling and quickening steps to create the full in-memory form in a single pass. Custom instructions would not survive this process.

@DinoV DinoV force-pushed the deopt_unknown_ops branch from 7048634 to a675bfa Compare April 2, 2025 17:28
@DinoV
Copy link
Contributor Author
DinoV commented Apr 2, 2025

Took me a while to get back to this but I'm finally back :) I applied the changes suggested by @iritkatriel.

I think we regard undefined instructions an error. If you want to pass custom bytecodes to your custom interpreter, a more robust approach might be needed. We can do the deopt thing for now, but it seems fragile.

For example, one thing I have considered is storing bytecodes in a compact format on disk: Instructions without an oparg would take 1 byte, those with an oparg would take 2. We would then combine the unmarshalling and quickening steps to create the full in-memory form in a single pass. Custom instructions would not survive this process.

I think as long as we could still have some way to construct a code object ourselves that would be fine, we'd just may need to implement our own custom unmarshaling logic that could support our opcodes. Obviously that doesn't cover all the ways that things could change in the future but we can try and figure out how to adapt to other potential changes :)

@DinoV DinoV force-pushed the deopt_unknown_ops branch from a675bfa to efe6990 Compare April 2, 2025 17:36
Copy link
Member
@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@DinoV DinoV merged commit cc9add6 into python:main May 19, 2025
59 checks passed
@miss-islington-app
Copy link

Thanks @DinoV for the PR 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request May 19, 2025
…onGH-128044)

* Mark unknown opcodes as deopting to themselves
(cherry picked from commit cc9add6)

Co-authored-by: Dino Viehland <dinoviehland@meta.com>
@bedevere-app
Copy link
bedevere-app bot commented May 19, 2025

GH-134228 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label May 19, 2025
DinoV pushed a commit that referenced this pull request May 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0