GH-98831: "Generate" the interpreter by gvanrossum · Pull Request #98830 · python/cpython

Merged on Nov 3, 2022

89 commits
089ab0c
New script to generate the ceval switch
gvanrossum Sep 20, 2022
bba5b94
Add 'End regular instructions' marker
gvanrossum Oct 12, 2022
99b0058
Extract to a format closer to interpreter_definition.md
gvanrossum Oct 12, 2022
2939830
Switch to generating new DSL syntax
gvanrossum Oct 12, 2022
65abfc4
Indent output; Always generate stack effect syntax; use argparse
gvanrossum Oct 12, 2022
51dbe28
Check jump flag; tweak begin and end of the script a bit
gvanrossum Oct 12, 2022
a238f69
Fix incorrect output for ??--??
gvanrossum Oct 13, 2022
2d5b1b5
Script to generate cases.h from bytecodes.inst
gvanrossum Oct 13, 2022
ae042be
Write bytecodes.inst into Python directory by default
gvanrossum Oct 13, 2022
4315621
Initial bytecodes.inst
gvanrossum Oct 13, 2022
8c80bfd
Initial cases.h
gvanrossum Oct 13, 2022
7301203
Changes to ceval.c to use cases.h
gvanrossum Oct 13, 2022
d9ac40b
Changes to Makefile.pre.in to generate cases.h
gvanrossum Oct 13, 2022
a5bff0c
Move #endif around
gvanrossum Oct 13, 2022
8c13c55
Write families to .inst file
gvanrossum Oct 13, 2022
34c84bf
Derive stack effect for super-instructions and specializations
gvanrossum Oct 14, 2022
ff70895
Regenerated bytecodes.inst with fewer errors
gvanrossum Oct 14, 2022
78f8591
Support array stack effects
gvanrossum Oct 14, 2022
a29b815
Array stack effects are now in bytecodes.inst
gvanrossum Oct 14, 2022
688bb5d
Add Mark's C lexer
gvanrossum Oct 14, 2022
7def42b
Start writing a real parser
gvanrossum Oct 14, 2022
8fae259
Move cases infrastructure to its own directory
gvanrossum Oct 14, 2022
b998a97
Pristine copy of lexer from Mark's gist
gvanrossum Oct 14, 2022
c44adad
Fix tiny bits in lexer.py
gvanrossum Oct 14, 2022
d0c4398
Move c_lexer.c to lexer.c
gvanrossum Oct 14, 2022
06a3afa
Complete parser (not using it yet):
gvanrossum Oct 15, 2022
33540af
Complete the parser and use it
gvanrossum Oct 15, 2022
1990b61
Use a dataclass for Token; move stuff around to fix tokens
gvanrossum Oct 16, 2022
690d4c5
Make a nice syntax error in lexer.py
gvanrossum Oct 16, 2022
b3cd440
More handy stuff for raising syntax errors
gvanrossum Oct 16, 2022
2ed0e15
Fix family definitions per spec
gvanrossum Oct 19, 2022
371050e
Rename bytecodes.{inst,c} in prep for better tooling support
gvanrossum Oct 19, 2022
ee59b7d
Use a template to generate top/bottom of bytecodes.c
gvanrossum Oct 19, 2022
a782bb1
Add more to bytecodes_template.c
gvanrossum Oct 20, 2022
0ff055a
Let generate_cases.py add the DISPATCH() calls
gvanrossum Oct 20, 2022
8e9b3dc
Refactor the parser in preparation of actually parsing C
gvanrossum Oct 24, 2022
a38e52b
Handle PERIOD properly
gvanrossum Oct 26, 2022
5279fe2
CHECKPOINT -- the start of a better C parser
gvanrossum Oct 27, 2022
35eb36a
Merge remote-tracking branch 'origin/main' into generate-ceval-switch
gvanrossum Oct 27, 2022
1869a8b
Add some type hints to lexer.py
gvanrossum Oct 27, 2022
17466fd
Split cparser into sparser and eparser
gvanrossum Oct 27, 2022
96832de
Identifiers are types too (-ish)
gvanrossum Oct 27, 2022
3c9a567
Begin better treatment of types
gvanrossum Oct 27, 2022
e751e9a
Begin better declarations
gvanrossum Oct 27, 2022
9e9ddbd
Support any number of postfix operators
gvanrossum Oct 27, 2022
5e90a22
Split declaration; support 'for (int i = 0;...)'
gvanrossum Oct 27, 2022
ba3c2ff
Use CHARACTER, not CHAR, for '.' token
gvanrossum Oct 27, 2022
3678d2a
Support string and character literals
gvanrossum Oct 27, 2022
607a22e
Refactor decl_stmt some more
gvanrossum Oct 28, 2022
06247ea
Switch statement (and case and default)
gvanrossum Oct 28, 2022
5ec6a12
String literal concatenation
gvanrossum Oct 28, 2022
5f8a56f
Fix hex and octal numbers
gvanrossum Oct 28, 2022
dc7732f
Hack conditional operator
gvanrossum Oct 28, 2022
a3c3c72
More numeric types, e.g. unsigned long
gvanrossum Oct 28, 2022
5c91786
Support array declarations and initializers
gvanrossum Oct 28, 2022
4ec2d9b
Fix typo in main
gvanrossum Oct 28, 2022
074be4b
Support some function types in casts
gvanrossum Oct 28, 2022
d140ae6
Support function pointer declarations
gvanrossum Oct 28, 2022
2ad6278
Make Context repr more compact
gvanrossum Oct 28, 2022
2f3e321
Make contextual more generic
gvanrossum Oct 28, 2022
64f62c7
Make Parser derive from SParser, and add test main
gvanrossum Oct 28, 2022
250ded1
CHECKPOINT: generate cases the new way
gvanrossum Oct 28, 2022
4efd6bb
Insert DISPATCH() unless block always exits
gvanrossum Oct 28, 2022
01b3735
More accuracy in DISPATCH() addition
gvanrossum Oct 28, 2022
5215027
📜🤖 Added by blurb_it.
blurb-it[bot] Oct 28, 2022
df0e8d8
Merge remote-tracking branch 'origin/main' into generate-ceval-switch
gvanrossum Oct 28, 2022
08465c9
Update generated cases due to END_FOR refactor
gvanrossum Oct 28, 2022
6111283
Add some TODO comments to lexer.py
gvanrossum Oct 28, 2022
69c5551
Add README.md to Tools/cases_generator
gvanrossum Oct 28, 2022
a7f0c08
Fix trailing whitespace
gvanrossum Oct 29, 2022
46b0def
Fix crash in require() at EOF
gvanrossum Oct 30, 2022
e757b61
More compact Token.__repr__()
gvanrossum Oct 30, 2022
c67f0ba
Proper operator priorities, add comma operator
gvanrossum Oct 30, 2022
98427bf
Format large families more nicely
gvanrossum Oct 31, 2022
4bbf11e
Rename cases.h to generated_cases.c.h
gvanrossum Oct 31, 2022
1911964
Merge remote-tracking branch 'origin/main' into generate-ceval-switch
gvanrossum Oct 31, 2022
6837c86
Regenerate cases after merging latest main
gvanrossum Oct 31, 2022
6f3c993
Separate family members by commas
gvanrossum Oct 31, 2022
4663dfc
Merge remote-tracking branch 'origin/main' into generate-ceval-switch
gvanrossum Nov 1, 2022
1c587bb
Regenerated files after merge from main
gvanrossum Nov 1, 2022
b96f5db
Get rid of the C parser infrastructure
gvanrossum Nov 2, 2022
c42d323
Remove stack effect from bytecodes.c
gvanrossum Nov 2, 2022
2eb908d
Update README.md
gvanrossum Nov 2, 2022
53a0398
Merge remote-tracking branch 'origin/main' into generate-ceval-switch
gvanrossum Nov 2, 2022
6ea380f
Add generated_cases.c.h to .gitattributes as generated file
gvanrossum Nov 2, 2022
d8e6db8
Update from Brandt's code review
gvanrossum Nov 2, 2022
8a312ca
Merge remote-tracking branch 'origin/main' into generate-ceval-switch
gvanrossum Nov 2, 2022
7a327aa
Update generated files after merging from main
gvanrossum Nov 2, 2022
fddc08c
Remove the redundant cases from ceval.c
gvanrossum Nov 3, 2022
Add more to bytecodes_template.c
gvanrossum committed Oct 20, 2022
commit a782bb1dc29eae06bbad716023e79d13e943cd28
26 changes: 24 additions & 2 deletions Python/bytecodes.c
@@ -1,12 +1,31 @@
#include "Python.h"

#include "opcode.h"
#include "pycore_atomic.h"
#include "pycore_frame.h"

-#define inst(name, stack_effect) case name:
+void _PyFloat_ExactDealloc(PyObject *);
+void _PyUnicode_ExactDealloc(PyObject *);
+
+#define SET_TOP(v) (stack_pointer[-1] = (v))
+#define GETLOCAL(i) (frame->localsplus[i])
+
+#define inst(name, stack_effect) case name:
Member

I got a nice big pile of red squiggles when I opened bytecodes.c. Here are all of the issues:

  • The inst macro definition has a stack effect parameter that isn't used by any of the actual instructions.
  • Some labels are missing:
    • exception_unwind
    • resume_frame
    • resume_with_error
    • start_frame
  • Some locals are missing:
    • call_shape
    • first_instr
    • throwflag
  • A bunch of other stuff is missing:
    • NAME_ERROR_MSG
    • PEEK
    • PyDictKeyEntry
    • PyDictOrValues
    • PyDictUnicodeEntry
    • PyModuleObject
    • _PyDictOrValues_GetValues
    • _PyListIterObject
    • _PyObject_DictOrValuesPointer
    • _PyOpcode_Deopt
    • _PyRangeIterObject
    • _Py_ID
    • _Py_STR
    • binary_ops
    • struct _dictkeysobject
    • struct _dictvalues
    • struct _is

Member Author

Good point. I added some more dummy arguments and labels, fixed the inst() macro, and I'm just copying all #includes from ceval.c...
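
To make the role of the dummy scaffolding concrete, here is a minimal sketch of the idea: a throwaway function whose parameters and labels exist only so that instruction bodies compile and type-check on their own (for example, in an editor). The opcode numbers, the two instruction bodies, and the name `dummy_eval` are hypothetical, and the one-parameter form of `inst` is an assumption about what the fixed macro might look like, not the code from this PR.

```c
/* Hypothetical opcode numbers, for illustration only. */
enum { POP_TOP = 1, NOP = 9 };

/* Assumed one-parameter form: each instruction becomes a plain case label. */
#define inst(name) case name:

/* Stand-in for the interpreter loop. The parameters play the role of the
   interpreter's locals, so the instruction bodies compile in isolation. */
static int
dummy_eval(unsigned char opcode, int *stack, int depth)
{
    int *stack_pointer = stack + depth;
    switch (opcode) {
        inst(NOP) {
            break;
        }
        inst(POP_TOP) {
            if (stack_pointer == stack) {
                goto error;  /* nothing to pop */
            }
            stack_pointer--;
            break;
        }
    }
    /* Report the resulting stack depth. */
    return (int)(stack_pointer - stack);
error:
    return -1;
}
```

Calling `dummy_eval(POP_TOP, stack, 3)` on a three-element stack reports a depth of 2. An opcode with no matching `inst` falls through the switch and leaves the depth unchanged.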

#define family(name) static int family_##name
Member

There's also lots of red in the family section. The family macro expands to static int family_xxx = .... The C preprocessor expands all of the following opcode names into integers, so what C sees is static int family_xxx = 123, 456, 789, .... It complains that 456 and 789 aren't identifiers, since the commas make it think you're trying to declare a bunch of variables. So either the macro or the DSL may need updating here.

Member Author

Ah, that was a last-minute change; I had a + separator there first. I'll put the member list inside curly braces and make the macro define an unsized array.
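
The exchange above can be sketched in plain C. With the original macro, `family(load_attr) = 106, 107, 108;` expands to `static int family_load_attr = 106, 107, 108;`, which the compiler rejects because it expects a declarator after each comma. The version below assumes the fix described in the reply: the macro declares an unsized array, and the members go in braces. The opcode numbers and the helper `family_load_attr_size` are hypothetical, chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical opcode numbers standing in for the real opcode.h values. */
#define LOAD_ATTR 106
#define LOAD_ATTR_MODULE 107
#define LOAD_ATTR_SLOT 108

/* Assumed fix: the macro declares an unsized array, so a braced,
   comma-separated member list is an ordinary array initializer. */
#define family(name) static int family_##name[]

family(load_attr) = { LOAD_ATTR, LOAD_ATTR_MODULE, LOAD_ATTR_SLOT };

/* Number of members in the family, derived from the array size. */
static size_t
family_load_attr_size(void)
{
    return sizeof(family_load_attr) / sizeof(family_load_attr[0]);
}
```

Because the array is unsized, the compiler infers its length from the initializer, so adding a member to a family needs no other change.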


static void
-dummy_func(unsigned char opcode)
+dummy_func(
+    PyThreadState *tstate,
+    _PyInterpreterFrame *frame,
+    unsigned char opcode,
+    unsigned int oparg,
+    _Py_atomic_int * const eval_breaker,
+    _PyCFrame cframe,
+    PyObject *names,
+    PyObject *consts,
+    _Py_CODEUNIT *next_instr,
+    PyObject **stack_pointer
+)
{
switch (opcode) {

@@ -3896,6 +3915,9 @@ dummy_func(unsigned char opcode)
// END BYTECODES //

}
+handle_eval_breaker:;
+unbound_local_error:;
+error:;
Member

These semicolons aren't needed, right?

Suggested change
-handle_eval_breaker:;
-unbound_local_error:;
-error:;
+handle_eval_breaker:
+unbound_local_error:
+error:

Member Author

Oh, you're right. It'll have to wait until another time. (The last label does need the ; though, a label must be followed by a statement.)
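
The point about the last label can be checked with a small, self-contained sketch. In C11, a label is a prefix of a statement, so several labels can stack on one statement, but the final one still needs something to label before the closing brace; an empty statement (`;`) is enough. The function `finish` and its `code` values are hypothetical. Only the label names come from the diff.

```c
/* Jumps to one of three stacked labels at the end of the function.
   *result keeps `code` when a jump was taken and is 0 otherwise. */
static void
finish(int code, int *result)
{
    *result = code;
    if (code == 1) {
        goto handle_eval_breaker;
    }
    if (code == 2) {
        goto unbound_local_error;
    }
    if (code == 3) {
        goto error;
    }
    *result = 0;
handle_eval_breaker:  /* no ';' needed: the next label starts a statement */
unbound_local_error:
error:;               /* last label before '}' needs an empty statement */
}
```

Removing the `;` after `error:` makes a C11 compiler reject the function, while removing the semicolons after the first two labels, as the review suggests, changes nothing.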

}

// Families go below this point //
26 changes: 24 additions & 2 deletions Tools/cases_generator/bytecodes_template.c
@@ -1,12 +1,31 @@
#include "Python.h"
Member

This file is only needed when extracting the instructions from ceval.c?

Member Author

It's an implementation detail for extract_cases.py, which is itself redundant once the PR is merged (though I'd like to keep it in the repo for a few weeks to help people who have a pending PR that changes instruction cases).


#include "opcode.h"
#include "pycore_atomic.h"
#include "pycore_frame.h"

-#define inst(name, stack_effect) case name:
+void _PyFloat_ExactDealloc(PyObject *);
+void _PyUnicode_ExactDealloc(PyObject *);
+
+#define SET_TOP(v) (stack_pointer[-1] = (v))
+#define GETLOCAL(i) (frame->localsplus[i])
+
+#define inst(name, stack_effect) case name:
#define family(name) static int family_##name

static void
-dummy_func(unsigned char opcode)
+dummy_func(
+    PyThreadState *tstate,
+    _PyInterpreterFrame *frame,
+    unsigned char opcode,
+    unsigned int oparg,
+    _Py_atomic_int * const eval_breaker,
+    _PyCFrame cframe,
+    PyObject *names,
+    PyObject *consts,
+    _Py_CODEUNIT *next_instr,
+    PyObject **stack_pointer
+)
{
switch (opcode) {

@@ -15,6 +34,9 @@ dummy_func(unsigned char opcode)
// END BYTECODES //

}
+handle_eval_breaker:;
+unbound_local_error:;
+error:;
}

// Families go below this point //