py: Implement partial PEP-498 (f-string) support #4998
Conversation
I'm not entirely sure why I'm getting one similar weird case on the CircuitPython PR, so at least this isn't a totally fringe issue. Happy to patch up this PR as needed to get tests passing, but I'm not entirely sure where to start (since I haven't hacked much, if at all, on uPy core before).
Wow, thanks for this, it's great to see such progress on implementing this feature! The issue with f-strings is that they introduce more coupling between the parser and lexer, but the way it's implemented here (at the token level) is very neat indeed. This will need a full review, but for now to fix the CI errors:
I was able to push a quick fix for that first one. Looks like
py/lexer.c (Outdated):

```c
lex->chr0 = lex->vstr_postfix.len > 0 ? lex->vstr_postfix.buf[0] : 0;
lex->chr1 = lex->vstr_postfix.len > 1 ? lex->vstr_postfix.buf[1] : 0;
lex->chr2 = lex->vstr_postfix.len > 2 ? lex->vstr_postfix.buf[2] : 0;
```
It's probably not necessary to check `len` in any of these expressions; instead just copy in the data. At this point `vstr_postfix` has at least 3 valid chars (if I understand the code right, at least `(`, a char, and `)`). That will save code size.
This was my own form of defensiveness (the number of times I've segfaulted C by not guarding things like this is... embarrassing at best). In this PR, `vstr_postfix` will always hold at least `.format(x)`. If this postfix is ever used for something else (I'm not sure what else you'd want to syntax-transform at the lexer level, but who knows), this defensiveness may be useful; otherwise, I'm happy to scrap it for simplicity.
Please do remove the unnecessary bits. Code should be as tight as possible (to save code space and be efficient). We have very good tests, coverage and CI so any changes that violate assumptions should be picked up.
py/lexer.c (Outdated):

```c
h0 = lex->chr0;
h1 = lex->chr1;
h2 = lex->chr2;
```
No need for temporary h variables, can just write directly to chr3-5.
Yep - that's now true. A former implementation here used those holding vars (for what, I'm not sure at this point). Good catch.
py/lexer.c (Outdated):

```c
lex->chr4 = h1;
lex->chr5 = h2;

lex->vstr_postfix_idx = lex->vstr_postfix.len > 2 ? 3 : lex->vstr_postfix.len;
```
Are there cases where `len < 3`?
See above comment about defensiveness.
Please simplify / tighten up the code.
py/lexer.c (Outdated):

```c
lex->chr2 = lex->chr5;
lex->chr3 = 0;
lex->chr4 = 0;
lex->chr5 = 0;
```
Is it necessary to reset chr3-5 to 0? Could just leave it out (to save code size).
Not technically necessary, just something I did to ensure we weren't keeping now-garbage data around. Probably sane to throw this out.
Please remove it if it's not strictly needed.
py/lexer.c:

```c
lex->chr5 = 0;

vstr_reset(&lex->vstr_postfix);
lex->vstr_postfix_idx = 0;
```
Does idx need to be reset? Do the vstr need to be reset?
Given the current implementation: since I blindly write `.format(` and the postfix translations to `vstr_postfix`, I think at least the string should be reset/cleared. `vstr_postfix_idx` needs resetting somewhere, though there may well be better places to do it.

If you're implying it'd be lighter on code size/CPU cycles/memory usage to just maintain a constantly growing vstr that starts to look roughly like `.format(x).format(y).format(z)`, with an ever-increasing `vstr_postfix_idx`, I'm totally open to trying that. I'm not sure how expensive the `vstr_reset`s are.
The reset is cheap, I just thought it might be better to reset it before use (to be guaranteed it's reset), rather than after use (when you may not use it again).
py/parse.c (Outdated):

```c
    break;
case MP_TOKEN_FSTRING_RAW:
    exc = mp_obj_new_exception_msg(&mp_type_NotImplementedError,
        "raw f-strings are not implemented");
```
These error messages are the things that will really increase code size. I don't know how useful it is to have them so detailed; probably all the `SyntaxError` ones can be collapsed into a single "malformed f-string" message.

There is actually support for terse/normal/detailed errors, so at the least the above code should use that feature: in terse/normal mode just have a single message, and detailed mode can use all the different messages.
I did this mostly for compatibility with CPython's error messaging (and admittedly, as a side effect of firing blind at the lexer at first to try to understand how it worked) and I'm in no way attached to it. Happy to cull some strings here (it'll also lighten up the CircuitPython PR which was oversized for one of the translation targets...).
I'll take a look at this error leveling system - sounds neat.
```python
except NotImplementedError:
    pass
else:
    if sys.implementation.name in {'micropython', 'circuitpython'}:
```
Best not to rely on this kind of auto-detection. A better place for these not-implemented tests would be `tests/misc/non_compliant_lexer.py`.
This was my hack to ensure I wasn't erroneously throwing `RuntimeError` when generating the `.exp` file (since CPython does implement raw f-strings; admittedly I probably should at some point, but I wanted to scope this first PR to the most useful bits of the PEP).

Given that, do you still recommend moving these tests?
Ok, probably best to just remove it completely then. If it's not testing something it doesn't need to be in the test. Also not clean to have names of implementations in tests.
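As an aside, the probe itself doesn't need implementation names at all: compiling a raw f-string at runtime is enough to tell whether the feature exists. A sketch of that idea (hedged: under this PR, raw f-strings raise `NotImplementedError` at parse time, while CPython compiles them fine; the variable name is just for illustration):

```python
# Probe raw f-string support without sniffing sys.implementation:
# unsupported lexers/parsers raise when compiling the literal.
try:
    eval("rf'{1}'")
except (NotImplementedError, SyntaxError):
    raw_fstrings_supported = False
else:
    raw_fstrings_supported = True
```

On CPython this sets `raw_fstrings_supported` to `True`, since `rf'{1}'` evaluates to `'1'`.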
Mostly clarifying responses - I'll submit some code updates this evening / tomorrowish for some/many of these.
@dpgeorge Any chance you could take a look at my questions/clarifications so I can get a good vision of how/where I should clean things up ~this weekend? Hate to be a bother, but I want to make sure this gets completed while my time is relatively freer than it becomes in a month or two. Cheers!
I responded to comments. Let me know if there's anything I missed.
The Literal String Interpolation system could (and should?) introduce more meaning to the prefixes than CPython does, to allow better routing than unconditional interning, which is very wrong for general-purpose use.

"Standard" CPython use of f-strings is unnecessary, since @asottile's tool (which is maintained and covers ALL CASES up to 3.7) is all you need as a filter in your upload tool or mpy-cross. Abusing f"" r"" u"" F"" R"" U"" to improve MicroPython efficiency may have no side effect on broad CPython script compatibility (unlike const(), for example). @klardotsh what do you think?
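For context, the prefix combinations mentioned behave like this in CPython 3.6+ (a quick illustrative sketch; `f` and `b` cannot combine, and `u` combines with nothing):

```python
# Raw strings keep backslashes literally:
assert r'\n' == '\\n'
# The u prefix is a no-op in Python 3:
assert u'x' == 'x'
# Raw and f can combine: backslashes stay literal, braces still interpolate:
assert rf'\n{1}' == '\\n1'
```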
I'm not entirely sure I follow all of that comment (and some of that is almost certainly my fault on the understanding end). What I can respond to is: I wouldn't consider "standard CPython use" useless. F-strings are growing more and more common in Python code overall, and they're also being taught to (some) new CPython developers as "the modern way" to format strings. I'd argue that supporting them within MicroPython itself (whether through my own implementation, which I still need to finish patching up since life got very hectic over the past month, or through a different/possibly smarter implementation) is both the philosophically right way to go and the easier way for end developers. Forcing upload/mpy-cross filtering through, I'd imagine, a Makefile is not an "obvious" solution to a developer who hears MicroPython can support f-strings and just wants to write a few. I'm struggling a bit to understand your discussion of interning, though.
That's not about variable expansion but about string storage itself (the storage passed to format usually comes unconditionally from the QSTR pool). "Standard use" is of course useful in a REPL, but if its implementation just leads to a bigger core with no benefits, then why? My question about this PR is: do you think it opens (or closes, I don't know) ways to the points listed? If it opens ways, I will read your code with great hopes :) Otherwise you would just add more code to maintain on an already faulty design (I did not want to use that qualifier at first, but I'll speak my mind). And yes, it's not directly related to f-string support, but you look like the expert here since your work is closely related to string qualifiers (I'm getting tired of waiting for answers from other sources).
I think "supporting a PEP that is part of the language definition" qualifies as a benefit on its own accord. I am not in any official position to comment on whether or not PEP-498 should have been added to the language spec, which I think is the broader philosophical debate (and one I'd rather not entertain).

In any event, I am not familiar enough with the string storage mechanisms in MP core (let alone do I remember the full context I had when I worked on this PR originally; it's been a few weeks) to comment on the benefits of moving things around and/or storing them differently. My gut tells me that splitting f-strings out individually from standard strings (which I think is part of what your suggestions include?) re-introduces the parser overhead for strings (which, as mentioned in this PR description, was removed, with strings moved to the lexer, for performance and code-size reasons). I'm going to defer the remainder of those discussions/decisions to the core crew since I don't, at this time, have the knowledge to make such a decision myself :)

Regarding the status of this PR: it's still open, and I'm open to ideas to make it better while sticking to the spirit of PEP-498 as written by the PEP authors. I'm not certain when I'll get the feedback items from Damien handled (should be soon), but once those are in, I'm open to those extended discussions.
Compile overhead is always a trade-off with runtime speed/RAM requirements.
Neither am I, so many thanks for the overview. I'm a big fan of f-strings anyway; I modified https://github.com/asottile/future-fstrings just for MicroPython use (https://github.com/pmp-p/fstrings_helper), long before future-fstrings[rewrite] made it easy to use for that purpose. But I'm more concerned with overall string support than philosophical debate too: so maybe just fix and improve things before adding a layer on top ("things" is currently only targeting 3.4 and is not yet there). If outside the version-targeting context, then don't forget to implement https://bugs.python.org/issue36817, cf. asottile-archive/future-fstrings#40.
Thanks for clarifying the "unconditional" part. There seem to be two different things being discussed here -- interning of literals and interning of runtime values. Let's look at literals first:
Currently (on all ports), only strings shorter than 10 chars are interned.

As was already explained in #4422, short literal strings should always be interned, because the alternative is that they take up a GC block (four words / 16 bytes). QSTRs are packed, which is why they're efficient. Off the top of my head I forget the overhead for storing a QSTR (it's at least the two-byte hash), so this is where the 10-byte threshold comes from and why it's less than 16 bytes. (The heap-allocated, non-interned len>10 literal strings do get packed when they end up in a .mpy file, but I'm not sure this is relevant to this discussion: when running a .py file they're heap allocated, and they also get unpacked into the heap when loading the .mpy.)

So there's no need for the user to be able to customise interning for literal strings. You might be able to make a case for doing even more aggressive interning in mpy-cross, where code size isn't relevant: it could go searching for duplicates and use that to influence whether a string should be interned, rather than just looking at its length.
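The space-saving idea behind the QSTR pool has a rough CPython analogue in `sys.intern`, which can illustrate what "interning" buys (this is only an analogy; MicroPython interns short literals automatically and uses its own packed pool, not this API):

```python
import sys

# Interning guarantees that equal strings share a single object,
# so repeated occurrences cost one storage slot instead of many.
a = sys.intern("a-string-longer-than-ten-chars")
b = sys.intern("a-string-longer-than-ten-chars")
assert a is b  # same object, not merely equal
```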
OK, this example is a bit far-fetched, but I understand the point and how many programs could end up in a situation like this. But it's all about trade-offs; more complexity is more code. There are probably some good heuristics that could be applied more generally though, and not just for the
I think this goes against the spirit of MicroPython. There are lots of other features that could have been implemented with a preprocessor. But being able to just copy some existing Python code and run it is a really important feature. "Oh, sometimes it works but sometimes you have to use this tool" gets confusing.
I wonder what the said user would say to you on a more general-purpose channel like IRC. /ignore

My questions about this PR are still:
note to self: |
I'll only address two aspects of these latest comments. I don't believe "pretend MicroPython 3.4 is perfect and can go blindly embrace 3.6 goals" is an argument in good faith here.
Finally, for the second aspect: I'd also argue "so MP gets as crappy as cpython" is beyond even arguing in bad faith; it's simply an attack on another project that is also written by lots of volunteers. It's fine to disagree with how CPython implements or does not implement various features; it's far less appropriate to attack them in an unrelated pull request to an entirely different distribution of Python.

I have no interest in engaging further with this discussion if it can't stay diplomatic. I'll patch up the PR based on the feedback from Damien and work with him and the rest of the development team towards a potential merge, or towards an alternative implementation, or towards outright closing the idea if they decide PEP-498 isn't desired, but I do not have the mental space to entertain bad-faith arguments and/or attacks.
@klardotsh Sorry if you did not notice, but you are not the one who got attacked here. That's not the first time, and I'm getting bored asking myself to what point. I came HERE and to YOU, despite the continuous noise around, to possibly extend your PR into something even more useful than just... f-strings. Because I for one know not only preprocessors can give that, but such tools have existed for more than a year and are now dead simple to use as a pip install. If growing the core size also fixes some other problems, then you get better acceptance from everyone. As a matter of fact, lots of people don't care about f-strings at all, or have feelings against them, like assignment expressions. I'm trying to lay out a base (for a parser plugin system?) to avoid further discussions on those hot subjects.

And by everyone I don't mean just "Damien [and the rest of the development team]", who are @dpgeorge and... well, at this point maybe show us a list of experts and core devs, because I don't know them all by their first names. (I saw a Paul @pfalcon once, but Scott??) In case you did not notice that too, there are (still some) CONTRIBUTORS around, and maybe USERS that may potentially become more. Discarding those would surely make someone's ways suspicious. But obviously some don't want me to stick my nose in anything, nor to help discussions open ways for real solutions fixing real problems, or even to find people to work with here.

"Lock in" may have a different meaning; for me it is in the context of being "rude to go rampage someone's code in core AFTER his PR", whereby code gets locked in. Also, I changed the wording (English is not the native tongue of quite a lot of people on this crowded earth, may I remind you). I don't know the exact English difference between all the qualifiers that come to mind for a VM that requires 16 MiB to start up asyncio, even with threads and multiprocessing removed.
But you can't deny where getting more speed/features by trashing both ROM and RAM has led CPython (which I run on both Android and Emscripten, maintaining patches for that since pre-3.7), and it is (was?) a tradition in MicroPython to try to prevent that. But maybe I don't have "the spirit of MicroPython ©".
Neither do I, so please stick to my questions there: #4998 (comment). And yes, I'm quite sure those are relevant to your PR; please just tell me otherwise and I'll go find somewhere where it is not.

edit: I've re-read your reply recently and something greatly bothers me, so I'll add this too:
FYI I'm on a lot of Python VM projects. I don't give you the right to represent the voice of volunteers on any project, nor what they may or may not think of the respective projects' goals, attitude, and governance. I already made that aggressive remark to @pfalcon in the forum. I intend to keep my FREE SPEECH rights. Please keep things technical.
(force-pushed from 191f478 to 4969832)
I can try to address the coveralls failure. Unsure how to fix the
You have a test, but maybe it doesn't test all new code paths?
It depends, on a case-by-case basis. For this case I'd say that f-string support should be an optional (at compile time) feature. It might be difficult (and not sensible) to fully feature-flag all the changes here, but it'd probably be a good idea to put the big chunks of new code behind something like `MICROPY_PY_FSTRING`.
@klardotsh if you don't have time to make the final changes I'm happy to take it from here myself.
Been using this for a while in production on the Windows port and no problems so far. Which is problematic because it's really hard to go back to other string formatting styles :)
Oh goodness, thank $DEITY for GitHub email notifications reminding me I never patched the coveralls failure. At work right now, but let me see if I can get a diff up tonight to finish this PR; I've got some time and this should be ~an hour or two to take it over the finish line. Glad it's working well for you @stinos!
Thanks for your excellent work @klardotsh! What's left to do before this can be merged? I've recently been trying to port a couple of regular Python libraries, and one of the first stumbling blocks is working around f-string use. It'd be great to see f-strings supported in MicroPython!
@mattytrentini, you can use the future-fstrings[rewrite] tool from https://pypi.org/project/future-fstrings prior to porting, to allow any Python 3 to use such libraries.
(force-pushed from 4969832 to 6e0f83b)
(force-pushed from 4ad8eb4 to c1d5a93)
As of now the branch is cleanly rebased onto master, and I went ahead and simplified the numerous error cases down to just two to save on token space, as well as gated the entire feature behind `MICROPY_PY_FSTRING`. Still chasing down the coverage/cmdtree things; replicating this locally is a bit fun.
Also: not sure how to track down these AppVeyor failures. I can't repro locally.
This implements (most of) the PEP-498 spec for f-strings, with two exceptions:

- raw f-strings (`fr` or `rf` prefixes) raise `NotImplementedError`
- one special corner case does not function as specified in the PEP (more on that in a moment)

This is implemented in the core as a syntax translation, brute-forcing all f-strings to run through `String.format`. For example, the statement `x='world'; print(f'hello {x}')` gets translated *at a syntax level* (injected into the lexer) to `x='world'; print('hello {}'.format(x))`. While this may lead to weird column results in tracebacks, it seemed like the fastest, most efficient, and *likely* most RAM-friendly option, despite being implemented under the hood with a completely separate `vstr_t`.

Since [string concatenation of adjacent literals is implemented in the lexer](micropython@534b7c3), two side effects emerge:

- All strings with at least one f-string portion are concatenated into a single literal which *must* be run through `String.format()` wholesale, and:
- Concatenation of a raw string containing interpolation characters with an f-string will cause `IndexError`/`KeyError`, which is both different from CPython *and* different from the corner case mentioned in the PEP (which gave the following example):

```python
x = 10
y = 'hi'
assert ('a' 'b' f'{x}' '{c}' f'str<{y:^4}>' 'd' 'e') == 'ab10{c}str< hi >de'
```

The above-linked commit detailed a pretty solid case for leaving string concatenation in the lexer rather than putting it in the parser, and undoing that decision would likely be disproportionately costly on resources for the sake of a probably-low-impact corner case. An alternative, to become compliant with this corner case of the PEP, would be to revert to string concatenation in the parser *only when an f-string is part of the concatenation*, though I've done no investigation on the difficulty or costs of doing this.

A decent set of tests is included.
I've manually tested this on the `unix` port on Linux and on a Feather M4 Express (`atmel-samd`) and things seem sane.
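The translation described above can be sanity-checked from plain CPython, since both sides of the rewrite are valid Python (names `x` and the literal values are just for illustration):

```python
# An f-string and its str.format translation produce the same result:
x = 'world'
assert f'hello {x}' == 'hello {}'.format(x)

# The lexer-level concatenation pitfall: adjacent literals are joined
# *before* .format() is appended, so a plain literal containing braces
# gets formatted too. The translated form of  "{}" f"{a}"  raises:
try:
    '{}{}'.format(1)
except IndexError:
    pass  # only one argument for two auto-numbered fields
else:
    raise AssertionError('expected IndexError')
```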
(force-pushed from c1d5a93 to 600051d)
```c
@@ -37,6 +37,7 @@
#define MICROPY_PY_ARRAY (0)
#define MICROPY_PY_ATTRTUPLE (0)
#define MICROPY_PY_COLLECTIONS (0)
#define MICROPY_PY_FSTRING (0)
```
Since the default is disabled, this line doesn't need to be here.
```c
@@ -291,15 +334,69 @@ STATIC void parse_string_literal(mp_lexer_t *lex, bool is_raw) {
}

size_t n_closing = 0;
# if MICROPY_PY_FSTRING
```
There shouldn't be a space between the `#` and the `if` (same for the `endif` below, and others below).
```c
@@ -172,6 +177,9 @@ typedef struct _mp_lexer_t {
    size_t tok_column;      // token source column
    mp_token_kind_t tok_kind; // token kind
    vstr_t vstr;            // token data
    vstr_t vstr_postfix;    // postfix to apply to string
```
Probably should guard these additions in `#if MICROPY_PY_FSTRING` because otherwise they'll unnecessarily use up RAM. Then all code that uses these new members also needs guarding (hopefully that's not too messy...).
@klardotsh thanks for coming back to this! You'll see that the CI is now a bit stricter, eg there is code formatting to follow.

To get the new f-string test handled correctly by ports with f-strings disabled (eg the qemu-arm test on the CI), it'll need a new feature-check script, eg a script containing f"fstring". Then add some code to the test runner to skip the f-string tests when that check fails.
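The feature-check idea can be sketched in a few lines. The filename and exact wiring into the test runner are assumptions here; the mechanism is simply "try to compile an f-string":

```python
# Hypothetical feature_check/fstring.py: on a port built without
# f-string support, lexing this file raises SyntaxError, so the test
# runner never sees the marker below and can skip the f-string tests.
print(f"fstring")
```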
This implements (most of) the PEP-498 spec for f-strings and is based on micropython#4998 by @klardotsh. It is implemented in the lexer as a syntax translation to `str.format`:

    f"{a}" --> "{}".format(a)

It also supports:

    f"{a=}" --> "a={}".format(a)

This is done by extracting the arguments into a temporary vstr buffer; then, after the string has been tokenized, the lexer input queue is saved and the contents of the temporary vstr buffer are injected into the lexer instead.

There are four main limitations:

- raw f-strings (`fr` or `rf` prefixes) are not supported and will raise `SyntaxError: raw f-strings are not supported`.
- literal concatenation of f-strings with adjacent strings will fail:

      "{}" f"{a}"   --> "{}{}".format(a)  (str.format will incorrectly use the braces from the non-f-string)
      f"{a}" f"{a}" --> "{}".format(a) "{}".format(a)  (cannot concatenate)

- PEP-498 requires the full parser to understand the interpolated argument; however, because this entirely runs in the lexer, it cannot resolve nested braces in expressions like f"{'}'}".
- The !r, !s, and !a conversions are not supported.

Includes tests and cpydiffs.
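The `f"{a=}"` translation mentioned above can be sanity-checked in CPython 3.8+ (where the `=` debugging specifier was added); the name `a` is just for illustration:

```python
a = 42
# The '=' specifier prepends the expression text plus an equals sign:
assert f'{a=}' == 'a=42'
# ...which matches what the described lexer translation produces:
assert 'a={}'.format(a) == 'a=42'
```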
This was merged with modifications in 692d36d.
This implements (most of) the PEP-498 spec for f-strings, with two exceptions:

- raw f-strings (`fr` or `rf` prefixes) raise `NotImplementedError`
- the string-concatenation corner case discussed below (more on that in a moment)

This is implemented in the core as a syntax translation, brute-forcing all f-strings to run through `String.format`. For example, the statement `x='world'; print(f'hello {x}')` gets translated at a syntax level (injected into the lexer) to `x='world'; print('hello {}'.format(x))`.

While this may lead to weird column results in tracebacks, it seemed like the fastest, most efficient, and likely most RAM-friendly option, despite being implemented under the hood with a completely separate `vstr_t`.
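As an illustration of that syntax-level rewrite, here is a toy Python sketch of the transformation; it is not the C lexer code from the PR, and it deliberately ignores nested braces, conversions, and format specs:

```python
import re

def translate_fstring(body):
    # Pull each {expr} out of the f-string body and emit the
    # equivalent .format() call, mirroring the lexer rewrite:
    #   f'hello {x}' --> 'hello {}'.format(x)
    args = []
    def grab(match):
        args.append(match.group(1))
        return "{}"
    template = re.sub(r"\{([^{}]+)\}", grab, body)
    return "'{}'.format({})".format(template, ", ".join(args))

print(translate_fstring("hello {x}"))  # 'hello {}'.format(x)
```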
Since string concatenation of adjacent literals is implemented in the lexer, two side effects emerge:

- any literals concatenated with an f-string become part of a single literal which must be run through `String.format()` wholesale, and:
- raw braces in a non-f-string concatenated with an f-string will cause `IndexError`/`KeyError`, which is both different from CPython and different from the corner case mentioned in the PEP (which gave an example of the following:)
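That second side effect can be reproduced directly with `str.format`, since that is what the concatenated literal is ultimately run through (assuming the `"{}" f"{a}" --> "{}{}".format(a)` translation this PR performs):

```python
# Under the lexer translation, '"{}" f"{a}"' collapses into
# '"{}{}".format(a)': the braces from the plain string become an extra
# positional placeholder with no matching argument.
a = 1
try:
    "{}{}".format(a)
except IndexError as exc:
    print("IndexError:", exc)
```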
The above-linked commit detailed a pretty solid case for leaving string concatenation in the lexer rather than putting it in the parser, and undoing that decision would likely be disproportionately costly on resources for the sake of a probably-low-impact corner case. An alternative to become compliant with this corner case of the PEP would be to revert to string concatenation in the parser only when an f-string is part of the concatenation, though I've done no investigation on the difficulty or costs of doing this.
A decent set of tests is included. I've manually tested this on the `unix` port on Linux and things seem sane.