8000 py: Implement partial PEP-498 (f-string) support by klardotsh · Pull Request #4998 · micropython/micropython · GitHub
[go: up one dir, main page]

Skip to content

py: Implement partial PEP-498 (f-string) support #4998

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

klardotsh
Copy link
@klardotsh klardotsh commented Aug 11, 2019

This is a near-identical PR to adafruit#2054, which is where I originally implemented this functionality. Since none of it (except the localization stuff) was CircuitPython-specific, I'm opening this up so upstream can benefit as well. The actual diff between the two PRs is entirely translation-related.

This refs #2415

This implements (most of) the PEP-498 spec for f-strings, with two
exceptions:

  • raw f-strings (fr or rf prefixes) raise NotImplementedError
  • one special corner case does not function as specified in the PEP
    (more on that in a moment)

This is implemented in the core as a syntax translation, brute-forcing
all f-strings to run through String.format. For example, the statement
x='world'; print(f'hello {x}') gets translated at a syntax level
(injected into the lexer) to x='world'; print('hello {}'.format(x)).
While this may lead to weird column results in tracebacks, it seemed
like the fastest, most efficient, and likely most RAM-friendly option,
despite being implemented under the hood with a completely separate
vstr_t.

Since string concatenation of adjacent literals is implemented in the
lexer
,
two side effects emerge:

  • All strings with at least one f-string portion are concatenated into a
    single literal which must be run through String.format() wholesale,
    and:
  • Concatenation of a raw string with interpolation characters with an
    f-string will cause IndexError/KeyError, which is both different
    from CPython and different from the corner case mentioned in the PEP
    (which gave an example of the following:)
x = 10
y = 'hi'
assert ('a' 'b' f'{x}' '{c}' f'str<{y:^4}>' 'd' 'e') == 'ab10{c}str< hi >de'

The above-linked commit detailed a pretty solid case for leaving string
concatenation in the lexer rather than putting it in the parser, and
undoing that decision would likely be disproportionately costly on
resources for the sake of a probably-low-impact corner case. An
alternative to become complaint with this corner case of the PEP would
be to revert to string concatenation in the parser only when an
f-string is part of concatenation
, though I've done no investigation on
the difficulty or costs of doing this.

A decent set of tests is included. I've manually tested this on the
unix port on Linux and
things seem sane.

@klardotsh
Copy link
Author

I'm not entirely sure why SyntaxErrors are being thrown here. The unix port in particular is what this was built on, and at that, it looks like some test runs are passing, others didn't seem to build the feature at all.

I'm getting one similar weird case on the CircuitPython PR, so at least this isn't a totally fringe issue. Happy to patch up this PR as needed to get tests passing, but not entirely sure where to start (since I haven't hacked much/at all on uPy core before)

@dpgeorge
Copy link
Member

Wow, thanks for this, it's great to see such progress on implementing this feature!

The issue with f-strings is that they introduce more coupling between the parser and lexer, but the way it's implemented here (at the token level) is very neat indeed.

This will need a full review, but for now to fix the CI errors:

  • generate locally the tests/basics/string_pep498_fstring.py.exp file by running the test through CPython 3.6+, and commit that .exp file in this PR
  • fix tests/cmdline/cmd_parsetree.py.exp to match what is actually output; tok(4)->tok(10) and tok(14)->tok(20)

@klardotsh
Copy link
Author
klardotsh commented Aug 13, 2019

I was able to push a quick fix for that first .exp file thing, thanks for the heads up on how that works! I'll be able to fix up the parsetree one some time this evening probably.

Looks like bare-arm is failing as well because the code size increased ~800 bytes. Not sure how to tell it "that's fine because the language spec increased", or if I should (as ended up discussed in the CircuitPython PR, where CN-localized SAMD21 builds are now oversized) feature flag this - makes an inconsistent UX if different targets do/don't have f-strings, but given the implementation that may be necessary.

py/lexer.c Outdated

lex->chr0 = lex->vstr_postfix.len > 0 ? lex->vstr_postfix.buf[0] : 0;
lex->chr1 = lex->vstr_postfix.len > 1 ? lex->vstr_postfix.buf[1] : 0;
lex->chr2 = lex->vstr_postfix.len > 2 ? lex->vstr_postfix.buf[2] : 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably not necessary to check len in any of these expressions, and instead just copy in the data. Because at this point vstr_posfix has at least 3 valid chars (if I understand the code right, at least ( a char and )). That will save code size.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was my own form of defensiveness (the number of times I've segfaulted C by not guarding things like this is... embarrassing at best). In this PR, vstr_postfix will always have at least .format(x) or more. If this postfix is ever used for something else (I'm not sure what else you'd want to syntax transform at a lexer level, but who knows), this defensiveness may be useful, otherwise, happy to scrap it for simplicity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do remove the unnecessary bits. Code should be as tight as possible (to save code space and be efficient). We have very good tests, coverage and CI so any changes that violate assumptions should be picked up.

py/lexer.c Outdated

h0 = lex->chr0;
h1 = lex->chr1;
h2 = lex->chr2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for temporary h variables, can just write directly to chr3-5.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep - that's now true. A former implementation here used those holding vars (for what, I'm not sure at this point). Good catch.

py/lexer.c Outdated
lex->chr4 = h1;
lex->chr5 = h2;

lex->vstr_postfix_idx = lex->vstr_postfix.len > 2 ? 3 : lex->vstr_postfix.len;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there cases where len<3?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above comment about defensiveness.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please simplify / tighten up the code.

py/lexer.c Outdated
lex->chr2 = lex->chr5;
lex->chr3 = 0;
lex->chr4 = 0;
lex->chr5 = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to reset chr3-5 to 0? Could just leave it out (to save code size).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not technically necessary, just something I did to ensure we weren't keeping now-garbage data around. Probably sane to throw this out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove it if it's not strictly needed.

lex->chr5 = 0;

vstr_reset(&lex->vstr_postfix);
lex->vstr_postfix_idx = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does idx need to be reset? Do the vstr need to be reset?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the current implementation: I think since I blindly write .format( and the postfix translations to vstr_postfix at least the string should be reset/cleared. vstr_postfix_idx needs reset somewhere, though there may well be better places to do it.

If you're implying it'd be lighter on code size/CPU cycles/memory usage to just maintain a constantly growing vstr that starts to look roughly like .format(x).format(y).format(z), with an ever-increasing vstr_postfix_idx, I'm totally open to trying that. I'm not sure how expensive the vstr_resets are

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reset is cheap, I just thought it might be better to reset it before use (to be guaranteed it's reset), rather than after use (when you may not use it again).

py/parse.c Outdated
break;
case MP_TOKEN_FSTRING_RAW:
exc = mp_obj_new_exception_msg(&mp_type_NotImplementedError,
"raw f-strings are not implemented");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These error messages are the things that will really increase code size. I don't know how useful it is to have them so detailed, probably all the SyntaxError ones can be collapsed into a single "malformed f-string" message.

There is actually support for terse/normal/detailed errors, so at the least the above code should use that feature, and in terse/normal mode just have a single message, and for detailed mode can use all the different messages.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this mostly for compatibility with CPython's error messaging (and admittedly, as a side effect of firing blind at the lexer at first to try to understand how it worked) and I'm in no way attached to it. Happy to cull some strings here (it'll also lighten up the CircuitPython PR which was oversized for one of the translation targets...).

I'll take a look at this error leveling system - sounds neat.

except NotImplementedError:
pass
else:
if sys.implementation.name in {'micropython', 'circuitpython'}:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best not to rely on this kind of auto-detection. A better place for these not-implemented tests would be tests/misc/non_compliant_lexer.py.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was my hack to ensure I wasn't erroneously throwing RuntimeError when generating the .exp file (since CPython does implement raw f-strings - admittedly I probably should at some point, but I wanted to scope this first PR to the probably most useful bits of the PEP).

Given that, do you still recommend moving these tests?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, probably best to just remove it completely then. If it's not testing something it doesn't need to be in the test. Also not clean to have names of implementations in tests.

Copy link
Author
@klardotsh klardotsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly clarifying responses - I'll submit some code updates this evening / tomorrowish for some/many of these.

py/lexer.c Outdated

h0 = lex->chr0;
h1 = lex->chr1;
h2 = lex->chr2;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep - that's now true. A former implementation here used those holding vars (for what, I'm not sure at this point). Good catch.

py/lexer.c Outdated

lex->chr0 = lex->vstr_postfix.len > 0 ? lex->vstr_postfix.buf[0] : 0;
lex->chr1 = lex->vstr_postfix.len > 1 ? lex->vstr_postfix.buf[1] : 0;
lex->chr2 = lex->vstr_postfix.len > 2 ? lex->vstr_postfix.buf[2] : 0;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was my own form of defensiveness (the number of times I've segfaulted C by not guarding things like this is... embarrassing at best). In this PR, vstr_postfix will always have at least .format(x) or more. If this postfix is ever used for something else (I'm not sure what else you'd want to syntax transform at a lexer level, but who knows), this defensiveness may be useful, otherwise, happy to scrap it for simplicity.

py/lexer.c Outdated
lex->chr4 = h1;
lex->chr5 = h2;

lex->vstr_postfix_idx = lex->vstr_postfix.len > 2 ? 3 : lex->vstr_postfix.len;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above comment about defensiveness.

py/lexer.c Outdated
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not technically necessary, just something I did to ensure we weren't keeping now-garbage data around. Probably sane to throw this out.

lex->chr2 = lex->chr5;
lex->chr3 = 0;
lex->chr4 = 0;
lex->chr5 = 0;
lex->chr5 = 0;

vstr_reset(&lex->vstr_postfix);
lex->vstr_postfix_idx = 0;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the current implementation: I think since I blindly write .format( and the postfix translations to vstr_postfix at least the string should be reset/cleared. vstr_postfix_idx needs reset somewhere, though there may well be better places to do it.

If you're implying it'd be lighter on code size/CPU cycles/memory usage to just maintain a constantly growing vstr that starts to look roughly like .format(x).format(y).format(z), with an ever-increasing vstr_postfix_idx, I'm totally open to trying that. I'm not sure how expensive the vstr_resets are

py/parse.c Outdated
break;
case MP_TOKEN_FSTRING_RAW:
exc = mp_obj_new_exception_msg(&mp_type_NotImplementedError,
"raw f-strings are not implemented");
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this mostly for compatibility with CPython's error messaging (and admittedly, as a side effect of firing blind at the lexer at first to try to understand how it worked) and I'm in no way attached to it. Happy to cull some strings here (it'll also lighten up the CircuitPython PR which was oversized for one of the translation targets...).

I'll take a look at this error leveling system - sounds neat.

except NotImplementedError:
pass
else:
if sys.implementation.name in {'micropython', 'circuitpython'}:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was my hack to ensure I wasn't erroneously throwing RuntimeError when generating the .exp file (since CPython does implement raw f-strings - admittedly I probably should at some point, but I wanted to scope this first PR to the probably most useful bits of the PEP).

Given that, do you still recommend moving these tests?

@klardotsh
Copy link
Author

@dpgeorge Any chance you could take a look at my questions/clarifications so I can get a good vision of how/where I should clean things up ~this weekend? Hate to be a bother but wanting to make sure this gets completed when my time is relatively freer than it becomes in a month or two. Cheers!

Copy link
Member
@dpgeorge dpgeorge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I responded to comments. Let me know if there's anything I missed.

py/lexer.c Outdated

lex->chr0 = lex->vstr_postfix.len > 0 ? lex->vstr_postfix.buf[0] : 0;
lex->chr1 = lex->vstr_postfix.len > 1 ? lex->vstr_postfix.buf[1] : 0;
lex->chr2 = lex->vstr_postfix.len > 2 ? lex->vstr_postfix.buf[2] : 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do remove the unnecessary bits. Code should be as tight as possible (to save code space and be efficient). We have very good tests, coverage and CI so any changes that violate assumptions should be picked up.

py/lexer.c Outdated
lex->chr4 = h1;
lex->chr5 = h2;

lex->vstr_postfix_idx = lex->vstr_postfix.len > 2 ? 3 : lex->vstr_postfix.len;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please simplify / tighten up the code.

py/lexer.c Outdated
lex->chr2 = lex->chr5;
lex->chr3 = 0;
lex->chr4 = 0;
lex->chr5 = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove it if it's not strictly needed.

lex->chr5 = 0;

vstr_reset(&lex->vstr_postfix);
lex->vstr_postfix_idx = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reset is cheap, I just thought it might be better to reset it before use (to be guaranteed it's reset), rather than after use (when you may not use it again).

except NotImplementedError:
pass
else:
if sys.implementation.name in {'micropython', 'circuitpython'}:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, probably best to just remove it completely then. If it's not testing something it doesn't need to be in the test. Also not clean to have names of implementations in tests.

@jimmo jimmo mentioned this pull request Sep 20, 2019
53 tasks
@pmp-p
Copy link
Contributor
pmp-p commented Sep 22, 2019

The Literal String Interpolation system could (and should ?) introduce more meaning to the prefixes than cpython does to allow better routing than unconditionnal interning which is very wrong for general purpose:

  • f"" an interned always string type , can be used with operators. => runtime

  • (r?)"" an huffman compressed string type only for display/format using unicode PUA for dict entries ( that can help solve complex script display in gui mode ). => import-time.
    potential use case Unicode Bidirectional Algorithm lvgl/lvgl#899 (comment) or gettext like system

  • u"" explicit non interned.

  • globals() import time should be interned always and filtered accuratly for QSTR merge.

  • locals() runtime should be non interned it is up to user to use f"" ( think as F / flashrom string from arduino ) to prevent collection.

"standard" cpython use of f-strings is useless since @asottile tool ( which is maintened and covers ALL CASES up to 3.7) is all what you need as a filter in your upload tool or mpy-cross
with no rom/ram or cpu cost on any port !

abusing f"" r"" u"" F"" R"" U"" for improving MicroPython efficiency may have no side effect on cpython script broad compatibility ( unlike const() for example ).

@klardotsh what do you think ?

#### technical note for @jimmo 
micropython -X heapsize=32K
#### why unconditionnal string interning is actually bad ?
for i in range(65536):chr(i)
#### cf https://github.com/micropython/micropython/issues/4422 
#### and this is NOT a corner case it's the pinpointing of a incomplete design, the proposed fix is long term useless

@klardotsh
Copy link
Author

I'm not entirely sure I follow all of that comment (and some of that is almost certainly my fault on the understanding end) - what I can respond to is: I wouldn't consider "standard cpython use" useless - f-strings are growing more and more common in Python code overall, and they're also being taught to (some) new CPython developers as "the modern way" to format strings. I'd argue that supporting them within MicroPython itself (whether through my own implementation - which I still need to finish patching up: life got very hectic over the past month..., or through a different/possibly smarter implementation) is both the philosophically right way to go, and also the easier way for end developers (forcing upload/mpy-cross filtering through, I'd imagine, a Makefile, is not an "obvious" solution to a developer who hears MicroPython can support F-strings and just wants to write a few in main.py or the REPL or whatever)

I'm struggling a bit to understand your discussion of globals and locals - the PEP explicitly states that those variables are not to be used directly in an implementation, so I think I'm misunderstanding something in your comment?

@pmp-p
Copy link
Contributor
pmp-p commented Sep 22, 2019

globals and locals

that's not about variable expansion but string storage itselff ( storage transmitted to format usually inconditionnaly coming from QSTR pool )

"standard use" is of course usefull in a repl. but if its implementation just lead to bigger core with no benefits then why ???

My question is do you think your PR open ( or close i don't know) ways to the points listed ?

if you open ways i will read your code with great hopes :)

otherwise you would just add more code to maintain on an already faulty design ( i did not want to use that qualifier at first but i'll speak my mind )

and yes it's not directly related to f-string support but you look like the expert there since your work is closely releated to string qualifier ( i kinda get tired of waiting getting answer from other sources ).

@klardotsh
Copy link
Author

I think "supporting a PEP that is part of the language definition" qualifies as a benefit on its own accord. I am not in any official position to comment on whether or not PEP-498 should or should not have been added to the language spec, which I think is the broader philosophical debate (which I'd rather not entertain).

In any event, I am not familiar enough (let alone do I remember the full context I had when I worked on this PR originally - it's been a few weeks) with the string storage mechanisms in MP core to comment on the benefits of moving things around and/or storing them differently. My gut tells me that splitting f-strings out individually from standard strings (which I think is part of what your suggestions include?) re-introduces the parser overhead for strings (which, as mentioned in this PR description, was removed, with strings moved to the lexer, for performance and code size reasons). I'm going to defer the remainder of those discussions / decisions to the core crew since I don't, at this time, have the knowledge to make such a decision myself :)

Regarding the status of this PR - it's still open, and I'm open to ideas to make it better, while sticking to the spirit of PEP-498 as written by the PEP authors. I'm not certain when I'll get the feedback items from Damien handled (should be soon), but once those are in, I'm open to those extended discussions.

@pmp-p
Copy link
Contributor
pmp-p commented Sep 22, 2019

re-introduces the parser overhead for strings

Compile overhead is always a trade off with runtime speed/ram requirements.
but MicroPython should favour balance toward ram as speed can be reached with libraries.

I am not familiar enough (...) with the string storage mechanisms in MP core

neither am i so many thanks for the overview.

i'm a big fan of f-strings anyway i modified https://github.com/asottile/future-fstrings just for micropython use https://github.com/pmp-p/fstrings_helper a long before future-fstrings[rewrite] made it easy to use for that purpose

But i'm more concerned with overrall string support than philosophical debate too : so just maybe fix and improve things before adding a layer upon ( "things" is currently only targeting 3.4 and is not yet there )

if outside the version targeting context then don't forget to implement https://bugs.python.org/issue36817 cf asottile-archive/future-fstrings#40

@jimmo
Copy link
Member
jimmo commented Sep 22, 2019

technical note for @jimmo
why unconditionnal string interning is actually bad ?

Thanks for clarifying the "unconditional" part.

There seem to be two different things being discussed here -- interning of literals and interning of runtime values. Lets look at literals first:

The Literal String Interpolation system could (and should ?) introduce more meaning to the prefixes than cpython ...

Currently (on all ports), only strings shorter than 10 chars are interned. Look at MICROPY_ALLOC_PARSE_INTERN_STRING_LEN. You can verify this yourself using micropython.qstr_info(1)

As was already explained in #4422 short literal strings should always be interned, because the alternative is that they take up a GC block (four words / 16 bytes). QSTRs are packed, which is why they're efficient. Off the top of my head I forget the overhead for storing a QSTR (it's at least the two byte hash), so this is where the 10 byte threshold comes from and why it's less than 16 bytes.

(The heap-allocated non-interned len>10 literal strings do get packed when they end up in a .mpy file, but I'm not sure this is relevant to this discussion -- when running a .py file, they're heap allocated, and also they get unpacked into the heap when loading the .mpy too).

So there's no need for the user to be able to customise interning for literal strings.

You might be able to make a case for doing even more aggressive interning in mpy-cross where code size isn't relevant -- it could go searching for duplicates and use that to influence whether a string could be interned, rather than just looking at its length.

for i in range(65536):chr(i)

OK this example is a bit far-fetched, but I understand the point and how many programs could end up in a situation like this. But it's all about tradeoffs, more complexity is more code.

There are probably some good heuristics that could be applied general though, and not just for the chr() case. The referenced PR references a patch that helps a tiny bit specifically for the chr() case (maybe find out what the code size difference is and send a PR?).

"standard" cpython use of f-strings is useless since @asottile tool ( which is maintened and covers ALL CASES up to 3.7) is all what you need as a filter in your upload tool or mpy-cross
with no rom/ram or cpu cost on any port !

I think this goes against the spirit of MicroPython. There are lots of other features that could have been implemented with a preprocessor. But being able to just copy some existing Python code and run it is a really important feature. "Oh sometimes it works but sometimes you have to use this tool" gets confusing.

@pmp-p
Copy link
Contributor
pmp-p commented Sep 22, 2019

So there's no need for the user to be able to customise interning for literal strings.

i wonder what the said user would say to you on a more general purpose channel like IRC .

/ignore

My questions about these PR are still :

  • does that PR bury more string handling code as in "a bit more so MP gets as crappy hungry as cpython ( get more speed by trashing both rom and ram) in a uncorrectable way" ?

  • does that PR open ways and attention to fix string handling code so user gets finally control ( and that's what Python is all about ) ?

  • are the previous points stated clear or irrelevant enough to pretend MicroPython 3.4 is perfect and can go embrace blindly 3.6 goals by adding that PR because another fork had it first and want the lock in ?

  • Could using Literal String Interpolation system syntax/parser/lexer/whatever allow fined grained control over string handling while still preserving script compatibility ( eg unlike const() ) using f"" F"" and by extension r"" R"" u"" U"" regardless of adoption of the specific 3.6 PEP-498 feature ?

note to self:
pre-compiler needs vs "preprocessor are a nice option" case
#2893 (comment)

@klardotsh
Copy link
Author

I'll only address two aspects of these latest comments - I don't believe "pretend MicroPython 3.4 is perfect and can go embrace blindly 3.6 goals" is an argument in good faith here.

  • MicroPython doesn't claim to be perfect from any conversation I've had with... anyone ever? We all acknowledge there are limitations and a few things missing here and there. That shouldn't stop progress on otherwise mostly independent features (as PEP-498 support is, and frankly as many of the PEPs that made up Python 3.5+ have been - they are almost all incremental feature additions, and most can be implemented fairly independently).

  • This PR was to scratch my own itch of... wanting to use F-strings, and offering my solution upstream. Nobody in this project asked me to write this; by all means this was an unprompted PR in hopes it may be useful to someone / the project overall.

  • That I submitted this patch additionally as Implement partial PEP-498 (f-string) support adafruit/circuitpython#2054 is a bit beside the point. If you peek at that PR you'll see Scott and I working to ensure the implementation is largely the same between Micro and CircuitPython, because the PRs submitted are identical (except localization, which is CircuitPython-specific). Quite literally a cherry-pick+rebase, and I've been keeping the trees in sync after any changes to either repo as much as is possible. If the same author contributed both PRs and left them open until the upstream PR is ready to go, lock-in is very clearly not the goal.

Finally, for the second aspect - I'd also argue "so MP gets as crappy as cpython" is beyond even arguing in bad faith, that's simply an attack on another project that is also written by lots of volunteers. It's fine to disagree with how CPython implements or does not implement various features, it's far less appropriate to attack them in an unrelated pull request to an entirely different distribution of Python.

I have no interest in engaging further with this discussion if the discussion can't stay diplomatic. I'll patch up the PR based on the feedback from Damien and work with him and the rest of the development team towards a potential merge, or towards an alternative implementation, or towards outright closing the idea if they decide PEP-498 isn't desired, but I do not have the mental space to entertain bad faith arguments and/or attacks.

@pmp-p
Copy link
Contributor
pmp-p commented Sep 23, 2019

@klardotsh Sorry if you did not notice but you are not the one who got attacked here. That's not the first time and i'm getting bored asking myself to what point.

i came HERE and to YOU despite the continuous noise around to possibly extend your PR to get something even more usefull than just ... f-strings. Because i for one know not only preprocessors can give that but those exists already for more than a year and now are dead simple to use as pip install.

if core size growing also fixes some other problems => then you get better acceptance from everyone.

As a matter of fact lots of people don't care about f-string at all or have feelings against them like assignment expressions. i'm trying to layout a base ( for a parser plugin system ?) to avoid further more discussions on those hot subjects.

And by everyone i don't mean just "Damien [ and the rest of the development team]" who are @dpgeorge and .... well at this point maybe show us to a list of experts and core dev because i don't know them all by their first names. (i saw a Paul @pfalcon once, but Scott ?? )

in case you did not notice that too, there are (still some) CONTRIBUTORS around and maybe USERS that may potentially become more

Discarding of those would surely make someone's ways suspicious.

But obviously some don't want me to stick my nose in anything neither help discussions to open ways for real solutions fixing real problems or even trying to find people to work with here.

"lock in" may have a different meaning, for me it is in the context of being "rude to go rampage someone's code in core AFTER his PR " => code gets locked in.

Also i changed the wording ( english is not native tongue of quite a lot of people on this crowded earth may i remind). I don't know the exact english difference beetween all the qualifiers that come in mind for a VM that requires 16 MiB to startup asyncio even with threads and multiprocessing removed.

But you can't deny where getting more speed/features by trashing both rom and ram has lead cpython ( which i run on both android and emscripten while maintaining patches for that since pre 3.7) and this is ( was?) a tradition on MicroPython to try to prevent that but maybe i don't have "the spirit of MicroPython ©".

I do not have the mental space to entertain bad faith arguments and/or attacks

Neither do i so please stick around my questions, there #4998 (comment). and yes i'm quite sure those are relative to your PR
i'll just add:
Is doing something with that code different than CircuitPython is at some point a problem ?.

Please just tell i'll go find somewhere where it is not.

edit/ i've re-read your reply recently and something greatly bothers me so i'll add that too:

"that's simply an attack on another project that is also written by lots of volunteers. "

FYI i'm on a lot of Python VM projects. I don't give you the right to represent the voice of volunters on any project and what they may think or not from the respective projects goals, attitude and governance. i already made that aggressive remark to @pfalcon in forum.

I intend to keep my FREE SPEECH rights.

Please keep things technical.

@klardotsh
Copy link
Author

@dpgeorge @jimmo This has (finally!) been updated with what I'm pretty sure was all the feedback. Whenever y'all get time to give it another look-over, let me know if there's anything else you'd like me to address (or just merge it at your leisure)!

@klardotsh
Copy link
Author

I can try to address the coveralls failure. Unsure how to fix the bare-arm code size increased! one - either I can feature flag this, or.......... do you just YOLO merge core changes like this when they implicitly have to increase code size?

@dpgeorge
Copy link
Member

I can try to address the coveralls failure.

You have a test, but maybe it doesn't test all new code paths?

Unsure how to fix the bare-arm code size increased! one - either I can feature flag this, or.......... do you just YOLO merge core changes like this when they implicitly have to increase code size?

It depends, on a case-by-case basis. For this case I'd say that f-string support should be an optional (at compile-item) feature. It might be difficult (and not sensible) to to fully feature flag all the changes here, but it'd probably be a good idea to put the big chunks of new code behind something like #if MICROPY_PY_FSTRING

@dpgeorge
Copy link
Member

@klardotsh if you don't have time to make the final changes I'm happy to take it from here myself.

@stinos
Copy link
Contributor
stinos commented Jan 16, 2020

Been using this for a while in production on Windows port and no problems so far. Which is problematic beacuse it's really hard to go back to other string formatting types :)

@klardotsh
Copy link
Author

Oh goodness, thank $DEITY for Github email notifications reminding me I never patched the coveralls failure. At work right now but let me see if I can get a diff up tonight to finish this PR - I've got some time and this should be ~an hour or two to take over the finish line.

Glad it's working well for you @stinos!

@mattytrentini
Copy link
Contributor

Thanks for your excellent work @klardotsh! What's left to do before this can be merged? I've recently been trying to port a couple of regular Python libraries and one of the first stumbling blocks is working around f-strings use; It'd be great to see f-strings supported in MicroPython!

@pmp-p
Copy link
Contributor
pmp-p commented May 15, 2020

@mattytrentini , you can use future-fstrings[rewrite] tool from https://pypi.org/project/future-fstrings prior porting to allow any Python 3 to use libraries.

@klardotsh klardotsh force-pushed the topic-pep-498-fstrings-upstream branch from 4969832 to 6e0f83b Compare May 15, 2020 23:58
@klardotsh klardotsh force-pushed the topic-pep-498-fstrings-upstream branch from 4ad8eb4 to c1d5a93 Compare May 16, 2020 00:18
@klardotsh
Copy link
Author

As of now the branch is cleanly rebased onto master, and I went ahead and simplified down the numerous error cases into just two to save on token space, as well as gated the entire feature behind MICROPY_PY_FSTRING which is false by default, but overridden to true for the unix port (I'll leave the decisions of what other ports should have this on by default up to people who are actually qualified to make them 😄 )

Still chasing down the coverage/cmdtree things - replicating this locally is a bit fun.

@klardotsh
Copy link
Author

Also - not sure how to track down these AppVeyor failures. I can't repro locally - ../mpy-cross/mpy-cross -mcache-lookup-bc -o mpytest.mpy -X emit=bytecode $(pwd)/basics/string_pep498_fstring.py (which appears to be the incantation getting generated at least for Release-x64) returns a 0 status for me (on Linux, mind you)

This implements (most of) the PEP-498 spec for f-strings, with two
exceptions:

- raw f-strings (`fr` or `rf` prefixes) raise `NotImplementedError`
- one special corner case does not function as specified in the PEP
(more on that in a moment)

This is implemented in the core as a syntax translation, brute-forcing
all f-strings to run through `String.format`. For example, the statement
`x='world'; print(f'hello {x}')` gets translated *at a syntax level*
(injected into the lexer) to `x='world'; print('hello {}'.format(x))`.
While this may lead to weird column results in tracebacks, it seemed
like the fastest, most efficient, and *likely* most RAM-friendly option,
despite being implemented under the hood with a completely separate
`vstr_t`.

Since [string concatenation of adjacent literals is implemented in the
lexer](micropython@534b7c3),
two side effects emerge:

- All strings with at least one f-string portion are concatenated into a
single literal which *must* be run through `String.format()` wholesale,
and:
- Concatenation of a raw string with interpolation characters with an
f-string will cause `IndexError`/`KeyError`, which is both different
from CPython *and* different from the corner case mentioned in the PEP
(which gave an example of the following:)

```python
x = 10
y = 'hi'
assert ('a' 'b' f'{x}' '{c}' f'str<{y:^4}>' 'd' 'e') == 'ab10{c}str< hi >de'
```

The above-linked commit detailed a pretty solid case for leaving string
concatenation in the lexer rather than putting it in the parser, and
undoing that decision would likely be disproportionately costly on
resources for the sake of a probably-low-impact corner case. An
alternative to become complaint with this corner case of the PEP would
be to revert to string concatenation in the parser *only when an
f-string is part of concatenation*, though I've done no investigation on
the difficulty or costs of doing this.

A decent set of tests is included. I've manually tested this on the
`unix` port on Linux and on a Feather M4 Express (`atmel-samd`) and
things seem sane.
@klardotsh klardotsh force-pushed the topic-pep-498-fstrings-upstream branch from c1d5a93 to 600051d Compare May 16, 2020 00:35
@@ -37,6 +37,7 @@
#define MICROPY_PY_ARRAY (0)
#define MICROPY_PY_ATTRTUPLE (0)
#define MICROPY_PY_COLLECTIONS (0)
#define MICROPY_PY_FSTRING (0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the default is disabled this line doesn't need to be here

@@ -291,15 +334,69 @@ STATIC void parse_string_literal(mp_lexer_t *lex, bool is_raw) {
}

size_t n_closing = 0;
# if MICROPY_PY_FSTRING
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There shouldn't be a space between the # and the if (same for endif below, and others below)

@@ -172,6 +177,9 @@ typedef struct _mp_lexer_t {
size_t tok_column; // token source column
mp_token_kind_t tok_kind; // token kind
vstr_t vstr; // token data
vstr_t vstr_postfix; // postfix to apply to string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should guard these additions in #if MICROPY_PY_FSTRING because otherwise they'll unnecessarily use up RAM. Then also need to guard all code that uses these new members (hopefully that's not too messy...)

@dpgeorge
Copy link
Member

@klardotsh thanks for coming back to this! You'll see that the CI is now a bit stricter, eg there is code formatting to follow. To get the new f-string test to be handled correctly by ports with f-strings disabled (eg qemu-arm test on the CI) it'll need to have a new feature-check script. Eg make a script called tests/feature_check/fstring.py with contents:

f"fstring"

Then add some code to tests/run-tests along the lines of how skip_async and is_async work.

jimmo pushed a commit to jimmo/micropython that referenced this pull request Aug 12, 2021
This implements (most of) the PEP-498 spec for f-strings and is based on
micropython#4998 by @klardotsh.

It is implemented in the lexer as a syntax translation to `str.format`:
  f"{a}" --> "{}".format(a)

It also supports:
  f"{a=}" --> "a={}".format(a)

This is done by extracting the arguments into a temporary vstr buffer, then
after the string has been tokenized, the lexer input queue is saved and the
contents of the temporary vstr buffer are injected ito the lexer instead.

There are four main limitations:
- raw f-strings (`fr` or `rf` prefixes) are not supported and will raise
  `SyntaxError: raw f-strings are not supported`.

- literal concatenation of f-strings with adjacent strings will fail
    "{}" f"{a}" --> "{}{}".format(a)    (str.format will incorrectly use the
                                         braces from the non-f-string)
    f"{a}" f"{a}" --> "{}".format(a) "{}".format(a) (cannot concatenate)

- PEP-498 requires the full parser to understand the interpolated argument,
  however because this entirely runs in the lexer it cannot resolve nested
  braces in expressions like
    f"{'}'}"

- The !r, !s, and !a conversions are not supported.

Includes tests and cpydiffs.
jimmo added a commit to jimmo/micropython that referenced this pull request Aug 12, 2021
This implements (most of) the PEP-498 spec for f-strings and is based on
micropython#4998 by @klardotsh.

It is implemented in the lexer as a syntax translation to `str.format`:
  f"{a}" --> "{}".format(a)

It also supports:
  f"{a=}" --> "a={}".format(a)

This is done by extracting the arguments into a temporary vstr buffer,
then after the string has been tokenized, the lexer input queue is saved
and the contents of the temporary vstr buffer are injected ito the lexer
instead.

There are four main limitations:
- raw f-strings (`fr` or `rf` prefixes) are not supported and will raise
  `SyntaxError: raw f-strings are not supported`.

- literal concatenation of f-strings with adjacent strings will fail
    "{}" f"{a}" --> "{}{}".format(a)    (str.format will incorrectly use
                                         the braces from the non-f-string)
    f"{a}" f"{a}" --> "{}".format(a) "{}".format(a) (cannot concatenate)

- PEP-498 requires the full parser to understand the interpolated
  argument, however because this entirely runs in the lexer it cannot
  resolve nested braces in expressions like
    f"{'}'}"

- The !r, !s, and !a conversions are not supported.

Includes tests and cpydiffs.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
jimmo added a commit to jimmo/micropython that referenced this pull request Aug 12, 2021
This implements (most of) the PEP-498 spec for f-strings and is based on
micropython#4998 by @klardotsh.

It is implemented in the lexer as a syntax translation to `str.format`:
  f"{a}" --> "{}".format(a)

It also supports:
  f"{a=}" --> "a={}".format(a)

This is done by extracting the arguments into a temporary vstr buffer,
then after the string has been tokenized, the lexer input queue is saved
and the contents of the temporary vstr buffer are injected ito the lexer
instead.

There are four main limitations:
- raw f-strings (`fr` or `rf` prefixes) are not supported and will raise
  `SyntaxError: raw f-strings are not supported`.

- literal concatenation of f-strings with adjacent strings will fail
    "{}" f"{a}" --> "{}{}".format(a)    (str.format will incorrectly use
                                         the braces from the non-f-string)
    f"{a}" f"{a}" --> "{}".format(a) "{}".format(a) (cannot concatenate)

- PEP-498 requires the full parser to understand the interpolated
  argument, however because this entirely runs in the lexer it cannot
  resolve nested braces in expressions like
    f"{'}'}"

- The !r, !s, and !a conversions are not supported.

Includes tests and cpydiffs.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
jimmo added a commit to jimmo/micropython that referenced this pull request Aug 13, 2021
This implements (most of) the PEP-498 spec for f-strings and is based on
micropython#4998 by @klardotsh.

It is implemented in the lexer as a syntax translation to `str.format`:
  f"{a}" --> "{}".format(a)

It also supports:
  f"{a=}" --> "a={}".format(a)

This is done by extracting the arguments into a temporary vstr buffer,
then after the string has been tokenized, the lexer input queue is saved
and the contents of the temporary vstr buffer are injected ito the lexer
instead.

There are four main limitations:
- raw f-strings (`fr` or `rf` prefixes) are not supported and will raise
  `SyntaxError: raw f-strings are not supported`.

- literal concatenation of f-strings with adjacent strings will fail
    "{}" f"{a}" --> "{}{}".format(a)    (str.format will incorrectly use
                                         the braces from the non-f-string)
    f"{a}" f"{a}" --> "{}".format(a) "{}".format(a) (cannot concatenate)

- PEP-498 requires the full parser to understand the interpolated
  argument, however because this entirely runs in the lexer it cannot
  resolve nested braces in expressions like
    f"{'}'}"

- The !r, !s, and !a conversions are not supported.

Includes tests and cpydiffs.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
jimmo added a commit to jimmo/micropython that referenced this pull request Aug 14, 2021
This implements (most of) the PEP-498 spec for f-strings and is based on
micropython#4998 by @klardotsh.

It is implemented in the lexer as a syntax translation to `str.format`:
  f"{a}" --> "{}".format(a)

It also supports:
  f"{a=}" --> "a={}".format(a)

This is done by extracting the arguments into a temporary vstr buffer,
then after the string has been tokenized, the lexer input queue is saved
and the contents of the temporary vstr buffer are injected ito the lexer
instead.

There are four main limitations:
- raw f-strings (`fr` or `rf` prefixes) are not supported and will raise
  `SyntaxError: raw f-strings are not supported`.

- literal concatenation of f-strings with adjacent strings will fail
    "{}" f"{a}" --> "{}{}".format(a)    (str.format will incorrectly use
                                         the braces from the non-f-string)
    f"{a}" f"{a}" --> "{}".format(a) "{}".format(a) (cannot concatenate)

- PEP-498 requires the full parser to understand the interpolated
  argument, however because this entirely runs in the lexer it cannot
  resolve nested braces in expressions like
    f"{'}'}"

- The !r, !s, and !a conversions are not supported.

Includes tests and cpydiffs.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
dpgeorge pushed a commit that referenced this pull request Aug 14, 2021
This implements (most of) the PEP-498 spec for f-strings and is based on
#4998 by @klardotsh.

It is implemented in the lexer as a syntax translation to `str.format`:
  f"{a}" --> "{}".format(a)

It also supports:
  f"{a=}" --> "a={}".format(a)

This is done by extracting the arguments into a temporary vstr buffer,
then after the string has been tokenized, the lexer input queue is saved
and the contents of the temporary vstr buffer are injected into the lexer
instead.

There are four main limitations:
- raw f-strings (`fr` or `rf` prefixes) are not supported and will raise
  `SyntaxError: raw f-strings are not supported`.

- literal concatenation of f-strings with adjacent strings will fail
    "{}" f"{a}" --> "{}{}".format(a)    (str.format will incorrectly use
                                         the braces from the non-f-string)
    f"{a}" f"{a}" --> "{}".format(a) "{}".format(a) (cannot concatenate)

- PEP-498 requires the full parser to understand the interpolated
  argument, however because this entirely runs in the lexer it cannot
  resolve nested braces in expressions like
    f"{'}'}"

- The !r, !s, and !a conversions are not supported.

Includes tests and cpydiffs.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
@dpgeorge
Copy link
Member

This was merged with modifications in 692d36d

@dpgeorge dpgeorge closed this Aug 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0