Implement partial PEP-498 (f-string) support #2054

klardotsh · 2019-08-11T04:48:37Z

This implements (most of) the PEP-498 spec for f-strings, with two
exceptions:

raw f-strings (fr or rf prefixes) raise NotImplementedError
one special corner case does not function as specified in the PEP
(more on that in a moment)

This is implemented in the core as a syntax translation, brute-forcing
all f-strings to run through String.format. For example, the statement
x='world'; print(f'hello {x}') gets translated at a syntax level
(injected into the lexer) to x='world'; print('hello {}'.format(x)).
While this may lead to weird column results in tracebacks, it seemed
like the fastest, most efficient, and likely most RAM-friendly option,
despite being implemented under the hood with a completely separate
vstr_t.

Since string concatenation of adjacent literals is implemented in the
lexer,
two side effects emerge:

All strings with at least one f-string portion are concatenated into a
single literal which must be run through String.format() wholesale,
and:
Concatenation of a raw string with interpolation characters with an
f-string will cause IndexError/KeyError, which is both different
from CPython and different from the corner case mentioned in the PEP
(which gave an example of the following:)

x = 10
y = 'hi'
assert ('a' 'b' f'{x}' '{c}' f'str<{y:^4}>' 'd' 'e') == 'ab10{c}str< hi >de'

The above-linked commit detailed a pretty solid case for leaving string
concatenation in the lexer rather than putting it in the parser, and
undoing that decision would likely be disproportionately costly on
resources for the sake of a probably-low-impact corner case. An
alternative to become compliant with this corner case of the PEP would
be to revert to string concatenation in the parser only when an
f-string is part of concatenation, though I've done no investigation on
the difficulty or costs of doing this.

A decent set of tests is included. I've manually tested this on the
unix port on Linux and on a Feather M4 Express (atmel-samd) and
things seem sane.

klardotsh · 2019-08-11T05:09:10Z

I've also submitted this upstream as micropython#4998 - modulo the lack of translation support upstream (at least in the way CircuitPython does it?), the PRs are identical. I'm happy for both projects to benefit from this, but since I mostly work in CircuitPython (and I'm also not sure how often CircuitPython rebases off upstream), I wanted to make sure the PR here landed as my top priority 🤷‍♂️

klardotsh · 2019-08-11T21:14:30Z

Looks like one failure here was due to flash size for one locale of M0 builds - perhaps this needs stashed behind a mpconfig flag and disabled on M0s? Seems like an unideal UX but might have to do what we have to do (or I just made a really flash-size-bloated feature?)

Another failure I don't understand at all - this seems to have not even built the feature, as the very first f-string causes a SyntaxError. I'm not familiar enough with the test infra here to know what's up.

jepler · 2019-08-11T21:33:11Z

The way the tests work, they are mostly comparing the circuitpython result with the python3 result. It looks like we are using python 3.5 as the baseline, due to travis being based on ubuntu xenial. That may mean that there's no way for us to validate the functionality right now, which is a bummer.

jepler · 2019-08-11T21:35:47Z

And, yes, many of the ports are almost full (in terms of flash space usage). As a first step you could try making the f-string feature depend on the preprocessor macro CIRCUITPY_FULL_BUILD. I think this gets you "almost everything except boards based on SAMD21".

tannewt

I would prefer to have this on in all builds. I've never actually seen the Itsy Bitsy M0 be the one hold out on space. Looks like we're 160 bytes over which we could probably find a way to squeeze.

I agree with Damien's suggestion on the other PR about tests too.

py/parse.c

klardotsh · 2019-08-13T18:30:05Z

Based on the bare-arm test failure I got in upstream PR, this feature added about 700 bytes to the bare-arm build. I can probably find a way to trim that down (though I have very few ideas on that just yet - at work at the moment so can't toy with things much right now).

I'm wondering if finding a better alternative to the dual-buffer chr0-chr2 + chr3-chr5 workaround for switching between reader/postfix buffers would help code size by enough. I'm not sure if mp_readers are seekable, but if they are, a more sane alternative may be to rewind that reader rather than backing up its already-read values when switching to the postfix sou 8000 rce. Hmm...

tannewt · 2019-08-13T18:55:55Z

Ok, drop this to 1 from 3 and you should have space: https://github.com/adafruit/circuitpython/blob/master/py/circuitpy_mpconfig.h#L317

I'll go back in and bump it up for M4+ later.

tannewt · 2019-08-16T19:03:37Z

Ok, we'll just wait for fixes from the upstream PR and then merge.

tannewt · 2019-08-27T17:12:13Z

@klardotsh Looks like upstream's PR has stalled. Want to update this and we'll merge?

klardotsh · 2019-08-27T20:08:38Z

@tannewt Let me bump that thread, it's possible Damien didn't see a notification/etc. of my questions/clarifications.

If I don't hear back by the weekend (likely the next availability I have to work on this) I'll focus on patching things up here!

This implements (most of) the PEP-498 spec for f-strings, with two exceptions: - raw f-strings (`fr` or `rf` prefixes) raise `NotImplementedError` - one special corner case does not function as specified in the PEP (more on that in a moment) This is implemented in the core as a syntax translation, brute-forcing all f-strings to run through `String.format`. For example, the statement `x='world'; print(f'hello {x}')` gets translated *at a syntax level* (injected into the lexer) to `x='world'; print('hello {}'.format(x))`. While this may lead to weird column results in tracebacks, it seemed like the fastest, most efficient, and *likely* most RAM-friendly option, despite being implemented under the hood with a completely separate `vstr_t`. Since [string concatenation of adjacent literals is implemented in the lexer](micropython@534b7c3), two side effects emerge: - All strings with at least one f-string portion are concatenated into a single literal which *must* be run through `String.format()` wholesale, and: - Concatenation of a raw string with interpolation characters with an f-string will cause `IndexError`/`KeyError`, which is both different from CPython *and* different from the corner case mentioned in the PEP (which gave an example of the following:) ```python x = 10 y = 'hi' assert ('a' 'b' f'{x}' '{c}' f'str<{y:^4}>' 'd' 'e') == 'ab10{c}str< hi >de' ``` The above-linked commit detailed a pretty solid case for leaving string concatenation in the lexer rather than putting it in the parser, and undoing that decision would likely be disproportionately costly on resources for the sake of a probably-low-impact corner case. An alternative to become complaint with this corner case of the PEP would be to revert to string concatenation in the parser *only when an f-string is part of concatenation*, though I've done no investigation on the difficulty or costs of doing this. A decent set of tests is included. I've manually tested this on the `unix` port on Linux and on a Feather M4 Express (`atmel-samd`) and things seem sane.

tannewt · 2020-03-09T23:00:40Z

Closing in favor of #2690

klardotsh mentioned this pull request Aug 11, 2019

py: Implement partial PEP-498 (f-string) support micropython/micropython#4998

Closed

tannewt added cpython api modules from cpython enhancement labels Aug 13, 2019

tannewt added this to the 5.x.0 - Features milestone Aug 13, 2019

tannewt requested changes Aug 13, 2019

View reviewed changes

py/parse.c Show resolved Hide resolved

klardotsh added 3 commits September 26, 2019 16:51

Address dpgeorge feedback - largely simplifications

24df59d

py(f-strings): translations

1ae77fb

klardotsh force-pushed the topic-pep-498-fstrings branch from 170b269 to 1ae77fb Compare September 26, 2019 23:51

jepler mentioned this pull request Mar 9, 2020

Implement partial PEP-498 (f-string) support #2690

Merged

tannewt closed this Mar 9, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Implement partial PEP-498 (f-string) support #2054

Implement partial PEP-498 (f-string) support #2054

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Implement partial PEP-498 (f-string) support #2054

Implement partial PEP-498 (f-string) support #2054

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!