| 1 | +PEP: 536 |
| 2 | +Title: Final Grammar for Literal String Interpolation |
| 3 | +Version: $Revision$ |
| 4 | +Last-Modified: $Date$ |
| 5 | +Author: Philipp Angerer <phil.angerer@gmail.com> |
| 6 | +Status: Draft |
| 7 | +Type: Standards Track |
| 8 | +Content-Type: text/x-rst |
| 9 | +Created: 11-Dec-2016 |
| 10 | +Python-Version: 3.7 |
| 11 | +Post-History: 12-Dec-2016 |
| 12 | + |
| 13 | +Abstract |
| 14 | +======== |
| 15 | + |
| 16 | +PEP 498 introduced Literal String Interpolation (or “f-strings”). |
| 17 | +The expression portions of those literals, however, are subject to |
| 18 | +certain restrictions. This PEP proposes a formal grammar lifting |
| 19 | +those restrictions, promoting “f-strings” to “f expressions” or f-literals. |
| 20 | + |
| 21 | +This PEP expands upon the f-strings introduced by PEP 498, |
| 22 | +so this text requires familiarity with PEP 498. |
| 23 | + |
| 24 | +Terminology |
| 25 | +=========== |
| 26 | + |
| 27 | +This text will refer to the existing grammar as “f-strings”, |
| 28 | +and the proposed one as “f-literals”. |
| 29 | + |
| 30 | +Furthermore, it will refer to the ``{}``-delimited expressions in |
| 31 | +f-literals/f-strings as “expression portions” and the static string content |
| 32 | +around them as “string portions”. |
| 33 | + |
| 34 | +Motivation |
| 35 | +========== |
| 36 | + |
| 37 | +The current implementation of f-strings in CPython relies on the existing |
| 38 | +string parsing machinery and a post-processing of its tokens. This results in |
| 39 | +several restrictions on the expressions usable within f-strings: |
| 40 | + |
| 41 | +#. It is impossible to use the quote character delimiting the f-string |
| 42 | + within the expression portion:: |
| 43 | + |
| 44 | + >>> f'Magic wand: { bag['wand'] }' |
| 45 | + ^ |
| 46 | + SyntaxError: invalid syntax |
| 47 | + |
| 48 | +#. A previously considered way around it would lead to escape sequences |
| 49 | + in executed code and is prohibited in f-strings:: |
| 50 | + |
| 51 | + >>> f'Magic wand { bag[\'wand\'] } string' |
| 52 | + SyntaxError: f-string expression portion cannot include a backslash |
| 53 | + |
| 54 | +#. Comments are forbidden even in multi-line f-strings:: |
| 55 | + |
| 56 | + >>> f'''A complex trick: { |
| 57 | + ... bag['bag'] # recursive bags! |
| 58 | + ... }''' |
| 59 | + SyntaxError: f-string expression part cannot include '#' |
| 60 | + |
| 61 | +#. Expression portions need to wrap ``':'`` and ``'!'`` in parentheses, brackets, or braces:: |
| 62 | + |
| 63 | + >>> f'Useless use of lambdas: { lambda x: x*2 }' |
| 64 | + SyntaxError: unexpected EOF while parsing |
| 65 | + |
| 66 | +These limitations serve no purpose from a language user’s perspective and |
| 67 | +can be lifted by giving f-literals a regular grammar without exceptions |
| 68 | +and implementing it using dedicated parsing code. |
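
Until the restrictions are lifted, each case listed above has a workaround that is already legal under the existing f-string rules. The following is a minimal sketch of those workarounds; the ``bag`` dictionary and its contents are made up for illustration:

```python
# Hypothetical data, used only for illustration.
bag = {'wand': 'holly', 'bag': {'wand': 'elder'}}

# Cases 1 and 2: use the other kind of quote inside the expression portion
# instead of reusing (or escaping) the delimiting quote.
wand = f"Magic wand: { bag['wand'] }"

# Case 3: move the comment out of the expression portion.
inner = bag['bag']  # recursive bags!
trick = f'A complex trick: { inner }'

# Case 4: parenthesize expressions that contain ':' or '!'.
doubled = f'Useless use of lambdas: { (lambda x: x*2)(21) }'

print(wand)
print(trick)
print(doubled)
```

Each workaround forces the author to restructure the expression for the sake of the parser, which is the cost this PEP aims to remove.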
| 69 | + |
| 70 | +Rationale |
| 71 | +========= |
| 72 | + |
| 73 | +.. https://mail.python.org/pipermail/python-ideas/2016-August/041727.html |
| 74 | + |
| 75 | +The restrictions mentioned in Motivation_ are non-obvious and counter-intuitive |
| 76 | +unless the user is familiar with the f-strings’ implementation details. |
| 77 | + |
| 78 | +As mentioned, a previous version of PEP 498 allowed escape sequences |
| 79 | +anywhere in f-strings, including as ways to encode the braces delimiting |
| 80 | +the expression portions and in their code. They would have been expanded before |
| 81 | +the code was parsed, which would have had several important ramifications: |
| 82 | + |
| 83 | +#. It would not be clear to human readers which portions are expressions |
| 84 | + and which are strings. Great material for an “obfuscated/underhanded |
| 85 | + Python challenge”. |
| 86 | +#. Syntax highlighters are good at parsing nested grammars, but not |
| 87 | + at recognizing escape sequences. ECMAScript 2016 (JavaScript) allows |
| 88 | + escape sequences in its identifiers [1]_ and the author knows of no |
| 89 | + syntax highlighter able to correctly highlight code making use of this. |
| 90 | + |
| 91 | +As a consequence, the expression portions would be harder to recognize |
| 92 | +both with and without the aid of syntax highlighting. With the new grammar, |
| 93 | +it is easy to extend syntax highlighters to correctly parse |
| 94 | +and display f-literals: |
| 95 | + |
| 96 | +.. raw:: html |
| 97 | + |
| 98 | + <pre><span style=color:#ff5500>f'Magic wand: </span><span style=color:#3daee9>{</span>bag[<span style=color:#bf0303>'wand'</span>]<span style=color:#3daee9>:^10}</span><span style=color:#ff5500>'</span></pre> |
| 99 | + |
| 100 | +.. This is the output of kate-syntax-highlighter when given that code |
| 101 | + (with some quotes stripped) |
| 102 | + |
| 103 | +Highlighting expression portions with possible escape sequences would |
| 104 | +mean creating a modified copy of all rules of the complete expression |
| 105 | +grammar, accounting for the possibility of escape sequences in keywords, |
| 106 | +delimiters, and all other language syntax. One such duplication would |
| 107 | +yield one level of escaping depth and would have to be repeated for each |
| 108 | +deeper level of escaping in a recursive f-literal. This is the case since no highlighting |
| 109 | +engine known to the author supports expanding escape sequences before |
| 110 | +applying rules to a certain context. Nesting contexts, however, is a |
| 111 | +standard feature of all highlighting engines. |
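
To illustrate why nesting alone is enough, the following rough sketch separates string portions from expression portions by tracking bracket depth only, without ever expanding an escape sequence. The function name ``split_portions`` is hypothetical and not part of any proposal; the sketch ignores conversions, format specifiers, comments, and triple-quoted nested strings:

```python
def split_portions(body):
    """Split the text between an f-literal's outer quotes into
    ('string', text) and ('expression', text) portions.

    Hypothetical helper for illustration only; see the caveats above.
    """
    portions = []
    text = []
    i, n = 0, len(body)
    while i < n:
        ch = body[i]
        if body.startswith("{{", i) or body.startswith("}}", i):
            text.append(ch)  # a doubled brace stands for one literal brace
            i += 2
        elif ch == "{":
            if text:
                portions.append(("string", "".join(text)))
                text = []
            depth = 1
            j = i + 1
            while j < n and depth:
                c = body[j]
                if c in "'\"":
                    # Skip a nested string literal as one unit.
                    j += 1
                    while j < n and body[j] != c:
                        j += 2 if body[j] == "\\" else 1
                    j += 1  # step past the closing quote
                    continue
                if c in "{[(":
                    depth += 1
                elif c in "}])":
                    depth -= 1
                j += 1
            if depth:
                raise ValueError("unterminated expression portion")
            portions.append(("expression", body[i + 1:j - 1]))
            i = j
        elif ch == "}":
            raise ValueError("single '}' outside an expression portion")
        else:
            text.append(ch)
            i += 1
    if text:
        portions.append(("string", "".join(text)))
    return portions


for kind, text in split_portions("Magic wand: { bag['wand'] }!"):
    print(kind, repr(text))
```

The nested ``'wand'`` string is skipped as a unit even though it uses the same quote character as the enclosing literal would, which is the behaviour a context-nesting highlighter gets for free.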
| 112 | + |
| 113 | +Familiarity also plays a role: Arbitrary nesting of expressions |
| 114 | +without expansion of escape sequences is available in every single |
| 115 | +other language employing a string interpolation method that uses |
| 116 | +expressions instead of just variable names. [2]_ |
| 117 | + |
| 118 | +Specification |
| 119 | +============= |
| 120 | + |
| 121 | +PEP 498 specifies f-strings as follows, but places restrictions on them:: |
| 122 | + |
| 123 | + f ' <text> { <expression> <optional !s, !r, or !a> <optional : format specifier> } <text> ... ' |
| 124 | + |
| 125 | +All restrictions mentioned in PEP 498 are lifted from f-literals, |
| 126 | +as explained below: |
| 127 | + |
| 128 | +#. Expression portions may now contain strings delimited with the same |
| 129 | + kind of quote that is used to delimit the f-literal. |
| 130 | +#. Backslashes may now appear within expressions just like anywhere else |
| 131 | + in Python code. In case of strings nested within f-literals, |
| 132 | + escape sequences are expanded when the innermost string is evaluated. |
| 133 | +#. Comments, using the ``'#'`` character, are possible only in multi-line |
| 134 | + f-literals, since comments are terminated by the end of the line |
| 135 | + (which makes closing a single-line f-literal impossible). |
| 136 | +#. Expression portions may contain ``':'`` or ``'!'`` wherever |
| 137 | + syntactically valid. The first ``':'`` or ``'!'`` that is not part |
| 138 | + of an expression has to be followed by a valid coercion or format specifier. |
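
Whether a given interpreter already accepts one of the constructs listed above can be checked by compiling the literal from source text instead of writing it directly into a module. A minimal sketch, assuming a hypothetical helper named ``accepts``; the results for the first two candidates depend on the Python version running the code, so no output is shown:

```python
def accepts(source):
    """Return True if this interpreter compiles `source` as an expression.

    Hypothetical helper for illustration; it compiles but never evaluates.
    """
    try:
        compile(source, "<f-literal>", "eval")
    except SyntaxError:
        return False
    return True


candidates = [
    "f'Magic wand: { bag['wand'] }'",                # delimiting quote reused
    "f'Useless use of lambdas: { lambda x: x*2 }'",  # ':' outside brackets
    "f\"Magic wand: { bag['wand'] }\"",              # legal under PEP 498
]
for source in candidates:
    print(source, "->", accepts(source))
```

Because the literal is only compiled, names such as ``bag`` do not need to exist for the check to work.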
| 139 | + |
| 140 | +A remaining restriction not explicitly mentioned by PEP 498 is line breaks |
| 141 | +in expression portions. Since strings delimited by single ``'`` or ``"`` |
| 142 | +characters are expected to be single-line, line breaks remain illegal |
| 143 | +in expression portions of single-line strings. |
| 144 | + |
| 145 | +.. note:: Is lifting of the restrictions sufficient, |
| 146 | + or should we specify a more complete grammar? |
| 147 | + |
| 148 | +Backwards Compatibility |
| 149 | +======================= |
| 150 | + |
| 151 | +f-literals are fully backwards compatible with f-strings, |
| 152 | +and only expand the syntax considered legal. |
| 153 | + |
| 154 | +Reference Implementation |
| 155 | +======================== |
| 156 | + |
| 157 | +TBD |
| 158 | + |
| 159 | +References |
| 160 | +========== |
| 161 | + |
| 162 | +.. [1] ECMAScript ``IdentifierName`` specification |
| 163 | + ( http://ecma-international.org/ecma-262/6.0/#sec-names-and-keywords ) |
| 164 | + |
| 165 | + Yes, ``const cthulhu = { H̹̙̦̮͉̩̗̗ͧ̇̏̊̾Eͨ͆͒̆ͮ̃͏̷̮̣̫̤̣Cͯ̂͐͏̨̛͔̦̟͈̻O̜͎͍͙͚̬̝̣̽ͮ͐͗̀ͤ̍̀͢M̴̡̲̭͍͇̼̟̯̦̉̒͠Ḛ̛̙̞̪̗ͥͤͩ̾͑̔͐ͅṮ̴̷̷̗̼͍̿̿̓̽͐H̙̙̔̄͜\u0042: 42 }`` is valid ECMAScript 2016 |
| 166 | + |
| 167 | +.. [2] Wikipedia article on string interpolation |
| 168 | + ( https://en.wikipedia.org/wiki/String_interpolation ) |
| 169 | + |
| 170 | +Copyright |
| 171 | +========= |
| 172 | + |
| 173 | +This document has been placed in the public domain. |
| 174 | + |
| 175 | + |
| 176 | +.. |
| 177 | + Local Variables: |
| 178 | + mode: indented-text |
| 179 | + indent-tabs-mode: nil |
| 180 | + sentence-end-double-space: t |
| 181 | + fill-column: 70 |
| 182 | + coding: utf-8 |
| 183 | + End: |