10000 PEP 750: danger with explicit concatenation · Issue #289 · python/steering-council · GitHub
[go: up one dir, main page]

Skip to content

PEP 750: danger with explicit concatenation #289

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
dvarrazzo opened this issue May 1, 2025 · 8 comments
Open

PEP 750: danger with explicit concatenation #289

dvarrazzo opened this issue May 1, 2025 · 8 comments

Comments

@dvarrazzo
Copy link

Dear Steering Council,

to introduce myself, I have been involved in the Psycopg project (the de facto standard PostgreSQL driver for Python) since 2005, I have been the main maintainer of psycopg2 since 2010, and, in 2020, I designed and implemented Psycopg 3, in order to use new Python features (typing, async), making new choices based on the experience gathered in the previous 15 years.

As you can imagine, one of the main preoccupations of our project is safety: how to enable end users to craft any statement they need to execute on a database while guarding them from unsafe input. Therefore, we have always stressed using the best safety practices when dealing with untrusted user input.

I have recently come across the efforts of PEP 750 to provide a template string object in Python, and I found it extremely fitting for the project. A few days ago I have implemented a first version of template strings execution, and I must say that it is the most important and refreshing change I have seen in the language, positively affecting our project.

However, during a discussion about whether to include the implementation of a Template.join() method (which would be very desirable in my opinion, but this is a different matter), it dawned on me that explicit string concatenation is a very dangerous feature, and it pretty much defeats entirely the safety that the PEP declares as being one of the design goals.

Because templates and strings can be concatenated without any safety check, it is very easy to include insecure input in a template. Taking the same example from the PEP's Motivation section:

evil = "<script>alert('evil')</script>"
template = t"<p>"+ evil + t"</p>"
assert html(template) == "<p>&lt;script&gt;alert('evil')&lt;/script&gt;</p>"  # Will fail

The hypothetical html() function will receive a Template object on which it cannot put any trust. The design of the Template object makes any insecure input string instantly secure.

Please note that this design goes pretty much in the opposite direction of the LiteralString defined in PEP 675: concatenating a safe LiteralString and an unsafe str produces an unsafe str. This is bizarre:

  • with a LiteralString, anything safe, when it comes in contact with something unsafe, becomes unsafe.
  • with template strings, anything unsafe, when it comes in contact with something safe, becomes safe!

This, I am afraid, is not a well-thought-out design from the safety point of view.

LiteralString, in our project, is so relevant that it is actually the only string accepted by the execute() function: we define our Query type as:

Query: TypeAlias = Union[LiteralString, bytes, sql.SQL, sql.Composed]

What we require is for a statement to be either a literal string or to have been produced by composition of the psycopg.sql family of objects, which are designed to compose the different parts of a SQL statement employing the correct escaping method (and whose use would become largely marginal with a good template string solution). While there isn't widespread support for this feature yet in type checkers, this is our formal requirement for a query, and even if, every now and then, some user is confused by linter errors, it is no problem to explain our design.

I have looked for information about the rationale of the current design and I have been kindly provided some references. There were some discussions mixing explicit and implicit concatenation, with a final resolution stating:

Added full support for both explicit and implicit concatenation. template+template, template+str, and str+template are all supported. Concatenation always results in a Template. In the end, we decided the arguments in favor of allowing concatenation outweighed the potential disadvantages. We’ve updated the “rejected ideas” section of the PEP to describe this.

There are indeed explanations in the PEP stating:

In the end, we decided that the surprise to developers of a new string type not supporting concatenation was likely to be greater than the theoretical harm caused by supporting it. (Developers concatenate f-strings all the time, after all, and while we are sure there are cases where this introduces bugs, it’s not clear that those bugs outweigh the benefits of supporting concatenation.)

This statement is misguided. People can concatenate f-strings and normal strings without a problem because 1) they are the same type and 2) there is no safety semantics behind str. The type of bug that can be caused by disallowing str + Template is an immediate TypeError; the type of bug that can be caused by allowing it is a safety bug.

Using the current design, accepting a template string in a query cannot be considered safe. We are back to the point of people being able to compose queries such as:

name = input()
cur.execute(t"INSERT INTO names VALUES ('" + name + "')")

and no runtime or static checker should have any problem with it.

This is less safe than LiteralString or sql.SQL objects, which require an active action from the user to allow a str to be part of a statement, signifying that the author has taken their measures to prevent problems:

snip: str

cur.execute(sql.SQL("SELECT * FROM table WHERE ") + sql.SQL(snip))  # I know what I am doing

cur.execute("SELECT * FROM table WHERE " + cast(LiteralString, snip))  # I know what I am doing

cur.execute(t"SELECT * FROM table WHERE " + Template(snip))  # I know what I am doing

cur.execute(t"SELECT * FROM table WHERE " + snip)  # This might be an error

This goes in the opposite direction of where we want to go, in terms of safety. Therefore, we cannot, in our conscience, allow the use of template strings as query input, and, despite the initial enthusiasm, we will prefer to not merge the feature.

I understand that the template string branch was merged to the Python 3.14 branch only yesterday; version 3.14a7 didn't include the feature, and 3.14b1 is due to be released in a few days, after which no change would be accepted. I believe we are still in time to fix this design.

Thank you very much.

-- Daniele

@davepeck
Copy link
davepeck commented May 1, 2025

Hi SC,

To add on: after discussion, the PEP 750 team agrees that we should update the spec and implementation to:

  1. Remove Template.__radd__ entirely
  2. Remove support for str in Template.__add__

No other changes are necessary. In particular, we:

  • Continue to support implicit concatenation of str + Template and Template + str, since here the str is a presumed-safe literal (and t"" f"" is rare/lintable)
  • Continue to support Template + Template since this is safe in all cases

This change effectively requires developers to mark arbitrary strings as trusted with Template(my_str) or untrusted with Template(Interpolation(my_str)) before concatenating via __add__. We think that's a good thing.

There's more color on discuss.python.org.

We'll provide proposed PRs for both the PEP and for cpython. I'll link them here when they're available. Thank you for considering this!

@davepeck
Copy link
davepeck commented May 1, 2025

The PEP PR is now up: python/peps#4395

@davepeck
Copy link
davepeck commented May 1, 2025

A corresponding update to cpython is available here: python/cpython@main...davepeck:cpython:tstrings-concat-update -- tested locally, but not yet vetted by @lysnikolaou

I haven't created a PR for this yet because I don't know whether I should create a GitHub issue now or not. :-)

@AA-Turner
Copy link
Member
AA-Turner commented May 1, 2025

I haven't created a PR for this yet because I don't know whether I should create a GitHub issue now or not.

Process: I would open a draft PR using the overarching 'implement PEP 750' issue, but also cross reference here.

A

@dvarrazzo
Copy link
Author
dvarrazzo commented May 3, 2025

Polite critique to the release methodology: this is a major language feature that is being pushed to the language without testing apart from the people who have been directly involved in the PEP: it wasn't present in 3.14a7, it will be immutable in 3.14b1.

If there are other major design flaws in the feature, or even if the current issue is not addressed, it will take Python 4 to fix them.

I think that the feature requires still more soaking in reality to be considered mature. I suggest either:

  • introduce an a8 release to allow testing and contributors from major users
  • allow some flexibility to change things between b1 and b2
  • supersede the PEP, postpone the feature to Python 3.15.

I am available for testing on my part.

@AA-Turner
Copy link
Member

Just to clarify a few process points that you raised

it wasn't present in 3.14a7, it will be immutable in 3.14b1.

If there are other major design flaws in the feature, or even if the current issue is not addressed, it will take Python 4 to fix them.

The first beta release is 'feature freeze', but that isn't a total moratorium on changes -- there will be many bug fixes. You might find the development cycle overview in the Developer's Guide useful as a quick briefing on how releases work. Features can, and have, been removed or reverted up to and including release candidates, so please don't worry that there is no opportunity to improve things after Tuesday (b1 release). As such, I personally don't think we need an 3.14a8 release, though ultimately this is Hugo's decision as RM.

Our deprecation & versioning policy also doesn't follow SemVer, so it wouldn't be 4.0, but 3.1X, if we needed to deprecate or remove a feature.

I am available from testing on my part.

Thank you! We always appreciate more testing of alpha and beta releases, to help iron out these problems before the final release -- especially for PEP-sized features. It would also be great to know how we can make testing pre-releases easier!

A

@davepeck
Copy link
davepeck commented May 5, 2025

Just to clarify our position, the PEP 750 team does not believe the spec has "major design flaws". We think the spec as accepted is fine and can ship as-is.

This said, we do think this proposed change is a useful improvement to the spec and is worth making even at this late stage. We would all like to see it go through.

(Removing footguns is good! But we see it as a footgun, not a show-stopper. Linters exist and can flag uses of Template + str. The proposed change is, effectively: yes, but why allow such uses at all when there appears to be no benefit, only drawbacks?)

@davepeck
Copy link
davepeck commented May 20, 2025

(Updated my comment above to explicitly mention t"" f"" which we intentionally proposed keeping, to allow for plausibly useful and safe t"" "" to also remain, and because there may be unacceptable implementation risk with disallowing some kinds of implicit concat but not others.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants
0