8000 gh-119182: Add PyUnicodeWriter C API by vstinner · Pull Request #119184 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

gh-119182: Add PyUnicodeWriter C API #119184

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jun 17, 2024
Merged

Conversation

vstinner
Copy link
Member
@vstinner vstinner commented May 19, 2024

@vstinner vstinner force-pushed the WIP_unicode_writer branch from fd7432e to b12f085 Compare June 7, 2024 19:33
@vstinner vstinner removed the skip news label Jun 7, 2024
@vstinner
Copy link
Member Author
vstinner commented Jun 7, 2024

@erlend-aasland @serhiy-storchaka @encukou: Do you want to review this change?

@serhiy-storchaka serhiy-storchaka self-requested a review June 7, 2024 20:45
Comment on lines +1568 to +1570
*str* must be Python :class:`str` object. *start* must be greater than or
equal to 0, and less than or equal to *end*. *end* must be less than or
equal to *str* length.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit; I prefer to use SemBr for paragraphs like this.

Suggested change
*str* must be Python :class:`str` object. *start* must be greater than or
equal to 0, and less than or equal to *end*. *end* must be less than or
equal to *str* length.
*str* must be Python :class:`str` object.
*start* must be greater than or equal to 0,
and less than or equal to *end*.
*end* must be less than or equal to *str* length.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL that this is called SemBr!

Breaking on comma may be too much, but I prefer to break at the sentence boundary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. You don't need to break at comma, but I often do to minimise future diffs.

Alternative suggestion:

Suggested change
*str* must be Python :class:`str` object. *start* must be greater than or
equal to 0, and less than or equal to *end*. *end* must be less than or
equal to *str* length.
*str* must be Python :class:`str` object.
*start* must be greater than or equal to 0, and less than or equal to *end*.
*end* must be less than or equal to *str* length.

vstinner and others added 2 commits June 10, 2024 10:02
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
vstinner and others added 3 commits June 10, 2024 10:10
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
@vstinner
Copy link
Member Author

@erlend-aasland: I addressed your review. Would you mind to review the updated PR?

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Copy link
Member
@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except using TypeError and ValueError instead of SystemError, LGTM.

But I do not insist if other core developers prefer TypeError and ValueError. I just think that SystemError is more appropriate for programming errors in using the C API.

Copy link
Member
@malemburg malemburg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more general question in the light of the free threading patches: Are these writer APIs thread-safe ?

@vstinner
Copy link
Member Author

A more general question in the light of the free threading patches: Are these writer APIs thread-safe ?

I did nothing to make them safe, so I would say that they are not thread safe. You must not share a writer between two threads. Use one writer per thread.

@vstinner
Copy link
Member Author

@malemburg @serhiy-storchaka: I modified the implementation to make write operations atomic. Either the whole string is written, or "nothing is written".

In the doc, I wrote "On error, leave the writer unchanged". In practice, it's more subtle, the internal buffer can be enlarged, or its kind (UCS1, UCS2 or UCS4) can change. But from the API consumer point of view, it's as if nothing was written.

I added two unit tests to show that it's still possible to use a writer after an error.

The hack is just to save/restore writer->pos internally in the 2 functions which are not atomic: WriteUTF8() and WriteFormat().

@serhiy-storchaka serhiy-storchaka self-requested a review June 10, 2024 16:14
@serhiy-storchaka serhiy-storchaka dismissed their stale review June 10, 2024 16:15

I withdraw my approve because writing cannot be atomic.

Copy link
Contributor
@erlend-aasland erlend-aasland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice; thanks!

@@ -13408,6 +13546,17 @@ _PyUnicodeWriter_Finish(_PyUnicodeWriter *writer)
return unicode_result(str);
}


PyObject*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: inconsistent with the rest of the PR.

Suggested change
PyObject*
PyObject *

Copy link
Member
@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also took the liberty of exhaustively point where the 2-blank lines rule is missing (the file is huge so committing directly might help).

@vstinner
Copy link
Member Author

@picnixz: I appreciate your remarks about coding style, but IMO you're going too far. I marked your "two empty lines" suggestions as resolved. This PR is already approved, twice. It's a complex API being discussed for 1 month and I would prefer to not waste my time on coding style. However, I fixed the "an Unicode" typos, thanks :-)

@vstinner vstinner merged commit 5c4235c into python:main Jun 17, 2024
36 checks passed
@vstinner vstinner deleted the WIP_unicode_writer branch June 17, 2024 15:10
@vstinner
Copy link
Member Author

The C API Working Group agreed to start with this minimum PyUnicodeWriter API. I merged my PR.

mrahtz pushed a commit to mrahtz/cpython that referenced this pull request Jun 30, 2024
noahbkim pushed a commit to hudson-trading/cpython that referenced this pull request Jul 11, 2024
estyxx pushed a commit to estyxx/cpython that referenced this pull request Jul 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0