-
-
Notifications
You must be signed in to change notification settings - Fork 32k
gh-119182: Add PyUnicodeWriter C API #119184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
edited
- Issue: [C API] Add an efficient public PyUnicodeWriter API #119182
8de16a1
to
4d18300
Compare
4d18300
to
308d608
Compare
49a46ac
to
14e739b
Compare
14e739b
to
fd7432e
Compare
Remove PyUnicodeWriter_SetOverallocate().
fd7432e
to
b12f085
Compare
@erlend-aasland @serhiy-storchaka @encukou: Do you want to review this change? |
*str* must be Python :class:`str` object. *start* must be greater than or | ||
equal to 0, and less than or equal to *end*. *end* must be less than or | ||
equal to *str* length. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit; I prefer to use SemBr for paragraphs like this.
*str* must be Python :class:`str` object. *start* must be greater than or | |
equal to 0, and less than or equal to *end*. *end* must be less than or | |
equal to *str* length. | |
*str* must be Python :class:`str` object. | |
*start* must be greater than or equal to 0, | |
and less than or equal to *end*. | |
*end* must be less than or equal to *str* length. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL that this is called SemBr!
Breaking on comma may be too much, but I prefer to break at the sentence boundary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. You don't need to break at comma, but I often do to minimise future diffs.
Alternative suggestion:
*str* must be Python :class:`str` object. *start* must be greater than or | |
equal to 0, and less than or equal to *end*. *end* must be less than or | |
equal to *str* length. | |
*str* must be Python :class:`str` object. | |
*start* must be greater than or equal to 0, and less than or equal to *end*. | |
*end* must be less than or equal to *str* length. |
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
@erlend-aasland: I addressed your review. Would you mind to review the updated PR? |
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except using TypeError and ValueError instead of SystemError, LGTM.
But I do not insist if other core developers prefer TypeError and ValueError. I just think that SystemError is more appropriate for programming errors in using the C API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A more general question in the light of the free threading patches: Are these writer APIs thread-safe ?
I did nothing to make them safe, so I would say that they are not thread safe. You must not share a writer between two threads. Use one writer per thread. |
@malemburg @serhiy-storchaka: I modified the implementation to make write operations atomic. Either the whole string is written, or "nothing is written". In the doc, I wrote "On error, leave the writer unchanged". In practice, it's more subtle, the internal buffer can be enlarged, or its kind (UCS1, UCS2 or UCS4) can change. But from the API consumer point of view, it's as if nothing was written. I added two unit tests to show that it's still possible to use a writer after an error. The hack is just to save/restore |
I withdraw my approve because writing cannot be atomic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice; thanks!
@@ -13408,6 +13546,17 @@ _PyUnicodeWriter_Finish(_PyUnicodeWriter *writer) | |||
return unicode_result(str); | |||
} | |||
|
|||
|
|||
PyObject* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: inconsistent with the rest of the PR.
PyObject* | |
PyObject * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also took the liberty of exhaustively point where the 2-blank lines rule is missing (the file is huge so committing directly might help).
@picnixz: I appreciate your remarks about coding style, but IMO you're going too far. I marked your "two empty lines" suggestions as resolved. This PR is already approved, twice. It's a complex API being discussed for 1 month and I would prefer to not waste my time on coding style. However, I fixed the "an Unicode" typos, thanks :-) |
The C API Working Group agreed to start with this minimum PyUnicodeWriter API. I merged my PR. |