doc: PyUnicode_AsUTF8String() fails if string contains surrogates by vstinner · Pull Request #124605 · python/cpython

doc: PyUnicode_AsUTF8String() fails if string contains surrogates #124605

Merged on Sep 27, 2024 (2 commits)

@vstinner (Member) commented Sep 26, 2024

@vstinner vstinner added skip news needs backport to 3.12 only security fixes needs backport to 3.13 bugs and security fixes labels Sep 26, 2024
@bedevere-app bedevere-app bot added awaiting core review docs Documentation in the Doc dir labels Sep 26, 2024
@vstinner (Member, Author) commented:

cc @pitrou @serhiy-storchaka

I prefer to say "surrogate characters" rather than "lone surrogates", since surrogate pairs are also disallowed.

>>> "\uDC80".encode('utf8')
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in position 0: surrogates not allowed

>>> "\uDBFF\uDFFF".encode('utf8')
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-1: surrogates not allowed
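
The two failures above can be reproduced in a script. This is a minimal sketch, not part of the PR: `try_utf8` is a hypothetical helper name, and it only illustrates that strict UTF-8 encoding rejects both a lone surrogate and two adjacent surrogate code points, while the corresponding non-BMP character encodes normally.

```python
def try_utf8(text):
    """Return the UTF-8 bytes, or the UnicodeEncodeError raised by strict encoding.

    Hypothetical helper used only for illustration.
    """
    try:
        return text.encode("utf-8")
    except UnicodeEncodeError as exc:
        return exc


# A lone low surrogate: rejected.
lone = try_utf8("\udc80")

# A high surrogate followed by a low surrogate: in a Python str these are
# two separate code points, and both are rejected.
pair = try_utf8("\udbff\udfff")

# The actual code point U+10FFFF is not a surrogate and encodes fine.
astral = try_utf8("\U0010ffff")

for label, result in [("lone", lone), ("pair", pair), ("astral", astral)]:
    print(label, repr(result))
```

The `start` and `end` attributes of the exception give the offending range, which is where the "position 0-1" in the second message comes from.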

@serhiy-storchaka (Member) left a comment:

See the surrogatepass error handler documentation:

+===================+========================+===========================================+
|``'surrogatepass'``| utf-8, utf-16, utf-32, | Allow encoding and decoding surrogate code|
|                   | utf-16-be, utf-16-le,  | point (``U+D800`` - ``U+DFFF``) as normal |
|                   | utf-32-be, utf-32-le   | code point. Otherwise these codecs treat  |
|                   |                        | the presence of surrogate code point in   |
|                   |                        | :class:`str` as an error.                 |
+-------------------+------------------------+-------------------------------------------+

I suggest using the term "surrogate code points" and specifying the range.
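
For illustration, the range and the error handler quoted above can be exercised from Python. This is a sketch under stated assumptions: `has_surrogate` and `encode_utf8_lenient` are hypothetical helper names, not CPython APIs. The only library behavior relied on is `str.encode()` / `bytes.decode()` with `errors="surrogatepass"`.

```python
def has_surrogate(text):
    """Return True if text contains any code point in U+D800 - U+DFFF.

    Hypothetical helper: a pre-check before strict UTF-8 encoding.
    """
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


def encode_utf8_lenient(text):
    """Encode to UTF-8, passing surrogate code points through unchanged.

    Hypothetical helper: the output is not valid UTF-8 for strict decoders.
    """
    return text.encode("utf-8", errors="surrogatepass")


sample = "a\udc80b"
encoded = encode_utf8_lenient(sample)

# Decoding needs the same error handler to round-trip the surrogate.
decoded = encoded.decode("utf-8", errors="surrogatepass")

print(has_surrogate(sample), encoded, decoded == sample)
```

Bytes produced this way are rejected by a strict UTF-8 decoder, so `surrogatepass` is best kept to cases where both ends agree on it.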

@vstinner (Member, Author) commented:

I suggest using the term "surrogate code points" and specifying the range.

Done. Please review my updated PR.

@serhiy-storchaka (Member) left a comment:

LGTM.

@vstinner vstinner enabled auto-merge (squash) September 27, 2024 20:07
@vstinner vstinner merged commit d8cf587 into python:main Sep 27, 2024
36 checks passed
@vstinner vstinner deleted the asutf8 branch September 27, 2024 20:13
@miss-islington-app commented:

Thanks @vstinner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Sep 27, 2024
…thonGH-124605)

(cherry picked from commit d8cf587)

Co-authored-by: Victor Stinner <vstinner@python.org>
@miss-islington-app commented:

Sorry, @vstinner, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker d8cf587dc749cf21eafc1064237970ee7460634f 3.12

@bedevere-app (bot) commented Sep 27, 2024

GH-124707 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Sep 27, 2024
@vstinner vstinner removed the needs backport to 3.12 only security fixes label Sep 27, 2024
Yhg1s pushed a commit that referenced this pull request Sep 27, 2024
…tes (GH-124605) (#124707)

doc: PyUnicode_AsUTF8String() fails if string contains surrogates (GH-124605)
(cherry picked from commit d8cf587)

Co-authored-by: Victor Stinner <vstinner@python.org>
Labels
docs Documentation in the Doc dir skip issue skip news