Use-after-free in unicode_escape decoder with error handler #133767
Labels
3.9: only security fixes
3.10: only security fixes
3.11: only security fixes
3.12: only security fixes
3.13: bugs and security fixes
3.14: bugs and security fixes
3.15: new features, bugs and security fixes
interpreter-core: Objects, Python, Grammar, and Parser dirs
release-blocker
topic-unicode
type-crash: A hard crash of the interpreter, possibly with a core dump
type-security: A security issue
serhiy-storchaka
added a commit
that referenced
this issue
May 12, 2025
…rror handler (GH-129648) If the error handler is used, a new bytes object is created to set as the object attribute of UnicodeDecodeError, and that bytes object then replaces the original data. A pointer to the decoded data will become invalid after destroying that temporary bytes object. So we need another way to return the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal(). _PyBytes_DecodeEscape() does not have such an issue, because it does not use the error handlers registry, but it should be changed for compatibility with _PyUnicode_DecodeUnicodeEscapeInternal().
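The commit message above says the decoder stores a bytes object as the `object` attribute of the `UnicodeDecodeError` it passes to the error handler. The following is a minimal sketch of how that looks from Python, not a reproducer for the crash: the handler name `demo.record` and the helper `decode_with_recording_handler` are hypothetical names made up for this illustration, and the input deliberately contains only a truncated `\x` escape with no invalid escape sequence after it.

```python
import codecs

# Hypothetical handler name chosen for this sketch; it is not part of CPython.
HANDLER_NAME = "demo.record"

# Details of every UnicodeDecodeError the handler receives.
seen = []


def record_and_replace(exc):
    """Record the error details, substitute '?', and resume after the bad span."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    seen.append(
        {
            "object": exc.object,  # the bytes object attached to the exception
            "start": exc.start,
            "end": exc.end,
            "reason": exc.reason,
        }
    )
    return ("?", exc.end)


codecs.register_error(HANDLER_NAME, record_and_replace)


def decode_with_recording_handler(data):
    """Decode with unicode_escape and return (text, recorded error details)."""
    seen.clear()
    text = data.decode("unicode_escape", HANDLER_NAME)
    return text, list(seen)


if __name__ == "__main__":
    # b"abc\\x" ends in a truncated \xXX escape, so the handler is called once.
    text, errors = decode_with_recording_handler(b"abc\\x")
    print(repr(text))
    for details in errors:
        print(details)
```

The handler only reads the exception attributes; it does not reassign `exc.object`, which keeps the sketch independent of the interpreter internals that the fix changes.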
miss-islington
pushed a commit
to miss-islington/cpython
that referenced
this issue
May 12, 2025
…h an error handler (pythonGH-129648) (cherry picked from commit 9f69a58) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka
added a commit
to serhiy-storchaka/cpython
that referenced
this issue
May 12, 2025
…der with an error handler (pythonGH-129648) (cherry picked from commit 9f69a58) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka
added a commit
that referenced
this issue
May 13, 2025
…th an error handler (GH-129648) (GH-133942) (cherry picked from commit 9f69a58) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
This was referenced May 16, 2025
GH-134255 is a backport of this pull request to the 3.12 branch.
encukou
pushed a commit
that referenced
this issue
May 20, 2025
…th an error handler (GH-129648) (GH-133944) (cherry picked from commit 9f69a58) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka
added a commit
to serhiy-storchaka/cpython
that referenced
this issue
May 20, 2025
…der with an error handler (pythonGH-129648) (pythonGH-133944) (cherry picked from commit 9f69a58) (cherry picked from commit 6279eb8) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka
added a commit
to serhiy-storchaka/cpython
that referenced
this issue
May 20, 2025
…der with an error handler (pythonGH-129648) (pythonGH-133944) (cherry picked from commit 9f69a58) (cherry picked from commit 6279eb8) (cherry picked from commit a75953b) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka
added a commit
to serhiy-storchaka/cpython
that referenced
this issue
May 20, 2025
…der with an error handler (pythonGH-129648) (pythonGH-133944) (cherry picked from commit 9f69a58) (cherry picked from commit 6279eb8) (cherry picked from commit a75953b) (cherry picked from commit 0c33e5b) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka
added a commit
to serhiy-storchaka/cpython
that referenced
this issue
May 20, 2025
…er with an error handler (pythonGH-129648) (pythonGH-133944) (cherry picked from commit 9f69a58) (cherry picked from commit 6279eb8) (cherry picked from commit a75953b) (cherry picked from commit 0c33e5b) (cherry picked from commit 8b528ca) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Yhg1s
pushed a commit
that referenced
this issue
May 26, 2025
…th an error handler (GH-129648) (GH-133944) (#134337) (cherry picked from commit 9f69a58) (cherry picked from commit 6279eb8)
ambv
pushed a commit
that referenced
this issue
Jun 2, 2025
…th an error handler (GH-129648) (GH-133944) (GH-134341) (cherry picked from commit 9f69a58) (cherry picked from commit 6279eb8) (cherry picked from commit a75953b) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
ambv
pushed a commit
that referenced
this issue
Jun 2, 2025
…th an error handler (GH-129648) (GH-133944) (GH-134345) (cherry picked from commit 9f69a58) (cherry picked from commit 6279eb8) (cherry picked from commit a75953b) (cherry picked from commit 0c33e5b) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
ambv
pushed a commit
that referenced
this issue
Jun 2, 2025
…h an error handler (GH-129648) (GH-133944) (#134346) * [3.9] gh-133767: Fix use-after-free in the unicode-escape decoder with an error handler (GH-129648) (GH-133944) (cherry picked from commit 9f69a58) (cherry picked from commit 6279eb8) (cherry picked from commit a75953b) (cherry picked from commit 0c33e5b) (cherry picked from commit 8b528ca) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
mcepl
pushed a commit
to openSUSE-Python/cpython
that referenced
this issue
Jun 4, 2025
… an error handler
Cut disused recode_encoding logic in _PyBytes_DecodeEscape.
All call sites pass NULL for `recode_encoding`, so this path is completely untested. That's been true since before Python 3.0. It adds significant complexity to this logic, so it's best to take it out.
All call sites now have a literal NULL, and that's been true since commit 768921c eliminated a conditional (`foo ? bar : NULL`) at the call site in Python/ast.c where we're parsing a bytes literal. But even before then, that condition `foo` had been a constant since unadorned string literals started meaning Unicode, in commit 572dbf8 aka v3.0a1~1035.
The `unicode` parameter is already unused, so mark it as unused too. The code that acted on it was also taken out before Python 3.0, in commit 8d30cc0 aka v3.0a1~1031.
The function (PyBytes_DecodeEscape) is exposed in the API, but it's never been documented.
Fixes: bsc#1243273 (CVE-2025-4516)
Fixes: gh#python#133767
From-PR: gh#python/cpython!134346
Patch: CVE-2025-4516-DecodeError-handler.patch
I have uploaded my draft of the backport to 3.6 above. Any review or comments would be more than welcome.
Crash report
What happened?
When using .decode("unicode_escape") with an error handler, there is a use-after-free segfault.
CPython versions tested on:
CPython main branch
Operating systems tested on:
No response
Output from running 'python -VV' on the command line:
No response
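For readers unfamiliar with the call shape named in the report, here is a small sketch of `.decode("unicode_escape")` with the built-in `strict`, `replace`, and `ignore` error handlers. The helper `decode_escape` and the sample input are assumptions made for this illustration; the input has only a truncated `\x` escape, and the sketch does not try to trigger the crash.

```python
def decode_escape(data: bytes, errors: str = "strict") -> str:
    """Decode bytes with the unicode_escape codec and the given error handler."""
    return data.decode("unicode_escape", errors)


# Sample input with a truncated \xXX escape at the end.
malformed = b"abc\\x"

# "strict" (the default) raises instead of calling a recovery handler.
strict_error = None
try:
    decode_escape(malformed)
except UnicodeDecodeError as exc:
    strict_error = exc

# "replace" substitutes U+FFFD for the malformed span.
replaced = decode_escape(malformed, "replace")

# "ignore" drops the malformed span.
ignored = decode_escape(malformed, "ignore")

if __name__ == "__main__":
    print(type(strict_error).__name__)
    print(repr(replaced))
    print(repr(ignored))
```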