[3.13] gh-133767: Fix use-after-free in the unicode-escape decoder with an error handler (GH-129648) #133944
…der with an error handler (pythonGH-129648)

If the error handler is used, a new bytes object is created to set as the object attribute of UnicodeDecodeError, and that bytes object then replaces the original data. A pointer into the decoded data becomes invalid once that temporary bytes object is destroyed, so we need another way to return the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().

_PyBytes_DecodeEscape() does not have this issue, because it does not use the error handlers registry, but it should be changed for compatibility with _PyUnicode_DecodeUnicodeEscapeInternal().

(cherry picked from commit 9f69a58)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
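To make the role of the error handler concrete, here is a minimal Python-level sketch of a decoding error handler that inspects the `object` attribute of the `UnicodeDecodeError` it receives. The handler name `report_and_replace` and the `seen` list are hypothetical, introduced only for illustration. The input contains a truncated `\xXX` escape and deliberately no unrecognized escape sequence, so it does not exercise the code path this PR fixes.

```python
import codecs

seen = []  # hypothetical log of what the handler was given


def report_and_replace(exc):
    """Hypothetical handler: record the failing span, substitute U+FFFD."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    # exc.object is the bytes object being decoded;
    # exc.start and exc.end delimit the span that could not be decoded.
    seen.append((type(exc.object), exc.object[exc.start:exc.end]))
    # Return the replacement text and the position at which to resume.
    return ("\ufffd", exc.end)


codecs.register_error("report_and_replace", report_and_replace)

data = b"[\\x0]"  # truncated \xXX escape, no unrecognized escapes
text = data.decode("unicode_escape", errors="report_and_replace")
print(repr(text), seen)
```

The handler sees the data through the exception object rather than through the decoder's internal buffer, which is the indirection the commit message describes.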
Removing
"May" means that it depends on the error handler and on whether an encoding error happens before an invalid escape sequence.
@Yhg1s, what do you prefer?
I'm not @Yhg1s, but I'd vote for making them raise.
The next beta is close, so I implemented a simple solution with minimal impact. The users of Anything simpler can break user code if they actually use these functions. Anything more complex increases the probability of adding bugs in untested code.
This looks good, thank you!
Thanks @serhiy-storchaka for the PR, and @encukou for merging it 🌮🎉. I'm working now to backport this PR to: 3.9, 3.10, 3.11, 3.12.
Sorry, @serhiy-storchaka and @encukou, I could not cleanly backport this to
…der with an error handler (pythonGH-129648) (pythonGH-133944) (cherry picked from commit 9f69a58) (cherry picked from commit 6279eb8) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
GH-134337 is a backport of this pull request to the 3.12 branch.
…der with an error handler (pythonGH-129648) (pythonGH-133944) (cherry picked from commit 9f69a58) (cherry picked from commit 6279eb8) (cherry picked from commit a75953b) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
GH-134341 is a backport of this pull request to the 3.11 branch.
GH-134345 is a backport of this pull request to the 3.10 branch.
…der with an error handler (pythonGH-129648) (pythonGH-133944) (cherry picked from commit 9f69a58) (cherry picked from commit 6279eb8) (cherry picked from commit a75953b) (cherry picked from commit 0c33e5b) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
…er with an error handler (pythonGH-129648) (pythonGH-133944) (cherry picked from commit 9f69a58) (cherry picked from commit 6279eb8) (cherry picked from commit a75953b) (cherry picked from commit 0c33e5b) (cherry picked from commit 8b528ca) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
GH-134346 is a backport of this pull request to the 3.9 branch.
…th an error handler (GH-129648) (GH-133944) (#134337) (cherry picked from commit 9f69a58) (cherry picked from commit 6279eb8)
## Summary

Fix use-after-free vulnerability in the unicode-escape decoder with non-strict error handlers.

## Details

- **CVE**: CVE-2025-4516
- **Severity**: Medium
- **Issue**: Use-after-free crash when using `bytes.decode("unicode_escape", errors="ignore")` or `bytes.decode("unicode_escape", errors="replace")`

## Changes

- Add CVE-2025-4516.patch from upstream merged PRs
  - Python 3.12: [PR #134337](python/cpython#134337)
  - Python 3.13: [PR #133944](python/cpython#133944)
- Increment epoch to 2 for both packages

## Status

- ✅ Python 3.12: Upstream patch merged and applied
- ✅ Python 3.13: Upstream patch merged and applied
- ⏳ Python 3.9, 3.10, 3.11: Waiting for upstream PRs to be merged

## Testing

CI will validate that:

- Patches apply cleanly
- Packages build successfully
- Tests pass

## References

- [CVE-2025-4516 Details](https://www.cve.org/CVERecord?id=CVE-2025-4516)
- [Security Advisory](https://mail.python.org/archives/list/security-announce@python.org/thread/L75IPBBTSCYEF56I2M4KIW353BB3AY74/)
- Related to: chainguard-dev/internal-dev#12589
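For reference, a minimal sketch of how the non-strict error handlers mentioned above are selected from Python. Note that the keyword argument accepted by `bytes.decode` is `errors`. The sample input is an assumption chosen for illustration: it contains only a truncated `\xXX` escape and no unrecognized escape sequence, so it shows the handlers' normal behaviour and is not a reproducer for the CVE.

```python
data = b"[\\x0]"  # truncated \xXX escape

# "ignore" drops the undecodable span; "replace" substitutes U+FFFD.
ignored = data.decode("unicode_escape", errors="ignore")
replaced = data.decode("unicode_escape", errors="replace")

# The default handler, "strict", raises instead.
try:
    data.decode("unicode_escape")
    strict_reason = None
except UnicodeDecodeError as exc:
    strict_reason = exc.reason

print(repr(ignored), repr(replaced), strict_reason)
```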
…th an error handler (GH-129648) (GH-133944) (GH-134341) (cherry picked from commit 9f69a58) (cherry picked from commit 6279eb8) (cherry picked from commit a75953b) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
…th an error handler (GH-129648) (GH-133944) (GH-134345) (cherry picked from commit 9f69a58) (cherry picked from commit 6279eb8) (cherry picked from commit a75953b) (cherry picked from commit 0c33e5b) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
…h an error handler (GH-129648) (GH-133944) (#134346) * [3.9] gh-133767: Fix use-after-free in the unicode-escape decoder with an error handler (GH-129648) (GH-133944) (cherry picked from commit 9f69a58) (cherry picked from commit 6279eb8) (cherry picked from commit a75953b) (cherry picked from commit 0c33e5b) (cherry picked from commit 8b528ca) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
If the error handler is used, a new bytes object is created to set as the object attribute of UnicodeDecodeError, and that bytes object then replaces the original data. A pointer into the decoded data becomes invalid once that temporary bytes object is destroyed, so we need another way to return the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().
_PyBytes_DecodeEscape() does not have this issue, because it does not use the error handlers registry, but it should be changed for compatibility with _PyUnicode_DecodeUnicodeEscapeInternal().
(cherry picked from commit 9f69a58)
`unicode_escape` decoder with error handler #133767