8000 gh-129349: Accept bytes in bytes.fromhex()/bytearray.fromhex() by lordmauve · Pull Request #129844 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

gh-129349: Accept bytes in bytes.fromhex()/bytearray.fromhex() #129844

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Mar 12, 2025

Conversation

lordmauve
Copy link
Contributor
@lordmauve lordmauve commented Feb 8, 2025

Change bytes.fromhex() and bytearray.fromhex() to accept a bytes object interpreted as ASCII.

This matches the behaviour of int e.g

>>> int(b'123abc', 16)
1194684
>>> bytes.fromhex(b'123abc')
b'\x12:\xbc'

Fixes #129349

Copy link
Member
@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need more tests where invalid non-hexadecimal bytes are given (e.g., b"à", '\uD834\uDD1E'.encode() or bytes with NULs) and where non-bytes/str/bytearray objects are also passed.

cc @vstinner

@picnixz
Copy link
Member
picnixz commented Feb 8, 2025

Although the use case of git cat-file --batch is relevant, I still think it's a little too niche for this. I think the output of git cat-file --batch should be post-processed (and you should indeed call .decode() instead) but others might disagree.

lordmauve added a commit to lordmauve/cpython that referenced this pull request Feb 9, 2025
Copy link
Member
@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry I reviewed the PR before thinking more about this. Actually, we have binascii.unhexlify that already supports bytes inputs. And the docs say:

Similar functionality (accepting only text string arguments, but more liberal towards whitespace) is also accessible using the bytes.fromhex() class method.

So I think the text-only feature for bytes.fromhex was actually desired.
I also tried to search on GitHub whether there are use cases for this, but I couldn't find any that actually used a non-string input. So I think this modification is likely not ideal. We might want to really restrict the input type for built-in types and perhaps be less restrictive with binascii.

Thus, sorry for having reviewed the PR (which seemed I endorsed the change), but I don't think we need to make that change. For your use-case, I think you can definitely use binascii.unhexlify instead (for instance binascii.unhexlify(b'012abc') would return b'\x01\x2a\xbc as you tested).

Note that binascii actually supports any object implementing the buffer protocol, so it's likely the one that needs to be used in such cases (assuming that whitespaces are cleaned). See https://github.com/python/cpython/blob/main/Python/pystrhex.c.

TL;DR: I'm rather -0.5 for the inclusion of this feature but if others disagree and find it good to align it with binascii.unhexlify, then I won't be against.

cc @serhiy-storchaka @vstinner

@picnixz
Copy link
Member
picnixz commented Feb 9, 2025

Note: The PR you wrote is of good quality for a first contribution. It's just that the feature can be quite niche, especially if it's added on a built-in class.

@lordmauve
Copy link
Contributor Author

binascii seems a little obscure given that int(bytes, 16) works and binascii docs say:

Normally, you will not use these functions directly but use wrapper modules like base64

It seems like a logical extension that bytes.fromhex() should support bytes and it still leaves a place for binascii as the version that supports the buffer protocol.

It's not my first credited contribution, just the first under my own GitHub account maybe.

@picnixz
Copy link
Member
picnixz commented Feb 9, 2025

It's not my first credited contribution, just the first under my own GitHub account maybe.

Sorry I got confused with another account (but the fact that the PR is good remains).

It seems like a logical extension that bytes.fromhex() should support bytes and it still leaves a place for binascii as the version that supports the buffer protocol.

Yes, but OTOH, it's a built-in and except for your use case, I couldn't find other use cases where binascii.unhexlify couldn't fit the bill (or just a call to .decode() before). So I would defer the final decision to Victor and/or Serhiy on that matter.

Copy link
Member
@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member
@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add also a test for memoryview or array.array?

lordmauve added a commit to lordmauve/cpython that referenced this pull request Mar 7, 2025
@lordmauve lordmauve force-pushed the lordmauve/issue129349 branch from ad5cc10 to 1a18438 Compare March 7, 2025 11:12
@lordmauve
Copy link
Contributor Author

Maybe add also a test for memoryview or array.array?

Ok, added this and support for the PyBuf_SIMPLE buffer protocol.

@@ -354,6 +354,10 @@ Other language changes
(with :func:`format` or :ref:`f-strings`).
(Contrubuted by Sergey B Kirpichev in :gh:`87790`.)

* The :func:`bytes.fromhex` and :func:`bytearray.fromhex` methods now accept
ASCII :class:`bytes` and :term:`bytes-like objects <bytes-like object>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: are bytes also bytes-like objects? if so, you can just link the term. Or more generally, isn't it objects that support the buffer protocol?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it seems less accessible to users to just link bytes-like object because that dives into a description of the buffer protocol which is lower level than the audience I was writing for in builtins docs.

Maybe the fault is with :term:\bytes-like object`` because it could just list types that duck-type like bytes.

So I hedged and did both.

Copy link
Member
@picnixz picnixz Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could sa "that support the buffer protocol such as memoryviews" and add a link to whatever example of a type that supports the buffer protocol you used

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to say "bytes and bytes-like", it's more explicit.

@lordmauve lordmauve force-pushed the lordmauve/issue129349 branch from 91566c9 to ba05050 Compare March 9, 2025 12:51
lordmauve added a commit to lordmauve/cpython that referenced this pull request Mar 10, 2025
@lordmauve lordmauve force-pushed the lordmauve/issue129349 branch from be4b9a4 to 1e035db Compare March 10, 2025 20:07
@vstinner
Copy link
Member

Oh no, there is now a conflict on clinic/ files. You can merge main into your branch and re-run make clinic.

@lordmauve lordmauve force-pushed the lordmauve/issue129349 branch from 1e035db to 31b9ab0 Compare March 12, 2025 09:45
@vstinner vstinner merged commit e0637ce into python:main Mar 12, 2025
42 checks passed
@vstinner
Copy link
Member

Merged, thank you @lordmauve!

@lordmauve lordmauve deleted the lordmauve/issue129349 branch March 12, 2025 12:54
seehwan pushed a commit to seehwan/cpython that referenced this pull request Apr 16, 2025
…ython#129844)

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bytes.fromhex() should parse a bytes
4 participants
0