-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
bpo-16995: add support for base32 extended hex (base32hex) #20441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
787becd
to
1b167ec
Compare
@@ -0,0 +1 @@ | |||
Add Base32 w/ Extended Hex Alphabet functions to the base64 module (b32hexencode, b32hexdecode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add Base32 w/ Extended Hex Alphabet functions to the base64 module (b32hexencode, b32hexdecode) | |
Add :func:`base64.b32hexencode` and :func:`base64.b32hexdecode` to support Base 32 Encoding with Extended Hex Alphabet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you'd like credit in the changelog, you can also add your name to this, like:
Patch contributed by Filipe Laíns.
Also, is this based on Matthäus Wander's patch? If so we should probably give them attribution as well, like:
Patch contributed by Matthäus Wander and Filipe Laíns.
This should probably also have a What's New entry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for credit and the patch was made from scratch.
Doc/library/base64.rst
Outdated
@@ -135,6 +135,13 @@ The modern interface provides: | |||
incorrectly padded or if there are non-alphabet characters present in the | |||
input. | |||
|
|||
.. function:: b32hexencode(s) | |||
|
|||
Similar to :func:`b32encode` but uses the Extended Hex Alphabet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a link to the RFC could be added? It's not clear what the Extended Hex Alphabet is and which alphabet you should be using when you open the documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. We can also update the references to RFC 3548 to instead refer to RFC 4648.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think we can save it for another PR, but the base64
documentation is pretty sparse, it might be nice to bring in some of the information available in the RFCs, like the tables of the alphabets and whatnot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RFC 3548 reference here is only used for map01
. RFC 4648 does not define this interchangable characters, probably because 0
and O
or 1
and I
are different in the hex alphabet. The RFC 3548 reference should stay, but only show in the b32decode
documentation, where it is used to describe the map01
argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RFC 4648 obsoletes RFC 3548, so you can replace even the one reference to map01
to the new RFC. It's in section 3.4, and yeah it's only supported in the base32
alphabet (which deliberately leaves out 0 and 1 to avoid this confusion).
There are multiple references to RFC 3548 that should now be cleaned up in the documentation — having actual support for RFC 4648 (which this adds) is what blocked PR #2336.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry. Missed that in the RFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good! Just a few nitpicks and suggestions, but nothing structurally or fundamentally wrong here.
@serhiy-storchaka You were involved in the last round of this — was there anything else that was missing that never got addressed in Matthäus's patch? I didn't see any major blockers but obviously it never got merged.
@@ -135,6 +135,13 @@ The modern interface provides: | |||
incorrectly padded or if there are non-alphabet characters present in the | |||
input. | |||
|
|||
.. function:: b32hexencode(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think both of these need ..versionadded:: 3.10
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right! Forgot :)
Lib/base64.py
Outdated
@@ -241,6 +242,26 @@ def b32decode(s, casefold=False, map01=None): | |||
return bytes(decoded) | |||
|
|||
|
|||
def b32encode(s): | |||
return _b32encode(_b32alphabet, s) | |||
b32encode.__doc__ = _b32encode.__doc__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm -0 on this, I think copy-pasting the docstrings is better than manipulating them like this (particularly since it means the _b32encode
docstring is actually inaccurate now).
I'd say either copy-paste or move the docstring into a temporary stand-alone variable like:
_B32_DECODE_DOCSTRING = """
Decode a string s using the alphabet {alphabet}...
"""
def b32decode(...):
...
b32decode.__doc__ = _B32_DECODE_DOCSTRING.format(alphabet="Base32")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think this approach is better.
Doc/library/base64.rst
Outdated
@@ -135,6 +135,13 @@ The modern interface provides: | |||
incorrectly padded or if there are non-alphabet characters present in the | |||
input. | |||
|
|||
.. function:: b32hexencode(s) | |||
|
|||
Similar to :func:`b32encode` but uses the Extended Hex Alphabet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. We can also update the references to RFC 3548 to instead refer to RFC 4648.
Lib/test/test_base64.py
Outdated
@@ -351,6 +351,81 @@ def test_b32decode_error(self): | |||
with self.assertRaises(binascii.Error): | |||
base64.b32decode(data.decode('ascii')) | |||
|
|||
def test_b32hexencode(self): | |||
eq = self.assertEqual | |||
eq(base64.b32hexencode(b''), b'') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to use a loop with subtests here, and not alias self.assertEqual
to eq
, like so:
test_cases = [
(b'', b''),
(b'\x00', '00======'),
(b'a', b'C4======'),
...
]
for to_decode, expected in test_cases:
with self.subTest(to_decode=to_decode):
actual = base64.b32hexencode(to_decode)
self.assertEqual(actual, expected)
eq(base64.b32hexencode(b'abcd'), b'C5H66P0=') | ||
eq(base64.b32hexencode(b'abcde'), b'C5H66P35') | ||
# Non-bytes | ||
self.check_other_types(base64.b32hexencode, b'abcd', b'C5H66P0=') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move this test and the next one into their own test functions.
Lib/test/test_base64.py
Outdated
eq(base64.b32hexdecode(data, True), res) | ||
eq(base64.b32hexdecode(data.decode('ascii'), True), res) | ||
|
||
self.assertRaises(binascii.Error, base64.b32hexdecode, b'c4======') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As in the other test, I'd say move this to its own test function, or move it into test_b32hexdecode_error
.
Also, I think these may not be testing this correctly, shouldn't this also have a casefold=True
in here somewhere?
Lib/test/test_base64.py
Outdated
} | ||
|
||
for data, res in tests.items(): | ||
eq(base64.b32hexdecode(data, True), res) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Total nitpick: I think I'd recommend casefold=True
over passing by position.
@@ -0,0 +1 @@ | |||
Add Base32 w/ Extended Hex Alphabet functions to the base64 module (b32hexencode, b32hexdecode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you'd like credit in the changelog, you can also add your name to this, like:
Patch contributed by Filipe Laíns.
Also, is this based on Matthäus Wander's patch? If so we should probably give them attribution as well, like:
Patch contributed by Matthäus Wander and Filipe Laíns.
This should probably also have a What's New entry.
@@ -198,8 +199,8 @@ def b32decode(s, casefold=False, map01=None): | |||
global _b32rev | |||
# Delay the initialization of the table to not waste memory | F438 tr>|||
# if the function is never called | |||
if _b32rev is None: | |||
_b32rev = {v: k for k, v in enumerate(_b32alphabet)} | |||
if alphabet not in _b32rev: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels mildly weird to me to hash the entire alphabet on every call, but I tried a different approach that uses sentinels (enums would also work) and found that it did not make an appreciable difference in speed, and this is more elegant — it has the nice property that you don't even need to change the code if additional base32 alphabets are added in the future — so I am in favor of keeping it like this.
Just thought I'd make a comment to explain my reasoning to future software archaeologists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it has the nice property that you don't even need to change the code if additional base32 alphabets are added in the future
That was my goal :P
Doc/library/base64.rst
Outdated
@@ -135,6 +135,13 @@ The modern interface provides: | |||
incorrectly padded or if there are non-alphabet characters present in the | |||
input. | |||
|
|||
.. function:: b32hexencode(s) | |||
|
|||
Similar to :func:`b32encode` but uses the Extended Hex Alphabet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think we can save it for another PR, but the base64
documentation is pretty sparse, it might be nice to bring in some of the information available in the RFCs, like the tables of the alphabets and whatnot.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
In relation to the tests, they are copied from the normal base32 variants and I think the two should keep in sync. There are two options, I send a PR fixing the base32{en,de}code tests and then sync this or we merge this as is and I send a PR fixing all tests later. There is also the possiblity of maybe merging the tests together in this PR, or maybe a later PR to not put too much stuff here. Please let me know how you want to proceed. |
I don't see that there's any particular reason to keep them in sync. The old tests were written before It's true that it would be good to refactor the tests for the other cases as well, but there's actually a good reason not to do this, which is that you're actually changing the existing code under test in this PR and the existing tests have been stable for a while. Doing it in two steps by refactoring the code, then refactoring the tests means that we can be pretty confident that adding these new features didn't break anything that was already under test. So I'd say we should definitely refactor the new tests and I'm more or less ambivalent about refactoring the old tests. On the one hand, we generally prefer to avoid big refactoring PRs in favor of incremental improvements whenever you are changing something anyway, but on 10000 the other hand the risk of destabilizing a test while you are changing the thing it tests should probably weigh against that. I'd say refactor the new tests in this PR and we can do the old tests in a separate PR (whether that takes the form of merging them all together and parameterizing or just refactoring the old tests to the new style). |
I have made the requested changes; please review again |
Thanks for making the requested changes! @pganssle: please review the changes made to this pull request. |
cbf86ed
to
2aaf479
Compare
@pganssle, friendly ping since it has been a week. No rush to review, just thought I would ping you so that you don't forget this exists :) |
@FFY00 Can you rebase to resolve the merge conflicts? I don't have time today to look at this, but I've requested a review from @serhiy-storchaka, who reviewed the original patch that never got merged here, to see if we've missed anything. I'll give a final review as soon as I get a chance. Thanks again for taking this on! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One cosmetic thing here, but otherwise this seems fine to me. Sorry for the long delay in the review.
@serhiy-storchaka Unless you ask for more time for review, I'd like to merge this in the next few days.
@FFY00 Would you mind rebasing and incorporating my one suggestion? I think we can merge this when that's done. Thanks for the work on this and sorry for the delays in review. |
Yes, I have just been focusing on my GSoC project so I didn't manage to get to this before 😁. |
6eba08f
to
773e000
Compare
@pganssle friendly ping 😊 |
Signed-off-by: Filipe Laíns <lains@archlinux.org>
@FFY00: Status check is done, and it's a success ✅ . |
cc @pganssle
https://bugs.python.org/issue16995
Automerge-Triggered-By: @pganssle