8000 bpo-16995: add support for base32 extended hex (base32hex) by FFY00 · Pull Request #20441 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

bpo-16995: add support for base32 extended hex (base32hex) #20441

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 10, 2020

Conversation

FFY00
Copy link
Member
@FFY00 FFY00 commented May 27, 2020

@@ -0,0 +1 @@
Add Base32 w/ Extended Hex Alphabet functions to the base64 module (b32hexencode, b32hexdecode)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Add Base32 w/ Extended Hex Alphabet functions to the base64 module (b32hexencode, b32hexdecode)
Add :func:`base64.b32hexencode` and :func:`base64.b32hexdecode` to support Base 32 Encoding with Extended Hex Alphabet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you'd like credit in the changelog, you can also add your name to this, like:

Patch contributed by Filipe Laíns.

Also, is this based on Matthäus Wander's patch? If so we should probably give them attribution as well, like:

Patch contributed by Matthäus Wander and Filipe Laíns.

This should probably also have a What's New entry.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for credit and the patch was made from scratch.

@@ -135,6 +135,13 @@ The modern interface provides:
incorrectly padded or if there are non-alphabet characters present in the
input.

.. function:: b32hexencode(s)

Similar to :func:`b32encode` but uses the Extended Hex Alphabet.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a link to the RFC could be added? It's not clear what the Extended Hex Alphabet is and which alphabet you should be using when you open the documentation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. We can also update the references to RFC 3548 to instead refer to RFC 4648.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think we can save it for another PR, but the base64 documentation is pretty sparse, it might be nice to bring in some of the information available in the RFCs, like the tables of the alphabets and whatnot.

Copy link
Member Author
@FFY00 FFY00 May 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RFC 3548 reference here is only used for map01. RFC 4648 does not define this interchangable characters, probably because 0 and O or 1 and I are different in the hex alphabet. The RFC 3548 reference should stay, but only show in the b32decode documentation, where it is used to describe the map01 argument.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC 4648 obsoletes RFC 3548, so you can replace even the one reference to map01 to the new RFC. It's in section 3.4, and yeah it's only supported in the base32 alphabet (which deliberately leaves out 0 and 1 to avoid this confusion).

There are multiple references to RFC 3548 that should now be cleaned up in the documentation — having actual support for RFC 4648 (which this adds) is what blocked PR #2336.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry. Missed that in the RFC.

Copy link
Member
@pganssle pganssle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good! Just a few nitpicks and suggestions, but nothing structurally or fundamentally wrong here.

@serhiy-storchaka You were involved in the last round of this — was there anything else that was missing that never got addressed in Matthäus's patch? I didn't see any major blockers but obviously it never got merged.

@@ -135,6 +135,13 @@ The modern interface provides:
incorrectly padded or if there are non-alphabet characters present in the
input.

.. function:: b32hexencode(s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both of these need ..versionadded:: 3.10

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right! Forgot :)

Lib/base64.py Outdated
@@ -241,6 +242,26 @@ def b32decode(s, casefold=False, map01=None):
return bytes(decoded)


def b32encode(s):
return _b32encode(_b32alphabet, s)
b32encode.__doc__ = _b32encode.__doc__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm -0 on this, I think copy-pasting the docstrings is better than manipulating them like this (particularly since it means the _b32encode docstring is actually inaccurate now).

I'd say either copy-paste or move the docstring into a temporary stand-alone variable like:

_B32_DECODE_DOCSTRING = """
Decode a string s using the alphabet {alphabet}...
"""

def b32decode(...):
    ...
b32decode.__doc__ = _B32_DECODE_DOCSTRING.format(alphabet="Base32")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this approach is better.

@@ -135,6 +135,13 @@ The modern interface provides:
incorrectly padded or if there are non-alphabet characters present in the
input.

.. function:: b32hexencode(s)

Similar to :func:`b32encode` but uses the Extended Hex Alphabet.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. We can also update the references to RFC 3548 to instead refer to RFC 4648.

@@ -351,6 +351,81 @@ def test_b32decode_error(self):
with self.assertRaises(binascii.Error):
base64.b32decode(data.decode('ascii'))

def test_b32hexencode(self):
eq = self.assertEqual
eq(base64.b32hexencode(b''), b'')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to use a loop with subtests here, and not alias self.assertEqual to eq, like so:

test_cases = [
    (b'', b''),
    (b'\x00', '00======'),
    (b'a', b'C4======'),
    ...
]

for to_decode, expected in test_cases:
    with self.subTest(to_decode=to_decode):
        actual = base64.b32hexencode(to_decode)
        self.assertEqual(actual, expected)

eq(base64.b32hexencode(b'abcd'), b'C5H66P0=')
eq(base64.b32hexencode(b'abcde'), b'C5H66P35')
# Non-bytes
self.check_other_types(base64.b32hexencode, b'abcd', b'C5H66P0=')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this test and the next one into their own test functions.

eq(base64.b32hexdecode(data, True), res)
eq(base64.b32hexdecode(data.decode('ascii'), True), res)

self.assertRaises(binascii.Error, base64.b32hexdecode, b'c4======')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in the other test, I'd say move this to its own test function, or move it into test_b32hexdecode_error.

Also, I think these may not be testing this correctly, shouldn't this also have a casefold=True in here somewhere?

}

for data, res in tests.items():
eq(base64.b32hexdecode(data, True), res)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Total nitpick: I think I'd recommend casefold=True over passing by position.

@@ -0,0 +1 @@
Add Base32 w/ Extended Hex Alphabet functions to the base64 module (b32hexencode, b32hexdecode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you'd like credit in the changelog, you can also add your name to this, like:

Patch contributed by Filipe Laíns.

Also, is this based on Matthäus Wander's patch? If so we should probably give them attribution as well, like:

Patch contributed by Matthäus Wander and Filipe Laíns.

This should probably also have a What's New entry.

@@ -198,8 +199,8 @@ def b32decode(s, casefold=False, map01=None):
global _b32rev
# Delay the initialization of the table to not waste memory
# if the function is never called
if _b32rev is None:
_b32rev = {v: k for k, v in enumerate(_b32alphabet)}
if alphabet not in _b32rev:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels mildly weird to me to hash the entire alphabet on every call, but I tried a different approach that uses sentinels (enums would also work) and found that it did not make an appreciable difference in speed, and this is more elegant — it has the nice property that you don't even need to change the code if additional base32 alphabets are added in the future — so I am in favor of keeping it like this.

Just thought I'd make a comment to explain my reasoning to future software archaeologists.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it has the nice property that you don't even need to change the code if additional base32 alphabets are added in the future

That was my goal :P

@@ -135,6 +135,13 @@ The modern interface provides:
incorrectly padded or if there are non-alphabet characters present in the
input.

.. function:: b32hexencode(s)

Similar to :func:`b32encode` but uses the Extended Hex Alphabet.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think we can save it for another PR, but the base64 documentation is pretty sparse, it might be nice to bring in some of the information available in the RFCs, like the tables of the alphabets and whatnot.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@FFY00
Copy link
Member Author
FFY00 commented May 27, 2020

In relation to the tests, they are copied from the normal base32 variants and I think the two should keep in sync. There are two options, I send a PR fixing the base32{en,de}code tests and then sync this or we merge this as is and I send a PR fixing all tests later. There is also the possiblity of maybe merging the tests together in this PR, or maybe a later PR to not put too much stuff here.

Please let me know how you want to proceed.

@pganssle
Copy link
Member

In relation to the tests, they are copied from the normal base32 variants and I think the two should keep in sync. There are two options, I send a PR fixing the base32{en,de}code tests and then sync this or we merge this as is and I send a PR fixing all tests later. There is also the possiblity of maybe merging the tests together in this PR, or maybe a later PR to not put too much stuff here.

I don't see that there's any particular reason to keep them in sync. The old tests were written before subTest existed, and also when it was much more common to pile a bunch of semi-related tests into a single test function.

It's true that it would be good to refactor the tests for the other cases as well, but there's actually a good reason not to do this, which is that you're actually changing the existing code under test in this PR and the existing tests have been stable for a while. Doing it in two steps by refactoring the code, then refactoring the tests means that we can be pretty confident that adding these new features didn't break anything that was already under test.

So I'd say we should definitely refactor the new tests and I'm more or less ambivalent about refactoring the old tests. On the one hand, we generally prefer to avoid big refactoring PRs in favor of incremental improvements whenever you are changing something anyway, but on 10000 the other hand the risk of destabilizing a test while you are changing the thing it tests should probably weigh against that.

I'd say refactor the new tests in this PR and we can do the old tests in a separate PR (whether that takes the form of merging them all together and parameterizing or just refactoring the old tests to the new style).

@FFY00
Copy link
Member Author
FFY00 commented May 27, 2020

I have made the requested changes; please review again

@bedevere-bot
Copy link

Thanks for making the requested changes!

@pganssle: please review the changes made to this pull request.

@bedevere-bot bedevere-bot requested a review from pganssle May 27, 2020 17:26
@FFY00 FFY00 force-pushed the bpo-16995 branch 2 times, most recently from cbf86ed to 2aaf479 Compare May 27, 2020 18:07
@FFY00
Copy link
Member Author
FFY00 commented Jun 4, 2020

@pganssle, friendly ping since it has been a week. No rush to review, just thought I would ping you so that you don't forget this exists :)

@pganssle pganssle requested a review from serhiy-storchaka June 8, 2020 17:38
@pganssle
Copy link
Member
pganssle commented Jun 8, 2020

@FFY00 Can you rebase to resolve the merge conflicts?

I don't have time today to look at this, but I've requested a review from @serhiy-storchaka, who reviewed the original patch that never got merged here, to see if we've missed anything. I'll give a final review as soon as I get a chance.

Thanks again for taking this on!

Copy link
Member
@pganssle pganssle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One cosmetic thing here, but otherwise this seems fine to me. Sorry for the long delay in the review.

@serhiy-storchaka Unless you ask for more time for review, I'd like to merge this in the next few days.

@pganssle
Copy link
Member

@FFY00 Would you mind rebasing and incorporating my one suggestion? I think we can merge this when that's done. Thanks for the work on this and sorry for the delays in review.

@FFY00
Copy link
Member Author
FFY00 commented Jul 27, 2020

Yes, I have just been focusing on my GSoC project so I didn't manage to get to this before 😁.

@FFY00 FFY00 force-pushed the bpo-16995 branch 2 times, most recently from 6eba08f to 773e000 Compare July 27, 2020 14:12
@FFY00
Copy link
Member Author
FFY00 commented Aug 4, 2020

@pganssle friendly ping 😊

Signed-off-by: Filipe Laíns <lains@archlinux.org>
@miss-islington
Copy link
Contributor

@FFY00: Status check is done, and it's a success ✅ .

57AE
@miss-islington miss-islington merged commit 4ce6faa into python:master Aug 10, 2020
shihai1991 pushed a commit to shihai1991/cpython that referenced this pull request Aug 20, 2020
xzy3 pushed a commit to xzy3/cpython that referenced this pull request Oct 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0