bpo-16995: add support for base32 extended hex (base32hex) #20441

FFY00 · 2020-05-27T00:08:09Z

https://bugs.python.org/issue16995

Automerge-Triggered-By: @pganssle

remilapeyre · 2020-05-27T11:18:12Z

Misc/NEWS.d/next/Library/2020-05-27-00-09-52.bpo-16995.4niOT7.rst

@@ -0,0 +1 @@
+Add Base32 w/ Extended Hex Alphabet functions to the base64 module (b32hexencode, b32hexdecode)


Suggested change

Add Base32 w/ Extended Hex Alphabet functions to the base64 module (b32hexencode, b32hexdecode)

Add :func:`base64.b32hexencode` and :func:`base64.b32hexdecode` to support Base 32 Encoding with Extended Hex Alphabet.

If you'd like credit in the changelog, you can also add your name to this, like:

Patch contributed by Filipe Laíns.

Also, is this based on Matthäus Wander's patch? If so we should probably give them attribution as well, like:

Patch contributed by Matthäus Wander and Filipe Laíns.

This should probably also have a What's New entry.

No need for credit and the patch was made from scratch.

remilapeyre · 2020-05-27T11:19:14Z

Doc/library/base64.rst

@@ -135,6 +135,13 @@ The modern interface provides:
   incorrectly padded or if there are non-alphabet characters present in the
   input.

+.. function:: b32hexencode(s)
+
+   Similar to :func:`b32encode` but uses the Extended Hex Alphabet.


Maybe a link to the RFC could be added? It's not clear what the Extended Hex Alphabet is and which alphabet you should be using when you open the documentation.

Agreed. We can also update the references to RFC 3548 to instead refer to RFC 4648.

Also, I think we can save it for another PR, but the base64 documentation is pretty sparse, it might be nice to bring in some of the information available in the RFCs, like the tables of the alphabets and whatnot.

The RFC 3548 reference here is only used for map01. RFC 4648 does not define this interchangable characters, probably because 0 and O or 1 and I are different in the hex alphabet. The RFC 3548 reference should stay, but only show in the b32decode documentation, where it is used to describe the map01 argument.

RFC 4648 obsoletes RFC 3548, so you can replace even the one reference to map01 to the new RFC. It's in section 3.4, and yeah it's only supported in the base32 alphabet (which deliberately leaves out 0 and 1 to avoid this confusion).

There are multiple references to RFC 3548 that should now be cleaned up in the documentation — having actual support for RFC 4648 (which this adds) is what blocked PR #2336.

Ah, sorry. Missed that in the RFC.

pganssle

This looks really good! Just a few nitpicks and suggestions, but nothing structurally or fundamentally wrong here.

@serhiy-storchaka You were involved in the last round of this — was there anything else that was missing that never got addressed in Matthäus's patch? I didn't see any major blockers but obviously it never got merged.

pganssle · 2020-05-27T13:40:09Z

Doc/library/base64.rst

@@ -135,6 +135,13 @@ The modern interface provides:
   incorrectly padded or if there are non-alphabet characters present in the
   input.

+.. function:: b32hexencode(s)


I think both of these need ..versionadded:: 3.10

Ah, right! Forgot :)

pganssle · 2020-05-27T13:44:14Z

Lib/base64.py

@@ -241,6 +242,26 @@ def b32decode(s, casefold=False, map01=None):
    return bytes(decoded)


+def b32encode(s):
+    return _b32encode(_b32alphabet, s)
+b32encode.__doc__ = _b32encode.__doc__


I'm -0 on this, I think copy-pasting the docstrings is better than manipulating them like this (particularly since it means the _b32encode docstring is actually inaccurate now).

I'd say either copy-paste or move the docstring into a temporary stand-alone variable like:

_B32_DECODE_DOCSTRING = """ Decode a string s using the alphabet {alphabet}... """ def b32decode(...): ... b32decode.__doc__ = _B32_DECODE_DOCSTRING.format(alphabet="Base32")

Yes, I think this approach is better.

pganssle · 2020-05-27T13:46:27Z

Doc/library/base64.rst

@@ -135,6 +135,13 @@ The modern interface provides:
   incorrectly padded or if there are non-alphabet characters present in the
   input.

+.. function:: b32hexencode(s)
+
+   Similar to :func:`b32encode` but uses the Extended Hex Alphabet.


Agreed. We can also update the references to RFC 3548 to instead refer to RFC 4648.

pganssle · 2020-05-27T13:54:26Z

Lib/test/test_base64.py

@@ -351,6 +351,81 @@ def test_b32decode_error(self):
                with self.assertRaises(binascii.Error):
                    base64.b32decode(data.decode('ascii'))

+    def test_b32hexencode(self):
+        eq = self.assertEqual
+        eq(base64.b32hexencode(b''), b'')


I think it would be better to use a loop with subtests here, and not alias self.assertEqual to eq, like so:

test_cases = [ (b'', b''), (b'\x00', '00======'), (b'a', b'C4======'), ... ] for to_decode, expected in test_cases: with self.subTest(to_decode=to_decode): actual = base64.b32hexencode(to_decode) self.assertEqual(actual, expected)

pganssle · 2020-05-27T13:56:00Z

Lib/test/test_base64.py

+        eq(base64.b32hexencode(b'abcd'), b'C5H66P0=')
+        eq(base64.b32hexencode(b'abcde'), b'C5H66P35')
+        # Non-bytes
+        self.check_other_types(base64.b32hexencode, b'abcd', b'C5H66P0=')


I would move this test and the next one into their own test functions.

pganssle · 2020-05-27T14:01:11Z

Lib/test/test_base64.py

+            eq(base64.b32hexdecode(data, True), res)
+            eq(base64.b32hexdecode(data.decode('ascii'), True), res)
+
+        self.assertRaises(binascii.Error, base64.b32hexdecode, b'c4======')


As in the other test, I'd say move this to its own test function, or move it into test_b32hexdecode_error.

Also, I think these may not be testing this correctly, shouldn't this also have a casefold=True in here somewhere?

pganssle · 2020-05-27T14:04:31Z

Lib/test/test_base64.py

+                 }
+
+        for data, res in tests.items():
+            eq(base64.b32hexdecode(data, True), res)


Total nitpick: I think I'd recommend casefold=True over passing by position.

pganssle · 2020-05-27T14:15:13Z

Misc/NEWS.d/next/Library/2020-05-27-00-09-52.bpo-16995.4niOT7.rst

@@ -0,0 +1 @@
+Add Base32 w/ Extended Hex Alphabet functions to the base64 module (b32hexencode, b32hexdecode)


If you'd like credit in the changelog, you can also add your name to this, like:

Patch contributed by Filipe Laíns.

Also, is this based on Matthäus Wander's patch? If so we should probably give them attribution as well, like:

Patch contributed by Matthäus Wander and Filipe Laíns.

This should probably also have a What's New entry.

pganssle · 2020-05-27T14:24:38Z

Lib/base64.py

@@ -198,8 +199,8 @@ def b32decode(s, casefold=False, map01=None):
    global _b32rev
    # Delay the initialization of the table to not waste memory
    # if the function is never called
-    if _b32rev is None:
-        _b32rev = {v: k for k, v in enumerate(_b32alphabet)}
+    if alphabet not in _b32rev:


It feels mildly weird to me to hash the entire alphabet on every call, but I tried a different approach that uses sentinels (enums would also work) and found that it did not make an appreciable difference in speed, and this is more elegant — it has the nice property that you don't even need to change the code if additional base32 alphabets are added in the future — so I am in favor of keeping it like this.

Just thought I'd make a comment to explain my reasoning to future software archaeologists.

it has the nice property that you don't even need to change the code if additional base32 alphabets are added in the future

That was my goal :P

pganssle · 2020-05-27T14:25:58Z

Doc/library/base64.rst

@@ -135,6 +135,13 @@ The modern interface provides:
   incorrectly padded or if there are non-alphabet characters present in the
   input.

+.. function:: b32hexencode(s)
+
+   Similar to :func:`b32encode` but uses the Extended Hex Alphabet.


Also, I think we can save it for another PR, but the base64 documentation is pretty sparse, it might be nice to bring in some of the information available in the RFCs, like the tables of the alphabets and whatnot.

bedevere-bot · 2020-05-27T14:27:37Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

FFY00 · 2020-05-27T15:34:26Z

In relation to the tests, they are copied from the normal base32 variants and I think the two should keep in sync. There are two options, I send a PR fixing the base32{en,de}code tests and then sync this or we merge this as is and I send a PR fixing all tests later. There is also the possiblity of maybe merging the tests together in this PR, or maybe a later PR to not put too much stuff here.

Please let me know how you want to proceed.

pganssle · 2020-05-27T16:12:02Z

In relation to the tests, they are copied from the normal base32 variants and I think the two should keep in sync. There are two options, I send a PR fixing the base32{en,de}code tests and then sync this or we merge this as is and I send a PR fixing all tests later. There is also the possiblity of maybe merging the tests together in this PR, or maybe a later PR to not put too much stuff here.

I don't see that there's any particular reason to keep them in sync. The old tests were written before subTest existed, and also when it was much more common to pile a bunch of semi-related tests into a single test function.

It's true that it would be good to refactor the tests for the other cases as well, but there's actually a good reason not to do this, which is that you're actually changing the existing code under test in this PR and the existing tests have been stable for a while. Doing it in two steps by refactoring the code, then refactoring the tests means that we can be pretty confident that adding these new features didn't break anything that was already under test.

So I'd say we should definitely refactor the new tests and I'm more or less ambivalent about refactoring the old tests. On the one hand, we generally prefer to avoid big refactoring PRs in favor of incremental improvements whenever you are changing something anyway, but on 10000 the other hand the risk of destabilizing a test while you are changing the thing it tests should probably weigh against that.

I'd say refactor the new tests in this PR and we can do the old tests in a separate PR (whether that takes the form of merging them all together and parameterizing or just refactoring the old tests to the new style).

FFY00 · 2020-05-27T17:26:07Z

I have made the requested changes; please review again

bedevere-bot · 2020-05-27T17:26:10Z

Thanks for making the requested changes!

@pganssle: please review the changes made to this pull request.

FFY00 · 2020-06-04T05:46:05Z

@pganssle, friendly ping since it has been a week. No rush to review, just thought I would ping you so that you don't forget this exists :)

pganssle · 2020-06-08T17:39:17Z

@FFY00 Can you rebase to resolve the merge conflicts?

I don't have time today to look at this, but I've requested a review from @serhiy-storchaka, who reviewed the original patch that never got merged here, to see if we've missed anything. I'll give a final review as soon as I get a chance.

Thanks again for taking this on!

pganssle

One cosmetic thing here, but otherwise this seems fine to me. Sorry for the long delay in the review.

@serhiy-storchaka Unless you ask for more time for review, I'd like to merge this in the next few days.

Lib/test/test_base64.py

pganssle · 2020-07-27T13:40:00Z

@FFY00 Would you mind rebasing and incorporating my one suggestion? I think we can merge this when that's done. Thanks for the work on this and sorry for the delays in review.

FFY00 · 2020-07-27T14:07:23Z

Yes, I have just been focusing on my GSoC project so I didn't manage to get to this before 😁.

FFY00 · 2020-08-04T19:07:50Z

@pganssle friendly ping 😊

Signed-off-by: Filipe Laíns <lains@archlinux.org>

miss-islington · 2020-08-10T14:48:19Z

@FFY00: Status check is done, and it's a success ✅ .

@pganssle

…20441) cc @pganssle Automerge-Triggered-By: @pganssle

@pganssle

…20441) cc @pganssle Automerge-Triggered-By: @pganssle

the-knights-who-say-ni added the CLA signed label May 27, 2020

bedevere-bot added the awaiting review label May 27, 2020

FFY00 force-pushed the bpo-16995 branch 2 times, most recently from 787becd to 1b167ec Compare May 27, 2020 00:21

remilapeyre reviewed May 27, 2020

View reviewed changes

pganssle requested changes May 27, 2020

View reviewed changes

bedevere-bot added awaiting changes and removed awaiting review labels May 27, 2020

FFY00 force-pushed the bpo-16995 branch from 1b167ec to 87836d2 Compare May 27, 2020 17:25

bedevere-bot added awaiting change review and removed awaiting changes labels May 27, 2020

bedevere-bot requested a review from pganssle May 27, 2020 17:26

FFY00 force-pushed the bpo-16995 branch 2 times, most recently from cbf86ed to 2aaf479 Compare May 27, 2020 18:07

pganssle requested a review from serhiy-storchaka June 8, 2020 17:38

FFY00 force-pushed the bpo-16995 branch from 2aaf479 to 2d64683 Compare June 8, 2020 18:17

pganssle approved these changes Jul 23, 2020

View reviewed changes

Lib/test/test_base64.py Outdated Show resolved Hide resolved

bedevere-bot added awaiting merge and removed awaiting change review labels Jul 23, 2020

FFY00 force-pushed the bpo-16995 branch 2 times, most recently from 6eba08f to 773e000 Compare July 27, 2020 14:12

bpo-16995: add support for base32 extended hex (base32hex)

7306649

Signed-off-by: Filipe Laíns <lains@archlinux.org>

pganssle force-pushed the bpo-16995 branch from 773e000 to 7306649 Compare August 10, 2020 14:28

pganssle added the 🤖 automerge label Aug 10, 2020

miss-islington merged commit 4ce6faa into python:master Aug 10, 2020

bedevere-bot removed the awaiting merge label Aug 10, 2020

shihai1991 pushed a commit to shihai1991/cpython that referenced this pull request Aug 20, 2020

bpo-16995: add support for base32 extended hex (base32hex) (pythonGH-…

3705f26

…20441) cc @pganssle Automerge-Triggered-By: @pganssle

xzy3 pushed a commit to xzy3/cpython that referenced this pull request Oct 18, 2020

bpo-16995: add support for base32 extended hex (base32hex) (pythonGH-…

0a3e166

…20441) cc @pganssle Automerge-Triggered-By: @pganssle

matthaeuswander mannequin mentioned this pull request Apr 10, 2022

Add Base32 support for RFC4648 "Extended Hex" alphabet (patch attached) #61199

Closed

		@@ -0,0 +1 @@
		Add Base32 w/ Extended Hex Alphabet functions to the base64 module (b32hexencode, b32hexdecode)

	Add Base32 w/ Extended Hex Alphabet functions to the base64 module (b32hexencode, b32hexdecode)
	Add :func:`base64.b32hexencode` and :func:`base64.b32hexdecode` to support Base 32 Encoding with Extended Hex Alphabet.

Uh oh!

bpo-16995: add support for base32 extended hex (base32hex) #20441

bpo-16995: add support for base32 extended hex (base32hex) #20441

Uh oh!

Conversation

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!