Bug: binascii.a2b_uu incorrectly assumes padded bytes are always whitespace #100308
Comments
I guess the question is: are any of the padding bits allowed to be anything but zeros, or the special case (noted in the Wikipedia page) of using a grave accent? What software did you see that used "!" as a padding character? I guess the padding is actually \x01, which gets converted to "!".
@ericvsmith I don't know which program generated the uuencoding for these, but public financial reports such as the following (scroll to the very bottom of the txt file) and thousands of other examples across the SEC's EDGAR public database have UUEncoded data that uses the ! character as padding. I also found a StackOverflow post from over a decade ago raising the same issue here.
Maybe we could add a strict flag. That said, it looks like uu_codec will actually decode this currently:
>>> import codecs
>>> codecs.decode(b'begin\n%-@     !\nend\n', 'uu_codec')
b'6\x00\x00\x00\x00'
@ericvsmith I was noticing that as well, that the codecs module could decode the string. If you dig into it, it's using a workaround similar to the one I produced. I just feel the workaround shouldn't be necessary if the binascii module accepted other characters as padding, like the Linux and other implementations do. I agree, though: a strict=True flag could maintain backwards compatibility, while strict=False could be lenient about different encoders' idea of padding characters. Should we name it strict_padding for more clarity? Or is brevity preferred in this case?
Just FYI, the Given that there appear to be a couple niche legacy use cases, perhaps those would be better served by a dedicated PyPI module (perhaps based on the copy-pasted existing
Bug Description
I was decoding some UUEncoded data when I encountered a 'Trailing Garbage' error from the binascii.a2b_uu function. After digging into Linux's uudecode implementation (L248) and other resources (linked below), I'm fairly certain the Python implementation is bugged.
The following is what I tried:
The expected output is:
The actual output is:
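The snippets for this part did not survive in this copy of the report. A minimal reproduction might look like the sketch below. It assumes the input line was `%`, then `-@`, five spaces, and a trailing `!`, which is the layout consistent with the 5-byte expected output quoted in this report; `try_decode` is a hypothetical helper added only for illustration.

```python
import binascii


def try_decode(line: bytes):
    """Return the decoded bytes, or the error message if decoding fails."""
    try:
        return binascii.a2b_uu(line)
    except binascii.Error as exc:
        return str(exc)


# Assumed input: length character '%', then "-@", five spaces, and a '!' pad.
s = b"%-@     !\n"
outcome = try_decode(s)
print(outcome)
```

On an interpreter with the behaviour described in this report, `outcome` is the error message rather than the expected `b'6\x00\x00\x00\x00'`.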
Notice there are 5 bytes in the expected output (b'6\x00\x00\x00\x00') because the % (the first byte of the input string, s) means 5 bytes of data follow (ASCII code 37 - 32 = 5). UUEncoding requires the output to be divisible by 3 bytes, so an extra padding character is added. In this case it's an !.
The Python implementation assumes the padding is always whitespace. Different uuencoders use different characters for padding, though. I've seen three so far: a space, `, and !.
The following several lines of code are the issue:
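The C lines in question are not reproduced in this copy. As a rough Python paraphrase of the arithmetic and the check described above (not the CPython source), the sketch below splits a line into its length, its data characters, and whatever trails them. `split_uu_line` and `ACCEPTED_TRAILING` are hypothetical names, and the accepted set (space, backtick, line endings) is an assumption based on this report.

```python
# Characters the check described above tolerates after the data
# (assumption: space, backtick, and line endings).
ACCEPTED_TRAILING = b" `\n\r"


def split_uu_line(line: bytes):
    """Split one uuencoded line into (byte count, data chars, trailing chars)."""
    nbytes = (line[0] - 32) & 0x3F      # length character: ord('%') - 32 == 5
    nchars = (nbytes * 8 + 5) // 6      # each character carries 6 bits
    return nbytes, line[1:1 + nchars], line[1 + nchars:]


nbytes, data, trailing = split_uu_line(b"%-@     !\n")
# Trailing characters that a whitespace-only check would refuse.
rejected = bytes(c for c in trailing if c not in ACCEPTED_TRAILING)
print(nbytes, data, trailing, rejected)
```

For the example line, the seven data characters are followed by `!`, which carries only padding bits yet falls outside the accepted set.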
Proposed fix
Simply remove the following lines (279 - 296). Or, if we really want to verify the padding, we can include '!' in the condition for valid padding chars. (The linked Linux implementation does not verify padding, however.) Based on my research, there isn't a well-defined padding character, so we would be jumping to the same potentially false conclusion that we have here: believing we've accounted for all the padding characters that exist in the wild.
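To illustrate what "not verifying padding" would mean, here is a pure-Python sketch of a lenient line decoder. It is not the binascii implementation, and `a2b_uu_lenient` is a hypothetical name. It stops as soon as the announced number of bytes has been produced and never inspects what follows.

```python
import binascii


def a2b_uu_lenient(line: bytes) -> bytes:
    """Decode one uuencoded line, ignoring whatever follows the data characters."""
    nbytes = (line[0] - 32) & 0x3F
    bits = 0       # bit accumulator
    nbits = 0      # number of valid bits currently in the accumulator
    out = bytearray()
    for ch in line[1:]:
        if len(out) == nbytes:
            break  # everything from here on is padding; do not validate it
        # A line ending inside the data is treated as stripped trailing spaces.
        value = 0 if ch in b"\r\n" else (ch - 32) & 0x3F
        bits = (bits << 6) | value
        nbits += 6
        if nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
            bits &= (1 << nbits) - 1
    # If the line ran out early, fill the remainder with NUL bytes.
    out.extend(b"\x00" * (nbytes - len(out)))
    return bytes(out)


padded = a2b_uu_lenient(b"%-@     !\n")
roundtrip = a2b_uu_lenient(binascii.b2a_uu(b"hello"))
print(padded, roundtrip)
```

The trade-off is the one raised in the comments: a lenient decoder accepts any padding character, but it also stops reporting genuinely corrupt trailing data.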
Problematically, this bug propagated up to the uu_codec decode implementation as well. See the following code
A comment indicates the caught exception and "workaround" are due to broken uuencoders. According to what I've read, it's the broken Python binascii.a2b_uu that incorrectly assumes any padding bytes are a space or `.
Here are the sources for my understanding of uu encoding:
Examples of non whitespace padding
Wikipedia uuencoding
Busybox uudecode implementation
Following is an illustration that helped me find a sense of understanding:

[1] I couldn't find an RFC or other standards document so I looked for the earliest implementation I could find (1983 Linux implementation) along with the wikipedia entry.
In the meantime
If others encounter this issue I'm using the following workaround:
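The workaround snippet is missing from this copy. A sketch in the same spirit as the fallback in uu_codec discussed above: try the normal call first and, on failure, retry with the line truncated to the length character plus the data characters. `a2b_uu_tolerant` is a hypothetical name, and the input line is the assumed example from this report.

```python
import binascii


def a2b_uu_tolerant(line: bytes) -> bytes:
    """Decode one uuencoded line, retrying without the trailing padding on error."""
    try:
        return binascii.a2b_uu(line)
    except binascii.Error:
        # Length character plus the characters needed for the announced bytes:
        # 1 + ceil(4 * n / 3), written with integer arithmetic.
        nchars = (((line[0] - 32) & 63) * 4 + 5) // 3
        return binascii.a2b_uu(line[:nchars])


decoded = a2b_uu_tolerant(b"%-@     !\n")
print(decoded)
```

Because the retry only happens after an error, lines that already decode cleanly are unaffected.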