makeqstrdata: don't print "compression increased length" messages · ricardoquesada/circuitpython@08ed09a · GitHub

Commit 08ed09a

makeqstrdata: don't print "compression increased length" messages
This check as implemented is misleading, because it compares the compressed size in bytes (including the length indication) with the source string length in Unicode code points. For English this is approximately fair, but for Japanese it is quite unfair and produces an excess of "increased length" messages.

This message might have existed for one of two reasons:
 * to alert to an improperly functioning Huffman compression
 * to call attention to a need for a "string is stored uncompressed" case

We know by now that the Huffman compression is functioning as designed and is effective in general.

Just to be on the safe side, I did some back-of-the-envelope estimates. I considered these three replacements for "the true source string size, in bytes":
 + decompressed_len_utf8 = len(decompressed.encode('utf-8'))
 + decompressed_len_utf16 = len(decompressed.encode('utf-16be'))
 + decompressed_len_bitsize = ((1+len(decompressed)) * math.ceil(math.log(1+len(values), 2)) + 7) // 8

The third counts how many bits each character requires (fewer than 128 characters in the source character set = 7, fewer than 256 = 8, fewer than 512 = 9, etc., adding a string-terminating value) and is in some way representative of the best way we would be able to store "uncompressed strings".

The Japanese translation (largest as of writing) has just a few strings which increase by this metric. However, the amount of loss due to expansion in those cases is outweighed by the cost of adding 1 bit per string to indicate whether it's compressed or not.

For instance, in the BOARD=trinket_m0 TRANSLATION=ja build the loss is 47 bytes over 300 strings. Adding 1 bit to each of 300 strings will cost about 37 bytes, leaving about 10 bytes, just 5 Thumb instructions, to implement the code to check and decode "uncompressed" strings in order to break even.
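The three estimates and the break-even arithmetic from the message can be reproduced with a short sketch. This is not code from makeqstrdata.py: the function names `size_estimates` and `break_even_bytes` are hypothetical, `values` is assumed to be the collection of distinct characters used by the translation, and the flag cost is rounded down to whole bytes to match the "about 37 bytes" figure.

```python
import math


def size_estimates(decompressed, values):
    """Return (utf8, utf16, bitsize) byte estimates for one source string.

    `values` is assumed to hold the distinct characters of the whole
    translation; only its length is used.
    """
    utf8 = len(decompressed.encode("utf-8"))
    utf16 = len(decompressed.encode("utf-16be"))
    # Bits needed to index any character plus one terminating value.
    bits_per_char = math.ceil(math.log(1 + len(values), 2))
    # One extra symbol for the terminator, rounded up to whole bytes.
    bitsize = ((1 + len(decompressed)) * bits_per_char + 7) // 8
    return utf8, utf16, bitsize


def break_even_bytes(loss_bytes, n_strings):
    """Bytes left for decoder code after paying 1 flag bit per string."""
    flag_cost = n_strings // 8  # rounded down to whole bytes (assumption)
    return loss_bytes - flag_cost


# Hypothetical character set of 200 symbols, so 8 bits per character.
charset = [chr(0x3041 + i) for i in range(200)]
print(size_estimates("日本語", charset))
# Assuming 2 bytes per Thumb instruction.
print(break_even_bytes(47, 300) // 2, "Thumb instructions")
```

With the figures quoted in the message (47 bytes lost, 300 strings), the remaining budget works out to 10 bytes, which matches the 5 instructions mentioned if each instruction occupies 2 bytes.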
1 parent ac15726 commit 08ed09a

1 file changed

+0
-2
lines changed

py/makeqstrdata.py

Lines changed: 0 additions & 2 deletions
@@ -259,8 +259,6 @@ def compress(encoding_table, decompressed, encoded_length_bits, len_translation_
                 current_bit -= 1
     if current_bit != 7:
         current_byte += 1
-    if current_byte > len(decompressed):
-        print("Note: compression increased length", repr(decompressed), len(decompressed), current_byte, file=sys.stderr)
     return enc[:current_byte]
 
 def qstr_escape(qst):
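
The unit mismatch in the removed comparison can be illustrated with a small sketch. The helper names `old_check_fires` and `utf8_check_fires` and the compressed size of 8 bytes are hypothetical, chosen only to show how a byte count compared against a code-point count behaves for Japanese text.

```python
def old_check_fires(decompressed, compressed_len):
    # Mirrors the removed comparison: a byte count on the left,
    # a Unicode code-point count on the right.
    return compressed_len > len(decompressed)


def utf8_check_fires(decompressed, compressed_len):
    # Same comparison, but with both sides measured in bytes.
    return compressed_len > len(decompressed.encode("utf-8"))


text = "こんにちは"        # 5 code points, 15 bytes in UTF-8
compressed_len = 8          # hypothetical compressed size in bytes
print(old_check_fires(text, compressed_len))
print(utf8_check_fires(text, compressed_len))
```

For an ASCII-only string the two helpers agree, since each code point is one UTF-8 byte, which is why the original check was approximately fair for English.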
