py/makecompresseddata.py: Make compression deterministic. · micropython/micropython@388d419 · GitHub

Commit 388d419

py/makecompresseddata.py: Make compression deterministic.
Error string compression is not deterministic in certain cases: it depends on the Python version (whether dicts are ordered by default or not) and probably also the order files are passed to this script, leading to a difference in which words are included in the top 128 most common. The changes in this commit use OrderedDict to keep parsed lines in a known order, and, when computing how many bytes are saved by a given word, it uses the word itself to break ties (which would otherwise be "random").
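The tie-breaking idea described above can be sketched as follows. This is an illustration only, not code from the script: the word lists are hypothetical, and `bytes_saved_old` is a name introduced here for the pre-commit key. With the old key, two words with equal savings keep whatever order the dict iterated them in, so the cut-off can include a different word depending on input order; with the new key the word itself settles the tie.

```python
from collections import Counter


def bytes_saved_old(item):
    # Pre-commit key: ties are left to the input (dict iteration) order.
    w, n = item
    return -((len(w) + 1) * (n - 1))


def bytes_saved(item):
    # Post-commit key: the word itself breaks ties.
    w, n = item
    return -((len(w) + 1) * (n - 1)), w


def top_words(words, key, limit):
    # Count the words, then keep the `limit` entries with the largest saving.
    counts = Counter()
    for word in words:
        counts[word] += 1
    return [w for w, _ in sorted(counts.items(), key=key)[:limit]]


# Hypothetical inputs: same words, different order. "abc" and "xyz" tie.
words_a = ["abc", "abc", "xyz", "xyz", "longer", "longer"]
words_b = ["xyz", "xyz", "abc", "abc", "longer", "longer"]

new_a = top_words(words_a, bytes_saved, 2)
new_b = top_words(words_b, bytes_saved, 2)
old_a = top_words(words_a, bytes_saved_old, 2)
old_b = top_words(words_b, bytes_saved_old, 2)

print(new_a == new_b)  # the new key gives the same selection for both orders
print(old_a == old_b)  # the old key may not (it depends on dict ordering)
```

The limit of 2 stands in for the script's 128 so that the tie falls exactly on the cut-off.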
1 parent 1b1ceb6 commit 388d419

1 file changed: +3 -2 lines changed

py/makecompresseddata.py

Lines changed: 3 additions & 2 deletions
@@ -51,9 +51,10 @@ def word_compression(error_strings):
             topn[word] += 1

     # Order not just by frequency, but by expected saving. i.e. prefer a longer string that is used less frequently.
+    # Use the word itself for ties so that compression is deterministic.
     def bytes_saved(item):
         w, n = item
-        return -((len(w) + 1) * (n - 1))
+        return -((len(w) + 1) * (n - 1)), w

     top128 = sorted(topn.items(), key=bytes_saved)[:128]

@@ -143,7 +144,7 @@ def bytes_saved(item):


 def main(collected_path, fn):
-    error_strings = {}
+    error_strings = collections.OrderedDict()
     max_uncompressed_len = 0
     num_uses = 0

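As a rough illustration of why `collections.OrderedDict` helps here, the sketch below collects unique lines while preserving first-seen order on every Python version. The helper `collect_unique` and its sample input are hypothetical and are not part of `makecompresseddata.py`; a plain `dict` only guarantees this ordering from Python 3.7 onwards.

```python
import collections


def collect_unique(lines):
    # Keys record each distinct line once; OrderedDict remembers the order
    # in which keys were first inserted, independent of the Python version.
    seen = collections.OrderedDict()
    for line in lines:
        line = line.strip()
        if line:
            seen[line] = None
    return list(seen.keys())


# Hypothetical input with a duplicate and an empty line.
unique = collect_unique(["b\n", "a\n", "b\n", "", "c\n"])
print(unique)
```

Note that this fixes the iteration order only for a given input; if the input files are passed in a different order, the insertion order changes too, which is why the commit also adds the tie-break in `bytes_saved`.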