8000 [3.7] gh-93065: Fix HAMT to iterate correctly over 7-level deep trees… · python/cpython@3a4ca49 · GitHub
[go: up one dir, main page]

Skip to content

Commit 3a4ca49

Browse files
ambvmolaxx1st1
authored
[3.7] gh-93065: Fix HAMT to iterate correctly over 7-level deep trees (GH-93149)
Also while there, clarify a few things about why we reduce the hash to 32 bits. Co-authored-by: Eli Libman <eli@hyro.ai> Co-authored-by: Yury Selivanov <yury@edgedb.com> Co-authored-by: Łukasz Langa <lukasz@langa.pl> (cherry picked from commit c1f5c90)
1 parent 2a353b2 commit 3a4ca49

File tree

5 files changed

+66
-4
lines changed

5 files changed

+66
-4
lines changed

Include/internal/hamt.h

Lines changed: 13 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,19 @@
22
#define Py_INTERNAL_HAMT_H
33

44

5-
#define _Py_HAMT_MAX_TREE_DEPTH 7
5+
6+
/*
7+
HAMT tree is shaped by hashes of keys. Every group of 5 bits of a hash denotes
8+
the exact position of the key in one level of the tree. Since we're using
9+
32 bit hashes, we can have at most 7 such levels. Although if there are
10+
two distinct keys with equal hashes, they will have to occupy the same
11+
cell in the 7th level of the tree -- so we'd put them in a "collision" node.
12+
Which brings the total possible tree depth to 8. Read more about the actual
13+
layout of the HAMT tree in `hamt.c`.
14+
15+
This constant is used to define a datastucture for storing iteration state.
16+
*/
17+
#define _Py_HAMT_MAX_TREE_DEPTH 8
618

719

820
#define PyHamt_Check(o) (Py_TYPE(o) == &_PyHamt_Type)

Lib/test/test_context.py

Lines changed: 35 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -537,6 +537,41 @@ def test_hamt_collision_1(self):
537537
self.assertEqual(len(h4), 2)
538538
self.assertEqual(len(h5), 3)
539539

540+
def test_hamt_collision_3(self):
541+
# Test that iteration works with the deepest tree possible.
542+
# https://github.com/python/cpython/issues/93065
543+
544+
C = HashKey(0b10000000_00000000_00000000_00000000, 'C')
545+
D = HashKey(0b10000000_00000000_00000000_00000000, 'D')
546+
547+
E = HashKey(0b00000000_00000000_00000000_00000000, 'E')
548+
549+
h = hamt()
550+
h = h.set(C, 'C')
551+
h = h.set(D, 'D')
552+
h = h.set(E, 'E')
553+
554+
# BitmapNode(size=2 count=1 bitmap=0b1):
555+
# NULL:
556+
# BitmapNode(size=2 count=1 bitmap=0b1):
557+
# NULL:
558+
# BitmapNode(size=2 count=1 bitmap=0b1):
559+
# NULL:
560+
# BitmapNode(size=2 count=1 bitmap=0b1):
561+
# NULL:
562+
# BitmapNode(size=2 count=1 bitmap=0b1):
563+
# NULL:
564+
# BitmapNode(size=2 count=1 bitmap=0b1):
565+
# NULL:
566+
# BitmapNode(size=4 count=2 bitmap=0b101):
567+
# <Key name:E hash:0>: 'E'
568+
# NULL:
569+
# CollisionNode(size=4 id=0x107a24520):
570+
# <Key name:C hash:2147483648>: 'C'
571+
# <Key name:D hash:2147483648>: 'D'
572+
573+
self.assertEqual({k.name for k in h.keys()}, {'C', 'D', 'E'})
574+
540575
def test_hamt_stress(self):
541576
COLLECTION_SIZE = 7000
542577
TEST_ITERS_EVERY = 647

Misc/ACKS

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -963,6 +963,8 @@ Robert Li
963963
Xuanji Li
964964
Zekun Li
965965
Zheao Li
966+
Eli Libman
967+
Dan Lidral-Porter
966968
Robert van Liere
967969
Ross Light
968970
Shawn Ligocki
Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
Fix contextvars HAMT implementation to handle iteration over deep trees.
2+
3+
The bug was discovered and fixed by Eli Libman. See
4+
`MagicStack/immutables#84 <https://github.com/MagicStack/immutables/issues/84>`_
5+
for more details.

Python/hamt.c

Lines changed: 11 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -406,14 +406,22 @@ hamt_hash(PyObject *o)
406406
return -1;
407407
}
408408

409-
/* While it's suboptimal to reduce Python's 64 bit hash to
409+
/* While it's somewhat suboptimal to reduce Python's 64 bit hash to
410410
32 bits via XOR, it seems that the resulting hash function
411411
is good enough (this is also how Long type is hashed in Java.)
412412
Storing 10, 100, 1000 Python strings results in a relatively
413413
shallow and uniform tree structure.
414414
415-
Please don't change this hashing algorithm, as there are many
416-
tests that test some exact tree shape to cover all code paths.
415+
Also it's worth noting that it would be possible to adapt the tree
416+
structure to 64 bit hashes, but that would increase memory pressure
417+
and provide little to no performance benefits for collections with
418+
fewer than billions of key/value pairs.
419+
420+
Important: do not change this hash reducing function. There are many
421+
tests that need an exact tree shape to cover all code paths and
422+
we do that by specifying concrete values for test data's `__hash__`.
423+
If this function is changed most of the regression tests would
424+
become useless.
417425
*/
418426
int32_t xored = (int32_t)(hash & 0xffffffffl) ^ (int32_t)(hash >> 32);
419427
return xored == -1 ? -2 : xored;

0 commit comments

Comments
 (0)
0