-
Notifications
You must be signed in to change notification settings - Fork 2
Description
We'll need a hash table and hash set for sparse use cases where our existing dense equivalents are not reasonable to use.
For these data structures we'll need a hash function (sometimes multiple) as the underlying function powering them.
hash function
This is a huge complicated space of non-cryptographic hash functions; a very simple way to get good results is tabulation hashing.
- https://en.wikipedia.org/wiki/Tabulation_hashing
- https://arxiv.org/abs/1011.5200
- https://arxiv.org/pdf/1311.3121
This could be a simple way for us to get started.
It works like this
- for a int32 hash function, create four tables with 2^8 random values
- for the four bytes in the int32 type, index the tables
- xor the four table entries together
Note: the tables are filled with random values. The tables are not filled with consecutive values starting from zero and then randomly shuffled. This subtle difference makes a huge difference in the theoretical aspects of the papers linked above and the hash function's quality.
Pros
- simple to implement
- simple to reason about
- good enough especially for cuckoo hashing (see wikipedia for a start)
Cons
- the tables use up quite a lot of space especially if we want to extend it from int32 to int64 data types
- it's not the fastest in terms of throughput
Overall seems like a good pragmatic solution.
hash combine
Combining hash values is not straight forward, that's why we should also look into hash combine / hash mix functions.
The boost folks had to iterate many many times on their hash combine, maybe there is something we can learn from them
- https://www.boost.org/doc/libs/1_87_0/libs/container_hash/doc/html/hash.html#notes_hash_combine
- https://www.boost.org/doc/libs/1_87_0/libs/container_hash/doc/html/hash.html#combine
- https://www.boost.org/doc/libs/1_87_0/libs/container_hash/doc/html/hash.html#ref_hash_combine
- https://www.boost.org/doc/libs/1_87_0/boost/intrusive/detail/hash_combine.hpp
- https://www.boost.org/doc/libs/1_87_0/boost/intrusive/detail/hash_mix.hpp
cc @ucyo