[go: up one dir, main page]

0% found this document useful (0 votes)
20 views32 pages

9.02.hash Functions

This document discusses hash functions, which map objects to integer values in a deterministic way. It outlines different types of hash functions including predetermined hashes that assign unique values to each object, and arithmetic hashes that calculate a value based on an object's properties. As an example, it describes using arithmetic operations like addition and multiplication of numerator and denominator to hash rational numbers in a way that balances uniqueness and efficiency. The goal of hashing is to distribute objects evenly across possible values while maintaining determinism.

Uploaded by

rizi007
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
20 views32 pages

9.02.hash Functions

This document discusses hash functions, which map objects to integer values in a deterministic way. It outlines different types of hash functions including predetermined hashes that assign unique values to each object, and arithmetic hashes that calculate a value based on an object's properties. As an example, it describes using arithmetic operations like addition and multiplication of numerator and denominator to hash rational numbers in a way that balances uniqueness and efficiency. The goal of hashing is to distribute objects evenly across possible values while maintaining determinism.

Uploaded by

rizi007
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 32

ECE 250 Algorithms and Data Structures

Hash functions

Douglas Wilhelm Harder, M.Math. LEL


Department of Electrical and Computer Engineering
University of Waterloo
Waterloo, Ontario, Canada

ece.uwaterloo.ca
dwharder@alumni.uwaterloo.ca

© 2006-2013 by Douglas Wilhelm Harder. Some rights reserved.


Hash functions
2

Outline

In this talk, we will discuss


– Finding 32-bit hash values using:
• Predetermined hash values
– Auto-incremented hash values
– Address-based hash values
• Arithmetic hash values
– Example: strings
Hash functions
3
9.2 Definitions

What is a hash of an object?


From Merriam-Webster:
a restatement of something that is already known

The ultimate goal is to map onto an integer range


0, 1, 2, ..., M – 1
Hash functions
4
9.2.1 The hash process
Object
We will look at the first problem
– Hashing an object into a
32-bit integer
– Subsequent topics 32-bit integer
will examine the next
Modulus, mid-square,
steps
multiplicative, Fibonacci
Map to an index 0, ..., M – 1

Deal with collisions Chained hash tables


Open addressing

Linear probing
Quadratic probing
Double hashing
Hash functions
5
9.2.2 Properties

Necessary properties of such a hash function h are:


1a. Should be fast: ideally Q(1)
1b. The hash value must be deterministic
• It must always return the same 32-bit integer each time
1c. Equal objects hash to equal values
• x = y ⇒ h(x) = h(y)
1d. If two objects are randomly chosen, there should be only a
one-in-
232 chance that they have the same hash value
Hash functions
6
9.2.3 Types of hash functions

We will look at two classes of hash functions


– Predetermined hash functions (explicit)
– Arithmetic hash functions (implicit)
Hash functions
7
9.2.4 Predetermined hash functions

The easiest solution is to give each object a unique number

class Class_name {
private:
unsigned int hash_value; // int: –231, ..., 231 – 1
// unsigned int: 0, ..., 232 – 1
public:
Class_name();
unsigned int hash() const;
}; Class_name::Class_name() {
hash_value = ???;
}

unsigned int Class_name::hash() const {


return hash_value;
}
Hash functions
8
9.2.4 Predetermined hash functions

For example, an auto-incremented static member variable

class Class_name {
private:
unsigned int hash_value;
static unsigned int hash_count;
public:
Class_name();
unsigned int hash() const;
}; Class_name::Class_name() {
hash_value = hash_count;
++hash_count;
unsigned int Class_name::hash_count = 0; }

unsigned int Class_name::hash() const {


return hash_value;
}
Hash functions
9
9.2.4 Predetermined hash functions

Examples: All UW co-op student have two hash values:


– UW Student ID Number
– Social Insurance Number

Any 9-digit-decimal integer yields a 32-bit integer


lg( 109 ) = 29.897
Hash functions
10
9.2.4 Predetermined hash functions

If we only need the hash value while the object exists in memory,
use the address:

unsigned int Class_name::hash() const {


return reinterpret_cast<unsigned int>( this );
}

This fails if an object may be stored in secondary memory


– It will have a different address the next time it is loaded
Hash functions
11
9.2.4.1 Predetermined hash functions

Predetermined hash values give each object a unique hash value

This is not always appropriate:


– Objects which are conceptually equal:
Rational x(1, 2);
Rational y(3, 6);
– Strings with the same characters:
string str1 = "Hello world!";
string str2 = "Hello world!";

These hash values must depend on the member variables


– Usually this uses arithmetic functions
Hash functions
12
9.2.5 Arithmetic Hash Values

An arithmetic hash value is a deterministic function that is calculated


from the relevant member variables of an object

We will look at arithmetic hash functions for:


– Rational numbers, and
– Strings
Hash functions
13
9.2.5.1 Rational number class

What if we just add the numerator and denominator?


class Rational {
private:
int numer, denom;
public:
Rational( int, int );
};

unsigned int Rational::hash() const {


return static_cast<unsigned int>( numer ) +
static_cast<unsigned int>( denom );
}
Hash functions
14
9.2.5.1 Rational number class

We could improve on this: multiply the denominator by a large


prime:
class Rational {
private:
int numer, denom;
public:
Rational( int, int );
};

unsigned int Rational::hash() const {


return static_cast<unsigned int>( numer ) +
429496751*static_cast<unsigned int>( denom );
}
Hash functions
15
9.2.5.1 Rational number class

For example, the output of


int main() {
cout << Rational( 0, 1 ).hash() << endl;
cout << Rational( 1, 2 ).hash() << endl;
cout << Rational( 2, 3 ).hash() << endl;
cout << Rational( 99, 100 ).hash() << endl;

return 0;
}
is
429496751
858993503
1288490255
2239
http://xkcd.com/571/
Recall that arithmetic operations wrap on overflow
Hash functions
16
9.2.5.1 Rational number class

This hash function does not generate unique values


– The following pairs have the same hash values:
0/1 1327433019/800977868
1/2 534326814/1480277007
2/3 820039962/1486995867
– Finding rational numbers with matching hash values is very difficult:
– Finding these required the generation of 1 500 000 000 random rational
numbers
– It is fast: Q(1)
– It does produce an even distribution
Hash functions
17
9.2.5.1 Rational number class

Problem:
– The rational numbers 1/2 and 2/4 have different values
– The output of
int main() {
cout << Rational( 1, 2 ).hash();
cout << Rational( 2, 4 ).hash();
return 0;
}
is
858993503
1717987006
Hash functions
18
9.2.5.1 Rational number class

Solution: divide through by the greatest common divisor


Rational::Rational( int a, int b ):numer(a), denom(b) {
int divisor = gcd( numer, denom );
numer /= divisor;
denom /= divisor;
}
int gcd( int a, int b) {
while( true ) {
if ( a == 0 ) {
return (b >= 0) ? b : -b;
}

b %= a;

if ( b == 0 ) {
return (a >= 0) ? a : -a;
}
a %= b;
}
}
Hash functions
19
9.2.5.1 Rational number class

Problem: 1 1
– The rational numbers and have different values
– The output of 2 2
int main() {
cout << Rational( 1, 2 ).hash();
cout << Rational( -1, -2 ).hash();
return 0;
}
is
858993503
3435973793
Hash functions
20
9.2.5.1 Rational number class

Solution: define a normal form


– Require that the denominator is positive

Rational::Rational( int a, int b ):numer(a), denom(b) {


int divisor = gcd( numer, denom );
divisor = (denom >= 0) ? divisor : -divisor;
numer /= divisor;
denom /= divisor;
}
Hash functions
21
9.2.5.3 String class

Two strings are equal if all the characters are equal and in the
identical order

A string is simply an array of bytes:


– Each byte stores a value from 0 to 255

Any hash function must be a function of these bytes


Hash functions
22
9.2.5.3.1 String class

We could, for example, just add the characters:

unsigned int hash( const string &str ) {


unsigned int hash_vaalue = 0;

for ( int k = 0; k < str.length(); ++k ) {


hash_value += str[k];
}

return hash_value;
}
Hash functions
23
9.2.5.3.1 String class

Not very good:


– Slow run time: Q(n)
– Words with the same characters hash to the same code:
• "form" and "from"
– A poor distribution, e.g., all words in MobyTM Words II by Grady Ward
(single.txt) Project Gutenberg):
Hash functions
24
9.2.5.3.2 String class

Let the individual characters represent the coefficients of a


polynomial in x:
p(x) = c0 xn – 1 + c1 xn – 2 + ··· + cn – 3 x2 + cn – 2 x + cn – 1
Use Horner’s rule to evaluate this polynomial at a prime number,
e.g., x = 12347:
unsigned int hash( string const &str ) {
unsigned int hash_value = 0;

for ( int k = 0; k < str.length(); ++k ) {


hash_value = 12347*hash_value + str[k];
}

return hash_value;
}
Hash functions
25
9.2.5.3.2 String class

Is this hash function actually better?

Suppose I pick n random integers from 1 to L


– One would expect each integer to appear l = n/L times
– Some, however, will appear more often, others less often

To test whether or not the integers are random, we will as:


“How many (what proportion of) integers were chosen k times?”
Hash functions
26
9.2.5.3.2 String class

Consider the hash of each of the 354985 strings in single.txt to


be a random value in 0, 1, 2, 3, …, 232 – 1
– Subdivide the integers into groups of approximately 12099
– We expect one hash value per interval
– Count the number of these subintervals which contain 0, 1, 2, ... of
these hash values
– Plotting these proportions and 1/en!, we see they’re very similar

Proportion of intervals with n hash values


Poisson distribution with l = 1
Hash functions
27
9.2.5.3.3 String class

Problem, Horner’s rule runs in Q(n)


"A Elbereth Gilthoniel,\n
Silivren penna miriel\n
O menal aglar elenath!\n
Na-chaered palan-diriel\n
O galadhremmin ennorath,\n
Fanuilos, le linnathon\n
nef aear, si nef aearon!"

Suggestions? J.R.R. Tolkien


Hash functions
28
9.2.5.3.3 String class

Use characters in locations 2k – 1 for k = 0, 1, 2, ...:


"A_Elbereth Gilthoniel,\n
Silivren_penna miriel\n
O menal aglar elenath!\n
Na-chaered palan-diriel\n
O galadhremmin ennorath,\n
Fanuilos, le linnathon\n
nef aear, si nef aearon!"

J.R.R. Tolkien
Hash functions
29
9.2.5.3.3 String class

The run time is now Q(ln(n)) :

unsigned int hash( const string &str ) {


unsigned int hash_value = 0;

for ( int k = 1; k <= str.length(); k *= 2 ) {


hash_value = 12347*hash_value + str[k – 1];
}

return hash_value;
}

Note: this cannot be used if you require a cryptographic hash


function or message digest
Hash functions
30
9.2.5.3.3 Arithmetic hash functions

In general, any member variables that are used to uniquely define


an object may be used as coefficients in such a polynomial
– The salary hopefully changes over time…

class Person {
string surname;
string *given_names;
unsigned char num_given_names;
unsigned short birth_year;
unsigned char birth_month;
unsigned char birth_day;
unsigned int salary;
// ...
};
Hash functions
31

Summary

We have seen how a number of objects can be mapped onto a 32-


bit integer

We considered
– Predetermined hash functions
• Auto-incremented variables
• Addresses
– Hash functions calculated using arithmetic

Next: map a 32-bit integer onto a smaller range 0, 1, ..., M – 1


Hash functions
32

References

Wikipedia, http://en.wikipedia.org/wiki/Hash_function

[1] Cormen, Leiserson, and Rivest, Introduction to Algorithms, McGraw Hill, 1990.
[2] Weiss, Data Structures and Algorithm Analysis in C++, 3 rd Ed., Addison Wesley.

These slides are provided for the ECE 250 Algorithms and Data Structures course. The
material in it reflects Douglas W. Harder’s best judgment in light of the information available to
him at the time of preparation. Any reliance on these course slides by any party for any other
purpose are the responsibility of such parties. Douglas W. Harder accepts no responsibility for
damages, if any, suffered by any party as a result of decisions made or actions based on these
course slides for any other purpose than that for which it was intended.

You might also like