[go: up one dir, main page]

0% found this document useful (0 votes)
51 views27 pages

Ctec2909 Data Structures and Algorithms: Lecture Week 3 Friday Hash Maps

The document discusses hash maps, which store information with keys. A hash function maps keys to locations in the hash map to allow for fast insertion, deletion, and searching. Collisions can occur if multiple keys map to the same location. Collision resolution techniques include open addressing, which finds another location, and separate chaining, which stores multiple items in a linked list at each location. The efficiency depends on the load factor, which should be less than 2/3.

Uploaded by

rabbit
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
51 views27 pages

Ctec2909 Data Structures and Algorithms: Lecture Week 3 Friday Hash Maps

The document discusses hash maps, which store information with keys. A hash function maps keys to locations in the hash map to allow for fast insertion, deletion, and searching. Collisions can occur if multiple keys map to the same location. Collision resolution techniques include open addressing, which finds another location, and separate chaining, which stores multiple items in a linked list at each location. The efficiency depends on the load factor, which should be less than 2/3.

Uploaded by

rabbit
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 27

CTEC2909 DATA STRUCTURES AND

ALGORITHMS
Lecture Week 3 Friday Hash Maps
Hash Maps
A hash map (or hash table) stores information with keys.

The key is used to find the location of the item in the hash map.

Robert Culp
ID: 121535
Ingrid Bergman
ID: 685865

Ricardo Montalban
ID: 34637
Hash Maps
The aim of hash maps is to get O(1) worst-case performance
for insertion, deletion and searching.
insertion: calculate the address for the given key
store the item at that address
deletion: calculate the address for the given key
delete the item at that address

search: calculate the address for the given key


return the item at that address
Hash Maps
insert(Key key, Data item){ find(Key key){
int i = findAddress(key); int i = findAddress(key);
store item+key at map[i]; if (key at map[i] == key){
} return item at map[i];
}
return null;
delete(Key key){ }
int i = findAddress(key);
if (key at map[i] == key){
delete map[i];
}
}
Hash Functions
A hash function is used to map keys to locations.

Robert Culp

ID: 121535 Ingrid Bergman

Hash
ID: 685865 function
Ricardo Montalban
ID: 34637
Hash Functions
How does the hash function work?
It could be done by having a unique location for every single key.
However, this could require a large amount of space, e.g. for ID
numbers ranging from 111111 to 999999.
In most cases, there will be much fewer data items than the total
possible, e.g. only 100 people with ID numbers in the above range.
Solution: Use a smaller size hash map and map the keys into the
smaller range.
Hash Functions
A perfect hash function maps every key into a unique location, but this
is only possible if all the keys that occur in this set are known in advance.
Otherwise, collisions could occur:
Robert Culp

ID: 121535 Ingrid Bergman

Hash
ID: 685865
function Peter Falk
Ricardo Montalban
ID: 46363

ID: 34637
Hash Functions
Some simple hash functions for integer keys:
 Selecting particular digits – e.g. select the 3rd and 7th digit
– depending on the data, this might not evenly distribute the items,
e.g. for phone numbers where some digits are area codes.
 Folding digits – add all the digits, e.g. 2523 = 2 + 5 + 2 + 3 = 12
– or to get larger values, group and then add, e.g. 25 + 23 = 48.
 Mod function – number mod size, where size is the size of the hash
map, e.g. 482567 mod 10 = 7.
– Using prime number sizes distributes the items more evenly, e.g. 101
instead of 100.
Hash Functions
Hash functions for non-integers:
 For strings: Change each character into its ASCII value.
- Adding the values will mean “abcd” and “dacb” get mapped to the same
location.
- Alternative: change into binary and concatenate, e.g. N = 14 = 01110 and
T = 20 = 10100, so NT = 0111010100. Then convert back to decimal
(468) and use mod.
- Another way: 14 * 321 + 20 * 320 = 468. (Horner’s Rule)
Collisions
What should be done if collisions occur?
- can’t just not add the item since collisions can occur even if the map has
only 1 item!
Robert Culp

ID: 121535 Ingrid Bergman

Hash
ID: 685865
function Peter Falk Ricardo Montalban
ID: 46363

ID: 34637
Collision Resolution
Two options:
 Find another space in the hash map.
 Allow the hash map to store multiple items at each location.

Option 1: Find another space in the hash map.

This is called Open Addressing. If the location is taken, look


for another location.
- Remember that it has to be easy to find it again!
Collision Resolution – Open Addressing
Option 1: Linear Probing
Search the hash map locations sequentially starting from where it
should have been put.
e.g. if the key maps to position 2, and that location is full, try
position 3, then position 4, etc.
Key maps to here
Robert Culp
searching: start with the expected
location, then keep trying the next Ingrid Bergman
ones until an empty location is Item added here
reached or the item is found.
- if it is empty then the search
failed.
Collision Resolution – Open Addressing
Option 1: Linear Probing
deleting: start at the expected location and keep trying the next
ones, then delete.
Delete Peter Falk
Key maps to here
Robert Culp Robert Culp

Ingrid Bergman Ingrid Bergman


Item found here
Peter Falk

Ricardo Montalban Ricardo Montalban


Collision Resolution – Open Addressing
Option 1: Linear Probing
Now suppose that Stephanie Zimbalist was supposed to be in
position 2. How would a search for Stephanie’s key work?
Solution: have states full,
Robert Culp empty, deleted – search keeps
going if it reaches a deleted
Key maps to here
Ingrid Bergman slot
– insert uses empty or
deleted slots

Ricardo Montalban
Search stops here!
Stephanie Zimbalist
Collision Resolution
Linear probing can lead to clustering – lots of items next to each other.
– This keeps getting worse (large clusters get larger).
– It decreases performance as each search requires searching through
the clusters.
– called primary clustering.

Quadratic probing stops primary clustering.


Instead of trying the next space, try key+12 then key+22 then key+32 etc.
– can lead to secondary clustering – when two items hash to the same
location. This is not a problem.
Collision Resolution
Double hashing – Use one hash function for finding the initial location
and then another hash function to find the step size for finding the next
space.

e.g. hash function 1 is key mod 11 and hash function 2 is 7 – (key mod 7).
Key = 58 - therefore it maps to location 3 and if that is full, try steps of 5,
i.e. 3, 8, 2 (wrapped around), 7, etc.
Key = 14 - therefore it maps to location 3 and if that is full, try steps of 7,
i.e. 3, 10, 6, 2 (wrapped around), 9, etc.
So both keys map initially to the same place but then different places.
If more than one hash function is used it is called rehashing.
Collision Resolution
Option 2: Allowing several items to be stored at each location.

Buckets - Each location is an array.

Keys of Lisa and Anthony


Zach both map here
Lisa Zach

Problem: Could lead to wasted space if the arrays are too large,
or collisions again if the arrays are too small.
Collision Resolution
Option 2: Allowing several items to be stored at each location.
Separate Chaining - Each location is a linked list.

Keys of Lisa and Anthony


Zach both map here
Zach Lisa

Note the different order of Zach


and Lisa.
Now only the required amount of space is used.
Collision Resolution – Separate Chaining
Implementation of Separate Chaining:
public class Node{
private Object key;
private Object data;
private Node next;

public Node(Object key, Object data, Node next){


this.key = key;
this.data = data;
this.next = next;
}
// get and set methods.
}
Collision Resolution – Separate Chaining
Implementation of Separate Chaining:
void insert(Object key, Object item){
if (! map.isFull()){
int index = hashFunction(key);
Node n = new Node(key, item, map[index]);
map[index] = n;
} n
} map[index]

Lisa Zach Lisa


Collision Resolution – Separate Chaining
Implementation of Separate Chaining:
Object find(Object key){
int index = hashFunction(key);
Node n = map[index];
while((n != null) && (!n.getKey().equals(key)){
n = n.next();
} n
if (n != null){
return n.getData();
}
Zach Lisa
return null;
}
Hashing Efficiency
The efficiency of hash functions depends on the load factor of the hash
map: load factor = no. of items in the map / size of map
As the load factor increases, the number of collisions increases, so the
performance decreases.
Select a hash map size so that the load factor is less than 2/3 for the
estimated largest number of items.
Probing methods:
Unsuccessful searches take more time than successful searches.
Smaller hash maps can be used for quadratic probing/ double hashing
than for linear probing.
Hashing Efficiency
Separate Chaining:

Inserting is O(1) because it always adds to the start of the linked list.
Searching and deleting is O(n) where n is the size of the linked list, so
shorter lists are better.
Load factor = no. of items / no. of linked lists
The load factor is the average length of the linked lists.
In the worst case, all the items hash to the same location, so that linked
list contains all the items.
Java Hashmap
Map map = new HashMap();
Adding to the map: (using some int id and String name)
map.put(id, name);
Getting an item from the map: (using some int id)
String str = (String) map.get(id);
Also: containsKey; size; clear
Iterating through a map:
Set s = map.entrySet(); All Java objects have a hashcode.
Iterator it = s.iterator();
while(it.hasNext()){
currentEntry = (Map.Entry)it.next();
currentKey = (int)currentEntry.getKey();
currentItem = (String)currentEntry.getValue();
}
Java Hashmap
Example: A student grade system.
Map grades = new HashMap();
public void setGrade(int studentID, int grade){
grades.put(studentID, grade);
}

public int getGrade(int studentID){


return (int)grades.get(studentID);
}
Java Hashmap
public void printAllGrades(){
int student;
int grade;
Set s = map.entrySet();
Iterator it = s.iterator();
while(it.hasNext()){
currentEntry = (Map.Entry)it.next();
student = (int)currentEntry.getKey();
grade = (String)currentEntry.getValue();
System.out.println(“Student: ”+student+“Grade: ”+grade);
}
}
Java Hashmap
Example: A student record system.
Map studentInfo = new HashMap();
public void storeInfo(int studentID, String name, String address){
Student s = new Student(name, address);
studentInfo.put(studentID, s);
}
public int getName(int studentID){
Student s = (Student)studentInfo.get(studentID);
return s.getName();
}

You might also like