
Hashing: How HashMap Works in Java, or How the get() Method Works Internally


One of the favourite questions of core Java interviewers is how HashMap works in Java, i.e. the internal implementation of HashMap. A candidate's chances of rejection increase if he or she cannot give a satisfactory explanation, because this question shows whether the candidate has good knowledge of the Collection framework. So this question should be on your to-do list before appearing for an interview. I have updated the article with a deep explanation of the Java 8 changes, i.e. how HashMap works in Java 8.

Read also: How HashSet works in Java, or how it ensures uniqueness
Java interview questions for experienced developers

How HashMap works in Java

HashMap works on the principle of hashing. To understand hashing, we should first understand three terms: hash function, hash value and bucket.

What are Hash Function, Hash Value and Bucket?

The hashCode() method, which returns an integer value, is the hash function.

The important point to note is that this method is declared in the Object class (the parent of all classes).

This is the code for the hash function (also known as the hashCode method) in the Object class:

public native int hashCode();

The most important point to note from the above line: the hashCode method returns an int value.
So the hash value is the int value returned by the hash function.


What is a bucket?
A bucket is used to store key-value pairs, and one bucket can hold multiple key-value pairs. In HashMap, a bucket uses a simple linked list to store entry objects.

Having understood the terms, we are ready for the next step: how HashMap works in Java, or how get() works internally.

Code inside the Java API (HashMap class internal implementation) for the HashMap get(Object key) method:

1. public V get(Object key)
   {
2.     if (key == null)
3.         // some code to handle the null key
4.     int hash = hash(key.hashCode());
5.     // if key is found in the hash table then return its value
6.     // else return null
   }

HashMap works on the principle of hashing.

The HashMap get(Key k) method calls the hashCode method on the key object and applies the returned hash value to its own static hash function to find a bucket location (in the backing array) where keys and values are stored in the form of a nested class called Entry (Map.Entry). So you can conclude from the previous line that both the key and the value are stored in the bucket as an Entry object. Thinking that only the value is stored in the bucket is not correct and will not give a good impression to the interviewer.

Whenever we call get(Key k) on a HashMap object, it first checks whether the key is null or not. Note that there can be only one null key in a HashMap.

If the key is null, then null keys always map to hash 0, and thus to index 0.

If the key is not null, it calls the hash function on the key object (see line 4 in the method above), i.e. key.hashCode(). After key.hashCode() returns the hash value, line 4 looks like

4. int hash = hash(hashValue)

and HashMap now applies the returned hash value to its own hashing function.

We might wonder why we are calculating the hash value again using hash(hashValue). The answer is that it defends against poor-quality hash functions.

The final hash value from step 4 is then used to find the bucket location at which the Entry object is stored. An Entry object is stored in the bucket in the form (hash, key, value, next).
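This defensive re-hashing can be sketched in code. The mixing below mirrors the spreading function that Java 8's HashMap applies (Java 7 used a similar, slightly longer sequence of shifts and XORs); the HashSpread class and its method names are illustrative, not part of any API:

```java
// Sketch of the hash-spreading idea: mix the high bits of hashCode()
// into the low bits, so that tables with small capacities (which only
// look at the low bits of the hash) still benefit from the whole hash.
public class HashSpread {
    // mirrors the Java 8 HashMap.hash() implementation
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // the same mixing applied to a raw int, for the demo below
    static int spread2(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        // Two hashes that differ only in the high bits would land in the
        // same bucket of a 16-slot table without spreading.
        int a = 0x10000;   // differs from 0 only above bit 15
        int b = 0x00000;
        System.out.println((a & 15) == (b & 15));                   // prints true: raw hashes collide
        System.out.println((spread2(a) & 15) == (spread2(b) & 15)); // prints false: spread hashes do not
    }
}
```

The idea is the same in both Java versions: a weak hashCode() that varies only in its high bits would otherwise map everything to a handful of buckets.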

Interviewer: What if two different keys have the same hashcode?

Solution: the equals() method comes to the rescue. Here the candidate often gets puzzled, since there is one bucket and two objects with the same hashcode. Candidates usually forget that the bucket is a simple linked list.

The bucket is effectively a linked list. It is not a java.util.LinkedList; it is a separate (simpler) implementation just for the map.

So we traverse through the linked list, comparing the key in each entry using key.equals() until it returns true. Then the value of the corresponding entry object is returned.
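A hedged sketch of this traversal (not the real JDK code; SimpleBucket and its Entry are illustrative names): compare the cached hash first, because it is cheap, then confirm the match with equals():

```java
import java.util.Objects;

// Simplified model of one HashMap bucket: a singly linked list of
// entries, each caching the key's hash alongside the key and value.
class SimpleBucket<K, V> {
    static class Entry<K, V> {
        final int hash;
        final K key;
        V value;
        Entry<K, V> next;
        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    Entry<K, V> head;

    // prepend a new entry; this sketch does not check for duplicates
    void put(K key, int hash, V value) {
        head = new Entry<>(hash, key, value, head);
    }

    V get(K key, int hash) {
        for (Entry<K, V> e = head; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                return e.value;   // key confirmed via equals(): return its value
            }
        }
        return null;              // reached the end of the list: key absent
    }
}
```

The real HashMap does exactly this comparison order: the int comparison filters out most non-matches before the potentially expensive equals() call runs.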

One of our readers, Jammy, asked a very good question:

When equals() traverses through the linked list, does it traverse from start to end one by one, i.e. by brute force? Or is the linked list sorted by key before it traverses?

The answer is that whenever an element is added or retrieved, the same procedure follows:

a. Using key.hashCode() [see step 4 above], determine the initial hash value for the key.

b. Pass the initial hash value into the hash(hashValue) function to calculate the final hash value.

c. The final hash value is then passed as the first parameter of the indexFor(int, int) method. The second parameter is the current length of the backing array; its default value is given by the constant DEFAULT_INITIAL_CAPACITY, which is 16 in the HashMap Java API, and it doubles on every resize.

The indexFor(int, int) method returns the bucket index; the first entry of that bucket is looked up in the backing array, and the linked list in the bucket is then iterated over: either the end is reached and the element is added, or the key is matched and the value is returned.

An explanation of indexFor(int, int) is below:

/**
* Returns index for hash code h.
*/
static int indexFor(int h, int length) {
return h & (length-1);
}

The above function indexFor() works because Java HashMaps always have a capacity (i.e. number of buckets) that is a power of 2.
Let's work with a capacity of 256, which is 0x100, but it would work with any power of 2. Subtracting 1 from a power of 2 yields exactly the bit mask needed to bitwise-AND with the hash to get the proper bucket index, in the range 0 to length - 1.

256 - 1 = 255
0x100 - 0x1 = 0xFF

E.g. a hash of 257 (0x101) bitwise-ANDed with 0xFF yields a bucket index of 1.
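The masking arithmetic can be checked directly; IndexForDemo is an illustrative wrapper around the same expression the HashMap source uses:

```java
// Demonstrating why h & (length - 1) works as a bucket index when
// length is a power of two: the mask keeps exactly the low bits of h.
public class IndexForDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);   // same expression as in the HashMap source
    }

    public static void main(String[] args) {
        System.out.println(indexFor(257, 256)); // 257 = 0x101, mask 0xFF -> prints 1
        System.out.println(indexFor(255, 256)); // 0xFF & 0xFF -> prints 255
        System.out.println(indexFor(256, 256)); // 0x100 & 0xFF -> prints 0
    }
}
```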

Interviewer: What if two keys are the same and have the same hashcode?
If the key to be inserted has the same hashcode as an already inserted key, and the keys themselves are equal (by reference or via the equals() method), then the previous key-value pair is overridden with the current key-value pair.

The other important point to note is that in a Map, any class (String, etc.) can serve as a key if and only if it overrides the equals() and hashCode() methods.
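A minimal sketch of such a well-behaved key class (EmployeeId is a hypothetical example class, not part of any library):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// A minimal key class that honours the equals()/hashCode() contract:
// equal objects must have equal hash codes.
final class EmployeeId {
    private final int id;
    EmployeeId(int id) { this.id = id; }

    @Override public boolean equals(Object o) {
        return o instanceof EmployeeId && ((EmployeeId) o).id == this.id;
    }

    @Override public int hashCode() {
        return Objects.hash(id);   // equal ids -> equal hash codes
    }

    public static void main(String[] args) {
        Map<EmployeeId, String> m = new HashMap<>();
        m.put(new EmployeeId(42), "Alice");
        // A *different* instance with the same id still finds the entry,
        // because lookup uses hashCode() and equals(), not identity.
        System.out.println(m.get(new EmployeeId(42))); // prints Alice
    }
}
```

If either method were missing, the lookup with a second instance would fail: without hashCode() the search would probe the wrong bucket, and without equals() the traversal would never confirm a match.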

Interviewer: How will you measure the performance of HashMap?

According to Oracle Java docs,

An instance of HashMap has two parameters that affect its performance: initial
capacity and load factor.

The capacity is the number of buckets in the hash table( HashMap class is roughly
equivalent to Hashtable, except that it is unsynchronized and permits nulls.), and the
initial capacity is simply the capacity at the time the hash table is created.

The load factor is a measure of how full the hash table is allowed to get before its
capacity is automatically increased. When the number of entries in the hash table
exceeds the product of the load factor and the current capacity, the hash table is
rehashed (that is, internal data structures are rebuilt) so that the hash table has
approximately twice the number of buckets.

In the HashMap class, the default value of the load factor is 0.75.

Interviewer: What is the time complexity of the HashMap get() and put() methods?
The HashMap implementation provides constant-time performance for the basic operations (get and put), i.e. the complexity of get() and put() is O(1), assuming the hash function disperses the elements properly among the buckets.

Interviewer: What is the difference between HashMap and ConcurrentHashMap?

This is also one of the popular interview questions on HashMap; you can find the answer here:
HashMap vs ConcurrentHashMap

How HashMap works in Java 8

In Java 8 there are a lot of changes in the internal representation of HashMap. The implementation of HashMap went from about 1k lines of code in Java 7 to about 2k lines in Java 8. In Java 8, the Node class contains essentially the same information as the Entry class, i.e. (hash, key, value, next). Here is the implementation of the Node class in Java 8:

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
    // ... accessor methods omitted
}

What's the big difference from Java 7?

The TreeNode class extends Node through LinkedHashMap.Entry<K,V>. In other words, TreeNode indirectly extends the Node class. A TreeNode internally implements a red-black tree structure. It stores more information than the Node class so that it can perform get(), add() or delete() operations in O(log(n)). As we know, the Node class contains (hash, key, value, next), whereas the TreeNode class additionally contains the following data:

static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;

    TreeNode(int hash, K key, V val, Node<K,V> next) {
        super(hash, key, val, next);
    }
    // ... tree management methods omitted
}

What are red-black trees and why are they used?

According to Wikipedia, red-black trees are self-balancing binary search trees. A red-black tree makes sure that the height of the tree stays O(log(n)) despite additions and removals of nodes. The main advantage of the red-black tree structure shows when many entries are in the same bucket: a search operation takes O(n) with the Java 7 linked list, while in Java 8 the same search in a tree costs O(log(n)).

Drawback: a tree takes more space than a linked list.

By inheritance, a bucket can contain both Nodes (Entry objects) and TreeNodes (red-black trees).

Oracle's Java developers decided to use both data structures, and the following rules are applied:

1. If a given bucket holds more than 8 nodes, the linked list is converted into a red-black tree (provided the table itself has at least 64 buckets; below that the table is resized instead). This is represented by the following code in the HashMap class:

static final int TREEIFY_THRESHOLD = 8;

2. If a given bucket drops below 6 nodes, the tree can be converted back into a linked list:

static final int UNTREEIFY_THRESHOLD = 6;

The Java 8 HashMap image below shows both a tree (at bucket 0) and linked lists (at buckets 1, 2 and 3). Bucket 0 is a tree because it contains more than 8 nodes.

Performance issues in Java 8

In the best-case scenario, the put() and get() operations have O(1) time complexity. But if you do not provide an efficient hash function for the key, you might end up with very slow get() and put() operations.

The good performance of the get() and put() operations depends on how the data is distributed across the different buckets. If the hash function of your key is poorly designed, you will have a skewed distribution, and the capacity of the table becomes irrelevant. All the get() and put() operations that hit the longest linked lists of entries will be really slow, because they need to iterate the entire list. In the worst-case scenario (a poorly designed hash function putting all the entry objects in the same bucket), you end up with O(n) time complexity for a linked list, or O(log(n)) once the bucket has been converted to a tree in Java 8.

Below are images showing a skewed HashMap and a well-balanced HashMap. In the skewed HashMap, the put() and get() operations on bucket 0 are costlier: getting entry F costs 5 iterations.

In the well-balanced HashMap, getting entry F costs only 2 iterations.
Interestingly, both HashMaps store the same amount of data and have the same table size.

Why is there a difference between the skewed HashMap and the well-balanced HashMap?
The only difference is the hash function of the key, which distributes the entries across the buckets.

Testing: performance of HashMap in Java 7 vs Java 8:

1. get() method with proper hashCode() logic

2. get() method with poorly designed hashCode() logic

When using HashMap, your goal is to write a hash function for your keys that spreads the keys across as many buckets as possible. To do that, you need to avoid hash collisions.
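To see what a poorly designed hashCode() does in practice, here is a deliberately broken, hypothetical key class (BadKey, an illustrative name): its constant hash code funnels every entry into a single bucket, so the map stays correct but lookups degrade to a scan of that one bucket:

```java
import java.util.HashMap;
import java.util.Map;

// BadKey returns the same hash code for every instance, which is legal
// under the hashCode() contract but causes maximum collisions: all
// entries land in one bucket.
final class BadKey {
    private final int id;
    BadKey(int id) { this.id = id; }

    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == this.id;
    }

    @Override public int hashCode() {
        return 1;   // constant hash: every key collides
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> m = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            m.put(new BadKey(i), i);   // every put collides in the same bucket
        }
        // The map still behaves correctly, only slower: in Java 7 this
        // lookup scans a 10,000-entry list; in Java 8 the bucket has been
        // treeified, so the scan is logarithmic instead of linear.
        System.out.println(m.get(new BadKey(9_999))); // prints 9999
    }
}
```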

Memory overhead in Java 8 and Java 7

With the Java 8 implementation of HashMap, it is quite difficult to estimate the memory usage, because a Node object can contain either the same data as an Entry object, or the same data plus a boolean and 4 more references (if it is a TreeNode).
If all the nodes are plain Nodes, the memory overhead in Java 8 is the same as in the Java 7 HashMap.
In the worst-case scenario, if all the nodes in the HashMap are TreeNodes, the memory overhead of a Java 8 HashMap becomes:

N * sizeOf(integer) + N * sizeOf(boolean) + sizeOf(reference) * (9*N + CAPACITY)

Resizing overhead in Java 8 and Java 7

If you need to store a large amount of data in a HashMap, you should create the HashMap with an initial capacity close to your expected volume.

If you miss that, the HashMap takes the default capacity of 16 and load factor of 0.75. The first 11 put() operations will be fast, but the 12th (16 * 0.75) will resize the table to a new capacity of 32. The 13th to 23rd will be fast, but the 24th (0.75 * 32) will again create a costly new representation that doubles the table size. Further resizing operations occur around the 48th, 96th, 192nd and 384th call of put().
At a low volume of data the full rebuild of the table is fast, but at a high volume of data it can take seconds to minutes.

Note: by setting the expected size initially, you can avoid these costly operations.
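One way to pre-size the map is sketched below (PreSizedMap and withExpectedSize are illustrative names, not a standard API): pick a capacity such that expectedEntries <= capacity * loadFactor, so no resize is triggered while filling the map:

```java
import java.util.HashMap;
import java.util.Map;

// Pre-sizing a HashMap so the expected number of entries fits without
// triggering an intermediate resize.
public class PreSizedMap {
    static <K, V> Map<K, V> withExpectedSize(int expectedEntries) {
        // Divide by the default load factor (0.75) and round up; the
        // HashMap constructor then rounds this up to a power of two.
        int initialCapacity = (int) Math.ceil(expectedEntries / 0.75);
        return new HashMap<>(initialCapacity);
    }

    public static void main(String[] args) {
        Map<Integer, String> m = withExpectedSize(1_000);
        for (int i = 0; i < 1_000; i++) {
            m.put(i, "value-" + i);   // no intermediate resizes expected
        }
        System.out.println(m.size()); // prints 1000
    }
}
```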

Drawback: if you initialize a HashMap with a very large capacity (the maximum is 2^30 buckets) but actually use only a small fraction of those buckets, you will waste a lot of memory.

This is one of the most popular questions for the Java developer role, so you should at least be familiar with how HashMap works in Java. In everyday use you may not notice the difference between O(1), O(log(n)) and O(n), but during performance testing it becomes important to know how HashMap works and to understand the importance of the hash function of the key. Please mention in the comments if you have any questions.

How TreeMap Works in Java: 10 TreeMap Java Interview Questions

One of the commonly asked Java developer interview questions is how TreeMap works in Java, i.e. the internal implementation of TreeMap. The problem with many candidates is that they know what TreeMap/TreeSet do, but not how. We have already discussed other popular Java developer interview questions such as how HashMap works in Java, how HashSet works in Java, and the difference between HashMap and Hashtable. So let us discuss the working of TreeMap.

What is a TreeMap?

The TreeMap class is like HashMap in that it stores key-value pairs. The major difference is that TreeMap sorts its keys in ascending order.

According to the Java doc:

TreeMap is sorted according to the natural ordering of its keys, or by a Comparator provided at map creation time, depending on which constructor is used.
This implementation provides guaranteed log(n) time cost for the containsKey, get, put and remove operations. Algorithms are adaptations of those in Cormen, Leiserson, and Rivest's Introduction to Algorithms.
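A minimal demonstration of that sorted behaviour: whatever the insertion order, iteration follows the ascending key order.

```java
import java.util.TreeMap;

// TreeMap keeps its keys sorted: iteration order is the natural
// (ascending) key order regardless of insertion order.
public class TreeMapOrderDemo {
    public static void main(String[] args) {
        TreeMap<String, Integer> m = new TreeMap<>();
        m.put("banana", 2);
        m.put("apple", 1);
        m.put("cherry", 3);
        System.out.println(m.keySet());   // prints [apple, banana, cherry]
        System.out.println(m.firstKey()); // prints apple
    }
}
```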

How does TreeMap work in Java?

TreeMap is a red-black tree based NavigableMap implementation. In other words, it sorts the TreeMap object's keys using the red-black tree algorithm.

So we have learned that TreeMap uses the red-black tree algorithm internally to sort its elements.

The red-black algorithm is a complex algorithm; we should read its pseudocode in order to understand the internal implementation.

A red-black tree has the following properties:

1. As the name of the algorithm suggests, the color of every node in the tree is either red or black.

2. The root node must be black.

3. A red node cannot have a red child.

4. All paths from the root node to null must contain the same number of black nodes.

Rotation in a red-black tree:

Rotations maintain the in-order ordering of the keys (x, y, z).
A rotation can be performed in O(1) time.

You can find more about the red-black tree algorithm here.

Interviewer: Why and when do we use TreeMap?

We use a TreeMap when we need the keys sorted in ascending order.

Interviewer: What is the runtime performance of the get() method in TreeMap and HashMap, where n represents the number of elements?

According to TreeMap Java doc,

TreeMap implementation provides guaranteed log(n) time cost for the


containsKey,get,put and remove operations.

According to HashMap Java doc :

HashMap implementation provides constant-time performance for the basic


operations (get and put), assuming the hash function disperses the elements
properly among the buckets.
One liner: TreeMap: log(n). HashMap: constant-time performance, assuming the elements are dispersed properly.

Interviewer: What is "natural ordering" in TreeMap?

"Natural" ordering is the ordering implied by the implementation of the Comparable interface by the objects used as keys in the TreeMap. Essentially, the red-black tree must be able to tell which key is smaller than the other, and there are two ways to supply that logic to the red-black tree implementation:

1. Implement the Comparable interface in the class(es) used as keys in the TreeMap, or
2. Supply an implementation of Comparator that does the comparing outside the key class itself.

Natural ordering is the order provided by the Comparable interface. If somebody inserts a key that does not implement a natural order (and no Comparator was supplied), a ClassCastException is thrown.
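Both ways of supplying the ordering can be sketched as follows (OrderingDemo is an illustrative name):

```java
import java.util.Comparator;
import java.util.TreeMap;

// Two ways to give a TreeMap its ordering:
// 1. natural ordering: the key class implements Comparable
// 2. a Comparator supplied at construction time
public class OrderingDemo {
    public static void main(String[] args) {
        // 1. String implements Comparable -> natural (ascending) order
        TreeMap<String, Integer> natural = new TreeMap<>();
        natural.put("b", 2);
        natural.put("a", 1);
        System.out.println(natural.firstKey()); // prints a

        // 2. explicit Comparator -> here, reverse order
        TreeMap<String, Integer> reversed = new TreeMap<>(Comparator.reverseOrder());
        reversed.put("b", 2);
        reversed.put("a", 1);
        System.out.println(reversed.firstKey()); // prints b
    }
}
```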

Interviewer: Why do we need TreeMap when we have SortedMap?

SortedMap is an interface and TreeMap is the class implementing it. As we know, one cannot create objects of an interface. The interface tells us which methods a SortedMap implementation should provide; TreeMap is such an implementation.

Interviewer: Which data structure will you prefer in your code: HashMap or TreeMap?

HashMap is faster, while TreeMap is sorted; thus we choose between them according to their advantages.

If you do not need the elements sorted but just want to insert and retrieve them, use HashMap.

But if you want to maintain the order of the elements, TreeMap should be preferred, because its result is sorted by key. While iterating a HashMap there is no ordering of the elements; a TreeMap, on the other hand, iterates in the natural key order.

Interviewer: What happens if the TreeMap is concurrently modified while iterating over the elements?

The iterator fails fast: it throws a ConcurrentModificationException if the map is structurally modified at any time after the iterator is created (in any way except through the iterator's own remove method). We already discussed the difference between fail-fast and fail-safe iterators.
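The fail-fast behaviour can be demonstrated directly (FailFastDemo is an illustrative name):

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.TreeMap;

// Structurally modifying the TreeMap while iterating (other than via
// the iterator's own remove()) makes the iterator throw
// ConcurrentModificationException on its next access.
public class FailFastDemo {
    public static void main(String[] args) {
        TreeMap<String, Integer> m = new TreeMap<>();
        m.put("a", 1);
        m.put("b", 2);

        Iterator<String> it = m.keySet().iterator();
        it.next();
        m.put("c", 3);          // structural modification during iteration
        try {
            it.next();          // the modification is detected here
        } catch (ConcurrentModificationException e) {
            System.out.println("fail-fast triggered"); // prints fail-fast triggered
        }
    }
}
```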

Interviewer: Which copy technique (deep or shallow) is used by the TreeMap clone() method?

According to docjar, the clone() method returns a shallow copy of the TreeMap instance. In a shallow copy, object B points to object A's location in memory; in other words, objects A and B share the same elements. The keys and values themselves are not cloned.

Interviewer: Why does Java's TreeMap not allow setting an initial size?

HashMap reallocates its internal table as new entries are inserted, while TreeMap does not reallocate nodes when adding new ones. Thus, the size of a TreeMap grows dynamically as needed, without shuffling the internals, so it is meaningless to set an initial size for a TreeMap.

Read Also : Difference between HashMap and ConcurrentHashMap

Please mention in comments if you know any other good java interview
questions .
