Interview
Interview
Read also How Hashset works in java or How it ensures uniqueness in java
Java Interview questions for experienced
This is the code for the hash function(also known as hashCode method) in
Object Class :
The most important point to note from the above line : hashCode method
return int value .
So the Hash value is the int value returned by the hash function .
After understanding the terms we are ready to move next step , How
HashMap works in java or How get() works internally in java .
HashMap get(Key k) method calls hashCode method on the key object and
applies returned hashValue to its own static hash function to find a bucket
location(backing array) where keys and values are stored in form of a nested
class called Entry (Map.Entry) . So you have concluded that from the
previous line that Both key and value is stored in the bucket as a form of
Entry object . So thinking that Only value is stored in the bucket is not
correct and will not give a good impression on the interviewer .
If key is null , then Null keys always map to hash 0, thus index 0.
If key is not null then , it will call hashfunction on the key object , see line 4 in
above method i.e. key.hashCode() ,so after key.hashCode() returns
hashValue , line 4 looks like
, and now ,it applies returned hashValue into its own hashing function .
Now step 4 final hashvalue is used to find the bucket location at which the
Entry object is stored . Entry object stores in the bucket like this
(hash,key,value,bucketindex) .
When the functions 'equals' traverses through the linked list does it traverses
from start to end one by one...in other words brute method. Or the linked list is
sorted based on key and then it traverses?
indexFor(int,int) method returns the first entry in the appropriate bucket. The
linked list in the bucket is then iterated over - (the end is found and the
element is added or the key is matched and the value is returned )
/**
* Returns index for hash code h.
*/
static int indexFor(int h, int length) {
return h & (length-1);
}
The above function indexFor() works because Java HashMaps always have a
capacity, i.e. number of buckets, as a power of 2.
Let's work with a capacity of 256,which is 0x100, but it could work with any
power of 2. Subtracting 1
from a power of 2 yields the exact bit mask needed to bitwise-and with the
hash to get the proper bucket index, of range 0 to length - 1.
256 - 1 = 255
0x100 - 0x1 = 0xFF
E.g. a hash of 257 (0x101) gets bitwise-anded with 0xFF to yield a bucket
number of 1.
Interviewer: What if when two keys are same and have the same
hashcode ?
If key needs to be inserted and already inserted hashkey's hashcodes are
same, and keys are also same(via reference or using equals() method) then
override the previous key value pair with the current key value pair.
The other important point to note is that in Map ,Any class(String etc.) can
serve as a key if and only if it overrides the equals() and hashCode()
method .
An instance of HashMap has two parameters that affect its performance: initial
capacity and load factor.
The capacity is the number of buckets in the hash table( HashMap class is roughly
equivalent to Hashtable, except that it is unsynchronized and permits nulls.), and the
initial capacity is simply the capacity at the time the hash table is created.
The load factor is a measure of how full the hash table is allowed to get before its
capacity is automatically increased. When the number of entries in the hash table
exceeds the product of the load factor and the current capacity, the hash table is
rehashed (that is, internal data structures are rebuilt) so that the hash table has
approximately twice the number of buckets.
It is also one of the popular interview question on HashMap , you can find the
answer here
HashMap vs ConcurrentHashMap
What are Red Black Trees and Why they are used?
According to Wikipedia,
Red-Black trees are self-balancing binary search trees. Red-black tree makes
sure that the length of the binary search trees is always log(n) despite new
addition or removal of nodes. The main advantage of using Red-black tree
structure is in a case where many entries are in the same bucket. For search
operation,in java 7,it will take O(n) with a linked list. While in java 8 , the same
search operation in a tree will cost O(log(n)).
Drawbacks : Tree really takes more space than the linked list.
Oracle java developers decided to use both data structures and following
rules are applied.
1. If for a given bucket , there are more than 8 Nodes, the linked list is
converted into a
red-black tree. This is represented by the following code in HashMap class :
static final int TREEIFY_THRESHOLD = 8;
2. If for a given bucket , there are less than 6 nodes, then the tree can be
converted
into a linkedlist.
The below Java 8 HashMap image shows both trees(at bucket 0) and
linkedlists (at bucket 1,2 and 3). Bucket 0 is a Tree because it contains at
least 8 nodes.
In the best case scenario, put() and get() operations have a O(1) time
complexity. But if you do not provide efficient hash function of the key , then
you might end up with very slow get() and put() operations.
The good performance of the get() and put() operations depend on the
repartition of the data into the different indexes of the bucket.
If the hash function of your key is poorly designed, then you will have a skew
repartition (capacity of the bucket becomes irrelevant). All the get() and put()
operations that use the biggest linked lists of entry will be really slow. It is due
to the reason as the get() and put() operation need to iterate the entire lists. In
the worst case scenario (when hash function of the key is poorly designed and
all the entry objects are in the same buckets), you would end up with O(n)
time complexity.
Below are the images showing skewed HashMap and well balanced
HashMap. In the case of skewed HashMap . the put() and get() operations on
the bucket 0 are costlier. Getting the Entry F will cost 5 iterations.
In the case of well balanced HashMap , getting the Entry F will cost 2
iterations.
Interestingly, both above HashMaps store the same amount of data and have
the same bucket size.
If you need to store large amount of data into the HashMap then you should
create a
HashMap with an initial capacity close to your expected volume.
If you missed that , the HashMap will take the default size of 16 and load
factor of 0.75. The first 11 put() operations will be fast but the 12th (16*0.75)
will resize the bucket with a new capacity of 32. The 13th to 23rd will be fast
but 24th (0.75 * 32) will again recreate a costly new representation that
doubles the initial size of the bucket. The initial resizing operation will appear
at 48th , 96th ,192nd , 384th call of put() operation.
At low volume of data the full recreation of the bucket is fast but at high
volume of data it can take seconds to minutes.
Note : By setting the expected size initially, you can avoid these costly
operations.
Drawback : If you initialize HashMap with size 2^32 but you are using only
2^29 buckets then you will waste a lot of memory.
This is the most popular question in the java developer role. You should at
least familiar with How HashMap works in java. For performance wise you
will not see the difference between a O(1) and O(n) or O(log(n)). During
performance testing, it becomes important to know how HashMap works and
to understand the importance of the hash function of the key. Please mention
in comments if you have any questions.
Treemap class is like HashMap which stores key- value pairs . The major
difference is that Treemap sorts
the key in ascending order.
1. As the name of the algorithm suggests ,color of every node in the tree is
either red or black.
4. All paths from root node to the null should consist the same number of
black nodes .
You can find more about the red black tree algorithm here
If you do not want to sort the elements but just to insert and retrieve the
elements then use HashMap .
But if you want to maintain the order of the elements then TreeMap should be
preferred because the result is alphabetically sorted .While iterating HashMap
there is no ordering of the elements ,on the other hand , TreeMap iterates in
the natural key order.
The iterator fails fast and quickly if structurally modified at any time after the
iterator is created (in any way except through the iterator's own remove
method ). We already discussed the difference between Fail-fast and Fail safe
iterators .
According to docjar , clone() method returns the shallow copy of the TreeMap
instance . In shallow copy object B points to object A location in memory . In
other words , both object A and B are sharing the same elements .The keys
and values themselves are not cloned .
HashMap reallocates its internals as the new one gets inserted while TreeMap
does not reallocate nodes on adding new ones. Thus , the size of the
TreeMap dynamically increases if needed , without shuffling the internals. So
it is meaningless to set the initial size of the TreeMap .
Please mention in comments if you know any other good java interview
questions .