Sunday 7 January 2018

Collections In-depth : Part 2: How Hash Map works internally?

HashMap works on the principle of Hashing. To understand Hashing, we should understand three terms.They are Hash Function , Hash Value and Bucket.

Hash Function , Hash Value and Bucket:
    When we try to add any value to the hashmap, Java internally calls this hashfunction in order to get unique integer value. This function generates the unique integer value based on the key that we pass. That Value is called as hash value. In java, every object has a method "public int hashcode()" which returns a hash value for a given object.

A bucket is used to store key value pairs. A bucket can have multiple key-value pairs. In hash map, bucket uses simple linked list to store objects.

When we create a hash map, it creates array of nodes. Each node contains hash code, key, value and next  node's reference as shown in the above picture.

Lets jump into its internal working mechanism. We will start will PUT.

PUT Functionality:

Lets consider we are storing the below values into hash map using PUT functionality.  

                   
marks.put("Ram", 100);
marks.put("Ravi", 89);
marks.put("Rajesh", 70);
marks.put("Ramesh", 99);
  
As we discussed earlier the hash map comprises of the table. The initial size of the table would be 0-15 slots.


Now lets see how we can fit our values into this table.


When we add "marks.put("Ram", 100);" entry to the map, Java internally calls hash function in order  to get hashcode.


put(K k,V v)
hash(k)
index=hash&(n-1)

In our case, hash code will be calculated for "Ram" which is our key. Based on the returned hash code, index will be calculated as shown in the above method. Lets consider index as 4. So, our key value pair will be stored in that table as a node with key, value, hash code and next node reference. If you observe here, our next node reference is null. We will discuss about it later.

Lets consider that index for the next entry is 3 and stored as shown in below.

For now all the entries has unique hash code and unique index value. What will happen, if hash() function returns the same value for two different entries. We call this scenario as hash collision.

Lets see, how hash map deals this situation.

For that consider, 3rd entry in our map has got the same hashcode and index as our first entry. In this case, our new entry would also be stored in index 4, but as a linked list of node as shown below.

As you see in the above picture, first entry got the reference of 3rd entry as its next node. So, when there is a collision in hash code, those entries would be stored as linked list of nodes. This is how we avoid hash collision.

Note: In Java hash map allows null keys and it is always stored index 0 as hash of null is 0.

GET Functionality:

V get(Object key)
hash(key)
index=hash&(n-1)

For GET functionality also, java performs the same kind of operations.

Lets get the first entry in our hash map. When we try to get the value for "Ram" (which is the key for our first entry), it finds the hash of the key, and will compute the index. It returns the index 4. Now will look up index 4 in the table. We have an entry there. At this point, it compares the hashcode returned in our GET functionality and the hashcode stored in that node. Both will match. After this, it also compares the Key that we are sending for GET functionality and Key stored in our node by using equals() method. Both will match. So, it returns the value 100 which is stored in the node.

Now lets try to get our 3rd entry value which is little complicated. When we pass "Rajesh"(our 3rd entry's key), it calculates both the hash code and index returned as 4. We will go to index 4. The first node in index 4 is "Ram". It will try to compare both the hashcode and key of the node. They will not match. But by using the reference of the next node, it will go to next node and try to compare both hashcode and key. So, it returns value 70, which is stored in the second node of that index.

The hashmap implementation provides constant time performance for (get and put) basic operations i.e the complexity of get() and put() is O(1) , assuming the hash function disperses the elements properly among the buckets.