Wednesday 27 December 2017

Collections In-depth : Part 1: How HashSet works internally

As we know that a set is a well-defined collection of distinct objects. Each member of a set is called an element of the set. So in other words, we can say that a set will never contain duplicate elements.

How HashSet Works internally:
Each and every element in the set is unique. So that there are no duplicate elements in set. 

In java if we want to add elements in the set then we write code like this:

public class SeleniumOcean{    
    public static void main(String[] args)
    {              
        HashSet<Object> hashset = new HashSet<Object>();
        hashset.add(3);
        hashset.add("India");
        hashset.add("US");
        System.out.println("Set is:  "+hashset);
    }
}

It will print the result :       Set is:  [3, India, US]

Now lets add duplicate element in the above code

public class SeleniumOcean{    
    public static void main(String[] args)
    {
        HashSet<Object> hashset = new HashSet<Object>();
        hashset.add(3);
        hashset.add("India");
        hashset.add("US");
        hashset.add(3);         // duplicate elements
        hashset.add("India");   // duplicate elements
        System.out.println("Set is:  "+hashset);
    }
}

It will print the result :       Set is:  [3, India, US]

When we add duplicate elements in the add() method of the Set object, it returns false and do not add to the HashSet , as the element is already present. You may be thinking how does hashset returns false and do not add. 

All the classes of Set interface internally backed up by Map. HashSet uses HashMap for storing its object internally. You must be wondering that to enter a value in HashMap we need a key-value pair, but in HashSet we are passing only one value.

When you open the HashSet implementation of the add() method in Java Apis that is rt.jar,  you will find the following code in it:

public class HashSet<E>
extends AbstractSet<E>
implements Set<E>, Cloneable, java.io.Serializable
{
    private transient HashMap<E,Object> map;
    
    // Dummy value to associate with an Object in the backing Map
    
    private static final Object PRESENT = new Object();
         
    public HashSet() {
        map = new HashMap<>();
    }
    
    // SOME CODE ,i.e Other methods in Hash Set
    
    
    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }       
    
    // SOME CODE ,i.e Other methods in Hash Set
}


So, we are achieving this uniqueness in Set through HashMap. Whenever you create an object of HashSet, it will create an object of HashMap as you can see in the in the above code. 

We know that in HashMap each key is unique. So what we do in the set is that we pass the argument in the add(Elemene E) that is E as a key to HashMap. Now we need to associate some value to the key. Set implementation internally passes the Dummy value that is ( new Object () ) which is referred by Object reference PRESENT.

When you are adding a line in HashSet like  hashset.add(3), java internally puts that element 3 as a key in the HashMap(created during HashSet object creation) and some dummy value that is PRESENT object is passed as a value to the key.

In HashMap internal implementation (we will see this implementation in our next post), when we try to use put(Key, Value) method with the already existing value, map returns previous value associated with key. And if we try to use put(Key, Value) method with new value, map returns null value. 

1. If map.put(key, value) returns null, then the statement “map.put(e, PRESENT) == null” will return true and element is added to the HashSet(internally HashMap).

2. If map.put(key, value) returns old value of the key, then the statement “map.put(e, PRESENT) == null” will return false and element is not added to the HashSet(internally HashMap).

No comments:

Post a Comment