HashMap is one of the most popular data structures for storing key-value pairs and helps in solving many problems. Its get() and put() operations take constant O(1) time on average. Rehashing is one of the popular interview questions on HashMap. To understand rehashing, we also have to understand the load factor and why it is used.
HashMap works on the hashing principle. To insert a <Key(K), Value(V)> pair into a HashMap, the following operations are executed:
- First, the hash code of key ‘K’ is calculated using the internal hash() function.
- The hash code is then used to calculate the bucket index (hashCode % numberOfBuckets). Internally, the buckets form an array that holds the Nodes.
- If no entry with that key is present at that index, a new Node is inserted; otherwise the existing Node’s value is updated.
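The steps above can be sketched as a minimal hash map with one linked list per bucket. This is an illustrative model, not the real java.util.HashMap implementation; the class and field names are made up for the example.

```java
import java.util.Objects;

// Minimal sketch of the insertion steps above: hash the key, map the hash
// to a bucket index, then insert or update a Node in that bucket's list.
// Illustrative only -- not the actual java.util.HashMap internals.
class SimpleHashMap<K, V> {
    private static class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] buckets = (Node<K, V>[]) new Node[16];

    public void put(K key, V value) {
        int hash = Objects.hashCode(key);                // step 1: hash code of key K
        int index = Math.floorMod(hash, buckets.length); // step 2: hashCode % numberOfBuckets
        for (Node<K, V> n = buckets[index]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {            // step 3a: key present -> update value
                n.value = value;
                return;
            }
        }
        buckets[index] = new Node<>(key, value, buckets[index]); // step 3b: insert new Node
    }

    public V get(K key) {
        int index = Math.floorMod(Objects.hashCode(key), buckets.length);
        for (Node<K, V> n = buckets[index]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) return n.value;
        }
        return null;
    }
}
```

Note that `Math.floorMod` is used instead of `%` so that negative hash codes still map to a valid index; the real HashMap instead masks the hash with `(capacity - 1)`, which works because its capacity is always a power of two.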
What is Rehashing?
Rehashing is the process of re-calculating the hash codes of already stored entries (key-value pairs) and moving them to a new, bigger hash table when the threshold is reached or crossed.
Rehashing of a hash map is done when the number of elements in the map reaches the threshold value. The Java documentation suggests that a good load factor value is 0.75, and the default initial capacity of HashMap is 16. Once the number of elements reaches or crosses 0.75 times the capacity, bucket chains grow and complexity increases. To overcome this, the size of the bucket array is doubled, and all the existing entries are hashed again and stored in the new array of double size. This keeps the load factor low and the complexity close to O(1). With the defaults, rehashing occurs when the number of elements reaches 12 (0.75 * 16 = 12).
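The resize arithmetic above can be checked directly with HashMap's documented defaults (initial capacity 16, load factor 0.75); the helper method here is illustrative, not part of the HashMap API.

```java
// Sketch of the resize threshold: rehashing triggers once the element
// count crosses capacity * loadFactor. Uses HashMap's documented defaults.
class RehashThreshold {
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        int capacity = 16;
        float loadFactor = 0.75f;
        System.out.println(threshold(capacity, loadFactor));     // 12 -> the 13th insert triggers a resize
        System.out.println(threshold(capacity * 2, loadFactor)); // 24 -> next resize point after doubling
    }
}
```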
Why is Rehashing done?
Rehashing is done because whenever a new key-value pair is inserted into the map, the load factor increases, and with it the complexity. If the complexity keeps increasing, our HashMap will no longer have constant O(1) time complexity. Hence rehashing is done to distribute the items across a bigger hashmap, reducing both the load factor and the complexity, so that get() and put() keep their constant O(1) time complexity. After rehashing, an existing item may fall in the same bucket or in a different one.
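A small sketch makes the last point concrete: after the capacity doubles, the bucket index is recomputed against the new bucket count, so some entries move while others stay put. The helper below is illustrative (a plain modulo model, not HashMap's bit-masking).

```java
// Why an entry may stay or move after rehashing: the bucket index is the
// hash modulo the (now doubled) number of buckets.
class RehashIndex {
    static int index(int hash, int numberOfBuckets) {
        return Math.floorMod(hash, numberOfBuckets);
    }

    public static void main(String[] args) {
        // hash 21: bucket 5 in a 16-bucket table, bucket 21 after doubling -> moved
        System.out.println(index(21, 16) + " -> " + index(21, 32)); // 5 -> 21
        // hash 7: bucket 7 before and after the resize -> same bucket
        System.out.println(index(7, 16) + " -> " + index(7, 32));   // 7 -> 7
    }
}
```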
What is Load factor in HashMap?
Load factor in HashMap is basically a measure that decides when exactly to increase the capacity of the HashMap, so as to maintain the O(1) time complexity of get() and put().
Load factor is defined as (m/n), where n is the total capacity of the hash table and m is the number of entries that can be inserted before the size of the underlying array needs to be increased.
If you are going to store a really large number of elements in the HashMap, it is always good to create the HashMap with sufficient capacity upfront, so that rehashing is not done repeatedly as the map grows; this is more efficient than letting it perform automatic rehashing.
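A sketch of such pre-sizing, assuming the expected entry count is known in advance: the capacity is chosen so that the entry count never crosses capacity * loadFactor. The helper method is illustrative, not part of the HashMap API.

```java
import java.util.HashMap;
import java.util.Map;

// Pre-sizing a HashMap: pick an initial capacity large enough that the
// expected number of entries stays at or below capacity * loadFactor,
// so no rehashing happens while filling the map.
class PresizedMap {
    static int capacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    public static void main(String[] args) {
        int expected = 1_000_000;
        int initialCapacity = capacityFor(expected, 0.75f);
        Map<String, Integer> map = new HashMap<>(initialCapacity, 0.75f);
        System.out.println(initialCapacity); // 1333334
    }
}
```

Note that java.util.HashMap rounds the requested capacity up to the next power of two internally, so the effective capacity may be somewhat larger than requested.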