
What is Load Factor in Hashing?

Load factor is defined as m/n, where n is the total size of the hash table (the number of buckets) and m is the preferred number of entries that can be inserted before the underlying data structure needs to grow.

At any given moment, the underlying array of a hash table has a fixed size. A hash table provides constant-time insertion and search, provided the hash function distributes the input keys evenly across the buckets.

If each element lands at a different index, we can compute the hash and locate the element directly at that index. In the case of collisions, however, the time complexity can degrade to O(N) in the worst case, because we may need to traverse the other elements stored at the same index, comparing each one against the element we are searching for.
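A minimal separate-chaining table (a hypothetical sketch for illustration, not java.util.HashMap's actual implementation) makes this cost model concrete: a lookup hashes the key, jumps to one bucket, then walks only that bucket's chain.

```java
import java.util.LinkedList;

// Hypothetical separate-chaining hash table, sketched to show where
// the O(1) vs O(N) behaviour comes from.
class ChainedTable {
    private final LinkedList<String>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedTable(int size) {
        buckets = new LinkedList[size];
        for (int i = 0; i < size; i++) buckets[i] = new LinkedList<>();
    }

    private int indexFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void insert(String key) {
        buckets[indexFor(key)].add(key);            // O(1): append to one chain
    }

    boolean contains(String key) {
        for (String k : buckets[indexFor(key)]) {   // O(chain length); O(N) if every key collides
            if (k.equals(key)) return true;
        }
        return false;
    }
}
```

With an even key distribution each chain stays short, so `contains` stays near O(1); if every key hashed to the same index, every lookup would walk one long chain.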

To work well, a hash table should have as few collisions as possible. But even with the best possible hash function, as more and more elements are stored, the table eventually runs out of vacant indexes, resulting in unavoidable collisions.

To handle this scenario we have the load factor in hashing: a measure that decides exactly when to increase the size of the hash table so that the O(1) time complexity is preserved. A HashTable uses the load factor internally to decide when to rehash.


Load Factor in Hashing

The Load factor is a measure that decides when to increase the HashTable capacity to maintain the search and insert operation complexity of O(1).
The default load factor of HashMap used in java, for instance, is 0.75f (75% of the map size).
That means that if we have a HashTable with an array size of 100, then whenever 75 elements have been stored, the size of the array is doubled, in this case to 200.
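In Java, both values can be supplied when the map is constructed; the defaults are 16 buckets and a 0.75f load factor. (Note that HashMap may round the requested capacity internally, so the comment below is approximate.)

```java
import java.util.HashMap;
import java.util.Map;

// Default construction: 16 buckets, 0.75f load factor.
Map<String, Integer> defaults = new HashMap<>();

// Explicit initial capacity and load factor: this map resizes once
// roughly 100 * 0.75 = 75 entries have been inserted.
Map<String, Integer> tuned = new HashMap<>(100, 0.75f);

tuned.put("a", 1);
defaults.put("b", 2);
```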

How is the Load Factor Used?

The Load Factor decides “when to increase the size of the hash Table.”
The resize threshold can be computed using the following formula:

Threshold = initial capacity of the HashTable * load factor of the HashTable

E.g. If The initial capacity of HashTable is = 16
And the load factor of HashTable = 0.75

According to the formula as mentioned above: 16 * 0.75 = 12

This means the HashTable keeps its size at 16 buckets up to and including the 12th key-value pair.
As soon as the 13th element (key-value pair) is inserted into the HashTable, it increases its size from 16 buckets to 16 * 2 = 32 buckets.
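The arithmetic above can be sketched as a small helper (a hypothetical function named here for illustration, not part of HashMap's API):

```java
// Computes the entry count at which the table resizes: capacity * loadFactor.
// E.g. 16 * 0.75 = 12, so the 13th insertion triggers a resize to 32 buckets.
static int resizeThreshold(int capacity, float loadFactor) {
    return (int) (capacity * loadFactor);
}
```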

Another way to look at it:
When the current load factor, m/n, becomes equal to the defined load factor, i.e. 0.75, the hashmap increases its capacity.

Where:
m – is the number of entries currently in the HashTable
n – is the total size (number of buckets) of the HashTable

What is Rehashing?

As the name suggests, rehashing means hashing again. When the load factor grows beyond its pre-defined value (e.g. 0.75, as in the examples above), the time complexity of search and insert increases.

To overcome this, the size of the array is increased (usually doubled) and all the values are hashed again and stored in the new, larger array, maintaining a low load factor and low complexity.

Why Rehashing?

Rehashing is done because whenever key-value pairs are inserted into the map, the load factor increases, which implies that the time complexity also increases as explained above. This might not give the required time complexity of O(1). Hence, rehash must be done, increasing the size of the bucketArray so as to reduce the load factor and the time complexity.

Rehashing Steps –

For each addition of a new entry to the map, check the current load factor.

  • If it’s greater than its pre-defined value, then Rehash.
  • For Rehash, make a new array of double the previous size and make it the new bucket array.
  • Then traverse each element in the old bucketArray and insert it into the new, larger bucket array.
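The steps above can be sketched as a small helper (a hypothetical function for illustration, not HashMap's internal code), using a list of chains as the bucket array:

```java
import java.util.ArrayList;
import java.util.List;

// Rehashing sketch: allocate a bucket array of double the size,
// then hash every entry from the old array again into the new one.
static List<List<String>> rehash(List<List<String>> oldBuckets) {
    int newSize = oldBuckets.size() * 2;                      // step 1: double the capacity
    List<List<String>> newBuckets = new ArrayList<>();
    for (int i = 0; i < newSize; i++) newBuckets.add(new ArrayList<>());
    for (List<String> bucket : oldBuckets) {                  // step 2: traverse every old entry
        for (String key : bucket) {
            int idx = Math.floorMod(key.hashCode(), newSize); // step 3: hash again for the new size
            newBuckets.get(idx).add(key);
        }
    }
    return newBuckets;
}
```

Note that each key's index is recomputed against the new size, so entries generally move to different buckets after a resize.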

