Chapter 33. Hashing

1. Hashing Introduction

Hashing is a technique used to convert data from one type to another. It is commonly used in data structures like maps and sets to efficiently store and retrieve data. The core idea behind hashing is to transform the input data into a fixed-size output (hash value) using a hash function.

Data Structures Using Hashing:

  1. Map: A map is a collection of key-value pairs, where each key is unique. In Java, there are three types of maps:

    • HashMap: Stores data in an unordered way.

    • LinkedHashMap: Maintains the insertion order.

    • TreeMap: Stores data in a sorted order based on keys.

  2. Set: A set is a collection that contains only unique elements. In Java, there are three types of sets:

    • HashSet: Unordered collection of unique elements.

    • LinkedHashSet: Maintains insertion order.

    • TreeSet: Stores elements in a sorted order.

Among these, HashMap and HashSet are the most important and widely used due to their constant time complexity for basic operations.


2. HashMap

A HashMap is used to store data in key-value pairs. Each key is unique, while values can be duplicated. For example, consider a menu in a bar:

Item (Key)Price (Value)
Samosa10
Pizza15
Burger50

In this example, the items are the keys, and the prices are the values. The key-value pairs are stored in an unordered manner within the HashMap.

Key Characteristics:

  • Unique Keys: Each key in a HashMap must be unique.

  • Repeatable Values: Values can be duplicated.

  • Unordered Storage: Data is stored without any specific order.


3. HashMap Operations

HashMap provides several operations, all with an average time complexity of O(1) due to efficient hashing:

  • put(key, value): Inserts a key-value pair into the map. If the key already exists, the value is updated.

  • get(key): Retrieves the value associated with a given key. Returns null if the key does not exist.

  • containsKey(key): Checks if a key exists in the map. Returns true if the key exists; otherwise, false.

  • remove(key): Deletes the key-value pair associated with the given key.

  • size(): Returns the number of key-value pairs in the map.

  • isEmpty(): Checks if the map is empty.

  • clear(): Removes all key-value pairs from the map.

Here’s a basic Java implementation:

import java.util.HashMap;

public class Basic {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        // Insert O(1)
        map.put("India", 100);
        map.put("America", 55);
        map.put("Aus", 50);
        map.put("SA", 60);

        System.out.println(map); // Output: {India=100, America=55, Aus=50, SA=60}

        int popu = map.get("India");
        System.out.println(popu); // Output: 100

        System.out.println(map.get("America")); // Output: 55
        System.out.println(map.get("Bhutan")); // Output: null
        System.out.println(map.containsKey("India")); // Output: true
        System.out.println(map.containsKey("Sri")); // Output: false
        System.out.println(map.remove("SA")); // Output: 60
        System.out.println(map); // Output: {India=100, America=55, Aus=50}
        System.out.println(map.size()); // Output: 3
        System.out.println(map.isEmpty()); // Output: false
        map.clear();
        System.out.println(map.isEmpty()); // Output: true
    }
}

4. Iteration on HashMap

Iteration over a HashMap can be done using several methods:

a. KeySet

The keySet() method returns a set of keys contained in the map. It allows iteration over the keys.

Set<String> keys = map.keySet();
for (String key : keys) {
    System.out.println("Key = " + key + " Value = " + map.get(key));
}

b. EntrySet

The entrySet() method returns a set view of the mappings contained in the map. This method allows iteration over both keys and values.

Set<Map.Entry<String, Integer>> entries = map.entrySet();
for (Map.Entry<String, Integer> entry : entries) {
    System.out.println("Key = " + entry.getKey() + " Value = " + entry.getValue());
}

Here’s an example:

import java.util.*;
import java.util.Map.Entry;

public class Iteration {
    public static void main(String[] args) {
        HashMap<String, Integer> map1 = new HashMap<>();
        map1.put("India", 100);
        map1.put("America", 55);
        map1.put("Aus", 50);
        map1.put("SA", 60);

        // Iterate using keySet
        Set<String> keys = map1.keySet();
        System.out.println(keys); // Output: [India, America, Aus, SA]
        for (String k : keys) {
            System.out.println("Key=" + k + " Value=" + map1.get(k));
        }

        // Iterate using entrySet
        Set<Entry<String, Integer>> entries = map1.entrySet();
        System.out.println(entries); // Output: [India=100, America=55, Aus=50, SA=60]
    }
}

5. HashMap Implementation

In Java, HashMap is one of the most efficient data structures for storing key-value pairs. It offers constant time complexity for insertion, retrieval, and deletion operations, thanks to its underlying mechanism, which combines arrays (buckets) and linked lists. Understanding the intricacies of how HashMap operates internally is crucial for optimizing its use in your applications. This guide dives into the internal workings of HashMap, explaining everything from bucket structures to rehashing, with real-world examples and clear formulas.

5.1 Structure of a HashMap: Buckets and Linked Lists

At its core, a HashMap is an array of buckets. Each bucket corresponds to a specific index in the array, calculated using the key’s hash value. However, due to hash collisions (where different keys generate the same hash value), multiple entries may end up in the same bucket. To manage this, Java's HashMap uses linked lists within each bucket.

Components:

  • Array (Buckets): The array is the primary storage unit, divided into buckets. Each bucket stores entries based on their hash value.

  • Linked List: When multiple entries hash to the same bucket, they are stored in a linked list within that bucket. Each node in this linked list holds a key-value pair.

Visual Representation:

+---------+      +---------+
| Bucket 0 | --->| [Key-Value Node] -> null
+---------+      +---------+
| Bucket 1 | --->| [Key-Value Node] -> [Key-Value Node] -> null
+---------+      +---------+
| Bucket 2 | ---> null
+---------+

Table: HashMap Structure Overview

ComponentDescription
ArrayThe main storage, divided into buckets (indices) representing hash values.
BucketEach index in the array, potentially storing multiple key-value pairs.
Linked ListHandles collisions by storing multiple entries in a bucket.
NodeRepresents a key-value pair within the linked list.

2. Hash Function: Converting Key to Bucket Index

The efficiency of a HashMap relies heavily on the hash function. This function converts a key into a hash value, which is then modded by the array size to determine the bucket index.

Standard Formula for Bucket Index:

[ {Bucket Index} = {Hash Value} % {Array Size} ]

Example Process:

  1. Input Key: "India"

  2. Hash Function: The hash function processes the string "India" and converts it into a hash value, say 123456.

  3. Bucket Index Calculation: Applying the formula, the bucket index would be:

[ 123456 % {Array Size} ]

Assuming the array size is 10, the index would be:

[ 123456 % 10 = 6 ]

Thus, the key "India" is stored in bucket 6.

Table: Hash Function Example

Input KeyHash ValueBucket Index (Array Size = 10)
"India"1234566
"USA"9876544

3. Insertion Using put(key, value)

Inserting a key-value pair into a HashMap using the put method involves the following steps:

Scenario 1: Key Exists

  • The hash function calculates the bucket index using the key.

  • The linked list at that bucket is traversed to find if the key already exists.

  • If the key is found, its value is updated with the new value.

Scenario 2: Key Does Not Exist

  • The hash function calculates the bucket index.

  • The linked list at that bucket is checked.

  • Since the key is not present, a new node (representing the key-value pair) is created and added to the linked list.

Algorithm for put(key, value) in Detail:

  1. Calculate Hash: Use the hash function to calculate the bucket index.

  2. Traverse Linked List: Iterate through the linked list at the calculated index.

    • If the key exists, update the value.

    • If the key does not exist, create a new node and add it to the linked list.

Table: Steps in put(key, value)

StepDescription
1. Hash CalculationConvert key to hash value, determine bucket index using modulus.
2. Traverse Linked ListSearch for the key in the linked list at the bucket index.
3. Update or InsertUpdate value if key exists; otherwise, insert a new node.

Example:

Let’s say we want to store information about the population of different countries:

HashMap<String, Integer> map = new HashMap<>();
map.put("India", 100);
map.put("USA", 50);
map.put("Australia", 30);

Here, "India", "USA", and "Australia" are keys, and 100, 50, and 30 are their respective values. The hash function calculates the bucket index for each key, and the key-value pairs are stored in the corresponding buckets.


4. Handling Collisions: Linked Lists and Hash Function

Collision: A collision occurs when two different keys produce the same bucket index. This is inevitable in a HashMap, especially as the number of keys increases relative to the size of the array.

Solution: In Java's HashMap, collisions are handled using linked lists within each bucket. When multiple keys hash to the same bucket, they are stored in a linked list, and the correct entry is found by traversing the list.

Key Concepts:

  • Lambda (λ): The load factor, represented as:

[ lambda = {{Number of Nodes (n)}}/{{Number of Buckets (N)}} ]

  • Rehashing: If the load factor exceeds a certain threshold (commonly 0.75 in Java's HashMap), rehashing occurs. This involves creating a new array with double the size of the original and redistributing all entries to new buckets based on recalculated hash values.

Example:

If a HashMap with 10 buckets has 15 entries (nodes), the load factor is:

[ lambda = {15}/{10} = 1.5 ]

Since λ exceeds the threshold, rehashing is triggered, doubling the array size to 20 buckets. The entries are then redistributed.

Table: Rehashing Process

Before RehashingAfter Rehashing
Array Size: 10New Array Size: 20
Load Factor: 1.5Load Factor: 0.75
Linked List Length: LongLinked List Length: Short

5. get(key) and remove(key) Operations

The get(key) and remove(key) operations in a HashMap are also efficient, with average time complexity of O(1).

get(key) Operation:

  1. Calculate Hash: Determine the bucket index using the hash function.

  2. Traverse Linked List: Search for the key in the linked list at the bucket index.

    • If found, return the associated value.

    • If not found, return null.

Example:

int population = map.get("India"); // Returns 100

remove(key) Operation:

  1. Calculate Hash: Determine the bucket index.

  2. Traverse Linked List: Search for the key in the linked list.

    • If found, remove the node from the linked list.

    • If not found, return null.

Example:

map.remove("USA"); // Removes the entry with key "USA"

6. Performance Considerations

While HashMap generally provides O(1) time complexity for its operations, performance can degrade if the linked lists in buckets grow too long (which happens if the load factor becomes too high). Therefore, managing the load factor and rehashing when necessary is crucial.

Formulas:

  • Time Complexity: O(1) on average, O(n) in the worst case (if all entries hash to the same bucket).

  • Space Complexity: O(n), where n is the number of entries.

import java.util.*;
import java.util.LinkedList;
public class Implementation {
    static class HashMap<K,V> { //Generic
        private class Node {
            K key;
            V value;
            public Node(K key,V value){
                this.key=key;
                this.value=value;
            }

        }
        private int n;// n
        private int N;
        private LinkedList<Node> buckets[]; //N=buckets.length
        @SuppressWarnings("unchecked")
        public HashMap(){
            this.N=4;
            this.buckets=new LinkedList[4];
            for (int i = 0; i < 4; i++) {
                this.buckets[i]=new LinkedList<>();
            }
        }
        private int hashFunc(K key){
            int hc=key.hashCode();
           return Math.abs(hc)% N;
        }
        private int SearchInLL(K key ,int bi){
            LinkedList<Node> ll=buckets[bi];
            int di=0;
            for (int i = 0; i < ll.size(); i++) {
                Node node=ll.get(i);
                if (node.key.equals(key)) {
                  return di;  
                }
                di++;
            }
                return -1;
        }
        private void reHash(){
            LinkedList<Node>oldbuck[]=buckets;
            buckets=new LinkedList[2*N];
            N=2*N;
            for (int i = 0; i < buckets.length; i++) {
                buckets[i]=new LinkedList<>();
            }
            //Nodess -->add in bucket
            for (int i = 0; i < oldbuck.length; i++) {
                LinkedList<Node>ll=oldbuck[i];
                for (int j = 0; j < ll.size(); j++) {
                    Node node=ll.remove();
                    put(node.key,node.value);
                }
            }
        }
        public void put(K key,V value){
            int bi=hashFunc(key);
            int di=SearchInLL(key, bi);
            if (di!=-1) {
                Node node=buckets[bi].get(di);
                node.value=value;
            }else{
                buckets[bi].add(new Node(key, value));
                n++;
            }
            double lambda=(double)n/N;
            if (lambda>2.0) {
                reHash();
            }
        }

        public boolean containsKey(K key){
            return false;
        }
        public V remove(K key){
            return null;
        }
        public V get(K key){
            return null;
        } 
    }
    public static void main(String[] args) {
        HashMap<String,Integer>hm=new HashMap<>();
        hm.put("India", 100);
        hm.put("America", 55);
        hm.put("Aus", 50);
        hm.put("SA", 60);
    }
}

a. Put Operation

When you perform a put(key, value) operation, the following steps occur:

  1. Hash Function: The key is passed through a hash function to determine the bucket index.

  2. Check Existence: The linked list at the determined index is traversed to check if the key already exists.

    • If the key exists, the value is updated.

    • If the key does not exist, a new node is created and added to the linked list.

b. Time Complexity

The time complexity of a HashMap operation is O(1) in the average case. However, if there are too many collisions (i.e., multiple keys hash to the same bucket), the time complexity can degrade to O(n). To avoid this, Java uses the concept of rehashing.

c. Rehashing

Rehashing occurs when the load factor (λ) exceeds a certain threshold. The load factor is defined as the ratio of the number of elements to the number of buckets. When rehashing, the array is resized, and all elements are redistributed based on the new hash function.

6.LinkedHashMap in Java

The LinkedHashMap is a subclass of HashMap, and while it shares many similarities with HashMap, it has a unique feature: it maintains the insertion order of keys. This means that entries in a LinkedHashMap will always appear in the same order in which they were inserted.

Internally, it uses a doubly linked list in addition to a hash table, which allows it to store the order of the elements while ensuring efficient performance for all standard operations.

Key Features of LinkedHashMap:

  • Insertion-Order Preservation: The order in which you insert entries is retained.

  • Doubly Linked List: This list helps maintain the order of entries. The list links the map entries, allowing traversal in both directions.

  • Performance: Despite the linked list addition, operations like put, get, remove, etc., still run in constant time: O(1).

Example Code:

import java.util.LinkedHashMap;
public class LinkedHashMapExample {
    public static void main(String[] args) {
        LinkedHashMap<String, Integer> lhm = new LinkedHashMap<>();
        lhm.put("India", 100);
        lhm.put("Aus", 55);
        lhm.put("China", 98);
        System.out.println(lhm);
    }
}

Output:

{India=100, Aus=55, China=98}

In the output, you can see that the keys India, Aus, and China are displayed in the same order they were inserted. This is the main benefit of using LinkedHashMap.

Internal Working:

  • Doubly Linked List: The doubly linked list structure allows traversal in both forward and backward directions, which helps maintain the order of elements.

  • Efficiency: The time complexity for operations like put, get, and remove remains O(1), just like in a HashMap. However, LinkedHashMap comes with the extra memory overhead for storing the pointers for the doubly linked list.


7. TreeMap in Java

A TreeMap is another class that implements the Map interface but offers a different property: it maintains the keys in sorted order. Unlike LinkedHashMap, where the order is based on insertion, the TreeMap sorts the keys based on their natural order (or a comparator you define).

Internally, it uses a Red-Black Tree, a type of self-balancing binary search tree. This makes all TreeMap operations (put, get, remove) run in O(log n) time.

Key Features of TreeMap:

  • Sorted Key Order: Keys are stored in their natural order (for example, alphabetical order for strings).

  • Red-Black Tree: The implementation is based on a self-balancing binary search tree known as a Red-Black Tree, ensuring that all operations have a time complexity of O(log n).

  • No Null Keys: Unlike HashMap and LinkedHashMap, TreeMap doesn’t allow null keys.

Example Code:

import java.util.TreeMap;
public class TreeMapExample {
    public static void main(String[] args) {
        TreeMap<String, Integer> tm = new TreeMap<>();
        tm.put("India", 100);
        tm.put("Aus", 55);
        tm.put("China", 98);
        System.out.println(tm);
    }
}

Output:

{Aus=55, China=98, India=100}

In this case, the output shows that the keys are sorted in alphabetical order: Aus, China, and India. This behavior is a direct result of the underlying Red-Black Tree structure.

Internal Working:

  • Red-Black Tree: A self-balancing binary search tree that automatically orders the keys as they are inserted. The tree ensures that the height of the tree is always balanced, leading to efficient operations.

  • Logarithmic Time Complexity: The put, get, and remove operations have a time complexity of O(log n), which is slower than HashMap and LinkedHashMap but necessary for maintaining sorted order.


Comparison Table

Below is a comparison table that summarizes the key differences between LinkedHashMap, TreeMap, and HashMap:

  • LinkedHashMap maintains insertion order using a doubly linked list, with O(1) time complexity for basic operations.

  • TreeMap ensures keys are sorted using a Red-Black Tree, with O(log n) complexity.

  • HashMap provides constant time for operations but does not preserve any specific order.

Property

HashMap

Linked HashMap

Tree Map

Sorted according to either Insertion Order of Access Order (as specified during construction)


8. Majority Element

Majority Element Problem

The Majority Element problem asks us to find all elements in an integer array that appear more than n/3 times, where n is the size of the array. This means we need to find elements that occur at least 33.33% of the time in the array.

Problem Definition:

Given an integer array num[], return all elements that appear more than n/3 times.

Example:

  1. Input: num[] = {1,3,2,5,1,3,1,5,1}

    • Output: 1

    • Here, 1 appears 4 times out of 9, which is greater than 9/3 = 3.

  2. Input: num[] = {1,2}

    • Output: 1, 2

    • Both 1 and 2 appear more than 2/3 = 0.66 times (1 occurrence each).

Approach to Solve

To solve this, we can use a HashMap to count the frequency of each element in the array. Once we have the frequencies, we can simply check which elements occur more than n/3 times and print them.

Key Shortcuts in the Code:

  1. map.getOrDefault():

    • This function simplifies the task of incrementing values in the map.

    • It checks if the key exists in the map. If the key exists, it returns its value; otherwise, it returns the default value, which is 0.

    • Example:

        map.put(num[i], map.getOrDefault(num[i], 0) + 1);
      

      This single line of code replaces the need to check if the key exists and manually update it.

  2. map.keySet():

    • This method returns a set of all the keys in the map. We can loop through these keys to find the ones that meet the condition.

    • Example:

        for (Integer key : map.keySet()) {
            if (map.get(key) > num.length / 3) {
                System.out.println(key);
            }
        }
      

Example Code:

import java.util.HashMap;

public class MajorityElement {
    public static void main(String[] args) {
        int num[] = {1, 3, 2, 5, 1, 3, 1, 5, 1};
        HashMap<Integer, Integer> map = new HashMap<>();

        // Count frequencies using map.getOrDefault
        for (int i = 0; i < num.length; i++) {
            map.put(num[i], map.getOrDefault(num[i], 0) + 1);
        }

        // Print elements with count > n/3
        for (Integer key : map.keySet()) {
            if (map.get(key) > num.length / 3) {
                System.out.println(key);
            }
        }
    }
}

Explanation of Code:

  1. Input: We initialize an integer array num[] and a HashMap<Integer, Integer> to store the frequency of each element.

  2. Counting Frequencies:

    • We loop through the array, and for each element, we increment its count in the map using map.put(num[i], map.getOrDefault(num[i], 0) + 1).
  3. Finding Majority Elements:

    • After counting frequencies, we check which keys (elements) have a count greater than n/3 by looping through the map using map.keySet().

Output:

For the input num[] = {1, 3, 2, 5, 1, 3, 1, 5, 1}, the output will be:

1

Here, 1 appears 4 times, which is more than 9/3 = 3. Hence, 1 is the majority element.


Frequency Table:

Let’s break down the frequencies of elements for better understanding:

NumFrequency / Count
14
32
21
52

From this table, we can clearly see that 1 is the only element that appears more than n/3 times.

Key Points Learned:

  1. map.getOrDefault():

    • A handy method that allows us to get a value from the map or return a default value (like 0) if the key does not exist.

    • This helps simplify the frequency counting process in a single line of code.

  2. map.keySet():

    • This returns all the keys in the map, allowing us to loop through them to find the majority elements.

This is an efficient way to solve the problem with a time complexity of O(n), where n is the size of the array.


9. Valid Anagram

An anagram is a word formed by rearranging the letters of another word. To check if two strings are anagrams, you can use a HashMap to count the frequency of each character in both strings and then compare the two maps.


10. HashSet

HashSet is a set implementation that uses a hash table for storage. It ensures that no duplicate elements are stored.

Key Characteristics:

  • Unique Elements: Only unique elements are allowed.

  • Unordered Storage: Elements are stored in an unordered manner.

  • Constant Time Operations: Basic operations like add, remove, and contains have a time complexity of O(1).


11. Iteration on HashSet

Iteration over a HashSet can be done using an iterator or an enhanced for loop. Since HashSet does not maintain any order, the order of elements during iteration is not guaranteed.


12. LinkedHashSet

LinkedHashSet is a subclass of HashSet that maintains a doubly linked list of the elements. This ensures that the elements are stored in the order in which they were inserted.


13. TreeSet

TreeSet is a set implementation that stores elements in a sorted order. It uses a Red-Black tree internally to maintain the order.


14. Count Distinct Elements

Counting distinct elements in an array can be efficiently done using a HashSet. By adding each element to the set, duplicates are automatically removed, and the size of the set gives the count of distinct elements.


15. Union and Intersection

Union:

To find the union of two sets, you can use a HashSet and add all elements from both sets.

Intersection:

To find the intersection, you can use a HashSet to store the elements of one set and then check for each element in the other set if it exists in the first set.


16. Find it for Tickets

The problem of finding if tickets are valid or not can be solved using a HashMap. Each ticket can be stored as a key-value pair, where the key is the ticket number, and the value is the ticket details.


17. Largest Subarray with Sum Zero

To find the largest subarray with a sum of zero, you can use a HashMap to store the cumulative sum at each index. If the same sum occurs again, it means that the subarray between the two indices has a sum of zero.


18. Subarray Sum Equal to K

To find the number of subarrays that sum to a given value k, you can use a HashMap to store the cumulative sum at each index. By checking if (current_sum - k) exists in the map, you can determine if a subarray with the sum k exists.