Chapter 33. Hashing
Table of contents
- 1. Hashing Introduction
- 2. HashMap
- 3. HashMap Operations
- 4. Iteration on HashMap
- 5. HashMap Implementation
- 5.1 Structure of a HashMap: Buckets and Linked Lists
- 2. Hash Function: Converting Key to Bucket Index
- 3. Insertion Using put(key, value)
- 4. Handling Collisions: Linked Lists and Hash Function
- 5. get(key) and remove(key) Operations
- 6. Performance Considerations
- a. Put Operation
- b. Time Complexity
- c. Rehashing
- 6.LinkedHashMap in Java
- 7. TreeMap in Java
- Comparison Table
- 8. Majority Element
- 9. Valid Anagram
- 10. HashSet
- 11. Iteration on HashSet
- 12. LinkedHashSet
- 13. TreeSet
- 14. Count Distinct Elements
- 15. Union and Intersection
- 16. Find it for Tickets
- 17. Largest Subarray with Sum Zero
- 18. Subarray Sum Equal to K
1. Hashing Introduction
Hashing is a technique used to convert data from one type to another. It is commonly used in data structures like maps and sets to efficiently store and retrieve data. The core idea behind hashing is to transform the input data into a fixed-size output (hash value) using a hash function.
Data Structures Using Hashing:
Map: A map is a collection of key-value pairs, where each key is unique. In Java, there are three types of maps:
HashMap: Stores data in an unordered way.
LinkedHashMap: Maintains the insertion order.
TreeMap: Stores data in a sorted order based on keys.
Set: A set is a collection that contains only unique elements. In Java, there are three types of sets:
HashSet: Unordered collection of unique elements.
LinkedHashSet: Maintains insertion order.
TreeSet: Stores elements in a sorted order.
Among these, HashMap and HashSet are the most important and widely used due to their constant time complexity for basic operations.
2. HashMap
A HashMap
is used to store data in key-value pairs. Each key is unique, while values can be duplicated. For example, consider a menu in a bar:
Item (Key) | Price (Value) |
Samosa | 10 |
Pizza | 15 |
Burger | 50 |
In this example, the items are the keys, and the prices are the values. The key-value pairs are stored in an unordered manner within the HashMap
.
Key Characteristics:
Unique Keys: Each key in a
HashMap
must be unique.Repeatable Values: Values can be duplicated.
Unordered Storage: Data is stored without any specific order.
3. HashMap Operations
HashMap
provides several operations, all with an average time complexity of O(1) due to efficient hashing:
put(key, value): Inserts a key-value pair into the map. If the key already exists, the value is updated.
get(key): Retrieves the value associated with a given key. Returns
null
if the key does not exist.containsKey(key): Checks if a key exists in the map. Returns
true
if the key exists; otherwise,false
.remove(key): Deletes the key-value pair associated with the given key.
size(): Returns the number of key-value pairs in the map.
isEmpty(): Checks if the map is empty.
clear(): Removes all key-value pairs from the map.
Here’s a basic Java implementation:
import java.util.HashMap;
public class Basic {
public static void main(String[] args) {
HashMap<String, Integer> map = new HashMap<>();
// Insert O(1)
map.put("India", 100);
map.put("America", 55);
map.put("Aus", 50);
map.put("SA", 60);
System.out.println(map); // Output: {India=100, America=55, Aus=50, SA=60}
int popu = map.get("India");
System.out.println(popu); // Output: 100
System.out.println(map.get("America")); // Output: 55
System.out.println(map.get("Bhutan")); // Output: null
System.out.println(map.containsKey("India")); // Output: true
System.out.println(map.containsKey("Sri")); // Output: false
System.out.println(map.remove("SA")); // Output: 60
System.out.println(map); // Output: {India=100, America=55, Aus=50}
System.out.println(map.size()); // Output: 3
System.out.println(map.isEmpty()); // Output: false
map.clear();
System.out.println(map.isEmpty()); // Output: true
}
}
4. Iteration on HashMap
Iteration over a HashMap
can be done using several methods:
a. KeySet
The keySet()
method returns a set of keys contained in the map. It allows iteration over the keys.
Set<String> keys = map.keySet();
for (String key : keys) {
System.out.println("Key = " + key + " Value = " + map.get(key));
}
b. EntrySet
The entrySet()
method returns a set view of the mappings contained in the map. This method allows iteration over both keys and values.
Set<Map.Entry<String, Integer>> entries = map.entrySet();
for (Map.Entry<String, Integer> entry : entries) {
System.out.println("Key = " + entry.getKey() + " Value = " + entry.getValue());
}
Here’s an example:
import java.util.*;
import java.util.Map.Entry;
public class Iteration {
public static void main(String[] args) {
HashMap<String, Integer> map1 = new HashMap<>();
map1.put("India", 100);
map1.put("America", 55);
map1.put("Aus", 50);
map1.put("SA", 60);
// Iterate using keySet
Set<String> keys = map1.keySet();
System.out.println(keys); // Output: [India, America, Aus, SA]
for (String k : keys) {
System.out.println("Key=" + k + " Value=" + map1.get(k));
}
// Iterate using entrySet
Set<Entry<String, Integer>> entries = map1.entrySet();
System.out.println(entries); // Output: [India=100, America=55, Aus=50, SA=60]
}
}
5. HashMap Implementation
In Java, HashMap
is one of the most efficient data structures for storing key-value pairs. It offers constant time complexity for insertion, retrieval, and deletion operations, thanks to its underlying mechanism, which combines arrays (buckets) and linked lists. Understanding the intricacies of how HashMap
operates internally is crucial for optimizing its use in your applications. This guide dives into the internal workings of HashMap
, explaining everything from bucket structures to rehashing, with real-world examples and clear formulas.
5.1 Structure of a HashMap: Buckets and Linked Lists
At its core, a HashMap
is an array of buckets. Each bucket corresponds to a specific index in the array, calculated using the key’s hash value. However, due to hash collisions (where different keys generate the same hash value), multiple entries may end up in the same bucket. To manage this, Java's HashMap
uses linked lists within each bucket.
Components:
Array (Buckets): The array is the primary storage unit, divided into buckets. Each bucket stores entries based on their hash value.
Linked List: When multiple entries hash to the same bucket, they are stored in a linked list within that bucket. Each node in this linked list holds a key-value pair.
Visual Representation:
+---------+ +---------+
| Bucket 0 | --->| [Key-Value Node] -> null
+---------+ +---------+
| Bucket 1 | --->| [Key-Value Node] -> [Key-Value Node] -> null
+---------+ +---------+
| Bucket 2 | ---> null
+---------+
Table: HashMap
Structure Overview
Component | Description |
Array | The main storage, divided into buckets (indices) representing hash values. |
Bucket | Each index in the array, potentially storing multiple key-value pairs. |
Linked List | Handles collisions by storing multiple entries in a bucket. |
Node | Represents a key-value pair within the linked list. |
2. Hash Function: Converting Key to Bucket Index
The efficiency of a HashMap
relies heavily on the hash function. This function converts a key into a hash value, which is then modded by the array size to determine the bucket index.
Standard Formula for Bucket Index:
[ {Bucket Index} = {Hash Value} % {Array Size} ]
Example Process:
Input Key:
"India"
Hash Function: The hash function processes the string
"India"
and converts it into a hash value, say123456
.Bucket Index Calculation: Applying the formula, the bucket index would be:
[ 123456 % {Array Size} ]
Assuming the array size is 10, the index would be:
[ 123456 % 10 = 6 ]
Thus, the key "India"
is stored in bucket 6.
Table: Hash Function Example
Input Key | Hash Value | Bucket Index (Array Size = 10) |
"India" | 123456 | 6 |
"USA" | 987654 | 4 |
3. Insertion Using put(key, value)
Inserting a key-value pair into a HashMap
using the put
method involves the following steps:
Scenario 1: Key Exists
The hash function calculates the bucket index using the key.
The linked list at that bucket is traversed to find if the key already exists.
If the key is found, its value is updated with the new value.
Scenario 2: Key Does Not Exist
The hash function calculates the bucket index.
The linked list at that bucket is checked.
Since the key is not present, a new node (representing the key-value pair) is created and added to the linked list.
Algorithm for put(key, value)
in Detail:
Calculate Hash: Use the hash function to calculate the bucket index.
Traverse Linked List: Iterate through the linked list at the calculated index.
If the key exists, update the value.
If the key does not exist, create a new node and add it to the linked list.
Table: Steps in put(key, value)
Step | Description |
1. Hash Calculation | Convert key to hash value, determine bucket index using modulus. |
2. Traverse Linked List | Search for the key in the linked list at the bucket index. |
3. Update or Insert | Update value if key exists; otherwise, insert a new node. |
Example:
Let’s say we want to store information about the population of different countries:
HashMap<String, Integer> map = new HashMap<>();
map.put("India", 100);
map.put("USA", 50);
map.put("Australia", 30);
Here, "India"
, "USA"
, and "Australia"
are keys, and 100
, 50
, and 30
are their respective values. The hash function calculates the bucket index for each key, and the key-value pairs are stored in the corresponding buckets.
4. Handling Collisions: Linked Lists and Hash Function
Collision: A collision occurs when two different keys produce the same bucket index. This is inevitable in a HashMap
, especially as the number of keys increases relative to the size of the array.
Solution: In Java's HashMap
, collisions are handled using linked lists within each bucket. When multiple keys hash to the same bucket, they are stored in a linked list, and the correct entry is found by traversing the list.
Key Concepts:
- Lambda (λ): The load factor, represented as:
[ lambda = {{Number of Nodes (n)}}/{{Number of Buckets (N)}} ]
- Rehashing: If the load factor exceeds a certain threshold (commonly 0.75 in Java's
HashMap
), rehashing occurs. This involves creating a new array with double the size of the original and redistributing all entries to new buckets based on recalculated hash values.
Example:
If a HashMap
with 10 buckets has 15 entries (nodes), the load factor is:
[ lambda = {15}/{10} = 1.5 ]
Since λ
exceeds the threshold, rehashing is triggered, doubling the array size to 20 buckets. The entries are then redistributed.
Table: Rehashing Process
Before Rehashing | After Rehashing |
Array Size: 10 | New Array Size: 20 |
Load Factor: 1.5 | Load Factor: 0.75 |
Linked List Length: Long | Linked List Length: Short |
5. get(key)
and remove(key)
Operations
The get(key)
and remove(key)
operations in a HashMap
are also efficient, with average time complexity of O(1).
get(key)
Operation:
Calculate Hash: Determine the bucket index using the hash function.
Traverse Linked List: Search for the key in the linked list at the bucket index.
If found, return the associated value.
If not found, return
null
.
Example:
int population = map.get("India"); // Returns 100
remove(key)
Operation:
Calculate Hash: Determine the bucket index.
Traverse Linked List: Search for the key in the linked list.
If found, remove the node from the linked list.
If not found, return
null
.
Example:
map.remove("USA"); // Removes the entry with key "USA"
6. Performance Considerations
While HashMap
generally provides O(1) time complexity for its operations, performance can degrade if the linked lists in buckets grow too long (which happens if the load factor becomes too high). Therefore, managing the load factor and rehashing when necessary is crucial.
Formulas:
Time Complexity: O(1) on average, O(n) in the worst case (if all entries hash to the same bucket).
Space Complexity: O(n), where n is the number of entries.
import java.util.*;
import java.util.LinkedList;
public class Implementation {
static class HashMap<K,V> { //Generic
private class Node {
K key;
V value;
public Node(K key,V value){
this.key=key;
this.value=value;
}
}
private int n;// n
private int N;
private LinkedList<Node> buckets[]; //N=buckets.length
@SuppressWarnings("unchecked")
public HashMap(){
this.N=4;
this.buckets=new LinkedList[4];
for (int i = 0; i < 4; i++) {
this.buckets[i]=new LinkedList<>();
}
}
private int hashFunc(K key){
int hc=key.hashCode();
return Math.abs(hc)% N;
}
private int SearchInLL(K key ,int bi){
LinkedList<Node> ll=buckets[bi];
int di=0;
for (int i = 0; i < ll.size(); i++) {
Node node=ll.get(i);
if (node.key.equals(key)) {
return di;
}
di++;
}
return -1;
}
private void reHash(){
LinkedList<Node>oldbuck[]=buckets;
buckets=new LinkedList[2*N];
N=2*N;
for (int i = 0; i < buckets.length; i++) {
buckets[i]=new LinkedList<>();
}
//Nodess -->add in bucket
for (int i = 0; i < oldbuck.length; i++) {
LinkedList<Node>ll=oldbuck[i];
for (int j = 0; j < ll.size(); j++) {
Node node=ll.remove();
put(node.key,node.value);
}
}
}
public void put(K key,V value){
int bi=hashFunc(key);
int di=SearchInLL(key, bi);
if (di!=-1) {
Node node=buckets[bi].get(di);
node.value=value;
}else{
buckets[bi].add(new Node(key, value));
n++;
}
double lambda=(double)n/N;
if (lambda>2.0) {
reHash();
}
}
public boolean containsKey(K key){
return false;
}
public V remove(K key){
return null;
}
public V get(K key){
return null;
}
}
public static void main(String[] args) {
HashMap<String,Integer>hm=new HashMap<>();
hm.put("India", 100);
hm.put("America", 55);
hm.put("Aus", 50);
hm.put("SA", 60);
}
}
a. Put Operation
When you perform a put(key, value)
operation, the following steps occur:
Hash Function: The key is passed through a hash function to determine the bucket index.
Check Existence: The linked list at the determined index is traversed to check if the key already exists.
If the key exists, the value is updated.
If the key does not exist, a new node is created and added to the linked list.
b. Time Complexity
The time complexity of a HashMap
operation is O(1) in the average case. However, if there are too many collisions (i.e., multiple keys hash to the same bucket), the time complexity can degrade to O(n). To avoid this, Java uses the concept of rehashing.
c. Rehashing
Rehashing occurs when the load factor (λ) exceeds a certain threshold. The load factor is defined as the ratio of the number of elements to the number of buckets. When rehashing, the array is resized, and all elements are redistributed based on the new hash function.
6.LinkedHashMap in Java
The LinkedHashMap
is a subclass of HashMap
, and while it shares many similarities with HashMap
, it has a unique feature: it maintains the insertion order of keys. This means that entries in a LinkedHashMap
will always appear in the same order in which they were inserted.
Internally, it uses a doubly linked list in addition to a hash table, which allows it to store the order of the elements while ensuring efficient performance for all standard operations.
Key Features of LinkedHashMap:
Insertion-Order Preservation: The order in which you insert entries is retained.
Doubly Linked List: This list helps maintain the order of entries. The list links the map entries, allowing traversal in both directions.
Performance: Despite the linked list addition, operations like
put
,get
,remove
, etc., still run in constant time: O(1).
Example Code:
import java.util.LinkedHashMap;
public class LinkedHashMapExample {
public static void main(String[] args) {
LinkedHashMap<String, Integer> lhm = new LinkedHashMap<>();
lhm.put("India", 100);
lhm.put("Aus", 55);
lhm.put("China", 98);
System.out.println(lhm);
}
}
Output:
{India=100, Aus=55, China=98}
In the output, you can see that the keys India
, Aus
, and China
are displayed in the same order they were inserted. This is the main benefit of using LinkedHashMap
.
Internal Working:
Doubly Linked List: The doubly linked list structure allows traversal in both forward and backward directions, which helps maintain the order of elements.
Efficiency: The time complexity for operations like
put
,get
, andremove
remains O(1), just like in aHashMap
. However,LinkedHashMap
comes with the extra memory overhead for storing the pointers for the doubly linked list.
7. TreeMap in Java
A TreeMap
is another class that implements the Map
interface but offers a different property: it maintains the keys in sorted order. Unlike LinkedHashMap
, where the order is based on insertion, the TreeMap
sorts the keys based on their natural order (or a comparator you define).
Internally, it uses a Red-Black Tree, a type of self-balancing binary search tree. This makes all TreeMap
operations (put
, get
, remove
) run in O(log n) time.
Key Features of TreeMap:
Sorted Key Order: Keys are stored in their natural order (for example, alphabetical order for strings).
Red-Black Tree: The implementation is based on a self-balancing binary search tree known as a Red-Black Tree, ensuring that all operations have a time complexity of O(log n).
No Null Keys: Unlike
HashMap
andLinkedHashMap
,TreeMap
doesn’t allow null keys.
Example Code:
import java.util.TreeMap;
public class TreeMapExample {
public static void main(String[] args) {
TreeMap<String, Integer> tm = new TreeMap<>();
tm.put("India", 100);
tm.put("Aus", 55);
tm.put("China", 98);
System.out.println(tm);
}
}
Output:
{Aus=55, China=98, India=100}
In this case, the output shows that the keys are sorted in alphabetical order: Aus
, China
, and India
. This behavior is a direct result of the underlying Red-Black Tree structure.
Internal Working:
Red-Black Tree: A self-balancing binary search tree that automatically orders the keys as they are inserted. The tree ensures that the height of the tree is always balanced, leading to efficient operations.
Logarithmic Time Complexity: The
put
,get
, andremove
operations have a time complexity of O(log n), which is slower thanHashMap
andLinkedHashMap
but necessary for maintaining sorted order.
Comparison Table
Below is a comparison table that summarizes the key differences between LinkedHashMap
, TreeMap
, and HashMap
:
LinkedHashMap maintains insertion order using a doubly linked list, with O(1) time complexity for basic operations.
TreeMap ensures keys are sorted using a Red-Black Tree, with O(log n) complexity.
HashMap provides constant time for operations but does not preserve any specific order.
Property | HashMap | Linked HashMap | Tree Map | ||
Sorted according to either Insertion Order of Access Order (as specified during construction) | |||||
8. Majority Element
Majority Element Problem
The Majority Element problem asks us to find all elements in an integer array that appear more than n/3 times, where n
is the size of the array. This means we need to find elements that occur at least 33.33% of the time in the array.
Problem Definition:
Given an integer array num[]
, return all elements that appear more than n/3 times.
Example:
Input:
num[] = {1,3,2,5,1,3,1,5,1}
Output:
1
Here,
1
appears 4 times out of 9, which is greater than 9/3 = 3.
Input:
num[] = {1,2}
Output:
1, 2
Both
1
and2
appear more than 2/3 = 0.66 times (1 occurrence each).
Approach to Solve
To solve this, we can use a HashMap to count the frequency of each element in the array. Once we have the frequencies, we can simply check which elements occur more than n/3
times and print them.
Key Shortcuts in the Code:
map.getOrDefault()
:This function simplifies the task of incrementing values in the map.
It checks if the key exists in the map. If the key exists, it returns its value; otherwise, it returns the default value, which is
0
.Example:
map.put(num[i], map.getOrDefault(num[i], 0) + 1);
This single line of code replaces the need to check if the key exists and manually update it.
map.keySet()
:This method returns a set of all the keys in the map. We can loop through these keys to find the ones that meet the condition.
Example:
for (Integer key : map.keySet()) { if (map.get(key) > num.length / 3) { System.out.println(key); } }
Example Code:
import java.util.HashMap;
public class MajorityElement {
public static void main(String[] args) {
int num[] = {1, 3, 2, 5, 1, 3, 1, 5, 1};
HashMap<Integer, Integer> map = new HashMap<>();
// Count frequencies using map.getOrDefault
for (int i = 0; i < num.length; i++) {
map.put(num[i], map.getOrDefault(num[i], 0) + 1);
}
// Print elements with count > n/3
for (Integer key : map.keySet()) {
if (map.get(key) > num.length / 3) {
System.out.println(key);
}
}
}
}
Explanation of Code:
Input: We initialize an integer array
num[]
and aHashMap<Integer, Integer>
to store the frequency of each element.Counting Frequencies:
- We loop through the array, and for each element, we increment its count in the map using
map.put(num[i], map.getOrDefault(num[i], 0) + 1)
.
- We loop through the array, and for each element, we increment its count in the map using
Finding Majority Elements:
- After counting frequencies, we check which keys (elements) have a count greater than
n/3
by looping through the map usingmap.keySet()
.
- After counting frequencies, we check which keys (elements) have a count greater than
Output:
For the input num[] = {1, 3, 2, 5, 1, 3, 1, 5, 1}
, the output will be:
1
Here, 1
appears 4 times, which is more than 9/3 = 3
. Hence, 1
is the majority element.
Frequency Table:
Let’s break down the frequencies of elements for better understanding:
Num | Frequency / Count |
1 | 4 |
3 | 2 |
2 | 1 |
5 | 2 |
From this table, we can clearly see that 1
is the only element that appears more than n/3
times.
Key Points Learned:
map.getOrDefault()
:A handy method that allows us to get a value from the map or return a default value (like 0) if the key does not exist.
This helps simplify the frequency counting process in a single line of code.
map.keySet()
:- This returns all the keys in the map, allowing us to loop through them to find the majority elements.
This is an efficient way to solve the problem with a time complexity of O(n), where n
is the size of the array.
9. Valid Anagram
An anagram is a word formed by rearranging the letters of another word. To check if two strings are anagrams, you can use a HashMap
to count the frequency of each character in both strings and then compare the two maps.
10. HashSet
HashSet
is a set implementation that uses a hash table for storage. It ensures that no duplicate elements are stored.
Key Characteristics:
Unique Elements: Only unique elements are allowed.
Unordered Storage: Elements are stored in an unordered manner.
Constant Time Operations: Basic operations like add, remove, and contains have a time complexity of O(1).
11. Iteration on HashSet
Iteration over a HashSet
can be done using an iterator or an enhanced for loop. Since HashSet
does not maintain any order, the order of elements during iteration is not guaranteed.
12. LinkedHashSet
LinkedHashSet
is a subclass of HashSet
that maintains a doubly linked list of the elements. This ensures that the elements are stored in the order in which they were inserted.
13. TreeSet
TreeSet
is a set implementation that stores elements in a sorted order. It uses a Red-Black tree internally to maintain the order.
14. Count Distinct Elements
Counting distinct elements in an array can be efficiently done using a HashSet
. By adding each element to the set, duplicates are automatically removed, and the size of the set gives the count of distinct elements.
15. Union and Intersection
Union:
To find the union of two sets, you can use a HashSet
and add all elements from both sets.
Intersection:
To find the intersection, you can use a HashSet
to store the elements of one set and then check for each element in the other set if it exists in the first set.
16. Find it for Tickets
The problem of finding if tickets are valid or not can be solved using a HashMap
. Each ticket can be stored as a key-value pair, where the key is the ticket number, and the value is the ticket details.
17. Largest Subarray with Sum Zero
To find the largest subarray with a sum of zero, you can use a HashMap
to store the cumulative sum at each index. If the same sum occurs again, it means that the subarray between the two indices has a sum of zero.
18. Subarray Sum Equal to K
To find the number of subarrays that sum to a given value k
, you can use a HashMap
to store the cumulative sum at each index. By checking if (current_sum - k)
exists in the map, you can determine if a subarray with the sum k
exists.