Chapter 44: Collections in Java (Part 1)

1. Collections

  • Definition:

    • The Java Collection Framework provides an architecture to store and manipulate groups of objects.

    • It is an API offering a unified way to handle operations such as searching, sorting, insertion, manipulation, and deletion.

    • A collection is essentially a single unit that holds multiple objects.

  • Hierarchy:
    Java Collection Framework consists of multiple interfaces and classes:

    • Interfaces: Set, List, Queue, Deque.

    • Classes: ArrayList, Vector, LinkedList, PriorityQueue, HashSet, LinkedHashSet, TreeSet.
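One practical consequence of this hierarchy is that code written against an interface works with any class that implements it. The sketch below is only an illustration: the class name HierarchyDemo and the helper fill() are made up for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Queue;
import java.util.Set;
import java.util.TreeSet;

class HierarchyDemo {

    // Works for any Collection implementation, because add() and size()
    // are declared on the Collection interface itself.
    static int fill(Collection<Integer> target) {
        target.add(30);
        target.add(10);
        target.add(30); // duplicate: kept by List/Queue/Deque, dropped by Set
        return target.size();
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        Set<Integer> set = new TreeSet<>();
        Queue<Integer> queue = new PriorityQueue<>();
        Deque<Integer> deque = new ArrayDeque<>();

        System.out.println("List size:  " + fill(list));   // 3
        System.out.println("Set size:   " + fill(set));    // 2
        System.out.println("Queue size: " + fill(queue));  // 3
        System.out.println("Deque size: " + fill(deque));  // 3
    }
}
```

Declaring variables by interface type (List, Set, Queue, Deque) keeps the choice of implementation class confined to a single line.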


2. Array

  • Purpose:

    • Variables are used to store single values, but for large datasets, managing multiple variables becomes cumbersome.

    • Arrays are used to store large volumes of data in an indexed manner.

  • Features of Array:

    • Index-based data structure.

    • Can store homogeneous data (same type).

    • Allows access via a single variable name.


3. Disadvantages of Arrays

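  • A typed array such as int[] or String[] holds one element type only; mixing types requires an Object[] and a cast on every read.

  • Fixed in size: the length is set at creation, so "growing" means allocating a new array and copying the elements.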

  • Requires contiguous memory locations.

  • Does not provide inbuilt methods for common operations.

    • Example: Sorting, searching, or insertion cannot be directly performed without helper classes (e.g., Arrays class).
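A short sketch of what working with the Arrays helper class looks like, including the manual copy needed to "grow" an array. The class name ArrayHelpersDemo and the append() helper are hypothetical, written only for this illustration.

```java
import java.util.Arrays;

class ArrayHelpersDemo {

    // "Growing" an array really means allocating a bigger one and copying.
    static int[] append(int[] source, int value) {
        int[] bigger = Arrays.copyOf(source, source.length + 1);
        bigger[bigger.length - 1] = value;
        return bigger;
    }

    public static void main(String[] args) {
        int[] marks = {42, 7, 19};

        Arrays.sort(marks);                          // sorts in place
        System.out.println(Arrays.toString(marks));  // [7, 19, 42]

        int index = Arrays.binarySearch(marks, 19);  // array must be sorted first
        System.out.println(index);                   // 1

        int[] grown = append(marks, 50);
        System.out.println(Arrays.toString(grown));  // [7, 19, 42, 50]
        System.out.println(marks.length);            // 3 (original is unchanged)
    }
}
```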

4. Problems Faced in Storing Large Volumes of Data

  • Before Java 1.2, Java provided inbuilt classes such as Vector and Hashtable, but they lacked a uniform set of interfaces.

  • These classes struggled to manage large datasets efficiently.

  • Operations like searching, sorting, and deletion on large volumes of data were cumbersome.


5. Collection API

  • Origin:

    • Joshua Bloch, a computer scientist, identified the problems in earlier Java versions and contributed the Collection API.

    • He proposed this solution to Sun Microsystems, leading to the introduction of the Collection Framework in Java 1.2.

  • Features of Collection API:

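    • It is at once an API (a set of public types), a hierarchy (interfaces with implementing classes), and a framework (reusable operations that work across them).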

    • Provides interfaces and classes pre-coded for developers.

  • Key Classes:
    The hierarchy consists of seven major classes:

    1. ArrayList

    2. LinkedList

    3. ArrayDeque

    4. PriorityQueue

    5. TreeSet

    6. HashSet

    7. LinkedHashSet

  • Interfaces and Class Mapping:

    • List Interface: ArrayList, LinkedList.

    • Queue Interface: ArrayDeque, PriorityQueue.

    • Set Interface: TreeSet, HashSet, LinkedHashSet.


Expanded Explanation of Additional Points


1. Legacy Classes

  • Definition:
    Legacy classes in Java refer to the types that shipped before Java 2 (JDK 1.2), such as Vector, Stack, Hashtable, and the Enumeration interface. The Collection Framework superseded them with a more uniform and robust approach to data manipulation.

  • Characteristics:

    • Vector and Hashtable were retrofitted in JDK 1.2 to implement List and Map, so they work with the framework, but they are kept mainly for backward compatibility.

    • Example:

        Vector<Integer> vec = new Vector<>();
        vec.add(10);
        vec.add(20);
        System.out.println(vec);
      
    • They have supported generics since JDK 1.5, as the example shows, but their methods are synchronized by default, which can cost performance in single-threaded code.
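The following sketch shows a Vector being passed to code written against the List interface, and the older Enumeration style of traversal. The class name LegacyDemo and the sum() helper are hypothetical names used only here.

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Vector;

class LegacyDemo {

    // Accepts any List, so both Vector and ArrayList can be passed in.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        Vector<Integer> legacy = new Vector<>();
        legacy.add(10);
        legacy.add(20);

        List<Integer> modern = new ArrayList<>(legacy); // copy into an ArrayList

        System.out.println(sum(legacy)); // 30
        System.out.println(sum(modern)); // 30

        // Enumeration is the pre-1.2 way to walk a Vector.
        Enumeration<Integer> e = legacy.elements();
        while (e.hasMoreElements()) {
            System.out.println(e.nextElement()); // 10, then 20
        }
    }
}
```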


2. Generics

  • Definition:
    Generics in Java were introduced in JDK 1.5 to provide type safety and reduce runtime errors by enforcing compile-time type checking in collections.

  • Why Generics?

    • Without generics, collections can store any object, which could lead to ClassCastException.

    • Generics ensure that a collection can hold only a specific type of data.

  • Example (Without Generics):

      ArrayList list = new ArrayList();
      list.add("Hello");
      list.add(10); // Error-prone: Mixing types.
      String str = (String) list.get(1); // ClassCastException at runtime.
    
  • Example (With Generics):

      ArrayList<String> list = new ArrayList<>();
      list.add("Hello");
      // list.add(10); // Compile-time error: Ensures type safety.
      String str = list.get(0); // Safe access.
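Generics are not limited to the collection classes; your own methods can declare type parameters too. This is a minimal sketch, and the class GenericsDemo and the method firstOrDefault() are hypothetical names chosen for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

class GenericsDemo {

    // T is a type parameter: the compiler checks that the list and the
    // fallback value share the same element type.
    static <T> T firstOrDefault(List<T> items, T fallback) {
        return items.isEmpty() ? fallback : items.get(0);
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        System.out.println(firstOrDefault(names, "none")); // none

        names.add("Rohit");
        String first = firstOrDefault(names, "none");      // no cast needed
        System.out.println(first);                         // Rohit

        List<Integer> ids = new ArrayList<>();
        ids.add(7);
        int id = firstOrDefault(ids, -1);                  // result is unboxed to int
        System.out.println(id);                            // 7
    }
}
```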
    

3. Data Structures in Collection Classes

  • Each collection class is backed by an underlying data structure that determines how it stores and organizes data.

  • Examples:

    • ArrayList: Backed by a dynamic array. Good for random access but slower for insertions/deletions in the middle.

    • LinkedList: Backed by a doubly linked list. Efficient for insertions/deletions but slower for random access.

    • HashSet: Uses a hash table. Provides constant-time performance on average for basic operations like add, remove, and contains, assuming the hash function spreads elements well.

    • TreeSet: Uses a red-black tree to store elements in sorted order.

  • Why Different Data Structures?

    • Different applications have different performance requirements (e.g., speed, memory, or sorting).

    • These data structures optimize specific use cases, providing flexibility to developers.


4. Built-in Methods

  • The Collection Framework provides a wide range of utility methods for data manipulation, making it easy to perform common tasks.

  • Examples of Methods:

    • add(E element) – Adds an element to a collection.

    • remove(Object o) – Removes a specified element.

    • contains(Object o) – Checks if the collection contains the element.

    • size() – Returns the number of elements.

    • sort() – Sorts a list: Collections.sort(list) from the Collections utility class, or list.sort(comparator) since Java 8.

  • Example Usage:

      ArrayList<Integer> list = new ArrayList<>();
      list.add(10);
      list.add(20);
      list.add(15);
    
      // Sort the list
      Collections.sort(list);
      System.out.println(list); // Output: [10, 15, 20]
    
      // Check if 15 is present
      System.out.println(list.contains(15)); // Output: true
    

5. Packages

  • Collections are part of the java.util package:

    • This package provides classes and interfaces for working with collections, such as ArrayList, HashSet, HashMap, and LinkedList.

    • Developers must import these classes (or refer to them by fully qualified name, e.g. java.util.ArrayList) to use them in their code.

  • Helper Classes:

    • String is part of the java.lang package, which is imported automatically; Arrays is part of java.util and needs an import.

    • They provide utility methods to manipulate strings and arrays but are not collection classes themselves.

  • Importing Collections:

      import java.util.*; // Imports every public type in java.util, including the collection classes.
    

6. Import Requirement

  • To use a collection class by its simple name, you must import it from the java.util package, either individually or with a wildcard. Only java.lang is imported automatically.

  • Example:

      import java.util.ArrayList;
    
      public class Main {
          public static void main(String[] args) {
              ArrayList<String> names = new ArrayList<>();
              names.add("Rohit");
              names.add("Java");
              System.out.println(names);
          }
      }
    

7. Object Storage in Collections

  • All data added to a collection is stored as objects, regardless of the type.

    • For primitive types like int, double, etc., autoboxing converts them into their corresponding wrapper classes (Integer, Double, etc.).

    • This ensures uniformity in how data is stored in collections.

  • Example (Primitive Autoboxing):

      ArrayList<Integer> numbers = new ArrayList<>();
      numbers.add(10); // Autoboxing: int -> Integer
      numbers.add(20);
    
      System.out.println(numbers); // Output: [10, 20]
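Autoboxing has one well-known trap on List<Integer>: remove(int) removes by index, while remove(Object) removes by value. The sketch below illustrates the difference; the class name RemoveOverloadDemo and the sample() helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class RemoveOverloadDemo {

    static List<Integer> sample() {
        List<Integer> numbers = new ArrayList<>();
        numbers.add(10);
        numbers.add(20);
        numbers.add(30);
        return numbers;
    }

    public static void main(String[] args) {
        List<Integer> byIndex = sample();
        byIndex.remove(1);                     // int argument: removes index 1
        System.out.println(byIndex);           // [10, 30]

        List<Integer> byValue = sample();
        byValue.remove(Integer.valueOf(10));   // Integer argument: removes the value 10
        System.out.println(byValue);           // [20, 30]

        int total = 0;
        for (int n : byValue) {                // unboxing: Integer -> int
            total += n;
        }
        System.out.println(total);             // 50
    }
}
```

Calling remove(10) on the three-element list would be treated as "remove index 10" and throw IndexOutOfBoundsException, which is why the value form wraps the argument explicitly.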
    

Expanded Explanation of Collection Framework Details


Hierarchy and Interfaces in Collection Framework

The Collection Framework organizes its classes using a hierarchical structure of interfaces and classes, enabling uniformity and reusability in data management.


1. Division of Classes Based on Interfaces

  • The collection classes are divided into three main interfaces:

1. List Interface

  • Purpose: Handles ordered collections (also called sequences) that allow duplicate elements.

  • Classes:

    • ArrayList: Backed by a resizable array, provides random access but is slow for insertions/deletions in the middle.

    • LinkedList: Backed by a doubly linked list, efficient for frequent insertions and deletions.

  • Key Features: Maintains insertion order, allows duplicates, and supports indexed access.

2. Queue Interface

  • Purpose: Holds elements waiting to be processed, typically in FIFO (First-In-First-Out) order; PriorityQueue orders by priority instead.

  • Classes:

    • ArrayDeque: Implements a deque (double-ended queue) that supports both LIFO and FIFO operations.

    • PriorityQueue: Elements are ordered based on their natural order or a custom comparator.

  • Key Features: Provides efficient insertion and removal operations.
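The difference in removal order between these classes can be sketched as follows. The class name QueueDemo and the drain() helper are hypothetical names for this illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.PriorityQueue;
import java.util.Queue;

class QueueDemo {

    // Removes every element from the queue and returns them in removal order.
    static String drain(Queue<Integer> queue) {
        StringBuilder order = new StringBuilder();
        while (!queue.isEmpty()) {
            order.append(queue.poll()).append(' ');
        }
        return order.toString().trim();
    }

    public static void main(String[] args) {
        Queue<Integer> fifo = new ArrayDeque<>();
        fifo.offer(30);
        fifo.offer(10);
        fifo.offer(20);
        System.out.println(drain(fifo));        // 30 10 20 (insertion order)

        Queue<Integer> byPriority = new PriorityQueue<>();
        byPriority.offer(30);
        byPriority.offer(10);
        byPriority.offer(20);
        System.out.println(drain(byPriority));  // 10 20 30 (natural order)

        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(30);                         // push() adds at the head
        stack.push(10);
        stack.push(20);
        System.out.println(drain(stack));       // 20 10 30 (LIFO)
    }
}
```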

3. Set Interface

  • Purpose: Represents collections of unique elements with no duplicate values.

  • Classes:

    • TreeSet: Stores elements in sorted order using a red-black tree.

    • HashSet: Uses a hash table for constant-time performance on average for add, remove, and search.

    • LinkedHashSet: Combines the features of HashSet with a linked list to maintain insertion order.

  • Key Features: Ensures uniqueness of elements.
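The three Set classes all drop duplicates but differ in iteration order. A minimal sketch, where the class name SetOrderDemo and its two helper methods are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetOrderDemo {

    // Drops duplicates but keeps the first-seen order.
    static List<String> uniqueInInsertionOrder(List<String> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    // Drops duplicates and sorts by natural (alphabetical) order.
    static List<String> uniqueSorted(List<String> input) {
        return new ArrayList<>(new TreeSet<>(input));
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("pear", "apple", "pear", "fig", "apple");

        Set<String> hashed = new HashSet<>(input);
        System.out.println(hashed.size());                 // 3 (iteration order is unspecified)
        System.out.println(uniqueInInsertionOrder(input)); // [pear, apple, fig]
        System.out.println(uniqueSorted(input));           // [apple, fig, pear]

        // add() returns false when the element is already present.
        System.out.println(hashed.add("fig"));             // false
        System.out.println(hashed.add("kiwi"));            // true
    }
}
```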


2. Map Interface

  • Purpose: Represents a collection of key-value pairs, where each key is unique but values can be duplicated. Map belongs to the framework but does not extend the Collection interface.

  • Key Implementations:

    • HashMap: Stores key-value pairs using a hash table. Allows one null key and multiple null values.

    • TreeMap: Keys are stored in sorted order using a red-black tree.

    • LinkedHashMap: Maintains insertion order while using a hash table for efficiency.
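A word-count sketch shows how the choice of Map implementation affects iteration order. The class name MapDemo and the count() helper are hypothetical, used here only for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class MapDemo {

    // Counts how often each word occurs; the caller chooses the Map
    // implementation and therefore the iteration order of the result.
    static Map<String, Integer> count(String[] words, Map<String, Integer> counts) {
        for (String word : words) {
            counts.merge(word, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = {"set", "list", "map", "list", "set", "list"};

        System.out.println(count(words, new LinkedHashMap<>())); // {set=2, list=3, map=1}
        System.out.println(count(words, new TreeMap<>()));       // {list=3, map=1, set=2}

        Map<String, Integer> hashed = count(words, new HashMap<>());
        System.out.println(hashed.get("list"));                  // 3
        System.out.println(hashed.containsKey("queue"));         // false
    }
}
```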



5. Internal Data Structure

  • Each collection class uses a specific data structure to store and manage data:

    • ArrayList: Dynamic array.

    • LinkedList: Doubly linked list.

    • HashSet: Hash table.

    • TreeSet: Red-black tree.

    • PriorityQueue: Heap.

These structures are chosen based on their suitability for specific operations, ensuring optimal performance.


6. Built-In Methods

The Collection Framework provides numerous built-in methods for data manipulation.
Examples:

  • add(), remove(), contains(), size(), and, for lists, sort().

7. Packages

  • Collection Classes in java.util Package: The framework's primary classes are part of this package.

  • Utility Classes: String lives in java.lang (imported automatically); Arrays lives in java.util.

To use a collection class, you need to import it:

import java.util.ArrayList;

public class Main {
    public static void main(String[] args) {
        ArrayList<Integer> numbers = new ArrayList<>();
        numbers.add(10);
        System.out.println(numbers);
    }
}

8. Object Storage

  • Collections store all data as objects, including primitives, which are autoboxed into their wrapper classes:

    • int → Integer, double → Double, etc.

Example:

ArrayList<Integer> numbers = new ArrayList<>();
numbers.add(10); // Autoboxing
numbers.add(20);
System.out.println(numbers); // Output: [10, 20]

By organizing the framework into interfaces and classes, Java Collections ensure flexibility, efficiency, and ease of use for data management.

1. ArrayList: Introduction to List Interface

1.1 Overview

  • ArrayList is an in-built class in the Java Collection Framework.

  • Internally, it follows the Dynamic Array Data Structure.

  • It implements the List Interface, allowing index-based data access and duplicate entries.

  • Efficient operations can be performed, especially for insertion and deletion at the rear end.

1.2 Characteristics

  1. Dynamic Nature:

    • ArrayList dynamically grows as elements are added, especially at the rear end.

    • The size tracks the number of elements; the backing array's capacity grows automatically but is only reduced when trimToSize() is called.

  2. Indexed Access:

    • Allows accessing elements via indices.

    • Efficient for retrieval operations using indices.

  3. Duplicate Entries:

    • Supports adding duplicate elements.
  4. Ease of Use:

    • Built-in methods simplify data operations like insertion, deletion, and modification.

1.3 Important Points

  1. Data in collections is stored as objects.

  2. Both homogeneous and heterogeneous data can be stored; mixing types requires a broad element type such as ArrayList<Object>.

  3. Entire collections can be added to another collection.

  4. Dynamic Growth: Size adjusts automatically based on operations.

  5. Elements can be inserted at any index (front, middle, or rear).

  6. Efficient for adding data at the rear end.

  7. Inserting at an Index: add(index, element) places data at a specific index by shifting the subsequent elements; set(index, element) replaces an element without shifting.

  8. Inefficiency in Middle Insertions: Time-consuming due to shifting.

  9. Can handle large amounts of data effectively.


2. Dynamic Array

  • A Dynamic Array grows automatically as new elements are added to the rear end.

  • The last position in the array is termed the rear end.

  • It overcomes the fixed-size limitation of regular arrays by dynamically resizing.
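To make the resizing idea concrete, here is a deliberately simplified dynamic array of ints. It is a teaching sketch under stated assumptions, not how java.util.ArrayList is actually written: the class name IntDynamicArray, the initial capacity of 2, and the doubling growth policy are all choices made for this example.

```java
import java.util.Arrays;

// Hypothetical teaching class; not the real ArrayList implementation.
class IntDynamicArray {
    private int[] data = new int[2]; // small initial capacity, chosen for the demo
    private int size = 0;            // number of slots actually in use

    void add(int value) {
        if (size == data.length) {
            // Full: allocate a larger array and copy the old elements over.
            // Doubling is an assumption made here for simplicity.
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = value;        // write at the rear end
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        return data[index];
    }

    int size() {
        return size;
    }

    int capacity() {
        return data.length;
    }

    public static void main(String[] args) {
        IntDynamicArray numbers = new IntDynamicArray();
        for (int i = 1; i <= 5; i++) {
            numbers.add(i * 10);
            System.out.println("size=" + numbers.size() + " capacity=" + numbers.capacity());
        }
        // Capacity goes 2, 2, 4, 4, 8 while size goes 1 to 5.
        System.out.println(numbers.get(4)); // 50
    }
}
```

Because most add() calls find a free slot and only an occasional one triggers a copy, appending at the rear end stays cheap on average.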


3. Operations Using ArrayList

3.1 Adding Elements

  • Elements can be added using the add() method.

  • Supports adding both homogeneous and heterogeneous data.

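3.2 Inserting and Replacing Elements

  • Elements can be inserted at specific indices using the add(int index, E element) method, which shifts the element at that index and everything after it one position to the right.

  • To replace an element without shifting, use set(int index, E element).

  • Example: Adding at the second index shifts all subsequent elements.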

3.3 Combining Collections

  • Entire collections can be added to another collection using the addAll() method.
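The difference between inserting and replacing is easy to miss, so here is a small side-by-side sketch. The class name InsertVsReplaceDemo and the sample() helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class InsertVsReplaceDemo {

    static List<Integer> sample() {
        return new ArrayList<>(Arrays.asList(11, 12, 13));
    }

    public static void main(String[] args) {
        List<Integer> inserted = sample();
        inserted.add(1, 99);                    // insert: later elements shift right
        System.out.println(inserted);           // [11, 99, 12, 13]

        List<Integer> replaced = sample();
        Integer old = replaced.set(1, 99);      // replace: size stays the same
        System.out.println(replaced);           // [11, 99, 13]
        System.out.println(old);                // 12 (set returns the previous element)

        List<Integer> combined = sample();
        combined.addAll(Arrays.asList(14, 15)); // append a whole collection
        System.out.println(combined);           // [11, 12, 13, 14, 15]
    }
}
```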

4. Example: ArrayList Operations

import java.util.*;

public class Collection2 {
    public static void main(String[] args) {

        // Collection-1: Homogeneous data is added
        ArrayList<Integer> al1 = new ArrayList<>();
        al1.add(10);
        al1.add(20);
        al1.add(30);
        System.out.println(al1); // Output: [10, 20, 30]
        System.out.println("---------------");

        // Collection-2: Heterogeneous data is added
        ArrayList<Object> al2 = new ArrayList<>();
        al2.add("Rohit");
        al2.add(28);
        al2.add('b');
        al2.add(18.5);    
        System.out.println(al2); 
        System.out.println("----------------");

        // Collection-3: Add one collection to another
        ArrayList<Object> al3 = new ArrayList<>();
        al3.addAll(al2);
        System.out.println(al3); 
        System.out.println("----------------");

        // Collection-4: Insert elements at specific indices
        ArrayList<Integer> al4 = new ArrayList<>();
        al4.add(11);
        al4.add(12);
        al4.add(13);
        al4.add(14);
        al4.add(15);
        System.out.println("Existing data: " + al4);

        // add(index, element) inserts and shifts later elements; it does not replace
        al4.add(2, 28);  // Insert at index 2
        System.out.println("After inserting at index 2: " + al4);

        al4.add(0, 5);  // Insert at front
        System.out.println("After inserting at front: " + al4);

        al4.add(55);  // Add at the rear end
        System.out.println("After adding at rear end: " + al4);
    }
}

Output:

[10, 20, 30]
---------------
[Rohit, 28, b, 18.5]
----------------
[Rohit, 28, b, 18.5]
----------------
Existing data: [11, 12, 13, 14, 15]
After inserting at index 2: [11, 12, 28, 13, 14, 15]
After inserting at front: [5, 11, 12, 28, 13, 14, 15]
After adding at rear end: [5, 11, 12, 28, 13, 14, 15, 55]

5. Key Observations

  1. ArrayList is efficient for storing and managing large amounts of data.

  2. Insertion at the middle or front causes shifting of elements, making it less efficient.

  3. Dynamic nature allows automatic resizing, overcoming fixed-size limitations.

  4. Homogeneous and heterogeneous data can both be stored; an ArrayList<Object> accepts values of any type.



2. LinkedList

  • LinkedList implements two interfaces: List and Deque.

  • It internally uses a Doubly Linked List Data Structure.

  • Both homogeneous and heterogeneous data types can be stored.

  • Every element in a collection is stored as an object.


Structure of LinkedList

LinkedList consists of nodes connected by pointers. Each node contains:

  1. Data: The actual value stored.

  2. Pointers: Links to the next and previous nodes.

Example of LinkedList with three nodes:

              Node-1  ------------->   Node-2 -------------> Node-3 
               10     <-------------    20    <------------    40

Important Points

  1. Efficient Data Addition:

    • Adding data does not require shifting elements, as in an array. Instead, a new node is created and linked with existing nodes.

    • For example:
      ll1.add(1, 30);

Updated LinkedList:

                 Node-1  ------------->   New Node  ----------> Node-2 -------------> Node-3 
                  10     <-------------      30      <---------   20    <------------    40
  • This makes LinkedList a fast and efficient way to add objects once the insertion position has been reached; locating an index still means walking the nodes from the nearer end.
  2. Index-Based Operations:

    • Data can be added based on an index.
  3. Collection Integration:

    • An entire collection can be added into another collection.
  4. Unique Methods:

    • LinkedList provides Deque methods that ArrayList did not offer before Java 21:

      • addFirst(): Adds an element at the front.

      • addLast(): Adds an element at the rear.

  5. Data Types:

    • Both heterogeneous and homogeneous data can be stored in LinkedList.
  6. Recommended for Insertions:

    • LinkedList is preferred when data is frequently added at the front or rear, or at a position already reached through an iterator.
  7. Dispersed Memory Allocation:

    • LinkedList uses non-contiguous memory allocation, meaning the nodes do not need to be stored next to each other.
  8. Efficiency:

    • Faster than ArrayList for insertions at the front due to the absence of shifting.

    • Recommended for insertion-heavy operations.

  9. Duplicate Data:

    • Duplicate elements are allowed in both ArrayList and LinkedList.
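Because LinkedList implements Deque as well as List, the same object can be used as a list, a queue, or a stack. A minimal sketch, with LinkedListDequeDemo and build() as hypothetical names:

```java
import java.util.Deque;
import java.util.LinkedList;

class LinkedListDequeDemo {

    static LinkedList<String> build() {
        LinkedList<String> cities = new LinkedList<>();
        cities.add("Pune");       // append at the rear
        cities.addFirst("Hyd");   // new head node, nothing is shifted
        cities.addLast("Delhi");  // new tail node
        return cities;
    }

    public static void main(String[] args) {
        LinkedList<String> cities = build();
        System.out.println(cities);               // [Hyd, Pune, Delhi]

        System.out.println(cities.getFirst());    // Hyd
        System.out.println(cities.removeLast());  // Delhi

        Deque<String> stack = cities;             // same object, seen through Deque
        stack.push("Goa");                        // push() is equivalent to addFirst()
        System.out.println(cities);               // [Goa, Hyd, Pune]
    }
}
```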

Example: LinkedList in Java

import java.util.*;

public class Collection3 {

    public static void main(String[] args) {
        LinkedList<Object> ll1 = new LinkedList<>(); // Object element type allows mixed data
        ll1.add(10);
        ll1.add("ineuron");
        ll1.add(20);
        System.out.println(ll1);

        // Adding data at the front
        ll1.addFirst("Rohit");
        System.out.println(ll1);

        // Adding data at the middle
        ll1.add(1, "Hyd");
        System.out.println(ll1);

        // Adding data at the rear
        ll1.addLast("infosys");
        System.out.println(ll1);
    }
}

Output:

[10, ineuron, 20]
[Rohit, 10, ineuron, 20]
[Rohit, Hyd, 10, ineuron, 20]
[Rohit, Hyd, 10, ineuron, 20, infosys]

Explanation of Code:

  1. Initial Addition:

    • ll1.add() adds elements to the LinkedList.

    • Output after adding 10, "ineuron", and 20:
      [10, ineuron, 20]

  2. Adding to the Front:

    • ll1.addFirst("Rohit") adds "Rohit" at the beginning.
      Output: [Rohit, 10, ineuron, 20]
  3. Adding to the Middle:

    • ll1.add(1, "Hyd") inserts "Hyd" at index 1.
      Output: [Rohit, Hyd, 10, ineuron, 20]
  4. Adding to the Rear:

    • ll1.addLast("infosys") appends "infosys" to the end.
      Output: [Rohit, Hyd, 10, ineuron, 20, infosys]

When to Use LinkedList?

  • When insertion and deletion operations are frequent.

  • When you need dynamic memory allocation without shifting data.

LinkedList can be more efficient than ArrayList when elements are frequently added or removed at the ends or through an iterator; for index-based access, ArrayList is usually the better choice.
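Middle insertions pay off most when done through a ListIterator, because the iterator already sits at the right node and no index lookup is needed. The following is a sketch only; the class name IteratorInsertDemo and the insertAfterEach() helper are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class IteratorInsertDemo {

    // Inserts `marker` after every occurrence of `target` in a single pass.
    static void insertAfterEach(List<String> items, String target, String marker) {
        ListIterator<String> it = items.listIterator();
        while (it.hasNext()) {
            if (it.next().equals(target)) {
                // Inserted right after the element just returned;
                // the new element is not visited again by next().
                it.add(marker);
            }
        }
    }

    public static void main(String[] args) {
        List<String> route = new LinkedList<>(Arrays.asList("A", "B", "A", "C"));
        insertAfterEach(route, "A", "*");
        System.out.println(route); // [A, *, B, A, *, C]
    }
}
```

Calling route.add(index, value) in a loop instead would walk the list again for every insertion.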

Important Questions on Arrays and ArrayList


1Q. After these classes, is the array concept gone or outdated?

Ans: No, the array concept is neither gone nor outdated.

  • When to use arrays or collections:

    • For large datasets, both arrays and collections like ArrayList or LinkedList can be used depending on the requirements.

    • Arrays are still relevant for specific scenarios due to their performance and simplicity.


2Q. When to use Array over ArrayList?

Ans:

  • Use Array when:

    1. Size is Known: The size of the data is predetermined or fixed.

    2. Homogeneous Data: All elements are of the same type (e.g., only integers, only strings).

  • Why Arrays?

    1. Performance: Arrays of primitives are generally faster than ArrayList because they do not involve additional overhead like autoboxing and object creation.

    2. Simplicity: Arrays are straightforward and have less memory overhead.


Key Differences Between Array and ArrayList

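Feature           | Array                                             | ArrayList
------------------|---------------------------------------------------|--------------------------------------------------
Size              | Fixed size (declared at creation).                | Dynamic size (can grow/shrink).
Data Type         | Can store primitive and object data.              | Stores objects (autoboxing for primitives).
Speed             | Faster: primitives are stored without boxing.     | Slower due to object wrapping and resizing.
Memory Allocation | Contiguous; primitive values are stored in place. | Contiguous array of references; the element objects live elsewhere on the heap.
Usage             | Best for known and fixed-size homogeneous data.   | Best for dynamic or unknown-size collections.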

Why is Array Faster Than ArrayList?

  1. Direct Storage:

    • Arrays directly store primitive data types like int, float, etc.

    • Example: int[] arr = {1, 2, 3};

  2. No Object Conversion:

    • ArrayList stores everything as an object, requiring autoboxing (conversion of primitives to objects).

    • Example:

      • ArrayList<Integer> list = new ArrayList<>();

      • When adding an integer, 10 is converted to an Integer object internally.

  3. Memory Overhead:

    • Arrays avoid the overhead of creating new objects and storing references.

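    • In ArrayList, each primitive value is boxed into a wrapper object (Integer.valueOf reuses cached objects for small values, -128 to 127), and the list stores a reference to it.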


Autoboxing in ArrayList

  • Definition:

    • Autoboxing is the automatic conversion of primitive data types (e.g., int, float) to their corresponding wrapper classes (e.g., Integer, Float) in Java.
  • How it Works:

    • When adding a primitive to an ArrayList, it is wrapped into its object equivalent.

    • Example:

        ArrayList<Integer> list = new ArrayList<>();
        list.add(10); // 10 is autoboxed to Integer.valueOf(10)
      
  • Impact on Performance:

    • The additional steps involved in creating objects and storing them in ArrayList make it slower than arrays.
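The boxing and unboxing steps become visible when converting between an int[] and a List<Integer>. This sketch uses the standard stream API; the class name BoxingDemo and the helpers toList() and toArray() are hypothetical names for the illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class BoxingDemo {

    // int[] -> List<Integer>: every element is boxed.
    static List<Integer> toList(int[] values) {
        return Arrays.stream(values).boxed().collect(Collectors.toList());
    }

    // List<Integer> -> int[]: every element is unboxed.
    static int[] toArray(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[] grades = {72, 85, 91};

        List<Integer> boxed = toList(grades);
        System.out.println(boxed);                  // [72, 85, 91]

        int[] back = toArray(boxed);
        System.out.println(Arrays.toString(back));  // [72, 85, 91]

        long sum = 0;
        for (int g : grades) {                      // plain ints, no wrapper objects involved
            sum += g;
        }
        System.out.println(sum);                    // 248
    }
}
```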

Important Points to Remember

  1. Arrays:

    • Faster than ArrayList due to direct storage of primitives.

    • Suitable for fixed-size and homogeneous data.

    • Can store both primitive and object data.

  2. ArrayList:

    • More flexible with dynamic sizing.

    • Stores all elements as objects, requiring memory for both the object and its reference.

    • Involves autoboxing for primitive data.


When to Use What?

  • Array:

    • Known size and homogeneous data.

    • Example: A fixed list of student grades.

  • ArrayList:

    • Unknown size or need for dynamic operations like resizing.

    • Example: A growing list of employee names in a company.


By understanding these distinctions, you can choose the right data structure based on your program's requirements.

Rohit Gawande
Full Stack Java Developer | Blogger | Coding Enthusiast