Find Median From Data Stream

To solve this coding challenge, we need to develop a system that can efficiently manage the insertion of numbers and calculate the median of the numbers stored so far. The challenge requires implementing a
MedianFinder
class with methods to add numbers and to find the median from the current set of numbers.

# Explanation:

Approach
  1. Data Structure Choice :
    • We are tasked with finding the median efficiently as numbers are added randomly.
    • One efficient way to achieve this is by using two heaps (priority queues):
      • A max-heap to store the smaller half of the numbers.
      • A min-heap to store the larger half of the numbers.
    • This approach allows us to balance the numbers such that the median is always easy to find.
  2. Balancing the Heaps :
    • When a new number is added, determine which heap it should go into.
    • Always ensure that the heaps remain balanced:
      • The max-heap can have at most one extra element compared to the min-heap.
    • If the max-heap has more elements than the min-heap, rebalance by moving the top of the max-heap to the min-heap.
  3. Finding the Median :
    • If the total number of elements is odd, the median is the top of the max-heap.
    • If the total number is even, the median is the average of the tops of both heaps.
    Step-by-step Explanation
  4. Initialization :
    • We'll initialize two empty heaps (one max-heap and one min-heap).
  5. Adding Numbers :
    • For each number added, decide which heap it should go into:
      • Insert the number into the max-heap if it's smaller than or equal to the maximum element of the max-heap.
      • Otherwise, insert it into the min-heap.
    • Rebalance the heaps if necessary to maintain the size property.
  6. Finding the Median :
    • If the total number of elements (n) is odd, the median is the root of the max-heap.
    • If n is even, the median is the average of the roots of the two heaps.
Detailed Steps in Pseudocode
Initialization:
                                            
class MedianFinder:
    # Constructor to initialize the class
    function __init__:
        maxHeap = []  # This represents the smaller half (max-heap)
        minHeap = []  # This represents the larger half (min-heap)

                                        
Adding Numbers:
                                            
    # Method to add a number to the data structure
    function addNum(num):
        # First, add to maxHeap and then push the largest from maxHeap to minHeap
        insert -num into maxHeap
        insert -extractMax(maxHeap) into minHeap
        
        # If minHeap has more elements than maxHeap, balance them
        if size(minHeap) > size(maxHeap):
            insert -extractMin(minHeap) into maxHeap

                                        
Finding the Median:
                                            
    # Method to find the median of the numbers added so far
    function findMedian:
        if size(maxHeap) is greater than size(minHeap):
            return -root(maxHeap)
        else:
            return (root(minHeap) - root(maxHeap)) / 2.0

                                        
By following this method:
  • The max-heap (referenced with negative values) will store the smaller half of the numbers in a way that allows efficient extraction of the maximum value.
  • The min-heap will store the larger half of the numbers directly.
  • The heaps will always be balanced or have at most a difference of one element in size, ensuring that finding the median is straightforward and efficient.