Find Median From Data Stream
To solve this coding challenge, we need to develop a system that can efficiently manage the insertion of numbers and calculate the median of the numbers stored so far. The challenge requires implementing a
class with methods to add numbers and to find the median from the current set of numbers.
MedianFinder
# Explanation:
Approach
- Data Structure Choice :
- We are tasked with finding the median efficiently as numbers are added randomly.
- One efficient way to achieve this is by using two heaps (priority queues):
- A max-heap to store the smaller half of the numbers.
- A min-heap to store the larger half of the numbers.
- This approach allows us to balance the numbers such that the median is always easy to find.
- Balancing the Heaps :
- When a new number is added, determine which heap it should go into.
- Always ensure that the heaps remain balanced:
- The max-heap can have at most one extra element compared to the min-heap.
- If the max-heap has more elements than the min-heap, rebalance by moving the top of the max-heap to the min-heap.
- Finding the Median :
- If the total number of elements is odd, the median is the top of the max-heap.
- If the total number is even, the median is the average of the tops of both heaps.
- Initialization :
- We'll initialize two empty heaps (one max-heap and one min-heap).
- Adding Numbers :
- For each number added, decide which heap it should go into:
- Insert the number into the max-heap if it's smaller than or equal to the maximum element of the max-heap.
- Otherwise, insert it into the min-heap.
- Rebalance the heaps if necessary to maintain the size property.
- Finding the Median :
- If the total number of elements (n) is odd, the median is the root of the max-heap.
- If n is even, the median is the average of the roots of the two heaps.
Step-by-step Explanation
Detailed Steps in Pseudocode
Initialization:
class MedianFinder:
# Constructor to initialize the class
function __init__:
maxHeap = [] # This represents the smaller half (max-heap)
minHeap = [] # This represents the larger half (min-heap)
Adding Numbers:
# Method to add a number to the data structure
function addNum(num):
# First, add to maxHeap and then push the largest from maxHeap to minHeap
insert -num into maxHeap
insert -extractMax(maxHeap) into minHeap
# If minHeap has more elements than maxHeap, balance them
if size(minHeap) > size(maxHeap):
insert -extractMin(minHeap) into maxHeap
Finding the Median:
# Method to find the median of the numbers added so far
function findMedian:
if size(maxHeap) is greater than size(minHeap):
return -root(maxHeap)
else:
return (root(minHeap) - root(maxHeap)) / 2.0
By following this method:
- The max-heap (referenced with negative values) will store the smaller half of the numbers in a way that allows efficient extraction of the maximum value.
- The min-heap will store the larger half of the numbers directly.
- The heaps will always be balanced or have at most a difference of one element in size, ensuring that finding the median is straightforward and efficient.