Find Median from Data Stream
To solve this challenge, the key idea is to maintain two priority queues (or heaps): a max heap for the lower half of the numbers and a min heap for the upper half. This way, the max heap always contains the smaller half of the numbers, with its maximum value at the top, while the min heap contains the larger half of the numbers, with its minimum value at the top. By doing this, the median will always be among the top elements of these two heaps.
Here's a step-by-step explanation of the solution, followed by the pseudo code:
1. Initialization:
- Initialize two heaps - a max heap (maxHeap) for the lower half of the numbers and a min heap (minHeap) for the upper half.
2. Adding a Number (addNum):
- First, decide which heap to add the new number (num) to. If the maxHeap is empty or the new number is less than the maximum of the lower half (maxHeap's top), add it to the maxHeap. Otherwise, add it to the minHeap.
- After adding the new number, check the size difference between the two heaps. The heaps should either have the same size or one heap should have one more element than the other. If the size difference exceeds 1, rebalance the heaps by moving the top element from the larger heap to the smaller heap.
3. Finding the Median (findMedian):
- If one heap has more elements than the other, the median is the top element of the heap with more elements.
- If both heaps have the same number of elements, the median is the average of the tops of both heaps.
Pseudo Code:
Class MedianFinder
Initialize:
maxHeap = new MaxHeap() // Lower half
minHeap = new MinHeap() // Upper half
Function addNum(num):
If maxHeap.isEmpty() OR num < maxHeap.peek():
maxHeap.add(num)
Else:
minHeap.add(num)
// Rebalance heaps if necessary
If maxHeap.size() - minHeap.size() > 1:
minHeap.add(maxHeap.poll())
Else If minHeap.size() - maxHeap.size() > 1:
maxHeap.add(minHeap.poll())
Function findMedian():
If maxHeap.size() > minHeap.size():
Return maxHeap.peek()
Else If minHeap.size() > maxHeap.size():
Return minHeap.peek()
Else:
Return (maxHeap.peek() + minHeap.peek()) / 2
Optimization for Range-Based Data
For the follow-up questions, if all integer numbers are in a specific range, you can optimize the solution by using an array or a hashmap to count the occurrences of each number. This would allow for a more efficient way to find the median without maintaining two heaps, especially when the range of numbers is small. For numbers mostly in a certain range with some outliers, a hybrid approach can be used where the array or hashmap is used for numbers within the range, and heaps or another data structure for numbers outside of the range.