How to find the median of 500 million numbers?

Problem Description

Find the median of 500 million numbers. After the data is sorted, the number in the middle position is the median. When the sample size N is odd, the median is the ((N+1)/2)-th number; when N is even, the median is the average of the (N/2)-th and the (N/2 + 1)-th numbers.
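The same definition in symbols, writing x_(1) ≤ x_(2) ≤ … ≤ x_(N) for the sorted numbers (notation introduced here for illustration), together with the instance N = 5 × 10^8 used throughout this post:

```latex
\operatorname{median} =
\begin{cases}
x_{((N+1)/2)}, & N \text{ odd},\\[4pt]
\dfrac{x_{(N/2)} + x_{(N/2+1)}}{2}, & N \text{ even},
\end{cases}
\qquad
N = 5\times10^{8} \;\Rightarrow\; \operatorname{median} = \frac{x_{(2.5\times10^{8})} + x_{(2.5\times10^{8}+1)}}{2}.
```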

Solution Approach

Without a memory limit, you could read all the numbers into memory, sort them, and pick the median. However, comparison-based sorting takes O(N log N) time at best, so other methods are used here.
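For reference, here is a minimal sketch of the sort-then-pick baseline, assuming the whole array fits in memory. The class name `SortMedian` is made up for this sketch; it simply applies the odd/even definition from the problem description after sorting a copy.

```java
import java.util.Arrays;

public class SortMedian {
    // Baseline: sort a copy, then apply the odd/even median definition.
    // O(N log N) time and O(N) extra memory, so it only works when all data fits in RAM.
    static double median(int[] nums) {
        if (nums.length == 0) {
            throw new IllegalArgumentException("empty input");
        }
        int[] sorted = nums.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        if (n % 2 == 1) {
            return sorted[n / 2];              // 1-indexed position (n+1)/2
        }
        // 1-indexed positions n/2 and n/2+1; widen to long so the sum cannot overflow int
        return ((long) sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(median(new int[]{5, 1, 3}));     // 3.0
        System.out.println(median(new int[]{4, 1, 3, 2}));  // 2.5
    }
}
```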

Method One: Two Heaps

Maintain two heaps: a max-heap (largest element on top) and a min-heap (smallest element on top). The top of the max-heap is less than or equal to the top of the min-heap, so every number in the max-heap is at most every number in the min-heap. The sizes of the two heaps never differ by more than one.

Once all numbers have been added, if the total count is even, the median is the average of the two heap tops. If the total count is odd, the median is the top of whichever heap holds more elements.

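import java.util.Comparator;
import java.util.PriorityQueue;

class MedianFinder {

    // Lower half of the numbers; its top is the largest value of the lower half.
    private final PriorityQueue<Integer> maxHeap;
    // Upper half of the numbers; its top is the smallest value of the upper half.
    private final PriorityQueue<Integer> minHeap;

    /** Initialize the two heaps. */
    public MedianFinder() {
        maxHeap = new PriorityQueue<>(Comparator.reverseOrder());
        minHeap = new PriorityQueue<>();   // natural order: smallest on top
    }

    public void addNum(int num) {
        // Numbers smaller than the lower half's maximum belong to the lower half.
        if (maxHeap.isEmpty() || maxHeap.peek() > num) {
            maxHeap.offer(num);
        } else {
            minHeap.offer(num);
        }

        // Rebalance so the two sizes never differ by more than one.
        if (maxHeap.size() - minHeap.size() > 1) {
            minHeap.offer(maxHeap.poll());
        } else if (minHeap.size() - maxHeap.size() > 1) {
            maxHeap.offer(minHeap.poll());
        }
    }

    public double findMedian() {
        int size1 = maxHeap.size();
        int size2 = minHeap.size();
        if (size1 == 0 && size2 == 0) {
            throw new IllegalStateException("no numbers added yet");
        }

        if (size1 == size2) {
            // Even count: average the two tops; widen to long so the sum cannot overflow int.
            return ((long) maxHeap.peek() + minHeap.peek()) / 2.0;
        }
        // Odd count: the median is the top of the larger heap.
        return size1 > size2 ? maxHeap.peek() : minHeap.peek();
    }
}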

 

See LeetCode No. 295 (Find Median from Data Stream): https://leetcode.com/problems/find-median-from-data-stream/

The method above needs all the data in memory, so it only suits relatively small data sets. With 500 million numbers at 4 bytes each, the raw data alone occupies about 2 GB. If less than 2 GB of memory is available, this method cannot be used, and the method described below applies instead.
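The memory estimate works out as follows. It counts only the raw 32-bit payload; a `PriorityQueue<Integer>` needs considerably more per element, because each value is stored as a boxed `Integer` object plus a reference to it.

```latex
5\times10^{8}\ \text{numbers} \times 4\ \text{B} = 2\times10^{9}\ \text{B}
= \frac{2\times10^{9}}{2^{30}}\ \text{GiB} \approx 1.86\ \text{GiB}
```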

Method Two: Divide and Conquer

The idea of divide and conquer is to turn a big problem, step by step, into smaller problems that are easy to solve.

For this problem, read the 500 million numbers sequentially. For each number num, if its highest binary bit is 1, write it to file f1; otherwise, write it to file f0. This step splits the 500 million numbers into two parts, and every number in f0 is greater than every number in f1, because the most significant bit is the sign bit: f1 holds the negative numbers and f0 the non-negative ones.
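A minimal sketch of this splitting pass, assuming the input file stores each number as a 32-bit big-endian int (the format `DataInputStream.readInt` reads). The class `SignBitSplit` and its methods are hypothetical names; byte-array streams stand in for the real files in the demo helper, and only one number is held in memory at a time.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

public class SignBitSplit {
    // Streams 32-bit big-endian ints from `in` until EOF and routes each by its highest
    // bit (the sign bit): 1 -> f1 (negative numbers), 0 -> f0 (non-negative numbers).
    // Returns {countF0, countF1}.
    static long[] split(DataInputStream in, DataOutputStream f0, DataOutputStream f1)
            throws IOException {
        long countF0 = 0, countF1 = 0;
        while (true) {
            int num;
            try {
                num = in.readInt();
            } catch (EOFException eof) {
                break;                       // no more numbers
            }
            if ((num >>> 31) == 1) {         // highest bit is 1, so num < 0
                f1.writeInt(num);
                countF1++;
            } else {
                f0.writeInt(num);
                countF0++;
            }
        }
        f0.flush();
        f1.flush();
        return new long[]{countF0, countF1};
    }

    // Demo helper: byte-array streams stand in for the input file and for f0/f1.
    static long[] splitInMemory(int[] values) {
        try {
            ByteArrayOutputStream src = new ByteArrayOutputStream();
            DataOutputStream writer = new DataOutputStream(src);
            for (int v : values) writer.writeInt(v);
            writer.flush();
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(src.toByteArray()));
            return split(in, new DataOutputStream(new ByteArrayOutputStream()),
                    new DataOutputStream(new ByteArrayOutputStream()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] counts = splitInMemory(new int[]{7, -3, 0, -100, 42});
        System.out.println(counts[0] + " in f0, " + counts[1] + " in f1"); // 3 in f0, 2 in f1
    }
}
```

For real files, each stream would typically wrap a `BufferedInputStream` or `BufferedOutputStream` around a `FileInputStream` or `FileOutputStream`, so the 500 million reads and writes are not issued one syscall at a time.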

After the split, it is easy to tell whether the median lies in f0 or f1. Suppose f1 holds 100 million numbers; then the median must lie in f0, and it is the average of the 150-millionth smallest number in f0 and the number immediately after it.

Hint: the median of 500 million numbers is the average of the 250-millionth number and its right-hand neighbor in sorted order. If f1 holds 100 million numbers, those two are the 150-millionth and the 150,000,001st numbers of f0, counting from the smallest.
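The rank bookkeeping can be sketched as below. The names `MedianLocator`, `Location`, and `locate` are hypothetical; the only input besides the target rank is the number of values written to f1, and the demo reproduces the 100-million example from the text.

```java
public class MedianLocator {
    // Which file holds a given number, and its 1-indexed rank inside that file.
    static final class Location {
        final String file;
        final long rank;

        Location(String file, long rank) {
            this.file = file;
            this.rank = rank;
        }

        @Override
        public String toString() {
            return file + ", rank " + rank;
        }
    }

    // After the sign-bit split every number in f1 is smaller than every number in f0,
    // so the k-th smallest overall (1-indexed) is in f1 if k <= countF1;
    // otherwise it is the (k - countF1)-th smallest number of f0.
    static Location locate(long k, long countF1) {
        return k <= countF1 ? new Location("f1", k) : new Location("f0", k - countF1);
    }

    public static void main(String[] args) {
        long n = 500_000_000L;
        long countF1 = 100_000_000L;                    // the example from the text
        // Even N: the median averages the (N/2)-th and (N/2 + 1)-th smallest numbers.
        System.out.println(locate(n / 2, countF1));     // f0, rank 150000000
        System.out.println(locate(n / 2 + 1, countF1)); // f0, rank 150000001
    }
}
```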

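f0 can then be split into two files by its next-highest bit, and so on, until a resulting file is small enough to load into memory; at that point, load it, sort it directly, and find the median. For every bit below the sign bit, the file whose numbers have a 1 in that bit holds the larger numbers, so at each level only the file containing the target rank needs to be split further.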
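The whole recursive procedure (a bit-by-bit radix selection) can be sketched as follows. This is an in-memory simulation under stated assumptions: `List<Integer>` objects stand in for the files f0/f1, `MEMORY_LIMIT` is a hypothetical threshold kept tiny so the demo actually recurses, and the class and method names are made up. For an even count it simply selects both middle values, which may end up in different files.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BitSplitMedian {
    // Hypothetical stand-in for "small enough to load into memory"; a real run
    // would size this from the available RAM.
    static final int MEMORY_LIMIT = 4;

    // k-th smallest (1-indexed) value of `part`, where all values in `part` already
    // agree on every bit above `bit`. Lists stand in for the files f0/f1.
    static int kthSmallest(List<Integer> part, long k, int bit) {
        if (bit < 0) {
            return part.get(0);                  // all 32 bits agree: values are identical
        }
        if (part.size() <= MEMORY_LIMIT) {
            List<Integer> sorted = new ArrayList<>(part);   // "load into memory and sort"
            Collections.sort(sorted);
            return sorted.get((int) (k - 1));
        }
        List<Integer> zeros = new ArrayList<>();
        List<Integer> ones = new ArrayList<>();
        for (int num : part) {
            if (((num >>> bit) & 1) == 1) ones.add(num); else zeros.add(num);
        }
        // Sign bit (31): a 1 means negative, so the "ones" file holds the smaller numbers.
        // Lower bits: among values sharing all higher bits, a 1 means larger.
        List<Integer> smaller = (bit == 31) ? ones : zeros;
        List<Integer> larger = (bit == 31) ? zeros : ones;
        return k <= smaller.size()
                ? kthSmallest(smaller, k, bit - 1)
                : kthSmallest(larger, k - smaller.size(), bit - 1);
    }

    static double median(List<Integer> nums) {
        long n = nums.size();
        if (n == 0) throw new IllegalArgumentException("empty input");
        if (n % 2 == 1) return kthSmallest(nums, (n + 1) / 2, 31);
        // Even count: select each middle value separately.
        long lo = kthSmallest(nums, n / 2, 31);
        long hi = kthSmallest(nums, n / 2 + 1, 31);
        return (lo + hi) / 2.0;
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(9, -4, 17, 3, -8, 25, 0, 12, -1, 6);
        System.out.println(median(data)); // 4.5
    }
}
```

Selecting both middle values costs an extra pass over the data; the rule for equal-sized splits in the next paragraph is one way to avoid that in the case where the two values straddle a split.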

Note that when the total count is even and a split produces two files with the same number of elements, the median is the average of the maximum of the file holding the smaller numbers and the minimum of the file holding the larger numbers.
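A minimal sketch of this equal-split rule, assuming both files are non-empty. Arrays stand in for the two files, and `EqualSplitMedian` is a hypothetical name; on disk, the maximum and minimum would each come from one streaming pass.

```java
public class EqualSplitMedian {
    // When N is even and a split leaves exactly N/2 numbers in each file, the two middle
    // values are the largest number of the smaller-valued file and the smallest number
    // of the larger-valued file.
    static double median(int[] smallerFile, int[] largerFile) {
        int maxOfSmaller = Integer.MIN_VALUE;
        for (int v : smallerFile) maxOfSmaller = Math.max(maxOfSmaller, v);
        int minOfLarger = Integer.MAX_VALUE;
        for (int v : largerFile) minOfLarger = Math.min(minOfLarger, v);
        return ((long) maxOfSmaller + minOfLarger) / 2.0;   // long avoids int overflow
    }

    public static void main(String[] args) {
        // Sign-bit split of {-7, -2, 4, 10}: f1 = {-7, -2} (smaller), f0 = {4, 10}
        System.out.println(median(new int[]{-7, -2}, new int[]{4, 10})); // 1.0
    }
}
```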

Summary

Divide and conquer: elegant!


Origin www.cnblogs.com/qmillet/p/12562295.html