Heaps in practice (top k largest elements in a dynamic data stream, median of a dynamic data stream)

Finding the top k largest elements in a dynamic data set

Take the largest, the second largest, ... the k-th largest element. The k-th largest is the smallest in this group, so we build a min-heap (small top heap). We only need to maintain a min-heap of size k, whose top is always the k-th largest element.

If the incoming element (newCome) > the element at the top of the heap (smallTop), the incoming element is eligible to compete with the top of the heap. The top of the heap is kicked out and the incoming element is placed at the top.
We know newCome > smallTop, and the left and right children of smallTop are also > smallTop, so the size relationship between newCome and those two children is unknown.
Find the smallest among newCome and the two children. If it is one of the children, swap it with newCome, then continue comparing newCome with its new left and right children.
Repeat this process until newCome is no larger than either child (heapify).
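
The replace-the-top-then-sink step described above can be sketched as follows. This is a minimal illustration, not the author's code: the function name siftDown and the 1-indexed plain array are assumptions chosen to mirror the $this->dataArr[1] convention used in heap.php.

```php
<?php
// Hypothetical helper (not part of the article's Heap class): restore the
// min-heap property after the root of a 1-indexed heap array was overwritten.
function siftDown(array &$heap, int $count, int $i = 1): void
{
    while (true) {
        $smallest = $i;
        $left = 2 * $i;
        $right = 2 * $i + 1;
        // Pick the smallest among the current node and its two children.
        if ($left <= $count && $heap[$left] < $heap[$smallest]) {
            $smallest = $left;
        }
        if ($right <= $count && $heap[$right] < $heap[$smallest]) {
            $smallest = $right;
        }
        if ($smallest === $i) {
            break; // the node is no larger than both children: done
        }
        [$heap[$i], $heap[$smallest]] = [$heap[$smallest], $heap[$i]];
        $i = $smallest; // keep sinking from the child's position
    }
}

// A min-heap of size 3 holding 2, 5, 3; the newcomer 7 replaces the top.
$heap = [1 => 2, 2 => 5, 3 => 3];
$heap[1] = 7;
siftDown($heap, 3);
print_r($heap);
```

After the call, the smallest of the three remaining values sits at index 1 again, so the top of the heap is once more the 3rd largest element seen so far.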

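If we recalculated from the current data every time the top K elements are requested, each query would cost O(n log K), where n is the size of the current data. Maintaining the heap as elements arrive instead costs only O(log K) per new element, and the k-th largest element can be read from the top of the heap in O(1).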
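
To make the per-element cost concrete, here is a small sketch that keeps a bounded min-heap alive across the whole stream. It uses PHP's built-in SplMinHeap instead of the article's own Heap class, and the function name kthLargestStream is made up for this illustration.

```php
<?php
// Sketch with the built-in SplMinHeap (an assumption; the article uses its
// own Heap class). Records the heap top after each element: once k elements
// have been seen, that top is the k-th largest element so far.
function kthLargestStream(array $stream, int $k): array
{
    $heap = new SplMinHeap();
    $tops = [];
    foreach ($stream as $value) {
        if ($heap->count() < $k) {
            $heap->insert($value);      // heap not full yet: O(log k)
        } elseif ($value > $heap->top()) {
            $heap->extract();           // kick out the current k-th largest
            $heap->insert($value);      // O(log k)
        }
        $tops[] = $heap->top();         // reading the top is O(1)
    }
    return $tops;
}

echo implode(',', kthLargestStream([2, 5, 3, 1, 0, 7, 6, 10], 3)), PHP_EOL;
```

Each arriving element triggers at most one extract and one insert on a heap of at most k elements, which is where the O(log K) per-element bound comes from.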

Part of the code
topn.php

$static_data = [2, 5, 3, 1, 0, 7, 6, 10];

// 3rd largest element after each arrival
/*
2,5,3              2
2,5,3,1            2
2,5,3,1,0          2
2,5,3,1,0,7        3
2,5,3,1,0,7,6      5
2,5,3,1,0,7,6,10   6

Maintain a min-heap (small top heap) of size 3
*/
$heap = new Heap(3); // create a min-heap with a capacity of 3
foreach ($static_data as $v) {
    echo $heap->topn($v) . PHP_EOL;
}

heap.php

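public function topn($data)
{
    // The heap is full
    if ($this->isFull()) {
        // Only an element larger than the top (the current k-th largest) gets in
        if ($data > $this->dataArr[1]) {
            $this->dataArr[1] = $data;
            $this->smallHeapFirst(); // re-heapify starting from the top
        }
    } else {
        // Not full yet: append at the end, then re-heapify from the last position
        $this->dataArr[$this->count + 1] = $data;
        $this->count++;
        $this->smallHeapLast();
    }
    // Once the heap is full, its top is the k-th largest element
    return $this->dataArr[1];
}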

Complete code

Finding the median of a dynamic data stream

2,3,1,7,5 returns 3
2,3,1,7,5,4 returns 3,4
The data keeps coming in, and every time a number arrives we are asked what the median is.

Step 1: analysis of the idea

The median is the one or two elements in the middle. The property it satisfies is that the numbers before the median are all smaller than it, and the numbers after the median are all larger than it.
First analyze the case of an odd number of elements; the even case follows the same principle.
1. If it is a fixed data set of n elements, the median is the (n/2+1)-th largest element (with integer division). In this case you only need to maintain a min-heap (small top heap) of size n/2+1.
    Why? It cannot be a max-heap (large top heap). If the top of the heap is the largest element, it tells us nothing beyond the maximum of the set.
    With a min-heap, the top of the heap is the smallest of the elements kept. For example, with 5 elements the heap keeps the 3 largest, so its top is the third largest element: it is smaller than the two numbers above it and larger than the two that were dropped, which makes it the middle element.

    But now the data is a dynamic stream, and the middle element is requested every time an element comes in.

    The difference from static data: we do not know in advance what size of min-heap to maintain.
    So we need to maintain 2 heaps and distribute the data between them:
    1 max-heap and 1 min-heap, where every element of the max-heap is no larger than the top of the min-heap. When the median is requested:
    if the number of elements is even, the tops of the two heaps are the middle elements;
    if the number of elements is odd, the top of the heap that holds more elements is the middle element.
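
For the fixed-data case in point 1, a minimal sketch might look like the following. It assumes an odd number of integers and uses the built-in SplMinHeap rather than the article's Heap class; staticMedian is a hypothetical name.

```php
<?php
// Sketch for the static, odd-sized case: keep the intdiv(n, 2) + 1 largest
// values in a min-heap; its top is then the middle element.
// SplMinHeap stands in for the article's Heap class (an assumption).
function staticMedian(array $data): int
{
    $size = intdiv(count($data), 2) + 1;
    $heap = new SplMinHeap();
    foreach ($data as $value) {
        if ($heap->count() < $size) {
            $heap->insert($value);
        } elseif ($value > $heap->top()) {
            $heap->extract();       // drop the smallest of the kept values
            $heap->insert($value);
        }
    }
    return $heap->top();
}

echo staticMedian([2, 3, 1, 7, 5]), PHP_EOL;
```

The heap size is fixed up front from n, which is exactly what a stream of unknown length does not allow; that is the motivation for switching to two heaps.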

Step 2: step-by-step analysis

The max-heap is called big, its top element is bigpeak and its size is bigsize; the min-heap is called small, its top element is smallpeak and its size is smallsize.

When 1 element comes in:
             big is empty: put it into big
             big is not empty:
                        element < bigpeak: put it into big
                        element >= bigpeak: put it into small

             After the element has been put in:
                    if bigsize-smallsize > 1, remove the top element of big, re-heapify big, and insert the removed element into small
                    if smallsize-bigsize > 1, remove the top element of small, re-heapify small, and insert the removed element into big
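
The steps above can be sketched in a self-contained form with PHP's built-in SplMaxHeap and SplMinHeap. These stand in for the article's own Heap class, and the function name runningMedians is an assumption made for this example.

```php
<?php
// Sketch of the two-heap procedure; SplMaxHeap/SplMinHeap replace the
// article's Heap class (an assumption made to keep the example runnable).
function runningMedians(array $stream): array
{
    $big = new SplMaxHeap();    // lower half, largest element on top
    $small = new SplMinHeap();  // upper half, smallest element on top
    $result = [];
    foreach ($stream as $value) {
        if ($big->isEmpty() || $value < $big->top()) {
            $big->insert($value);
        } else {
            $small->insert($value);
        }
        // Rebalance so the two sizes never differ by more than 1.
        if ($big->count() - $small->count() > 1) {
            $small->insert($big->extract());
        } elseif ($small->count() - $big->count() > 1) {
            $big->insert($small->extract());
        }
        // Equal sizes: both tops; otherwise the top of the larger heap.
        if ($big->count() === $small->count()) {
            $result[] = [$big->top(), $small->top()];
        } elseif ($big->count() > $small->count()) {
            $result[] = [$big->top()];
        } else {
            $result[] = [$small->top()];
        }
    }
    return $result;
}

foreach (runningMedians([2, 3, 1, 7, 5, 4]) as $median) {
    echo implode(',', $median), PHP_EOL;
}
```

For the stream 2, 3, 1, 7, 5, 4 the last two lines printed are 3 and 3,4. Moving the rebalancing outside the "big is empty" branch is a small simplification; with a single element in big no rebalancing can trigger, so the behavior is the same.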

findmiddle.php

$arr = [9, 8, 11, 4, 2, 6, 5, 1, -1, 3, 20, 10];
//$arr=[9,8,11,4,2,6,5,100];

findMiddle($arr);

// Get the median of dynamic data in real time
function findMiddle($arr)
{
    // Max-heap (large top heap)
    $bigHeap = new Heap(0, 1);
    // Min-heap (small top heap)
    $smallHeap = new Heap(0, 0);

    foreach ($arr as $k => $v) {
        if ($bigHeap->isEmpty()) {
            $bigHeap->insert($v);
        } else {
            $bigPeak = $bigHeap->peak();
            if ($v < $bigPeak) {
                $bigHeap->insert($v);
            } else {
                $smallHeap->insert($v);
            }

            if ($bigHeap->count - $smallHeap->count > 1) {
                $bigPeak = $bigHeap->deleteFirst();
                $smallHeap->insert($bigPeak);
            } elseif ($smallHeap->count - $bigHeap->count > 1) {
                $smallPeak = $smallHeap->deleteFirst();
                $bigHeap->insert($smallPeak);
            }

        }
        // Print the current median after each element
        echo implode(',', midPeak($bigHeap, $smallHeap)) . PHP_EOL;
    }


}

function midPeak($heap1, $heap2)
{
    if ($heap1->count == $heap2->count) {
        $midArr = [$heap1->peak(), $heap2->peak()];
    } elseif ($heap2->count > $heap1->count) {
        $midArr = [$heap2->peak()];
    } else {
        $midArr = [$heap1->peak()];
    }
    return $midArr;
}

Process analysis

Several important points
  • When the two heaps hold the same number of elements, the middle elements are the tops of both heaps; otherwise the middle
    element is the top of the heap with more elements
  • When the number of elements in the two heaps differs by more than 1, the heaps must be rebalanced

The elements inserted in sequence are 9, 8, 11, 4, 2, 6, 5, 1, -1, 3, 20, 10. The max-heap is called big and the min-heap is called small. Their sizes are bigsize and smallsize, and their tops are bigpeak and smallpeak.

9 comes in: big is empty, so 9 is inserted into big. bigsize-smallsize=1 is not greater than 1.
            Now bigsize > smallsize, so the middle element is [bigpeak], which is [9].
8 comes in: 8 < bigpeak(9), so 8 is inserted into big. bigsize-smallsize=2 is greater than 1, so
            bigpeak (9) is deleted from big, big is re-heapified, and 9 is inserted into small, which is re-heapified. Now bigsize=smallsize, so the middle elements are [bigpeak,smallpeak], which is [8,9].
11 comes in: 11 > bigpeak(8), so 11 is inserted into small. Now smallsize=2, bigsize=1, and the difference is not greater than 1. Because smallsize > bigsize, the middle element is [smallpeak], which is [9].
4 comes in: 4 < bigpeak(8), so 4 is inserted into big, which is re-heapified. Now bigsize=2, smallsize=2, so the middle elements are [bigpeak,smallpeak], which is [8,9].

The heaps at this time: big holds 8, 4 and small holds 9, 11.

2 comes in: 2 < bigpeak(8), so 2 is inserted into big, which is re-heapified. bigsize=3, smallsize=2, so the middle element is [bigpeak], which is [8].
6 comes in: 6 < bigpeak(8), so 6 is inserted into big, which is re-heapified; big now holds 8, 6, 4, 2.

 At this point bigsize=4, smallsize=2 and bigsize-smallsize > 1, so the top element of big (8) is deleted and inserted into small.
 After re-heapifying, big holds 6, 4, 2 and small holds 8, 9, 11, so the middle elements are [bigpeak,smallpeak], which is [6,8].

5 comes in: 5 < bigpeak(6), so 5 is inserted into big, which is re-heapified.
Now bigsize=4, smallsize=3, the difference is not greater than 1, and the middle element is [bigpeak], which is [6].
The remaining elements are handled with the same steps.

Inserting an element requires re-heapifying, so each insertion costs O(log n). Finding the median only requires reading the top of one or both heaps, so it costs O(1).

Complete code

Origin blog.51cto.com/huangkui/2677746