Heap sort: the top-k problem

Given n numbers, design an algorithm that returns the k largest of them.

Simplest solution: sort the whole list, then slice off the first k elements. O(n log n)
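As a minimal sketch of the sort-then-slice idea (the function name `topk_by_sorting` is made up for this illustration), it could look like this:

```python
def topk_by_sorting(nums, k):
    """Return the k largest values, largest first, by sorting everything."""
    # sorted() returns a new list, so the caller's list is left unchanged.
    return sorted(nums, reverse=True)[:k]


print(topk_by_sorting([5, 1, 9, 3, 7, 2], 3))  # [9, 7, 5]
```

All n elements get fully ordered here even though only k of them are returned.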

But if the list holds tens of thousands of elements and only the first few are needed, sorting all of them wastes most of the work.

With bubble sort, only the first k passes are needed (selection sort and insertion sort can be cut short in the same way). O(kn)
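A rough sketch of the "first k passes" idea, assuming a bubble sort that moves the largest remaining value to the front on each pass (the name `topk_by_bubbling` is hypothetical):

```python
def topk_by_bubbling(nums, k):
    """Return the k largest values, largest first, using only k bubble passes."""
    li = list(nums)          # work on a copy
    n = len(li)
    k = min(k, n)
    for p in range(k):
        # One pass: walk from the back and carry the largest remaining
        # value forward until it lands at position p.
        for j in range(n - 1, p, -1):
            if li[j] > li[j - 1]:
                li[j], li[j - 1] = li[j - 1], li[j]
    return li[:k]


print(topk_by_bubbling([5, 1, 9, 3, 7, 2], 3))  # [9, 7, 5]
```

Each pass costs O(n) comparisons, which gives the O(kn) total.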

A more efficient way is to use a heap:      O(n log k)

>Take the first k elements of the list and build a min-heap (small root heap) from them. The top of the heap is then the kth largest number seen so far.

>Traverse the rest of the original list in order. If an element is less than or equal to the top of the heap, ignore it; if it is greater than the top of the heap, replace the top of the heap with that element and adjust the heap once.

>After all elements of the list have been traversed, pop the top of the heap repeatedly, filling the result from the back, so that the result ends up in descending order.
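The three steps above can also be sketched with Python's standard `heapq` module, which implements a min-heap. This is not the hand-written version this article builds, only an equivalent illustration; the name `topk_with_heapq` is an assumption.

```python
import heapq


def topk_with_heapq(nums, k):
    """Return the k largest values, largest first, using a size-k min-heap."""
    if k <= 0:
        return []
    heap = list(nums[:k])
    heapq.heapify(heap)                  # step 1: min-heap of the first k elements
    for x in nums[k:]:                   # step 2: scan the remaining elements
        if x > heap[0]:
            heapq.heapreplace(heap, x)   # drop the smallest, insert x, re-adjust
    # Step 3: popping yields ascending order, so reverse for largest first.
    result = [heapq.heappop(heap) for _ in range(len(heap))]
    result.reverse()
    return result


print(topk_with_heapq([5, 1, 9, 3, 7, 2], 3))  # [9, 7, 5]
```

The heap never holds more than k elements, so each adjustment costs O(log k), and there are at most n of them.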


The adjustment (sift) function

It is the same as the sift function for heap sort, except that the comparison signs are reversed in two places, so that it maintains a min-heap instead of a max-heap.

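def sift(li, low, high):
    """Sift li[low] down so that li[low..high] is a min-heap again."""
    i = low              # current position of the hole
    j = 2 * i + 1        # left child of i
    tmp = li[low]        # value being sifted down
    while j <= high:
        if j + 1 <= high and li[j + 1] < li[j]:
            j = j + 1    # j points to the right child (the smaller one)
        if li[j] < tmp:
            li[i] = li[j]    # move the smaller child up one level
            i = j
            j = 2 * i + 1
        else:
            break
    li[i] = tmp          # put the value into its final position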
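For comparison, the max-heap version used in ordinary heap sort would look roughly like the sketch below. The name `sift_max` is made up here; the two marked comparisons are the only places where it differs from the min-heap version.

```python
def sift_max(li, low, high):
    """Sift li[low] down so that li[low..high] satisfies the max-heap property."""
    i = low
    j = 2 * i + 1
    tmp = li[low]
    while j <= high:
        if j + 1 <= high and li[j + 1] > li[j]:   # '>' here, '<' for a min-heap
            j = j + 1
        if li[j] > tmp:                           # '>' here, '<' for a min-heap
            li[i] = li[j]
            i = j
            j = 2 * i + 1
        else:
            break
    li[i] = tmp


# Build a max-heap by sifting every non-leaf node, from the last one up.
data = [3, 8, 1, 9, 4, 7]
for start in range((len(data) - 2) // 2, -1, -1):
    sift_max(data, start, len(data) - 1)
print(data[0])  # 9
```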

The rest of the code:

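def topk(li, k):
    """Return the k largest elements of li in descending order (assumes 1 <= k <= len(li))."""
    heap = li[0:k]
    # Build a min-heap from the first k elements.
    for i in range((k - 2) // 2, -1, -1):
        sift(heap, i, k - 1)
    # Traverse the remaining elements; the range runs to len(li) so the last one is included.
    for i in range(k, len(li)):
        if li[i] > heap[0]:
            heap[0] = li[i]
            sift(heap, 0, k - 1)
    # Pop the elements out of the min-heap one at a time.
    for i in range(k - 1, -1, -1):
        # i always points to the last element of the current heap
        heap[0], heap[i] = heap[i], heap[0]
        sift(heap, 0, i - 1)
    return heap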


Origin blog.csdn.net/qq_64685283/article/details/132496673