C#, Numerical Calculation - Calculation Method and Source Program of Heap Select

1 Brief introduction

HeapSelect is a heap-based algorithm for the selection problem: finding the Kth largest (or Kth smallest) element of an unordered or partially ordered array.

Algorithm outline: The array is converted to a max heap, then the root node is removed K - 1 times, with the heap restored after each removal so that the next largest element moves to the root. The root that remains is the Kth largest element.

2 Heapselect


We saw a randomized algorithm that uses n + O(log n) comparisons in expectation. Can we get the same performance out of a deterministic algorithm?

Think about a basketball tournament involving n teams. We form a complete binary tree with n leaves; each internal node represents an elimination game. So at the bottom level, there are n/2 games, and the n/2 winners go on to a game at the next level of the tree. Assuming the better team always wins its game, the best team wins all its games, and can be found as the winner of the last game.

This could all easily be expressed in pseudocode. So far, it's just a complicated algorithm for finding a minimum or maximum. It does have some practical advantages: it's parallel (many games can be played at once) and fair (in contrast, if we scanned the list L sequentially, keeping the best team seen so far, the teams placed earlier in L would have to play many more games and be at a big disadvantage).

Now, where in the tree could the second best team be? This team would always beat everyone except the eventual winner. But it must have lost once (since only the overall winner never loses). So it must have lost to the eventual winner. Therefore it's one of the log n teams that played the eventual winner and we can run another tournament algorithm among these values.

If we express this as an algorithm for finding the second best, it uses only n + ceil(log n) comparisons, even better than the average case algorithm above.
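
The second-best argument can be sketched in Java as follows. This is a minimal illustration, not code from the original notes: the class name `TournamentSecondBest`, the `comparisons` counter, and the rule that an unpaired player gets a bye are all assumptions made for the example. Each player carries the list of opponents it has beaten, so the runner-up can be found by scanning only the champion's list.

```java
import java.util.ArrayList;
import java.util.List;

class TournamentSecondBest {
    /** Comparisons made by the most recent call (kept only for illustration). */
    static int comparisons;

    /** Returns the second-largest value of a; a must hold at least two values. */
    static int secondBest(int[] a) {
        if (a.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        comparisons = 0;
        // players.get(i) is still in the tournament; beaten.get(i) lists who it has beaten.
        List<Integer> players = new ArrayList<>();
        List<List<Integer>> beaten = new ArrayList<>();
        for (int v : a) {
            players.add(v);
            beaten.add(new ArrayList<>());
        }
        // Play rounds until a single champion remains (n - 1 games in total).
        while (players.size() > 1) {
            List<Integer> nextPlayers = new ArrayList<>();
            List<List<Integer>> nextBeaten = new ArrayList<>();
            for (int i = 0; i + 1 < players.size(); i += 2) {
                comparisons++;
                int w = players.get(i) >= players.get(i + 1) ? i : i + 1; // winner
                int l = (w == i) ? i + 1 : i;                             // loser
                beaten.get(w).add(players.get(l));
                nextPlayers.add(players.get(w));
                nextBeaten.add(beaten.get(w));
            }
            if (players.size() % 2 == 1) { // unpaired player advances without a game
                nextPlayers.add(players.get(players.size() - 1));
                nextBeaten.add(beaten.get(players.size() - 1));
            }
            players = nextPlayers;
            beaten = nextBeaten;
        }
        // The runner-up lost only to the champion, so scan the champion's opponents.
        List<Integer> losers = beaten.get(0);
        int best = losers.get(0);
        for (int i = 1; i < losers.size(); i++) {
            comparisons++;
            if (losers.get(i) > best) {
                best = losers.get(i);
            }
        }
        return best;
    }
}
```

For the ten values 21, 3, 34, 5, 13, 8, 2, 55, 1, 19 this returns 34 after 12 comparisons: 9 games in the tournament and 3 more among the four teams that lost to 55.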

If you think about it, the elimination tournament described above is similar in some ways to a binary heap. And the process of finding the second best (by running through the teams that played the winner) is similar to the process of removing the minimum from a heap. We can therefore use heaps to extend this idea to other small values of k:

    heapselect(L,k)
    {
        heap H = heapify(L)
        for (i = 1; i < k; i++) remove min(H)
        return min(H)
    }

The time is O(n + k log n): heapify takes O(n), and each of the k - 1 removals takes O(log n). So if k = O(n/log n), the result is O(n). This is interesting, but still doesn't help for median finding.
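
The pseudocode above can be written as a short Java sketch. It assumes `java.util.PriorityQueue` as the min-heap; the class and method names are made up for the example, and k is counted from 1 as in the pseudocode. Building the queue from a collection plays the role of heapify, although the Java documentation does not promise a linear-time bound for that constructor.

```java
import java.util.List;
import java.util.PriorityQueue;

class HeapSelectSketch {
    /** Returns the k-th smallest element of values, with k counted from 1. */
    static int heapselect(List<Integer> values, int k) {
        if (k < 1 || k > values.size()) {
            throw new IllegalArgumentException("k out of range");
        }
        // PriorityQueue orders Integers ascending, so the head is the minimum.
        PriorityQueue<Integer> heap = new PriorityQueue<>(values);
        // Remove the k - 1 smallest elements ...
        for (int i = 1; i < k; i++) {
            heap.poll();
        }
        // ... so that the k-th smallest is now the minimum.
        return heap.peek();
    }
}
```

With the values 21, 3, 34, 5, 13, 8, 2, 55, 1, 19, calling `heapselect(values, 4)` returns 5, the fourth smallest. The input list is copied into the queue and is not modified.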

3 C# source program

using System;

namespace Legalsoft.Truffer
{
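    /// <summary>
    /// Tracks the m largest values seen so far in a stream, using a min-heap of size m.
    /// </summary>
    public class Heapselect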
    {
        private int m { get; set; }
        private int n { get; set; }
        private int srtd { get; set; }
        private double[] heap { get; set; }

        public Heapselect(int mm)
        {
            this.m = mm;
            this.n = 0;
            this.srtd = 0;
            this.heap = new double[mm];
            for (int i = 0; i < mm; i++)
            {
                heap[i] = 1.0E99;
            }
        }

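        // Add a value. The first m values simply fill the array; after that, a value
        // larger than the heap's minimum replaces it and is sifted down.
        public void add(double val)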
        {
            if (n < m)
            {
                heap[n++] = val;
                if (n == m)
                {
                    Array.Sort(heap);
                }
            }
            else
            {
                if (val > heap[0])
                {
                    heap[0] = val;
                    for (int j = 0; ;)
                    {
                        int k = (j << 1) + 1;
                        if (k > m - 1)
                        {
                            break;
                        }
                        if (k != (m - 1) && heap[k] > heap[k + 1])
                        {
                            k++;
                        }
                        if (heap[j] <= heap[k])
                        {
                            break;
                        }
                        // Globals.SWAP is not shown in this listing; it is assumed to exchange the two values.
                        Globals.SWAP(ref heap[k], ref heap[j]);
                        j = k;
                    }
                }
                n++;
            }
            srtd = 0;
        }

        // Return the k-th largest value seen so far, counting from k = 0 for the largest.
        public double report(int k)
        {
            int mm = Math.Min(n, m);
            if (k > mm - 1)
            {
                throw new Exception("Heapselect k too big");
            }
            if (k == m - 1)
            {
                return heap[0];
            }
            if (srtd == 0)
            {
                Array.Sort(heap);
                srtd = 1;
            }
            return heap[mm - 1 - k];
        }
    }
}
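
The idea behind the class above, keeping only the m largest values of a stream in a min-heap, can also be sketched with a library heap. The following Java version is an illustration under stated assumptions rather than a port: the class name `StreamingTopM` is hypothetical, `java.util.PriorityQueue` stands in for the hand-written sift-down loop, and `report` sorts a copy of the kept values on every call instead of caching a sorted flag.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

class StreamingTopM {
    private final int m;
    // Min-heap holding at most m values: the m largest seen so far.
    private final PriorityQueue<Double> heap = new PriorityQueue<>();

    StreamingTopM(int m) {
        if (m < 1) {
            throw new IllegalArgumentException("m must be positive");
        }
        this.m = m;
    }

    /** Offers one value from the stream. */
    void add(double val) {
        if (heap.size() < m) {
            heap.add(val);
        } else if (val > heap.peek()) {
            // val beats the smallest kept value, so it takes that value's place.
            heap.poll();
            heap.add(val);
        }
    }

    /** Returns the k-th largest value seen so far; k = 0 is the largest. */
    double report(int k) {
        if (k < 0 || k >= heap.size()) {
            throw new IllegalArgumentException("k too big");
        }
        Double[] kept = heap.toArray(new Double[0]);
        Arrays.sort(kept); // ascending, so the largest is last
        return kept[kept.length - 1 - k];
    }
}
```

With m = 3 and the stream 21, 3, 34, 5, 13, 8, 2, 55, 1, 19, the heap ends up holding 21, 34 and 55, so `report(0)` is 55 and `report(2)` is 21. Each `add` costs O(log m), which makes the approach attractive when m is much smaller than the stream length.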
 

4 Reference Java code

/***********************************************************************
 * Author: Isai Damier
 * Title: Find the Greatest k values
 * Project: geekviewpoint
 * Package: algorithms
 *
 * Statement:
 *   Given a list of values, find the top k values.
 *
 * Time Complexity: O(n + k log n), which is O(n log n) when k is close to n
 * 
 * Sample Input: {21,3,34,5,13,8,2,55,1,19}; 4
 * Sample Output: {19,21,34,55}
 * 
 * Technical Details: This selection problem is a classic and so has
 *   many very good solutions. In fact, any sorting algorithm can be
 *   modified to solve this problem. In the worst case, the problem
 *   can indeed be reduced to a sorting problem: where the collection
 *   is first sorted and then the elements at indices 0 to k-1 are
 *   retrieved.
 *   
 *   Presently the problem is solved using a modified version of
 *   heapsort called heapselect.
 **********************************************************************/
public int[] heapselectTopK(int[] G, int k) {
  int last = G.length - 1;
  //convert array to max-heap in O(n)
  int youngestParent = (last - 1) / 2;//l = 2*p+1: p=(l-1)/2
  for (int i = youngestParent; i >= 0; i--) {
    moveDown(G, i, last);
  }
  //sort up to k (i.e. find the kth)
  int limit = last - k;
  for (int i = last; i > limit; i--) {
    if (G[0] > G[i]) {
      swap(G, 0, i);
      moveDown(G, 0, i - 1);
    }
  }
  return Arrays.copyOfRange(G, G.length - k, G.length);
}
 
private void moveDown(int[] A, int first, int last) {
  int largest = 2 * first + 1;
  while (largest <= last) {
    if (largest < last && A[largest] < A[largest + 1]) {
      largest++;
    }
    if (A[first] < A[largest]) {
      swap(A, first, largest);
      first = largest;
      largest = 2 * first + 1;
    } else {
      return;
    }
  }
}

// Exchanges A[i] and A[j]; the listing above calls swap but does not define it.
private void swap(int[] A, int i, int j) {
  int tmp = A[i];
  A[i] = A[j];
  A[j] = tmp;
}

Origin blog.csdn.net/beijinghorn/article/details/132051642