Three linear sorting algorithms counting sorting, bucket sorting and radix sorting

 

https://www.byvoid.com/zht/blog/sort-radix

[non-comparison based sorting]

In computer science, sorting is a basic algorithm technology, and many algorithms are based on it. Different sorting algorithms have different time and space costs. There are many kinds of sorting algorithms, such as our most commonly used algorithms such as quick sort and heap sort, these algorithms need to compare the data in the sequence, because it is called comparison-based sorting .

Comparison-based sorting algorithms cannot break O(NlogN). The simple proof is as follows:

There are N! possible permutations of N numbers, that is to say, the decision tree of the comparison-based sorting algorithm has N! leaf nodes, and the number of comparisons is at least log(N!)=O(NlogN) (Sterling formula ).

Instead of comparison-based sorting , such as counting sorting, bucket sorting, and radix sorting based on this, you can break the O(NlogN) time lower bound. It should be noted, however, that the use of non-comparison-based sorting algorithms is subject to conditions, such as element size limits, whereas comparison-based sorting does not have such constraints (within a certain range). But it is not because of the conditional restrictions that the non-comparison-based sorting algorithm becomes useless. For data with special properties in certain occasions, the non-comparison-based sorting algorithm can be very cleverly solved.

This article focuses on three linear non-comparison-based sorting algorithms: counting sort, bucket sort, and radix sort.

[count sort]

First , let's start with Counting Sort . Suppose we have an integer sequence A to be sorted, where the minimum value of the elements is not less than 0, and the maximum value is not more than K. Build a linear table C of length K to record the number of elements not greater than each value.

The algorithm idea is as follows:

  1. Scan sequence A, use the value of each element in A as an index, and fill in the number of occurrences in C. At this time, C[i] can represent the number of elements in A whose value is i.
  2. For C to accumulate from the beginning, make C[i]<-C[i]+C[i-1]. In this way, C[i] represents the number of elements in A whose value is not greater than i .
  3. Output the results according to the statistics.

From the linear table C, we can easily find the sorted data, define B as the sequence of the target, and Order[i] as the position of the i-th element in A, then the following methods can be used for statistics.

/*
 * Problem: Counting Sort
 * Author: Guo Jiabao
 * Time: 2009.3.29 16:27
 * State: Solved
 * Memo: 計數排序
*/
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <cstring>
using namespace std;
void CountingSort(int *A,int *B,int *Order,int N,int K)
{
    int *C=new int[K+1];
    int i;
    memset(C,0,sizeof(int)*(K+1));
    for (i=1;i<=N;i++) //把A中的每個元素分配
        C[A[i]]++;
    for (i=2;i<=K;i++) //統計不大於i的元素的個數
        C[i]+=C[i-1];
    for (i=N;i>=1;i--)
    {
        B[C[A[i]]]=A[i]; //按照統計的位置,將值輸出到B中,將順序輸出到Order中
        Order[C[A[i]]]=i;
        C[A[i]]--;
    }
}
int main()
{
    int *A,*B,*Order,N=15,K=10,i;
    A=new int[N+1];
    B=new int[N+1];
    Order=new int[N+1];
    for (i=1;i<=N;i++)
        A[i]=rand()%K+1; //生成1..K的隨機數
    printf("Before CS:\n");
    for (i=1;i<=N;i++)
        printf("%d ",A[i]);
    CountingSort(A,B,Order,N,K);
    printf("\nAfter CS:\n");
    for (i=1;i<=N;i++)
        printf("%d ",B[i]);
    printf("\nOrder:\n");
    for (i=1;i<=N;i++)
        printf("%d ",Order[i]);
    return 0;
}

The program works as follows:

Before CS:
2 8 5 1 10 5 9 9 3 5 6 6 2 8 2
After CS:
1 2 2 2 3 5 5 5 6 6 8 8 9 9 10
Order:
4 1 13 15 9 3 6 10 11 12 2 14 7 8 5

Obviously, the time complexity of counting sorting is O(N+K), and the space complexity is O(N+K). This is a very efficient linear sorting algorithm when K is not very large. More importantly, it is a stable sorting algorithm , that is, the original relative position of the elements of the same value after sorting will not change (shown on the Order), which is a very important property of counting sorting, which is based on this property. , we can apply it to radix sort.

[bucket sort]

You may find that counting sorting seems to be a little tricky. For example, when we have just counted C, C[i] can represent the number of elements with the value i in A. At this time, we can scan C directly and sequentially to find the sorted results. This is indeed the case, but this method is no longer a count sort, but a bucket sort (Bucket Sort) , to be precise, a special case of bucket sort.

In this way, it is easy to write a program, which is simpler than counting and sorting, but cannot obtain a stable Order.

/*
 * Problem: Bucket Sort
 * Author: Guo Jiabao
 * Time: 2009.3.29 16:32
 * State: Solved
 * Memo: 桶排序特殊實現
*/
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <cstring>
using namespace std;
void BucketSort(int *A,int *B,int N,int K)
{
    int *C=new int[K+1];
    int i,j,k;
    memset(C,0,sizeof(int)*(K+1));
    for (i=1;i<=N;i++) //把A中的每個元素按照值放入桶中
        C[A[i]]++;
    for (i=j=1;i<=K;i++,j=k) //統計每個桶元素的個數,並輸出到B
        for (k=j;k<j+C[i];k++)
            B[k]=i;
}
int main()
{
    int *A,*B,N=1000,K=1000,i;
    A=new int[N+1];
    B=new int[N+1];
    for (i=1;i<=N;i++)
        A[i]=rand()%K+1; //生成1..K的隨機數
    BucketSort(A,B,N,K);
    for (i=1;i<=N;i++)
        printf("%d ",B[i]);
    return 0;
}

The time complexity of this special implementation is O(N+K), and the space complexity is also O(N+K), and each element is also required to be within the range of K. More generally, what if our K is so large that we cannot directly open the O(K) space?

首先定義桶,桶爲一個數據容器,每個桶存儲一個區間內的數。依然有一個待排序的整數序列A,元素的最小值不小於0,最大值不超過K。假設我們有M個桶,第i個桶Bucket[i]存儲iK/M至(i+1)K/M之間的數,有如下桶排序的一般方法:

  1. 掃描序列A,根據每個元素的值所屬的區間,放入指定的桶中(順序放置)。
  2. 對每個桶中的元素進行排序,什麼排序算法都可以,例如快速排序。
  3. 依次收集每個桶中的元素,順序放置到輸出序列中。

對該算法簡單分析,如果數據是期望平均分佈的,則每個桶中的元素平均個數爲N/M。如果對每個桶中的元素排序使用的算法是快速排序,每次排序的時間複雜度爲O(N/Mlog(N/M))。則總的時間複雜度爲O(N)+O(M)O(N/Mlog(N/M)) = O(N+ Nlog(N/M)) =O(N + NlogN - NlogM)當M接近於N是,桶排序的時間複雜度就可以近似認爲是O(N)的。就是桶越多,時間效率就越高,而桶越多,空間卻就越大,由此可見時間和空間是一個矛盾的兩個方面。

桶中元素的順序放入和順序取出是有必要的,因爲這樣可以確定桶排序是一種穩定排序算法,配合基數排序是很好用的。

/*
 * Problem: Bucket Sort
 * Author: Guo Jiabao
 * Time: 2009.3.29 16:50
 * State: Solved
 * Memo: 桶排序一般實現
*/
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <cstring>
using namespace std;
struct linklist
{
    linklist *next;
    int value;
    linklist(int v,linklist *n):value(v),next(n){}
    ~linklist() {if (next) delete next;}
};
inline int cmp(const void *a,const void *b)
{
    return *(int *)a-*(int *)b;
}
/*
爲了方便,我把A中元素加入桶中時是倒序放入的,而收集取出時也是倒序放入序列的,所以不違背穩定排序。
*/
void BucketSort(int *A,int *B,int N,int K)
{
    linklist *Bucket[101],*p;//建立桶
    int i,j,k,M;
    M=K/100;
    memset(Bucket,0,sizeof(Bucket));
    for (i=1;i<=N;i++)
    {
        k=A[i]/M; //把A中的每個元素按照的範圍值放入對應桶中
        Bucket[k]=new linklist(A[i],Bucket[k]);
    }
    for (k=j=0;k<=100;k++)
    {
        i=j;
        for (p=Bucket[k];p;p=p->next)
            B[++j]=p->value; //把桶中每個元素取出,排序並加入B
        delete Bucket[k];
        qsort(B+i+1,j-i,sizeof(B[0]),cmp);
    }
}
int main()
{
    int *A,*B,N=100,K=10000,i;
    A=new int[N+1];
    B=new int[N+1];
    for (i=1;i<=N;i++)
        A[i]=rand()%K+1; //生成1..K的隨機數
    BucketSort(A,B,N,K);
    for (i=1;i<=N;i++)
        printf("%d ",B[i]);
    return 0;
}

[基數排序]

下面說到我們的重頭戲,基數排序(Radix Sort)。上述的基數排序和桶排序都只是在研究一個關鍵字的排序,現在我們來討論有多個關鍵字的排序問題。

假設我們有一些二元組(a,b),要對它們進行以a爲首要關鍵字,b的次要關鍵字的排序。我們可以先把它們先按照首要關鍵字排序,分成首要關鍵字相同的若干堆。然後,在按照次要關鍵值分別對每一堆進行單獨排序。最後再把這些堆串連到一起,使首要關鍵字較小的一堆排在上面。按這種方式的基數排序稱爲MSD(Most Significant Dight)排序。

第二種方式是從最低有效關鍵字開始排序,稱爲LSD(Least Significant Dight)排序。首先對所有的數據按照次要關鍵字排序,然後對所有的數據按照首要關鍵字排序。要注意的是,使用的排序算法必須是穩定的,否則就會取消前一次排序的結果。由於不需要分堆對每堆單獨排序,LSD方法往往比MSD簡單而開銷小。下文介紹的方法全部是基於LSD的。

通常,基數排序要用到計數排序或者桶排序。使用計數排序時,需要的是Order數組。使用桶排序時,可以用鏈表的方法直接求出排序後的順序。下面是一段用桶排序對二元組基數排序的程序:

/*
 * Problem: Radix Sort
 * Author: Guo Jiabao
 * Time: 2009.3.29 16:50
 * State: Solved
 * Memo: 基數排序 結構數組
*/
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <cstring>
using namespace std;
struct data
{
    int key[2];
};
struct linklist
{
    linklist *next;
    data value;
    linklist(data v,linklist *n):value(v),next(n){}
    ~linklist() {if (next) delete next;}
};
void BucketSort(data *A,int N,int K,int y)
{
    linklist *Bucket[101],*p;//建立桶
    int i,j,k,M;
    M=K/100+1;
    memset(Bucket,0,sizeof(Bucket));
    for (i=1;i<=N;i++)
    {
        k=A[i].key[y]/M; //把A中的每個元素按照的範圍值放入對應桶中
        Bucket[k]=new linklist(A[i],Bucket[k]);
    }
    for (k=j=0;k<=100;k++)
    {
        for (p=Bucket[k];p;p=p->next) j++;
        for (p=Bucket[k],i=1;p;p=p->next,i++)
            A[j-i+1]=p->value; //把桶中每個元素取出
        delete Bucket[k];
    }
}
void RadixSort(data *A,int N,int K)
{
    for (int j=1;j>=0;j--) //從低優先到高優先 LSD
        BucketSort(A,N,K,j);
}
int main()
{
    int N=100,K=1000,i;
    data *A=new data[N+1];
    for (i=1;i<=N;i++)
    {
        A[i].key[0]=rand()%K+1;
        A[i].key[1]=rand()%K+1;
    }
    RadixSort(A,N,K);
    for (i=1;i<=N;i++)
        printf("(%d,%d) ",A[i].key[0],A[i].key[1]);
    printf("\n");
    return 0;
}

基數排序是一種用在老式穿卡機上的算法。一張卡片有80列,每列可在12個位置中的任一處穿孔。排序器可被機械地"程序化"以檢查每一迭卡片中的某一列,再根據穿孔的位置將它們分放12個盒子裏。這樣,操作員就可逐個地把它們收集起來。其中第一個位置穿孔的放在最上面,第二個位置穿孔的其次,等等。

對於一個位數有限的十進制數,我們可以把它看作一個多元組,從高位到低位關鍵字重要程度依次遞減。可以使用基數排序對一些位數有限的十進制數排序

[三種線性排序算法的比較]

從整體上來說,計數排序,桶排序都是非基於比較的排序算法,而其時間複雜度依賴於數據的範圍,桶排序還依賴於空間的開銷和數據的分佈。而基數排序是一種對多元組排序的有效方法,具體實現要用到計數排序或桶排序。

相對於快速排序、堆排序等基於比較的排序算法,計數排序、桶排序和基數排序限制較多,不如快速排序、堆排序等算法靈活性好。但反過來講,這三種線性排序算法之所以能夠達到線性時間,是因爲充分利用了待排序數據的特性,如果生硬得使用快速排序、堆排序等算法,就相當於浪費了這些特性,因而達不到更高的效率。

在實際應用中,基數排序可以用於後綴數組的倍增算法,使時間複雜度從O(NlogNlogN)降到O(N*logN)。線性排序算法使用最重要的是,充分利用數據特殊的性質,以達到最佳效果

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=326110821&siteId=291194637