DSAA: Insertion Sort and Shellsort

1. Overview

There are several easy algorithms to sort in $O(n^2)$, such as insertion sort.

There is an algorithm, Shellsort, that is very simple to code, runs in $o(n^2)$, and is efficient in practice.

There are slightly more complicated $O(n \log n)$ sorting algorithms.

Any general-purpose sorting algorithm requires $\Omega(n \log n)$ comparisons.

2. Insertion Sort

In pass p, insertion sort ensures that the elements in positions 1 through p are in sorted order. Insertion sort makes use of the fact that elements in positions 1 through p - 1 are already known to be in sorted order.

The core idea of the implementation (with 0-based indices) is to assume that elements 0 through p-1 are already sorted; letting p run from 1 to n-1 then sorts the whole array.

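/* input_type is the element type; it must support the < comparison. */
void insertion_sort( input_type a[ ], unsigned int n )
{
    unsigned int j, p;
    input_type tmp;
    for( p = 1; p < n; p++ ){
        tmp = a[p];                 /* element to insert into the sorted prefix a[0..p-1] */
        for( j = p; j > 0; j-- )
            if( tmp < a[j-1] )
                a[j] = a[j-1];      /* shift the larger element one slot to the right */
            else
                break;
        a[j] = tmp;                 /* put tmp into the slot that was opened up */
    }
}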

3. Analysis of Insertion Sort

Because of the nested loops, each of which can take n iterations, insertion sort is $O(n^2)$. As a rough estimate, $n \cdot n = n^2$.

Furthermore, this bound is tight, because input in reverse order can actually achieve this bound.

It turns out that the average case is $\Theta(n^2)$ for insertion sort, as well as for a variety of other sorting algorithms, as the next section shows.

Looking at the code confirms this: for input in reverse order, the inner for loop runs all the way down to j=0 on every pass, so the total number of iterations is $1+2+3+\cdots+(n-1)=n(n-1)/2=\Theta(n^2)$. In the best case, an array that is already sorted, the running time is $O(n)$, although this case does not come up very often.
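The two extreme cases above can be checked with a small instrumented version of the routine. This is only a sketch: `insertion_sort_count` is a hypothetical name introduced here, and it fixes the element type to `int` instead of the notes' `input_type`.

```c
#include <stddef.h>

/* Insertion sort on ints that also returns how many element shifts
   (executions of a[j] = a[j-1]) the inner loop performed. */
unsigned long insertion_sort_count(int a[], size_t n)
{
    unsigned long shifts = 0;
    for (size_t p = 1; p < n; p++) {
        int tmp = a[p];
        size_t j;
        for (j = p; j > 0 && tmp < a[j - 1]; j--) {
            a[j] = a[j - 1];   /* shift one larger element to the right */
            shifts++;
        }
        a[j] = tmp;
    }
    return shifts;
}
```

For the reversed array {6, 5, 4, 3, 2, 1} the function returns 15, which is $n(n-1)/2$ for $n = 6$; for an already sorted array it returns 0, since each pass stops after a single comparison.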

4. A Lower Bound for Simple Sorting Algorithms

This part is a little more involved, but a few conclusions are worth remembering:

An inversion in an array of numbers is any ordered pair (i, j) having the property that $i < j$ but $a[i] > a[j]$. (The definition of an inversion.)

Swapping two adjacent elements that are out of place removes exactly one inversion, and a sorted file has no inversions. (How inversions are removed.)

We can compute precise bounds on the average running time of insertion sort by computing the average number of inversions in a permutation. (A precise bound requires analyzing the average number of inversions.)

We can assume that the input is some permutation of the first n integers (since only relative ordering is important) and that all are equally likely. (The assumption behind the theorems below.)
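The definition of an inversion and the effect of an adjacent swap can be made concrete with a short sketch. The names `count_inversions` and `swap_adjacent` are introduced here for illustration only, and the counter is a deliberately naive $O(n^2)$ loop over all pairs.

```c
#include <stddef.h>

/* Brute-force count of pairs (i, j) with i < j and a[i] > a[j]. */
unsigned long count_inversions(const int a[], size_t n)
{
    unsigned long count = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (a[i] > a[j])
                count++;
    return count;
}

/* Swap a[i] and a[i+1]; the caller guarantees i + 1 < n. */
void swap_adjacent(int a[], size_t i)
{
    int t = a[i];
    a[i] = a[i + 1];
    a[i + 1] = t;
}
```

Starting from {3, 1, 2}, which has the two inversions (3, 1) and (3, 2), swapping the first two elements gives {1, 3, 2} with one inversion, and swapping the last two then gives the sorted array with none.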

Theorem 1: The average number of inversions in an array of n distinct numbers is $n(n - 1)/4$.

Theorem 2: Any algorithm that sorts by exchanging adjacent elements requires $\Omega(n^2)$ time on average.
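A brief sketch of why both theorems hold, using the standard argument of pairing each list with its reverse. The notation $L$, $L_r$ and $\mathrm{inv}(\cdot)$ is introduced here for illustration. For a list $L$ of n distinct numbers, let $L_r$ be the same list in reverse order. Any two elements $x \neq y$ are out of order in exactly one of $L$ and $L_r$, and there are $n(n-1)/2$ such pairs, so

$$\mathrm{inv}(L) + \mathrm{inv}(L_r) = \frac{n(n-1)}{2} \quad\Longrightarrow\quad \text{average of } \mathrm{inv}(L) = \frac{n(n-1)}{4}.$$

Since each adjacent exchange removes only one inversion, an algorithm restricted to adjacent exchanges needs on average at least

$$\frac{n(n-1)}{4} = \Omega(n^2)$$

exchanges, which gives the quadratic lower bound.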

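Finally, DSAA gives a summary:

This lower bound is valid not only for insertion sort, which performs adjacent exchanges implicitly, but also for other simple algorithms such as bubble sort and selection sort. (This is why so many multiple-choice questions state that bubble sort, insertion sort, and selection sort are all quadratic on average, that is, $\Theta(n^2)$.)

In fact, it is valid over an entire class of sorting algorithms, including those undiscovered, that perform only adjacent exchanges. (By extension, any sorting algorithm that removes inversions only by swapping adjacent elements is subject to the bound above.)

For a sorting algorithm to run in subquadratic, or $o(n^2)$, time, it must do comparisons and, in particular, exchanges between elements that are far apart. A sorting algorithm makes progress by eliminating inversions, and to run efficiently, it must eliminate more than just one inversion per exchange.

The last point says that to get below $n^2$, an algorithm must exchange elements that are not adjacent, and to run efficiently, each exchange must remove more than one inversion. Shellsort, described next, improves on exactly these two counts.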
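To see how a single exchange between distant elements can remove several inversions at once, here is a minimal sketch. The helper names `inversion_count` and `exchange` are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* Number of pairs (i, j) with i < j and a[i] > a[j], counted naively. */
unsigned long inversion_count(const int a[], size_t n)
{
    unsigned long count = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (a[i] > a[j])
                count++;
    return count;
}

/* Exchange a[i] and a[j]; the two positions may be far apart. */
void exchange(int a[], size_t i, size_t j)
{
    int t = a[i];
    a[i] = a[j];
    a[j] = t;
}
```

The array {5, 2, 3, 4, 1} has seven inversions: four involving the leading 5, and three more involving the trailing 1. Exchanging the first and last elements sorts the array, so that one exchange removes all seven, whereas adjacent swaps would need seven separate steps.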

5. Shellsort

5.1 Definition

As suggested in the previous section, Shellsort works by comparing elements that are distant; the distance between comparisons decreases as the algorithm runs until the last phase, in which adjacent elements are compared. For this reason, Shellsort is sometimes referred to as diminishing increment sort.

5.2 How It Works

Shellsort uses a sequence, $h_{1}, h_{2}, \ldots, h_{t}$, called the increment sequence. Any increment sequence will do as long as $h_{1} = 1$.

After a phase, using some increment $h_{k}$, for every $i$, we have $a[i] \le a[i+h_{k}]$ (where this makes sense); all elements spaced $h_{k}$ apart are sorted. The file is then said to be $h_{k}$-sorted.

An important property of Shellsort (which we state without proof) is that an $h_{k}$-sorted file that is then $h_{k-1}$-sorted remains $h_{k}$-sorted.

These three points are fairly straightforward. First, the gap between the positions being compared shrinks from phase to phase. Second, after each phase, the elements that were sorted together are guaranteed to be in order. Third, the order established by one phase is not destroyed by the next.
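The $h_{k}$-sorted condition translates directly into a check over the array. The following is a sketch under the assumption of `int` elements and an increment of at least 1; `is_h_sorted` is a name made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when a[i] <= a[i + h] for every i with i + h < n,
   i.e. when all elements spaced h apart are in order. */
bool is_h_sorted(const int a[], size_t n, size_t h)
{
    for (size_t i = 0; i + h < n; i++)
        if (a[i] > a[i + h])
            return false;
    return true;
}
```

For example, {1, 5, 2, 4, 6, 3} is 3-sorted (1 ≤ 4, 5 ≤ 6, 2 ≤ 3) but not 1-sorted, because 5 comes before 2. Being 1-sorted is the same as being fully sorted, which is why the sequence must end with an increment of 1.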

5.3 Procedure

The general strategy to $h_{k}$-sort is for each position, $i$, in $h_{k}, h_{k}+1, h_{k}+2, \ldots, n$, place the element in the correct spot among $i, i - h_{k}, i - 2h_{k}$, etc.

Although this does not affect the implementation, a careful examination shows that the action of an $h_{k}$-sort is to perform an insertion sort on $h_{k}$ independent subarrays. This is probably not easy to see at first, and it is a point the author has not fully worked out yet.

A popular (but poor) choice for increment sequence is to use the sequence suggested by Shell: $h_{t} = \lfloor n/2 \rfloor$, and $h_{k} = \lfloor h_{k+1}/2 \rfloor$.
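One way to see the subarray observation is to write the $h_{k}$-sort twice: once following the general strategy position by position, and once as separate insertion sorts over the positions that share the same remainder modulo the increment. This is a sketch with `int` elements and an increment of at least 1; both function names are hypothetical.

```c
#include <stddef.h>

/* h-sort as in the general strategy: for each i = h, ..., n-1,
   place a[i] in its correct spot among a[i], a[i-h], a[i-2h], ... */
void h_sort(int a[], size_t n, size_t h)
{
    for (size_t i = h; i < n; i++) {
        int tmp = a[i];
        size_t j;
        for (j = i; j >= h && tmp < a[j - h]; j -= h)
            a[j] = a[j - h];
        a[j] = tmp;
    }
}

/* The same work organized as h separate insertion sorts: subarray s
   consists of positions s, s+h, s+2h, ... */
void h_sort_by_subarrays(int a[], size_t n, size_t h)
{
    for (size_t s = 0; s < h && s < n; s++) {
        for (size_t i = s + h; i < n; i += h) {
            int tmp = a[i];
            size_t j;
            for (j = i; j >= h && tmp < a[j - h]; j -= h)
                a[j] = a[j - h];
            a[j] = tmp;
        }
    }
}
```

The inner loop only ever touches positions i, i-h, i-2h, and so on, which all lie in the same subarray. The first version therefore just interleaves the passes of the h independent insertion sorts, and both functions leave the array in the same state. With h = 3 on {9, 8, 7, 6, 5, 4, 3, 2, 1, 0}, either one produces {0, 2, 1, 3, 5, 4, 6, 8, 7, 9}.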

These points are enough to write a Shellsort, but the conditions j >= increment and i = increment need special care:

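void shellsort( input_type a[ ], unsigned int n ){
    unsigned int i, j, increment;
    input_type tmp;
    /* Shell's increments: n/2, n/4, ..., 1 */
    for( increment = n/2; increment > 0; increment /= 2 )
        /* start at i = increment so that a[i - increment] exists */
        for( i = increment; i < n; i++ ){
            tmp = a[i];
            /* j >= increment keeps j - increment from going below index 0 */
            for( j = i; j >= increment; j -= increment )
                if( tmp < a[j-increment] )
                    a[j] = a[j-increment];
                else
                    break;
            a[j] = tmp;
        }
}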

5.4 Worst-Case Analysis of Shellsort

DSAA gives fairly involved proofs in this part, so only the theorems are recorded here:

The worst-case running time of Shellsort, using Shell’s increments, is $\Theta(n^2)$ .

The worst-case running time of Shellsort using Hibbard’s increments is $\Theta(n^{3/2})$ .
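Hibbard's increments are the numbers of the form $2^k - 1$, that is 1, 3, 7, 15, and so on. A minimal sketch of Shellsort using them might look like this. The function name `shellsort_hibbard` and the choice to start from the largest such increment below n are assumptions made for this example, not something the notes specify.

```c
#include <stddef.h>

/* Shellsort on ints using increments of the form 2^k - 1. */
void shellsort_hibbard(int a[], size_t n)
{
    /* Find the largest increment 2^k - 1 that is smaller than n. */
    size_t h = 1;
    while (2 * h + 1 < n)
        h = 2 * h + 1;

    /* Step down ..., 15, 7, 3, 1; (h - 1) / 2 maps 2^k - 1 to 2^(k-1) - 1,
       and maps 1 to 0, which ends the loop. */
    for (; h >= 1; h = (h - 1) / 2) {
        for (size_t i = h; i < n; i++) {
            int tmp = a[i];
            size_t j;
            for (j = i; j >= h && tmp < a[j - h]; j -= h)
                a[j] = a[j - h];
            a[j] = tmp;
        }
    }
}
```

For n = 10 the phases use increments 7, 3, 1. The only change from the version with Shell's increments is the outer loop; the inner insertion step is identical.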

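6. Summary

For large inputs, Shellsort is a good choice compared with the simple quadratic sorts.
Algorithms that remove inversions only by exchanging adjacent elements require $\Omega(n^2)$ time on average; the simple sorts (selection sort, insertion sort, bubble sort) are $\Theta(n^2)$ on average.
With Shell's increments, the worst-case running time of Shellsort is $\Theta(n^2)$; with Hibbard's increments, it is $\Theta(n^{3/2})$.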