DSAA: Quicksort (Part 1)

1. Basic Principles

The basic algorithm to sort an array S consists of the following four easy steps:

If the number of elements in S is 0 or 1, then return.

Pick any element v in S. This is called the pivot.

Partition S - {v} (the remaining elements in S) into two disjoint groups: $S_{1} = \{x \in S - \{v\}| x\leq v\}$ , and $S_{2} = \{x \in S -\{v\}| x\geq v\}.$

Return { quicksort(S1) followed by v followed by quicksort(S2)}.
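The four steps can be transcribed almost literally. The following is a minimal sketch, not the book's code: the name `basic_quicksort`, the choice of the first element as pivot, and the convention that elements equal to the pivot go into S1 are all assumptions made for illustration. It uses temporary buffers for S1 and S2, which is simple but wasteful compared with the in-place partitioning discussed later.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper name. Sorts s[0..n-1] by following the four steps:
   pick a pivot v, split the rest into s1 (<= v) and s2 (> v),
   sort each group recursively, then concatenate s1, v, s2. */
void basic_quicksort(int *s, int n) {
    if (n <= 1) return;                      /* step 1: nothing to sort */

    int v = s[0];                            /* step 2: any element works; the first is an arbitrary choice */
    int *s1 = malloc((size_t)n * sizeof *s1);
    int *s2 = malloc((size_t)n * sizeof *s2);
    if (s1 == NULL || s2 == NULL) abort();   /* sketch only: give up if out of memory */

    int n1 = 0, n2 = 0;
    for (int k = 1; k < n; k++) {            /* step 3: partition S - {v} */
        if (s[k] <= v) s1[n1++] = s[k];
        else           s2[n2++] = s[k];
    }

    basic_quicksort(s1, n1);                 /* step 4: sort both groups ... */
    basic_quicksort(s2, n2);

    memcpy(s, s1, (size_t)n1 * sizeof *s);   /* ... and write back s1, v, s2 */
    s[n1] = v;
    memcpy(s + n1 + 1, s2, (size_t)n2 * sizeof *s);

    free(s1);
    free(s2);
}
```

Because the pivot itself is excluded from both groups, each recursive call works on strictly fewer elements, so the recursion always terminates.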

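Those are the basic steps. There is a lot of discussion about how to pick v and how to partition S; here I simply record the choices that DSAA describes as working well:

Median-of-Three Partitioning: the common course is to use as pivot the median of the left, right and center elements. In this post the center element is used directly as the pivot.

Partitioning Strategy: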
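For comparison with simply taking the center element, here is a hedged sketch of median-of-three selection in which the three sampled elements are actually ordered. The names `swap_at` and `median3_index` are hypothetical, and this is one common way to write it rather than necessarily the book's exact routine.

```c
/* Hypothetical helper: exchange a[x] and a[y]. */
static void swap_at(int *a, int x, int y) {
    int t = a[x];
    a[x] = a[y];
    a[y] = t;
}

/* Orders the three sampled elements so that
   a[left] <= a[center] <= a[right], then returns the index of the
   median, which is now at the center position. */
int median3_index(int *a, int left, int right) {
    int center = left + (right - left) / 2;

    if (a[center] < a[left])   swap_at(a, left, center);
    if (a[right]  < a[left])   swap_at(a, left, right);
    if (a[right]  < a[center]) swap_at(a, center, right);

    return center;
}
```

A side effect of ordering the three elements is that `a[left]` is no larger than the pivot and `a[right]` is no smaller, so both ends of the subarray are already on the correct side before partitioning starts.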

The first step is to get the pivot element out of the way by swapping it with the last element.

i starts at the first element and j starts at the next-to-last element.

While i is to the left of j, we move i right, skipping over elements that are smaller than the pivot. We move j left, skipping over elements that are larger than the pivot. When i and j have stopped, i is pointing at a large element and j is pointing at a small element. If i is to the left of j, those elements are swapped.

This process repeats until i and j cross. Once i and j have crossed, no swap is performed. The final part of the partitioning is to swap the pivot element with the element pointed to by i.
These steps are enough to write a quicksort by hand without consulting any reference code. As for why it is done this way, it is simplest to accept that both the pivot selection and the partitioning scheme are choices that theory and practice have shown to work well.
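The partitioning steps described above can be sketched as a standalone function. The names `swap_elems` and `partition`, and the `pivot_index` parameter, are assumptions introduced for illustration. Both scans stop on elements equal to the pivot, as in the description, and the bounds checks `i <= j` and `j >= i` are added so the sketch is safe for any pivot choice.

```c
/* Hypothetical helper: exchange a[x] and a[y]. */
static void swap_elems(int *a, int x, int y) {
    int t = a[x];
    a[x] = a[y];
    a[y] = t;
}

/* Partitions a[left..right] around the value stored at pivot_index.
   Returns the pivot's final position p, with
   a[left..p-1] <= a[p] <= a[p+1..right]. */
int partition(int *a, int left, int right, int pivot_index) {
    swap_elems(a, pivot_index, right);          /* get the pivot out of the way */
    int pivot = a[right];
    int i = left;                               /* first element */
    int j = right - 1;                          /* next-to-last element */

    for (;;) {
        while (i <= j && a[i] < pivot) i++;     /* skip elements smaller than the pivot */
        while (j >= i && a[j] > pivot) j--;     /* skip elements larger than the pivot */
        if (i >= j) break;                      /* i and j have met or crossed: no swap */
        swap_elems(a, i, j);                    /* a[i] is large, a[j] is small */
        i++;
        j--;
    }

    swap_elems(a, i, right);                    /* put the pivot into its final place */
    return i;
}
```

Stopping on elements equal to the pivot costs some extra swaps, but it keeps the two sides balanced when the input contains many duplicates.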

Small Files: For very small files $(n\leq 20)$, quicksort does not perform as well as insertion sort.

A common solution is not to use quicksort recursively for small files, but instead use a sorting algorithm that is efficient for small files, such as insertion sort.
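For small inputs, plain insertion sort outperforms quicksort; DSAA puts the cutoff at 20.

2. Implementation

The core of quicksort is the partitioning strategy. Once partitioning is implemented correctly, the rest of the logic can be written quickly: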

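void swap(int * array,int left, int right);
void insert_sort(int * array, int left,int right);

void quick_sort(int * array,int left,int right){
  int i,j,center;
  // Base case: hand small subarrays to insertion sort
  if(right-left+1 < 20){
      insert_sort(array,left,right);
      return ;
   }
  // Take the element at the middle position as the pivot and park it at the right end
  center=left+(right-left)/2;
  swap(array,center,right);
  // Core step: partitioning (which also does part of the sorting)
  for(i=left,j=right-1;i<=j && i<right && j>=left;){
    if(array[i]<array[right])
        i++;
    else if(array[j]>array[right])
        j--;
    else if(array[i] == array[right] && array[j] == array[right])
        // Both equal the pivot: advance both so the loop cannot stall
        i++,j--;
    else 
        // array[i] >= pivot and array[j] <= pivot: exchange them
        swap(array,i,j);
  } 
  // Move the pivot to its final position, then sort both sides
  swap(array,i,right);
  quick_sort(array,left,i-1);
  quick_sort(array,i+1,right);
}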

void swap(int * array,int left, int right){
  int tmp;
  tmp=array[left];
  array[left]=array[right];
  array[right]=tmp;
}

void insert_sort( int * array, int left,int right ){
    int j, p;
    int tmp;
    for( p=left+1; p <= right; p++ ){
        tmp = array[p];
        // Shift larger elements one slot to the right to make room for tmp
        for( j = p;  j>left; j-- )
            if(tmp<array[j-1])
                array[j] = array[j-1];
            else
                break;
        array[j] = tmp;
    }
}

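My implementation differs somewhat from the book's. Without careful thought, many people asked to write quicksort by hand within 20 minutes would end up with roughly this version. However, there are inputs on which this version may not do well:

Suppose the input is 2, 3, 4, …, n-1, n, 1. What is the running time of this version of quicksort?

Suppose the input is in reverse order. What is the running time of this version of quicksort?

Thinking through these two questions answers the one raised earlier: for median-of-three, should the three elements be sorted and the median taken, or should the element at the middle position be used directly? I will leave this open for now.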

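3. Time Complexity

From the steps above we get $T(n)=T(i)+T(n-i-1)+Cn$, where $i$ is the number of elements that end up on one side of the partition (the pivot itself belongs to neither side, hence $n-i-1$ on the other). A recurrence of this form falls out naturally from divide and conquer. The $Cn$ term is the linear cost of partitioning, and it is this linear cost that makes the divide-and-conquer approach worthwhile here.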
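The worst case is an extremely unbalanced partition: if the pivot is the smallest element at every level of the recursion, the recurrence becomes $T(n)=T(n-1)+Cn$, and summing it term by term gives a running time of $O(n^2)$.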
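As a worked version of that summation (a sketch that assumes $T(1)$ is a constant), unrolling $T(n)=T(n-1)+Cn$ down to $T(1)$ gives:

$$T(n) = T(1) + C\sum_{k=2}^{n} k = T(1) + C\left(\frac{n(n+1)}{2} - 1\right) = O(n^2)$$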
The best case is when every partition splits the array evenly. The analysis is then the same as for mergesort, giving $O(n\log n)$.
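A short derivation for the evenly split case, under the simplifying assumptions that $n$ is a power of 2 and that the pivot is ignored so each half has size $n/2$: the recurrence is $T(n)=2T(n/2)+Cn$. Dividing both sides by $n$ gives

$$\frac{T(n)}{n} = \frac{T(n/2)}{n/2} + C$$

and applying this $\log_2 n$ times until the subproblem size reaches 1 yields

$$\frac{T(n)}{n} = \frac{T(1)}{1} + C\log_2 n \quad\Longrightarrow\quad T(n) = n\,T(1) + Cn\log_2 n = O(n\log n)$$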
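The average case has to account for the unbalanced partitions that can occur at each level of the recursion. For the purposes of quick study, it is enough to remember the result: $O(n\log n)$.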
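For readers who want more than the bare result, here is an outline of the standard argument. It assumes that every split size $0, 1, \dots, n-1$ is equally likely, which is an assumption about the input and pivot choice rather than something guaranteed. Averaging the recurrence over all split sizes gives

$$T(n) = \frac{2}{n}\sum_{j=0}^{n-1} T(j) + Cn$$

Multiplying by $n$ and subtracting the same equation written for $n-1$ removes the sum:

$$n\,T(n) - (n-1)\,T(n-1) = 2T(n-1) + 2Cn - C$$

Dropping the insignificant $-C$ term, rearranging, and dividing by $n(n+1)$:

$$\frac{T(n)}{n+1} = \frac{T(n-1)}{n} + \frac{2C}{n+1}$$

Unrolling down to $T(1)$ leaves a harmonic sum, which grows logarithmically:

$$\frac{T(n)}{n+1} = \frac{T(1)}{2} + 2C\sum_{k=3}^{n+1}\frac{1}{k} = O(\log n) \quad\Longrightarrow\quad T(n) = O(n\log n)$$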

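4. Closing Remarks

This post skipped submitting the program above to an online judge (the AC step). It has passed the handful of test cases I set up on my own machine, including ones with duplicate elements. Implementing quicksort is not complicated, but the problems that some of the optimization details are there to solve are not easy to spot. The question of whether Median-of-Three Partitioning needs to sort the three elements is left open.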