1. Basic Principles
- The basic algorithm to sort an array S consists of the following four easy steps:
- If the number of elements in S is 0 or 1, then return.
- Pick any element v in S. This is called the pivot.
- Partition S - {v} (the remaining elements in S) into two disjoint groups: $S_1 = \{x \in S - \{v\} \mid x \le v\}$ and $S_2 = \{x \in S - \{v\} \mid x \ge v\}$.
- Return { quicksort(S1) followed by v followed by quicksort(S2)}.
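The four steps can be sketched almost literally in C. This is only an illustration under stated assumptions, not the version from DSAA: the name `quicksort_basic` is made up here, the pivot is arbitrarily taken from the middle position, elements equal to the pivot are all placed in S1, and the two groups are copied into temporary buffers, which a practical quicksort avoids by partitioning in place.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of the four-step description (hypothetical name).
   Sorts s[0..n-1] using temporary buffers for the groups S1 and S2. */
void quicksort_basic(int *s, size_t n)
{
    /* Step 1: zero or one element is already sorted. */
    if (n <= 1)
        return;

    /* Step 2: pick any element as the pivot v (assumption: the middle one). */
    size_t pivot_index = n / 2;
    int v = s[pivot_index];

    /* Step 3: partition S - {v} into S1 (<= v) and S2 (> v). */
    int *s1 = malloc(n * sizeof *s1);
    int *s2 = malloc(n * sizeof *s2);
    if (s1 == NULL || s2 == NULL)
        abort();                    /* out of memory: give up in this sketch */

    size_t n1 = 0, n2 = 0;
    for (size_t k = 0; k < n; k++) {
        if (k == pivot_index)
            continue;               /* v itself belongs to neither group */
        if (s[k] <= v)
            s1[n1++] = s[k];
        else
            s2[n2++] = s[k];
    }

    /* Step 4: quicksort(S1), followed by v, followed by quicksort(S2). */
    quicksort_basic(s1, n1);
    quicksort_basic(s2, n2);
    memcpy(s, s1, n1 * sizeof *s);
    s[n1] = v;
    memcpy(s + n1 + 1, s2, n2 * sizeof *s);

    free(s1);
    free(s2);
}
```

Because the pivot is excluded from both groups, each recursive call works on strictly fewer elements, so the recursion always terminates.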
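Those are the basic steps. There is a lot of discussion about how to pick v and how to partition S; here I simply record the reasonably good choices mentioned in DSAA: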
Median-of-Three Partitioning:
The common course is to use as pivot the median of the left, right, and center elements. In my own code I simply use the element at the center position as the pivot.
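For comparison, one way to take the median by value rather than by position is to order the three sampled elements in place. The sketch below is an assumption-laden illustration, not the book's routine: `median_of_three` and `swap_ints` are hypothetical names, and the function only returns the index where the median ends up, leaving it to the caller to move the pivot out of the way.

```c
/* Hypothetical helper: exchange two ints through pointers. */
static void swap_ints(int *x, int *y)
{
    int tmp = *x;
    *x = *y;
    *y = tmp;
}

/* Orders array[left], array[center], array[right] so that
   array[left] <= array[center] <= array[right], then returns center,
   which now holds the median of the three sampled values. */
int median_of_three(int *array, int left, int right)
{
    int center = left + (right - left) / 2;

    if (array[center] < array[left])
        swap_ints(&array[left], &array[center]);
    if (array[right] < array[left])
        swap_ints(&array[left], &array[right]);   /* array[left] is now the minimum */
    if (array[right] < array[center])
        swap_ints(&array[center], &array[right]); /* array[right] is now the maximum */

    return center;
}
```

A side effect of ordering the three samples is that the smallest of them already sits at the left end and the largest at the right end before partitioning starts.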
Partitioning Strategy:
- The first step is to get the pivot element out of the way by swapping it with the last element.
- i starts at the first element and j starts at the next-to-last element.
- While i is to the left of j, we move i right, skipping over elements that are smaller than the pivot, and we move j left, skipping over elements that are larger than the pivot. When i and j have stopped, i is pointing at a large element and j is pointing at a small element. If i is to the left of j, those elements are swapped, and the process repeats until i and j cross.
- Once i and j have crossed, no swap is performed. The final part of the partitioning is to swap the pivot element with the element pointed to by i.
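The partitioning steps above can be sketched as a standalone function. This is a minimal illustration assuming the pivot index has already been chosen by the caller; `partition` and `swap_at` are names invented for this sketch, and the explicit bounds checks on i and j are a defensive choice rather than something the description requires.

```c
/* Hypothetical helper: exchange array[a] and array[b]. */
static void swap_at(int *array, int a, int b)
{
    int tmp = array[a];
    array[a] = array[b];
    array[b] = tmp;
}

/* Partitions array[left..right] (left < right) around array[pivot_index]
   and returns the final position of the pivot. */
int partition(int *array, int left, int right, int pivot_index)
{
    /* Get the pivot out of the way by swapping it with the last element. */
    swap_at(array, pivot_index, right);
    int pivot = array[right];

    int i = left;       /* starts at the first element */
    int j = right - 1;  /* starts at the next-to-last element */

    for (;;) {
        while (i <= j && array[i] < pivot)
            i++;                    /* skip elements smaller than the pivot */
        while (j >= i && array[j] > pivot)
            j--;                    /* skip elements larger than the pivot */
        if (i >= j)
            break;                  /* i and j have met or crossed: no swap */
        swap_at(array, i, j);       /* large element on the left, small on the right */
        i++;
        j--;
    }

    /* Put the pivot into its final place. */
    swap_at(array, i, right);
    return i;
}
```

Both scans stop on elements equal to the pivot, so an array full of duplicates is still split near the middle instead of degenerating into a one-sided partition.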
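With the steps above you can write a quicksort yourself without consulting any concrete code. As for why it is done this way, for both pivot selection and partitioning, you can simply take these as choices that theory and practice have shown to work well.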
Small Files:
For very small files, quicksort does not perform as well as insertion sort.
- A common solution is not to use quicksort recursively for small files, but instead use a sorting algorithm that is efficient for small files, such as insertion sort.
For small inputs, using insertion sort directly beats quicksort; the cutoff I take from DSAA is 20.
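2. Implementation
The core of quicksort is really the partitioning strategy. Once the partition is implemented correctly, the rest of the quicksort logic can be written quickly: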
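void swap(int *array, int left, int right);
void insert_sort(int *array, int left, int right);

void quick_sort(int *array, int left, int right) {
    int i, j, center;
    // Base case of the recursion: small ranges go to insertion sort
    if (right - left + 1 < 20) {
        insert_sort(array, left, right);
        return;
    }
    // I simply take the element at the middle position as the pivot
    // (written this way to avoid overflow in left + right)
    center = left + (right - left) / 2;
    swap(array, center, right);
    // Core part: the partition (which also does part of the sorting)
    for (i = left, j = right - 1; i <= j && i < right && j >= left;) {
        if (array[i] < array[right])
            i++;
        else if (array[j] > array[right])
            j--;
        else if (array[i] == array[right] && array[j] == array[right])
            // Both equal the pivot: advance both so the loop cannot stall
            i++, j--;
        else
            // array[i] >= pivot and array[j] <= pivot: swap them
            swap(array, i, j);
    }
    swap(array, i, right);
    quick_sort(array, left, i - 1);
    quick_sort(array, i + 1, right);
}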
void swap(int *array, int left, int right) {
    int tmp;
    tmp = array[left];
    array[left] = array[right];
    array[right] = tmp;
}
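void insert_sort(int *array, int left, int right) {
    // Signed indices, so comparisons against left and right are well defined
    int j, p;
    int tmp;
    for (p = left + 1; p <= right; p++) {
        tmp = array[p];
        // Shift larger elements one slot right to make room for tmp
        for (j = p; j > left; j--)
            if (tmp < array[j - 1])
                array[j] = array[j - 1];
            else
                break;
        array[j] = tmp;
    }
}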
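My implementation differs somewhat from the book's. Without careful thought, if asked to hand-write quicksort within 20 minutes, many people would come up with roughly this version. But there are cases where such a version may not do well: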
- Suppose the input is 2,3,4, …,n -1, n, 1. What is the running time of this version of quicksort?
- Suppose the input is in reverse order. What is the running time of this version of quicksort?
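Thinking through these two questions settles the issue raised above: with median-of-three, do the three elements need to be sorted so that the median value is taken, or is it enough to take the element at the middle position? For now I will set this aside.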
3. Time Complexity
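From the steps above we get
$T(N) = T(i) + T(N - i - 1) + cN$,
where $i$ is the number of elements that end up on one side of the partition. A recurrence of this form follows readily for any divide-and-conquer algorithm; besides, if the non-recursive work (here, the partitioning) were not linear, the divide-and-conquer strategy would hardly be worth adopting.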
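The worst case is an extremely unbalanced partition: at every level of recursion the pivot is always the smallest element. The recurrence then becomes
$T(N) = T(N - 1) + cN$, and summing it up term by term gives a final time complexity of
$T(N) = O(N^2)$.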
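The summation for the worst case can be written out as follows. This is a sketch assuming the recurrence $T(N) = T(N-1) + cN$ for $N > 1$, with the constant cost of the empty side ignored and $T(1)$ treated as a constant.

```latex
\begin{align*}
T(N)   &= T(N-1) + cN \\
T(N-1) &= T(N-2) + c(N-1) \\
       &\;\;\vdots \\
T(2)   &= T(1) + 2c \\[4pt]
\text{Adding all lines:}\quad
T(N)   &= T(1) + c \sum_{i=2}^{N} i
        = T(1) + c\left(\frac{N(N+1)}{2} - 1\right)
        = O(N^2)
\end{align*}
```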
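The best case is when every partition splits the elements evenly. The analysis is then the same as for mergesort, giving
$T(N) = O(N \log N)$.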
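For reference, the best-case bound can be derived in a few lines. The sketch assumes $N$ is a power of 2, the two halves are each treated as having size $N/2$ (slightly overestimating, since the pivot is excluded), and $T(1)$ is a constant.

```latex
\begin{align*}
T(N) &= 2\,T(N/2) + cN \\
\frac{T(N)}{N} &= \frac{T(N/2)}{N/2} + c
  && \text{(divide both sides by } N\text{)} \\
\frac{T(N/2)}{N/2} &= \frac{T(N/4)}{N/4} + c \\
  &\;\;\vdots \\
\frac{T(2)}{2} &= \frac{T(1)}{1} + c \\[4pt]
\text{Adding the } \log_2 N \text{ lines:}\quad
\frac{T(N)}{N} &= T(1) + c \log_2 N \\
T(N) &= cN \log_2 N + N\,T(1) = O(N \log N)
\end{align*}
```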
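The average case has to account for the possibly unbalanced partitions at each level of recursion. For quick study purposes, just memorize the conclusion:
$T(N) = O(N \log N)$.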
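For readers who want more than the memorized result, the standard argument runs roughly as follows. It is a sketch under the assumption that every split size $0, 1, \dots, N-1$ for one side is equally likely, so each side has average cost $\frac{1}{N}\sum_{j=0}^{N-1} T(j)$.

```latex
\begin{align*}
T(N) &= \frac{2}{N} \sum_{j=0}^{N-1} T(j) + cN \\
N\,T(N) &= 2 \sum_{j=0}^{N-1} T(j) + cN^2 \\
(N-1)\,T(N-1) &= 2 \sum_{j=0}^{N-2} T(j) + c(N-1)^2 \\[4pt]
\text{Subtracting:}\quad
N\,T(N) - (N-1)\,T(N-1) &= 2\,T(N-1) + 2cN - c \\
N\,T(N) &\approx (N+1)\,T(N-1) + 2cN
  && \text{(dropping the insignificant } -c\text{)} \\
\frac{T(N)}{N+1} &= \frac{T(N-1)}{N} + \frac{2c}{N+1} \\[4pt]
\text{Telescoping:}\quad
\frac{T(N)}{N+1} &= \frac{T(1)}{2} + 2c \sum_{i=3}^{N+1} \frac{1}{i} = O(\log N) \\
T(N) &= O(N \log N)
\end{align*}
```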
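4. Closing Remarks
This section skipped the step of getting the program accepted (AC) on an online judge. I ran it on my own machine against a few test cases of my own, including ones with duplicate elements, and it passed. Implementing quicksort is not complicated, but the problems that some of the optimization details solve are not easy to spot. The question of whether Median-of-Three Partitioning requires sorting the three elements is left open.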