Common sorting algorithms and complexity

Foreword:

Today I came across an interview question online about common sorting algorithms and their time complexity. I realized that I might not answer it well if it came up, so I am consolidating and organizing what I know about the topic here.

Overview:

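There are many sorting algorithms. This post covers several common ones: insertion sorts (direct, binary, Shell), selection sorts (simple selection, heap), exchange sorts (bubble, quick), merge sort, and radix sort.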


Principle and implementation:

1. Insertion sort:

(1) Direct insertion sort:

Basic idea: Given an already ordered data sequence, insert a new element so that the resulting sequence is one element longer and still ordered. In practice, we treat the first element of the array as an ordered sequence of length one and, starting from the second element, insert each element into the appropriate position of the ordered part in front of it, until the whole array is ordered.

Key point: Before each insertion, the element to insert is copied into a temporary variable (the reference value). The larger elements are then shifted one position to the right, which would otherwise overwrite it.

The following figure shows the simple process of insertion sort.


Advantages: Stable, and fast on small or nearly ordered input. When an element equal to the one being inserted is encountered, the new element is placed after it, so the relative order of equal elements is the same before and after sorting. Direct insertion sort is therefore stable.

Disadvantages: The number of comparisons depends on the initial order, and each insertion may have to shift many elements, which becomes expensive when the amount of data is large. Storing the data in a linked list avoids the shifting.

Implementation:

    static void directSort(int[] array) {
        // Print the initial data
        System.out.print("Initial data sequence: ");
        for (int a : array) {
            System.out.print(a + " ");
        }
        System.out.println("");

        // Direct insertion sort
        for (int i = 1; i < array.length; i++) {
            int temp = array[i]; // reference value: the element to insert
            int j;
            for (j = i - 1; j >= 0 && array[j] > temp; j--) {
                // shift the larger elements one position to the right
                array[j + 1] = array[j];
            }
            array[j + 1] = temp; // insert the element

            // Print the state after this pass
            System.out.print("Pass " + i + " (reference: " + temp + "): ");
            for (int a : array) {
                System.out.print(a + " ");
            }
            System.out.println("");
        }

        // Print the final data sequence
        System.out.print("Final data sequence:");
        for (int a : array) {
            System.out.print("  " + a);
        }
    }

Output result:


Efficiency:

Time complexity:

Worst case: The data sequence is in completely reversed order. Each new element has to be compared with every element of the ordered part in front of it: one comparison when inserting the second element, two when inserting the third, ..., and n-1 when inserting the nth. The total number of comparisons is 1+2+3+...+(n-1) = n(n-1)/2, so the worst-case complexity is O(n^2).

Best case: the data sequence is already in order, and every time a new element is inserted, only the previous element needs to be examined, so the complexity is O(n).

Average case: O(n^2).

Space complexity:

The size of auxiliary space required is O(1).
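The comparison counts from the time-complexity analysis above can be checked with a small experiment. The following is a minimal sketch, not part of the original code: the class name `InsertionSortCount` and the method `sortAndCount` are made up for illustration, and the method counts each comparison of an array element with the reference value.

```java
class InsertionSortCount {
    // Sorts the array in place with direct insertion sort and returns
    // the number of comparisons between array[j] and the reference value.
    static long sortAndCount(int[] array) {
        long comparisons = 0;
        for (int i = 1; i < array.length; i++) {
            int temp = array[i];            // reference value (element to insert)
            int j = i - 1;
            while (j >= 0) {
                comparisons++;              // one comparison of array[j] with temp
                if (array[j] <= temp) {
                    break;                  // insertion point found
                }
                array[j + 1] = array[j];    // shift the larger element right
                j--;
            }
            array[j + 1] = temp;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int n = 6;
        int[] sorted = new int[n];
        int[] reversed = new int[n];
        for (int i = 0; i < n; i++) {
            sorted[i] = i;
            reversed[i] = n - i;
        }
        System.out.println("sorted:   " + sortAndCount(sorted));    // n - 1 = 5
        System.out.println("reversed: " + sortAndCount(reversed));  // n(n-1)/2 = 15
    }
}
```

For n = 6 the ordered input needs n-1 = 5 comparisons and the reversed input needs n(n-1)/2 = 15, matching the best and worst cases described above.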

(2) Binary insertion sort:

Basic idea: Binary insertion sort works like direct insertion sort; the difference lies in how the insertion position is found. Direct insertion sort compares the element to insert with the elements of the ordered part one by one, from back to front. Binary insertion sort uses binary search instead: it compares the element with the middle element of the ordered part to decide whether the insertion position lies in the left half or the right half, then continues the search in that half. The search range is halved repeatedly until it is empty, at which point the insertion position is known.

The figure below illustrates the process for the last element:


Advantages: stable.

Disadvantages: The number of comparisons is essentially fixed (about log2(i) for the i-th element) and does not depend on the initial order of the records. It needs fewer comparisons than direct insertion sort in the worst case but more in the best case, and the number of element moves is not reduced.

Implementation:

    static void Sort(int[] array) {

        // Binary insertion sort
        int left = 0; // left boundary
        int right = array.length - 1; // right boundary
        int temp; // element being inserted
        int low, high; // bounds of the search range
        int middle; // midpoint of the search range
        for (int i = left + 1; i <= right; i++) {
            temp = array[i];
            low = left;
            high = i - 1;

            // Binary search for the insertion position
            while (low <= high) {
                middle = (low + high) / 2;
                if (temp >= array[middle]) {
                    low = middle + 1;
                } else {
                    high = middle - 1;
                }
            }

            // Shift the elements and insert
            for (int j = i - 1; j >= low; j--) {
                array[j + 1] = array[j];
            }
            array[low] = temp;
        }
    }

Efficiency:

Time complexity:

Compared with direct insertion sort, finding the insertion position takes fewer comparisons (about log2(i) for the i-th element), and the saving becomes more noticeable as the amount of data grows. The number of element moves is the same, however. The number of comparisons does not depend on the order of the initial data sequence, because the binary search for each position always takes about the same number of steps. Since the moves still dominate, the time complexity is O(n^2).

Space complexity:

The size of auxiliary space required is O(1).
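The position search at the heart of binary insertion sort can be looked at in isolation. The sketch below is an illustration under stated assumptions: the class `InsertionPoint` and the method `insertionPoint` are hypothetical names, and the method searches only the first `length` elements, which are assumed to be ordered already. Because equal keys send the search to the right, the new element lands after any equal elements, which is what keeps the sort stable.

```java
class InsertionPoint {
    // Returns the index in array[0..length) where key should be inserted so that
    // the prefix stays ordered and key ends up after any equal elements.
    static int insertionPoint(int[] array, int length, int key) {
        int low = 0;
        int high = length - 1;
        while (low <= high) {
            int middle = low + (high - low) / 2;
            if (key >= array[middle]) {
                low = middle + 1;   // continue in the right half
            } else {
                high = middle - 1;  // continue in the left half
            }
        }
        return low;
    }

    public static void main(String[] args) {
        int[] orderedPrefix = {1, 3, 3, 5, 8};
        System.out.println(insertionPoint(orderedPrefix, 5, 3)); // 3 (after both 3s)
        System.out.println(insertionPoint(orderedPrefix, 5, 0)); // 0
        System.out.println(insertionPoint(orderedPrefix, 5, 9)); // 5
    }
}
```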

(3) Shell sort:

Basic idea: Shell sort is a major improvement on direct insertion sort and is also known as diminishing increment sort. Divide the data sequence to be sorted into several subsequences according to a chosen increment and run insertion sort on each subsequence; then choose a smaller increment and again divide the sequence into subsequences and sort them; and so on. The final increment is 1, which amounts to a direct insertion sort that leaves the whole sequence ordered.

Key points: Choosing the increments. With n the number of elements in the data sequence, the increment sequence used here is {n/2, n/4, n/8, ..., 1}.
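As a quick illustration of the increment sequence, the sketch below lists the increments obtained by repeated halving. The class `ShellGaps` and the method `gaps` are made-up names for this example. Note that with integer division the values are n/2, n/4, ... rounded down, so for n = 10 the sequence is 5, 2, 1.

```java
import java.util.ArrayList;
import java.util.List;

class ShellGaps {
    // Returns the halving increment sequence for an array of length n,
    // using integer division, ending at 1.
    static List<Integer> gaps(int n) {
        List<Integer> result = new ArrayList<>();
        for (int d = n / 2; d >= 1; d /= 2) {
            result.add(d);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(gaps(10)); // [5, 2, 1]
        System.out.println(gaps(16)); // [8, 4, 2, 1]
    }
}
```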

The following figure shows how the increments are chosen and how the subsequences are formed:


Advantages: Fast; less data movement.

Disadvantages: Unstable. There is also no definitive answer as to which increment values d to use or how many different increments to take.

Implementation:

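    static void sort(int[] array) {
        // Print the initial data
        System.out.print("Initial data sequence: ");
        for (int a : array) {
            System.out.print(a + " ");
        }
        System.out.println("");

        // Shell sort: halve the increment d until it reaches 1.
        // The loop condition also ends the sort for arrays shorter than 2,
        // where d would otherwise never equal 1.
        for (int d = array.length / 2; d >= 1; d = d / 2) {
            for (int i = 0; i < d; i++) { // split into d subsequences by increment
                for (int j = i + d; j < array.length; j = j + d) { // walk one subsequence
                    int temp = array[j]; // reference value for the insertion
                    int k;
                    for (k = j - d; k >= 0 && array[k] > temp; k = k - d) { // shift larger elements back by d
                        array[k + d] = array[k];
                    }
                    // insert one step after the last compared position
                    array[k + d] = temp;
                }
            }

            // Log the result for this increment
            System.out.print("Increment: " + d);
            System.out.print(". Result: ");
            for (int a : array) {
                System.out.print(a + " ");
            }
            System.out.println("");
        }

        // Print the final data sequence
        System.out.print("Final data sequence:");
        for (int a : array) {
            System.out.print("  " + a);
        }
    }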

Efficiency:

Time complexity:

Worst case: O(n^2).

Best case: O(n log n).

Average: depends on the increment sequence; O(n log n) is a commonly quoted approximation rather than a proven bound.

Space complexity:

The size of auxiliary space required is O(1).

Since Shell sort is an improved version of direct insertion sort, here is why it is faster:

① When the array to be sorted is already nearly ordered, direct insertion sort needs relatively few comparisons and moves.

② When the length n of the array is small, n and n^2 do not differ much, so the best case O(n) and the worst case O(n^2) of direct insertion sort are close to each other.

③ In the early passes of Shell sort the increment is large, so there are many groups with few elements each, and direct insertion sort on each group is fast. As the increment decreases, there are fewer groups with more elements each, but the earlier passes have already brought the array closer to an ordered state, so the later passes are fast as well.


2. Selection sort

(1) Simple selection sort:

Basic idea: Traverse the array to find the smallest element and swap it with the element in the first position; then find the smallest of the remaining elements and swap it with the element in the second position; and so on, until the second-to-last element has been compared with the last.

The figure below shows the comparison process:

Advantages: The number of swaps is known in advance: at most n-1.

Disadvantages: Unstable; many comparisons.

Implementation:

    static void sort(int[] array) {
        // Print the initial data
        System.out.print("Initial data sequence: ");
        for (int a : array) {
            System.out.print(a + " ");
        }
        System.out.println("");

        // Simple selection sort
        int position = 0;
        for (int i = 0; i < array.length; i++) {
            int temp = array[i]; // smallest value found so far
            position = i; // index of that value
            int j;
            for (j = i + 1; j < array.length; j++) {
                if (array[j] < temp) {
                    temp = array[j];
                    position = j;
                }
            }
            // swap the smallest remaining value into position i
            array[position] = array[i];
            array[i] = temp;

            // Log the state after position i is settled
            System.out.print("Position " + i + " sorted:");
            for (int a : array) {
                System.out.print("  " + a);
            }
            System.out.println("");
        }

        // Print the final data sequence
        System.out.print("Final data sequence:");
        for (int a : array) {
            System.out.print("  " + a);
        }
    }

Efficiency:

Time complexity:

Best case: O(n^2).

Worst case: O(n^2).

Average case: O(n^2).

Space complexity:

The size of auxiliary space required is O(1).
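The instability of simple selection sort can be made visible with a tiny example. The sketch below is an illustration only: the class `SelectionStability`, the method `sortByKey`, and the two-character records are hypothetical. Each record is a digit key followed by a letter that marks the original order, and only the digit is compared.

```java
class SelectionStability {
    // Simple selection sort on records, comparing only the first character (the key).
    static void sortByKey(String[] records) {
        for (int i = 0; i < records.length - 1; i++) {
            int position = i;
            for (int j = i + 1; j < records.length; j++) {
                if (records[j].charAt(0) < records[position].charAt(0)) {
                    position = j;
                }
            }
            // The long-distance swap is what can reorder equal keys.
            String temp = records[position];
            records[position] = records[i];
            records[i] = temp;
        }
    }

    public static void main(String[] args) {
        String[] records = {"2a", "2b", "1c"};
        sortByKey(records);
        System.out.println(String.join(" ", records)); // 1c 2b 2a
    }
}
```

The records `2a` and `2b` have equal keys, yet `2a` ends up after `2b`: swapping `1c` to the front moved `2a` past `2b`.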


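(2) Heap sort:

Basic idea: The heap here is a data structure, not the heap of "heap and stack" memory. Heap sort is a tree-based selection sort and an improvement on simple selection sort. A heap can be viewed as a complete binary tree, in which every level except the lowest is full. We represent the array to be sorted as a heap, with each node of the heap corresponding to one element of the array. If the value of every node is never less than the value of its parent, the root is the smallest of all nodes and the tree is called a min-heap; if the value of every node is never greater than the value of its parent, the root is the largest of all nodes and the tree is called a max-heap.

This is hard to follow in words, so the process is illustrated step by step below.

The initial array used here is: 32, 54, 27, 86, 43, 4, 34, 25, 83.

Represented as a binary tree (the red numbers in the figure are the node numbers):

The numbering follows a pattern: if a node has number i, its left and right children have numbers 2*i and 2*i+1.

From this we can derive the mathematical definition: a sequence of n elements (k_1, k_2, ..., k_n) is called a heap if and only if, for every i = 1, 2, ..., ⌊n/2⌋,

k_i <= k_(2i) and k_i <= k_(2i+1), or
k_i >= k_(2i) and k_i >= k_(2i+1)

(a child whose number exceeds n is ignored). The first case is a min-heap and the second is a max-heap.

Note: A heap only constrains parent-child pairs. It places no constraint on siblings, so siblings have no required ordering between them.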
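The heap condition can be written down as a short check. The following sketch is illustrative: the class `HeapCheck` and the method `isMaxHeap` are hypothetical names. The text numbers nodes from 1, giving children 2*i and 2*i+1; Java arrays start at 0, so the children of index i are at 2*i+1 and 2*i+2. The second test array is one possible max-heap arrangement of the sample values, chosen for this example rather than taken from the original figures.

```java
class HeapCheck {
    // Checks the max-heap condition on a 0-based array: every parent at index i
    // is >= its children at 2*i + 1 and 2*i + 2 (when they exist).
    static boolean isMaxHeap(int[] a) {
        for (int i = 0; 2 * i + 1 < a.length; i++) {
            int left = 2 * i + 1;
            int right = 2 * i + 2;
            if (a[i] < a[left]) {
                return false;
            }
            if (right < a.length && a[i] < a[right]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] original = {32, 54, 27, 86, 43, 4, 34, 25, 83};
        int[] arranged = {86, 83, 34, 54, 43, 4, 27, 25, 32};
        System.out.println(isMaxHeap(original)); // false: 32 < 54 at the root
        System.out.println(isMaxHeap(arranged)); // true
    }
}
```

Siblings such as 83 and 34 in the second array are not ordered relative to each other, which the check never requires.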

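The concrete steps are shown below, using a max-heap:

The overall idea: represent the array as a binary tree, adjust the tree into a max-heap, take out the largest element at the top of the heap, and move the last element of the heap to the top. The heap property is now broken, so repeat the adjust-and-extract process until everything is ordered.

① Represent the array as a binary tree

Original array: 32, 54, 27, 86, 43, 4, 34, 25, 83.

As a binary tree:

② Adjust it into a heap

How to adjust the tree into a heap is the key step. From the numbering rule above, the last parent node has number ⌊n/2⌋, and the parent nodes are numbered 1, 2, ..., ⌊n/2⌋. Starting from the last parent, compare each parent with its left and right children and move the largest of the three values into the parent (after a swap, the subtree below the swapped child may have to be readjusted in the same way). Continue until the root has been processed; at that point the largest value sits at the root and we have a max-heap.

The figure shows the array adjusted into a max-heap:


③ Take out the largest element at the top

In a max-heap the top element is the maximum. Swap it with the last element of the heap and then remove it, as shown in the figure:

Once the maximum has been swapped to the end, take it out of the heap and place it at the tail of the output sequence; then readjust the remaining n-1 elements into a heap. Repeat until the array is ordered. (My sample data turned out to be a little awkward here: the largest of the remaining n-1 elements happened to be at the end of the heap, so after the swap the rest is still a heap. That is pure coincidence; in general the remaining n-1 elements do have to be readjusted into a heap.)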

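Advantages: On average heap sort has a somewhat larger constant factor than quicksort and is therefore usually a little slower, but its worst-case time is also O(n log n), which is better than quicksort.

Disadvantages: Unstable.

Implementation: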

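    public static void heapSort(int[] arr) {
        int lastIndex = arr.length - 1;
        while (lastIndex > 0) {
            adjustHeap(arr, lastIndex); // move the largest of arr[0..lastIndex] to the end
            lastIndex--;
        }
    }

    /**
     * Brings the largest of arr[0..n] to the root with one bottom-up pass over
     * all parent nodes, then swaps it to index n. Each call costs O(n).
     * @param arr the array being sorted
     * @param n   index of the last element still in the heap
     */
    private static void adjustHeap(int[] arr, int n) {
        int index = (n - 1) / 2; // last parent node

        int largestIndex;
        for (int i = index; i >= 0; i--) {
            largestIndex = i; // index of the largest value
            // check whether a right child exists
            if (2 * i + 2 <= n) { // if the right child exists, the left child exists too
                if (arr[2 * i + 1] >= arr[i]) { // left child >= parent
                    largestIndex = 2 * i + 1;
                }
                if (arr[2 * i + 2] >= arr[i] && arr[2 * i + 2] >= arr[2 * i + 1]) { // right child >= parent and >= left child
                    largestIndex = 2 * i + 2;
                }
            } else {
                // no right child
                if (arr[2 * i + 1] >= arr[i]) {
                    largestIndex = 2 * i + 1;
                }
            }

            // move the largest of parent and children into the parent
            if (i != largestIndex) {
                swap(arr, largestIndex, i);
            }
        }
        // swap the largest element at the top with the last element of the heap
        swap(arr, 0, n);
    }

    /**
     * Swaps two elements.
     * @param arr the array
     * @param largestIndex index of the first element
     * @param lastIndex index of the second element
     */
    private static void swap(int[] arr, int largestIndex, int lastIndex) {
        int temp = arr[largestIndex];
        arr[largestIndex] = arr[lastIndex];
        arr[lastIndex] = temp;
    }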
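
Efficiency:

Time complexity:

Heap sort is not sensitive to the initial order of the records, so its best, worst, and average time complexity are all O(n log n). This bound assumes that after each extraction only the new root is sifted down, which costs O(log n). The implementation above instead rescans all parent nodes for every extracted element, which costs O(n) per extraction and O(n^2) in total.

Best case: O(n log n).

Worst case: O(n log n).

Average case: O(n log n).

Space complexity:

The size of auxiliary space required is O(1).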
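For comparison, here is a minimal sketch of heap sort with a sift-down step, the variant for which the O(n log n) bound holds. It is not the author's code: the class `HeapSortSiftDown` and the method `siftDown` are names chosen for this example. The heap is built once, and after each extraction only the path from the root downward is repaired.

```java
import java.util.Arrays;

class HeapSortSiftDown {
    static void heapSort(int[] arr) {
        int n = arr.length;
        // Build a max-heap: sift down every parent, last parent first.
        for (int i = n / 2 - 1; i >= 0; i--) {
            siftDown(arr, i, n);
        }
        // Move the maximum to the end, then repair the heap on the remaining prefix.
        for (int end = n - 1; end > 0; end--) {
            swap(arr, 0, end);
            siftDown(arr, 0, end);
        }
    }

    // Restores the max-heap property for the subtree rooted at i,
    // looking only at the first size elements of arr.
    static void siftDown(int[] arr, int i, int size) {
        while (true) {
            int largest = i;
            int left = 2 * i + 1;
            int right = 2 * i + 2;
            if (left < size && arr[left] > arr[largest]) {
                largest = left;
            }
            if (right < size && arr[right] > arr[largest]) {
                largest = right;
            }
            if (largest == i) {
                return; // parent already dominates both children
            }
            swap(arr, i, largest);
            i = largest; // continue in the subtree that received the smaller value
        }
    }

    static void swap(int[] arr, int a, int b) {
        int temp = arr[a];
        arr[a] = arr[b];
        arr[b] = temp;
    }

    public static void main(String[] args) {
        int[] data = {32, 54, 27, 86, 43, 4, 34, 25, 83};
        heapSort(data);
        System.out.println(Arrays.toString(data)); // [4, 25, 27, 32, 34, 43, 54, 83, 86]
    }
}
```

Each `siftDown` call follows a single root-to-leaf path, so it costs O(log n), and the n-1 extractions cost O(n log n) in total.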

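3. Exchange sort

(1) Bubble sort:

Basic idea: Compare each pair of adjacent elements of the array in turn and swap them if they are in the wrong order for the required sort; repeat the traversal until the array is ordered.

The figure below shows the comparison process:


Advantages: Stable.

Disadvantages: Slow; each step only moves data between adjacent positions.

Implementation: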

    static void sort(int[] array) {
        // Print the initial data
        System.out.print("Initial data sequence: ");
        for (int a : array) {
            System.out.print(a + " ");
        }
        System.out.println("");

        // Bubble sort: after each outer pass the largest remaining value is at index i
        int temp;
        for (int i = array.length - 1; i > 0; i--) {
            for (int j = 0; j < i; j++) {
                if (array[j + 1] < array[j]) {
                    temp = array[j + 1];
                    array[j + 1] = array[j];
                    array[j] = temp;
                }
            }
        }

        // Print the final data sequence
        System.out.print("Final data sequence:");
        for (int a : array) {
            System.out.print("  " + a);
        }
    }
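
Efficiency:

Time complexity:

Best case: O(n). This requires stopping as soon as a pass makes no swap; the implementation above has no such check and always performs n(n-1)/2 comparisons.

Worst case: O(n^2).

Average case: O(n^2).

Space complexity:

The size of auxiliary space required is O(1).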
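The O(n) best case for bubble sort relies on stopping early when a pass makes no swap. Below is a minimal sketch of that variant, assuming a hypothetical `swapped` flag and a pass counter that the original code does not have; the class name `BubbleSortEarlyExit` is also made up for this example.

```java
class BubbleSortEarlyExit {
    // Sorts the array in place and returns the number of passes performed.
    static int sort(int[] array) {
        int passes = 0;
        for (int i = array.length - 1; i > 0; i--) {
            boolean swapped = false;
            passes++;
            for (int j = 0; j < i; j++) {
                if (array[j + 1] < array[j]) {
                    int temp = array[j + 1];
                    array[j + 1] = array[j];
                    array[j] = temp;
                    swapped = true;
                }
            }
            if (!swapped) {
                break; // no swap in this pass: the array is already ordered
            }
        }
        return passes;
    }

    public static void main(String[] args) {
        System.out.println(sort(new int[]{1, 2, 3, 4, 5})); // 1 pass
        System.out.println(sort(new int[]{5, 4, 3, 2, 1})); // 4 passes
    }
}
```

On ordered input a single pass of n-1 comparisons is enough, which is where the O(n) figure comes from.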


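(2) Quick sort:

Basic idea: One partitioning pass splits the array into two independent parts such that no element of the first part is larger than any element of the second. The same method is then applied recursively to each part until the whole array is ordered.

The figure below shows one round of comparisons:

Advantages: Very fast; little data movement.

Disadvantages: Unstable.

Implementation: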

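    static void sort(int[] arr, int left, int right) {

        int low = left;
        int high = right;
        int temp = arr[left]; // pivot: the first element of the range
        while (low < high) {
            // scan from the right for an element smaller than the pivot
            while (low < high && arr[high] >= temp) {
                high--;
            }
            if (low < high) {
                int tem = arr[low];
                arr[low] = arr[high];
                arr[high] = tem;
            }

            // scan from the left for an element larger than the pivot
            while (low < high && arr[low] <= temp) {
                low++;
            }
            if (low < high) {
                int tem = arr[low];
                arr[low] = arr[high];
                arr[high] = tem;
            }
        }

        // low == high is now the pivot's final position; sort both sides
        if (left < low) {
            sort(arr, left, low - 1);
        }
        if (high < right) {
            sort(arr, high + 1, right);
        }
    }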

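Efficiency:

Time complexity:

Best case: O(n log n).

Worst case: O(n^2).

Average case: O(n log n).

Space complexity: O(log n) to O(n). No auxiliary array is needed, but the recursion uses stack space: about log n levels when the partitions are balanced and up to n levels in the worst case.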
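The O(n^2) worst case occurs when the pivot is always the smallest or largest element of its range, for example when the first element is used as the pivot on input that is already ordered. One common mitigation, which is not part of the original code, is to pick the pivot at random. The sketch below assumes a hypothetical class `RandomizedQuickSort` and keeps the same two-sided swapping scheme, only moving a random element to the front first.

```java
import java.util.Random;

class RandomizedQuickSort {
    private static final Random RANDOM = new Random();

    static void sort(int[] arr, int left, int right) {
        if (left >= right) {
            return; // ranges of length 0 or 1 are already ordered
        }
        // Move a randomly chosen element to the front so that it becomes the pivot.
        swap(arr, left, left + RANDOM.nextInt(right - left + 1));
        int pivot = arr[left];
        int low = left;
        int high = right;
        while (low < high) {
            while (low < high && arr[high] >= pivot) {
                high--;
            }
            if (low < high) {
                swap(arr, low, high);
            }
            while (low < high && arr[low] <= pivot) {
                low++;
            }
            if (low < high) {
                swap(arr, low, high);
            }
        }
        // low == high is the pivot's final position.
        sort(arr, left, low - 1);
        sort(arr, low + 1, right);
    }

    static void swap(int[] arr, int a, int b) {
        int temp = arr[a];
        arr[a] = arr[b];
        arr[b] = temp;
    }

    public static void main(String[] args) {
        int[] data = {32, 54, 27, 86, 43, 4, 34, 25, 83};
        sort(data, 0, data.length - 1);
        System.out.println(java.util.Arrays.toString(data));
    }
}
```

A random pivot does not remove the worst case, but it makes it unlikely for any particular input, including ordered input.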


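4. Merge sort

Basic idea: Merge sort is an efficient sorting algorithm built on the merge operation. It uses divide and conquer: two already ordered subsequences are merged into one larger ordered sequence, and recursion applies this merge level by level.

Key points: First split the array into two subarrays A and B, then split each of those into two subarrays, and keep splitting recursively until every subarray holds a single element and can be regarded as ordered. Then merge the subarrays back level by level in ascending order, which finally yields one ordered array.

The figure shows the process:


Advantages: Stable.

Disadvantages: The space complexity is O(n), which can be hard to accept when the amount of data is large. Use it with care on machines with little memory.

Implementation: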

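    /**
     * <pre>
     * Two-way merge
     * Principle: merge two ordered lists into one ordered list
     * </pre>
     * @param a      the array being sorted
     * @param temp   auxiliary array, at least as long as a
     * @param left   start index of the first ordered list
     * @param middle start index of the second ordered list
     * @param right  end index of the second ordered list
     */
    private static void merge(int[] a, int[] temp, int left, int middle, int right) {
        int i = left;
        int j = middle;
        int k = 0;
        // take the smaller head element until one list is exhausted
        while (i < middle && j <= right) {
            if (a[i] <= a[j]) {
                temp[k] = a[i];
                k++;
                i++;
            } else {
                temp[k] = a[j];
                k++;
                j++;
            }
        }
        // copy what remains of the first list
        while (i < middle) {
            temp[k] = a[i];
            k++;
            i++;
        }
        // copy what remains of the second list
        while (j <= right) {
            temp[k] = a[j];
            k++;
            j++;
        }
        // copy the merged result back into the original array
        for (int index = 0; index < k; index++) {
            a[left + index] = temp[index];
        }

    }

    /**
     * Recursive splitting
     */
    public static void mergeSort(int[] a, int[] temp, int left, int right) {
        // when left == right the length is 1 and the recursion stops
        if (left < right) {
            int middle = (right + left) / 2;
            mergeSort(a, temp, left, middle); // left subarray
            mergeSort(a, temp, middle + 1, right); // right subarray
            merge(a, temp, left, middle + 1, right); // merge the two
        }
    }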
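
Efficiency:

Time complexity:

Merge sort is very efficient. The recursive splitting produces about log n levels, and the merges on each level take roughly 2n assignments in total, which is O(n). Multiplying the two gives a time complexity of O(n log n). Because the procedure is fixed and does not depend on the input data, the best, worst, and average cases behave almost identically, and all are O(n log n).

Best case: O(n log n).

Worst case: O(n log n).

Average case: O(n log n).

Space complexity: O(n). (The biggest drawback of merge sort is its space complexity. Merging subarrays requires an auxiliary array, whose contents are then copied back into the original array, so the extra space is O(n). Doing without the auxiliary array while still keeping the data in the original array from being overwritten would require spending a lot of time moving data within the array, which is error-prone and slower. The auxiliary space is therefore hard to avoid.)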
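The "levels times work per level" argument for the time complexity can also be written as a recurrence. The derivation below is a sketch under simplifying assumptions that are not in the original text: n is a power of two, and c is a hypothetical constant bounding the cost per element of one merge.

```latex
\begin{aligned}
T(n) &= 2\,T(n/2) + c\,n, \qquad T(1) = c \\
     &= 4\,T(n/4) + 2\,c\,n \\
     &= \dots \\
     &= 2^{k}\,T(n/2^{k}) + k\,c\,n \\
     &= c\,n + c\,n\log_2 n \qquad (k = \log_2 n) \\
     &= O(n \log n)
\end{aligned}
```

Each expansion step adds one more level of merging at cost c·n, and there are log2(n) such levels before the subarrays reach length 1.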

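5. Radix sort

Basic idea: Radix sort is a distribution sort, also called the bucket method: it uses part of the information in each key to distribute the elements into "buckets" and thereby sort them. The implementation pads all values to be compared (positive integers) to the same number of digits by prefixing shorter numbers with zeros, then sorts them digit by digit starting from the lowest digit. Once the pass on the highest digit is complete, the array is ordered. (There are two variants, LSD (Least Significant Digit) and MSD (Most Significant Digit): LSD starts from the rightmost digit of the key, MSD from the leftmost.)

The figure shows the comparison process:


Advantages: Stable.

Disadvantages: The keys must be decomposable into digits; it works best when keys have few digits and are densely distributed; for numbers, unsigned values are preferable, because signed values need an extra mapping step (for example, sorting negative and non-negative values separately).

Implementation: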

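    /**
     * @param a   the array to sort (non-negative integers)
     * @param max number of digits of the longest element in the array
     */
    public static void radixSort(int[] a, int max) {
        int k = 0;
        int n = 1; // divisor that selects the current digit
        int m = 1; // which digit is being processed (1 = units)
        int[][] temp = new int[10][a.length]; // 10 rows act as the "buckets"; each row holds the values placed in it
        int[] orderIndex = new int[10]; // number of values currently in each bucket
        while (m <= max) { // start from the units digit
            for (int i = 0; i < a.length; i++) { // distribute into the buckets
                int lsd = (a[i] / n) % 10;
                temp[lsd][orderIndex[lsd]] = a[i];
                orderIndex[lsd]++;
            }
            for (int i = 0; i < 10; i++) { // collect from the buckets in order
                if (orderIndex[i] != 0) {
                    for (int j = 0; j < orderIndex[i]; j++) {
                        a[k] = temp[i][j];
                        k++;
                    }
                    orderIndex[i] = 0;
                }
            }

            k = 0;
            m++;
            n = n * 10;
        }
    }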
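
Efficiency:

Time complexity:

Distribution takes O(n) and collection takes O(r), where n is the number of elements and r is the radix (the number of buckets, 10 here). Distribution and collection are repeated for d passes, one per digit, so the time complexity of radix sort is O(d(n+r)).

Best case: O(d(n+r)).

Worst case: O(d(n+r)).

Average case: O(d(n+r)).

Space complexity: The extra space is the bucket storage used during distribution, so the space complexity is O(10 × n) = O(n).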
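The radix sort above expects the caller to supply the number of digits of the longest element. The sketch below shows one way to compute it and feed it into an LSD radix sort of the same shape. The class `RadixDigits` and the helper `maxDigits` are hypothetical names introduced for this example, and the input is assumed to contain only non-negative integers.

```java
import java.util.Arrays;

class RadixDigits {
    // Hypothetical helper: number of decimal digits of the largest value in a.
    static int maxDigits(int[] a) {
        int max = 0;
        for (int value : a) {
            if (value > max) {
                max = value;
            }
        }
        int digits = 1;
        while (max >= 10) {
            max /= 10;
            digits++;
        }
        return digits;
    }

    // LSD radix sort with 10 buckets, one pass per decimal digit.
    static void radixSort(int[] a, int digits) {
        int[][] buckets = new int[10][a.length];
        int[] counts = new int[10];
        int divisor = 1;
        for (int pass = 1; pass <= digits; pass++) {
            for (int value : a) { // distribute by the current digit
                int digit = (value / divisor) % 10;
                buckets[digit][counts[digit]++] = value;
            }
            int k = 0;
            for (int b = 0; b < 10; b++) { // collect bucket by bucket, keeping order
                for (int j = 0; j < counts[b]; j++) {
                    a[k++] = buckets[b][j];
                }
                counts[b] = 0;
            }
            divisor *= 10;
        }
    }

    public static void main(String[] args) {
        int[] data = {170, 45, 75, 90, 802, 24, 2, 66};
        radixSort(data, maxDigits(data)); // maxDigits(data) is 3
        System.out.println(Arrays.toString(data)); // [2, 24, 45, 66, 75, 90, 170, 802]
    }
}
```

Here d = 3 passes are made over n = 8 elements with r = 10 buckets, in line with the O(d(n+r)) cost given above.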



Origin blog.csdn.net/liujibin1836591303/article/details/78794422