[Data structure---sorting] Pao Ding’s analysis of common sorting algorithms

1. Common sorting algorithms

Sorting is everywhere in daily life. Sorting is the operation of arranging a sequence of records in increasing or decreasing order according to one or more of their keys.

Common sorting algorithms fall into four categories: insertion sorts, selection sorts, exchange sorts, and merge sort. Insertion sorts include direct insertion sort and Shell sort; selection sorts include direct selection sort and heap sort; exchange sorts include bubble sort and quick sort; merge sort forms a category of its own.


Below we analyze each sorting algorithm's idea, advantages, disadvantages, and stability one by one.

2. Implementation of common sorting algorithms

1. Direct insertion sort

Direct insertion sort is a simple insertion sort. Its basic idea is to insert the records to be sorted, one by one according to their key values, into an already-sorted sequence until all records have been inserted, yielding a new ordered sequence.

		// Direct insertion sort
		void InsertSort(int* a, int n)
		{
			for (int i = 1; i < n; i++)
			{
				int tmp = a[i];
				int end = i - 1;
				while (end >= 0)
				{
					if (a[end] > tmp)
					{
						a[end + 1] = a[end];
					}
					else
					{
						break;	
					}
					end--;
				}
				a[end + 1] = tmp;
			}
		}

When inserting the i-th element (i >= 1), the earlier elements a[0], a[1], ..., a[i-1] are already sorted, i.e., the range from 0 to end is sorted. We compare a[i] (saved in tmp) with a[end], a[end-1], ... going backwards from end; each element larger than tmp is shifted one slot to the right, and once the insertion position is found, a[i] is placed there;

As shown in the figure, the interval [0, end] (here 0 - 0) holds a single element, so it is trivially sorted. i is the index right after end; starting from end, the element at i is compared backwards, walking past every element larger than itself until it meets one that is smaller, and it is inserted at that position;
(figure)

The element originally at that position is shifted backward; be careful to shift first and then insert, otherwise data would be overwritten;
(figure)
The second round of insertion:

(figure)
As shown in the picture, these three numbers are now in order;

The full sorting process, animated:
(figure)

Summary of features of direct insertion sort:

  1. The closer the element set is to order, the more time-efficient the direct insertion sort algorithm is.
  2. Time complexity: O(N^2)
  3. Space complexity: O(1)
  4. Stability: stable

2. Shell sort

Shell sort is also called the diminishing increment method. The basic idea: first choose an integer gap and divide the records of the file to be sorted into gap groups, where records a distance of gap apart fall into the same group (elements gap apart form one group, giving gap groups in total), then sort the records within each group. Repeat the grouping and sorting with a smaller gap. When gap = 1, all records belong to a single group and one final pass finishes the sort.

Shell sort is in fact an optimization of direct insertion sort. When gap > 1, it performs pre-sorting (each of the gap groups is insertion-sorted separately) whose purpose is to bring the array closer to order; when gap == 1, the whole array forms one group and the pass is exactly a direct insertion sort;

There is no single correct choice of gap. Here we take roughly one third of the array length: for the array {6, 1, 2, 7, 9, 3, 4, 5, 10, 8, 0} we update gap as gap = gap / 3 + 1. The array is divided into gap == 4 groups, and elements within a group are gap == 4 positions apart. In the figure, line segments of different colors mark the different gap groups:
(figure)
After sorting each gap group, the array looks like the upper row of data in the figure:
(figure)

Then gap is updated by the same rule and becomes 2. With gap == 2 the array splits into the two groups below, and elements within a group are gap == 2 positions apart:
(figure)

After sorting each gap group, the array looks like the upper row of data in the figure:

(figure)
From the current arrangement it is clear that the array is very close to sorted. Taking gap by the same rule one more time gives gap == 1, i.e. a direct insertion sort, which finishes sorting the array. The reason gap is updated as gap = gap / 3 + 1 is that the trailing + 1 guarantees the final gap is exactly 1, so the last pass is always a direct insertion sort;

The implemented code is as follows:

		// Shell sort
		void ShellSort(int* a, int n)
		{
			int gap = n;
			while (gap > 1)
			{
				// The trailing +1 guarantees the final gap is 1
				gap = gap / 3 + 1;
				
				// Sort all gap groups in one interleaved pass
				// i < n - gap keeps i + gap within bounds and covers every one of the gap groups
				for (int i = 0; i < n - gap; i++)
				{
					// Insertion sort with stride gap; all groups advance together
					int end = i;
					int tmp = a[end + gap];
					while (end >= 0)
					{
						if (a[end] > tmp)
						{
							a[end + gap] = a[end];
							end -= gap;
						}
						else
						{
							break;
						}
		
					}
					a[end + gap] = tmp;
				}
			}
		}

The time complexity of Shell sort is hard to compute, because it depends on the choice of gap sequence. Overall it lies between O(N*logN) and O(N^2), and in good cases it is roughly O(N^1.3). The space complexity is O(1), since no extra space is used. Shell sort is not stable;

3. Direct selection sort

The idea of selection sort: each time, select the smallest (or largest) element from the data still to be sorted and store it at the beginning of the unsorted range, until all data elements are placed.

The animation below selects only one element per pass, the smallest, and puts it at the left end of the array. Our version selects the largest and the smallest element in the same pass, placing the largest at the right end and the smallest at the left end, a small optimization;
(figure)

Take the array {5, 3, 4, 1, 2} as an example. begin and end mark the head and tail of the unsorted range, while maxi and mini record the indices of the largest and smallest elements within it. Initially begin and end span the whole (unordered) array. Both maxi and mini start scanning from begin to locate the largest and smallest elements; then a[maxi] is swapped with a[end], putting the largest element at the back; next a[mini] is swapped with a[begin], putting the smallest element at the front; finally begin++ and end-- shrink the range;

(figure)

Now perform the second pass. This time mini and end coincide. If a[maxi] and a[end] are swapped first, the minimum that used to sit at a[mini] (which is a[end]) gets moved to where maxi points, and the current a[end] becomes the maximum. So a check is needed: if mini and end coincide, the original minimum has been moved to position maxi, and we must perform mini = maxi before the second swap;
(figure)
After the exchange:
(figure)
After the correction:
(figure)
Finally, sorted:
(figure)

Here is the reference code:

		void Swap(int* p1, int* p2)
		{
			int tmp = *p1;
			*p1 = *p2;
			*p2 = tmp;
		}
		
		// Direct selection sort
		void SelectSort(int* a, int n)
		{
			int begin = 0, end = n - 1;
			while (begin < end)
			{
				int maxi = begin, mini = begin;
				for (int i = begin; i <= end; i++)
				{
					if (a[i] > a[maxi])
					{
						maxi = i;
					}
					if (a[i] < a[mini])
					{
						mini = i;
					}
				}
				
				Swap(&a[end], &a[maxi]);
		
				// end and mini coincide: the minimum was just moved to maxi
				if (mini == end)
					mini = maxi;
				
				Swap(&a[begin], &a[mini]);
				begin++;
				end--;
			}
		}

Summary of features of direct selection sort:

  1. Direct selection sort is easy to understand, but its efficiency is poor, so it is rarely used in practice.
  2. Time complexity: O(N^2)
  3. Space complexity: O(1)
  4. Stability: Unstable

4. Heap sort

Heap sort is a sorting algorithm designed around the heap data structure (a complete binary tree stored in an array). It is a kind of selection sort that selects data through the heap. Note that ascending order requires building a max-heap, and descending order requires building a min-heap.

For example, an array {5, 2, 1, 3, 7, 6, 4}, the tree structure of this array is as follows:
(figure)

First build it into a max-heap. The heap-building procedure is not elaborated here; for details see the earlier blog post Binary Tree-Heap. The resulting max-heap is shown below:

(figure)
The idea of heap sort: first build a heap. Since we sort in ascending order, we build a max-heap, whose largest elements sit near the top. Each round, the element at the top of the heap is swapped with the element at the end of the heap; after the swap the heap length shrinks by one, which pins the current maximum at the back. Then a sift-down from the top restores the heap, bringing the next largest element to the top, which is then swapped with the second-to-last element, and so on, until the length shrinks to 0 and the sort is complete;

For example, in the max-heap in the picture above, after 7 and 4 are swapped the heap size shrinks. Logically we operate on a heap, but physically we operate on an array, so the swap moves 7 to the end of the array; 7 is the largest element, so shrinking the length by one means one element has reached its final place. After that we sift down from the top: everything except the new top element still satisfies the heap property, so a single sift-down from the root is enough to rebuild the heap;

(figure)

The reference code is as follows:

		// Sift-down (adjust-down) algorithm
		void AdjustDown(int* a, int n, int parent)
		{
			int child = 2 * parent + 1;
		
			while (child < n)
			{
				if (child + 1 < n && a[child] < a[child + 1])
				{
					child++;
				}
		
				if (a[child] > a[parent])
				{
					Swap(&a[child], &a[parent]);
		
					parent = child;
					child = 2 * parent + 1;
				}
		
				else
				{
					break;
				}
			}
		}
		
		
		// Heap sort
		void HeapSort(int* a, int n)
		{
			// Build the heap, starting from the last non-leaf node
			for (int i = (n - 1 - 1) / 2; i >= 0; i--)
			{
				AdjustDown(a, n, i);
			}
		
			// Swap top with last, shrink the heap, then sift the new top down
			while (n)
			{
				Swap(&a[0], &a[n - 1]);
				n--;
				AdjustDown(a, n, 0);
			}
		}

Summary of features of heap sort:

  1. Heap sort uses a heap to select elements, which makes it much more efficient than direct selection sort.
  2. Time complexity: O(N*logN). The cost is dominated by restoring the heap after each swap: apart from the swapped element, the rest of the structure is still a heap, so finding the next largest/smallest element is a single sift-down costing O(logN), and this is done for N elements, giving O(N*logN) overall;
  3. Space complexity: O(1)
  4. Stability: Unstable

5. Bubble sort

The idea of bubble sort: compare adjacent elements pairwise and move the larger one backward, so that after one full pass the largest element sits at the end; the second pass then places the next largest element at the second-to-last position, and so on. With n elements there are at most n - 1 passes, each performing pairwise comparisons over the unsorted prefix, so the time complexity of bubble sort is O(N^2);

The animation of bubble sort is as follows:
(figure)
The reference code is as follows:

		// Bubble sort
		void BubbleSort(int* a, int n)
		{
			// One pass per iteration of i
			for (int i = 0; i < n; i++)
			{	
				// Pairwise comparisons within this pass
				// flag: if no swap happens in a pass, the array is already sorted, so exit early
				int flag = 1;
				for (int j = 1; j < n - i; j++)
				{
					if (a[j - 1] > a[j])
					{
						Swap(&a[j],&a[j - 1]);
		
						flag = 0;
		 			}
				}
		
				if (flag)
					break;
			}
		}

A small optimization here targets arrays that are already sorted, using the flag initialized to 1: if no swap is performed during a pass, the array is already in order, so no further swaps or comparisons are needed and we break out of the loop early;

Summary of features of bubble sort:

  1. Bubble sorting is a very easy-to-understand sorting, suitable for beginners to understand, and has teaching significance;
  2. Time complexity: O(N^2)
  3. Space complexity: O(1)
  4. Stability: stable

6. Quick sort

6.1 Recursive implementation of quick sort

The basic idea of quick sort: take any element of the sequence as the pivot (benchmark value) and partition the sequence into two subsequences, where every element of the left subsequence is less than the pivot and every element of the right subsequence is greater than the pivot; then repeat the process on the left and right subsequences until every element has reached its final position.

Put simply: pick a value key in the array, move the elements smaller than key to its left and the elements larger than key to its right; then take the subrange to the left of key, pick a new key for it, and repeat, and do the same on the right side. Once both sides of every key are ordered, the whole array is ordered. Of course, the choice of key needs care; it is analyzed case by case below;

First we need a way to choose each key and partition around it. Here are three approaches for reference:

Idea 1: the Hoare version

Let's first look at the animation for the Hoare version:

(figure)

The idea: define key as the leftmost element each time, then define two indices L and R. L searches for elements larger than key, R searches for elements smaller than key; when both are found, the elements at L and R are swapped. This idea gives the following code:

		// One partition pass of quicksort --- Hoare scheme
		int PartSort1(int* a, int left, int right)
		{
			int keyi = left;
			while (left < right)
			{
		
				while (left < right && a[right] >= a[keyi])
				{
					right--;
				}
		
				while (left < right && a[left] <= a[keyi])
				{
					left++;
				}
		
				Swap(&a[left], &a[right]);
			}
		
			Swap(&a[keyi], &a[left]);
		
			return left;
		}

A natural question: how do we guarantee the final swap is correct?

First, we define key as the leftmost element (the rightmost works too; it is a matter of choice). With key on the left, we want the element finally swapped with key to be smaller than key, because elements smaller than key belong on the left. So how do we guarantee that the position where L and R meet holds a value smaller than key?

It depends on which of L and R moves first. Suppose we let L move first, on the array {6, 1, 2, 7, 9, 3, 4, 5, 10, 8} from the animation above, as shown below:

(figure)

As the picture shows, L and R finally meet at 9, which is not the smaller-than-key value we want; whereas in the first animation R moves first, and the final result is what we need. What causes the difference?

The reason is simple. L looks for a value larger than key, and R looks for a value smaller than key. If L moves first and R second: after they find their targets and swap, the position where R stopped now holds a value larger than key. In the next round L again moves first; if L reaches R without finding another value larger than key, the meeting point is R's position, which holds a value larger than key (placed there by the previous swap). Swapping that with key puts a larger value at the front, which is not what we want;

Conversely, if R moves first and L second: after a round of swapping, L's position holds a value smaller than key and R's position holds a value larger than key. In the new round R again moves first; if R meets L without finding a value smaller than key, the meeting point is L's position, which holds a value smaller than key. Even if R does stop at a smaller value before reaching L, L then walks toward R and they meet at that smaller value. Either way, the meeting point holds a value smaller than key, so swapping it with key meets our expectation;

That is the Hoare version. Next, another idea;

Idea 2: the hole-digging method

As usual, look at the animation first:
(figure)
The idea is simple: treat the leftmost element as key and "dig out" its position, leaving a hole; then define two indices L and R, where L looks for elements larger than key and R looks for elements smaller than key. Because we dug out the leftmost element and want everything on the left to be smaller than key, R again moves first: it finds an element smaller than key, drops it into the hole, and its own position becomes the new hole; then L moves, finds an element larger than key, drops it into the hole, and leaves a new hole behind. Repeat until L and R meet; the meeting position is the hole, and we simply put key back into it. Reference code:

		// One partition pass of quicksort --- hole-digging scheme
		int PartSort2(int* a, int left, int right)
		{
			int key = a[left];
			int hole = left;
		
			while (left < right)
			{
				// From the right, find an element smaller than key
				while (left < right && a[right] >= key)
				{
					right--;
				}
		
				a[hole] = a[right];
				hole = right;
		
				// From the left, find an element larger than key
				while (left < right && a[left] <= key)
				{
					left++;
				}
		
				a[hole] = a[left];
				hole = left;
			}
		
			a[hole] = key;
			return hole;
		}

Idea 3: the front-and-back pointer method

There is one more idea, called the front-and-back pointer method. The animation first:
(figure)
As the figure shows, this idea is also easy to follow. Define two pointers, prev and cur, and again treat the leftmost element as key. cur searches for elements smaller than key and swaps each one with the position just after prev, so that the elements in [keyi + 1, prev] are all smaller than key and the elements in (prev, cur) are all no smaller than key. When cur runs off the end, prev must sit on an element smaller than key, so key and a[prev] are swapped at the end;

The reference code is as follows:

		// One partition pass of quicksort --- front-and-back pointer scheme
		int PartSort3(int* a, int left, int right)
		{
			int keyi = left, cur = left + 1, prev = left;
			while (cur <= right)
			{
		
				if (a[cur] < a[keyi] && ++prev != cur)
				{
					Swap(&a[prev], &a[cur]);
				}
		
				cur++;
			}
			Swap(&a[prev], &a[keyi]);
		
			keyi = prev;
			return keyi;
		}

Those are three ways to partition around the key; how do we now implement the full quick sort?

The partitioning resembles the pre-order traversal of a binary tree we studied earlier, with the key playing the role of the root node, so we can implement quick sort recursively;

		// Quicksort --- recursive implementation (note: right is exclusive here)
		void QuickSort(int* a, int left, int right)
		{
			if (left >= right)
				return;
		
			int keyi = PartSort3(a, left, right - 1);
			
		
			QuickSort(a, left, keyi);
			QuickSort(a, keyi + 1, right);
		}

As the code shows, taking the front-and-back pointer method as an example, we first obtain the key's index keyi, then recurse on the intervals to its left and right; the recursion stops when left >= right. Note that right is exclusive in this version, which is why PartSort3 receives right - 1 and the left recursion reuses keyi as its right bound.

With that, quick sort works, but it still has a flaw. The key is always taken from the leftmost position; if the leftmost value happens to be among the smallest elements every time (for instance, on an already sorted array), the partition degenerates, needless deep recursion occurs, and efficiency drops. To address this, we choose the key by median-of-three. Reference code:

		// Quicksort optimization: median-of-three
		int GetMidIndex(int* a, int left, int right)
		{
			int mid = (left + right) / 2;
		
			if (a[left] < a[mid])
			{
				if (a[mid] < a[right])
				{
					return mid;
				}
		
				if (a[left] < a[right])
				{
					return right;
				}
		
				else
				{
					return left;
				}
			}
		
			// a[left] >= a[mid]
			else
			{
				if (a[mid] > a[right])
				{
					return mid;
				}
		
				if (a[left] > a[right])
				{
					return right;
				}
		
				else
				{
					return left;
				}
			}
		}

We take mid as the index halfway between left and right, compare the three elements pairwise, and return the index of the middle-valued one, which greatly reduces the chance of a degenerate key;

So how do we use this function?
It's simple. Taking the front-and-back pointer method as an example, we only need to call it at the start of the partition function: pass left and right into GetMidIndex, obtain the index midi of the median element, and swap the elements at left and midi;

		// One partition pass of quicksort --- front-and-back pointer scheme
		int PartSort3(int* a, int left, int right)
		{
			int midi = GetMidIndex(a, left, right);
			Swap(&a[left], &a[midi]);
		
		
			int keyi = left, cur = left + 1, prev = left;
			while (cur <= right)
			{
		
				if (a[cur] < a[keyi] && ++prev != cur)
				{
					Swap(&a[prev], &a[cur]);
				}
		
				cur++;
			}
			Swap(&a[prev], &a[keyi]);
		
			keyi = prev;
			return keyi;
		}

The recursive quick sort above is fairly complete, but some special cases remain unsolved. For example, on inputs with many duplicate elements, median-of-three is likely to keep picking equal keys, and redundant recursion again hurts efficiency. The remedy is called three-way partitioning; interested readers can look it up.

Summary of quick sort features:

  1. Quick sort has good overall performance across a wide range of scenarios, hence the name quick sort.
  2. Time complexity: O(N*logN)
  3. Space complexity: O(logN) (recursion consumes the space of the stack frame)
  4. Stability: Unstable

6.2 Non-recursive implementation of quick sort

The basic idea of the non-recursive quick sort: use a stack to simulate the recursion. Strictly speaking it does not simulate recursion itself; rather, the stack reproduces the order in which the recursive calls would process the sub-intervals;

For example, with the array {6, 1, 2, 7, 9, 3, 4, 5, 10, 8}, suppose the leftmost element is taken as key each time, as in the figure below (drawn only up to the second computation of keyi):
(figure)
As shown above, obtaining keyi the second time repeats the operation at the start of the figure: the left and right bounds of each sub-interval keep being pushed onto the stack. By the stack's last-in-first-out property, the bounds pushed last are processed first; since we push keyi's left interval last, the stack processes the left interval of each keyi first and its right interval afterwards;

Second, to do this we need a stack. We directly reuse the stack implemented previously; for details see the link stack and queue.

The reference code is as follows:

		// Quicksort --- non-recursive
		void QuickSortNonR(int* a, int begin, int end)
		{
			ST st;
			STInit(&st);
			
		    // Push the initial interval's bounds (right first, so left is popped first)
			STPushTop(&st, end - 1);
			STPushTop(&st, begin);
		
			// Continue while the stack is not empty
			while (!STIsEmpty(&st))
			{
				// Read the top of the stack, then pop it
				int left = STTop(&st);
				STPopTop(&st);
		
				// Read the top of the stack, then pop it
				int right = STTop(&st);
				STPopTop(&st);
		
				// Partition once and obtain keyi
				int keyi = PartSort3(a, left, right);
		
				// Push a sub-interval only if it still holds more than one element
				if (keyi + 1 < right)
				{
					STPushTop(&st, right);
					STPushTop(&st, keyi + 1);
				}
		
				if (left < keyi - 1)
				{
					STPushTop(&st, keyi - 1);
					STPushTop(&st, left);
				}
			}
		
			STDestroy(&st);
		}

7. Merge sort

7.1 Recursive implementation of merge sort

Basic idea: merge sort is an efficient sorting algorithm built on the merge operation, and a very typical application of divide and conquer. Already-sorted subsequences are merged to obtain a fully sorted sequence: first make each subsequence ordered, then make whole segments of subsequences ordered. Merging two ordered lists into one ordered list is called a two-way merge.

Observe the animation below:

(figure)
For example, for the array {10, 6, 7, 1, 3, 9, 2, 4}, observe the more intuitive animation: (figure)
From the ideas above, merge sort resembles the post-order traversal of a binary tree: first the subsequences are put in order, and finally the two ordered subsequences are merged, so here too we can use recursion to implement a post-order-style traversal;

We first need a sub-function to divide and sort the sub-sequence:

		// Split an interval, recurse, then merge
		void PartOfMergeSort(int* a, int begin, int end, int* tmp)
		{
			if (begin == end)
				return;
		
			// Small-interval optimization: insertion-sort short ranges
			if (end - begin + 1 < 10)
			{
				InsertSort(a + begin, end - begin + 1);
				return;
			}
		
			int mid = (begin + end) / 2;
		
			// The split intervals are:
			// [begin, mid] and [mid + 1, end]
			PartOfMergeSort(a, begin, mid, tmp);
			PartOfMergeSort(a, mid + 1, end, tmp);
		
			// Merge the two sorted sub-intervals
			int begin1 = begin, end1 = mid;
			int begin2 = mid + 1, end2 = end;
			int pos = begin;
		
			while (begin1 <= end1 && begin2 <= end2)
			{
				if (a[begin1] <= a[begin2])
					tmp[pos++] = a[begin1++];
				
				else
					tmp[pos++] = a[begin2++];
			}
		
			while (begin1 <= end1)
				tmp[pos++] = a[begin1++];
			
		
			while (begin2 <= end2)
				tmp[pos++] = a[begin2++];
				
			// Copy the merged, sorted segment back into the original array
			memcpy(a + begin, tmp + begin, sizeof(int) * (end - begin + 1));
		}

In the function above, each call takes the middle index, splits the range, and recurses on the left and right halves; the recursion bottoms out when begin == end. Returning one level up, the two subsequences of that level are merged, and each merged subsequence is copied back into the original array before returning further. This continues until the top level, where the left and right subsequences are already sorted and one last merge completes the sort;

Note the small optimization added above: when an interval holds fewer than 10 elements we switch to direct insertion sort, because at that size continuing the recursion costs more stack space and time than it saves, and a direct insertion sort on a tiny range is a better replacement for those needless recursive calls;

		// Merge sort --- recursive
		void MergeSort(int* a, int n)
		{
			// A temporary buffer is needed for merging
			int* tmp = (int*)malloc(sizeof(int) * n);
			assert(tmp);
			PartOfMergeSort(a, 0, n - 1, tmp);
			free(tmp);
		}

7.2 Non-recursive implementation of merge sort

The basic idea of the non-recursive merge sort: control a value gap and treat every 2*gap elements as a pair of subsequences to merge. After all the pairs for the current gap are merged, set gap *= 2 and merge the next, larger subsequences, ending once gap reaches or exceeds the array length;

For example, for the array { 10, 6, 7, 1, 3, 9, 4, 2 }:
when gap == 1:
(figure)
when gap == 2:
(figure)
when gap == 4:
(figure)
As shown above, once gap == 4 the array is fully sorted; at that point we just copy it back into the original array. The reference code is as follows:

		// Merge sort --- non-recursive
		void MergeSortNonR(int* a, int n)
		{
			int* tmp = (int*)malloc(sizeof(int) * n);
			assert(tmp);
		
			int gap = 1;
			while (gap < n)
			{
				int pos = 0;
				for (int i = 0; i < n; i += 2 * gap)
				{
					// The bounds of the two intervals to merge
					int begin1 = i, end1 = i + gap - 1;
					int begin2 = i + gap, end2 = i + 2 * gap - 1;
		
					// Stop when either interval is exhausted
					while (begin1 <= end1 && begin2 <= end2)
					{
						if (a[begin1] <= a[begin2])
						{
							tmp[pos++] = a[begin1++];
						}
		
						else
						{
							tmp[pos++] = a[begin2++];
						}
					}
					
					// Drain whichever interval still has elements
					while (begin1 <= end1)
					{
						tmp[pos++] = a[begin1++];
					}
		
					while (begin2 <= end2)
					{
						tmp[pos++] = a[begin2++];
					}
		
				}
			
				// Double the gap
				gap *= 2;
			}
			free(tmp);
		}

Now we must face a question: if we add one or two more elements, does this still work? A picture makes it clear. With the array {10, 6, 7, 1, 3, 9, 4, 2, 0}, i.e. the array above with a 0 appended:
(figure)
As the picture shows, the problem already appears at gap == 1: end1, begin2, and end2 all run out of bounds;

One might suspect that only odd element counts fail, so take the array {10, 6, 7, 1, 3, 9, 4, 2, 0, 5}, the array above with one more element appended, giving 10 elements:
(figure)
With an even count it still goes out of bounds. So we must face the real problem: when computing the two merge intervals, the intervals near the boundary may exceed the array, and their ranges must be corrected. There are two correction options:

Option 1 : Since begin1 == i and i never crosses the boundary, begin1 is always valid, while end1, begin2, and end2 may each be out of range. We can correct them as follows:

			// correct the boundary values (option 1: merge one group, copy one group)
			if (end1 >= n || begin2 >= n)
			{
				break;
			}

			if (end2 >= n)
			{
				end2 = n - 1;
			}

Adding it to the function gives:

		// Merge sort --- non-recursive
		void MergeSortNonR(int* a, int n)
		{
			int* tmp = (int*)malloc(sizeof(int) * n);
			assert(tmp);
		
			int gap = 1;
			while (gap < n)
			{
				int pos = 0;
				for (int i = 0; i < n; i += 2 * gap)
				{
					int begin1 = i, end1 = i + gap - 1;
					int begin2 = i + gap, end2 = i + 2 * gap - 1;
		
					// correct the boundary values (option 1: merge one group, copy one group)
					if (end1 >= n || begin2 >= n)
					{
						break;
					}
		
					if (end2 >= n)
					{
						end2 = n - 1;
					}
		
					while (begin1 <= end1 && begin2 <= end2)
					{
						if (a[begin1] <= a[begin2])
						{
							tmp[pos++] = a[begin1++];
						}
						else
						{
							tmp[pos++] = a[begin2++];
						}
					}
		
					while (begin1 <= end1)
					{
						tmp[pos++] = a[begin1++];
					}
		
					while (begin2 <= end2)
					{
						tmp[pos++] = a[begin2++];
					}
		
					// merge one group, copy one group
					memcpy(a + i, tmp + i, sizeof(int) * (end2 - i + 1));
				}
				gap *= 2;
			}
		
			free(tmp);
		}

Note that Option 1 must copy each group back as soon as it is merged. When end1 or begin2 crosses the boundary it simply breaks out of the loop, leaving that tail interval untouched in the original array; this is safe because the tail was already sorted by an earlier pass;
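To convince ourselves that Option 1 really handles the awkward sizes, here is a compact, behaviorally equivalent restatement (`MergeSortNonR1` is a renamed sketch for testing, not the original function), which we can run on the 9-element array from the pictures:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Compact restatement of Option 1 (merge one group, copy one group).
static void MergeSortNonR1(int* a, int n)
{
	int* tmp = (int*)malloc(sizeof(int) * n);
	assert(tmp);
	for (int gap = 1; gap < n; gap *= 2)
	{
		for (int i = 0; i < n; i += 2 * gap)
		{
			int begin1 = i, end1 = i + gap - 1;
			int begin2 = i + gap, end2 = i + 2 * gap - 1;
			if (end1 >= n || begin2 >= n)
				break;            // second run missing: leave the tail in place
			if (end2 >= n)
				end2 = n - 1;     // trim the second run to the array end
			int pos = i;
			while (begin1 <= end1 && begin2 <= end2)
				tmp[pos++] = (a[begin1] <= a[begin2]) ? a[begin1++] : a[begin2++];
			while (begin1 <= end1) tmp[pos++] = a[begin1++];
			while (begin2 <= end2) tmp[pos++] = a[begin2++];
			memcpy(a + i, tmp + i, sizeof(int) * (end2 - i + 1));
		}
	}
	free(tmp);
}
```

Starting `pos` at `i` in each group is equivalent to the running counter in the version above, since the groups fill tmp contiguously.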

Option 2 : Correct every out-of-range boundary instead of breaking out. Added directly to the function:

		// Merge sort --- non-recursive
		void MergeSortNonR(int* a, int n)
		{
			int* tmp = (int*)malloc(sizeof(int) * n);
			assert(tmp);
		
			int gap = 1;
			while (gap < n)
			{
				int pos = 0;
				for (int i = 0; i < n; i += 2 * gap)
				{
					// the two intervals to be merged
					int begin1 = i, end1 = i + gap - 1;
					int begin2 = i + gap, end2 = i + 2 * gap - 1;
		
					// correct the boundary values (option 2: merge the whole gap pass, then copy)
					if (end1 >= n)
					{
						end1 = n - 1;
		
						// turn the second interval into an empty one
						begin2 = n;
						end2 = n - 1;
					}
					else if (begin2 >= n)
					{
						// turn the second interval into an empty one
						begin2 = n;
						end2 = n - 1;
					}
					else if (end2 >= n)
					{
						end2 = n - 1;
					}
		
					// stop as soon as either interval is exhausted
					while (begin1 <= end1 && begin2 <= end2)
					{
						if (a[begin1] <= a[begin2])
						{
							tmp[pos++] = a[begin1++];
						}
						else
						{
							tmp[pos++] = a[begin2++];
						}
					}
		
					// copy over whatever is left of either interval
					while (begin1 <= end1)
					{
						tmp[pos++] = a[begin1++];
					}
		
					while (begin2 <= end2)
					{
						tmp[pos++] = a[begin2++];
					}
				}
		
				// after the whole gap pass, copy everything back at once
				memcpy(a, tmp, sizeof(int) * n);
				gap *= 2;
			}
		
			free(tmp);
		}

The idea of Option 2 is to correct every out-of-range boundary: any interval that should not exist is simply made empty by setting begin2 > end2. Since every slot of tmp then gets filled, there is no need to merge a group and copy a group; the whole gap pass is merged first and then copied back to the original array in a single memcpy;

That concludes the analysis of merge sort. A summary of its characteristics:

  1. Merge sort's drawback is that it needs O(N) extra space; the real strength of the merging idea is that it also solves external sorting of data on disk.
  2. Time complexity: O(N*logN)
  3. Space complexity: O(N)
  4. Stability: stable

8. Counting sort

Counting sort is a non-comparison sort. It uses a second array, hash, to record how many times each value occurs in the array to be sorted; it then traverses hash once, writing each value that occurred back into the array in order and decrementing its count after every write until the count reaches 0. After the traversal the array is sorted;

This algorithm only needs to be understood, because it has two major limitations:
Flaw 1 : it depends on the data range, so it only suits arrays whose values fall in a narrow range;
Flaw 2 : it only works for integers;

So I won't analyze it in depth here; interested readers can explore it on their own. The reference code is as follows:

		// Counting sort
		void CountSort(int* a, int n)
		{
			// find the largest and smallest elements
			int max = a[0], min = a[0];
			for (int i = 0; i < n; i++)
			{
				if (a[i] > max)
				{
					max = a[i];
				}
		
				if (a[i] < min)
				{
					min = a[i];
				}
			}
		
			// compute the relative range spanned by min..max
			int range = max - min + 1;
		
			// allocate the count array, one slot per value in the range
			int* hash = (int*)malloc(sizeof(int) * range);
			assert(hash);
		
			// initialize the counts to 0
			memset(hash, 0, sizeof(int) * range);
		
			// count how many times each value occurs, stored at its relative position
			for (int i = 0; i < n; i++)
			{
				hash[a[i] - min]++;
			}
		
			// walk the range; a nonzero count means the value occurred, so write
			// it back into the array and decrement until the count reaches 0
			int pos = 0;
			for (int i = 0; i < range; i++)
			{
				while (hash[i] != 0)
				{
					a[pos++] = i + min;
					hash[i]--;
				}
			}
		
			free(hash);
		}
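One detail worth noticing is that the `a[i] - min` offset makes the code work for negative integers too, since only the relative range matters. A compact restatement (`CountSortCompact` is a renamed sketch for illustration; `calloc` stands in for the malloc+memset pair) demonstrates this:

```c
#include <assert.h>
#include <stdlib.h>

// Compact counting sort; the "a[i] - min" offset handles negative values.
static void CountSortCompact(int* a, int n)
{
	int max = a[0], min = a[0];
	for (int i = 1; i < n; i++)
	{
		if (a[i] > max) max = a[i];
		if (a[i] < min) min = a[i];
	}
	int range = max - min + 1;
	int* hash = (int*)calloc(range, sizeof(int)); // zero-initialized counts
	assert(hash);
	for (int i = 0; i < n; i++)
		hash[a[i] - min]++;                       // count by relative value
	int pos = 0;
	for (int i = 0; i < range; i++)
		while (hash[i]--)
			a[pos++] = i + min;                   // write back in sorted order
	free(hash);
}
```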

3. Complexity and stability of various sorting

First we need to understand a concept: what is stability?
Stability : Assume the sequence to be sorted contains multiple records with equal keys. If, after sorting, the relative order of those records is unchanged, that is, if r[i] = r[j] and r[i] precedes r[j] in the original sequence, and r[i] still precedes r[j] in the sorted sequence, then the sorting algorithm is called stable; otherwise it is called unstable.
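Stability is easiest to see with records that carry a tag along with the key. The sketch below (a hypothetical `Rec` struct, sorted with the direct insertion sort from section 1 adapted to records) shows that equal keys keep their original left-to-right order, precisely because the comparison uses a strict `>` and never moves equal keys past each other:

```c
#include <assert.h>

// A record: sort by key, and use id to check the original order of ties.
typedef struct { int key; int id; } Rec;

// Direct insertion sort on records; the strict '>' comparison means equal
// keys are never swapped past each other, which makes the sort stable.
static void InsertSortRec(Rec* a, int n)
{
	for (int i = 1; i < n; i++)
	{
		Rec tmp = a[i];
		int end = i - 1;
		while (end >= 0 && a[end].key > tmp.key)
		{
			a[end + 1] = a[end];
			end--;
		}
		a[end + 1] = tmp;
	}
}
```

Changing `>` to `>=` would still sort correctly but would destroy stability, which is exactly the kind of detail the table below summarizes per algorithm.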

So, after the analysis above, the time complexity, space complexity, and stability of the sorting algorithms are summarized as follows:
Insert image description here
The above is my sharing of common sorting ideas. If anything is incorrect or could be improved, please point it out, thank you!

Origin blog.csdn.net/YoungMLet/article/details/131710968