[What algorithms must programmers master?]

A programmer encounters many algorithms over a career, but a handful come up so often that they are all but guaranteed to appear and must be mastered. This article walks through those essential algorithms.

Introduction to Common Algorithms

The sorting algorithms introduced in this article all take ascending order as an example.



1. Direct insertion sort

Direct insertion sort builds the sorted sequence one element at a time: each new element is inserted into its proper position among the elements already sorted.


Implementation code:

void InsertSort(int* a, int n)
{
	for (int i = 0; i < n - 1; i++)
	{
		int end = i;
		//save the element to be inserted
		int tmp = a[end + 1];
		while (end >= 0)
		{
			if (a[end] > tmp)
			{
				//shift a[end] one position back
				a[end + 1] = a[end];
				//move end forward
				end--;
			}
			else
			{
				break;
			}
		}
		//whether tmp belongs somewhere in the middle or after end
		//(i.e. tmp is the largest so far), it is always written at end + 1,
		//so the write is merged here after the loop
		a[end + 1] = tmp;
	}
}

Insertion sort time complexity

The time complexity of direct insertion sort is O(N^2); the worst case is reverse order.

In that case the total number of moves is 1 + 2 + 3 + ... + (n-2) + (n-1) = n(n-1)/2,

so the worst-case time complexity is O(n^2).
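As a quick sanity check, the sort above can be exercised like this (the function body is repeated here, slightly condensed, so the snippet compiles on its own; the helper name is made up for this illustration):

```c
#include <assert.h>

/* Same algorithm as the InsertSort above, with the if/else
   condensed into the while condition. */
void InsertSort(int* a, int n)
{
    for (int i = 0; i < n - 1; i++)
    {
        int end = i;
        int tmp = a[end + 1];           // element to insert
        while (end >= 0 && a[end] > tmp)
        {
            a[end + 1] = a[end];        // shift larger elements back
            end--;
        }
        a[end + 1] = tmp;               // drop tmp into its slot
    }
}

/* Hypothetical helper: returns 1 if a[0..n-1] is in ascending order. */
int sorted_ascending(const int* a, int n)
{
    for (int i = 1; i < n; i++)
        if (a[i - 1] > a[i]) return 0;
    return 1;
}
```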

2. Hill sort (Shell sort)

Hill sort can be thought of as an optimized direct insertion sort.

The specific optimization process is as follows:

Given a gap, the data to be sorted is divided into gap groups, and the elements within each group are gap positions apart.

This is important enough to say three times: gap groups, each with spacing gap!

For example, let gap = 3: elements 3 positions apart belong to the same group. This differs from direct insertion sort, where the elements being compared are always adjacent (interval 1); with gap = 3, the first and second elements of a group are 3 positions apart.
When the elements of a group are compared pairwise, large elements move toward the back faster.

In the second round within a group, the next element of that group (gap positions further on) is compared against the elements already placed in the group and inserted in the same way.

Of course, these comparisons happen within each group; over the whole Hill sort, all the groups are advanced in the same pass.

It can be seen that the larger the gap, the faster large elements move toward the back, but the further the array is from sorted; the smaller the gap, the slower they move, but the closer the array gets to sorted.

When gap == 1, the procedure is exactly the direct insertion sort described above.

Going back to the case above: gap = 3, so the data is divided into 3 groups, each with an interval of 3, and together the 3 groups cover every element.

This is equivalent to moving the large elements of all gap groups toward the back at the same time.

We call this process pre-sorting: after it completes, the data is not fully sorted, but it is close to sorted.

In the case above, after pre-sorting the whole data set is close to sorted, so gap is set to 1 and a final direct insertion sort finishes the job.

Note: the value of gap is not fixed. The larger the gap, the faster large values move toward the back, but the less sorted the array becomes; the smaller the gap, the slower they move, but the closer the array gets to sorted.

In short, the gap must not stay constant: it must shrink gradually from round to round, and its final value must be 1.

The value of gap is usually chosen like this: initialize gap = n, and at the top of each loop round take gap = gap/3 + 1 or gap = gap/2, so the gap shrinks after every round.

With gap = gap/2, the time complexity is O(N*logN), where logN is the base-2 logarithm of N.

The worst case is again reverse order: the last round has gap = 1 and is a plain direct insertion sort. Since the gap is halved every round, N/2/2/.../2 = 1, so the number of rounds is logN (the base-2 logarithm of N).
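As a quick check of the claim that gap = gap/3 + 1 always ends at exactly 1, here is a tiny sketch (the helper name is made up for this illustration):

```c
#include <assert.h>

/* Count the rounds produced by gap = gap / 3 + 1, starting from n.
   Because 2/3 + 1 == 1, the sequence always lands exactly on 1. */
int gap_rounds_div3(int n)
{
    int rounds = 0;
    int gap = n;
    while (gap > 1)
    {
        gap = gap / 3 + 1;
        rounds++;
    }
    return rounds;
}
```

For n = 100 the sequence is 100 → 34 → 12 → 5 → 2 → 1, i.e. five rounds.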

Implementation code:

void ShellSort(int* a, int n)
{
	//the larger the gap, the faster large values reach the back, but the less sorted the array is
	//the smaller the gap, the slower they move, but the more sorted the array is

	//the closer gap gets to 1, the closer the array gets to sorted
	//when gap == 1 this is exactly direct insertion sort
	int gap = n;

	while (gap > 1)
	{
		//either update works; gap may shrink any way you like, as long as it ends at 1
		//gap = gap / 2;
		gap = gap / 3 + 1;

		//all gap groups are advanced in the same pass
		for (int i = 0; i < n - gap; i++)
		{
			int end = i;
			int tmp = a[end + gap];

			while (end >= 0)
			{
				//a[end] is larger: shift it back by gap
				if (a[end] > tmp)
				{
					a[end + gap] = a[end];
					end -= gap;
				}
				//smaller or equal: insert right after end
				else
				{
					break;
				}
			}
			//in every case tmp is written at end + gap, so do it after the loop
			a[end + gap] = tmp;
		}

	}
}

Summary: Hill sort adds a gap on top of direct insertion sort. The gap splits the data into gap groups whose elements are gap positions apart. The gap shrinks each round and must finally reach 1; the rounds with gap > 1 are the pre-sorting, and the final gap = 1 pass is a plain direct insertion sort that finishes the job.

Hill sort time complexity

The total time complexity is (the O(N) traversal of the elements per round) × (the O(logN) number of rounds)
—> O(N * logN)

Similarly, when the update is gap = gap/3 + 1, in the worst case (reverse order) the number of rounds satisfies

(((N/3+1)/3+1)/3+1)... = 1
For complexity purposes the +1 terms can be ignored, so the number of rounds is log3(N) (the base-3 logarithm of N),

and the total time complexity is O(N*log3(N)).

According to earlier empirical analysis, the average time complexity of Hill sort is about
O(N^1.3)



3. Selection sort

Direct selection sort finds the maximum and the minimum in each round of comparisons, swaps the maximum to the right end and the minimum to the left end, and thereby sorts in ascending order.


Let left be the index of the leftmost element of the unsorted range and right the index of the rightmost.

Each round selects the indices of max and min, swaps the max with position right and the min with position left, then does right--, left++; the next round then finds the second largest and second smallest, and so on.

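The selection sort code below relies on a Swap helper that the post never shows; a minimal version (an assumed sketch, matching how it is called) is:

```c
#include <assert.h>

/* Exchange two ints through pointers. */
void Swap(int* p, int* q)
{
    int t = *p;
    *p = *q;
    *q = t;
}
```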

Implementation code:

void SelectSort(int* a, int n)
{
	//left and right index
	int left = 0;
	int right = n - 1;
	
	//left < right is enough: a single remaining element needs no swap
	//(left <= right also works)
	while (left < right)
	{
		//assume both min and max are at left
		int mini = left, maxi = left;
		//one pass over [left, right] to find the indices of the minimum and maximum
		for (int i = left; i <= right; i++)
		{
			//smaller value found: update the index
			if (a[i] < a[mini])
			{
				mini = i;
			}
			//larger value found: update the index
			if (a[i] > a[maxi])
			{
				maxi = i;
			}
		}
		//swap mini with left, and maxi with right

		Swap(&a[left], &a[mini]);
		//special case: if the maximum was at left, the swap above just moved it to mini,
		//so the index of the maximum must be updated
		if (maxi == left)
		{
			maxi = mini;
		}
		Swap(&a[right], &a[maxi]);
		//no further fix-up is needed: even if mini had been at right, this round is over

		//shrink the range
		left++;
		right--;
	}

}

Direct selection sort time complexity

Each round traverses the remaining range to find the maximum and minimum: the first round visits N elements, the second N-2, the third N-4, and so on. The total is N + (N-2) + (N-4) + ... + 1, an arithmetic series.

So the total time complexity is O(N^2).

4. Heap sort

For the upward adjustment algorithm and downward adjustment algorithm, please refer to: Data Structure - Heap

Heap sort, as the name says, sorts by means of a heap, so to sort an arbitrary contiguous array you must first build a heap out of it.

Building the heap with the upward-adjustment method costs O(N*logN).


Building the heap with the downward-adjustment method costs only O(N).
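The post defers AdjustUp/AdjustDown to its separate heap article; for self-containedness, here is a minimal downward-adjust for a max-heap (an assumed sketch, matching the AdjustDown(a, n, i) signature used by HeapSort below; the Swap helper is repeated so the snippet compiles on its own):

```c
#include <assert.h>

static void Swap(int* p, int* q)
{
    int t = *p;
    *p = *q;
    *q = t;
}

/* Sift a[parent] down inside a max-heap of n elements. */
void AdjustDown(int* a, int n, int parent)
{
    int child = parent * 2 + 1;          // left child
    while (child < n)
    {
        // pick the larger of the two children
        if (child + 1 < n && a[child + 1] > a[child])
            child++;
        if (a[child] > a[parent])
        {
            Swap(&a[child], &a[parent]);
            parent = child;
            child = parent * 2 + 1;
        }
        else
        {
            break;
        }
    }
}
```

Calling it from the first non-leaf node down to the root turns any array into a max-heap.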

Heap sort:

Why heap sort swaps and then adjusts downward: after a max-heap is built, both subtrees of the root are max-heaps. Whether or not the last element is the smallest, swapping it with the top puts the maximum at the tail. The new top element is relatively small, so it is adjusted downward, which brings the second largest value to the top. The next round sorts only up to the element before the tail (the maximum is already in place): the last element of the shrunken heap is swapped with the top and the top is adjusted downward again, raising the third largest, and so on.

After the heap is built, sorting is just this swap-and-adjust process, repeated.

Implementation code:

void HeapSort(int* a, int n)
{
	assert(a);

	//1. Build the heap. Building with upward adjustment costs O(N*logN):
	//for (int i = 0; i < n; ++i)
	//{
	//	AdjustUp(a, i);
	//}

	//Building with downward adjustment costs only O(N) and is strongly recommended.
	//It starts from the first non-leaf node, because leaf nodes never need adjusting down.
	//parent = (child - 1) / 2, and the last child has index n - 1,
	//so the first non-leaf is (n - 1 - 1) / 2.
	for (int i = (n - 1 - 1) / 2; i >= 0; --i)
	{
		AdjustDown(a, n, i);
	}
	//the array is now a max-heap

	//2. Sort: swap the last node with the top node, then adjust the top downward;
	//then swap the second-to-last node with the new top, and so on.
	for (int i = n - 1; i > 0; --i)
	{
		Swap(&a[0], &a[i]);
		//pass i here, not n: after the swap the last element no longer
		//participates, which is equivalent to size--
		//each swap parks the current maximum at the tail, and the following
		//downward adjustment raises the next-largest value to the top
		AdjustDown(a, i, 0);
	}
	//Summary: build a max-heap to sort ascending, a min-heap to sort descending.

	//print the sorted result
	for (int i = 0; i < n; i++)
	{
		printf("%d ", a[i]);
	}
}

Heap sort time complexity

Building the heap is O(N).
The sorting phase visits N elements, and each downward adjustment costs O(logN),
so the total is O(N + N*logN).
To sum up:

the time complexity of heap sort is O(N*logN).

5. Bubble sort

Bubble sort: compare adjacent elements pairwise and swap them when they are in the wrong order; elements already in order need no swap. The name comes from the way smaller elements slowly "float" to the front of the array through repeated swaps. As written here it sorts in ascending order.


Each pass of bubble sort puts one number into its final place. Picture large numbers sinking to the bottom of the water while the small ones bubble up to the surface.

Bubble sort implementation code:

void bubble_sort(int* arr, int sz)
{
	int i = 0;
	for (i = 0; i < sz - 1; i++)
	{
		//sz-1 is the number of passes
		int j = 0;
		
		for (j = 0; j < sz - 1 - i; j++)
		{
			//sz-1-i is the number of comparisons in this pass
			if (arr[j] > arr[j + 1])//ascending order
			{
				int tmp = arr[j];
				arr[j] = arr[j + 1];
				arr[j + 1] = tmp;
			}
		}
	}
}

Observation: when only a few elements are out of order, the data may already be fully sorted after an early pass, and the remaining comparisons are wasted. After a pass that performs no swaps, the data is in order and the sort can stop.

So bubble sort can also be optimized:

void bubble_sort(int arr[], int sz)
{
	int i = 0;
	for (i = 0; i < sz; i++)
	{
		int j = 0;
		int flag = 1;//assume this pass finds the data already sorted

		for (j = 0; j < sz - 1 - i; j++)
		{
			if (arr[j] > arr[j + 1])
			{
				int tmp = arr[j];
				arr[j] = arr[j + 1];
				arr[j + 1] = tmp;
				flag = 0;//a swap happened, so the data was not yet sorted
			}
		}
		if (flag == 1)
		{
			break;
		}
	}
}

Bubble sort complexity

1. Time complexity: O(N^2)

2. Space complexity: O(1)

3. Stability: Stable
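A small sketch showing that the flag optimization really does exit after a single pass on already-sorted input (the pass counter and function name are added purely for this demonstration):

```c
#include <assert.h>

/* The optimized bubble sort above, instrumented with a pass counter. */
int bubble_sort_count_passes(int arr[], int sz)
{
    int passes = 0;
    for (int i = 0; i < sz; i++)
    {
        int flag = 1; // assume already sorted
        for (int j = 0; j < sz - 1 - i; j++)
        {
            if (arr[j] > arr[j + 1])
            {
                int tmp = arr[j];
                arr[j] = arr[j + 1];
                arr[j + 1] = tmp;
                flag = 0; // a swap happened
            }
        }
        passes++;
        if (flag == 1)
            break;
    }
    return passes;
}
```

On sorted input the first pass makes no swaps and the function returns after one pass; on reversed input it needs the full quadratic work.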

6. Quick sort

(The following is the recursive method)

Quick sort is a divide-and-conquer sort with a binary-tree-like structure.
Idea: take any value in the range as the reference value (key) and, by some scheme, move data smaller than the key to its left and data larger than the key to its right. The key's final position is then a split point: everything to its left is smaller, everything to its right is larger. Using the key as the split point, the same operation is applied to its left and right subintervals.

1. Hoare method (not recommended)

The Hoare scheme is the one proposed by Hoare, the inventor of quick sort. It is somewhat hard to understand and has many details to get right, hence not recommended.

The idea:
keep two indices, left and right, for the two ends of the range. Pick a value as the key (usually the leftmost or rightmost value); call its index keyi. If the key is taken from the left, the right pointer moves first; if from the right, the left pointer moves first.

Assume the key is on the left. The right pointer moves first, looking for a value smaller than the key; once found, the left pointer moves, looking for a larger value; then the two are swapped, and the process repeats.

Steps 1 and 2: the right pointer finds a value smaller than the key, the left pointer finds a larger one, and the two are swapped; this repeats until left and right meet, and the key is swapped into the meeting position.
Step 3: recurse, first on the interval left of keyi, then on the interval right of keyi, repeating the same operation.

However, the Hoare scheme has several defects.
Defect 1: when the data is already sorted or reversed, always picking the leftmost (or rightmost) value as the key leaves keyi at the leftmost (or rightmost) position after every pass. The recursion depth then becomes n, which may overflow the stack.

Choosing the reference value key (applies to every quick sort variant)

Each time we pick a key we therefore want a value as close to the median as possible, so that the left and right subintervals stay reasonably balanced during recursion.

1. Random method

There are two ways to get such a key:
1. Random key method: pick a key at random.

	// pick a random key
	int randi = left + (rand() % (right - left));
	// after picking, move the key to the left position
	Swap(&a[left], &a[randi]);

2. Median-of-three (recommended)

The median-of-three method takes the three values indexed by left, right, and mid and uses the middle one as the key.

For example:
mid = (left+right)>>1;
a[left] = 6, a[right] = 8, a[mid] = 3, so the value taken is 6. This guarantees the chosen key is a middle value, so even when the data is already sorted the recursion never becomes deep enough to overflow the stack.

//median-of-three key selection:
//of the left, right, and middle values, return the index of the middle one
int GetMidNumi(int *a, int left, int right)
{
	int mid = (left + right) / 2;
	//equivalently, shift right by one bit to divide by 2
	//(shifting left by one multiplies by 2, by two multiplies by 4, and so on)
	//int mid = (left + right) >> 1;
	if (a[left] < a[right])
	{
		if (a[left] > a[mid])
		{
			return left;
		}
		else if (a[right] < a[mid])
		{
			return right;
		}
		else
		{
			return mid;
		}
	}
	else
	{
		if (a[right] > a[mid])
		{
			return right;
		}
		else if (a[mid] > a[left])
		{
			return left;
		}
		else
		{
			return mid;
		}
	}
}

Defect 2:

As mentioned earlier, with the key on the left the right pointer moves first and looks for a value smaller than the key; once it finds one, the left pointer looks for a larger value:

while (left < right)
	{
		while (a[right] > a[keyi])
			--right;

		while (a[left] < a[keyi])
			++left;

		Swap(&a[left], &a[right]);
	}

With the data 6 1 2 6 9 3 4 6 10 8, the right pointer stops on a 6 and the left pointer also stops on a 6. Swapping two equal 6s changes nothing, and the loop repeats forever.

The solution is to add an equal sign, so the pointers skip past values equal to the key:

while (left < right)
	{
		while (a[right] >= a[keyi])
			--right;

		while (a[left] <= a[keyi])
			++left;

		Swap(&a[left], &a[right]);
	}

Defect 3:

If the data is 1 2 3 4 5 and 1 is chosen as the key, the right pointer looks for a smaller value, keeps decrementing, runs past left, and goes out of range.

Putting it all together, the loop needs the additional guard left < right:

while (left < right)
	{
		while (left < right && a[right] >= a[keyi])
			--right;

		while (left < right && a[left] <= a[keyi])
			++left;

		Swap(&a[left], &a[right]);
	}

Small-interval optimization (applies to every variant)

Before looking at this optimization, consider the following: with a large amount of data, say 10 million elements, a recursive sort inevitably recurses deeply, and efficiency suffers.

Ideally, with N elements the minimum recursion depth is logN, and the last level holds the most calls: about N/2 of them. With 1,000,000 elements the last level accounts for roughly 500,000 calls, the level above it 250,000, and the one above that 125,000. Cutting off the last three levels of recursion both improves efficiency and reduces the stack space the recursion consumes.

Therefore, once a subinterval holds about 10 elements or fewer there is no need to keep quick-sorting it; direct insertion sort can take over. This is why the code below switches to insertion sort when a subinterval has at most 10 elements.

Hoare implementation code

void QuickSort1(SortDataType* a, int left, int right)
{
    
    
	//递归结束条件
	if (left >= right)
		return;

	int keyi = PartSort1(a, left, right);

	//递归下去
	// [left, keyi-1] keyi [keyi+1, right] 

	//小区间优化,如果数据个数小于10个,用直接插入排序

	if (keyi - left + 1 <= 10)
	{
    
    
		InsertSort(a + left, keyi - left + 1);
	}
	else
	{
    
    
		QuickSort1(a, left, keyi - 1);
	}
	if (right - (keyi + 1) + 1 <= 10)
	{
    
    
		InsertSort(a + keyi + 1, right - (keyi + 1) + 1);
	}
	else
	{
    
    
		QuickSort1(a, keyi + 1, right);
	}

}

// Hoare
int PartSort1(SortDataType* a, int left, int right)
{
	//random key selection:
	//int randi = left + (rand() % (right - left));
	//after picking, move the key to the left position
	//Swap(&a[left], &a[randi]);

	// median-of-three
	int midi = GetMidNumi(a, left, right);
	//move the key value to the left position
	if (midi != left)
		Swap(&a[midi], &a[left]);
	//why random / median-of-three selection exists: on sorted or fully reversed data,
	//always taking a[left] as the key makes quick sort O(N^2);
	//these selection schemes are the standard optimization

	//one partition round
	//iron rule: if the key is on the left, the right side moves first (and vice versa);
	//this guarantees the position where L and R meet holds a value not larger than the key
	//reason: R moves first looking for a small value; when found, L looks for a large one
	//and they swap, so wherever L and R meet must be <= key

	int keyi = left;
	while (left < right)
	{
		//ascending order: the right side looks for a smaller value
		//the guard left < right is required, otherwise data like 1 2 3 4 5
		//makes right-- run out of range

		//the equal sign is required, otherwise the loop can spin forever,
		//e.g. 5 1 2 5 8 9 5 6 8: both sides stop on values equal to the key,
		//and swapping two equal values changes nothing
		while (left < right && a[right] >= a[keyi])
			--right;

		//the left side looks for a larger value
		//the guard left < right is again required (e.g. 5 4 3 2 1 would make left++ overrun)

		while (left < right && a[left] <= a[keyi])
			++left;

		//swap, so that values smaller than the key end up left and larger ones right
		Swap(&a[left], &a[right]);
	}
	//the loop exits with left == right; swap the key with that position
	//(keyi with left, or equally keyi with right)

	Swap(&a[keyi], &a[left]);

	// [begin, keyi-1] keyi [keyi+1, end]

	//one partition round is done; return the key's index
	//note: after the swap the key now lives at position left/right,
	//so update keyi before returning
	keyi = left;
	return keyi;

}

2. Digging method (recommended)

Digging (pit) method: as the name suggests, a process of digging a pit and filling it.

Idea:

First select a key (by median-of-three) and save it; the left position becomes a pit. (The value at that index still physically exists in the array; "filling the pit" really means overwriting it.) As in Hoare's scheme, with the key on the left the right pointer moves first: it finds a value smaller than the key and drops it into the pit on the left, which opens a new pit at its own position. This repeats until left and right meet.

Result: everything left of the key is smaller than the key, everything right of it is larger. Then recurse on the key's left and right subintervals.

Implementation code:

//the tricky part of the pit method: key is only a temporary variable while hole is the
//pit's index, and the variable and the index are easy to confuse
//while the right side searches for a smaller value it can run past the array,
//so the guard left < right is needed (likewise for the left side)
//pit (digging) method
void QuickSort2(SortDataType* a, int left, int right)
{
	//recursion base case
	if (left >= right)
		return;

	int begin = left, end = right;
	// median-of-three
	int midi = GetMidNumi(a, left, right);
	//move the key value to the left position
	if (midi != left)
		Swap(&a[midi], &a[left]);

	//key is only a temporary variable
	
	int key = a[left];
	
	int hole = left; // the pit
	
	while (left < right)
	{
		// the right side looks for a smaller value
		while (left < right && a[right] >= key)
			right--;

		//found one: fill the pit, and the pit moves here
		a[hole] = a[right];
		hole = right;
		
		// the left side looks for a larger value
		while (left < right && a[left] <= key)
			left++;

		//found one: fill the pit, and the pit moves here
		a[hole] = a[left];
		hole = left;

	}

	//drop the key into the final pit
	a[hole] = key;

	if (hole - 1 - begin + 1 <= 10)
	{
		InsertSort(a + begin, hole - 1 - begin + 1);
	}
	else
	{
		QuickSort2(a, begin, hole - 1);
	}
	if (end - (hole + 1) + 1 <= 10)
	{
		InsertSort(a + hole + 1, end - (hole + 1) + 1);
	}
	else
	{
		QuickSort2(a, hole + 1, end);
	}
	
}

3. Front-and-back pointer method (recommended)

The front-and-back pointer method is arguably the cleanest implementation, with far fewer details to get right.

Idea: keep two indices, prev and cur ("pointer method" is just a name for intuition; no actual pointers are required). prev starts at the left position and cur = prev + 1.
The key is again chosen by median-of-three and moved to the left position.

Then cur walks forward, looking for values smaller than the key:
1. If a[cur] < key: first ++prev, then swap the values at prev and cur, then ++cur.
2. Otherwise: just ++cur.

This loops until cur moves past right.

Finally, swap the key at keyi with the value at prev. (Important!)

The result is that values smaller than the key end up on the left and larger ones on the right.

prev and cur act like a wheel, continuously rolling smaller-than-key values to the left and larger ones to the right.

Implementation code

void QuickSort3(SortDataType* a, int left, int right)
{
	//recursion base case
	if (left >= right)
		return;

	int begin = left, end = right;

	//median-of-three key selection
	int midi = GetMidNumi(a, left, right);
	if (midi != left)
		Swap(&a[midi], &a[left]);

	int keyi = left;

	int prev = left;
	int cur = prev + 1;

	while (cur <= right)
	{
		//compact form
		if (a[cur] < a[keyi] && ++prev != cur)
			Swap(&a[prev], &a[cur]);
		
		++cur;
		
		//the longer form below is easier to follow:
		//if (a[cur] < a[keyi])
		//{
		//	++prev;
		//	//swapping an element with itself is pointless
		//	if (cur != prev)
		//		Swap(&a[prev], &a[cur]);
		//	++cur;

		//}
		//else
		//{
		//	++cur;
		//}

	}

	//never write Swap(&a[prev], &key):
	//key would be a temporary variable, and swapping with it
	//would not touch the array at all
	
	Swap(&a[prev], &a[keyi]);

	keyi = prev;

	if (keyi - 1 - begin + 1 <= 10)
	{
		InsertSort(a + begin, keyi - 1 - begin + 1);
	}
	else
	{
		QuickSort3(a, begin, keyi - 1);
	}

	if (end - (keyi + 1) + 1 <= 10)
	{
		InsertSort(a + keyi + 1, end - (keyi + 1) + 1);
	}
	else
	{
		QuickSort3(a, keyi + 1, end);
	}

}

Non-recursive quick sort

Idea: in the recursive version, every left/right subinterval needs its own stack frame, so the non-recursive version simulates that recursion with an explicit stack.

Build a stack, then push the left and right indices. Since a stack is last-in-first-out, push right first, then left.
(Alternatively, a struct holding both left and right indices could be pushed; that variant is worth trying.)

After popping the two indices from the top of the stack, run one partition round using any of the three schemes above. The round produces the intervals:

[left, keyi-1] keyi [keyi+1, right]

As in the recursion, the left subinterval is processed first, so push the right subrange first and the left subrange second
(last-in-first-out again).

This continuous pushing and popping of intervals reproduces the recursion of quick sort.

Stack code:
void StackInit(ST* ps)//initialize
{
	assert(ps != NULL);
	ps->a = NULL;
	ps->top = ps->capacity = 0;
	//ps->top could also start at -1; then increment first and assign second,
	//and top would point directly at the top element
}

void StackDestroy(ST* ps)
{
	assert(ps);
	free(ps->a);
	ps->a = NULL;
	ps->top = ps->capacity = 0;
}

void CheckCapacity(ST** ps)//grow the array when full
{
	assert(ps != NULL);
	if ((*ps)->top == (*ps)->capacity)
	{
		int newcapacity = (*ps)->capacity == 0 ? 4 : (*ps)->capacity * 2;
		//the allocated space holds STDataType elements, not the struct itself;
		//when the first argument is NULL, realloc behaves like malloc, so passing NULL is fine
		STDataType* tmp = (STDataType*)realloc((*ps)->a, sizeof(STDataType) * newcapacity);
		assert(tmp != NULL);
		(*ps)->a = tmp;//hand the new address back to ps->a
		(*ps)->capacity = newcapacity;
	}
}

void StackPush(ST* ps, STDataType x)//push an element
{
	assert(ps);
	CheckCapacity(&ps);//&ps matches CheckCapacity's ST** parameter
	ps->a[ps->top] = x; //assign first, then ++: top starts at 0, one past the top element
	ps->top++;
}

void StackPop(ST* ps)//remove the top element
{
	assert(ps);
	assert(!StackEmpty(ps));

	ps->top--;
}

STDataType StackTop(ST* ps)//read the top element
{
	assert(ps);
	assert(!StackEmpty(ps)); //the ! inverts the emptiness check

	return ps->a[ps->top - 1];
}

int StackSize(ST* ps)//number of elements on the stack
{
	assert(ps);
	assert(!StackEmpty(ps));
	return ps->top;
}

bool StackEmpty(ST* ps)//is the stack empty?
{
	assert(ps);
	return ps->top == 0;
}


//Non-recursive quick sort: simulate the recursion with an explicit stack.
//Idea: find a keyi, split into left and right subintervals, and push both,
//right subinterval first, then left.

int PartSort3(SortDataType* a, int left, int right)
{
	//median-of-three key selection
	int midi = GetMidNumi(a, left, right);
	if (midi != left)
		Swap(&a[midi], &a[left]);

	int keyi = left;

	int prev = left;
	int cur = prev + 1;

	//1. if a[cur] < key: first ++prev, then Swap(prev, cur), then ++cur
	//2. if a[cur] >= key: just ++cur
	while (cur <= right)
	{
		//compact form (see QuickSort3 for the longer, easier-to-read version)
		if (a[cur] < a[keyi] && ++prev != cur)
			Swap(&a[prev], &a[cur]);

		++cur;
	}

	//never swap with a temporary key variable: that would leave the array unchanged
	Swap(&a[prev], &a[keyi]);
	//update keyi's index
	keyi = prev;

	return keyi;
}

void QuickSortNonR(SortDataType* a, int left, int right)
{
	ST st;
	StackInit(&st);

	//push right first, then left
	StackPush(&st, right);
	StackPush(&st, left);

	while (!StackEmpty(&st))
	{
		//pop left first, then right
		int begin = StackTop(&st);
		StackPop(&st);
		int end = StackTop(&st);
		StackPop(&st);
		
		//partition the interval; PartSort3 is the front-and-back pointer scheme
		int keyi = PartSort3(a, begin, end);
	
		//an interval with at most one element needs no push
		if (keyi + 1 < end)
		{
			StackPush(&st, end);
			StackPush(&st, keyi + 1);
		}
		if (begin < keyi - 1)
		{
			StackPush(&st, keyi - 1);
			StackPush(&st, begin);
		}
	}
	
	//when the stack is empty, the sort is done
	StackDestroy(&st);
}

Quick sort complexity

Each level of the recursion traverses about N elements. With balanced splits the recursion depth is logN; in the worst case it is N, giving a worst-case time complexity of O(N^2).

With the key-selection optimizations above, the recursion depth stays around logN, so the expected time complexity of quick sort is O(N*logN).

Space complexity: O(logN) to O(N) (O(N) in the worst case).

Stability: unstable.


7. Merge sort

Merge sort divides a range into subproblems, and those into smaller subproblems, a divide-and-conquer process. When a subproblem is down to a single element, merging can begin: two sorted subranges are compared and merged, and after each merge the merged result is copied back over the original data.

Recursive implementation of merge sort

The recursive implementation keeps splitting the large array into smaller arrays until each piece holds a single element, then merges single elements into sorted pairs, pairs into sorted runs of 4, and so on, until the whole array is merged.
Step 1: use the left and right indices to find the middle index of the array and split the data into two groups at that boundary.

Step 2: repeat step 1, fully splitting the left group before the right group, in the spirit of a binary tree's preorder traversal.

Step 3: keep dividing until a piece holds only one element, i.e. until left >= right.

Step 4: merge pairwise, then four at a time. Note: after each merge, the data in tmp must be copied back to the original array.

Last step: merge the two subintervals into the full interval, again copying tmp back to the original array afterwards.

Implementation code:


void _MergeSort(SortDataType* a, int left, int right, SortDataType* tmp)
{
	if (left >= right)
	{
		return;
	}

	int mid = (left + right) >> 1; // shifting right by 1 is the same as /2

	int begin1 = left, end1 = mid;
	int begin2 = mid + 1, end2 = right;

	int index = left; // tmp's index; it cannot start at 0, because not every merge starts at 0
	_MergeSort(a, begin1, end1, tmp);
	_MergeSort(a, begin2, end2, tmp);


	while (begin1 <= end1 && begin2 <= end2)
	{
		if (a[begin1] <= a[begin2])
		{
			tmp[index++] = a[begin1++];
		}
		else
		{
			tmp[index++] = a[begin2++];
		}
	}

	//either side may run out first, so flush both

	while (begin1 <= end1)
	{
		tmp[index++] = a[begin1++];
	}

	while (begin2 <= end2)
	{
		tmp[index++] = a[begin2++];
	}

	//copy back after every merge
	//(equivalent to: for (int i = left; i <= right; ++i) a[i] = tmp[i];)
	// memcpy arguments: destination, source, size
	memcpy(a + left, tmp + left, sizeof(SortDataType) * (right - left + 1));
}

void MergeSort(SortDataType* a, int n)
{
	SortDataType* tmp = (SortDataType*)malloc(sizeof(SortDataType) * n);
	assert(tmp != NULL);

	_MergeSort(a, 0, n - 1, tmp);

	free(tmp);
}

Non-recursive implementation of merge sort

The recursive implementation of merge sort splits a big problem into small ones, working top-down.

The non-recursive version merges small problems into big ones, working bottom-up.

Taking the numbers above as an example,
the general idea is as follows:
insert image description here

Non-recursive difficulty 1:

The first problem is:
how to go from comparing single elements to comparing pairs.

The answer is a variable called gap.
gap is the number of elements in each group during a merge pass.
Initially gap = 1, so the first pass compares single elements one against one. After each pass, gap doubles, so the next pass merges pairs, and so on.

Non-recursive difficulty 2:

The second hurdle is understanding how to choose begin1 and end1, begin2 and end2!

insert image description here
First, i jumps by 2×gap each iteration, because one merge consumes two groups of gap elements each,
and gap is exactly the number of elements per group during that pass,
so the next merge must start 2×gap positions later.

Next, begin1 and end1: begin1 = i is easy to see;
end1 = i + gap - 1 works like this: the first group holds the gap elements starting at begin1, and since i + gap counts elements, subtracting 1 turns the count into the last subscript.
begin2 = i + gap is also easy to see: it is the element right after end1;
end2 = i + 2×gap - 1: starting from position i, the two groups together span 2×gap elements, and -1 again gives the subscript of the last element to be compared.

Non-recursive difficulty 3:

Difficulty 3 is how to handle the boundaries.

First, how to copy the merged numbers back to the original array. There are two options:
1. One-time bulk copy, nicknamed here the all-in copy (not recommended)
2. Copy back after every merge (recommended)

1. The all-in copy: after a whole pass of merging, copy everything back to the original array at once. Simple and crude.
insert image description here

2. Copy after every merge: as soon as two groups have been merged into one ordered run, copy that run back to the original array.
insert image description here

There are three out-of-bounds situations at the boundary:

The first: end1 is out of bounds. In the situation below, when merging groups of four, begin1 lands at the last position, so end1 is out of bounds from the start:
insert image description here
There are two ways to handle this, and which one applies depends on how the merged data is copied back to the original array.

With the all-in copy, every out-of-bounds case must be corrected.

For the case where end1 is out of bounds under the all-in copy, end1 must be corrected to
end1 = n - 1, and begin2 and end2 must be corrected to a non-existent interval, for example:
begin2 = n, end2 = n - 1. This keeps the begin2/end2 interval from entering the merge loop and prevents data from being copied out of bounds,
as follows:
insert image description here

Of course, this correction of begin2 and end2 is not the only one; any non-existent interval will do.

The second case: begin2 is out of bounds

begin2 can go out of bounds in the following situation:

insert image description here

This case is handled the same way as the first: under the all-in copy, begin2 and end2 must be corrected to a non-existent interval so that nothing is copied from them,
for example: begin2 = n, end2 = n - 1,
as follows:
insert image description here

The third case: end2 is out of bounds

insert image description here
Here end2 only needs to be clamped to position n - 1,
as follows:
insert image description here

Note: begin1 can never go out of bounds. It can never, never go out of bounds, because the loop condition i < n guarantees begin1 = i is valid; if begin1 were out of range, end1, begin2, and end2 would all be out of range too, and there would be nothing left to merge at all!

Implementation code

The all-in copy version:

void MergeSortNonR(SortDataType* a, int n)
{
	SortDataType* tmp = (SortDataType*)malloc(sizeof(SortDataType) * n);
	assert(tmp);

	// gap is the number of elements in each group during a merge pass
	int gap = 1;

	while (gap < n)
	{
		for (int i = 0; i < n; i += 2 * gap)
		{
			// key point: i advances by 2*gap, skipping past the pair of groups just merged
			int begin1 = i, end1 = i + gap - 1;
			int begin2 = i + gap, end2 = i + 2 * gap - 1;
			int index = i;

			// all-in copy corrections (not recommended)
			if (end1 >= n)
			{
				end1 = n - 1;
				begin2 = n;     // non-existent interval: the second run contributes nothing
				end2 = n - 1;
			}
			else if (begin2 >= n)
			{
				begin2 = n;     // non-existent interval again
				end2 = n - 1;
			}
			else if (end2 >= n)
			{
				end2 = n - 1;   // clamp to the last valid subscript
			}

			while (begin1 <= end1 && begin2 <= end2)
			{
				if (a[begin1] <= a[begin2])
				{
					tmp[index++] = a[begin1++];
				}
				else
				{
					tmp[index++] = a[begin2++];
				}
			}

			// either run may finish first, so both leftover loops are needed
			while (begin1 <= end1)
			{
				tmp[index++] = a[begin1++];
			}

			while (begin2 <= end2)
			{
				tmp[index++] = a[begin2++];
			}
		}

		// method 1 (not recommended): all-in copy, one bulk copy per pass
		memcpy(a, tmp, sizeof(SortDataType) * n);

		gap *= 2;
	}

	free(tmp);
	tmp = NULL;
}

Second, if the data is copied back to the original array after every merge, the handling is different.

With copy-after-every-merge:

1. end1 is out of bounds

insert image description here
Because each merge is copied back immediately, all the red data before this point has already been copied from the tmp temporary array back to the original array. The element 3 never needs to pass through tmp at all; it can simply stay where it is.
So the fix is simply to break.

2. begin2 is out of bounds
insert image description here
Same as the end1 case: everything merged so far has already been copied back from tmp, and the remaining data can stay in place without touching tmp.
So again, just break.

3. end2 is out of bounds

insert image description here
Here end2 must be clamped to position n - 1 so that the begin1 and begin2 runs remain comparable.
So the fix is: end2 = n - 1.

The non-recursive version that copies back after each merge:

void MergeSortNonR(SortDataType* a, int n)
{
	SortDataType* tmp = (SortDataType*)malloc(sizeof(SortDataType) * n);
	assert(tmp);

	// gap is the number of elements in each group during a merge pass
	int gap = 1;

	while (gap < n)
	{
		for (int i = 0; i < n; i += 2 * gap)
		{
			// key point: i advances by 2*gap, skipping past the pair of groups just merged
			int begin1 = i, end1 = i + gap - 1;
			int begin2 = i + gap, end2 = i + 2 * gap - 1;
			int index = i;

			// method 2: three out-of-bounds cases, but the first two share one fix

			// if end1 or begin2 is out of bounds, there is nothing to merge
			if (end1 >= n || begin2 >= n)
			{
				break;
			}

			// if end2 is out of bounds, clamp it to position n-1
			if (end2 >= n)
			{
				end2 = n - 1;
			}

			while (begin1 <= end1 && begin2 <= end2)
			{
				if (a[begin1] <= a[begin2])
				{
					tmp[index++] = a[begin1++];
				}
				else
				{
					tmp[index++] = a[begin2++];
				}
			}

			// either run may finish first, so both leftover loops are needed
			while (begin1 <= end1)
			{
				tmp[index++] = a[begin1++];
			}

			while (begin2 <= end2)
			{
				tmp[index++] = a[begin2++];
			}

			// method 2 (recommended): merge a little, copy a little.
			// Unlike the all-in copy, which must push everything through tmp and
			// then copy it all back even when nothing moved, this version simply
			// skips the copy when end1 or begin2 is out of bounds.
			// memcpy(destination, source, size)
			memcpy(a + i, tmp + i, sizeof(SortDataType) * (end2 - i + 1));
		}

		gap *= 2;
	}

	free(tmp);
	tmp = NULL;
}

Note that the two versions place the copy-back call differently: the all-in version copies once per pass, after the for loop, while the per-merge version copies inside the for loop, once per merge!

Merge Sort Complexity

Merge sort is stable: for two or more equal values, their relative order is the same before and after the sort.

Merge sort is also not sensitive to the initial order of the data.

The time complexity of merge sort is O(N·logN). Every level of merging traverses all N elements, and because this is a two-way merge, the "height" of the division of n elements is logN; one full traversal per level gives O(N·logN) in total.

Space complexity: O(N), because a temporary array must be allocated to hold the merged values.

8. Counting sort

Counting sort sorts by counting the number of occurrences of each value.

Two mapping schemes come up here: absolute mapping and relative mapping. Absolute mapping first:
absolute mapping means: traverse the data to find its maximum value, open an array of max + 1 slots, and use it to count each value.

insert image description here
Once the occurrence counts are known, traversing the Count array in order yields the data in sorted order.

But there is a problem. If the data is:

1 0 0 1 1 0 2 99999

then the array we open has max + 1 = 100,000 entries!
That is a staggering amount of space for eight values. There is a way around this:

relative mapping.

With relative mapping, we traverse the data to find both max and min, and then only need to open a space of max - min + 1 entries!

Suppose we need to sort data like this:

10 20 11 14 15 11 17 19 13

Step 1: traverse to find max and min; here max = 20 and min = 10,
so we only need a space of max - min + 1 = 11 entries.
That space covers every value between the minimum and the maximum.

Relative mapping processing method:

insert image description here

Therefore, for data concentrated in a narrow range, counting sort is extremely efficient, often faster even than quicksort.

Implementation code:

void CountSort(SortDataType* a, int n)
{
	int min = a[0];
	int max = a[0];

	// find max and min
	for (int i = 1; i < n; ++i)
	{
		if (min > a[i])
		{
			min = a[i];
		}
		if (max < a[i])
		{
			max = a[i];
		}
	}

	// must be max - min + 1: if max = 10 and min = 0, max - min = 10,
	// but the closed interval [min, max] holds 11 distinct values
	int range = max - min + 1;

	// calloc(num, size) zero-initializes the memory
	SortDataType* Count = (SortDataType*)calloc(range, sizeof(SortDataType));
	if (Count == NULL)
	{
		perror("calloc fail");
		exit(-1);
	}

	// count occurrences: value a[i] is counted at the relative index a[i] - min
	for (int i = 0; i < n; ++i)
	{
		Count[a[i] - min]++;
	}

	// write back to the original array in sorted order
	int j = 0;
	for (int i = 0; i < range; i++)
	{
		while (Count[i]--)
		{
			a[j++] = i + min;
		}
	}

	free(Count);
	Count = NULL;
}


Counting sort also has drawbacks: it is only a good fit for data concentrated in a narrow range, and it can only sort integers; it cannot sort floating-point numbers or types such as structures.

Counting sort complexity

Time complexity: one O(N) pass finds max and min,
another O(N) pass does the counting,
and writing the counts back walks both the count array and the N elements,
so the time complexity is O(N + range), which is effectively O(N) when the data is concentrated.

max - min + 1 slots are allocated, so
the space complexity is O(max - min + 1).

Summary of the eight sorts:

Stability means: after a set of data is sorted by some algorithm, the relative order of equal elements is unchanged. For example, given the data 2 1 5 9 3 5, the sort is stable if the relative order of the two 5s does not change.
insert image description here
To judge whether a sorting algorithm is stable, recall the idea behind the algorithm and how it actually moves data while sorting.

Bubble sort: stable.
Reason: adjacent elements are compared and then swapped; equal elements are simply never swapped with each other, so stability is preserved.

Direct selection sort: unstable.
Reason: with the data 2 2 1 3 1, the first round selects the smallest and largest values and swaps them with the elements at the left and right subscripts; that swap changes the relative order of the first 2 and the second 2.

Insertion sort: stable.
Reason: an element equal to the one being compared is inserted directly after it, so the original order is preserved.

Hill sort: unstable.
Reason: during pre-sorting, equal values may be split into different groups, so their relative order can change once pre-sorting completes.

Heap sort: unstable.
Reason: building the heap is already unstable, and even if equal values kept their order during heap building, the sorting phase breaks it: in a max-heap, after the root is swapped with the last element, the new, relatively small element at the top must be sifted down, and that adjustment can change the relative order of equal values.

Merge sort: stable.
Reason: when two equal elements meet during a merge, the comparison a[begin1] <= a[begin2] takes the element from the left run first, so equal elements keep their order.

Quicksort: unstable.
Reason: if a set of data is distributed like this:
insert image description here
then after one partition pass, the final step swaps the key into its proper place, which can change the relative order of equal elements.

Three: Summary of Key Algorithms

By learning and mastering the main methods of algorithm design, you gain the ability to analyze the time and space complexity of algorithms correctly, to choose appropriate data structures, and to design clear, correct, and effective algorithms for concrete application problems, laying a solid theoretical foundation for independent algorithm design and complexity analysis.

Origin blog.csdn.net/w2915w/article/details/131774806