Sorting Algorithms: Merge Sort (Recursive and Non-Recursive)

Hello again, everyone! In this installment I'll explain merge sort, one of the classic sorting algorithms. If you find it helpful, please show your support with a like, a bookmark, and a follow. Let's get started!

C language column: C Language: From Beginner to Proficient

Data Structure Column: Data Structure

Personal homepage: stackY、

Table of contents

1. Merge sort

1.1 Recursive version

Code demo:

1.2 Non-recursive version 

Code demo:

Test code:

Corrected code 1:

Test code:

Corrected code 2:

1.3 Optimization of the recursive version

Code demo:

2. Merge sorting characteristics


1. Merge sort

Basic idea:
Merge sort (MERGE-SORT) is an efficient sorting algorithm built on the merge operation, and a classic application of divide and conquer. It produces a fully sorted sequence by merging subsequences that are already sorted: first make each subsequence sorted, then merge the sorted subsequences together. Merging two sorted lists into one sorted list is called a two-way merge. The core steps of merge sort are: split the array in half repeatedly until each interval holds a single element, then merge adjacent sorted intervals pairwise until the whole array is one sorted interval.
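
To make the two-way merge concrete, here is a minimal sketch that merges two sorted arrays into a third. The function name MergeTwo and its parameters are hypothetical and chosen for illustration. The recursive and non-recursive versions below inline this same loop.

```c
/* Hypothetical helper: two-way merge of sorted x[0..nx-1] and sorted
   y[0..ny-1] into out[0..nx+ny-1]. out must have room for nx + ny ints. */
void MergeTwo(const int* x, int nx, const int* y, int ny, int* out)
{
    int i = 0, j = 0, k = 0;

    /* Repeatedly take the smaller front element; <= prefers x on ties. */
    while (i < nx && j < ny)
        out[k++] = (x[i] <= y[j]) ? x[i++] : y[j++];

    /* One input is exhausted: copy whatever remains of the other. */
    while (i < nx)
        out[k++] = x[i++];
    while (j < ny)
        out[k++] = y[j++];
}
```

For example, merging {1, 4, 7} with {2, 3, 9, 10} yields {1, 2, 3, 4, 7, 9, 10}.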


Merge sort also comes in recursive and non-recursive versions. Let's go through them step by step:

1.1 Recursive version

As the figure above shows, merge sort works in two phases: decomposition and merging. In code, we first split the array into small intervals. We then repeatedly take the smaller front element of two adjacent intervals and tail-insert it into a separate array. After each merge, the merged data is copied from that array back into the original array. Once the last merge is done, the whole array is sorted.

This shows that merge sort needs an extra array the same size as the input, so its space complexity is O(N). Merge sort also splits before it sorts, which mirrors post-order traversal of a binary tree. To sort the whole array, we split it in two, make the left and right halves sorted, and then merge them. To make each half sorted, we split it in two again, and so on, until an interval holds only one element and cannot be split further. On the way back up, each pair of sorted intervals is merged by repeatedly tail-inserting the smaller front element into the separate array, and the merged result is copied back into the original array.
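
The post-order shape is easiest to see by recording the order in which merges happen. TraceMerges below is a hypothetical helper that splits intervals the same way as the recursive code. Instead of sorting, it writes the [begin, end] range of each merge into a caller-provided array.

```c
/* Hypothetical helper: records, in execution order, the [begin, end] range
   of every merge a top-down merge sort performs on indices [begin, end].
   ranges must have room for at least (end - begin) entries.
   Returns the updated number of recorded merges. */
int TraceMerges(int begin, int end, int ranges[][2], int count)
{
    if (begin >= end)              /* 0 or 1 element: nothing to merge */
        return count;

    int mid = begin + (end - begin) / 2;
    count = TraceMerges(begin, mid, ranges, count);    /* left half first */
    count = TraceMerges(mid + 1, end, ranges, count);  /* then right half */

    ranges[count][0] = begin;      /* this node is handled last: post-order */
    ranges[count][1] = end;
    return count + 1;
}
```

For [0, 3] the recorded order is [0, 1], [2, 3], [0, 3]. Both halves are finished before their parent is merged, just like post-order traversal. For N elements there are always N - 1 merges, because each merge turns two intervals into one.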

There are several issues to pay attention to when recursing:

1. Save the starting and ending positions of the two intervals before merging, so they are easy to access.

2. Once one interval is exhausted, append the remaining elements of the other interval in order.

3. After both intervals have been tail-inserted into tmp, copy the data in tmp back to the original array.

4. When copying back, copy only the range that was just merged: it starts at a + begin (and tmp + begin), not at the start of the array.

Code demo:

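#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void _MergerSort(int* a, int begin, int end, int* tmp)
{
	//Base case: an interval with 0 or 1 element is already sorted
	if (begin >= end)
		return;

	//Split the interval in half (this form cannot overflow int)
	int mid = begin + (end - begin) / 2;
	//[begin,mid] [mid+1,end]

	//Sort the left and right halves first (post-order)
	_MergerSort(a, begin, mid, tmp);
	_MergerSort(a, mid + 1, end, tmp);

	//Save the bounds of the two intervals
	int begin1 = begin, end1 = mid;
	int begin2 = mid + 1, end2 = end;
	int i = begin;

	//Tail-insert the smaller front element into tmp;
	//<= takes from the left interval on ties, which keeps the sort stable
	while (begin1 <= end1 && begin2 <= end2)
	{
		if (a[begin1] <= a[begin2])
		{
			tmp[i++] = a[begin1++];
		}
		else
		{
			tmp[i++] = a[begin2++];
		}
	}

	//One interval is exhausted: append the rest of the other one
	while (begin1 <= end1)
	{
		tmp[i++] = a[begin1++];
	}
	while (begin2 <= end2)
	{
		tmp[i++] = a[begin2++];
	}

	//Copy the merged range back; it starts at a + begin, not at a
	memcpy(a + begin, tmp + begin, sizeof(int) * (end - begin + 1));
}

//Merge sort
void MergerSort(int* a, int n)
{
	//0 or 1 element: nothing to sort
	if (n <= 1)
		return;

	//Allocate one auxiliary array and reuse it for every merge
	int* tmp = (int*)malloc(sizeof(int) * n);
	if (tmp == NULL)
	{
		perror("malloc");
		return;
	}

	_MergerSort(a, 0, n - 1, tmp);

	//Free the auxiliary array
	free(tmp);
}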

1.2 Non-recursive version 

Recursive code can run into stack overflow when the recursion gets too deep on large inputs. For merge sort the depth is only about log2(N), because every split halves the interval, but it is still worth knowing the non-recursive version. It can be written with plain loops, although it has quite a few pitfalls, so let's work through them carefully.

Start by treating each single element as a group. Merge adjacent groups in pairs and copy the result back to the original array; that completes the first pass. Next, treat every 2 elements as a group, merge adjacent groups in pairs, and copy the result back; that completes the second pass. Then use groups of 4, and so on, doubling the group size each pass, until the group size is greater than or equal to the array length and no further merging is needed.
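
For the 8-element test array {10, 6, 7, 1, 3, 9, 4, 2} used below, the passes go: gap 1 gives 6 10 1 7 3 9 2 4, gap 2 gives 1 6 7 10 2 3 4 9, and gap 4 gives 1 2 3 4 6 7 9 10. Below is a minimal sketch of a single pass, MergePass (a hypothetical name). Unlike the article's code, it uses half-open bounds clamped to n, and it writes into a separate destination array instead of copying back with memcpy, so the caller can alternate between the two arrays.

```c
/* Hypothetical helper: one bottom-up pass. Merges each pair of adjacent
   runs of length gap from src into dst (both of length n). Bounds are
   half-open and clamped to n, so a short or missing last run is handled. */
void MergePass(const int* src, int* dst, int n, int gap)
{
    for (int i = 0; i < n; i += 2 * gap)
    {
        int end1 = (i + gap < n) ? i + gap : n;          /* run 1: [i, end1) */
        int end2 = (i + 2 * gap < n) ? i + 2 * gap : n;  /* run 2: [end1, end2) */
        int p = i, q = end1, k = i;

        /* Take the smaller front element; <= keeps equal keys in order. */
        while (p < end1 && q < end2)
            dst[k++] = (src[p] <= src[q]) ? src[p++] : src[q++];
        while (p < end1)
            dst[k++] = src[p++];
        while (q < end2)
            dst[k++] = src[q++];
    }
}
```

Alternating src and dst each pass avoids the copy-back, at the cost of possibly ending in the wrong array, in which case one final copy is needed.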

Concretely, we allocate a temporary array and set a group size gap, starting at 1. For each pair of adjacent groups, we record the two intervals, compare their front elements, and tail-insert the smaller one into the temporary array, then append whatever remains. After one full pass, we copy the temporary array back to the original array, double gap, and repeat until gap is greater than or equal to the array length.

Code demo:

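#include <stdio.h>
#include <stdlib.h>
#include <string.h>

//Merge sort
//Non-recursive, first attempt
//Note: only correct when n is a power of 2 (see the discussion below)
void MergerSortNonR(int* a, int n)
{
	if (n <= 1)
		return;

	//Allocate the auxiliary array
	int* tmp = (int*)malloc(sizeof(int) * n);
	if (tmp == NULL)
	{
		perror("malloc");
		exit(EXIT_FAILURE);
	}
	//Group size: number of elements in each group
	int gap = 1;

	while (gap < n)
	{
		int j = 0;
		for (int i = 0; i < n; i += 2 * gap)
		{
			//Save the bounds of the two groups
			int begin1 = i, end1 = i + gap - 1;
			int begin2 = i + gap, end2 = i + 2 * gap - 1;


			//Tail-insert the smaller front element into tmp;
			//<= takes from the left group on ties (stable)
			while (begin1 <= end1 && begin2 <= end2)
			{
				if (a[begin1] <= a[begin2])
				{
					tmp[j++] = a[begin1++];
				}
				else
				{
					tmp[j++] = a[begin2++];
				}
			}

			//One group is exhausted: append the rest of the other one
			while (begin1 <= end1)
			{
				tmp[j++] = a[begin1++];
			}
			while (begin2 <= end2)
			{
				tmp[j++] = a[begin2++];
			}
		}

		//Copy the whole pass back into the original array
		memcpy(a, tmp, sizeof(int) * n);
		//Double the group size
		gap *= 2;
	}

	//Free the auxiliary array
	free(tmp);
}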
Test code:
void PrintArry(int* a, int n)
{
	for (int i = 0; i < n; i++)
	{
		printf("%d ", a[i]);
	}
	printf("\n");
}

void TestMergerSortNonR()
{
	int a[] = { 10,6,7,1,3,9,4,2 };
	PrintArry(a, sizeof(a) / sizeof(int));
	MergerSortNonR(a, sizeof(a) / sizeof(int));
	PrintArry(a, sizeof(a) / sizeof(int));
}


int main()
{
	TestMergerSortNonR();
	return 0;
}

For these 8 elements the output is 1 2 3 4 6 7 9 10, so the sort works. Let's test a few more data sets.

The test above uses 8 elements. What happens with 9 or 10?

void PrintArry(int* a, int n)
{
	for (int i = 0; i < n; i++)
	{
		printf("%d ", a[i]);
	}
	printf("\n");
}

void TestMergerSortNonR()
{
	int a[] = { 10,6,7,1,3,9,4,2,8,5 };
	PrintArry(a, sizeof(a) / sizeof(int));
	MergerSortNonR(a, sizeof(a) / sizeof(int));
	PrintArry(a, sizeof(a) / sizeof(int));
}


int main()
{
	TestMergerSortNonR();
	return 0;
}

This time the output is wrong, and the program may even crash. Why? Let's take a look:

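Because gap doubles after every pass, the interval bounds it produces can run past the end of the array. With 10 elements, gap grows to 8 on the last pass, giving the intervals [0, 7] and [8, 15], while the valid indices are only 0 to 9. Out-of-bounds intervals already appear in earlier passes too: with gap = 2 the last pair is [8, 9] and [10, 11], and with gap = 4 it is [8, 11] and [12, 15]. Reading a[10] and beyond is undefined behavior, which is where the garbage values come from.

Printing the interval bounds of every pass makes this out-of-bounds behavior easy to see.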
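
Here is a sketch of such a printout. PrintMergeIntervals is a hypothetical helper that walks the same bounds as the first MergerSortNonR without touching any data. It prints each pair and counts three kinds of overflow: end1 past the end, begin2 past the end (with end1 in range), and only end2 past the end.

```c
#include <stdio.h>

/* Hypothetical diagnostic helper: prints the interval pairs the first
   MergerSortNonR would visit for an array of length n, and counts:
   counts[0]: end1 >= n
   counts[1]: end1 < n but begin2 >= n
   counts[2]: only end2 >= n */
void PrintMergeIntervals(int n, int counts[3])
{
    counts[0] = counts[1] = counts[2] = 0;
    for (int gap = 1; gap < n; gap *= 2)
    {
        printf("gap=%d:", gap);
        for (int i = 0; i < n; i += 2 * gap)
        {
            int begin1 = i, end1 = i + gap - 1;
            int begin2 = i + gap, end2 = i + 2 * gap - 1;
            printf(" [%d,%d][%d,%d]", begin1, end1, begin2, end2);

            if (end1 >= n)
                counts[0]++;
            else if (begin2 >= n)
                counts[1]++;
            else if (end2 >= n)
                counts[2]++;
        }
        printf("\n");
    }
}
```

For n = 10 there is exactly one overflow of each kind. For n = 8 there are none, which is why the 8-element test passed.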

So how to solve this problem?

1. Don't merge a whole pass and then copy the entire array back. Instead, merge one pair of groups and copy that pair back immediately.

2. If end1 or begin2 is out of bounds (the first and second kinds of overflow), the second group does not exist. Break out of the loop: the remaining elements are already in place in the original array.

3. If only end2 is out of bounds (the third kind), clamp it to n - 1 and merge as usual.

Corrected code 1:

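#include <stdio.h>
#include <stdlib.h>
#include <string.h>

//Merge sort
//Non-recursive
void MergerSortNonR(int* a, int n)
{
	if (n <= 1)
		return;

	//Allocate the auxiliary array
	int* tmp = (int*)malloc(sizeof(int) * n);
	if (tmp == NULL)
	{
		perror("malloc");
		exit(EXIT_FAILURE);
	}
	//Group size: number of elements in each group
	int gap = 1;

	while (gap < n)
	{
		for (int i = 0; i < n; i += 2 * gap)
		{
			//Save the bounds of the two groups
			int begin1 = i, end1 = i + gap - 1;
			int begin2 = i + gap, end2 = i + 2 * gap - 1;
			//tmp is filled at the same positions the groups occupy in a
			int j = i;

			//end1 or begin2 out of bounds: the second group does not exist,
			//and the remaining elements are already in place, so stop
			if (end1 >= n || begin2 >= n)
			{
				break;
			}

			//Only end2 out of bounds: clamp it to the last element
			if (end2 >= n)
			{
				end2 = n - 1;
			}

			//Tail-insert the smaller front element into tmp (<= keeps it stable)
			while (begin1 <= end1 && begin2 <= end2)
			{
				if (a[begin1] <= a[begin2])
				{
					tmp[j++] = a[begin1++];
				}
				else
				{
					tmp[j++] = a[begin2++];
				}
			}

			//One group is exhausted: append the rest of the other one
			while (begin1 <= end1)
			{
				tmp[j++] = a[begin1++];
			}
			while (begin2 <= end2)
			{
				tmp[j++] = a[begin2++];
			}

			//Merge one pair of groups, copy that pair back
			memcpy(a + i, tmp + i, sizeof(int) * (end2 - i + 1));
		}
		gap *= 2;
	}

	//Free the auxiliary array
	free(tmp);
}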
Test code:
void PrintArry(int* a, int n)
{
	for (int i = 0; i < n; i++)
	{
		printf("%d ", a[i]);
	}
	printf("\n");
}

void TestMergerSortNonR()
{
	int a[] = { 10,6,7,1,3,9,4,2,8,5 };
	PrintArry(a, sizeof(a) / sizeof(int));
	MergerSortNonR(a, sizeof(a) / sizeof(int));
	PrintArry(a, sizeof(a) / sizeof(int));
	printf("\n");

	int a2[] = { 10,6,7,1,3,9,4,2 };
	PrintArry(a2, sizeof(a2) / sizeof(int));
	MergerSortNonR(a2, sizeof(a2) / sizeof(int));
	PrintArry(a2, sizeof(a2) / sizeof(int));
	printf("\n");

	int a3[] = { 10,6,7,1,3,9,4,2,8 };
	PrintArry(a3, sizeof(a3) / sizeof(int));
	MergerSortNonR(a3, sizeof(a3) / sizeof(int));
	PrintArry(a3, sizeof(a3) / sizeof(int));
	printf("\n");

}


int main()
{
	TestMergerSortNonR();
	return 0;
}

With this correction, all three test arrays (10, 8, and 9 elements) are sorted correctly and no index outside the array is accessed. The key change is to merge one pair of groups and copy that pair back right away.

Alternatively, we can correct every out-of-bounds interval and keep copying the whole array back once per pass.

Corrected code 2:

This approach keeps the original merge logic and the whole-array copy, and only adjusts the intervals. Any interval that does not exist is turned into an empty interval such as [n, n - 1], so the merge loops simply skip it. When end1 itself is out of bounds, end1 is clamped to n - 1 so that the tail of the first group is still written to tmp before the whole-array copy:

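#include <stdio.h>
#include <stdlib.h>
#include <string.h>

//Merge sort
//Non-recursive
//Corrected code 2:
void MergerSortNonR(int* a, int n)
{
	if (n <= 1)
		return;

	//Allocate the auxiliary array
	int* tmp = (int*)malloc(sizeof(int) * n);
	if (tmp == NULL)
	{
		perror("malloc");
		exit(EXIT_FAILURE);
	}
	//Group size: number of elements in each group
	int gap = 1;

	while (gap < n)
	{
		int j = 0;
		for (int i = 0; i < n; i += 2 * gap)
		{
			//Save the bounds of the two groups
			int begin1 = i, end1 = i + gap - 1;
			int begin2 = i + gap, end2 = i + 2 * gap - 1;

			//Turn out-of-bounds intervals into valid or empty ones
			if (end1 >= n)
			{
				//Keep the tail of the first group
				end1 = n - 1;

				//Empty interval: the second group does not exist
				begin2 = n;
				end2 = n - 1;
			}
			else if (begin2 >= n)
			{
				//Empty interval
				begin2 = n;
				end2 = n - 1;
			}
			else if (end2 >= n)
			{
				end2 = n - 1;
			}

			//Tail-insert the smaller front element into tmp (<= keeps it stable)
			while (begin1 <= end1 && begin2 <= end2)
			{
				if (a[begin1] <= a[begin2])
				{
					tmp[j++] = a[begin1++];
				}
				else
				{
					tmp[j++] = a[begin2++];
				}
			}

			//One group is exhausted: append the rest of the other one
			while (begin1 <= end1)
			{
				tmp[j++] = a[begin1++];
			}
			while (begin2 <= end2)
			{
				tmp[j++] = a[begin2++];
			}
		}

		//Copy the whole pass back (tmp now holds exactly n elements)
		memcpy(a, tmp, sizeof(int) * n);
		gap *= 2;
	}

	//Free the auxiliary array
	free(tmp);
}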
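
The interval fixes in corrected code 2 can be pulled into a small helper so they can be checked on their own. ClampIntervals and IntervalLength are hypothetical names. The three branches are the same as in the code above.

```c
/* Hypothetical helper: applies the clamping rules of corrected code 2
   to one pair of intervals [*begin1, *end1] and [*begin2, *end2]. */
void ClampIntervals(int n, int* begin1, int* end1, int* begin2, int* end2)
{
    (void)begin1;              /* begin1 = i < n always, so it never needs fixing */
    if (*end1 >= n)
    {
        *end1 = n - 1;         /* keep the tail of the first group */
        *begin2 = n;           /* [n, n-1] is empty: merge loops skip it */
        *end2 = n - 1;
    }
    else if (*begin2 >= n)
    {
        *begin2 = n;           /* empty second group */
        *end2 = n - 1;
    }
    else if (*end2 >= n)
    {
        *end2 = n - 1;         /* shorten the second group */
    }
}

/* Number of elements in the inclusive interval [begin, end]; 0 if empty. */
int IntervalLength(int begin, int end)
{
    return (end >= begin) ? end - begin + 1 : 0;
}
```

With n = 10 and gap = 4, the pair [0, 3] [4, 7] covers 8 elements, and the clamped pair [8, 9] [10, 9] covers the remaining 2. So every pass writes exactly n elements to tmp, which is what makes the whole-array memcpy at the end of each pass safe.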


1.3 Optimization of the recursive version

The recursive version can be optimized further, which pays off most when the amount of data is large. Once the recursion reaches a small interval, we sort it with insertion sort instead of recursing all the way down to single elements. The bottom levels of the recursion tree contain most of the calls, so this removes a large share of the recursion overhead. Here the optimization is applied only to the recursive version. Note that insertion sort must be applied to the interval the recursion has currently reached: it starts at a + begin and sorts end - begin + 1 elements.

Code demo:

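#include <stdio.h>
#include <stdlib.h>
#include <string.h>

//InsertSort is not defined in this article; it is assumed to be the
//insertion sort from earlier in this series, sorting a[0..n-1]
void InsertSort(int* a, int n);

void _MergerSort(int* a, int begin, int end, int* tmp)
{
	//Base case: an interval with 0 or 1 element is already sorted
	if (begin >= end)
		return;

	//Small-interval optimization
	//For short intervals, use insertion sort to save recursive calls
	if (end - begin + 1 < 10)
	{
		//Sort the current interval, which starts at a + begin
		InsertSort(a + begin, end - begin + 1);
		return;
	}

	//Split the interval in half (this form cannot overflow int)
	int mid = begin + (end - begin) / 2;
	//[begin,mid] [mid+1,end]

	//Sort the left and right halves first (post-order)
	_MergerSort(a, begin, mid, tmp);
	_MergerSort(a, mid + 1, end, tmp);

	//Save the bounds of the two intervals
	int begin1 = begin, end1 = mid;
	int begin2 = mid + 1, end2 = end;
	int i = begin;

	//Tail-insert the smaller front element into tmp (<= keeps it stable)
	while (begin1 <= end1 && begin2 <= end2)
	{
		if (a[begin1] <= a[begin2])
		{
			tmp[i++] = a[begin1++];
		}
		else
		{
			tmp[i++] = a[begin2++];
		}
	}

	//One interval is exhausted: append the rest of the other one
	while (begin1 <= end1)
	{
		tmp[i++] = a[begin1++];
	}
	while (begin2 <= end2)
	{
		tmp[i++] = a[begin2++];
	}

	//Copy the merged range back; it starts at a + begin, not at a
	memcpy(a + begin, tmp + begin, sizeof(int) * (end - begin + 1));
}

//Merge sort
void MergerSort(int* a, int n)
{
	//0 or 1 element: nothing to sort
	if (n <= 1)
		return;

	//Allocate one auxiliary array and reuse it for every merge
	int* tmp = (int*)malloc(sizeof(int) * n);
	if (tmp == NULL)
	{
		perror("malloc");
		return;
	}

	_MergerSort(a, 0, n - 1, tmp);

	//Free the auxiliary array
	free(tmp);
}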
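
The optimized code calls InsertSort, which this article does not define; it presumably comes from an earlier post in the series. Below is a minimal sketch that assumes the signature void InsertSort(int* a, int n), which is what the call InsertSort(a + begin, end - begin + 1) implies.

```c
/* Assumed signature: sorts a[0..n-1] in ascending order with insertion sort.
   Called as InsertSort(a + begin, end - begin + 1) to sort a sub-range. */
void InsertSort(int* a, int n)
{
    for (int i = 1; i < n; i++)
    {
        int key = a[i];
        int j = i - 1;

        /* Shift larger elements one slot right; strict > leaves equal
           elements in their original order, so the sort stays stable. */
        while (j >= 0 && a[j] > key)
        {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}
```

Because this insertion sort is stable, switching to it for small intervals does not break the stability of the overall merge sort.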

2. Merge sorting characteristics

1. The main drawback of merge sort is its O(N) extra space. Its ideas are mostly used to solve external sorting, where the data lives on disk and is too large to fit in memory.
2. Time complexity: O(N*logN)
3. Space complexity: O(N)
4. Stability: Stable (provided ties are taken from the left interval, i.e. the comparison uses <=)
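
Stability means that elements with equal keys keep their original relative order. The sketch below demonstrates it on records that carry their original position. Item, MergeItems, and StableMergeSortItems are hypothetical names, and the merge logic mirrors _MergerSort with a <= comparison on the key.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: key is sorted on, id is the original position. */
typedef struct
{
    int key;
    int id;
} Item;

static void MergeItems(Item* a, int begin, int end, Item* tmp)
{
    if (begin >= end)
        return;
    int mid = begin + (end - begin) / 2;
    MergeItems(a, begin, mid, tmp);
    MergeItems(a, mid + 1, end, tmp);

    int i = begin, p = begin, q = mid + 1;
    /* <= takes the left item on equal keys, preserving original order. */
    while (p <= mid && q <= end)
        tmp[i++] = (a[p].key <= a[q].key) ? a[p++] : a[q++];
    while (p <= mid)
        tmp[i++] = a[p++];
    while (q <= end)
        tmp[i++] = a[q++];
    memcpy(a + begin, tmp + begin, sizeof(Item) * (end - begin + 1));
}

/* Returns 0 on success, -1 if the auxiliary array cannot be allocated. */
int StableMergeSortItems(Item* a, int n)
{
    if (n <= 1)
        return 0;
    Item* tmp = malloc(sizeof(Item) * n);
    if (tmp == NULL)
        return -1;
    MergeItems(a, 0, n - 1, tmp);
    free(tmp);
    return 0;
}
```

Sorting {3,0}, {1,1}, {3,2}, {1,3}, {2,4} (key, id) gives ids 1, 3, 4, 0, 2, so equal keys keep their input order. With < instead of <=, the same input gives ids 3, 1, 4, 2, 0, and both groups of equal keys come out reversed.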

That's all for this installment. If you found it useful, please show your support with a like, a bookmark, and a follow. Thank you all for your support!


Origin blog.csdn.net/Yikefore/article/details/132869892