Data Structures and Algorithms—Merge Sort & Counting Sort

Table of contents

1. Merge sort

1. Main function

2. Recursive implementation

3. Optimize recursion 

4. Non-recursive implementation

5. Feature summary:

2. Counting sort

1. Code:

2. Feature summary:

3. Summary of various sortings

Time & Space Complexity Summary 


1. Merge sort

Basic idea: Merge sort is an efficient sorting algorithm based on the merge operation, and a very typical application of the divide-and-conquer method. Merge already-ordered subsequences to obtain a completely ordered sequence; that is, first make each subsequence ordered, then make the whole sequence ordered by merging the ordered segments. Merging two ordered lists into one ordered list is called a two-way merge.

Merge logic diagram with an even number of elements:

 

Merge animation with an odd number of elements:

We mention the even and odd element counts here because the code must handle them slightly differently, as explained below.

Let’s start with an array with an even number of elements:

 1. Main function

void MergeSort(int* a, int n)
{
	int* tmp = (int*)malloc(sizeof(int) * n);
	_MergeSort(a, 0, n - 1, tmp);
	free(tmp);
}

Our idea is to take the array elements, merge them in sorted order into the created tmp array, and once everything is sorted, copy tmp back into the original array.

  • The main function takes two parameters: an integer array a and an integer n, where n is the length of the array.
  • MergeSort first allocates space for the tmp array.
  • It calls _MergeSort to do the sorting.
  • Finally it frees the tmp array.

2. Recursive implementation

void _MergeSort(int* a, int begin, int end, int* tmp)
{
	if (begin == end)
		return;
	int mid = (begin + end) / 2;
	_MergeSort(a, begin, mid, tmp);
	_MergeSort(a, mid + 1, end, tmp);
	int begin1 = begin, end1 = mid;
	int begin2 = mid + 1, end2 = end;
	int i = begin;
	while (begin1 <= end1 && begin2 <= end2)
	{
		if (a[begin1] <= a[begin2])   // <= keeps the sort stable on equal keys
		{
			tmp[i++] = a[begin1++];
		}
		else
		{
			tmp[i++] = a[begin2++];
		}
	}
	while (begin1 <= end1)
	{
		tmp[i++] = a[begin1++];
	}
	while (begin2 <= end2)
	{
		tmp[i++] = a[begin2++];
	}
	memcpy(a + begin, tmp + begin, sizeof(int) * (end - begin + 1));
}

Let’s first look at how the function merges the two parts:

  • First compute the middle position mid and recursively sort both halves of the array. This is the divide-and-conquer idea: decompose the large problem into small ones. Four indices, begin1 and end1, begin2 and end2, point to the start and end of the two sections.
  • Then look at the compare-and-insert process of the three while loops. After each split, the two parts are compared from the front, and the smaller element is inserted into the tmp array. Once every element of one part has been inserted, the first while loop ends; the remaining loops copy whichever part still has elements (all larger than those already placed) directly into tmp.

The following uses the array {1,6,7,10,2,3,4,9} for comparison and insertion:

Recursive idea:

Next, we need to sort and insert upwards from the smallest subsequence to the largest, so here we refer to the recursive idea to complete the sorting:

  • In _MergeSort, first check whether begin equals end. If so, the current subsequence has only one element, needs no sorting, and the function returns directly.
  • Otherwise, compute the middle position mid and recursively call _MergeSort on the left and right halves. Once both halves are sorted, merge them into the ordered array tmp.

if (begin == end)
		return;
	int mid = (begin + end) / 2;
	_MergeSort(a, begin, mid, tmp);
	_MergeSort(a, mid + 1, end, tmp);

After each level of recursive sorting, the memcpy function copies the elements from the temporary array tmp back into the original array a.

memcpy(a + begin, tmp + begin, sizeof(int) * (end - begin + 1));

3. Optimize recursion 

First, observe where the optimization was added:

void _MergeSort(int* a, int begin, int end, int* tmp)
{
	if (begin == end)
		return;

	if (end - begin + 1 < 10)
	{
		InsertSort(a+begin, end - begin + 1);
		return;
	}

	int mid = (begin + end) / 2;
	// [begin, mid] [mid+1, end]
	_MergeSort(a, begin, mid, tmp);
	_MergeSort(a, mid+1, end, tmp);

	int begin1 = begin, end1 = mid;
	int begin2 = mid+1, end2 = end;
	int i = begin;
	while (begin1 <= end1 && begin2 <= end2)
	{
		if (a[begin1] <= a[begin2])
		{
			tmp[i++] = a[begin1++];
		}
		else
		{
			tmp[i++] = a[begin2++];
		}
	}

	while (begin1 <= end1)
	{
		tmp[i++] = a[begin1++];
	}

	while (begin2 <= end2)
	{
		tmp[i++] = a[begin2++];
	}

	memcpy(a+begin, tmp+begin, sizeof(int) * (end - begin + 1));
}

Through observation, we find that this recursive implementation adds an "insertion sort" for small-interval optimization. Let's look at what it does:

	// small-interval optimization
	if (end - begin + 1 < 10)
	{
		InsertSort(a+begin, end - begin + 1);
		return;
	}

Let’s analyze it with the help of examples: 

If we have 10,000 items to sort, each level of recursion multiplies the number of function calls made on the way down.

We can stop dividing once a subrange holds only a few elements and call insertion sort on it instead of recursing further. Let's explain step by step:

When a subrange holds about 10 elements, roughly three more levels of recursive calls would still be made below it.

Using what we learned about binary trees, we can see why cutting those levels improves efficiency:

Handling subranges of about 10 elements with insertion sort eliminates the last three levels of the recursion. The number of function calls equals the number of nodes in the recursion tree, and the last three levels contain about 87.5% of those nodes. Solving these three levels with insertion sort therefore removes the bulk of the recursive calls; that is the point of switching to insertion sort at around 10 elements.

4. Non-recursive implementation

We control the size of the merged sub-arrays with a gap, which lets us implement merge sort non-recursively.

We first initialize gap to 1, then double it on each pass until gap is greater than or equal to the array length. In each pass we divide the array into pairs of sub-arrays of size gap and merge each pair in sorted order. In this way we implement merge sort with loops instead of recursion.

The non-recursive version has some special cases at the tail of the array, handled in the "correction" part of the code. Now let's walk through the code:

void MergeSortNonR(int* a, int n)
{
	int* tmp = (int*)malloc(sizeof(int) * n);

	// 1  2  4 ....
	int gap = 1;
	while (gap < n)
	{
		int j = 0;
		for (int i = 0; i < n; i += 2 * gap)
		{
			// merge ranges for this group
			int begin1 = i, end1 = i + gap - 1;
			int begin2 = i + gap, end2 = i + 2 * gap - 1;

			// correction for out-of-bounds ranges
			if (end1 >= n || begin2 >= n)
			{
				break;
			}

			if (end2 >= n)
			{
				end2 = n - 1;
			}

			while (begin1 <= end1 && begin2 <= end2)
			{
				if (a[begin1] <= a[begin2])   // <= keeps the sort stable on equal keys
				{
					tmp[j++] = a[begin1++];
				}
				else
				{
					tmp[j++] = a[begin2++];
				}
			}

			while (begin1 <= end1)
			{
				tmp[j++] = a[begin1++];
			}

			while (begin2 <= end2)
			{
				tmp[j++] = a[begin2++];
			}
			// merge one group, copy one group
			memcpy(a+i, tmp+i, sizeof(int)*(end2-i+1));
		}
		gap *= 2;
	}
	free(tmp);
}
  1. The code first defines a temporary array tmp to hold the intermediate results of the merge. Then, in a while loop, the value of gap keeps growing; each pass divides the array into sub-arrays of length gap and merges neighboring pairs in sorted order.
  2. Next, we solve the out-of-bounds problem at the tail of the array.
  3. The first method: copy each merged segment individually.
    1. In the first case, if the second part of a group is entirely out of bounds (or the first part is itself partially out of bounds), no merge is performed; the valid elements are left for a later, larger-gap pass to merge. That is, when end1 or begin2 exceeds the range of array a, the loop exits.
    2. In the second case, if the first part is in bounds and the second part is only partially out of bounds, the second part is trimmed: when end2 crosses the boundary but begin2 does not, end2 is corrected to n-1.
  4. Within each group, the for loop yields two parts, [begin1, end1] and [begin2, end2]. The while loops then merge the elements of these two parts into the tmp array in ascending order.
  5. After the three inner while loops complete, the elements just merged are copied from tmp back into the original array a with memcpy. Copying only the merged segment prevents overwriting the original array with stale data, because other positions of tmp may not yet hold merged results.

Besides the out-of-bounds handling explained above, there is a second method.

Second method: one global copy after each round of merging.

if (end1 >= n)
{
    end1 = n - 1;
	// non-existent interval
	begin2 = n;
	end2 = n - 1;
}
else if (begin2 >= n)
{
	// non-existent interval
	begin2 = n;
	end2 = n - 1;
}
else if(end2 >= n)
{
	end2 = n - 1;
}
  1. If end1 (and therefore begin2 and end2) is out of bounds, merge only the in-bounds elements of the first part: correct end1 to n-1. The second part does not exist, so assign begin2 = n and end2 = n-1; this empty interval takes no part in the merge, while the first part's elements are still copied into tmp.
  2. If begin2 and end2 are out of bounds, likewise assign begin2 = n and end2 = n-1 so the second part becomes an empty interval and is skipped, while the first part is still copied into tmp.
  3. If only end2 crosses the boundary, correct end2 to n-1.
  4. Finally, note that with this method the memcpy call is placed after the for loop ends:
    memcpy(a, tmp, sizeof(int) * n);

5. Feature summary:

  1. The disadvantage of merge sort is its O(N) space requirement. The thinking behind merge sort is more often applied to external sorting on disk: sorting data too large for memory usually means splitting it into blocks, sorting each block in memory, writing the sorted blocks back to external storage, and then merging the sorted blocks to obtain the final ordered sequence.
  2. Time complexity: O(N*logN)
  3. Space complexity: O(N)
  4. Stability: stable 

2. Counting sort

Counting sort is a common sorting algorithm, also known as pigeonhole sort. It is a modified application of the direct-addressing idea behind hashing.

The operation steps of this algorithm are as follows:

  1. Count the number of occurrences of each element and store the counts in a count array.
  2. According to the counts, write the elements back into the original sequence in order.

The advantage of counting sort is that it is fast and well suited to data with a small value range. The algorithm also never compares elements, which makes it more efficient than comparison-based sorts in those cases. For a large amount of data with a wide value range, consider other, more suitable sorting algorithms.

1. Code:

void CountSort(int* a, int n)
{
	int min = a[0], max = a[0];
	for (int i = 0; i < n; i++)
	{
		if (a[i] < min)
		{
			min = a[i];
		}

		if (a[i] > max)
		{
			max = a[i];
		}
	}
	int range = max - min + 1;
	int* countA = (int*)malloc(sizeof(int) * range);
	memset(countA, 0, sizeof(int) * range);

	// 统计次数
	for (int i = 0; i < n; i++)
	{
		countA[a[i] - min]++;
	}

	// 排序
	int k = 0;
	for (int j = 0; j < range; j++)
	{
		while (countA[j]--)
		{
			a[k++] = j + min;
		}
	}
	free(countA);
}
  • First, the code finds the minimum and maximum values in the array by traversing it, so that the size and range of the count array can be determined.
  • Next, the code dynamically allocates a count array countA of size range and initializes it to 0.
  • Then, the code traverses the original array a, counts the number of occurrences of each element, and stores it in the count array countA.
  • Finally, the code traverses the count array countA and stores the sorted elements back into the original array a.
    • Specifically, traverse the count array from its first element; while an element's count is nonzero, store the corresponding value (i.e. j + min) at position k of the original array a and advance k by one. In this way all elements are written back into a in ascending order, completing the sort.

Note that the time complexity of this algorithm is O(n + range), where range is the size of the count array, so when range is large the algorithm becomes less efficient. Moreover, it applies only when the range of element values is small; if the range is large, other sorting algorithms are recommended.

2. Feature summary:

  1. Counting sort is very efficient when the data range is concentrated, but its applicable scope and scenarios are limited.
  2. Time complexity: O(MAX(N,range))
  3. Space complexity: O(range)

3. Summary of various sortings

Stability: Assume the sequence to be sorted contains multiple records with equal keys. If, after sorting, the relative order of those records is unchanged (that is, if r[i] = r[j] and r[i] precedes r[j] in the original sequence, then r[i] still precedes r[j] in the sorted sequence), the sorting algorithm is called stable; otherwise it is unstable.

For example, in the situation above, if after sorting it is guaranteed that the first 5 is still in front of the second 5, the sort is stable; otherwise it is unstable.

Time & Space Complexity Summary

For the two algorithms covered here:

  • Merge sort: time O(N*logN), space O(N), stable
  • Counting sort: time O(MAX(N, range)), space O(range)

Origin blog.csdn.net/m0_73800602/article/details/134346376