Data Structure Part 7: Sorting

Foreword

  Sorting is the operation of arranging a sequence of records in increasing or decreasing order according to one or more of their keys. Different sorting algorithms have their own advantages and disadvantages. This chapter introduces eight sorting algorithms: insertion sort, Shell sort, selection sort, bubble sort, heap sort, quick sort, merge sort, and counting sort.

1. Insertion sort

1.1 Basic idea

  Insertion sort is a relatively simple sorting algorithm. Its basic idea: treat the first element as an already-sorted sequence of one. Then take the next element (call it b) and compare it against the sorted part from back to front; every sorted element larger than b is shifted one position backward (for ascending order), until an element no larger than b is found or the front of the array is reached, at which point b is inserted.
  (The original post includes an illustration here; the figure and the code below are easiest to understand together.)

1.2 Code implementation

void InsertSort(int* a, int n)
{
	assert(a);
	int i = 0;
	for (i = 0; i < n - 1; i++)
	{
		// a[0..end] is sorted; insert a[end + 1] into it
		int end = i;
		int tmp = a[end + 1];
		while (end >= 0)
		{
			if (a[end] > tmp)
			{
				a[end + 1] = a[end];
				end--;
			}
			else
			{
				break;
			}
		}
		a[end + 1] = tmp;
	}
}

1.3 Feature Summary

  1. The closer the set of elements is to sorted order, the more time-efficient direct insertion sort is (best case O(N)).
  2. Time complexity: O(N^2)
  3. Space complexity: O(1)
  4. Stability: Stable

2. Shell sort

2.1 Basic idea

  First select an integer gap and split the data to be sorted into groups: all elements whose indices are gap apart belong to the same group, and the records within each group are sorted. Then reduce the gap and repeat the grouping-and-sorting work. When gap = 1 is reached, all records fall into a single group and one final pass finishes the sort.

  In effect the data is first divided into many groups that are sorted separately. Each pre-sorting pass moves smaller values toward the front, so the data gradually approaches sorted order. Shell sort performs several such pre-sorting passes (the group-sorting process) until the gap reaches 1, which completes the sort. In practice the gap can start at the number of elements and be divided by 2 after each pass; the sort ends after the gap = 1 pass.

2.2 Code implementation

void ShellSort(int* a, int n)
{
	int gap = n; // gap starts at the element count and is halved each pass
	int i = 0;
	int j = 0;
	//// Alternative: sort the groups one at a time
	//while (gap > 1)
	//{
	//	gap /= 2; // when gap reaches 1 this is a direct insertion sort
	//	for (j = 0; j < gap; j++) // j selects a group; i walks within it
	//	{
	//		for (i = j; i < n - gap; i += gap) // sort one group
	//		{
	//			int end = i;
	//			int tmp = a[end + gap];
	//			while (end >= 0) // insert one element
	//			{
	//				if (a[end] > tmp)
	//				{
	//					a[end + gap] = a[end];
	//					end -= gap;
	//				}
	//				else
	//				{
	//					break;
	//				}
	//			}
	//			a[end + gap] = tmp;
	//		}
	//	}
	//}

	while (gap > 1)
	{
		gap /= 2; // when gap reaches 1 this is a direct insertion sort
		for (i = 0; i < n - gap; i++) // interleave the groups: advance one element at a time
		{
			int end = i;
			int tmp = a[end + gap];
			while (end >= 0) // insert one element into its group
			{
				if (a[end] > tmp)
				{
					a[end + gap] = a[end];
					end -= gap;
				}
				else
				{
					break;
				}
			}
			a[end + gap] = tmp;
		}
	}
}

  Each group is sorted with direct insertion sort. The main idea of Shell sort is that several pre-sorting passes make the data gradually ordered.

2.3 Feature Summary

  1. Shell sort is an optimization of direct insertion sort.
  2. When gap > 1 the passes are pre-sorting, whose purpose is to bring the array closer to sorted order. When gap == 1, the array is
    already close to sorted, so the final pass is very fast. In this way an overall speedup is achieved. After implementing it, we can run comparative performance tests.
  3. The time complexity of Shell sort is not easy to calculate, because it depends on the choice of gap sequence, of which there are many. This is why the time complexity given in many textbooks is not a fixed figure.
  4. Stability: Unstable

3. Selection sort

3.1 Basic idea

  Each pass selects the smallest (or largest) element from the data still to be sorted and places it at the beginning (or end) of the sequence, until all elements have been placed.

3.2 Code implementation

  The version here is slightly optimized: each pass finds the maximum and the minimum at the same time and places them at the current start and end positions respectively.

void Swap(int* x, int* y)
{
	int tmp = *x;
	*x = *y;
	*y = tmp;
}

// O(N^2)
void SelectSort(int* a, int n)
{
	int end = n - 1;
	int begin = 0;
	while (begin < end)
	{
		int mini = begin, maxi = end;
		int i = 0;
		for (i = begin; i <= end; i++)
		{
			if (a[i] < a[mini])
				mini = i;

			if (a[i] > a[maxi])
				maxi = i;
		}
		Swap(&a[mini], &a[begin]);
		if (begin == maxi)
			maxi = mini;
		Swap(&a[maxi], &a[end]);

		begin++;
		end--;
	}
}

  It is worth noting the case where begin and maxi coincide: because we swap the minimum into the first position first, after that swap the element maxi points to is no longer the maximum — the maximum has been moved to the position mini pointed to. So when begin coincides with maxi, we must redirect maxi to mini after the first swap so that it points at the true maximum.

3.3 Feature Summary

  1. Direct selection sort thinking is very easy to understand, but the efficiency is not very good, and it is rarely used in practice.
  2. Time complexity: O(N^2)
  3. Space complexity: O(1)
  4. Stability: Unstable

4. Heap sort

4.1 Basic idea

  Heap sort is a sorting algorithm designed around the heap data structure (a complete binary tree stored in an array); it selects data through the heap. Note that ascending order needs a max heap and descending order needs a min heap. The idea is explained in detail in my other article, Data Structure Chapter 6: Binary Tree — click there to jump if you want to know more. Here I will only paste the code.

4.2 Code implementation

// adjust upward (used when building the heap by inserting elements one by one)
void AdjustUp(int* data, int child)
{
	int parent = (child - 1) / 2;
	while (child > 0)
	{
		if (data[child] > data[parent]) // use "<" instead to maintain a min heap
		{
			Swap(&data[child], &data[parent]);
			child = parent;
			parent = (child - 1) / 2;
		}
		else
		{
			break;
		}
	}
}

// adjust downward (parent's two subtrees must already be heaps)
void AdjustDown(int* data, int size, int parent)
{
	int child = (parent * 2) + 1;
	while (child < size)
	{
		// pick the larger child (use "<"/">" flipped for a min heap)
		if (child + 1 < size && data[child] < data[child + 1])
		{
			child++;
		}

		if (data[child] > data[parent])
		{
			Swap(&data[child], &data[parent]);
			parent = child;
			child = (parent * 2) + 1;
		}
		else
		{
			break;
		}
	}
}

// heap sort
// ascending  -- max heap
// descending -- min heap
void HeapSort(int* a, int n)
{
	int i = 0;
	//// 1. build the heap with AdjustUp: O(N*logN)
	//for (i = 1; i < n; i++)
	//{
	//	AdjustUp(a, i);
	//}

	// 2. build the heap with AdjustDown, starting from the last parent: O(N)
	for (i = (n - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDown(a, n, i);
	}

	// repeatedly move the heap top (the maximum) to the end, then re-heapify the rest
	int end = n - 1;
	while (end > 0)
	{
		Swap(&a[0], &a[end]);
		AdjustDown(a, end, 0);
		end--;
	}
}

4.3 Summary of Features

  1. Heap sort uses the heap to select the next element, which makes it much more efficient than plain selection sort.
  2. Time complexity: O(N*logN)
  3. Space complexity: O(1)
  4. Stability: Unstable

5. Bubble sort

5.1 Basic idea

  Repeatedly compare each pair of adjacent elements in the sequence and swap them if they are out of order. This is very simple: after each pass, the largest remaining element "bubbles up" to the end of the unsorted part.

5.2 Code implementation

void BubbleSort(int* a, int n)
{
	int i = 0;
	for (i = 0; i < n - 1; i++) // number of passes
	{
		int j = 0;
		for (j = 0; j < n - 1 - i; j++) // comparisons within one pass
		{
			if (a[j] > a[j + 1])
			{
				Swap(&a[j], &a[j + 1]);
			}
		}
	}
}

5.3 Summary of Features

  1. Bubble sort is a very easy to understand sort.
  2. Time complexity: O(N^2)
  3. Space complexity: O(1)
  4. Stability: Stable

6. Quick Sort

6.1 Basic idea

  Take one element of the sequence to be sorted as the reference value (pivot) and, according to it, split the set into two subsequences: every element in the left subsequence is smaller than the pivot and every element in the right subsequence is greater. Then repeat the process on the left and right subsequences until every element has been placed in its final position.

6.1.1 Thought 1

  Here I will introduce three quick sort partitioning ideas; let's look at the first one (Hoare's method). All elements of the left subsequence must be smaller than the pivot and all elements of the right subsequence larger, so we search from the right for elements smaller than the pivot and from the left for elements larger than it, and swap each such pair.
  Note that, with the pivot at the left end, the search must start from the right, so that the position where the two indexes finally meet holds a value no larger than the pivot and the key can be swapped into its correct place.
  This is just one partition; the pivot position then serves as a dividing point for sorting the left and right sequences. It is a process of continuous partitioning plus sorting, so we can consider solving it with recursion.

6.1.2 Thought 2

  The second is the "digging a pit" (hole) idea, broadly similar to the first but slightly different: the pivot value is saved off first, leaving a hole at its position; each value found by the scans is moved into the current hole, which opens a new hole where that value came from, and at the end the pivot value fills the final hole.

6.1.3 Thought 3

  The front-and-back pointer method: prev points at the start of the data, cur at the element just after prev, and key records the index of the first element (the pivot). cur scans forward looking for elements smaller than the pivot; whenever it finds one, it is swapped with the element just after prev, and prev advances. The invariant is that every element up to and including the one at prev (apart from the pivot itself at the front) is smaller than the pivot, while everything between prev and cur is not. This continues until cur has traversed all the data.
  Since everything from the second element up to prev is smaller than the pivot, swapping a[prev] with the pivot at the end places the pivot in its sorted position, with smaller elements before it and larger ones after.

6.2 Code implementation

// Hoare's method
int Partion1(int* a, int left, int right)
{
	int mid = GetMidIndex(a, left, right); // median-of-three pivot selection
	Swap(&a[left], &a[mid]);

	int key = left;
	while (left < right)
	{
		// scan right to left for an element smaller than the pivot
		while (left < right && a[right] >= a[key])
		{
			right--;
		}
		// scan left to right for an element larger than the pivot
		while (left < right && a[left] <= a[key])
		{
			left++;
		}
		Swap(&a[left], &a[right]);
	}
	Swap(&a[left], &a[key]);
	return left;
}

// hole (pit) method
int Partion2(int* a, int left, int right)
{
	int mid = GetMidIndex(a, left, right);
	Swap(&a[left], &a[mid]);

	int pivot = left;  // position of the current hole
	int key = a[left]; // pivot value, saved off
	while (left < right)
	{
		// scan right to left for a smaller element, drop it into the hole on the left
		while (left < right && a[right] >= key)
		{
			right--;
		}
		a[pivot] = a[right];
		pivot = right;

		// scan left to right for a larger element, drop it into the hole on the right
		while (left < right && a[left] <= key)
		{
			left++;
		}
		a[pivot] = a[left];
		pivot = left;
	}
	a[pivot] = key;
	return pivot;
}

// front-and-back pointer method
int Partion3(int* a, int left, int right)
{
	int key = left;
	int prev = left, cur = left + 1;
	while (cur <= right)
	{
		if (a[key] > a[cur])
		{
			Swap(&a[cur], &a[++prev]);
		}
		cur++;
	}
	Swap(&a[key], &a[prev]);
	return prev;
}

6.2.1 Recursive version

  All three partition functions return the pivot's subscript after one pass, which divides the array into left and right sub-intervals; sorting the sub-intervals recursively finishes the job. If the recursive process is unclear, you can jump to Data Structure Part 6: Binary Tree and look at the pre-order traversal recursion of the binary tree.

void QuickSort(int* a, int left, int right)
{
	if (left >= right)
		return;

	int key = Partion3(a, left, right);
	QuickSort(a, left, key - 1);
	QuickSort(a, key + 1, right);
}

6.2.2 Non-recursive version

  The non-recursive implementation needs the assistance of a stack. Each time we push the bounds of a range that still needs sorting onto the stack. We pop a range, partition it, and push its sub-ranges back onto the stack; when the stack is empty, every range has been processed and the sort is complete.
  This continual repetition is equivalent to simulating the recursive process. The advantage of the non-recursive version is that it does not consume call-stack space: with too much data the recursion depth grows and can easily cause a stack overflow, and the non-recursive version solves this problem very well.

void QuickSortNonR(int* a, int left, int right)
{
	Stack st;
	StackInit(&st);
	StackPush(&st, left); // push the initial interval
	StackPush(&st, right);

	while (!StackEmpty(&st))
	{
		int end = StackTop(&st); // read the top element
		StackPop(&st);           // pop it

		int begin = StackTop(&st);
		StackPop(&st);

		int key = Partion3(a, begin, end);
		// [begin, key - 1] key [key + 1, end]

		if (begin < key - 1)
		{
			StackPush(&st, begin); // push the left sub-interval
			StackPush(&st, key - 1);
		}

		if (key + 1 < end)
		{
			StackPush(&st, key + 1); // push the right sub-interval
			StackPush(&st, end);
		}
	}

	StackDestroy(&st);
}

6.3 Optimization

  Looking at the code you will have noticed GetMidIndex(); this function optimizes the selection of the reference value. Suppose the first element happens to be the maximum or minimum of all the data: the first scan will then run all the way to the other end of the interval without finding anything to swap, which is much slower than having both sides shrink toward the middle, and on already-ordered data every partition degenerates this way. Hence this median-of-three optimization of the key selection.

int GetMidIndex(int* a, int left, int right)
{
	int mid = (left + right) / 2;
	if (a[left] > a[mid])
	{
		if (a[mid] > a[right])
		{
			return mid;
		}
		else if (a[left] < a[right])
		{
			return left;
		}
		else
		{
			return right;
		}
	}
	else  // a[left] <= a[mid]
	{
		if (a[right] < a[left])
		{
			return left;
		}
		else if (a[right] > a[mid])
		{
			return mid;
		}
		else
		{
			return right;
		}
	}
}

6.4 Summary of Features

  1. In overall performance and range of usage scenarios, quick sort is a comparatively good sort. Without pivot optimization it is relatively slow on data that is already ordered; it is an algorithm whose sorting works better the more disordered the data is.
  2. Time complexity: O(N*logN)
  3. Space complexity: O(logN)
  4. Stability: Unstable

7. Merge sort

7.1 Basic idea

  Merge sort is an efficient sorting algorithm built on the merge operation and a very typical application of the divide-and-conquer method: merge already-ordered subsequences to obtain a completely ordered sequence — that is, first make each subsequence ordered, then make the whole segment of subsequences ordered. Merging two sorted lists into one sorted list is called a two-way merge.
  In essence it sorts in pairs, then in fours, and so on — a process of continually enlarging the amount of data being merged. What we have to do is keep halving the interval and then merge each pair of adjacent sorted intervals. Since the merge compares data from both halves, a temporary array is needed to hold the result of each merge; after a merge completes, it is copied back to the original array. The recursive process can again be understood with the help of Data Structure Part 6: Binary Tree.

7.2 Code implementation

7.2.1 Recursive version

void _MergeSort(int* a, int left, int right, int* tmp)
{
	if (left >= right)
		return;
	// split the interval recursively
	int mid = (left + right) / 2;
	_MergeSort(a, left, mid, tmp);
	_MergeSort(a, mid + 1, right, tmp);

	// merge the two sorted halves
	int begin1 = left, end1 = mid;      // left interval
	int begin2 = mid + 1, end2 = right; // right interval
	int i = left;
	while (begin1 <= end1 && begin2 <= end2)
	{
		if (a[begin1] < a[begin2])
		{
			tmp[i++] = a[begin1++];
		}
		else
		{
			tmp[i++] = a[begin2++];
		}
	}

	while (begin1 <= end1)
	{
		tmp[i++] = a[begin1++];
	}

	while (begin2 <= end2)
	{
		tmp[i++] = a[begin2++];
	}

	// copy the merged interval back to a (requires <string.h>);
	// without this step a is never updated and the sort fails
	memcpy(a + left, tmp + left, sizeof(int) * (right - left + 1));
}

void MergeSort(int* a, int n)
{
	int* tmp = (int*)malloc(sizeof(int) * n);
	if (tmp == NULL)
	{
		perror("MergeSort: malloc failed");
		exit(-1);
	}

	_MergeSort(a, 0, n - 1, tmp);

	free(tmp);
	tmp = NULL;
}

7.2.2 Non-recursive version

  My first thought for the non-recursive version was to use a stack or queue, as with quick sort, but that does not work here: merging can only start after the interval has been split all the way down to single elements, whereas a stack or queue used to simulate the recursion stores interval bounds and finishes when it is empty — here the container would already be empty by the time the smallest intervals are reached, before any merging has been done, so the sort could not be completed that way.
  Therefore the way of thinking has to change. Here a gap controls the interval width: no recursive splitting is needed, a loop walks the array merging adjacent intervals directly, and the gap changes the size of each interval.
  But this raises a big problem: the gap doubles each pass, so the number of elements merged at a time goes 2, 4, 8, 16, ... What about, say, 10 elements? An out-of-bounds situation appears: there are clearly only 10 elements, yet the computed bounds reach into the space behind them. This situation must be handled so that no address space that should not be accessed is touched.
  Looking at the code, only end1, begin2, and end2 can go out of bounds: if an interval has no data at all, the merge loop is never entered, and when there is data, begin1 cannot cross the bounds. If only end2 is out of bounds, simply clamp end2 to n - 1 so it points at the last element again and nothing is out of bounds. If begin2 is out of bounds, set begin2 to n and end2 to n - 1; then begin2 > end2 means the interval does not exist, and it will never be accessed. end1 out of bounds is handled the same way as end2 (and in that case [begin2, end2] is already marked non-existent). This perfectly solves the out-of-bounds problem.

void MergeSortNonR1(int* a, int n)
{
	int* tmp = (int*)malloc(sizeof(int) * n);
	if (tmp == NULL)
	{
		perror("MergeSortNonR1: malloc failed");
		exit(-1);
	}

	int gap = 1, i = 0; // interval widths grow like a level-order traversal: 1, 2, 4, ...
	while (gap < n)
	{
		for (i = 0; i < n; i = i + 2 * gap)
		{
			// merge [i, i+gap-1] with [i+gap, i+gap*2-1]
			int begin1 = i, end1 = i + gap - 1;
			int begin2 = i + gap, end2 = i + gap * 2 - 1;
			int j = i;

			// end2 out of bounds: clamp it to the last element
			if (end2 >= n)
			{
				end2 = n - 1;
			}

			// begin2 out of bounds: mark [begin2, end2] as non-existent
			if (begin2 >= n)
			{
				begin2 = n;
				end2 = n - 1;
			}

			// end1 out of bounds: clamp it ([begin2, end2] is already empty)
			if (end1 >= n)
			{
				end1 = n - 1;
			}

			while (begin1 <= end1 && begin2 <= end2)
			{
				if (a[begin1] < a[begin2])
				{
					tmp[j++] = a[begin1++];
				}
				else
				{
					tmp[j++] = a[begin2++];
				}
			}

			while (begin1 <= end1)
			{
				tmp[j++] = a[begin1++];
			}

			while (begin2 <= end2)
			{
				tmp[j++] = a[begin2++];
			}
		}

		// copy the whole pass back to the original array
		int j = 0;
		for (j = 0; j < n; j++)
		{
			a[j] = tmp[j];
		}
		gap *= 2;
	}

	free(tmp);
	tmp = NULL;
}

7.3 Summary of features

  1. The disadvantage of merge sort is that it requires O(N) extra space; the idea of merging is used more to solve external sorting problems on disk.
  2. Time complexity: O(N*logN)
  3. Space complexity: O(N)
  4. Stability: Stable

8. Counting sort

8.1 Basic idea

  Count the number of occurrences of each value, then write the values back into the original sequence according to the counts.
  It can be understood as using a newly allocated count array to record how many times each value appears: the count array's subscripts represent the values of the original array, and each entry stores the number of occurrences of the corresponding value.
  If, for example, the data only ranges from 2 to 9, then subscripts 0 and 1 of the count array would be wasted space and only 2 through 9 would be useful, so we can size the count array as the maximum minus the minimum plus one to reduce the waste. The subscript a value maps to changes accordingly: the value 2 is stored at position 0 by subtracting the minimum 2 when counting, and the minimum is added back when the values are written out to the original array.
  If it is still not clear, drawing pictures and walking concrete data through the process gives a more detailed understanding — and that is an important way to learn data structures in general.

8.2 Code implementation

void CountSort(int* a, int n)
{
	int max = a[0], min = a[0];
	int i = 0;
	for (i = 0; i < n; i++)
	{
		if (a[i] < min)
		{
			min = a[i];
		}

		if (a[i] > max)
		{
			max = a[i];
		}
	}
	int range = max - min + 1;
	int* count = (int*)malloc(sizeof(int) * range);
	if (count == NULL)
	{
		perror("CountSort: malloc failed");
		exit(-1);
	}
	memset(count, 0, sizeof(int) * range);
	// count occurrences
	for (i = 0; i < n; i++)
	{
		count[a[i] - min]++;
	}
	// write the values back in order
	int j = 0;
	for (i = 0; i < range; i++)
	{
		while (count[i]--)
		{
			a[j++] = i + min;
		}
	}
	free(count);
	count = NULL;
}

8.3 Feature Summary

  1. Counting sort is highly efficient when the data range is concentrated, but its scope of application and scenarios are limited.
  2. Time complexity: O(MAX(N,range))
  3. Space complexity: O(range)
  4. Stability: Stable

9. Summary

  There is a lot of content in this article — like the binary tree one, it took several days to put together (sigh), though it still went much better than the binary tree (hey). If you find any mistakes, you can point them out by private message or in the comment area. With this, the data structures series comes to an end; next up is learning C++ and Linux. I hope to make progress together with everyone — so this issue ends here, see you next time!! If you found it good, a like would be great encouragement!!


Origin blog.csdn.net/qq_62321047/article/details/132197418