Data Structures (C Implementation): The Basic Ideas and Implementation of Common Sorting Algorithms (Three Quick Sort Partition Methods, Optimizations, and Non-Recursive Quick Sort)

1 Introduction

Sorting shows up almost everywhere in life, such as the ordering of stores when shopping online or the ranking of students' grades. Today we will study several sorting algorithms that are common in data structures.

2. Sorting

2.1 Concept

Sorting : the operation of arranging a sequence of records in increasing or decreasing order by the value of one or more of their keys.
Stability : assume the sequence to be sorted contains multiple records with equal keys. If, after sorting, the relative order of these records is unchanged — that is, if r[i] = r[j] and r[i] comes before r[j] in the original sequence, then r[i] still comes before r[j] in the sorted sequence — the sorting algorithm is said to be stable; otherwise it is unstable.
Internal sorting : a sort in which all data elements fit in memory.
External sorting : there are too many data elements to fit in memory at once, so data must be moved between internal and external storage as the sort proceeds.
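To make the stability definition concrete, here is a small sketch in C (the `Record` type and `InsertSortRecords` helper are illustrative, not part of the original text): a stable sort keeps two records with the same key in their original relative order.

```c
#include <assert.h>

typedef struct {
	int key;  // sort key
	int id;   // original position, used to observe relative order
} Record;

// direct insertion sort on records; the strict comparison tmp.key < r[end].key
// means equal keys are never moved past each other, so the sort is stable
void InsertSortRecords(Record* r, int n)
{
	for (int i = 0; i < n - 1; i++)
	{
		int end = i;
		Record tmp = r[end + 1];
		while (end >= 0 && tmp.key < r[end].key)
		{
			r[end + 1] = r[end];
			end--;
		}
		r[end + 1] = tmp;
	}
}
```

Sorting { {3, 0}, {1, 1}, {3, 2} } yields { {1, 1}, {3, 0}, {3, 2} }: the two records with key 3 keep their original order, so the sort is stable.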

2.2 Common sorting algorithms

Insertion Sort : Direct Insertion Sort, Shell Sort
Selection Sort : Direct Selection Sort, Heap Sort
Exchange Sort : Bubble Sort, Quick Sort
Merge Sort : Merge Sort

Below I will introduce the characteristics of these sorting algorithms and how to implement them.

3. Implementation of common sorting algorithms

3.1 Direct insertion sort

Idea :
Insert the records to be sorted one by one, by key value, into an already-sorted sequence until all records are inserted, yielding a new sorted sequence.
When inserting the i-th element (i >= 1), array[0], array[1], ..., array[i-1] are already sorted. Compare the key of array[i] against array[i-1], array[i-2], ... to find its insertion position, insert array[i] there, and shift the elements at and after that position back by one.

Code:

void InsertSort(int* a, int n)
{
	int i = 0;
	for (i = 0; i < n - 1; i++)
	{
		// [0, end] is sorted; insert the element at end+1 while keeping order
		int end = i;
		int tmp = a[end + 1];
		while (end >= 0)
		{
			if (tmp < a[end])
			{
				a[end + 1] = a[end];
				end--;
			}
			else
			{
				break;
			}
		}
		a[end + 1] = tmp;
	}
}

Features :
1. The closer the input is to sorted order, the faster direct insertion sort runs
2. Time complexity: O(N^2)
3. Space complexity: O(1)
4. Stability: stable

3.2 Shell sort

Idea :
First pick an integer gap and split the records to be sorted into groups: all records whose indices are a multiple of gap apart belong to the same group, and each group is sorted with insertion sort. Then set gap = gap/3 + 1 and repeat the grouping and sorting. When gap reaches 1, all records are in one group and a final insertion-sort pass finishes the job.

Code:

void ShellSort(int* a, int n)
{
	// gap > 1: pre-sorting passes
	// gap == 1 (the last pass): plain direct insertion sort
	int gap = n;
	while (gap > 1)
	{
		gap = gap / 3 + 1;
		int i = 0;
		for (i = 0; i < n - gap; i++)
		{
			int end = i;
			int tmp = a[end + gap];
			while (end >= 0)
			{
				if (tmp < a[end])
				{
					a[end + gap] = a[end];
					end -= gap;
				}
				else
				{
					break;
				}
			}
			a[end + gap] = tmp;
		}
	}
}

Features :
1. Shell sort is an optimization of direct insertion sort.
2. The passes with gap > 1 are pre-sorting; their purpose is to bring the array closer to sorted order. When gap == 1, the array is already nearly sorted, so the final direct insertion sort is very fast.
3. The time complexity of Shell sort is hard to pin down, because there are many ways to choose the gap sequence. According to the analysis in Yan Weimin's "Data Structures (C Language Version)", the time complexity of Shell sort is about O(N^1.3).
4. Stability: unstable

3.3 Direct Selection Sort

Idea :
1. In the element set array[i] ... array[n-1], select the element with the largest (smallest) key.
2. If it is not the last (first) element of the set, swap it with the last (first) element of the set.
3. Repeat the above steps on the remaining set array[i] ... array[n-2] (array[i+1] ... array[n-1]) until only one element remains.

As a small optimization, the code below selects both the largest and the smallest number on each pass, so both ends of the array shrink at once, reducing the running time.
Code:

void SelectSort(int* a, int n)
{
	int begin = 0;
	int end = n - 1;

	while (begin < end)
	{
		int mini = begin;
		int maxi = begin;
		for (int i = begin; i <= end; i++)  // i <= end, so a[end] is considered too
		{
			if (a[i] < a[mini])
			{
				mini = i;
			}
			if (a[i] > a[maxi])
			{
				maxi = i;
			}
		}
		Swap(&a[begin], &a[mini]);
		// if begin and maxi overlap, the max was just moved to mini; fix up maxi
		if (begin == maxi)
		{
			maxi = mini;
		}
		Swap(&a[end], &a[maxi]);
		begin++;
		end--;
	}
}

Features :
1. Direct selection sort is very easy to understand, but its efficiency is poor, so it is rarely used in practice
2. Time complexity: O(N^2)
3. Space complexity: O(1)
4. Stability: unstable

3.4 Heap sort

Idea :
Heap sort is a sorting algorithm designed around the heap data structure (a complete binary tree stored in an array) and is a form of selection sort: it selects elements through the heap. Note that ascending order requires a max-heap, and descending order requires a min-heap.

Code:

void Swap(int* p1, int* p2)
{
	int tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

void AdjustUp(int* arr, int child)
{
	int parent = (child - 1) / 2;
	while (child > 0)
	{
		if (arr[child] > arr[parent])
		{
			Swap(&arr[child], &arr[parent]);
			child = parent;
			parent = (child - 1) / 2;
		}
		else
		{
			break;
		}
	}
}

void AdjustDown(int* arr, int size, int parent)
{
	int child = parent * 2 + 1;
	while (child < size)
	{
		// pick the larger of the two children (max-heap, for ascending order)
		if (child + 1 < size && arr[child + 1] > arr[child])
		{
			child = child + 1;
		}
		if (arr[child] > arr[parent])
		{
			Swap(&arr[child], &arr[parent]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			break;
		}
	}
}

void HeapSort(int* arr, int size)
{
	// build a max-heap for ascending order, a min-heap for descending order

	// building the heap with AdjustUp costs O(N*logN):
	//int i = 0;
	//for (i = 1; i < size; i++)
	//{
	//	AdjustUp(arr, i);
	//}

	// building the heap with AdjustDown costs only O(N)
	int i = 0;
	for (i = (size - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDown(arr, size, i);
	}

	// repeatedly move the current max to the end and shrink the heap
	int end = size - 1;
	while (end > 0)
	{
		Swap(&arr[0], &arr[end]);
		AdjustDown(arr, end, 0);
		end--;
	}
}

Features :
1. Heap sort uses the heap to select elements, which makes it much more efficient than direct selection sort.
2. Time complexity: O(N*logN)
3. Space complexity: O(1)
4. Stability: unstable

3.5 Bubble sort

Idea :
1. Compare adjacent elements; if the first is greater than the second, swap them.
2. Repeat step 1 for each pair of adjacent elements, from the first pair to the last pair. After this pass, the last element is the largest.
3. Repeat the above steps over all elements except those already in place, so each pass has one fewer element to sort. Once every element is in place, the array is in ascending order.

Code:

void BubbleSort(int* a, int n)
{
	for (int j = 0; j < n - 1; j++)
	{
		int flag = 0;
		// after each pass, the largest element of the unsorted part is in place
		for (int i = 0; i < n - 1 - j; i++)
		{
			if (a[i] > a[i + 1])
			{
				Swap(&a[i], &a[i + 1]);
				flag = 1;
			}
		}
		// no swaps in a full pass: the array is already sorted
		if (flag == 0)
		{
			return;
		}
	}
}

Features :
1. Bubble sort is very easy to understand
2. Time complexity: O(N^2)
3. Space complexity: O(1)
4. Stability: stable

3.6 Quick Sort

Idea :
Pick any element of the sequence to be sorted as the reference value (the key), and partition the sequence into two subsequences: every element of the left subsequence is less than or equal to the key, and every element of the right subsequence is greater than or equal to the key. Then repeat the process on the left and right subsequences until every element is in its final position.

The problem is how to partition a sequence into two subsequences that satisfy this property. The three most common ways are introduced below.

3.6.1 hoare version

Idea :
1. Take the first element of the interval as the reference value key.
2. The right pointer moves first, scanning from right to left for a value smaller than the key, and stops when it finds one.
3. The left pointer then scans from left to right for a value larger than the key, and stops when it finds one.
4. Swap the values at these two positions, and repeat steps 2-3 until the pointers meet. Then swap the key with the meeting position; that position is the key's final position, and it splits the sequence into the part left of the key and the part right of it.

Code:

//hoare version
int PartSort1(int* a, int left, int right)
{
	int key = left;
	while (left < right)
	{
		// right moves first, looking for a value smaller than the key
		while (left < right && a[right] >= a[key])
		{
			right--;
		}
		// then left moves, looking for a value larger than the key
		while (left < right && a[left] <= a[key])
		{
			left++;
		}
		Swap(&a[left], &a[right]);
	}
	Swap(&a[key], &a[left]);
	key = left;
	return key;
}

3.6.2 Hole-digging method

Idea :
1. Save the first element on the left in a temporary variable key; its position becomes a pit.
2. The right pointer moves first, scanning from right to left for a value smaller than key; when found, that value fills the pit on the left, and its old position becomes the new pit.
3. The left pointer then scans from left to right for a value larger than key; when found, that value fills the pit on the right, and its old position becomes the new pit.
4. Repeat steps 2-3 until the two pointers meet, then drop key into the final pit. The key's position now divides the interval into a left part and a right part.

Code:

//hole-digging method
int PartSort2(int* a, int left, int right)
{
	int key = a[left];
	int pit = left;
	while (left < right)
	{
		// right moves first: find a value smaller than key, fill the pit on
		// the left; that value's old position becomes the new pit
		while (left < right && a[right] >= key)
		{
			right--;
		}
		a[pit] = a[right];
		pit = right;
		// then left moves: find a value larger than key, fill the pit on
		// the right; that value's old position becomes the new pit
		while (left < right && a[left] <= key)
		{
			left++;
		}
		a[pit] = a[left];
		pit = left;
	}
	a[pit] = key;
	return pit;
}

3.6.3 Front-and-back pointer method

Idea :
1. Initially, prev points to the start of the interval and cur points to the position after prev.
2. cur walks right through the sequence until it finds a value smaller than the key.
3. prev steps one position to the right, and the values at prev and cur are swapped.
4. Repeat steps 2-3 until cur walks past the end of the interval, then swap the key with the value at prev.

Code:

//front-and-back pointer method
int PartSort3(int* a, int left, int right)
{
	int key = left;
	int prev = left;
	int cur = left + 1;
	while (cur <= right)
	{
		if (a[cur] < a[key])
		{
			prev++;
			Swap(&a[prev], &a[cur]);
		}
		cur++;
	}
	Swap(&a[key], &a[prev]);
	key = prev;
	return key;
}

3.6.4 Optimization of Quick Sort

3.6.4.1 Median-of-three pivot selection

The efficiency of quick sort is directly tied to the choice of the key. If the key is the minimum or maximum value every time, we hit the worst case, so we should choose the key sensibly to avoid that as much as possible. The best-known way to do so is the median-of-three method: take the median of the first, middle, and last elements of the interval as the key.

Code:

int GetMid(int* a, int left, int right)
{
	int mid = (left + right) / 2;
	if (a[left] < a[mid])
	{
		if (a[right] > a[mid])
		{
			return mid;
		}
		else if (a[right] < a[left])
		{
			return left;
		}
		else
		{
			return right;
		}
	}
	else //a[left] >= a[mid]
	{
		if (a[mid] > a[right])
		{
			return mid;
		}
		else if (a[right] > a[left])
		{
			return left;
		}
		else
		{
			return right;
		}
	}
}

3.6.4.2 Small-interval optimization

The space complexity of quick sort is tied to the recursion depth: the more recursive calls, the more stack space the program uses. Looking at how quick sort recurses, most of the calls happen in the last few levels, where each sub-interval holds only a handful of elements. Recursing further at that point wastes space and risks stack overflow, so we set a threshold: when an interval is smaller than the threshold, we stop recursing and finish it with direct insertion sort, which is simpler and faster on small inputs.

3.7 The optimized version of quick sort

With the two techniques above, we divide the recursive sub-intervals well and select a suitable key, without wasting stack space.

The final version of the quick sort code is as follows:

//front-and-back pointer method
int PartSort3(int* a, int left, int right)
{
	int key = left;
	int prev = left;
	int cur = left + 1;
	// median-of-three optimization: bring the median to the key position
	int mid = GetMid(a, left, right);
	Swap(&a[key], &a[mid]);

	while (cur <= right)
	{
		if (a[cur] < a[key])
		{
			prev++;
			Swap(&a[prev], &a[cur]);
		}
		cur++;
	}
	Swap(&a[key], &a[prev]);
	key = prev;
	return key;
}

void QuickSort(int* a, int left, int right)
{
	// empty interval, or a single value: nothing to do
	if (left >= right)
	{
		return;
	}

	if (right - left > 10)
	{
		//int key = PartSort1(a, left, right);
		//int key = PartSort2(a, left, right);
		int key = PartSort3(a, left, right);
		QuickSort(a, left, key - 1);
		QuickSort(a, key + 1, right);
	}
	else
	{
		// small sub-interval: insertion sort is cheaper than more recursion
		InsertSort(a + left, right - left + 1);
	}
}

Features :
1. Quick sort has good all-round performance and works well in a wide range of scenarios.
2. Time complexity: O(N*logN)
3. Space complexity: O(logN)
4. Stability: unstable

3.8 Non-recursive implementation of quick sort

Quick sort can also be implemented without recursion: an explicit stack holds the sub-intervals that still need to be partitioned, taking the place of the call stack.

Code:

void QuickSortNonR(int* a, int left, int right)
{
	ST st;
	StackInit(&st);
	StackPush(&st, right);
	StackPush(&st, left);

	while (!StackEmpty(&st))
	{
		// pop the next interval [left, right] to partition
		int left = StackTop(&st);
		StackPop(&st);

		int right = StackTop(&st);
		StackPop(&st);

		int key = PartSort3(a, left, right);

		// push only the sub-intervals that still hold more than one element
		if (key + 1 < right)
		{
			StackPush(&st, right);
			StackPush(&st, key + 1);
		}
		if (left < key - 1)
		{
			StackPush(&st, key - 1);
			StackPush(&st, left);
		}
	}
	StackDestroy(&st);
}
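The code above assumes a stack of ints (`ST`, `StackInit`, `StackPush`, `StackPop`, `StackTop`, `StackEmpty`, `StackDestroy`), presumably from an earlier data-structure chapter. A minimal sketch of that interface is shown below so the function compiles standalone; the names match the calls above, but this particular implementation is only an assumption.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Stack {
	int* data;
	int top;       // index of the next free slot
	int capacity;
} ST;

void StackInit(ST* st)
{
	st->data = NULL;
	st->top = 0;
	st->capacity = 0;
}

void StackPush(ST* st, int x)
{
	if (st->top == st->capacity)
	{
		// grow geometrically; start small
		int newcap = st->capacity == 0 ? 4 : st->capacity * 2;
		int* p = (int*)realloc(st->data, sizeof(int) * newcap);
		assert(p);
		st->data = p;
		st->capacity = newcap;
	}
	st->data[st->top++] = x;
}

void StackPop(ST* st)     { assert(st->top > 0); st->top--; }
int  StackTop(ST* st)     { assert(st->top > 0); return st->data[st->top - 1]; }
int  StackEmpty(ST* st)   { return st->top == 0; }
void StackDestroy(ST* st) { free(st->data); st->data = NULL; st->top = st->capacity = 0; }
```

Note the push order in QuickSortNonR: left is pushed last so it is popped first, keeping each interval's bounds paired on the stack.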

3.9 Merge sort

Idea :
Merge sort is an efficient sorting algorithm built on the merge operation and a classic application of divide and conquer: first make each subsequence ordered, then merge the ordered subsequences to obtain a completely ordered sequence. Merging two sorted lists into one sorted list is called a two-way merge.

Code:

void _MergeSort(int* a, int begin, int end, int* tmp)
{
	if (begin >= end)
	{
		return;
	}
	int mid = (begin + end) / 2;
	// divide and conquer: recursively sort [begin, mid] and [mid+1, end]
	_MergeSort(a, begin, mid, tmp);
	_MergeSort(a, mid + 1, end, tmp);
	// merge [begin, mid] and [mid+1, end]
	int begin1 = begin;
	int end1 = mid;
	int begin2 = mid + 1;
	int end2 = end;
	int i = begin1;
	while (begin1 <= end1 && begin2 <= end2)
	{
		if (a[begin1] < a[begin2])
		{
			tmp[i++] = a[begin1++];
		}
		else
		{
			tmp[i++] = a[begin2++];
		}
	}
	while (begin1 <= end1)
	{
		tmp[i++] = a[begin1++];
	}
	while (begin2 <= end2)
	{
		tmp[i++] = a[begin2++];
	}
	// copy the merged data back into the original array
	memcpy(a + begin, tmp + begin, sizeof(int) * (end - begin + 1));
}


void MergeSort(int* a, int n)
{
	assert(a);
	int* tmp = (int*)malloc(sizeof(int) * n);
	if (tmp == NULL)
	{
		perror("malloc");
		exit(-1);
	}

	_MergeSort(a, 0, n - 1, tmp);

	free(tmp);
}

Features :
1. The drawback of merge sort is that it needs O(N) auxiliary space; in practice, the idea of merging is used mostly to solve external sorting of data on disk.
2. Time complexity: O(N*logN)
3. Space complexity: O(N)
4. Stability: stable

3.10 Non-recursive implementation of merge sort

Set a gap value that controls the size of the intervals merged in each pass, and double it after each pass; this yields a non-recursive (bottom-up) merge sort.

Code:

void MergeSortNonR(int* a, int n)
{
	assert(a);
	int* tmp = (int*)malloc(sizeof(int) * n);
	if (tmp == NULL)
	{
		perror("malloc");
		exit(-1);
	}
	int gap = 1;
	while (gap < n)
	{
		for (int i = 0; i < n; i += 2 * gap)
		{
			int begin1 = i;
			int end1 = i + gap - 1;
			int begin2 = i + gap;
			int end2 = i + 2 * gap - 1;
			// out of bounds: fix up the interval boundaries
			if (end1 >= n)
			{
				end1 = n - 1;
				// make [begin2, end2] an empty interval
				begin2 = n;
				end2 = n - 1;
			}
			else if (begin2 >= n)
			{
				// make [begin2, end2] an empty interval
				begin2 = n;
				end2 = n - 1;
			}
			else if (end2 >= n)
			{
				end2 = n - 1;
			}

			int j = begin1;
			int m = end2 - begin1 + 1;
			while (begin1 <= end1 && begin2 <= end2)
			{
				if (a[begin1] < a[begin2])
				{
					tmp[j++] = a[begin1++];
				}
				else
				{
					tmp[j++] = a[begin2++];
				}
			}
			while (begin1 <= end1)
			{
				tmp[j++] = a[begin1++];
			}
			while (begin2 <= end2)
			{
				tmp[j++] = a[begin2++];
			}
			memcpy(a + i, tmp + i, sizeof(int) * m);
		}
		gap *= 2;
	}
	free(tmp);
}
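A small sanity-check driver can exercise any of the sorts above, since they all share the `(int*, int)` shape. The `IsSorted` helper and test data below are mine, not from the original; bubble sort from section 3.5 is repeated here (with `Swap`) only so the block is self-contained, and any of the other sorts could be swapped in.

```c
#include <assert.h>

void Swap(int* p1, int* p2)
{
	int tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

// bubble sort, as in section 3.5, repeated so this driver stands alone
void BubbleSort(int* a, int n)
{
	for (int j = 0; j < n - 1; j++)
	{
		int flag = 0;
		for (int i = 0; i < n - 1 - j; i++)
		{
			if (a[i] > a[i + 1])
			{
				Swap(&a[i], &a[i + 1]);
				flag = 1;
			}
		}
		if (flag == 0)
		{
			return;
		}
	}
}

// returns 1 if a[0..n-1] is in non-decreasing order
int IsSorted(const int* a, int n)
{
	for (int i = 0; i + 1 < n; i++)
	{
		if (a[i] > a[i + 1])
		{
			return 0;
		}
	}
	return 1;
}
```

Sorting { 5, 3, 9, 1, 4, 1, 8 } and checking `IsSorted` is a quick regression test while experimenting with the implementations in this article.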

4. Complexity and stability analysis of sorting algorithm

| Sorting method | Average case | Best case | Worst case | Auxiliary space | Stability |
| --- | --- | --- | --- | --- | --- |
| Direct insertion sort | O(N^2) | O(N) | O(N^2) | O(1) | Stable |
| Shell sort | O(NlogN)~O(N^2) | O(N^1.3) | O(N^2) | O(1) | Unstable |
| Direct selection sort | O(N^2) | O(N^2) | O(N^2) | O(1) | Unstable |
| Heap sort | O(NlogN) | O(NlogN) | O(NlogN) | O(1) | Unstable |
| Bubble sort | O(N^2) | O(N) | O(N^2) | O(1) | Stable |
| Quick sort | O(NlogN) | O(NlogN) | O(N^2) | O(logN)~O(N) | Unstable |
| Merge sort | O(NlogN) | O(NlogN) | O(NlogN) | O(N) | Stable |

5. End

This article has introduced the common sorting algorithms in data structures. To learn them well, the most important things are to master the idea behind each sort and to be familiar with each algorithm's characteristics, complexity, and stability, so that you can choose the most appropriate sorting algorithm for each scenario. There are inevitably mistakes and omissions here; this article is offered only for study and reference. If it helps you learn sorting algorithms, I am honored.
Finally, thank you all for your patience and support. If you think this article is well written, please like, favorite, and follow. If you have questions, or spot mistakes in this article, send me a private message or leave a comment for discussion. Thanks again, everyone.

Origin blog.csdn.net/qq_43188955/article/details/130309972