Common Sorting Algorithms: Data Structures Implemented in C

Sorting: Sorting is the operation of arranging a sequence of records in increasing or decreasing order according to the value of one or more of their keys.

A recommended website is VisuAlgo, which offers dynamic visualizations of data structures and algorithms (a Chinese version is available).

It lets us watch the sorting process step by step.

Sorting interface

sort.h

#include<stdlib.h>
#include<stdio.h>
#include<string.h> // memcpy, used by the merge sort implementations
#include<assert.h>
#include<time.h>

// Insertion sort
void InsertSort(int* a, int n);

// Shell sort
void ShellSort(int* a, int n);

// Selection sort
void SelectSort(int* a, int n);

// Heap sort
void AdjustDown(int* a, int n, int root);
void HeapSort(int* a, int n);

// Bubble sort
void BubbleSort(int* a, int n);

// Quick sort, recursive implementation

// 1. Quick sort, Hoare version
int PartSort1(int* a, int left, int right);

// 2. Quick sort, hole-digging method
int PartSort2(int* a, int left, int right);

// 3. Quick sort, front-and-back pointer method
int PartSort3(int* a, int left, int right);

void QuickSort(int* a, int left, int right);

// Quick sort, non-recursive implementation
void QuickSortNonR(int* a, int left, int right);

// Merge sort, recursive implementation
void MergeSort(int* a, int n);

// Merge sort, non-recursive implementation
void MergeSortNonR(int* a, int n);

1. Insertion sort

Idea: Insert the records to be sorted, one at a time and according to their key values, into a sequence that is already sorted. When every record has been inserted, the result is a new sorted sequence.

Implementation:

void InsertSort(int* arr, int n)
{
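	// i < n - 1: the last value of i is n - 2, so arr[i + 1] stays in bounds
	for (int i = 0; i < n - 1; i++)
	{
		// [0, end] is already sorted; insert arr[end + 1] so that [0, end + 1] stays sorted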
		int end = i;
		int tmp = arr[end + 1];

		while (end >= 0)
		{
			if (tmp < arr[end])
			{
				arr[end + 1] = arr[end];
				end--;
			}
			else
			{
				break;
			}
		}

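		arr[end + 1] = tmp;
		// Why end + 1? When the loop exits, end is the index of the last element
		// that is not greater than tmp (or -1), so the free slot is end + 1.
		// Why not insert in the else branch? In the extreme case where tmp is the
		// smallest value, end reaches -1 and the while loop exits without ever
		// entering else, so the insertion has to happen outside the loop.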
	}
}

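Why is the loop condition i < n - 1? Each iteration inserts arr[i + 1] into the sorted prefix [0, i], so the largest valid i is n - 2. With i = n - 1 the read of arr[i + 1] would go past the end of the array.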

2. Shell sort

Shell sort is also called the diminishing increment method. The idea: first split the numbers to be sorted into groups according to an increment gap, so that the indices of the records in each group differ by gap, and sort the elements within each group. Then regroup with a smaller increment and sort within each group again. When the increment reaches 1 (which is exactly direct insertion sort), all the numbers form a single group and the sort is complete.

Shell sort can be understood as two steps: 1. Pre-sorting 2. Direct insertion sort

Implementation: ①

void ShellSort(int* arr, int n)
{
	int gap = n;
	while (gap > 1)
	{
		gap = gap / 3 + 1;  
		//gap = gap / 2;

		// Sort each of the gap groups separately
		for (int j = 0; j < gap; j++)
		{
			for (int i = j; i < n - gap; i = i + gap)
			{
				int end = i;
				int tmp = arr[end + gap];
				while (end >= 0)
				{
					if (tmp < arr[end])
					{
						arr[end + gap] = arr[end];
						end = end - gap;
					}
					else
					{
						break;
					}
				}
				arr[end + gap] = tmp;
			}
		}
	}
}

②: Simple optimization based on ①

void ShellSort(int* arr, int n)
{
	// gap > 1: pre-sorting passes
	// gap == 1: direct insertion sort
	int gap = n;
	while (gap > 1)
	{
		gap = gap / 3 + 1;  // the + 1 guarantees the last gap is 1; with gap == 1 this is direct insertion sort
		//gap = gap / 2;

		for (int i = 0; i < n - gap; i++)
		{
			int end = i;
			int tmp = arr[end + gap];
			while (end >= 0)
			{
				if (tmp < arr[end])
				{
					arr[end + gap] = arr[end];
					end = end - gap;
				}
				else
				{
					break;
				}
			}
			arr[end + gap] = tmp;
		 }
	}

}

Why is the loop condition i < n - gap? Each iteration reads arr[i + gap], so i + gap must be at most n - 1, which is the same as i < n - gap.

What value should gap take?

This is largely a matter of habit. In the code above, gap starts at n and is divided by 3 on every iteration. The + 1 guarantees that gap is exactly 1 on the last iteration. Dividing by 2 also works, and in that case no + 1 is needed, because gap / 2 is still at least 1 whenever gap > 1.
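To make the two choices concrete, here is a minimal sketch that records the gaps each rule produces. The helper name GapSequence and its parameters are assumptions made for illustration; they are not part of the original code.

```c
#include <stdio.h>

/* Hypothetical helper: writes the gaps ShellSort would use for n elements
   into out and returns how many there are (at most cap).
   use_div3 != 0 selects gap = gap / 3 + 1, otherwise gap = gap / 2. */
int GapSequence(int n, int use_div3, int* out, int cap)
{
	int count = 0;
	int gap = n;

	while (gap > 1 && count < cap)
	{
		gap = use_div3 ? gap / 3 + 1 : gap / 2;
		out[count++] = gap;
	}

	return count;
}

/* Prints the sequence, e.g. for a quick comparison of both rules. */
void PrintGaps(int n, int use_div3)
{
	int gaps[64];
	int count = GapSequence(n, use_div3, gaps, 64);

	for (int i = 0; i < count; i++)
	{
		printf("%d ", gaps[i]);
	}
	printf("\n");
}
```

For n = 10 the first rule gives 4, 2, 1 and the second gives 5, 2, 1. Dividing by 3 without the + 1 would skip the final pass for some sizes (for example 8 / 3 = 2, then 2 / 3 = 0), which is why the + 1 is there.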

Summary of the characteristics of Shell sort:

1. Shell sort is an optimization of direct insertion sort.

2. When gap > 1, the passes are pre-sorting passes that bring the array closer to sorted order. When gap == 1, the array is already nearly sorted, so the final insertion sort is fast. This is where the overall speedup comes from.

3. The time complexity of Shell sort is hard to analyze because it depends on the gap sequence, and there are many ways to choose one. There is no single fixed bound.

4. For ascending order: the larger the gap, the faster large values move toward the back and small values toward the front, but the less sorted the array is after the pass.

The smaller the gap, the closer to sorted the array is after the pass. When gap = 1, the pass is a direct insertion sort.

3. Selection sort

Idea: In each pass, select the smallest (or largest) element from the unsorted elements and place it at the start of the unsorted part, until all elements are in place.

Implementation: A simple optimization is applied here: each pass selects both the smallest and the largest element.

void Swap(int* p1, int* p2)
{
	int tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

void SelectSort(int* arr, int n)
{
	assert(arr);

	int left = 0; // start of the unsorted range
	int right = n - 1; // end of the unsorted range

	while (left < right)
	{
		int min = left;
		int max = left;

		for (int i = left + 1; i <= right; i++)
		{
			if (arr[i] < arr[min])
				min = i;

			if (arr[i] > arr[max])
				max = i;
		}

		Swap(&arr[left], &arr[min]);

		// If max was at left, the swap above moved the maximum to index min, so correct max
		if (left == max)
		{
			max = min;
		}

		Swap(&arr[right], &arr[max]);

		left++;
		right--;

	}

}
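The correction of max is easy to overlook, so here is a small sketch of a single pass with the correction switched on or off. The function SelectPass and its fix_max flag are hypothetical names introduced only for this demonstration.

```c
static void SwapInt(int* p1, int* p2)
{
	int tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

/* One two-ended selection pass over the closed range [left, right].
   fix_max != 0 applies the correction from SelectSort;
   fix_max == 0 leaves it out to show what goes wrong. */
void SelectPass(int* arr, int left, int right, int fix_max)
{
	int min = left;
	int max = left;

	for (int i = left + 1; i <= right; i++)
	{
		if (arr[i] < arr[min])
			min = i;

		if (arr[i] > arr[max])
			max = i;
	}

	SwapInt(&arr[left], &arr[min]);

	/* The maximum used to sit at left; the swap just moved it to min. */
	if (fix_max && max == left)
	{
		max = min;
	}

	SwapInt(&arr[right], &arr[max]);
}
```

With the input {9, 1, 5}, the maximum starts at index 0. With the correction the pass yields {1, 5, 9}. Without it, the second swap moves the freshly placed minimum to the back and the result is {5, 9, 1}.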

4. Heap sort

Idea: Heap sort is a sorting algorithm designed around the heap data structure. It is a kind of selection sort that uses the heap to select the next element. Note that sorting in ascending order requires a max-heap, and sorting in descending order requires a min-heap.

Implementation: There are two ways to build a heap. Here we build it with downward adjustment (sift-down).

typedef int HPDataType;

void Swap(HPDataType* p1, HPDataType* p2)
{
	HPDataType tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

void AdjustDown(HPDataType* arr, int size, int parent) // sift down
{

	int child = parent * 2 + 1;

	while (child < size)
	{
		// Check the bound first so arr[child + 1] is never read out of range
		if (child + 1 < size && arr[child + 1] > arr[child])
		{
			child++;
		}

		if (arr[child] > arr[parent])
		{
			Swap(&(arr[child]), &(arr[parent]));
			parent = child;
			child = (parent * 2) + 1;
		}
		else
		{
			break;
		}
	}

}


void HeapSort(int* arr, int n)
{
    // Build the heap
	for (int i = (n - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDown(arr, n, i);
	}
    
    // Sort: move the current maximum to the end, then restore the heap on the rest
	int end = n - 1;
	while (end > 0)
	{
		Swap(&(arr[0]), &(arr[end]));
		AdjustDown(arr, end, 0);
		end--;
	}

}
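The other way to build a heap mentioned above is upward adjustment. The following is a minimal sketch of that alternative for a max-heap. The names AdjustUp, BuildHeapUp, and IsMaxHeap are assumptions chosen for this example and do not appear in the original code.

```c
static void SwapInt(int* p1, int* p2)
{
	int tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

/* Sift arr[child] up until its parent is not smaller (max-heap). */
void AdjustUp(int* arr, int child)
{
	int parent = (child - 1) / 2;

	while (child > 0)
	{
		if (arr[child] > arr[parent])
		{
			SwapInt(&arr[child], &arr[parent]);
			child = parent;
			parent = (child - 1) / 2;
		}
		else
		{
			break;
		}
	}
}

/* Build a max-heap by treating arr[0] as a one-element heap and
   inserting arr[1], arr[2], ... one at a time. */
void BuildHeapUp(int* arr, int n)
{
	for (int i = 1; i < n; i++)
	{
		AdjustUp(arr, i);
	}
}

/* Returns 1 if every parent is >= its children, 0 otherwise. */
int IsMaxHeap(const int* arr, int n)
{
	for (int i = 1; i < n; i++)
	{
		if (arr[(i - 1) / 2] < arr[i])
			return 0;
	}
	return 1;
}
```

Both approaches produce a valid heap. Building with downward adjustment from the last parent takes O(n) time in total, while building by repeated upward adjustment takes O(n log n) in the worst case, which is one reason to prefer the version used in HeapSort.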

5. Bubble sort

Idea: Compare the keys of two adjacent records and swap the records if they are out of order. The characteristic of bubble sort is that records with larger keys gradually move toward the end of the sequence and records with smaller keys move toward the front.

For details, see: bubble sort

Implementation:

void BubbleSort(int* arr, int n)
{
	assert(arr);

	for (int i = 0; i < n; i++)
	{
		int flag = 1;
		for (int j = 0; j < n - i - 1; j++)
		{
			if (arr[j] > arr[j + 1])
			{
				Swap(&arr[j], &arr[j + 1]);
				flag = 0;
			}
		}

		// No swap in this pass means the array is already sorted, so stop early
		if (flag == 1)
			break;
	}

}
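To see what the early exit buys, here is a sketch of the same algorithm that also counts how many passes it runs. The name BubbleSortPasses is hypothetical, and the flag is inverted (swapped instead of flag) purely for readability.

```c
/* Same logic as BubbleSort above, but returns the number of passes
   of the outer loop that were actually executed. */
int BubbleSortPasses(int* arr, int n)
{
	int passes = 0;

	for (int i = 0; i < n; i++)
	{
		int swapped = 0;
		passes++;

		for (int j = 0; j < n - i - 1; j++)
		{
			if (arr[j] > arr[j + 1])
			{
				int tmp = arr[j];
				arr[j] = arr[j + 1];
				arr[j + 1] = tmp;
				swapped = 1;
			}
		}

		/* Nothing moved, so the array is sorted. */
		if (!swapped)
			break;
	}

	return passes;
}
```

An already sorted array of five elements finishes after one pass, while a reversed one needs all five, so the check turns the best case into a single linear scan.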

6. Quick sort

Idea: Pick any element of the sequence as the pivot (key) and partition the sequence into two subsequences around it: every element in the left subsequence is less than the pivot and every element in the right subsequence is greater than the pivot (elements equal to the pivot may end up on either side). Then repeat the process on the left and right subsequences until every element is in its final position.

Hoare version

The method is as follows:

void Swap(int* p1, int* p2)
{
	int tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

int PartSort1(int* arr, int begin, int end)
{
	int left = begin;
	int right = end;

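	// keyi stores the index of the key, not its value
	int keyi = left;

	while (left < right)
	{
		// The right side moves first, looking for a value smaller than the key
		while (left < right && arr[right] >= arr[keyi])
		{
			right--;
		}

		// Then the left side moves, looking for a value larger than the key
		while (left < right && arr[left] <= arr[keyi])
		{
			left++;
		}

		// Here arr[right] is smaller than the key and arr[left] is larger (or the two have met)
		Swap(&arr[left], &arr[right]);
	}

	// left and right have met
	Swap(&arr[keyi], &arr[left]);

	keyi = left; // the key now sits at the meeting position

	return keyi;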
}

Hole-digging method

The method is as follows:

int PartSort2(int* arr, int begin, int end)
{
	int key = arr[begin];

	int piti = begin;

	while (begin < end)
	{
		// The right side moves first, looking for a smaller value to fill the hole on the left; its own position becomes the new hole
		while (begin < end && arr[end] >= key)
		{
			end--;
		}

		arr[piti] = arr[end];
		piti = end;

		// Then the left side moves, looking for a larger value
		while (begin < end && arr[begin] <= key)
		{
			begin++;
		}

		arr[piti] = arr[begin];
		piti = begin;
	}

	// begin and end always meet at the hole
	arr[piti] = key;
	return piti;

}

Front-and-back pointer method

The method is as follows:

int PartSort3(int* arr, int begin, int end)
{
	int key = begin;

	int prev = begin;

	int cur = begin + 1;

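	// Optimization: median-of-three (GetMidIndex is defined further below and must be declared before this function)
	int midi = GetMidIndex(arr, begin, end);
	Swap(&arr[key], &arr[midi]);

	while (cur <= end)
	{
		// Advance prev only when a smaller value is found; skip the swap when prev and cur coincide
		if (arr[cur] < arr[key] && ++prev != cur)
		{
			Swap(&arr[prev], &arr[cur]);
		}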

		cur++;
	}

	Swap(&arr[key], &arr[prev]);
	key = prev;

	return key;
}

Implementation: The three methods above are each written as a function so they are easy to call. Each performs only a single partition pass. To sort the whole array, the partition is applied recursively, much like a pre-order traversal of a binary tree.

void Swap(int* p1, int* p2)
{
	int tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

void QuickSort(int* arr, int begin,int end)
{

	// Base case: the range is empty or holds a single value
	if (begin >= end)
	{
		return;
	}

	if (end - begin > 20) // small-range threshold; values in the teens are typical
	{
		//int keyi = PartSort1(arr, begin, end);
        //int keyi = PartSort2(arr, begin, end);
        int keyi = PartSort3(arr, begin, end);

		// [begin, keyi - 1] keyi [keyi + 1, end]
		// Once the ranges to the left and right of keyi are sorted, the whole range is sorted

		QuickSort(arr, begin, keyi - 1);
		QuickSort(arr, keyi + 1, end);
	}
	else
	{
		InsertSort(arr + begin, end - begin + 1); // + begin: the range may be a right subrange, not one starting at index 0
		                                         // + 1: the range is closed on both ends, e.g. 0-9 holds 10 values
	}
}

Optimizations:

1. Choose the key with the median-of-three method

2. When the recursion reaches a small subrange, switch to insertion sort (already done in the implementation above)

int GetMidIndex(int* arr, int begin, int end)
{
	//begin   mid    end

	int mid = (begin + end) / 2;
	if (arr[begin] < arr[mid])
	{
		if (arr[mid] < arr[end])
		{
			return mid;
		}
		else if(arr[begin] < arr[end])  // reaching here means mid holds the largest value
		{
			return end;
		}
		else
		{
			return begin;
		}
	}
	else // arr[begin] >= arr[mid]
	{

		if (arr[mid] > arr[end])
		{
			return mid;
		}
		else if (arr[begin] < arr[end])  // reaching here means begin and end are both >= mid
		{
			return begin;
		}
		else
		{
			return end;
		}
	}
}

Non-recursive version:

The non-recursive version needs a stack. Because this is implemented in C, the stack has to be written by hand.

In C++ you can use std::stack from the standard library directly.

The stack implementation is omitted here for now, and a link will be given later. For the moment it is enough to know its interface.
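Since the original stack is not shown, the following is only a minimal sketch of what such a stack might look like. The type and function names (ST, StackInit, StackPush, StackTop, StackPop, StackEmpty, StackDestory) are taken from the calls in QuickSortNonR; the struct layout, the growth policy, and the exact signatures are assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef int STDataType;

/* Assumed layout: a growable array plus the number of stored elements. */
typedef struct Stack
{
	STDataType* data;
	int top;      /* number of elements currently stored */
	int capacity; /* number of elements data can hold */
} ST;

void StackInit(ST* ps)
{
	assert(ps);
	ps->data = NULL;
	ps->top = 0;
	ps->capacity = 0;
}

/* Spelling follows the call site in QuickSortNonR. */
void StackDestory(ST* ps)
{
	assert(ps);
	free(ps->data);
	ps->data = NULL;
	ps->top = 0;
	ps->capacity = 0;
}

void StackPush(ST* ps, STDataType x)
{
	assert(ps);

	/* Grow the array when it is full (doubling is an assumed policy). */
	if (ps->top == ps->capacity)
	{
		int newcap = ps->capacity == 0 ? 4 : ps->capacity * 2;
		STDataType* p = (STDataType*)realloc(ps->data, sizeof(STDataType) * newcap);
		if (p == NULL)
		{
			perror("realloc");
			exit(-1);
		}
		ps->data = p;
		ps->capacity = newcap;
	}

	ps->data[ps->top++] = x;
}

int StackEmpty(const ST* ps)
{
	assert(ps);
	return ps->top == 0;
}

STDataType StackTop(const ST* ps)
{
	assert(ps);
	assert(ps->top > 0);
	return ps->data[ps->top - 1];
}

void StackPop(ST* ps)
{
	assert(ps);
	assert(ps->top > 0);
	ps->top--;
}
```

Because a stack is last in, first out, QuickSortNonR pushes end before begin so that begin comes off first.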

Simplified diagram:

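// Non-recursive
// Problem with recursion: in extreme cases the recursion gets too deep and the call stack overflows
// 1. Rewrite directly as a loop, e.g. the Fibonacci sequence, merge sort
// 2. Simulate the recursion with a stack data structure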
void QuickSortNonR(int* arr, int begin, int end)
{
	ST st;
	StackInit(&st);

	StackPush(&st, end);
	StackPush(&st, begin);

	while (!StackEmpty(&st))
	{
		int left = StackTop(&st);
		StackPop(&st);

		int right = StackTop(&st);
		StackPop(&st);

		int keyi = PartSort3(arr, left, right);

		//[left , keyi - 1]   keyi    [keyi + 1 , right]

		if (keyi + 1 < right)
		{
			StackPush(&st, right);
			StackPush(&st, keyi + 1);
		}

		if (left < keyi - 1)
		{
			StackPush(&st, keyi - 1);
			StackPush(&st, left);
		}

	}

	StackDestory(&st);
}

7. Merge sort

Idea: Merge sort is an efficient sorting algorithm built on the merge operation and a typical application of divide and conquer. Sorted subsequences are merged to obtain a fully sorted sequence: first each subsequence is sorted, then the sorted subsequences are merged into order. Merging two sorted lists into one sorted list is called a two-way merge.

Implementation:

void _MergeSort(int* arr, int begin, int end, int* tmp)
{
	if (begin >= end)
		return;

	int mid = (begin + end) / 2;

	//[begin mid]  [mid+1,end]

	// Recurse into both halves
	_MergeSort(arr, begin, mid, tmp);
	_MergeSort(arr, mid + 1, end, tmp);

	// Merge [begin, mid] and [mid + 1, end]
	int left1 = begin;
	int right1 = mid;

	int left2 = mid + 1;
	int right2 = end;

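	int i = begin; // index into tmp; starts at begin rather than 0 because this may be a right subrange

	while (left1 <= right1 && left2 <= right2)
	{
		if (arr[left1] <= arr[left2]) // <= takes equal elements from the left run first, which keeps the sort stable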
		{
			tmp[i++] = arr[left1++];
		}
		else
		{
			tmp[i++] = arr[left2++];
		}

	}

	// Once one range is exhausted, copy the rest of the other range straight down
	while (left1 <= right1)
	{
		tmp[i++] = arr[left1++];
	}

	while (left2 <= right2)
	{
		tmp[i++] = arr[left2++];
	}


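	// Copy the merged data back to the original array: [begin, mid] [mid + 1, end]
	// + begin because this may be a right subrange, e.g. [2,3][4,5]
	// + 1 because the range is closed on both ends: 0-9 holds 10 values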
	memcpy(arr + begin, tmp + begin, (end - begin + 1) * sizeof(int));

}

void MergeSort(int* arr, int n)
{
	int* tmp = (int*)malloc(sizeof(int) * n);
	if (tmp == NULL)
	{
		perror("malloc");
		exit(-1);
	}

	_MergeSort(arr, 0, n - 1, tmp);

	free(tmp);
}

Non-recursive version:

Idea: Simulating the recursion with a stack or queue is not a good fit here. A stack or queue is a natural replacement for a pre-order traversal, but merge sort is a post-order process: a range can only be merged after both of its halves have been sorted. Once a range has been popped from a stack or queue it is gone, so it is no longer available when the merge step needs it.

Since we use a loop instead, introduce a variable gap for the length of each run being merged. When gap = 1, single elements are merged into sorted runs of 2. When gap = 2, runs of 2 are merged into runs of 4. gap doubles after each pass.

The code is as follows:

void MergeSortNonR(int* arr, int n)
{
	int* tmp = (int*)malloc(sizeof(int) * n);
	if (tmp == NULL)
	{
		perror("malloc");
		exit(-1);
	}

	int gap = 1;
	while (gap < n)
	{
		for (int i = 0; i < n; i += 2 * gap)
		{
			//[i , i + gap-1]  [i + gap , i + 2*gap-1]
			int left1 = i;
			int right1 = i + gap - 1;

			int left2 = i + gap;
			int right2 = i + 2 * gap - 1;

			int j = left1;

			while (left1 <= right1 && left2 <= right2)
			{
				if (arr[left1] <= arr[left2]) // <= keeps equal elements in their original order
				{
					tmp[j++] = arr[left1++];
				}
				else
				{
					tmp[j++] = arr[left2++];
				}

			}

			while (left1 <= right1)
			{
				tmp[j++] = arr[left1++];
			}

			while (left2 <= right2)
			{
				tmp[j++] = arr[left2++];
			}

		}

		memcpy(arr, tmp, sizeof(int) * n);

		gap *= 2;
	}

	free(tmp);
}

However, the code above has a problem: if the number of elements is not a power of 2, the computed ranges go out of bounds. (This has nothing to do with whether the count is odd or even.)

Example:
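The ranges can be inspected without touching any array. The sketch below recomputes the indices exactly as the loop above does. PrintMergeRanges and HasOutOfRange are hypothetical helper names introduced for this illustration.

```c
#include <stdio.h>

/* Prints the ranges [left1,right1][left2,right2] that the unfixed
   MergeSortNonR loop computes for an array of n elements. */
void PrintMergeRanges(int n)
{
	for (int gap = 1; gap < n; gap *= 2)
	{
		printf("gap=%d->", gap);
		for (int i = 0; i < n; i += 2 * gap)
		{
			printf("[%d,%d][%d,%d]---", i, i + gap - 1, i + gap, i + 2 * gap - 1);
		}
		printf("\n");
	}
}

/* Returns 1 if any computed range reaches an index >= n, 0 otherwise. */
int HasOutOfRange(int n)
{
	for (int gap = 1; gap < n; gap *= 2)
	{
		for (int i = 0; i < n; i += 2 * gap)
		{
			int right2 = i + 2 * gap - 1; /* largest index used in this step */
			if (right2 >= n)
				return 1;
		}
	}
	return 0;
}
```

For n = 9 the very first pass already ends with the pair [8,8][9,9], where index 9 is past the end of the array. For n = 6, an even count, the pass with gap = 2 produces [4,5][6,7], which confirms that parity is not the issue. Only powers of 2 such as 8 or 16 stay in bounds throughout.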

So the code needs to be fixed. There are two ways to do it:

1. Copy all the data back to the original array after each pass, and correct the boundaries

If a range crosses the boundary, correct it and continue merging. For example, with 9 elements the last element still goes through the merge step. If it did not, it would never be written to tmp, and because the whole of tmp is copied back to the original array after each pass, the 9th position would receive an uninitialized (garbage) value.

The code is as follows:

void MergeSortNonR(int* arr, int n)
{
	int* tmp = (int*)malloc(sizeof(int) * n);
	if (tmp == NULL)
	{
		perror("malloc");
		exit(-1);
	}

	int gap = 1;
	while (gap < n)
	{
		//printf("gap=%d->", gap);

		for (int i = 0; i < n; i += 2 * gap)
		{
			//[i , i + gap-1]  [i + gap , i + 2*gap-1]
			int left1 = i;
			int right1 = i + gap - 1;

			int left2 = i + gap;
			int right2 = i + 2 * gap - 1;

			// Check whether a range went out of bounds
			//printf("[%d,%d][%d,%d]---", left1, right1, left2, right2);

			// Correct the boundaries
			if (right1 >= n)
			{
				right1 = n - 1;
				// Turn [left2, right2] into an empty range
				left2 = n;
				right2 = n - 1;
			}
			else if (left2 >= n)
			{
				left2 = n;
				right2 = n - 1;
			}
			else if (right2 >= n)
			{
				right2 = n - 1;
			}

			//printf("[%d,%d][%d,%d]---", left1, right1, left2, right2);

			int j = left1;

			while (left1 <= right1 && left2 <= right2)
			{
				if (arr[left1] <= arr[left2]) // <= keeps equal elements in their original order
				{
					tmp[j++] = arr[left1++];
				}
				else
				{
					tmp[j++] = arr[left2++];
				}

			}

			while (left1 <= right1)
			{
				tmp[j++] = arr[left1++];
			}

			while (left2 <= right2)
			{
				tmp[j++] = arr[left2++];
			}

		}

		//printf("\n");

		memcpy(arr, tmp, sizeof(int) * n);

		gap *= 2;
	}

	free(tmp);
}

2. Copy each merged group back to the original array as soon as it is merged

With this approach, if right1 or left2 crosses the boundary we simply break out of the loop and leave the remaining data unmerged in this pass, since it is already in place in the original array.

void MergeSortNonR_2(int* arr, int n)
{
	int* tmp = (int*)malloc(sizeof(int) * n);
	if (tmp == NULL)
	{
		perror("malloc");
		exit(-1);
	}

	int gap = 1;
	while (gap < n)
	{

		for (int i = 0; i < n; i += 2 * gap)
		{
			//[i , i + gap-1]  [i + gap , i + 2*gap-1]
			int left1 = i;
			int right1 = i + gap - 1;

			int left2 = i + gap;
			int right2 = i + 2 * gap - 1;

			// If right1 or left2 is out of bounds there is no second range, so skip the merge
			if (right1 >= n || left2 >= n)
			{
				break;
			}
			else if (right2 >= n)
			{
				right2 = n - 1;
			}

			int m = right2 - left1 + 1; // number of elements actually merged

			int j = left1;

			while (left1 <= right1 && left2 <= right2)
			{
				if (arr[left1] <= arr[left2]) // <= keeps equal elements in their original order
				{
					tmp[j++] = arr[left1++];
				}
				else
				{
					tmp[j++] = arr[left2++];
				}

			}

			while (left1 <= right1)
			{
				tmp[j++] = arr[left1++];
			}

			while (left2 <= right2)
			{
				tmp[j++] = arr[left2++];
			}

			memcpy(arr+i, tmp+i, sizeof(int) * m);

		}

		gap *= 2;
	}

	free(tmp);
}

Both versions work; what matters most is the idea behind them.

Algorithm complexity and stability analysis

Stability: Suppose the sequence to be sorted contains several records with equal keys. If sorting leaves the relative order of these records unchanged, that is, if r[i] = r[j] with r[i] before r[j] in the original sequence implies that r[i] is still before r[j] in the sorted sequence, the sorting algorithm is called stable; otherwise it is called unstable.
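Stability cannot be observed on plain integers, so the sketch below sorts records that carry a key and a tag. The Record type and both function names are assumptions made for this example. The insertion sort mirrors InsertSort above, and the selection sort is the plain one-ended variant rather than the optimized SelectSort.

```c
typedef struct
{
	int key;  /* the value records are ordered by */
	char tag; /* identifies the record so equal keys can be told apart */
} Record;

/* Insertion sort on records: equal keys never pass each other,
   because shifting only happens on a strict "less than". */
void InsertSortRecords(Record* a, int n)
{
	for (int i = 0; i < n - 1; i++)
	{
		int end = i;
		Record tmp = a[end + 1];

		while (end >= 0 && tmp.key < a[end].key)
		{
			a[end + 1] = a[end];
			end--;
		}

		a[end + 1] = tmp;
	}
}

/* Plain selection sort on records: the long-distance swap can carry
   a record past another record with the same key. */
void SelectSortRecords(Record* a, int n)
{
	for (int left = 0; left < n - 1; left++)
	{
		int min = left;

		for (int i = left + 1; i < n; i++)
		{
			if (a[i].key < a[min].key)
				min = i;
		}

		Record tmp = a[left];
		a[left] = a[min];
		a[min] = tmp;
	}
}
```

Given the input (2,a) (2,b) (1,c), the insertion sort produces (1,c) (2,a) (2,b) and keeps a before b. The selection sort swaps (1,c) with (2,a) in its first pass and ends with (1,c) (2,b) (2,a), so the two records with key 2 have changed places.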
