Insertion sort, selection sort, exchange sort, merge sort and non-comparison sort (C language version)

Preface

        Sorting means arranging a set of data in increasing or decreasing order so that the data becomes ordered. Sorting is used everywhere in daily life and in every industry; for example, when we shop online, the items we browse are presented according to some sorting rule. Understanding how sorting is implemented is therefore very important.

Table of contents

1. The concept of sorting

2. Common sorting algorithms

3. Implementation of common sorting algorithms

        3.1 Insertion sort

                3.1.1 Direct insertion sort

                3.1.2 Shell sort

        3.2 Selection sort

                3.2.1 Heap sort

                3.2.2 Direct selection sort

        3.3 Exchange sort

                3.3.1 Bubble sort

                3.3.2 Quick sort

        3.4 Merge sort

                3.4.1 Merge sort

                3.4.2 Merge sort application: external sort

        3.5 Non-comparison sort

4. Analysis of the complexity and stability of sorting algorithms


1. The concept of sorting

        Sorting: arranging a string of records into an increasing or decreasing sequence according to the size of one or more of their keys, following a given rule.

        Stability: assume the sequence to be sorted contains multiple records with the same key. If the relative order of these records is unchanged after sorting, that is, if r[i] comes before r[j] in the original sequence and r[i] still comes before r[j] after sorting, then the sorting algorithm is said to be stable; otherwise it is unstable.

        Internal sorting: sorting in which all data elements are placed in memory.

        External sorting: there are too many data elements to fit in memory at the same time, so data is moved between internal and external memory as the sorting process requires.

2. Common sorting algorithms

        The common sorting algorithms covered in this article are: insertion sort (direct insertion sort and Shell sort), selection sort (direct selection sort and heap sort), exchange sort (bubble sort and quick sort), merge sort, and non-comparison sort (counting sort).

3. Implementation of common sorting algorithms

        3.1 Insertion sort

        Basic idea: insert the records to be sorted, one by one according to their key values, into an already ordered sequence until all records have been inserted, producing a new ordered sequence.

        In fact, when we sort a hand of playing cards, we are already using the idea of insertion sort.

         

                3.1.1 Direct insertion sort 

        When inserting the i-th element, array[0], array[1], ..., array[i-1] are already sorted. Compare the key of array[i] with array[i-1], array[i-2], array[i-3], ... in turn to find the insertion position, then insert array[i] there; the elements originally at and after that position are shifted back by one.

        During the comparison (for ascending order), array[i] is first compared with the element just before it. If it is smaller, it is compared with the element before that, and so on: as soon as an element not larger than it is found, array[i] is inserted right after that element; if it is smaller than everything before it, it is inserted at the beginning of the array.

void InsertSort(int* a, int n)//ascending order
{
	assert(a);
	for (int i = 0; i < n - 1; ++i)
	{
		int end = i;
		int tmp = a[end + 1];//save the value, since it would be overwritten while shifting
		while (end >= 0)
		{
			if (a[end] > tmp)
			{
				a[end + 1] = a[end];//shift the element back to make room
				--end;//keep comparing toward the front
			}
			else
			{
				break;//the insertion position has been found
			}
		}
		a[end + 1] = tmp;
	}
}
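
        A quick way to try it out. This small test harness and the Print helper are additions here, not part of the original code; the Shell sort further down also calls a Print helper with this shape.

#include <stdio.h>

//print the first n elements on one line
void Print(int* a, int n)
{
	for (int i = 0; i < n; ++i)
		printf("%d ", a[i]);
	printf("\n");
}

int main(void)
{
	int a[10] = { 9, 1, 2, 5, 7, 4, 8, 6, 3, 5 };
	InsertSort(a, 10);
	Print(a, 10);//expected output: 1 2 3 4 5 5 6 7 8 9
	return 0;
}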

        Summary of the features of direct insertion sort:

        1. The time complexity is O(N^2).

        2. The closer the sequence is to being ordered, the lower the actual cost.

        3. Stability: stable.

        4. Space complexity: O(1); it sorts in place.

                3.1.2 Shell sort

        Shell sort is also called the diminishing increment sort. Its basic idea is: first choose an integer gap and divide the records to be sorted into groups, so that all records whose positions differ by gap fall into the same group, then sort the records within each group. Repeat this grouping and sorting while reducing gap; when gap = 1, all the records are sorted as a single group.

        Put plainly, Shell sort starts from an observation about direct insertion sort: although its worst-case time complexity is high, direct insertion sort is very efficient when the data is ordered or nearly ordered. So how do we make the data nearly ordered? By pre-sorting. The pre-sorting passes bring the array close to order, so the final direct insertion pass is very fast. In this way Shell sort is a good optimization of direct insertion sort.

        Shell sort consists of two steps:

        1. Pre-sort the array

        Divide the array into groups according to the gap, sort the elements that are gap apart within each group, then reduce the gap and repeat the process.

        2. Direct insertion sort

        When gap equals 1, the pass is exactly a direct insertion sort. Doesn't it sound simple, hhhh.

 


//Shell sort
void ShellSort(int* a, int n)
{
	assert(a);//make sure the pointer is not null
	int gap = n;
	while (gap > 1)
	{
		gap = gap / 3 + 1;//guarantees the last pass uses a gap of 1; shrinking gap by about a third each time works well in practice
		for (int i = 0; i < n - gap; ++i)//ascending order
		{
			int end = i;
			int tmp = a[end + gap];//save the value so it is not overwritten while shifting
			while (end >= 0)
			{
				if (a[end] > tmp)
				{
					a[end + gap] = a[end];//shift the element back and keep comparing toward the front
					end = end - gap;
				}
				else
				{
					break;
				}
			}
			a[end + gap] = tmp;//put the value into its position in the group
		}
		Print(a, 10);//debug output: print the intermediate result after each gap pass
		printf("  gap = %d\n", gap);
		printf("\n");
	}

}

       Summary of the features of Shell sort:

        1. Shell sort is an optimization of direct insertion sort.

        2. When the gap is greater than 1, the passes are pre-sorting, whose purpose is to bring the array closer to order. When gap == 1, the array is already almost ordered, so the final pass sorts it quickly. Together this gives an overall optimization; after implementing it, we can compare the two with performance tests.

        3. The time complexity of Shell sort is hard to compute, because it depends on how the gap sequence is chosen; for this reason many books do not give a fixed time complexity for Shell sort.

 

        Because the gap here shrinks according to Knuth's method (gap = gap / 3 + 1), the complexity is usually estimated accordingly, at roughly O(N^1.25) to O(1.6 * N^1.25).

         

        3.2 Selection sort

        Basic idea: each time, select the smallest (or largest) element from the data yet to be sorted and place it at the beginning (or end) of the sequence, until the whole array is sorted.

                3.2.1 Heap sort

        See: Heap sort 
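
        Since the article only links to a separate heap sort post here, the following is a minimal sketch of the usual approach for ascending order, not the linked post's exact code. It builds a max heap, then repeatedly swaps the top with the last element of the shrinking heap and sifts down; it reuses the same Swap helper that the selection sort below also assumes.

//sift the element at position parent down through a max heap of size n
void AdjustDown(int* a, int n, int parent)
{
	int child = parent * 2 + 1;//left child
	while (child < n)
	{
		//pick the larger of the two children
		if (child + 1 < n && a[child + 1] > a[child])
		{
			++child;
		}
		if (a[child] > a[parent])
		{
			Swap(&a[child], &a[parent]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			break;
		}
	}
}

void HeapSort(int* a, int n)
{
	//build a max heap, starting from the last non-leaf node
	for (int i = (n - 1 - 1) / 2; i >= 0; --i)
	{
		AdjustDown(a, n, i);
	}
	//repeatedly move the current maximum to the end and shrink the heap
	int end = n - 1;
	while (end > 0)
	{
		Swap(&a[0], &a[end]);
		AdjustDown(a, end, 0);
		--end;
	}
}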

                3.2.2 Direct selection sort

         Sorting in ascending order: in the element set array[i] .. array[n-1], select the element with the largest (smallest) key.

        If it is not the last (first) element of the set, swap it with the last (first) element of the set.

        Repeat the above steps on the remaining set array[i] .. array[n-2] (array[i+1] .. array[n-1]) until only one element is left.

void SelectSort(int* a, int n)
{
	assert(a);//make sure a is valid
	//ascending order
	int left = 0;
	int right = n - 1;
	while (left < right)
	{
		int maxDex = right;
		int minDex = left;
		//scan the remaining range, find the largest and smallest values,
		//then move the largest to position right and the smallest to position left
		for (int i = left; i <= right; ++i)
		{
			if (a[maxDex] < a[i])
			{
				maxDex = i;//record the index of the maximum
			}
			if (a[minDex] > a[i])
			{
				minDex = i;//record the index of the minimum
			}
		}
		Swap(&a[minDex], &a[left]);
		if (left == maxDex)//the maximum was at the far left, so the swap above moved it to index minDex
			maxDex = minDex;
		Swap(&a[maxDex], &a[right]);
		left++;
		right--;
	}
}
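
        The Swap helper used above is not shown in the article; a simple version might look like this (an assumption about a helper the original code relies on):

//swap two ints through their addresses
void Swap(int* p1, int* p2)
{
	int tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}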

         Its time complexity is O(N^2).


        3.3 Exchange sort

                3.3.1 Bubble sort

        See the bubble sort post for details; a sketch is given below.
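
        A minimal sketch of the standard bubble sort for ascending order, not the linked post's exact code: each pass pushes the current maximum to the end by swapping adjacent out-of-order pairs, and an early-exit flag stops the sort once a whole pass makes no swap. It reuses the Swap helper shown above.

void BubbleSort(int* a, int n)
{
	for (int i = 0; i < n - 1; ++i)
	{
		int swapped = 0;
		for (int j = 0; j < n - 1 - i; ++j)
		{
			if (a[j] > a[j + 1])
			{
				Swap(&a[j], &a[j + 1]);//push the larger value toward the end
				swapped = 1;
			}
		}
		if (swapped == 0)
		{
			break;//no swap in a full pass: the array is already ordered
		}
	}
}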

                3.3.2 Quick sort 

        See the quick sort post for details; a sketch of the single-pass partition it uses is given below.
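
        The linked post is not reproduced here, so as a rough stand-in, a Hoare-style single-pass partition in the shape of the PartSort1 function called by the non-recursive version below might look like this. The name and signature are inferred from how it is used below, not taken from the original post.

//one partition pass: picks a[left] as the pivot and returns the pivot's final index
int PartSort1(int* a, int left, int right)
{
	int keyi = left;//index of the pivot value
	while (left < right)
	{
		//from the right, find a value smaller than the pivot
		while (left < right && a[right] >= a[keyi])
			--right;
		//from the left, find a value larger than the pivot
		while (left < right && a[left] <= a[keyi])
			++left;
		Swap(&a[left], &a[right]);
	}
	Swap(&a[keyi], &a[left]);//put the pivot into its final place
	return left;
}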

        Besides the recursive implementation of quick sort, there is also a non-recursive one. So how do we implement quick sort without recursion? Let's try it together. We know that the recursive version works through the stack frames created by function calls; the non-recursive version simulates that calling process with the stack data structure.

        Although the data-structure stack and the operating-system stack are not the same thing, their nature is the same (last in, first out). So how do we simulate the recursion with a stack?

        Code: 

// quick sort, non-recursive implementation
void QuickSortNonR(int* a, int begin, int end)
{
	//create and initialize the stack
	Stack st;
	StackInit(&st);
	//push the interval [begin, end] onto the stack
	StackPush(&st, end);
	StackPush(&st, begin);
	//use the stack to simulate the calls made by the recursive quick sort;
	//the data-structure stack behaves like the call stack (last in, first out)
	while (!StackEmpty(&st))
	{
		int left = StackTop(&st);
		StackPop(&st);
		int right = StackTop(&st);
		StackPop(&st);//right was pushed first and left second, so left is popped first and right second
		int midi = PartSort1(a, left, right);//one partition pass of quick sort on this sub-interval

		//push both sub-intervals onto the stack
		if (midi + 1 < right)//the right sub-interval still has at least two elements to sort
		{
			StackPush(&st, right);
			StackPush(&st, midi + 1);
		}
		if (left < midi - 1)//the left sub-interval still has at least two elements to sort
		{
			StackPush(&st, midi - 1);
			StackPush(&st, left);
		}
	}
	StackDestory(&st);
}
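
        The Stack above is the author's own data-structure stack, which is not shown in this article. A minimal array-based version with matching function names could look like this; the signatures are assumptions inferred from the calls above.

#include <stdlib.h>
#include <assert.h>

typedef struct Stack
{
	int* data;
	int top;//index of the next free slot
	int capacity;
} Stack;

void StackInit(Stack* st)
{
	st->data = NULL;
	st->top = 0;
	st->capacity = 0;
}

void StackPush(Stack* st, int x)
{
	if (st->top == st->capacity)
	{
		//grow the array when it is full
		int newCap = st->capacity == 0 ? 4 : st->capacity * 2;
		int* tmp = (int*)realloc(st->data, sizeof(int) * newCap);
		assert(tmp);
		st->data = tmp;
		st->capacity = newCap;
	}
	st->data[st->top++] = x;
}

int StackTop(Stack* st)
{
	assert(st->top > 0);
	return st->data[st->top - 1];
}

void StackPop(Stack* st)
{
	assert(st->top > 0);
	--st->top;
}

int StackEmpty(Stack* st)
{
	return st->top == 0;//nonzero when the stack is empty
}

void StackDestory(Stack* st)
{
	free(st->data);
	st->data = NULL;
	st->top = st->capacity = 0;
}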

        3.4 Merge sort

                3.4.1 Merge sort 

                 Merge sort is an efficient sorting algorithm based on the merge operation and a typical application of the divide-and-conquer method: already ordered subsequences are merged to obtain a completely ordered sequence. In other words, first make each subsequence ordered, then merge the subsequences so that the whole interval becomes ordered. Merging two ordered lists into one ordered list is called a two-way merge. The core steps of merge sort are: split the interval into two halves, recursively sort each half, then merge the two ordered halves.

Code:

//one merge pass: merge two ordered sub-intervals into one (closed intervals)
//note: tmp is reallocated inside, so the caller can simply pass NULL
void _MergeSortSignal(int *a, int begin1, int end1, int begin2, int end2, int *tmp)
{
	int begin = begin1;//remember where the merged range starts, to copy back later
	int count = end2 - begin1 + 1;//total number of elements to merge
	tmp = (int*)malloc(sizeof(int) * count);//scratch space for this merge
	assert(tmp);
	int i = 0;
	while (begin1 <= end1 && begin2 <= end2)
	{
		if (a[begin1] < a[begin2])
		{
			tmp[i++] = a[begin1++];
		}
		else
		{
			tmp[i++] = a[begin2++];
		}
	}
	//append whatever remains of the interval that has not been exhausted
	while (begin1 <= end1)
	{
		tmp[i++] = a[begin1++];
	}
	while (begin2 <= end2)
	{
		tmp[i++] = a[begin2++];
	}
	//copy the merged data back into the original array
	for (int j = 0; j < count; ++j)
	{
		a[begin + j] = tmp[j];
	}

	free(tmp);

 }
// merge sort, recursive implementation
void _MergeSort(int* a, int left, int right, int * tmp)
{
	if (left >= right)//the interval has at most one element
	{
		return;
	}
	int midi = (left + right) / 2;
	_MergeSort(a, left, midi, tmp);
	_MergeSort(a, midi + 1, right, tmp);
	//merge the two ordered sub-intervals
	_MergeSortSignal(a, left, midi, midi + 1, right, tmp);

}
void MergeSort(int* a, int n)
{
	//_MergeSortSignal allocates its own scratch space, so NULL can be passed here
	_MergeSort(a, 0, n - 1, NULL);//closed interval [left, right]
}

 

       Non-recursive method of merge sort: 

        If we tried to simulate it with a stack, the problem would become more complicated. Merge sort keeps shrinking the range to be sorted until each range is ordered, and a range containing only one number is certainly ordered. So we take that idea and go in the other direction: first merge adjacent single numbers, with gap equal to 1. After that pass, every adjacent pair of numbers is ordered, so next we merge adjacent sub-intervals of length 2, with gap = 2, and so on, doubling the gap each time. When does it end? Once the gap is greater than or equal to the length of the array, the whole array is ordered and there is nothing left to merge.

       Note: when dividing sub-intervals to merge, the second interval may be shorter than the first, or may not exist at all. So we must either correct the boundary of the second interval, or skip the merge for that pass when only the first interval exists.

//merge two ordered sub-intervals into one (closed intervals)
//note: tmp is reallocated inside, so the caller can simply pass NULL
void _MergeSortSignal(int *a, int begin1, int end1, int begin2, int end2, int *tmp)
{
	int begin = begin1;//remember where the merged range starts, to copy back later
	int count = end2 - begin1 + 1;//total number of elements to merge
	tmp = (int*)malloc(sizeof(int) * count);//scratch space for this merge
	assert(tmp);
	int i = 0;
	while (begin1 <= end1 && begin2 <= end2)
	{
		if (a[begin1] < a[begin2])
		{
			tmp[i++] = a[begin1++];
		}
		else
		{
			tmp[i++] = a[begin2++];
		}
	}
	//append whatever remains of the interval that has not been exhausted
	while (begin1 <= end1)
	{
		tmp[i++] = a[begin1++];
	}
	while (begin2 <= end2)
	{
		tmp[i++] = a[begin2++];
	}
	//copy the merged data back into the original array
	for (int j = 0; j < count; ++j)
	{
		a[begin + j] = tmp[j];
	}

	free(tmp);

 }
// merge sort, non-recursive implementation
void MergeSortNonR(int* a, int n)
{
	//idea: simulating the recursion with a stack would make things more complicated, so a loop is used instead.
	//merge directly: the first pass merges adjacent single elements (gap is 1); the next pass merges
	//the intervals [i, i+gap-1] and [i+gap, i+2*gap-1] with gap doubled, and so on,
	//until gap is no longer smaller than the length of the array.
	int gap = 1;
	while (gap < n)
	{
		//one merge pass over the whole array
		for (int i = 0; i < n; i += 2 * gap)
		{
			//closed intervals
			//[i, i+gap-1] and [i+gap, i+2*gap-1]
			int begin1 = i, end1 = i + gap - 1;
			int begin2 = i + gap, end2 = i + 2 * gap - 1;

			//merge the two intervals into one
			if (begin2 >= n)
			{
				break;//the second interval does not exist; only the first group is left, nothing to merge this pass
			}
			if (end2 >= n)
			{
				//the boundary of the second interval has to be corrected
				end2 = n - 1;
			}
			_MergeSortSignal(a, begin1, end1, begin2, end2, NULL);
		}
		gap *= 2;
	}
}

 

               3.4.2 Merge sort application: external sort

           Merge sort differs from the other sorts. The others are only suitable for sorting data that fits in memory, but merge sort can do more: when there is so much data that it cannot fit in memory and can only live in files, the other sorts are not very usable, while merge sort can still sort the data directly in files. This is why merge sort is also an external sort.

        Now let's simulate a scenario: suppose there is a massive amount of data that cannot be loaded into memory at once. Write a program that sorts the data and saves the result to a file.

        The idea is:

        1. First split the data into many parts, each small enough to be loaded into memory at once and sorted there.

        2. Sort each part in memory (quick sort is used here) and write it to its own sub-file, so that every sub-file is ordered.

        3. At this point the precondition for merging is met: every subsequence is ordered. Now read data from two files at a time, merge them while comparing into a new file, and continue in the same way until all the ordered sub-files have been merged into a single file. The data in that final file is fully ordered.

        

Code:

//merge the ordered data in two files into one file, keeping the result ordered
void _MergeFile(const char* file1, const char* file2, const char* mfile)
{
	FILE* fout1 = fopen(file1, "r");
	if (fout1 == NULL)
	{
		printf("fout1: failed to open file\n");
		exit(-1);
	}
	FILE* fout2 = fopen(file2, "r");
	if (fout2 == NULL)
	{
		printf("fout2: failed to open file\n");
		exit(-1);
	}
	FILE* fin = fopen(mfile, "w");
	if(fin == NULL)
	{
		printf("fin: failed to open file\n");
		exit(-1);
	}
	int num1, num2;
	int ret1 = fscanf(fout1, "%d\n", &num1);
	int ret2 = fscanf(fout2, "%d\n", &num2);
	//read from both files and merge
	while (ret1 != EOF && ret2 != EOF)
	{
		if (num1 < num2)
		{
			fprintf(fin, "%d\n", num1);
			//read the next value from the file fout1 points to
			ret1 = fscanf(fout1, "%d\n", &num1);
		}
		else
		{
			fprintf(fin, "%d\n", num2);
			//read the next value from the file fout2 points to
			ret2 = fscanf(fout2, "%d\n", &num2);
		}
	}
	//copy whatever is left of the file that has not been exhausted
	while (ret1 != EOF)
	{
		fprintf(fin, "%d\n", num1);

		ret1 = fscanf(fout1, "%d\n", &num1);
	}
	while (ret2 != EOF)
	{
		fprintf(fin, "%d\n", num2);

		ret2 = fscanf(fout2, "%d\n", &num2);
	}
	fclose(fout1);
	fclose(fout2);
	fclose(fin);
}
void MergeSortFile(const char* file)//external merge sort on a file
{
	//open the file of unsorted data
	FILE* fout = fopen(file, "r");
	if (fout == NULL)
	{
		printf("failed to open file\n");
		exit(-1);
	}
	int n = 10;//this demo assumes the data splits into exactly 10 chunks of 10 values; a leftover partial chunk is not handled
	int a[10] = { 0 };
	char subr[1024];

	int num = 0;
	int i = 0;
	int fileI = 1;
	while (fscanf(fout, "%d\n", &num) != EOF)
	{
		if (i < n - 1)
		{
			a[i++] = num;
		}
		else
		{
			a[i] = num;
			QuickSort(a, 0, n - 1);//sort this chunk in memory
			sprintf(subr, "%d", fileI++);//sub-files are named "1", "2", "3", ...
			FILE* fin = fopen(subr, "w");
			if (fin == NULL)
			{
				printf("failed to open file\n");
				exit(-1);
			}
			//write the sorted chunk to the sub-file
			for (int j = 0; j < n; ++j)
			{
				fprintf(fin, "%d\n", a[j]);
			}
			//close the sub-file and reset the index for the next chunk
			i = 0;
			fclose(fin);
		}

	}

	//external sort: keep merging pairs of files until everything is in one ordered file

	char file1[100] = "1";
	char file2[100];
	char mfile[100] = "12";
	for (int i = 2; i <= n; ++i)
	{
		sprintf(file2, "%d", i);
		//read file1 and file2 and merge them into mfile
		_MergeFile(file1, file2, mfile);

		strcpy(file1, mfile);
		sprintf(mfile + strlen(mfile), "%d", i + 1);//extend the merged file's name: "12", "123", ...
	}
	fclose(fout);
}

 

        3.5 Non-comparison sort

        As the name suggests, a non-comparison sort orders the data without comparing elements against each other. The one introduced here is counting sort.

Counting sort, also called pigeonhole sort, is a variant application of direct-address hashing. Its steps are:

        1. Count how many times each value appears.

        2. Write the values back into the original sequence according to those counts.

 

Code:

// counting sort
void CountSort(int* a, int n)
{
	//first scan the array to find the maximum and minimum, which determine the range
	int max = a[0];
	int min = a[0];
	for (int i = 0; i < n; ++i)
	{
		if (max < a[i])
		{
			max = a[i];
		}
		if (min > a[i])
		{
			min = a[i];
		}
	}
	//allocate counting space according to the range between min and max
	int range = max - min + 1;
	int* CountArray = (int*)calloc(range, sizeof(int));
	//count how many times each value appears in the original array
	for (int i = 0; i < n; ++i)
	{
		CountArray[a[i] - min]++;//use the offset from min as the index
	}

	//write the counted values back over the original array
	int j = 0;
	for (int i = 0; i < range; ++i)
	{
		while (CountArray[i]--)
		{
			a[j++] = i + min;//add min back to recover the actual value
		}
	}
	//free the temporary space
	free(CountArray);
}

 

        Summary of the features of counting sort:

  1. Counting sort is very efficient when the range of the data is concentrated, but its applicable scope and scenarios are limited.

  2. Time complexity: O(max(N, range))

  3. Space complexity: O(range)

   

4. Analysis of the complexity and stability of sorting algorithms

        Summary (average time complexity / space complexity / stability):

        Direct insertion sort: O(N^2) / O(1) / stable
        Shell sort: about O(N^1.3) / O(1) / unstable
        Direct selection sort: O(N^2) / O(1) / unstable
        Heap sort: O(N*logN) / O(1) / unstable
        Bubble sort: O(N^2) / O(1) / stable
        Quick sort: O(N*logN) / O(logN) / unstable
        Merge sort: O(N*logN) / O(N) / stable
        Counting sort: O(max(N, range)) / O(range) / stable in its classic form

        What is stability? Roughly speaking, it means that the relative positions of equal elements in the array do not change after sorting. What difference does stability make? It matters in some special scenarios. For example, in a competition the top three receive awards. Suppose the top five scores are 99, 98, 97, 97, 97: the top three cannot be decided directly, because third, fourth and fifth have the same score. So there is an extra rule: among equal scores, whoever finished in less time ranks first. In that case a stable sort determines the top three and is fair to everyone, while an unstable sort could produce an unfair result.

        Selection sort is unstable: if a group of values contains several equal maximums, which one gets picked and moved can change their relative order.

        Heap sort is also unstable: building the heap and selecting the top element both require sifting down, and sifting down can change the relative order of equal elements.

         Quick sort is also unstable, because when elements are compared and swapped around the chosen pivot, the relative order of equal elements can change.

        Shell sort is also unstable, because during pre-sorting equal values may be placed into different groups, so their relative order cannot be guaranteed. The counting sort written here only records how many times each value appears and then regenerates the values, so it does not track the original records and cannot guarantee the relative positions of equal elements either.

Origin blog.csdn.net/m0_68641696/article/details/132574459