Sorting: direct insertion sort, Shell sort, direct selection sort, and heap sort (C language)

Table of contents

Foreword:

One: insertion sort

(1) Direct insertion sort

Basic idea (just get an impression; focus on the single pass)

single pass sort

full sort

Time Complexity Analysis

(2) Shell sort

Basic idea (just get an impression; focus on the single pass)

single pass sort

full sort

Time Complexity Analysis

Two: selection sort

(1) Direct selection sort

basic idea

single pass sort

full sort

Time Complexity Analysis

(2) Heap sort

Three: Performance testing


Foreword:

 The importance of sorting:

1. Data storage and retrieval: For large-scale data storage and retrieval, sorting can speed up lookups. For example, when we need to find data within a certain range in a database, if the data is already sorted we can use binary search instead of traversing the elements one by one.

2. Data analysis: Sorting is also the basis of many statistical analysis algorithms, machine learning algorithms, and data processing algorithms. For example, in statistical analysis, we need to sort a large amount of data to find the median or calculate quantiles.

3. Algorithm optimization: Sorting algorithm optimization has always been a research hotspot in the field of computer science. An efficient sorting algorithm can shorten the running time of the program and improve the performance of the computer system.

4. Practicality: Sorting algorithms are widely used in many real-life situations, such as the classification of books by libraries, the classification of mail by mail systems, and the rearrangement of commodities by shopping malls.
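
Point 1 can be made concrete with a minimal binary search sketch over an ascending int array. The function name BinarySearch and the half-open search range are assumptions made for illustration; they do not come from the original article.

```c
#include <assert.h>

// Hypothetical helper (not from the article): binary search in an
// ascending int array. Returns the subscript of target, or -1 if absent.
int BinarySearch(const int* a, int n, int target)
{
	assert(a);
	int left = 0;
	int right = n;	// search the half-open range [left, right)
	while (left < right)
	{
		// Written this way to avoid overflow of left + right
		int mid = left + (right - left) / 2;
		if (a[mid] < target)
		{
			left = mid + 1;
		}
		else if (a[mid] > target)
		{
			right = mid;
		}
		else
		{
			return mid;
		}
	}
	return -1;
}
```

Each step halves the range, so a lookup needs about log2(N) comparisons instead of up to N, but only because the array is sorted first.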

All sorts in this article are in ascending order; for descending order, reverse the comparisons
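
As a small sketch of what reversing the comparison means, here is direct insertion sort (introduced in the next section) written for descending order. The name InsertSortDesc is hypothetical; only the comparison direction differs from the ascending version.

```c
#include <assert.h>

// Hypothetical variant (not from the article): direct insertion sort in
// descending order. The only change is '<' instead of '>' in the inner loop.
void InsertSortDesc(int* a, int n)
{
	assert(a);
	for (int i = 0; i < n - 1; i++)
	{
		int end = i;
		int x = a[end + 1];
		// Shift smaller elements back so that larger values come first
		while (end >= 0 && a[end] < x)
		{
			a[end + 1] = a[end];
			end--;
		}
		a[end + 1] = x;
	}
}
```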

One: insertion sort

(1) Direct insertion sort

Basic idea (just get an impression; focus on the single pass)

1. Divide the sequence to be sorted into a sorted interval and an unsorted interval. At the beginning, the sorted interval contains only the first element of the sequence.

2. Take each unsorted element in turn, compare it with the elements of the sorted interval, find its insertion position, and insert it there.

3. Repeat step 2 until all elements have been traversed.

single pass sort

Let's look at the following ordered array a:

If we append the number 5 to the end, how should the array be adjusted so that it is ordered again?

By observation, 5 should be inserted right after 4

① Use a variable end to record the subscript of the last element of the original (ordered) array, and a variable x to save the value being inserted (5)

② If a[end] is greater than x, copy a[end] one slot back to a[end+1] (in the example, 10 overwrites the slot that held 5, which is why 5 was saved in x), then decrement end

③ Repeat the previous step until a[end] is not greater than x, or end reaches subscript -1 (in which case every element has been shifted back); either way, the loop ends

④ Store x at subscript end+1 (this is the insertion position whether the loop ended by break or by end running off the front)

Illustration:

Code (single trip):

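	// A single pass only illustrates the core idea
	// end is the subscript of the last element of the ordered part;
	// here n counts the appended element as well, so end starts at n - 2
	int end = n - 2;
	int x = a[end + 1];
	// If end < 0, x is smaller than every element of the ordered part
	while (end >= 0)
	{
		// If a[end] is greater, shift it back; x keeps the value being inserted
		if (a[end] > x)
		{
			a[end + 1] = a[end];
			end--;
		}
		// Otherwise the insertion position has been found
		else
		{
			break;
		}
	}
	a[end + 1] = x;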
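
To try the single pass on concrete data, the fragment can be wrapped in a function. This is only a sketch: the name InsertLast and the sample values mentioned below are assumptions, since the article's illustration is not available here.

```c
#include <assert.h>

// Hypothetical wrapper (not from the article): a[0..n-2] is assumed to be
// in ascending order already, and a[n-1] is the newly appended value.
void InsertLast(int* a, int n)
{
	assert(a && n >= 1);
	int end = n - 2;	// last subscript of the ordered part
	int x = a[end + 1];	// value to insert
	while (end >= 0 && a[end] > x)
	{
		a[end + 1] = a[end];	// shift the larger element one slot back
		end--;
	}
	// end + 1 is the insertion position for both ways of leaving the loop
	a[end + 1] = x;
}
```

For example, calling InsertLast on {1, 2, 4, 10, 5} with n = 5 shifts 10 back by one slot and stores 5 after 4.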

full sort

The idea of the full sort is to let end start at subscript 0 and advance by one after each single pass

First put the first two numbers in order, then increment end

Then put the first three numbers in order, and increment end again

Repeat the above steps until the entire array is sorted

Illustration:

code:

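#include <assert.h>

//Direct insertion sort
void InsertSort(int* a, int n)
{
	// Assertion: a null pointer must not be passed
	assert(a);

	for (int i = 0; i < n - 1; i++)
	{
		int end = i;
		int x = a[end + 1];
		// If end < 0, x is smaller than every element of the sorted part
		while (end >= 0)
		{
			// If a[end] is greater, shift it back; x keeps the value being inserted
			if (a[end] > x)
			{
				a[end + 1] = a[end];
				end--;
			}
			// Otherwise the insertion position has been found
			else
			{
				break;
			}
		}
		a[end + 1] = x;
	}
}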

Time Complexity Analysis

① The best case for direct insertion sort is an input that is already ordered; the time complexity is then O(N). The worst case is an input in reverse order, where the time complexity is O(N^2). The average time complexity is also O(N^2)

② This can be checked with a rough count. Assume the sequence to be sorted has length N. The outer loop runs N-1 times. For each element, the inner loop compares backwards item by item until it finds the insertion position or reaches the first item. As a rough estimate, the inner loop scans about half of the elements, that is, about N/2 comparisons. So the total is roughly T(N) = (N-1)*(N/2) = (N^2-N)/2 = O(N^2)

③ It can be seen that the time complexity of direct insertion sort is O(N^2) and the space complexity is O(1). It is easy to understand and implement, and is especially suitable for short sequences. For large data sets, however, its performance is unsatisfactory, and a more efficient sorting algorithm is needed.
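
The best and worst cases described above can be checked by counting comparisons. The sketch below is an instrumented copy of the same algorithm; the name InsertSortCount and the counter are assumptions added for illustration.

```c
#include <assert.h>

// Hypothetical instrumented copy (not from the article): same algorithm as
// InsertSort, but it returns how many element comparisons were made.
long InsertSortCount(int* a, int n)
{
	assert(a);
	long comparisons = 0;
	for (int i = 0; i < n - 1; i++)
	{
		int end = i;
		int x = a[end + 1];
		while (end >= 0)
		{
			comparisons++;	// one comparison of a[end] with x
			if (a[end] > x)
			{
				a[end + 1] = a[end];
				end--;
			}
			else
			{
				break;
			}
		}
		a[end + 1] = x;
	}
	return comparisons;
}
```

On an already ordered array of 5 elements this returns 4 (N-1), and on a reversed array of 5 elements it returns 10 (N*(N-1)/2), matching the O(N) and O(N^2) cases.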

(2) Shell sort

Shell sort is an optimization of direct insertion sort

Basic idea (just get an impression; focus on the single pass)

① First choose an integer gap and divide the data in the array to be sorted into gap groups
② All elements whose subscripts are gap apart belong to the same group, and each group is sorted (this process is called pre-sorting; it brings the array close to ordered).
③ Then take a smaller gap and repeat the grouping and sorting. When gap = 1, that pass completes the sort of the whole array (sorting with gap = 1 is equivalent to direct insertion sort).
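
To see which elements end up in the same group, the following sketch copies out one group for a given gap. The helper name CollectGroup, its parameters, and the sample array in the note below are assumptions for illustration only.

```c
#include <assert.h>

// Hypothetical helper (not from the article): copies the elements of the
// group that starts at subscript start (0 <= start < gap) into out[] and
// returns how many elements that group has.
int CollectGroup(const int* a, int n, int gap, int start, int* out)
{
	assert(a && out && gap > 0 && start >= 0);
	int count = 0;
	// Members of one group are exactly gap apart
	for (int i = start; i < n; i += gap)
	{
		out[count++] = a[i];
	}
	return count;
}
```

With the assumed array {9, 1, 2, 5, 7, 4, 8, 6, 3, 5} and gap = 3, the group starting at 0 is {9, 5, 8, 5}, the group starting at 1 is {1, 7, 6}, and the group starting at 2 is {2, 4, 3}.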

single pass sort

For the first pass we take gap = 3 and group the following array

Each group is sorted in basically the same way as in direct insertion sort above

The only differences are that end now moves by gap instead of 1, the value to insert is a[end+gap] instead of a[end+1], and the insertion position is end+gap instead of end+1

There are two implementations (they are essentially the same)

① Add an extra outer loop and sort the groups one at a time (the number of groups is gap)

Illustration:

Sort the red group

Sort the green group (the blue group is already in order, so it only needs to be checked)

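code:

	// A single pass only shows the core idea
	gap = 3;
	// Outermost loop: one iteration per group
	for (i = 0; i < gap; i++)
	{
		// Pre-sort one group; j starts at i so that every group is visited
		for (j = i; j < n - gap; j += gap)
		{
			int end = j;
			int x = a[end + gap];
			while (end >= 0)
			{
				if (a[end] > x)
				{
					a[end + gap] = a[end];
					end -= gap;
				}
				else
				{
					break;
				}
			}
			// Write x once, after the insertion position is known
			a[end + gap] = x;
		}
	}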

② Do not handle the groups one at a time; walk the array from left to right and insert each element into its own group

code:

	gap = 3;
	// Pre-sort
	for (i = 0; i < n - gap; i++)
	{
		int end = i;
		int x = a[end + gap];
		while (end >= 0)
		{
			if (a[end] > x)
			{
				a[end + gap] = a[end];
				end -= gap;
			}
			else
			{
				break;
			}
		}
		// Write x once, after the insertion position is known
		a[end + gap] = x;
	}
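
One way to check a pre-sorting pass is to verify that every element is no greater than the element gap positions after it. The sketch below wraps the left-to-right pass in a function and adds such a check; the names GapPass and IsGapSorted are hypothetical.

```c
#include <assert.h>

// Hypothetical wrapper (not from the article): one left-to-right
// pre-sorting pass for the given gap.
void GapPass(int* a, int n, int gap)
{
	assert(a && gap > 0);
	for (int i = 0; i < n - gap; i++)
	{
		int end = i;
		int x = a[end + gap];
		while (end >= 0 && a[end] > x)
		{
			a[end + gap] = a[end];
			end -= gap;
		}
		a[end + gap] = x;
	}
}

// Hypothetical check: returns 1 if every group is in ascending order,
// that is, a[i] <= a[i + gap] for every valid i.
int IsGapSorted(const int* a, int n, int gap)
{
	assert(a && gap > 0);
	for (int i = 0; i + gap < n; i++)
	{
		if (a[i] > a[i + gap])
		{
			return 0;
		}
	}
	return 1;
}
```

For the assumed array {9, 1, 2, 5, 7, 4, 8, 6, 3, 5}, GapPass with gap = 3 gives {5, 1, 2, 5, 6, 3, 8, 7, 4, 9}: each group is ordered, although the array as a whole is not yet.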

full sort

The key to the full sort is controlling the size of gap across multiple pre-sorting passes (and ensuring that gap is 1 in the last pass). Two common choices:

① Start with gap = n (n is the number of array elements) and set gap = gap/2 before each pass

② Start with gap = n (n is the number of array elements) and set gap = gap/3 + 1 before each pass

(Choice ② needs fewer pre-sorting passes, and the +1 guarantees that gap is 1 in the last pass)

Code (this article uses ②):
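
The two ways of shrinking gap can be compared by listing the gaps they produce. This is a sketch only; the helper names GapsHalving and GapsDiv3Plus1 and the output-buffer interface are assumptions.

```c
#include <assert.h>

// Hypothetical helper (not from the article): records the gaps produced by
// gap = gap / 2, starting from n. Returns the number of gaps written.
int GapsHalving(int n, int* out, int cap)
{
	assert(out);
	int count = 0;
	int gap = n;
	while (gap > 1 && count < cap)
	{
		gap = gap / 2;
		out[count++] = gap;
	}
	return count;
}

// Hypothetical helper: the same for gap = gap / 3 + 1.
int GapsDiv3Plus1(int n, int* out, int cap)
{
	assert(out);
	int count = 0;
	int gap = n;
	while (gap > 1 && count < cap)
	{
		gap = gap / 3 + 1;
		out[count++] = gap;
	}
	return count;
}
```

For n = 100, halving gives 50, 25, 12, 6, 3, 1 (six passes), while gap/3 + 1 gives 34, 12, 5, 2, 1 (five passes). Both end with gap = 1.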

//Shell sort
void ShellSort(int* a, int n)
{
	int gap = n;

	// First version: one more level of loop nesting
	// Loop control variables
	//int i = 0;
	//int j = 0;
	//
	//while (gap > 1)
	//{
	//	gap = gap / 3 + 1;
	//	// Outermost loop: one iteration per group
	//	for (i = 0; i < gap; i++)
	//	{
	//		// Pre-sort one group (j starts at i)
	//		for (j = i; j < n - gap; j += gap)
	//		{
	//			int end = j;
	//			int x = a[end + gap];
	//			while (end >= 0)
	//			{
	//				if (a[end] > x)
	//				{
	//					a[end + gap] = a[end];
	//					end -= gap;
	//				}
	//				else
	//				{
	//					break;
	//				}
	//			}
	//			a[end + gap] = x;
	//		}
	//	}
	//}

	// Second version: everything in one loop, more concise
	// Loop control variable
	int i = 0;
	// Each iteration performs one pre-sorting pass with a smaller gap
	while (gap > 1)
	{
		gap = gap / 3 + 1;
		// Pre-sort: insert each element into its own group
		for (i = 0; i < n - gap; i++)
		{
			int end = i;
			int x = a[end + gap];
			while (end >= 0)
			{
				if (a[end] > x)
				{
					a[end + gap] = a[end];
					end -= gap;
				}
				else
				{
					break;
				}
			}
			// Write x once, after the insertion position is known
			a[end + gap] = x;
		}
	}
}

Time Complexity Analysis

The time complexity of Shell sort depends heavily on the choice of the gap sequence.

No known gap sequence brings the worst-case time complexity of Shell sort down to O(N*logN). Experiments are commonly cited as showing that, with the sequence produced by gap = gap/3 + 1 (a variant of the increments suggested by Knuth), the running time is about O(N^1.3) in most cases. This is better than the O(N^2) of insertion sort, but not as good as quick sort and merge sort.

In theory, the worst-case time complexity also depends on the gap sequence; with the simple halving sequence (gap = gap/2) it is O(N^2), the same as insertion sort. The best-case time complexity is likewise related to the choice of the gap sequence.
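
The difference from insertion sort can be observed by counting comparisons. The sketch below is an instrumented copy of the gap = gap/3 + 1 version; the name ShellSortCount and the counter are assumptions, and a count on one input is an observation, not a complexity proof.

```c
#include <assert.h>

// Hypothetical instrumented copy (not from the article): Shell sort with
// gap = gap / 3 + 1 that returns the number of element comparisons.
long ShellSortCount(int* a, int n)
{
	assert(a);
	long comparisons = 0;
	int gap = n;
	while (gap > 1)
	{
		gap = gap / 3 + 1;
		for (int i = 0; i < n - gap; i++)
		{
			int end = i;
			int x = a[end + gap];
			while (end >= 0)
			{
				comparisons++;	// one comparison of a[end] with x
				if (a[end] > x)
				{
					a[end + gap] = a[end];
					end -= gap;
				}
				else
				{
					break;
				}
			}
			a[end + gap] = x;
		}
	}
	return comparisons;
}
```

On a reversed array of 1000 elements, direct insertion sort needs 1000*999/2 = 499500 comparisons; the count returned here can be compared against that figure.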

Two: selection sort

(1) Direct selection sort

basic idea

Each time, select the smallest (or largest) element from the elements still to be sorted and place it at the beginning of the unsorted part; repeat until no unsorted elements remain.

single pass sort

The idea of the single pass is relatively simple. We can optimize it a little: find the maximum and the minimum in the same traversal, which roughly halves the number of passes.

Illustration:

When we do the swaps, a problem appears if end and mini (the subscript of the minimum) coincide

Swapping a[maxi] with a[end] first moves the minimum value to position maxi. In that case mini must be updated to maxi before the second swap


code:

//Swapping is used repeatedly later, so wrap it in a function
void swap(int* x, int* y)
{
	int tmp = *x;
	*x = *y;
	*y = tmp;
}

	int begin = 0;
	int end = n - 1;
	int maxi = begin;
	int mini = begin;

	for (int i = begin; i <= end; i++)
	{
		if (a[mini] > a[i])
		{
			mini = i;
		}
		if (a[maxi] < a[i])
		{
			maxi = i;
		}
	}

	swap(&a[maxi], &a[end]);
	// If mini coincided with end, the minimum now sits at maxi, so fix mini
	if (mini == end)
		mini = maxi;
	swap(&a[mini], &a[begin]);

	end--;
	begin++;
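
The need for the mini correction can be reproduced on a small array. In this sketch the function MinMaxPass, its fix_overlap flag, and the helper SwapInt are hypothetical names added for the demonstration.

```c
#include <assert.h>

// Hypothetical helper (not from the article), same role as swap above
static void SwapInt(int* x, int* y)
{
	int tmp = *x;
	*x = *y;
	*y = tmp;
}

// Hypothetical demo: one min/max pass over a[0..n-1].
// fix_overlap = 1 applies the correction; 0 omits it to expose the bug.
void MinMaxPass(int* a, int n, int fix_overlap)
{
	assert(a && n >= 1);
	int begin = 0;
	int end = n - 1;
	int mini = begin;
	int maxi = begin;
	for (int i = begin; i <= end; i++)
	{
		if (a[i] < a[mini])
		{
			mini = i;
		}
		if (a[i] > a[maxi])
		{
			maxi = i;
		}
	}
	SwapInt(&a[maxi], &a[end]);
	// After the first swap the minimum may have moved from end to maxi
	if (fix_overlap && mini == end)
	{
		mini = maxi;
	}
	SwapInt(&a[mini], &a[begin]);
}
```

For {9, 3, 5, 1}, the maximum is at subscript 0 and the minimum at subscript 3 (which is end). With the correction the pass yields {1, 3, 5, 9}; without it the second swap undoes the first and the array is {9, 3, 5, 1} again.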

full sort

The key to the full sort is that the range between begin and end shrinks after each traversal (end-1, begin+1)

Sorting is complete when begin meets or passes end

Illustration:

code:

//Swapping is used repeatedly later, so wrap it in a function
void swap(int* x, int* y)
{
	int tmp = *x;
	*x = *y;
	*y = tmp;
}

//Selection sort
void SelectSort(int* a, int n)
{
	int begin = 0;
	int end = n - 1;
	while (end > begin)
	{
		int maxi = begin;
		int mini = begin;

		for (int i = begin; i <= end; i++)
		{
			if (a[mini] > a[i])
			{
				mini = i;
			}
			if (a[maxi] < a[i])
			{
				maxi = i;
			}
		}

		swap(&a[maxi], &a[end]);
		// If mini coincided with end, the minimum now sits at maxi, so fix mini
		if (mini == end)
			mini = maxi;
		swap(&a[mini], &a[begin]);

		end--;
		begin++;
	}
}

Time Complexity Analysis

The time complexity of direct selection sort is O(N^2), where N is the length of the array to be sorted.

In the basic form of direct selection sort (one minimum selected per pass), each pass compares the current candidate with all remaining unsorted elements

That is, N-1 comparisons are required in the first pass, N-2 in the second pass, and so on; the last pass requires 1 comparison.

So the number of comparisons is (N−1)+(N−2)+⋯+1

The sum of this arithmetic sequence is N*(N-1)/2, so the total number of comparisons is O(N^2). Selecting the minimum and the maximum in the same pass, as the code above does, halves the number of passes but does not change the O(N^2) order.

In addition, the basic form needs at most N-1 swaps, because each pass places one element with a single swap. The swaps therefore cost O(N), which is low compared with the comparisons and does not affect the overall time complexity.

In general, although direct selection sorting is relatively simple, its time complexity is not good enough. For large-scale data collections, the efficiency will be relatively low.
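
The N*(N-1)/2 figure can be confirmed by counting comparisons in the basic one-minimum-per-pass form. The function name SelectSortBasicCount is hypothetical, and this is a sketch of the basic form rather than the min/max version used in the article's code.

```c
#include <assert.h>

// Hypothetical instrumented sketch (not from the article): basic selection
// sort that picks one minimum per pass and returns the comparison count.
long SelectSortBasicCount(int* a, int n)
{
	assert(a);
	long comparisons = 0;
	for (int begin = 0; begin < n - 1; begin++)
	{
		int mini = begin;
		for (int i = begin + 1; i < n; i++)
		{
			comparisons++;	// one comparison of a[i] with the current minimum
			if (a[i] < a[mini])
			{
				mini = i;
			}
		}
		// One swap per pass puts the minimum at the front of the unsorted part
		int tmp = a[begin];
		a[begin] = a[mini];
		a[mini] = tmp;
	}
	return comparisons;
}
```

The count does not depend on the input order: for N = 6 it is always 6*5/2 = 15, whether the array starts sorted or reversed.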

(2) Heap sort

The implementation of heap sort and its time complexity analysis were covered in detail in the previous article

Previous link: https://blog.csdn.net/2301_76269963/article/details/130157994?spm=1001.2014.3001.5501

Here is a brief introduction to the implementation idea:

1. Build a heap: Build a heap from the array to be sorted. Starting from the last node that has children and moving back to the root, adjust each node downwards so that the array satisfies the heap property.

2. Take the top element of the heap: The top of a max heap is the largest element (for a min heap, the smallest). Swap it with the last element of the heap so that it sits at the end of the array, in its final position.

3. Restore the heap: Shrink the heap by one element and adjust downwards from the root so that the remaining unsorted elements form a max heap (or min heap) again. Repeat steps 2 and 3 until all elements are sorted.

code:

//Swapping is used repeatedly later, so wrap it in a function
void swap(int* x, int* y)
{
	int tmp = *x;
	*x = *y;
	*y = tmp;
}

//Downward adjustment
void AdjustDwon(int* a, int n, int parent)
{
	// Left child
	int child = parent*2 + 1;

	while (child < n)
	{
		// For a min heap, just flip the comparison signs
		// Pick the larger of the left and right children
		// The right child may not exist, so check child+1 < n first
		if (child+1 < n && a[child+1] > a[child])
		{
			child++;
		}

		if (a[child] > a[parent])
		{
			// Swap
			swap(&a[child], &a[parent]);
			// Move down one level
			parent = child;
			child = parent * 2 + 1;
		}
		// If the child is not greater than the parent, end the loop
		else
		{
			break;
		}
	}
}

//Heap sort
void HeapSort(int* a, int n)
{
	int parent = (n-1-1)/2;
	// Build the heap
	while (parent >= 0)
	{
		AdjustDwon(a, n, parent);
		parent--;
	}
	
	// Sort
	int end = n - 1;
	while (end > 0)
	{
		swap(&a[end], &a[0]);
		AdjustDwon(a, end, 0);
		end--;
	}
}
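
The heap-building step can be checked on its own by testing the max-heap property after construction. The names SiftDown, BuildMaxHeap and IsMaxHeap below are hypothetical; SiftDown mirrors the downward adjustment above and is repeated only so that the sketch is self-contained.

```c
#include <assert.h>

// Hypothetical stand-in for the downward adjustment (max heap)
static void SiftDown(int* a, int n, int parent)
{
	int child = parent * 2 + 1;
	while (child < n)
	{
		// Pick the larger child, if a right child exists
		if (child + 1 < n && a[child + 1] > a[child])
		{
			child++;
		}
		if (a[child] <= a[parent])
		{
			break;
		}
		int tmp = a[child];
		a[child] = a[parent];
		a[parent] = tmp;
		parent = child;
		child = parent * 2 + 1;
	}
}

// Hypothetical helper: heap-building step only, from the last parent to the root
void BuildMaxHeap(int* a, int n)
{
	assert(a);
	for (int parent = (n - 2) / 2; parent >= 0; parent--)
	{
		SiftDown(a, n, parent);
	}
}

// Hypothetical check: returns 1 if no child is greater than its parent
int IsMaxHeap(const int* a, int n)
{
	assert(a);
	for (int child = 1; child < n; child++)
	{
		if (a[child] > a[(child - 1) / 2])
		{
			return 0;
		}
	}
	return 1;
}
```

After BuildMaxHeap, a[0] holds the largest element, which is why the sorting phase can repeatedly swap a[0] to the end of the shrinking heap.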

Three: Performance testing

code:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// Compare the performance of the sorts
void TestOP(void)
{
	srand((unsigned int)time(NULL));
	// Randomly generate 100,000 numbers
	const int N = 100000;
	int* a1 = (int*)malloc(sizeof(int) * N);
	int* a2 = (int*)malloc(sizeof(int) * N);
	int* a3 = (int*)malloc(sizeof(int) * N);
	int* a4 = (int*)malloc(sizeof(int) * N);
	// a5 and a6 are reserved for quick sort and merge sort
	/*int* a5 = (int*)malloc(sizeof(int) * N);
	int* a6 = (int*)malloc(sizeof(int) * N);*/

	if (a1==NULL || a2==NULL )
	{
		printf("malloc error\n");
		exit(-1);
	}
	if (a3 == NULL || a4 == NULL)
	{
		printf("malloc error\n");
		exit(-1);
	}
	/*if (a5 == NULL || a6 == NULL)
	{
		printf("malloc error\n");
		exit(-1);
	}*/

	// Every sort gets a copy of the same random data
	for (int i = 0; i < N; ++i)
	{
		a1[i] = rand();
		a2[i] = a1[i];
		a3[i] = a1[i];
		a4[i] = a1[i];
		/*a5[i] = a1[i];
		a6[i] = a1[i];*/
	}

	// clock() returns the processor time used by the program so far, in clock ticks
	clock_t begin1 = clock();
	InsertSort(a1, N);
	clock_t end1 = clock();

	clock_t begin2 = clock();
	ShellSort(a2, N);
	clock_t end2 = clock();

	clock_t begin3 = clock();
	SelectSort(a3, N);
	clock_t end3 = clock();

	clock_t begin4 = clock();
	HeapSort(a4, N);
	clock_t end4 = clock();

	/*clock_t begin5 = clock();
	QuickSort(a5, 0, N - 1);
	clock_t end5 = clock();*/

	/*clock_t begin6 = clock();
	MergeSort(a6, N);
	clock_t end6 = clock();*/
	
	// Times are printed in clock ticks (divide by CLOCKS_PER_SEC for seconds)
	printf("InsertSort:%ld\n", (long)(end1 - begin1));
	printf("ShellSort:%ld\n", (long)(end2 - begin2));
	printf("SelectSort:%ld\n", (long)(end3 - begin3));
	printf("HeapSort:%ld\n", (long)(end4 - begin4));
	//printf("QuickSort:%ld\n", (long)(end5 - begin5));
	//printf("MergeSort:%ld\n", (long)(end6 - begin6));

	free(a1);
	free(a2);
	free(a3);
	free(a4);
	//free(a5);
	//free(a6);
}


int main(void)
{
	// Test the efficiency
	TestOP();
	return 0;
}
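
Timing is only meaningful if every sort also produces a correct result. As a sketch, each result can be compared with the standard library's qsort. The names MatchesQsort, CompareInt and PlaceholderSort are assumptions; PlaceholderSort only stands in for any function with the signature void(int*, int), such as the four sorts in this article.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Comparison callback for qsort (ascending order)
static int CompareInt(const void* p, const void* q)
{
	int x = *(const int*)p;
	int y = *(const int*)q;
	return (x > y) - (x < y);
}

// Hypothetical stand-in for the sort under test
void PlaceholderSort(int* a, int n)
{
	for (int i = 0; i < n - 1; i++)
	{
		int end = i;
		int x = a[end + 1];
		while (end >= 0 && a[end] > x)
		{
			a[end + 1] = a[end];
			end--;
		}
		a[end + 1] = x;
	}
}

// Hypothetical helper: returns 1 if sort_fn orders a copy of src exactly
// as qsort does, 0 otherwise (or if memory cannot be allocated).
int MatchesQsort(void (*sort_fn)(int*, int), const int* src, int n)
{
	assert(sort_fn && src);
	if (n <= 0)
	{
		return 1;
	}
	size_t bytes = sizeof(int) * (size_t)n;
	int* mine = (int*)malloc(bytes);
	int* ref = (int*)malloc(bytes);
	if (mine == NULL || ref == NULL)
	{
		free(mine);
		free(ref);
		return 0;
	}
	memcpy(mine, src, bytes);
	memcpy(ref, src, bytes);
	sort_fn(mine, n);
	qsort(ref, (size_t)n, sizeof(int), CompareInt);
	int same = (memcmp(mine, ref, bytes) == 0);
	free(mine);
	free(ref);
	return same;
}
```

In the test program above, a call such as MatchesQsort(ShellSort, data, N) on a fresh copy of the random data would be one way to confirm a sort before trusting its timing.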




Efficiency comparison:


Origin blog.csdn.net/2301_76269963/article/details/130458617