Sorting algorithms: insertion sort (direct insertion sort, Shell sort)

Hello everyone, we meet again. In this issue I will walk you through the key points of sorting algorithms. If you find it helpful after reading, please leave your like, favorite, and follow. I wish all of you the best!

C language column: C language: from entry to master

Data Structures Column: Data Structures

Personal homepage: stackY,

Table of contents

 

Foreword:

1. The concept of sorting and its application

1.1 The concept of sorting

1.2 Application of sorting

1.3 Common sorting algorithms

2. Implementation of sorting algorithm

2.1 Insertion sort

2.1.1 Basic idea

2.1.2 Direct insertion sort

# Complete code for direct insertion sort:

2.1.3 Shell sort

# Pre-sort

# Direct insertion sort

# Shell sort complete code:

3. Algorithm efficiency comparison


Foreword:

Sorting is everywhere: we use it, directly or indirectly, all the time in daily life, and many complicated things become much simpler once they are sorted. So, with the data structures we have learned so far, how do we actually implement sorting?

1. The concept of sorting and its application

1.1 The concept of sorting

Sorting:
Sorting is the operation of arranging a sequence of records in ascending or descending order according to the size of one or more of their keys.
Stability:
Assume the sequence to be sorted contains several records with equal keys. If the relative order of these records is unchanged after sorting, that is, if r[i] = r[j] and r[i] comes before r[j] in the original sequence, and r[i] still comes before r[j] in the sorted sequence, the sorting algorithm is said to be stable; otherwise it is unstable. For example, when the keys 3a, 1, 3b are sorted in ascending order, a stable sort always produces 1, 3a, 3b.
Internal sorting: a sort in which all data elements are kept in memory.
External sorting: there are too many data elements to hold in memory at the same time, so data must be moved between internal and external memory as the sorting process requires.

1.2 Application of sorting

For example, the applications on our computer desktop can be arranged by a chosen sort order.

For example, when we shop online, we can sort the products.

For example, the 2023 China University Rankings.

All of these are presented with the help of sorting, so it is clear that sorting is widely used in every field.

1.3 Common sorting algorithms

If you want to check whether your sorting algorithm is correct, you can test it with this OJ problem: https://leetcode.cn/problems/sort-an-array/

2. Implementation of sorting algorithm

2.1 Insertion sort

2.1.1 Basic idea

Direct insertion sort is a simple insertion sorting method. Its basic idea is:
insert the records to be sorted one by one, according to their key values, into an already sorted sequence, until all records are inserted and a new fully ordered sequence is obtained.
This is just like playing poker: each card we draw is inserted into its proper place among the cards already in our hand.

2.1.2 Direct insertion sort

When the i-th element (i >= 1) is inserted, array[0], array[1], ..., array[i-1] are already sorted. We compare the key of array[i] with those of array[i-1], array[i-2], ... to find the insertion position, insert array[i] there, and shift the elements after that position back by one.
Ascending order, in short: to insert tmp into the interval [0, end], start comparing from position end. If tmp is smaller than the element at end, move that element back one place, then compare tmp with the element at end-1, and so on (the precondition is that the elements in [0, end] are already in order).

Let's first write the basic logic of a single insertion:

Ascending order: to insert tmp into the interval [0, end], first compare tmp with the element at end. If tmp is smaller, move that element back one place and compare tmp with the element at end-1; if it is still smaller, move that one back as well, and so on, until tmp is no smaller than the current element, then stop. If tmp is smaller than every number in the interval, it simply ends up in the first position.

Code demo: 

//Direct insertion sort
//insert tmp into [0, end]
//ascending order
void InsertSort(int* a, int n)
{
	//one insertion pass
	//end: last index of the already sorted range; tmp: the value to insert
	//(both are set by the outer loop in the complete version below)
	int end;
	int tmp;

	while (end >= 0)
	{
		if (a[end] > tmp)
		{
			//shift the data back
			a[end + 1] = a[end];
			end--;
		}
		else
		{
			break;
		}
	}
	//place the data
	//reaching here means one of two cases:
	//1. tmp is greater than or equal to a[end]
	//2. tmp is smaller than every element in the sorted range
	a[end + 1] = tmp;
}

After we have one insertion pass, how do we sort the whole array?

Take the first element of the array as an already sorted sequence of length one. The second element is then compared with it and inserted in order, the third is inserted into the sorted first two, and so on. In other words, we can treat the inserted data as the cards already in our hand and the data not yet inserted as the cards still in the deck: each time we draw one card and insert it into place, and in this way the whole array is sorted by insertion.

# Complete code for direct insertion sort:

//Direct insertion sort
//insert tmp into [0, end]
//ascending order
void InsertSort(int* a, int n)
{
	for (int i = 1; i < n; i++)
	{
		//one insertion pass
		int end = i - 1;
		int tmp = a[i];

		while (end >= 0)
		{
			if (a[end] > tmp)
			{
				//shift the data back
				a[end + 1] = a[end];
				end--;
			}
			else
			{
				break;
			}
		}
		//place the data
		//reaching here means one of two cases:
		//1. tmp is greater than or equal to a[end]
		//2. tmp is smaller than every element already sorted
		a[end + 1] = tmp;
	}
}
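Here is a minimal test sketch, assuming the InsertSort above is defined in the same source file; the test array is an arbitrary example of my own:

#include <stdio.h>

//minimal test for InsertSort
int main(void)
{
	int a[] = { 5, 3, 9, 6, 2, 4, 7, 1, 8, 0 };
	int n = sizeof(a) / sizeof(a[0]);

	InsertSort(a, n);

	for (int i = 0; i < n; i++)
	{
		printf("%d ", a[i]);   //expected output: 0 1 2 3 4 5 6 7 8 9
	}
	printf("\n");
	return 0;
}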
Summary of the characteristics of direct insertion sort:
1. The closer the element set is to being ordered, the higher the time efficiency of direct insertion sort.
2. Time complexity: O(N^2) (the best case is O(N)).
3. Space complexity: O(1).
4. Stability: stable.

Comparison between direct insertion sort and bubble sort:

The implementation above does not by itself show how fast insertion sort is, so let's compare it with the bubble sort we learned earlier.

Bubble sort also has a time complexity of O(N^2). If we use both algorithms on an array that is already in ascending order, a plain bubble sort (without the early-exit flag) still performs O(N^2) comparisons even though no data needs to be moved, while direct insertion sort only needs O(N). So direct insertion sort is quite fast in some special cases.
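To put rough numbers on this best case, here is a small sketch of my own that simply counts comparisons on an already-sorted array of 10 elements; the bubble sort counted here is the plain version without the early-exit flag:

#include <stdio.h>

//count comparisons on an already sorted array of n elements
int main(void)
{
	int n = 10;
	long long bubble = 0, insert = 0;

	//plain bubble sort (no early-exit flag): compares every pair in every pass
	for (int i = 0; i < n - 1; i++)
		for (int j = 0; j < n - 1 - i; j++)
			bubble++;

	//direct insertion sort on sorted data: each inner loop breaks after one comparison
	for (int i = 1; i < n; i++)
		insert++;

	printf("bubble: %lld, insertion: %lld\n", bubble, insert);   //prints: bubble: 45, insertion: 9
	return 0;
}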

2.1.3 Shell sort

Shell sort is also known as the diminishing increment sort.

The basic idea of Shell sort is:
1. Pre-sort: bring the array close to order
2. Direct insertion sort

The overall procedure is: choose an integer gap, and group the data so that elements gap positions apart belong to the same group, giving gap groups in total; insertion-sort each group; finally perform one ordinary insertion sort over the whole array. The purpose of the pre-sorting is to bring the data close to order, so that the final insertion sort takes very little time.
For example, let's arrange 9 8 7 6 5 4 3 2 1 0 in ascending order:
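To make this concrete, here is a hand-worked illustration of one pre-sort pass with gap = 3 (the grouping shown below is my own walk-through of the example):

Original:           9  8  7  6  5  4  3  2  1  0      (gap = 3)
Group 1 (indices 0, 3, 6, 9): 9 6 3 0  ->  0 3 6 9
Group 2 (indices 1, 4, 7)   : 8 5 2    ->  2 5 8
Group 3 (indices 2, 5, 8)   : 7 4 1    ->  1 4 7
After the pre-sort: 0  2  1  3  5  4  6  8  7  9      (already close to ascending order)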

# Pre-sort

Next, we will use code to demonstrate:

We will not worry yet about how to choose gap; for now let's simply set gap to 3.

Still using the example above: sort 9 8 7 6 5 4 3 2 1 0 in ascending order.

1. First divide the data into gap groups, then insertion-sort each of the gap groups.

① Let's first write one insertion step. The difference from ordinary direct insertion sort is that the value to insert must be saved first (moving data backward would otherwise overwrite it), and each comparison is between elements that are gap positions apart:

Code demo:

//Shell sort
void ShellSort(int* a, int n)
{
	int gap = 3;

	//one insertion step
	//end: last index of the already sorted part of this group (set by the outer loop later)
	int end;
	//save the data at the next position in the group
	int tmp = a[end + gap];
	//compare and shift
	while (end >= 0)
	{
		if (a[end] > tmp)
		{
			a[end + gap] = a[end];
			end -= gap;
		}
		else
		{
			break;
		}
	}
	a[end + gap] = tmp;
}

② Then how do we sort one whole gap group?

One thing to pay attention to: when insertion-sorting one group, each step moves by gap rather than 1, and the loop index must stay below n - gap as it moves forward; otherwise reading a[end + gap] into tmp would be an out-of-bounds access.

Code demo:

//Shell sort
void ShellSort(int* a, int n)
{
	int gap = 3;

	//        watch the loop condition and the step size
	for (int i = 0; i < n - gap; i += gap)
	{
		//one insertion step
		int end = i;
		//save the data at the next position in the group
		int tmp = a[end + gap];
		//compare and shift
		while (end >= 0)
		{
			if (a[end] > tmp)
			{
				a[end + gap] = a[end];
				end -= gap;
			}
			else
			{
				break;
			}
		}
		a[end + gap] = tmp;
	}
}

③ After sorting one group, the next step is to sort each of the gap groups. Observing carefully, it is not hard to see that when the starting index is 0 we sort the first group, and when the starting index is 1 we sort the second group, so we can wrap another loop around it, using gap as the loop condition:

Code demo:

//Shell sort
void ShellSort(int* a, int n)
{
	int gap = 3;

	//loop over the gap groups and insertion-sort each group separately
	for (int j = 0; j < gap; j++)
	{
		//        watch the loop condition and the step size
		for (int i = j; i < n - gap; i += gap)
		{
			//one insertion step
			int end = i;
			//save the data at the next position in the group
			int tmp = a[end + gap];
			//compare and shift
			while (end >= 0)
			{
				if (a[end] > tmp)
				{
					a[end + gap] = a[end];
					end -= gap;
				}
				else
				{
					break;
				}
			}
			a[end + gap] = tmp;
		}
	}
}

The code above uses three layers of loops, which looks a bit complicated, but we can modify it. Please pay attention to how the code changes:

//Shell sort
void ShellSort(int* a, int n)
{
	int gap = 3;

	//insertion-sort the gap groups, interleaved
	for (int i = 0; i < n - gap; i++)
	{
		//one insertion step
		int end = i;
		//save the data at the next position in the group
		int tmp = a[end + gap];
		//compare and shift
		while (end >= 0)
		{
			if (a[end] > tmp)
			{
				a[end + gap] = a[end];
				end -= gap;
			}
			else
			{
				break;
			}
		}
		a[end + gap] = tmp;
	}
}

Some readers may find the revised code harder to follow, so let me explain. First of all, the two versions take exactly the same amount of time. The three-layer version is easier to understand: it finishes one group before starting the next. The two-layer version processes the groups side by side: when i == 0 it inserts an element of the first group, when i == 1 an element of the second group, and so on, cycling through the groups until all the data has been inserted (see the trace below).

Either version works; the efficiency of the two approaches is the same, so choose according to personal preference.
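For example, with gap = 3, the two-layer version visits the groups in this interleaved order (my own trace of the loop):

i = 0 -> inserts a[0 + gap], an element of group 1 (indices 0, 3, 6, ...)
i = 1 -> inserts a[1 + gap], an element of group 2 (indices 1, 4, 7, ...)
i = 2 -> inserts a[2 + gap], an element of group 3 (indices 2, 5, 8, ...)
i = 3 -> back to group 1, and so on until every element has been inserted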

# Direct insertion sort

After the gap groups have each been insertion-sorted, one last insertion sort over the whole array is still needed. The most direct way would be to call an ordinary insertion sort after the code above, but that is a bit clumsy. Instead we can work through the gap itself. First, compare insertion sort and Shell sort:

gap determines how far each element moves per step, so:

The larger the gap, the faster large numbers can move to the back and small numbers to the front, but the less ordered the data is afterwards.

The smaller the gap, the slower large and small numbers move, but the closer the data is to being ordered afterwards.

When gap == 1, it is exactly direct insertion sort.

A gap of 3 is reasonable for the 10 numbers above, but if we want to sort 10,000 numbers, keeping gap at 3 is far too small. So the gap should be chosen according to the number of elements, and the last pass must be made with gap equal to 1 in order to finish with a direct insertion sort, as the sketch below shows.
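As a small sketch of how such a gap might shrink, the complete code below uses gap = gap / 3 + 1; for n = 10000 this produces the sequence of gaps shown in the comment (the driver program here is just my own illustration):

#include <stdio.h>

//print the gap sequence produced by gap = gap / 3 + 1 (the +1 guarantees it ends at 1)
int main(void)
{
	int n = 10000;
	int gap = n;
	while (gap > 1)
	{
		gap = gap / 3 + 1;
		printf("%d ", gap);   //prints: 3334 1112 371 124 42 15 6 3 2 1
	}
	printf("\n");
	return 0;
}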

# Shell sort complete code:

//Shell sort
void ShellSort(int* a, int n)
{
	//1. gap > 1 : pre-sort
	//2. gap == 1: direct insertion sort

	int gap = n;
	while (gap > 1)
	{
		gap = gap / 3 + 1;   //the +1 guarantees the last pass uses gap == 1
		//insertion-sort the gap groups, interleaved
		for (int i = 0; i < n - gap; i++)
		{
			//one insertion step
			int end = i;
			//save the data at the next position in the group
			int tmp = a[end + gap];
			//compare and shift
			while (end >= 0)
			{
				if (a[end] > tmp)
				{
					a[end + gap] = a[end];
					end -= gap;
				}
				else
				{
					break;
				}
			}
			a[end + gap] = tmp;
		}
	}
}
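A minimal sketch to verify the result, assuming the ShellSort above is in the same file (the test array and the check loop are my own additions):

#include <stdio.h>

//minimal check for ShellSort
int main(void)
{
	int a[] = { 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 };
	int n = sizeof(a) / sizeof(a[0]);

	ShellSort(a, n);

	for (int i = 1; i < n; i++)
	{
		if (a[i - 1] > a[i])
		{
			printf("not sorted!\n");
			return 1;
		}
	}
	printf("sorted\n");   //expected output: sorted
	return 0;
}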

Summary of the characteristics of Shell sort:
1. Shell sort is an optimization of direct insertion sort.
2. When gap > 1, the passes are pre-sorting, whose purpose is to bring the array closer to order. When gap == 1, the array is already nearly ordered, so the final insertion sort is very fast. The overall effect is an optimization, which the performance test later will confirm.
3. The time complexity of Shell sort is not easy to calculate, because there are many ways to choose the gap, so many books do not give a single fixed value. Two classic textbooks that discuss it are:
"Data Structures (C Language Edition)" --- Yan Weimin

"Data Structures (Using Object-Oriented Methods and C++ Description)" --- Yin Renkun

Based on these references, the time complexity of Shell sort is usually taken as roughly O(N^{3/2}).

4. Stability: unstable.

3. Algorithm efficiency comparison

Here we compare the efficiency of the bubble sort, heap sort, direct insertion sort and Shell sort that we have learned. Before comparing, let's first review bubble sort and heap sort:

//Bubble sort
void BubbleSort(int* a, int n)
{
	//number of bubble passes
	for (int i = 0; i < n - 1; i++)
	{
		//one pass of bubbling; the flag records whether any swap happened
		bool exchange = false;
		for (int j = 0; j < n - 1 - i; j++)
		{
			if (a[j] > a[j + 1])  //if the former number is greater than the latter, swap them
			{
				int tmp = a[j];
				a[j] = a[j + 1];
				a[j + 1] = tmp;
				exchange = true;
			}
		}
		if (exchange == false)
		{
			break;
		}
	}
}
//Heap sort

//swap function
void Swap(int* p1, int* p2)
{
	int tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

//sift down (min-heap version)
void AdjustDown(int* a, int n, int parent)
{
	//assume the left child is the smaller of the two children
	int child = parent * 2 + 1;

	while (child < n)  //stop once we pass the last child node
	{
		if (child + 1 < n  //check whether a right child exists
			&& a[child + 1] < a[child]) //check whether the assumed left child really is the smaller one
		{
			child++;   //if not, switch to the right child
		}
		//compare the child with the parent
		if (a[child] < a[parent])
		{
			Swap(&a[child], &a[parent]);
			//update the parent and child nodes
			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			break;
		}
	}
}

//O(N * logN)
//note: with the min-heap AdjustDown above this produces descending order;
//for ascending order, flip the comparisons in AdjustDown to build a max-heap
void HeapSort(int* a, int n)
{
	//build the heap with the sift-down algorithm
	//time complexity: O(N)
	for (int i = ((n - 1) - 1) / 2; i >= 0; --i)
	{
		//n - 1 is the index of the last node,
		//and its parent is ((n - 1) - 1) / 2
		AdjustDown(a, n, i);
	}

	//O(N * logN)
	int end = n - 1;
	while (end > 0)
	{
		//swap the top of the heap with the last element
		Swap(&a[0], &a[end]);
		//sift down to find the next smallest
		AdjustDown(a, end, 0);
		end--;
	}
}

Next, we can use a piece of code to test the efficiency difference between the four algorithms above.

The idea is to randomly generate 100,000 values and then sort copies of them with each algorithm:

Code demo:

// performance comparison of the sorting algorithms
void TestOP()
{
	srand(time(0));
	const int N = 100000;
	int* a1 = (int*)malloc(sizeof(int) * N);
	int* a2 = (int*)malloc(sizeof(int) * N);
	int* a3 = (int*)malloc(sizeof(int) * N);
	int* a4 = (int*)malloc(sizeof(int) * N);

	//fill one array with random values and copy it, so every sort gets the same input
	for (int i = 0; i < N; ++i)
	{
		a1[i] = rand();
		a2[i] = a1[i];
		a3[i] = a1[i];
		a4[i] = a1[i];
	}
	int begin1 = clock();
	InsertSort(a1, N);
	int end1 = clock();
	int begin2 = clock();
	ShellSort(a2, N);
	int end2 = clock();
	int begin3 = clock();
	BubbleSort(a3, N);
	int end3 = clock();
	int begin4 = clock();
	HeapSort(a4, N);
	int end4 = clock();

	printf("InsertSort:%d\n", end1 - begin1);
	printf("ShellSort:%d\n", end2 - begin2);
	printf("BubbleSort:%d\n", end3 - begin3);
	printf("HeapSort:%d\n", end4 - begin4);

	free(a1);
	free(a2);
	free(a3);
	free(a4);
}
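To actually run the comparison, a minimal driver might look like this; the headers listed are the ones the functions above rely on (stdbool.h for the flag in BubbleSort, stdlib.h for malloc/rand, time.h for clock), and the main function is my own addition:

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <time.h>

//assumes InsertSort, ShellSort, BubbleSort, HeapSort and TestOP above are in the same file
int main(void)
{
	TestOP();
	return 0;
}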

When running this test, it is recommended to build in Release mode, which gives a clearer comparison (the values printed are in milliseconds):

Friends, the good times are always short. This is the end of this issue's sharing. After reading, don't forget to leave your precious like, favorite, and follow. Thank you for your support!

Origin blog.csdn.net/Yikefore/article/details/131185784