Implementing Shell sort: a way to overcome the slowness of insertion sort

Author's homepage: paper jie's blog on CSDN, covering C language and detailed algorithm explanations

Author of this article: Hello everyone, I am paper jie. Thank you for reading this article; likes, favorites, and follows are all welcome.

This article is part of the column "Detailed Algorithm Explanation", which is carefully crafted for college students and beginners in programming. The author invested heavily (in time and energy) to build it and has collected the basic knowledge of algorithms there, hoping to help readers.

Other columns: "System Analysis of C Language", "C Language", "C Language - Grammar"

Content sharing: In this issue, we give a detailed explanation of Shell sort, one of the eight major sorting algorithms. Folks, pull up a small bench and take a seat.

    -------- No need for 998, no need for 98, just one click for a like, a favorite, and a follow; you can't lose out and you can't be fooled

Table of contents

foreword

What is Shell sort

development path

Implementation of Shell sort

basic idea

specific code

The principle of Shell sort

Why Shell sort is better than insertion sort

Advantages and disadvantages of Shell sort


foreword

In the last issue, we introduced direct insertion sort and analyzed its basic idea and complexity. Along the way we found that, although direct insertion sort is a stable algorithm, its time complexity is too high. So today we turn to an optimized version of direct insertion sort: Shell sort.

What is Shell sort

Baidu Encyclopedia describes it as follows: Shell's sort, also known as "diminishing increment sort", is a more efficient improved version of the direct insertion sort algorithm. Shell sort is an unstable sorting algorithm. The method is named after D. L. Shell, who proposed it in 1959. Shell sort groups the records by a certain increment of their indices and sorts each group with direct insertion sort; as the increment gradually decreases, each group contains more and more keys. When the increment is reduced to 1, the whole file forms a single group, and the algorithm terminates after that final pass.
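The "unstable" property mentioned above can be seen on a tiny input. The following is a minimal sketch, not the article's code: the `Record` type, the `tag` field, and the function name `shell_sort_records` are hypothetical names introduced only for illustration, and the sort uses the halving increments described later in this article.

```c
/* Hypothetical record type: tag only marks the original order. */
typedef struct {
	int key;   /* the value the sort compares */
	char tag;  /* label used to tell equal keys apart */
} Record;

/* Shell sort on records, comparing keys only (halving increments). */
void shell_sort_records(Record *r, int len)
{
	for (int gap = len / 2; gap > 0; gap /= 2) {
		for (int j = gap; j < len; j++) {
			Record tmp = r[j];
			int k = j - gap;
			/* shift larger keys of the same group to the right */
			while (k >= 0 && r[k].key > tmp.key) {
				r[k + gap] = r[k];
				k -= gap;
			}
			r[k + gap] = tmp;
		}
	}
}
```

Sorting the four records (2,a) (2,b) (1,c) (3,d) this way yields (1,c) (2,b) (2,a) (3,d): with increment 2, record (2,a) jumps over (2,b) when it trades places with (1,c), so the two equal keys end up in reversed order.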

development path

Shell sort is named after its designer, Donald Shell, who described the algorithm in his 1959 paper "A High-Speed Sorting Procedure".

Shell sort is an improvement based on the following two properties of insertion sort:

  1. Insertion sort is highly efficient on data that is already almost sorted, where it can reach linear-time performance.

  2. In general, however, insertion sort is inefficient, because it can move data only one position at a time.

In 1961, Marlene Metzner Norton, a programmer at IBM, implemented Shell sort in FORTRAN for the first time. Her program used a simple and effective way to set the increment sequence that Shell sort needs: the first increment is half the number of records to be sorted, each later increment is half the previous one, and the last increment is 1. The algorithm came to be known as the Shell-Metzner algorithm, although Metzner herself said in a 2003 email: "I did nothing for this algorithm, and my name should not appear in the name of the algorithm."

Implementation of Shell sort

basic idea

1 Take d1 = len/2 (half the array length) as the first increment, and use it to group all the elements of the array to be sorted.

2 Elements whose indices differ by a multiple of d1 go into the same group.

3 Perform insertion sort within each group.

4 Then take the next increments d2 = d1/2, d3 = d2/2, ... and repeat the steps above for each one, until the increment finally becomes 1.
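The grouping rule in the steps above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the helper names `gap_sequence`, `group_of`, and `print_groups` are hypothetical and do not appear in the article's code, and integer division is assumed for the halving.

```c
#include <stdio.h>

/* Fill out[] with the halving increments len/2, len/4, ..., 1.
   Returns how many increments were written (at most cap). */
int gap_sequence(int len, int *out, int cap)
{
	int count = 0;
	for (int gap = len / 2; gap > 0 && count < cap; gap /= 2)
		out[count++] = gap;
	return count;
}

/* With increment gap, index i belongs to group i % gap, so two
   indices share a group exactly when they differ by a multiple of gap. */
int group_of(int index, int gap)
{
	return index % gap;
}

/* Print the indices that make up each group for one increment. */
void print_groups(int len, int gap)
{
	for (int g = 0; g < gap; g++) {
		printf("gap %d, group %d:", gap, g);
		for (int i = g; i < len; i += gap)
			printf(" %d", i);
		printf("\n");
	}
}
```

For a 10-element array, `gap_sequence` produces the increments 5, 2, 1, and `print_groups(10, 5)` lists five groups of two indices each (0 and 5, 1 and 6, and so on).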

specific code

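#include <stdio.h>
void ShellSort(int* arr, int len)
{
	// gap is the increment (step size)
	int gap = len / 2;
	// halve the gap on every pass; at gap 1 the whole array is one group
	for (; gap > 0; gap /= 2)
	{
		int i = 0;
		// the array splits into gap groups; insertion-sort each of them
		for (i = 0; i < gap; i++)
		{
			int j = 0;
			// walk through the elements of group i
			for (j = i + gap; j < len; j += gap)
			{
				// only act when the element is smaller than its predecessor in the group
				if (arr[j] < arr[j - gap])
				{
					// save the element being inserted in tmp
					int tmp = arr[j];
					// k is the index of the previous element in the group
					int k = j - gap;
					// k >= 0 keeps us inside the array
					// while arr[k] > tmp, shift arr[k] one slot (gap positions) to the right
					while (k >= 0 && arr[k] > tmp)
					{
						arr[k + gap] = arr[k];
						// step back by gap to the previous element of the group,
						// ready for the next comparison
						k -= gap;
					}
					// put tmp into its proper slot;
					// k went one step too far above, so add gap back
					arr[k + gap] = tmp;
				}
			}
		}
	}
}
int main(void)
{
	// the array to sort
	int arr[] = { 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 };
	int len = sizeof(arr) / sizeof(arr[0]);
	// Shell sort
	ShellSort(arr, len);
	// print the result
	int i = 0;
	for (i = 0; i < len; i++)
	{
		printf("%d ", arr[i]);
	}
	return 0;
}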

The principle of Shell sort

In essence, this method is a grouped insertion sort.

It compares numbers that are a larger distance apart (the distance is called the increment), so a number can move across several elements at once, and one such move may replace several adjacent-element swaps. The algorithm first divides the numbers to be sorted into groups by some increment d, so that the indices of the records in each group differ by d. It sorts the elements within each group, then regroups with a smaller increment and sorts within each group again. When the increment is reduced to 1, all the numbers form a single group, and the sort is complete. Typically the first increment is half the sequence length, and it is halved each time until it reaches 1.

Let's take the array in the code as an example:

The array has 10 elements, so the first increment is half the length: gap = len/2 = 5. Elements whose indices differ by a multiple of 5 form a group: here (10, 5), (9, 4), (8, 3), (7, 2), and (6, 1). Insertion-sorting each group gives 5 4 3 2 1 10 9 8 7 6. Note the relative positions of the elements after this pass.

Second pass: gap = 5/2 = 2. The two groups are 5 3 1 9 7 (even indices) and 4 2 10 8 6 (odd indices). Insertion-sorting each group gives 1 2 3 4 5 6 7 8 9 10.

Third pass: gap = 2/2 = 1. There is only one group, so this is a plain direct insertion sort, and the final result is 1 2 3 4 5 6 7 8 9 10. This array is a bit special, because it is already fully sorted after the second pass. With that, one Shell sort is complete.
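The trace above can be checked with a short sketch. It assumes the same halving increments as the article's code; the helper names `count_inversions`, `gap_pass`, and `print_trace` are hypothetical and introduced only for this illustration. Counting inversions (pairs that are out of order) gives a rough measure of how much disorder each pass removes.

```c
#include <stdio.h>

/* Count pairs (i, j) with i < j and arr[i] > arr[j]. */
int count_inversions(const int *arr, int len)
{
	int inversions = 0;
	for (int i = 0; i < len; i++)
		for (int j = i + 1; j < len; j++)
			if (arr[i] > arr[j])
				inversions++;
	return inversions;
}

/* One pass: insertion-sort every group formed by the given increment. */
void gap_pass(int *arr, int len, int gap)
{
	for (int i = 0; i < gap; i++) {
		for (int j = i + gap; j < len; j += gap) {
			int tmp = arr[j];
			int k = j - gap;
			while (k >= 0 && arr[k] > tmp) {
				arr[k + gap] = arr[k];
				k -= gap;
			}
			arr[k + gap] = tmp;
		}
	}
}

/* Run all passes, printing the array and remaining inversions each time. */
void print_trace(int *arr, int len)
{
	for (int gap = len / 2; gap > 0; gap /= 2) {
		gap_pass(arr, len, gap);
		printf("gap %d:", gap);
		for (int i = 0; i < len; i++)
			printf(" %d", arr[i]);
		printf("  (inversions left: %d)\n", count_inversions(arr, len));
	}
}
```

On the article's array, the reversed input starts with 45 inversions; the pass with increment 5 leaves 20, and the pass with increment 2 leaves none, which is why the final pass with increment 1 has nothing left to move.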

Why Shell sort is better than insertion sort

Shell sort insertion-sorts the elements at different step lengths. At the start, when the elements are badly out of order, the step is at its largest, so each group being insertion-sorted holds very few elements and the pass is fast. By the time the step is small, the elements are already roughly in order, and insertion sort is very efficient on nearly ordered sequences. As a result, Shell sort usually does much better than the O(n^2) of direct insertion sort, although the exact bound depends on the increment sequence.

The reasons why Shell sort outperforms direct insertion sort:

① When the array is already nearly ordered, direct insertion sort needs few comparisons and moves.

② When n is small, the difference between n and n^2 is also small, so the best-case complexity O(n) and the worst-case complexity O(n^2) of direct insertion sort are not far apart.

③ At the start of Shell sort the increment is large, so there are many groups with few records each, and the direct insertion sort within each group is fast. Later the increment d_i gradually decreases, so there are fewer groups with more records each; but because the file has already been sorted with the previous increment d_(i-1), it is closer to ordered, and the new pass is fast as well.

Therefore, Shell sort is considerably more efficient than direct insertion sort.
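To make the comparison concrete, one can count how many element moves each algorithm performs. The sketch below is an illustration, not the article's code: the function name `count_shifts` and its `first_gap` parameter are hypothetical, and it uses the common single-loop arrangement of the gapped insertion sort, which moves the same elements as the article's three-level loop. Starting at increment 1 gives plain direct insertion sort; starting at len/2 gives Shell sort with halving increments.

```c
/* Sort arr in place and return how many times an element was
   shifted one slot (gap positions) to the right.
   first_gap == 1       -> direct insertion sort
   first_gap == len / 2 -> Shell sort with halving increments */
long count_shifts(int *arr, int len, int first_gap)
{
	long shifts = 0;
	for (int gap = first_gap; gap > 0; gap /= 2) {
		for (int j = gap; j < len; j++) {
			int tmp = arr[j];
			int k = j - gap;
			while (k >= 0 && arr[k] > tmp) {
				arr[k + gap] = arr[k];
				shifts++;
				k -= gap;
			}
			arr[k + gap] = tmp;
		}
	}
	return shifts;
}
```

On the article's reversed 10-element array, direct insertion sort performs 45 shifts, while Shell sort performs 13 (5 with increment 5, 8 with increment 2, and none with increment 1). This is a single small input, so it shows the mechanism rather than proving a general bound.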

Advantages and disadvantages of Shell sort

Shell sort does not require a lot of auxiliary space, since it sorts in place, and it is as easy to implement as merge sort. It is an algorithm built on insertion sort, with one new feature (the shrinking increment) added to improve efficiency.

The time complexity of Shell sort is often quoted as O(n^(3/2)), with a lower bound of n*log2(n); the exact figure depends on the increment sequence used. Shell sort is not as fast as quicksort, which is O(n log n) on average, so it performs well on medium-sized inputs but is not the best choice for very large data sets. It is still much faster than O(n^2) algorithms such as direct insertion sort. Shell sort is also very easy to implement, and its code is short and simple.

In addition, the worst-case performance of Shell sort is not much different from its average-case performance, whereas quicksort degrades badly in its worst case. For that reason, some recommend starting with Shell sort for almost any sorting task, and switching to a more advanced algorithm such as quicksort only if it proves too slow in practice.

Essentially, Shell sort is an improvement on direct insertion sort that reduces the number of copy operations, which makes it much faster. The reason is that when the increment is large, each pass needs to move only a few data items, but moves them a long way. As the increment shrinks, each pass moves more items, but by then they are already close to their final sorted positions. It is the combination of these two situations that makes Shell sort much more efficient than insertion sort.



Origin blog.csdn.net/paperjie/article/details/131395993