Insertion Sort and Shell Sort (from the Eight Classic Sorting Algorithms)

Table of contents

1. Preface (overall introduction)

About insertion sort

About Shell sort

2. Insertion sort

Function header

Algorithm idea

Algorithm analysis

Insertion sort code implementation

Prelude to optimizing insertion sort

3. Shell sort (diminishing increment sort)

1. Algorithm idea

2. Step-by-step analysis

Sequence grouping

Group presort

Another implementation of group presorting

Implementing Shell sort (using Knuth's gap rule)

How the gap shrinks

4. Time complexity analysis of Shell sort

Time complexity of one group presort

Total cost of the group presorts as the gap shrinks

Shell sort complexity as described in a classic data structures textbook

5. Measured comparison of Shell sort and plain insertion sort

1. Preface (overall introduction)

About insertion sort

  • The idea of insertion sort: insert the elements one at a time into the correct position of an already ordered sequence.
  • This is much like drawing playing cards one at a time and slotting each new card into place in the hand you are already holding.
  • The original post also illustrates the implementation with an animation in which the array is drawn as a row of bars:
  1. Think of each bar as an element of the array.
  2. The orange bars are the elements that have already been sorted, and the blue bars are the elements still waiting to be inserted.
  3. In the implementation, the sorted (orange) prefix is tracked by an index variable, which is incremented by 1 after each pass of the loop.

About Shell sort:

  • Shell sort is an optimized version of insertion sort; the optimization was proposed by Donald L. Shell, after whom the algorithm is named.
  • The basic idea is to presort sub-arrays of the array (chosen by a fixed rule) so that the array becomes roughly ordered, and then to run one final insertion sort over the whole array to finish the job.

2. Insertion sort

Function header:

void InsertSort(DataType* arr, int size);
  • arr points to the first element of the array to be sorted;
  • size is the number of elements to be sorted.

Algorithm idea:

  1. Use an index variable end to track the sorted prefix: the first end elements are always in order (end starts at 0).
  2. In each iteration, the element at index end is inserted into its correct position among the first end elements, giving an ordered sequence of end+1 elements. end is then incremented, and the loop repeats until end has traversed the whole array.
  3. In each iteration, the position at which the element x = arr[end] is inserted is found by comparing x with the elements of the sorted prefix, from back to front.
  4. When end is 0, the sorted prefix has 0 elements; when end is 1, it has 1 element; and so on. When end reaches size, the sorted prefix has size elements, that is, the whole array is in order and the sort is finished.
  5. Insertion sort turns arr into an ordered sequence by growing the sorted prefix one element at a time in this way.

Algorithm analysis

  • As the algorithm idea shows, inserting an element x into the ordered sequence formed by the first end elements involves comparing and moving data.
  • In the worst case, every element of that ordered sequence has to move back one position each time an element x is inserted, so the insertion costs end comparisons and end moves.
  • Summed over the whole sort (end runs from 0 to N-1), the worst-case number of comparisons and moves is therefore 0 + 1 + 2 + ... + (N-1) = N(N-1)/2.
  • So the worst-case time complexity of insertion sort is O(N^2), where N is the number of elements to be sorted.
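
As a quick check of the worst-case count, the sketch below instruments insertion sort with a move counter. The function name CountInsertSortMoves and the use of int elements are assumptions made for illustration; they are not part of the original code.

```c
#include <assert.h>

/* Insertion sort (ascending) that returns how many element moves it made.
   Hypothetical helper for illustration; the element type is assumed to be int. */
long CountInsertSortMoves(int* arr, int size)
{
	assert(arr);
	long moves = 0;
	for (int end = 0; end < size; ++end)
	{
		int x = arr[end];                    /* element to insert into the sorted prefix */
		int insert = end;
		while (insert > 0 && x < arr[insert - 1])
		{
			arr[insert] = arr[insert - 1];   /* shift one element back */
			--insert;
			++moves;
		}
		arr[insert] = x;
	}
	return moves;
}
```

For a strictly descending array of N elements, every insertion shifts the whole sorted prefix, so the function returns 0 + 1 + ... + (N-1) = N(N-1)/2. For {5, 4, 3, 2, 1} that is 10 moves, while an already sorted array needs none.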

Insertion sort code implementation:

Implementation with two nested loops:

  • The outer loop is controlled by end. There are size elements to insert, so it runs size times; each iteration inserts x = arr[end] into the sorted prefix, and end walks across the whole array:
    for(int end =0;end<size;++end)
  • The inner loop is controlled by insert. It compares and moves elements to find the position where x belongs and to make room for it; insert starts at end and is decremented until it reaches 0 or the insertion position is found.

  • Taking ascending order as an example:

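#include <assert.h>

typedef int DataType;   //element type (assumed to be int here)

//Sorts in ascending order
void InsertSort(DataType* arr, int size)
{
	assert(arr);
	for (int end = 0; end < size; end++)    //the first end elements of the array form the sorted prefix
	{
		DataType x = arr[end];				//x is the element to insert into the sorted prefix
		int insert = end;					//insert tracks the position where x will be placed
		while (insert>0)					//find the insertion position by comparing elements
		{
			if (x < arr[insert-1])			   //insert is not yet the position for x
			{
				arr[insert] = arr[insert-1];   //shift the element back to make room for x
				insert--;                      //step to the previous element of the prefix
			}
			else
			{
				break;	    //x >= arr[insert-1], so insert is the position for x
			}
		}					//worst case: x ends up at index 0 after end comparisons and moves
		arr[insert] = x;    //place x (then go on to the next element)
	}
}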
  • Note the boundary conditions of the algorithm:
  1. The outer loop ends when end reaches size (valid array indices run from 0 to size-1).
  2. If the inner loop stops because insert has reached 0, x is smaller than every element of the sorted prefix, and x is placed at index 0.

Prelude to optimizing insertion sort:

  • Insertion sort orders the whole sequence by inserting elements one at a time into a sorted prefix. Because the prefix is already ordered, an insertion can often stop after very few comparisons and moves.
  • Insertion sort therefore has a useful property: when the array arr is already roughly ordered as a whole, the running time of the sort is close to O(N). The original post illustrates this with a nearly sorted example sequence.
  • For that example sequence, insertion sort performs size+2 comparisons and 3 moves in total. In general, when the input is roughly ordered, the running time of insertion sort can be treated as linear in size.
  • This property is what makes the optimization possible. If the sequence can be made roughly ordered before the real insertion sort runs, the order of magnitude of the sorting time can be reduced. That is the starting point of Shell sort.
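
The near-linear behaviour on roughly ordered input can be observed by counting comparisons. The following is a minimal sketch, assuming int elements; the name CountInsertSortCompares is hypothetical and the loop structure simply mirrors the insertion sort shown earlier.

```c
#include <assert.h>

/* Insertion sort (ascending) that returns the number of element comparisons,
   i.e. evaluations of x < arr[insert-1].
   Hypothetical helper for illustration; the element type is assumed to be int. */
long CountInsertSortCompares(int* arr, int size)
{
	assert(arr);
	long compares = 0;
	for (int end = 0; end < size; ++end)
	{
		int x = arr[end];
		int insert = end;
		while (insert > 0)
		{
			++compares;                          /* one comparison against the prefix */
			if (x < arr[insert - 1])
			{
				arr[insert] = arr[insert - 1];
				--insert;
			}
			else
			{
				break;
			}
		}
		arr[insert] = x;
	}
	return compares;
}
```

On the nearly sorted 8-element array {1, 2, 4, 3, 5, 6, 8, 7} this performs 9 comparisons, compared with 28 when the same elements are given in descending order.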

3. Shell sort (diminishing increment sort)

  • Shell sort is named after Donald L. Shell, who published it in 1959. It is essentially an optimization built on top of insertion sort.

First, the header of the sorting interface:

void ShellSort(DataType* arr, int size);
  • DataType is the element type, defined with typedef
  • size is the number of elements in the array arr to be sorted

1. Algorithm idea: 

  • From the analysis of insertion sort, we already know that its running time is roughly linear in the number of elements when the input sequence is roughly ordered as a whole.
  • So, before running the real insertion sort, we group the sequence and presort the groups, and we do this several times.
  • After these group presorts, the sequence as a whole is roughly ordered, and one final insertion sort completes the sorting of the whole sequence.

2. Step-by-step analysis

Sequence grouping

  1. Grouping method: first choose a gap value (gap <= size), which divides the sequence into gap groups. Starting from the first element of the array, each subsequence is formed by taking every gap-th element (the original post shows this in a figure).
  2. A little arithmetic on this grouping: since gap <= size, the array arr can always be divided into gap subsequences, each with at least size/gap elements. When size is an integer multiple of gap, every subsequence has exactly size/gap elements; otherwise, the longest subsequences have (size/gap)+1 elements. (All divisions here are integer divisions that round down.)
  3. I call this way of grouping the fixed-interval grouping method.
  4. The fixed-interval grouping method can also be pictured with the subsequences laid out one per row:
  • The first elements of the subsequences are the elements at indices 0 to gap-1.
  • Once the sequence has been grouped at a fixed interval, the next step is to run insertion sort on each subsequence; this is the group presort.
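
To make the grouping concrete, the sketch below lists which array indices fall into each subsequence and computes each subsequence's length. The helper names SubseqLength and PrintGroups are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Number of elements in the subsequence start, start+gap, start+2*gap, ...
   that lie below size. Hypothetical helper for illustration. */
int SubseqLength(int size, int gap, int start)
{
	assert(gap > 0 && start >= 0 && start < gap);
	if (start >= size)
	{
		return 0;
	}
	return (size - 1 - start) / gap + 1;
}

/* Print the array indices that belong to each of the gap subsequences. */
void PrintGroups(int size, int gap)
{
	for (int start = 0; start < gap; ++start)
	{
		printf("group %d:", start);
		for (int i = start; i < size; i += gap)
		{
			printf(" %d", i);
		}
		printf("\n");
	}
}
```

With size = 10 and gap = 3, group 0 holds indices 0 3 6 9 (that is, (size/gap)+1 = 4 elements), while groups 1 and 2 hold 1 4 7 and 2 5 8 (size/gap = 3 elements each), which matches the bounds stated above.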

Group presort:

  • Once gap is chosen, run insertion sort on each subsequence (the illustration in the original post uses gap=3).
  • How to code the group presort (the most basic version uses three nested loops):
  1. The array arr is divided into gap groups, so gap subsequences need to be insertion-sorted. The outermost loop therefore runs gap times; it is controlled by the variable start (initialized to 0), with the loop condition start<gap.
  2. The two inner loops have exactly the same structure as insertion sort, except that they step by gap instead of 1.
  • Code for the group presort (built on the insertion sort code):
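    void ShellSort(DataType* arr, int size)
    {
    	assert(arr);
    
    	int gap = 3;								//split the array into gap groups (gap is still to be decided; 3 is the example value from the illustration)
    	for (int start = 0; start < gap; ++start)   //start is the index of the first element of each subsequence (gap subsequences, gap iterations)
    	{
    		for (int end = start; end < size; end += gap)	//end tracks the sorted prefix of the subsequence and must stay below size
    		{
    			DataType x = arr[end];						//x is the element to insert into the sorted prefix
    			int insert = end;							//insert tracks the position where x will be placed
    			while (insert>start)						//insert == start means x belongs at the head of the subsequence
    			{
    				if (x < arr[insert - gap])				//insert is not yet the position for x
    				{
    					arr[insert] = arr[insert - gap];    //shift the element back to make room for x
    					insert -= gap;						//step to the previous element of the subsequence
    				}
    				else                                    //x >= arr[insert - gap], so insert is the position for x
    				{ 
    					break;
    				}
    			}
    			arr[insert] = x;							//place x
    		}	
    	}
    }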
  • This code performs one group presort with grouping interval gap.

  • If gap is set to 1 in this code, the group presort becomes an ordinary insertion sort.

Another implementation of group presorting:

  • The group presort can also be written with two nested loops.

  • Note: the two-loop and three-loop versions of the group presort do essentially the same work; only the order in which elements are adjusted differs. The three-loop version finishes the insertion sort of one subsequence before moving to the next, while the two-loop version walks the array element by element and adjusts each element within its own subsequence.

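    void ShellSort(DataType* arr, int size)
    {
    	assert(arr);
    	int gap = 3;									//gap is still to be decided; 3 is only an example value
    	for (int end = 0; end < size; ++end)		    //end visits every element of the array
    	{
    		DataType x = arr[end];						//x is the element to insert into the sorted prefix of its subsequence
    		int insert = end;							//insert tracks the position where x will be placed
    		while (insert >= gap)						//when insert < gap, the element is the head of its subsequence and needs no adjustment
    		{
    			if (x < arr[insert - gap])				//insert is not yet the position for x
    			{
    				arr[insert] = arr[insert - gap];    //shift the element back to make room for x
    				insert -= gap;						//step to the previous element of the subsequence
    			}
    			else                                    //x >= arr[insert - gap], so insert is the position for x
    			{
    				break;
    			}
    		}
    		arr[insert] = x;							//place x
    	}
    }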

  • Going from three loops to two: end now visits every index of the array in order, and each element is inserted into the sorted prefix of its own subsequence, so the work on the different subsequences is interleaved.

  • Both the three-loop and the two-loop versions do the same thing: they complete one group presort for a given value of gap.

  • As before, if gap is set to 1 in this code, the group presort becomes an ordinary insertion sort.

  • For brevity, the rest of the article builds Shell sort on the two-loop version of the group presort.
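
The two variants can be compared directly once gap is made a parameter. The sketch below is an illustration rather than the author's code: the function names GapPassByGroup and GapPassInterleaved, the explicit gap parameter, and the int element type are all assumptions.

```c
#include <assert.h>

/* One group presort with the given gap, three-loop version:
   each subsequence is insertion-sorted in turn. */
void GapPassByGroup(int* arr, int size, int gap)
{
	assert(arr && gap > 0);
	for (int start = 0; start < gap; ++start)
	{
		for (int end = start; end < size; end += gap)
		{
			int x = arr[end];
			int insert = end;
			while (insert > start && x < arr[insert - gap])
			{
				arr[insert] = arr[insert - gap];
				insert -= gap;
			}
			arr[insert] = x;
		}
	}
}

/* The same presort, two-loop version: end visits every index once and
   each element is inserted into the sorted prefix of its own subsequence. */
void GapPassInterleaved(int* arr, int size, int gap)
{
	assert(arr && gap > 0);
	for (int end = 0; end < size; ++end)
	{
		int x = arr[end];
		int insert = end;
		while (insert >= gap && x < arr[insert - gap])
		{
			arr[insert] = arr[insert - gap];
			insert -= gap;
		}
		arr[insert] = x;
	}
}
```

Applied to {9, 8, 7, 6, 5, 4, 3, 2, 1, 0} with gap = 3, both functions produce {0, 2, 1, 3, 5, 4, 6, 8, 7, 9}: each of the three subsequences is sorted, but the array as a whole is only roughly ordered.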

Implementing Shell sort (using Knuth's gap rule)

  • From the analysis so far, the value of gap determines one division of the sequence into gap subsequences, and insertion-sorting those subsequences completes one group presort of the sequence.
  • We set the initial value of gap to size and shrink it on each iteration with gap = (gap/3)+1. The +1 guarantees that gap never drops to 0 and always ends at exactly 1: for any gap >= 2, (gap/3)+1 is smaller than gap and at least 1.
  • For each value of gap, we perform one group presort on the array.
  • By the time gap has shrunk to 1, the array has usually been presorted one or more times. The final pass with gap = 1 is equivalent to an ordinary insertion sort and completes the sorting of the whole array. This pass must always be performed; otherwise the sequence is not guaranteed to be in order.
  • To implement this, we only need to wrap the group presort code in an outer loop controlled by gap:
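    void ShellSort(DataType* arr, int size)
    {
    	assert(arr);
    	int gap = size;										
    	while (gap>1)                  //once the pass with gap == 1 (an ordinary insertion sort) has run, the sort is complete
    	{
    		gap = gap/3 + 1;           //shrink gap
    		for (int end = 0; end < size; ++end)		    //end visits every element of the array
    		{
    			DataType x = arr[end];						//x is the element to insert into the sorted prefix of its subsequence
    			int insert = end;							//insert tracks the position where x will be placed
    			while (insert >= gap)						//when insert < gap, the element is the head of its subsequence and needs no adjustment
    			{
    				if (x < arr[insert - gap])				//insert is not yet the position for x
    				{
    					arr[insert] = arr[insert - gap];    //shift the element back to make room for x
    					insert -= gap;						//step to the previous element of the subsequence
    				}
    				else                                    //x >= arr[insert - gap], so insert is the position for x
    				{
    					break;
    				}
    			}
    			arr[insert] = x;							//place x
    		}
    	}												
    }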

How the gap shrinks:

  • In Shell sort, gap starts at size and shrinks geometrically, by a factor of about 3 on each pass. (This is where the name diminishing increment sort comes from.)
  • The number of group presorts performed by the algorithm is therefore of order logN (N = size).
  • Shrinking gap geometrically is an important design choice. It ensures that a reasonable number of group presorts run before the final insertion sort with gap=1, and it also keeps the number of group presorts small, since too many presorts would reduce the overall efficiency of the algorithm.
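
The gap values can be listed by running the shrinking rule on its own. This is a small sketch; GapSequence and its capacity parameter are hypothetical names introduced here, and the loop is the same while (gap > 1) loop used in the Shell sort code.

```c
#include <assert.h>

/* Fill gaps[] with the gap values produced by
       while (gap > 1) gap = gap / 3 + 1;
   starting from gap = size, and return how many passes that is.
   Hypothetical helper; capacity is the length of gaps[]. */
int GapSequence(int size, int* gaps, int capacity)
{
	assert(gaps);
	int count = 0;
	int gap = size;
	while (gap > 1)
	{
		gap = gap / 3 + 1;           /* same shrinking rule as in ShellSort */
		assert(count < capacity);    /* caller must provide enough room */
		gaps[count++] = gap;
	}
	return count;
}
```

For size = 100 the gaps are 34, 12, 5, 2, 1 (5 passes), and for size = 100000 there are 12 passes, which illustrates the logarithmic growth in the number of group presorts.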

4. Time complexity analysis of Shell sort

Shell sort interface header:

void ShellSort(DataType* arr, int size)

Time complexity of one group presort

  • To simplify the analysis, assume that size is an integer multiple of gap in every group presort. With the fixed-interval grouping method, the array is divided into gap subsequences of size/gap elements each.
  • By the insertion sort analysis, insertion-sorting one subsequence takes, in the worst case, a number of comparisons and moves of order (size/gap)^2.
  • With gap subsequences, the worst-case cost of one complete group presort is therefore of order gap * (size/gap)^2 = size^2/gap.

Total cost of the group presorts as the gap shrinks:

  • A pattern emerges: the larger the gap (gap <= size), the lower the upper bound on the comparisons and moves of one group presort, so its cost is closer to linear. Presorting with large gaps first therefore moves the sequence toward order cheaply, which is an ingenious design.

  • If size is written as N and each group presort is taken to cost roughly O(N), then with about logN presorts the overall time complexity of Shell sort is roughly of order NlogN.
  • This argument is not rigorous, but it conveys the idea behind Shell sort: use several group presorts to reduce the order of magnitude of the overall sorting time.
  • In Shell's original design, the gap shrinks with gap=gap/2.
  • The rule gap=(gap/3)+1 is attributed to Knuth. The optimal gap sequence for Shell sort is still unknown (it involves mathematical problems that have not been fully solved).
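
The two shrinking rules can be compared empirically by counting element moves. The sketch below is only an illustration under stated assumptions: the function name ShellSortCountMoves, the useThirdRule flag, and the int element type are hypothetical, and the counts it produces say nothing about which rule is better in general.

```c
#include <assert.h>

/* Shell sort (ascending) that returns the number of element moves.
   useThirdRule != 0 : gap = gap / 3 + 1   (the rule used in this article)
   useThirdRule == 0 : gap = gap / 2       (Shell's original halving rule)
   Hypothetical helper for illustration; the element type is assumed to be int. */
long ShellSortCountMoves(int* arr, int size, int useThirdRule)
{
	assert(arr);
	long moves = 0;
	int gap = size;
	while (gap > 1)                              /* both rules finish with a gap == 1 pass */
	{
		gap = useThirdRule ? gap / 3 + 1 : gap / 2;
		for (int end = gap; end < size; ++end)   /* indices below gap are subsequence heads */
		{
			int x = arr[end];
			int insert = end;
			while (insert >= gap && x < arr[insert - gap])
			{
				arr[insert] = arr[insert - gap];
				insert -= gap;
				++moves;
			}
			arr[insert] = x;
		}
	}
	return moves;
}
```

Calling it twice on copies of the same input, once with each rule, gives a rough feel for how the choice of gap sequence affects the amount of data movement on that particular input.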

Shell sort complexity as described in a classic data structures textbook:

The following paragraphs are excerpted from "Data Structures (Using Object-Oriented Methods and C++)" by Yin Renkun.

  • There are many ways to choose the gap. Shell originally proposed gap=gap/2 until gap=1. Knuth later proposed gap=(gap/3)+1. Others have suggested that odd gaps work better, or that the gaps should be pairwise coprime. None of these claims has been proven.
  • Analyzing the time complexity of Shell sort is very difficult. For certain specific cases, the number of key comparisons and the number of object moves can be estimated precisely, but no one has yet worked out how these counts depend on the choice of gap sequence or given a complete mathematical analysis. In volume 3 of "The Art of Computer Programming", Knuth uses extensive experimental statistics to conclude that, when n is large, the average number of key comparisons and the average number of object moves lie roughly between n^1.25 and 1.6n^1.25. This was measured with straight insertion sort as the method for sorting the subsequences.

From these paragraphs we can see that, given the limits of the current mathematical analysis, the time complexity of Shell sort can provisionally be taken to be about O(N^1.25), based on Knuth's experimental results.

5. Measured comparison of Shell sort and plain insertion sort

  • Use a random number generator to create two identical integer arrays of 100,000 elements. Sort one with straight insertion sort and the other with Shell sort, and print the time each takes (in milliseconds).
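    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    //assumes InsertSort and ShellSort from above, with DataType defined as int
    int main()
    {
    	srand((unsigned)time(NULL));
    	const int N = 100000;
    	int* arr1 = (int*)malloc(sizeof(int) * N);
    	int* arr2 = (int*)malloc(sizeof(int) * N);
    	if (arr1 == NULL || arr2 == NULL)          //allocation failed
    	{
    		free(arr1);
    		free(arr2);
    		return 1;
    	}
    	for (int i = 0; i < N; ++i)
    	{
    		arr1[i] = rand();
    		arr2[i] = arr1[i];                     //both arrays hold the same data
    	}
    	clock_t begin1 = clock();
    	InsertSort(arr1, N);
    	clock_t end1 = clock();
    	printf("InsertSort:%.0f\n", (double)(end1 - begin1) * 1000.0 / CLOCKS_PER_SEC);   //CPU time in milliseconds
    
    	clock_t begin2 = clock();
    	ShellSort(arr2, N);
    	clock_t end2 = clock();
    	printf("ShellSort:%.0f\n", (double)(end2 - begin2) * 1000.0 / CLOCKS_PER_SEC);    //CPU time in milliseconds
    	free(arr1);
    	free(arr2);
    	return 0;
    }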

  • The measurement confirms it: although the idea behind Shell sort is novel and somewhat intricate, its optimization of insertion sort is very effective, and the larger the data set, the more pronounced the speedup.

Origin blog.csdn.net/weixin_73470348/article/details/129465904