[Sorting algorithms in C] Shell sort, an insertion sort variant

Foreword

This article, written in C, shares what I have learned about Shell sort, a member of the insertion sort family of sorting algorithms. My understanding is limited, so mistakes are inevitable; corrections and discussion are welcome.

Shell sort

Shell sort (named after Donald Shell, and sometimes translated as "Hill sort") is also known as the shrinking increment method, or diminishing increment sort. Its basic idea is: first divide the whole sequence of records into several subsequences and sort each one with direct insertion sort; then, once the records in the whole sequence are "basically ordered", perform one final direct insertion sort on all of them.

Shell sort improves on insertion sort by exploiting two of its properties:

  • Insertion sort is efficient on data that is almost sorted, approaching linear time.
  • In general, however, insertion sort is inefficient, because it can only move an element one position at a time.

Why is it called the shrinking increment method? Recall direct insertion sort: during comparison, elements move one position at a time. In other words, direct insertion sort uses a default increment (written gap) of 1, so each element is compared with its immediate neighbor. Shell sort performs a gapped insertion sort starting from a larger initial gap, shrinks gap after each complete pass, and finishes with a pass at gap 1, which is exactly direct insertion sort.

Steps:

  1. Pre-sorting, whose purpose is to bring the sequence close to sorted order

    Elements that are gap positions apart form one group, so there are gap groups in total. In each round, every group is sorted with direct insertion sort, and gap is reduced after each round.

  2. Direct insertion sort

    When the gap is reduced to 1, the last direct insertion sort is performed.

The following example walks through the process (sorting in ascending order):

Suppose we choose an initial gap of 3, which gives the groups shown in the figure; numbers of different colors belong to different groups. Sorting each group is simply direct insertion sort with gap changed from 1 to 3.

[Figure: the example array split into gap = 3 groups, one color per group]

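// arr is the array to sort and n is its length
int gap = 3;
// One round: sort each of the gap groups in turn
for(int j = 0; j < gap; ++j)
{
    // Sort one group: elements j, j + gap, j + 2*gap, ...
    for(int i = j; i < n - gap; i += gap)
    {
        int end = i;                 // last element of the group's sorted part
        int tmp = arr[end + gap];    // element to insert
        while(end >= 0)
        {
            if(tmp < arr[end])
            {
                arr[end + gap] = arr[end];   // shift the larger element gap positions back
            }
            else
                break;
            end -= gap;
        }
        arr[end + gap] = tmp;        // drop tmp into the gap that was opened
    }
}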


Here gap starts at sz, and each round first sets gap = gap / 3 + 1, so the first pass uses gap = sz / 3 + 1. The + 1 guarantees that the last gap is 1: plain gap / 3 could jump from 2 straight to 0 and skip the final direct insertion sort. assert (declared in <assert.h>) checks that the arr pointer is not null, which should never happen in normal use.

#include <assert.h>

void ShellSort(int* arr, int sz)
{
    assert(arr);

    int gap = sz;
    while(gap > 1)
    {
        gap = gap / 3 + 1;   // + 1 ensures the final pass uses gap == 1
        // One round: sort each of the gap groups in turn
        for(int j = 0; j < gap; ++j)
        {
            // Sort one group: elements j, j + gap, j + 2*gap, ...
            for(int i = j; i < sz - gap; i += gap)
            {
                int end = i;
                int tmp = arr[end + gap];
                while(end >= 0)
                {
                    if(tmp < arr[end])
                        arr[end + gap] = arr[end];
                    else
                        break;
                    end -= gap;
                }
                arr[end + gap] = tmp;
            }
        }
    }
}

Implemented this way, the code looks a bit intimidating: at first glance it is four nested loops, because the groups are processed one after another. Is there a way to make it look more "comfortable"?

Instead, we can process the gap groups side by side. Taking the figure below as an example, let end advance just one position each time: when it lands on an element of the red group, perform one insertion step for the red group; when it lands on the blue group, do the blue group; on the black group, do the black group. Once end has passed over the whole array, one full round of sorting is complete.

[Figure: the same gap = 3 grouping, one color per group]

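#include <assert.h>

void ShellSort(int* arr, int sz)
{
    assert(arr);

    int gap = sz;
    while(gap > 1)
    {
        gap = gap / 3 + 1;
        // One round: end advances one step at a time, so the groups are handled interleaved
        for(int i = 0; i < sz - gap; ++i)
        {
            int end = i;
            int tmp = arr[end + gap];
            while(end >= 0)
            {
                if(tmp < arr[end])
                    arr[end + gap] = arr[end];
                else
                    break;
                end -= gap;
            }
            arr[end + gap] = tmp;   // place tmp in its slot within its group
        }
    }
}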

Observations:

The larger the gap, the faster large values jump toward the back and small values jump toward the front: with gap 3, an element moves three positions per step, whereas with gap 1 it moves only one.

The smaller the gap, the shorter each jump (with gap 1, an element moves one position per step), but the closer the array gets to sorted order.

Note:

  • gap > 1 is pre-sorting

  • gap == 1 is direct insertion sort

  • The last gap must be 1. There are many ways to choose gap; we recommend either gap = sz / 2 followed by gap /= 2 each round, or gap = gap / 3 + 1 each round starting from gap = sz (as in the code above).

Summary of the characteristics of Shell sort:

  1. Shell sort is an optimization of direct insertion sort.
  2. When gap > 1, the passes are pre-sorting, whose purpose is to bring the array closer to sorted order. By the time gap == 1, the array is already nearly sorted, so the final pass is fast. Together this gives an overall speedup, which a performance comparison against direct insertion sort can confirm once both are implemented.

  3. The time complexity of Shell sort is hard to analyze, because gap can be chosen in many ways and the cost depends on that choice; as a result, different books give different figures for it.


Our gap sequence follows the divide-by-3 approach associated with Knuth, and based on Knuth's extensive experiments we will take the cost to be roughly $O(n^{1.25})$ to $O(1.6n^{1.25})$. That is comparable to $O(n\log n)$, and slightly worse than it when the amount of data is large. Either way, Shell sort breaks through the $O(n^2)$ time complexity of direct insertion sort.
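A quick worked comparison shows why $n^{1.25}$ is similar to, and for large $n$ slightly worse than, $n\log n$. It ignores constant factors, which is a simplifying assumption since real running times depend on them:

```latex
\begin{aligned}
\frac{n^{1.25}}{n\log_2 n} &= \frac{n^{0.25}}{\log_2 n},
  \qquad n^{0.25} = \log_2 n = 16 \ \text{at}\ n = 2^{16} = 65536 \\
n = 10^{3}:&\quad n^{1.25} \approx 5.6\times 10^{3}, \quad n\log_2 n \approx 1.0\times 10^{4} \\
n = 10^{6}:&\quad n^{1.25} \approx 3.2\times 10^{7}, \quad n\log_2 n \approx 2.0\times 10^{7}
\end{aligned}
```

So for moderate $n$, up to about $2^{16}$, $n^{1.25}$ is actually the smaller of the two, and beyond that $n\log_2 n$ grows more slowly.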

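  4. Stability: unstable. During pre-sorting, records with equal keys can fall into different groups and be moved past each other, and the final pass cannot restore their original order.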

Thank you for reading; your support is my greatest encouragement~


Source: blog.csdn.net/weixin_61561736/article/details/126796564