Data structure: insertion sort

Direct insertion sort

Insertion sort is one of the simplest sorting algorithms. Its main idea is to insert the records one by one into an ordered list, keeping the list ordered after each insertion; once every record has been inserted, the ordered list is the sorted result.

Direct insertion sort is one of the insertion sort algorithms. When a new record is added, it uses sequential search to find the position where the record belongs and then inserts the record there.

What many beginners call insertion sort is actually the direct insertion sort algorithm. The insertion sort family also includes half (binary) insertion sort, 2-way insertion sort, table insertion sort, and Shell sort (sometimes mistranslated as "Hill sort").

For example, sorting the unordered list {3, 1, 7, 5, 2, 4, 9, 6} in ascending order with the direct insertion sort algorithm proceeds as follows:

  • First consider record 3. Since the sort has just started, the ordered list is empty, so 3 is added to it directly. The ordered list and the unordered list are then as shown in Figure 1:

    Figure 1 Direct insertion sort (1): ordered list {3}; unordered list {1, 7, 5, 2, 4, 9, 6}

  • When record 1 is inserted into the ordered list, it is compared with record 3. Since 1 < 3, it is inserted to the left of record 3, as shown in Figure 2:

    Figure 2 Direct insertion sort (2): ordered list {1, 3}; unordered list {7, 5, 2, 4, 9, 6}

  • When record 7 is inserted into the ordered list, it is compared with record 3. Since 3 < 7, it is inserted to the right of record 3, as shown in Figure 3:

    Figure 3 Direct insertion sort (3): ordered list {1, 3, 7}; unordered list {5, 2, 4, 9, 6}

  • When record 5 is inserted into the ordered list, it is first compared with record 7. Since 5 < 7 and 5 > 3, it is inserted between 3 and 7, as shown in Figure 4:

    Figure 4 Direct insertion sort (4): ordered list {1, 3, 5, 7}; unordered list {2, 4, 9, 6}

  • When record 2 is inserted into the ordered list, it is first compared with record 7 (2 < 7), and then with 5, 3, and 1 in turn, which shows that 2 belongs between 1 and 3, as shown in Figure 5:

    Figure 5 Direct insertion sort (5): ordered list {1, 2, 3, 5, 7}; unordered list {4, 9, 6}

  • Following the same rule, records 4, 9, and 6 of the unordered list are inserted into the ordered list in sequence, as shown in Figure 6:

    Figure 6 Inserting records 4, 9 and 6 in sequence: {1, 2, 3, 4, 5, 7}, then {1, 2, 3, 4, 5, 7, 9}, then {1, 2, 3, 4, 5, 6, 7, 9}

The specific code implementation of direct insertion sort is:

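#include <iostream>
using namespace std;

// Custom output function: prints the pass number i followed by the n array elements
void print(int a[], int n, int i) {
    cout << i << ": ";
    for (int j = 0; j < n; j++) {
        cout << a[j];
    }
    cout << endl;
}

// Direct insertion sort
void InsertSort(int a[], int n) {
    for (int i = 1; i < n; i++) {
        // If a[i] is not smaller than a[i-1], it is already in place;
        // otherwise find the proper position and insert it there.
        if (a[i] < a[i-1]) {
            int j = i - 1;
            int x = a[i];
            // Sequential search for the insertion position; while searching,
            // shift larger elements one slot to the right to make room.
            while (j > -1 && x < a[j]) {
                a[j+1] = a[j];
                j--;
            }
            a[j+1] = x;  // insert at the correct position
        }
        print(a, n, i);  // print the array after each pass
    }
}

int main() {
    int a[8] = {3, 1, 7, 5, 2, 4, 9, 6};
    InsertSort(a, 8);
    return 0;
}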

The running result is:

1: 13752496
2: 13752496
3: 13572496
4: 12357496
5: 12345796
6: 12345796
7: 12345679

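The direct insertion sort algorithm is simple and easy to implement. Its time complexity is O(n^2) in the worst and average cases; when the input is already in order, each record needs only one comparison, so the running time drops to O(n).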

Half insertion sort

Half insertion sort (also called binary insertion sort) improves on direct insertion sort. Because the sequence that is searched for the insertion position is always ordered, the sequential search can be replaced with a half (binary) search, which makes finding the position more efficient.

Algorithm steps

  1. Assume the records to be sorted are stored in the array arr[0..n-1]; arr[0] by itself is an ordered sequence.
  2. Loop n-1 times. In pass i (i = 1, ..., n-1), use half search to find the insertion position of arr[i] in the ordered sequence arr[0..i-1], then insert arr[i] there. After arr[n-1] has been inserted, the whole array is ordered.

The specific code implementation of this algorithm is:

#include <iostream>
using namespace std;

// Custom output function: prints the pass number i followed by the n array elements
void print(int a[], int n, int i) {
    cout << i << ": ";
    for (int j = 0; j < n; j++) {
        cout << a[j];
    }
    cout << endl;
}

void BInsertSort(int a[], int size) {
    int i, j, low = 0, high = 0, mid;
    int temp = 0;
    for (i = 1; i < size; i++) {
        low = 0;
        high = i - 1;
        temp = a[i];
        // Half search for the insertion position; when the loop ends, low is that position
        while (low <= high) {
            mid = (low + high) / 2;
            if (a[mid] > temp) {
                high = mid - 1;
            } else {
                low = mid + 1;
            }
        }
        // Shift every element from the insertion position onward one slot to the right
        for (j = i; j > low; j--) {
            a[j] = a[j-1];
        }
        a[low] = temp;  // insert the element
        print(a, size, i);
    }
}

int main() {
    int a[8] = {3, 1, 7, 5, 2, 4, 9, 6};
    BInsertSort(a, 8);
    return 0;
}

The running result is:

1: 13752496
2: 13752496
3: 13572496
4: 12357496
5: 12345796
6: 12345796
7: 12345679

Compared with direct insertion sort, half insertion sort only reduces the number of key comparisons (from O(n^2) to O(n log n)); the number of record moves is unchanged, so the time complexity of the algorithm is still O(n^2).

2-way insertion sort

The 2-way insertion sort algorithm improves on half insertion sort by reducing the number of record moves during sorting.

The idea is as follows: set up an auxiliary array d of the same size as the array storing the records and treat it as a ring (circular) array. Put the first record of the unordered list in d[0]. Then, starting from the second record, compare each record with the records already in d: a record smaller than the current minimum is added on the left (wrapping around to the end of d), a record larger than the current maximum is added on the right, and any other record is inserted between them by shifting the larger records one position to the right.

The process of sorting the unordered list {3,1,7,5,2,4,9,6} using the 2-way insertion sort algorithm is as follows:

  • Add record 3 to array d:

    d = [3, _, _, _, _, _, _, _]

  • Then insert 1 into array d. It is smaller than the current minimum 3, so it wraps around to d[7]:

    d = [3, _, _, _, _, _, _, 1]

  • Insert record 7 into array d. It is larger than the current maximum 3, so it is placed in d[1]:

    d = [3, 7, _, _, _, _, _, 1]

  • Insert record 5 into array d. Since it is smaller than 7 but larger than 3, 7 must be moved first and then 5 inserted:

    d = [3, 5, 7, _, _, _, _, 1]

  • Insert record 2 into array d. Since it is larger than 1 and smaller than 3, the records 3, 5, and 7 must be moved first and then 2 inserted:

    d = [2, 3, 5, 7, _, _, _, 1]

  • To insert record 4 into array d, the records 5 and 7 must be moved:

    d = [2, 3, 4, 5, 7, _, _, 1]

  • Insert record 9 into array d. It is larger than the current maximum 7, so it is appended on the right:

    d = [2, 3, 4, 5, 7, 9, _, 1]

  • Insert record 6 into array d. The records 7 and 9 must be moved:

    d = [2, 3, 4, 5, 6, 7, 9, 1]

When the records are finally copied back to the original array, they are read sequentially starting from d[7], the position of the minimum, wrapping around to d[0].

The specific implementation code of the 2-way insertion sort algorithm is:

#include <iostream>
using namespace std;

void insert(int arr[], int temp[], int n)
{
    int i, first, final, k;
    first = final = 0;  // positions of the minimum (first) and maximum (final) in temp
    temp[0] = arr[0];
    for (i = 1; i < n; i++) {
        // The element to insert is smaller than the current minimum
        if (arr[i] < temp[first]) {
            first = (first - 1 + n) % n;
            temp[first] = arr[i];
        }
        // The element to insert is larger than the current maximum
        else if (arr[i] > temp[final]) {
            final = (final + 1 + n) % n;
            temp[final] = arr[i];
        }
        // The element to insert lies between the minimum and the maximum
        else {
            k = (final + 1 + n) % n;
            // While the element to insert is smaller than the current value,
            // shift the current value one position to the right
            while (temp[((k - 1) + n) % n] > arr[i]) {
                temp[(k + n) % n] = temp[(k - 1 + n) % n];
                k = (k - 1 + n) % n;
            }
            // Insert the element
            temp[(k + n) % n] = arr[i];
            // The maximum has moved one position to the right, so update final
            final = (final + 1 + n) % n;
        }
    }
    // Copy the sorted records back to the original array, starting from the minimum
    for (k = 0; k < n; k++) {
        arr[k] = temp[(first + k) % n];
    }
}

int main()
{
    int a[8] = {3, 1, 7, 5, 2, 4, 9, 6};
    int temp[8];
    insert(a, temp, 8);
    for (int i = 0; i < 8; i++) {
        cout << a[i] << " ";
    }
    return 0;
}

The running result is:

1 2 3 4 5 6 7 9

Compared with half insertion sort, 2-way insertion sort only reduces the number of record moves; it does not avoid them altogether, so its time complexity is still O(n^2).

Table insertion sort

Table insertion sort stores the records in a linked list. While the records are being sorted by key, they never change storage location; only the pointers between nodes are modified.

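The storage structure of the linked list can be declared as follows:

#define SIZE 100
typedef struct {
    int rc;    // record (key)
    int next;  // pointer field; the nodes live in an array, so this holds the array subscript of the next node
} SLNode;
typedef struct {
    SLNode r[SIZE];  // the linked list storing the records
    int length;      // current length of the linked list
} SLinkListType;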

In a linked list stored in an array, the node at subscript 0 serves as the head node, and its key is set to the maximum integer. Table insertion sort works as follows: first, the node at subscript 1 and the head node form a circular linked list; then the remaining nodes are inserted into the circular linked list one by one, in order of the keys they store.

For example, to sort the unordered table {49, 38, 76, 13, 27} using table insertion sort, the process is:

  • First, the node storing 49 and the head node form the initial circular linked list, which completes the initialization:

    subscript   0     1    2    3    4    5
    key         MAX   49   38   76   13   27
    next        1     0    -    -    -    -

  • Then insert the record with key 38 into the circular linked list (only the next pointers need to change). The linked list becomes:

    subscript   0     1    2    3    4    5
    key         MAX   49   38   76   13   27
    next        2     0    1    -    -    -

  • Then insert the node with key 76 into the circular linked list. The linked list becomes:

    subscript   0     1    2    3    4    5
    key         MAX   49   38   76   13   27
    next        2     3    1    0    -    -

  • Then insert the node with key 13 into the circular linked list. The linked list becomes:

    subscript   0     1    2    3    4    5
    key         MAX   49   38   76   13   27
    next        4     3    1    0    2    -

  • Finally, insert the node with key 27 into the circular linked list. The linked list becomes:

    subscript   0     1    2    3    4    5
    key         MAX   49   38   76   13   27
    next        4     3    1    0    5    2

  • Following the next pointers from the head node, the final circular linked list is:

    head → 13 → 27 → 38 → 49 → 76 → head

Compared with direct insertion sort, table insertion sort only avoids moving records (it just modifies the pointer field of each node); the number of key comparisons made during insertion does not change, so the time complexity of table insertion sort is still O(n^2).

Reprocess the linked list

The ordered list produced by table insertion sort is a linked list, so it supports only sequential search. To use binary search, the linked list must be reprocessed, that is, its records must be rearranged: traverse the linked list and move its i-th node to the position with subscript i in the array.

    subscript   0     1    2    3    4    5
    key         MAX   49   38   76   13   27
    next        4     3    1    0    5    2

For example, the table above is a linked list that has already been constructed. The process of reprocessing it is:

  • First, the head node shows that the smallest key is 13, stored at subscript 4. Since 13 belongs at subscript 1, it is swapped with the record stored at subscript 1 (key 49). So that 49 can still be found later, the next field of 13 is set to the new location of 49 (subscript 4); the original next value of 13 (subscript 5) must be saved before it is overwritten, here in the pointer q:

    subscript   0     1    2    3    4    5
    key         MAX   13   38   76   49   27
    next        4     4    1    0    3    2

  • Then q leads to the record that 13 originally pointed to: key 27 at subscript 5. Its next value is saved in q, which now points to subscript 2, where key 38 is stored. Since 27 belongs at subscript 2, it is swapped with key 38, and the next field of 27 is set to 5, the new location of 38:

    subscript   0     1    2    3    4    5
    key         MAX   13   27   76   49   38
    next        4     4    5    0    3    1

  • Next, q points to subscript 2. Two records have already been placed, so any subscript smaller than 3 now holds a record in its final position, not the one being looked for. Following the next field at subscript 2 leads to subscript 5, which holds key 38. Since 5 is not smaller than 3, 38 is the record being looked for, so it is swapped with the record at subscript 3 (key 76), its next field is set to 5, and q is set to its original next value, subscript 1:

    subscript   0     1    2    3    4    5
    key         MAX   13   27   38   49   76
    next        4     4    5    5    3    0

  • Then q points to subscript 1, whose record has already been placed, so its next field is followed to subscript 4, which holds key 49; 49 is already at subscript 4 and does not need to move. Similarly, the next field of 49 leads to subscript 3, which has already been placed, so its next field is followed to subscript 5, which holds key 76; its position does not need to change either.

The specific code implementation of the rearrangement is:

#include <cstdlib>
#include <iostream>
using namespace std;
#define SIZE 6
typedef struct {
    int rc;    // record (key)
    int next;  // pointer field; holds the array subscript of the next node
} SLNode;
typedef struct {
    SLNode r[SIZE];  // the linked list storing the records
    int length;      // current length of the linked list (head node included)
} SLinkListType;

// Rearrangement function
void Arrange(SLinkListType *SL) {
    // p points to the record that should be placed next
    int p = SL->r[0].next;
    for (int i = 1; i < SL->length; i++) {
        // If p < i, the record originally at p has already been moved;
        // follow the next fields until its real position is found
        while (p < i) {
            p = SL->r[p].next;
        }
        // Let q point to the next record in the linked list
        int q = SL->r[p].next;
        // If p != i, the record must be swapped with the one at subscript i
        if (p != i) {
            SLNode t;
            t = SL->r[p];
            SL->r[p] = SL->r[i];
            SL->r[i] = t;
            // After the swap, record where the displaced record went so it can be found later
            SL->r[i].next = p;
        }
        // Finally let p point to the next record
        p = q;
    }
}

int main(int argc, const char * argv[]) {
    SLinkListType *SL = (SLinkListType*)malloc(sizeof(SLinkListType));
    SL->length = 6;
    SL->r[0].rc = 0;
    SL->r[0].next = 4;

    SL->r[1].rc = 49;
    SL->r[1].next = 3;

    SL->r[2].rc = 38;
    SL->r[2].next = 1;

    SL->r[3].rc = 76;
    SL->r[3].next = 0;

    SL->r[4].rc = 13;
    SL->r[4].next = 5;

    SL->r[5].rc = 27;
    SL->r[5].next = 2;

    Arrange(SL);
    for (int i = 1; i < 6; i++) {
        cout << SL->r[i].rc << " ";
    }
    free(SL);
    return 0;
}

The running result is:

13 27 38 49 76

Shell sort

Shell sort, also known as "diminishing increment sort", is also a kind of insertion sort, but compared with the previous algorithms it is considerably more time-efficient.

Direct insertion sort is relatively efficient when only a few records in the table are out of order and most are already in order; it is also efficient when the total number of records to sort is small. Shell sort improves on direct insertion sort by exploiting these two observations.

The idea of Shell sort is: first split the whole record table into several sub-tables and sort each of them with direct insertion sort, then perform one direct insertion sort on the whole table.

For example, Shell sort on the unordered list {49, 38, 65, 97, 76, 13, 27, 49, 55, 4} proceeds as follows:

  • First, perform direct insertion sort on each of the sub-tables {49, 13}, {38, 27}, {65, 49}, {97, 55}, {76, 4} (when two records must change places, simply swap their storage locations):

    After the first pass (increment 5): {13, 27, 49, 55, 4, 49, 38, 65, 97, 76}

Here the two records of each sub-table are compared; for example, 49 and 13 are compared, and since 13 < 49 their storage locations are exchanged.

  • After one pass, the records are closer to being in order. The table is then split again, this time with increment 3 (sub-tables {13, 55, 38, 76}, {27, 4, 65}, {49, 49, 97}), and each sub-table is sorted:

    After the second pass (increment 3): {13, 4, 49, 38, 27, 49, 55, 65, 97, 76}

  • After two passes, the table is basically in order. A direct insertion sort is then performed on the whole table (only a few comparisons and moves are needed). The final result of Shell sort is:

    {4, 13, 27, 38, 49, 49, 55, 65, 76, 97}

In Shell sort, the records in each sub-table are not adjacent in the original table but are separated by a fixed increment. In the example above, the increment is 5 in the first pass and 3 in the second.

In this way, a record with a small key does not move forward one step at a time but jumps forward, which reduces the number of comparisons and moves needed in the final insertion sort over the whole table.

Generally, when the number of records is large, Shell sort is more efficient than direct insertion sort.

Specific code implementation of Shell sort:

#include <cstdlib>
#include <iostream>
using namespace std;
#define SIZE 15
typedef struct {
    int key;  // key value
} SLNode;
typedef struct {
    SLNode r[SIZE];  // array storing the records (r[0] is used as scratch space)
    int length;      // number of records in the array
} SqList;

// One Shell sort pass, where dk is the increment
void ShellInsert(SqList *L, int dk) {
    // Starting from record dk+1, insert each record into its sub-table
    // (the records dk apart from it) by direct insertion sort
    for (int i = dk + 1; i <= L->length; i++) {
        // If the new key is smaller than the last key of its sub-table, it must be moved
        if (L->r[i].key < L->r[i-dk].key) {
            int j;
            // Use the slot at subscript 0 to hold the record being moved
            L->r[0] = L->r[i];
            // Direct insertion: while the key at subscript 0 is smaller than the key at j,
            // shift record j back by dk to make room for the new record
            for (j = i - dk; j > 0 && (L->r[0].key < L->r[j].key); j -= dk) {
                L->r[j+dk] = L->r[j];
            }
            // After the loop, put the record into the freed slot
            L->r[j+dk] = L->r[0];
        }
    }
}

// Shell sort: run one pass for each increment in dlta, in order
void ShellSort(SqList *L, int dlta[], int t) {
    for (int k = 0; k < t; k++) {
        ShellInsert(L, dlta[k]);
    }
}

int main(int argc, const char * argv[]) {
    // Increments: 5 means records 5 apart form a sub-table;
    // 1 is the final direct insertion sort over the whole table
    int dlta[3] = {5, 3, 1};
    SqList *L = (SqList*)malloc(sizeof(SqList));
    L->r[1].key = 49;
    L->r[2].key = 38;
    L->r[3].key = 65;
    L->r[4].key = 97;
    L->r[5].key = 76;
    L->r[6].key = 13;
    L->r[7].key = 27;
    L->r[8].key = 49;
    L->r[9].key = 55;
    L->r[10].key = 4;
    L->length = 10;
    // Run Shell sort
    ShellSort(L, dlta, 3);
    // Print the result
    for (int i = 1; i <= 10; i++) {
        cout << L->r[i].key << " ";
    }
    free(L);
    return 0;
}

The running result is:

4 13 27 38 49 49 55 65 76 97

Tip: Studies of Shell sort suggest that the increments in the sequence should have no common factor other than 1. In addition, the last increment must be 1, because the final pass has to be a direct insertion sort over the whole table.

Origin blog.csdn.net/weixin_45652283/article/details/132091547