Data Structure - Sorting

1. Insertion sort

1.1 Direct Insertion Sort - Principle:

The array is divided into two intervals: an ordered interval and an unordered interval. On each pass, the first element of the unordered interval is taken and inserted at its proper position in the ordered interval.

1.2 Implementation:

public class InsertSort {

    public static void sort(int[] array){
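        // Number of elements to insert (one per element of the unordered interval)
        for (int i = 0; i < array.length - 1; i++) {
            // Ordered interval:   [0, i]  (already holds one element when i == 0)
            // Unordered interval: [i + 1, n)

            // Take the first element of the unordered interval and call it k
            int k = array[i + 1];

            // Scan the ordered interval [0, i] from back to front
            // to find the position where k belongs.
            int j = i;
            for (; j >= 0 && k < array[j]; j--) {
                array[j + 1] = array[j];        // shift elements greater than k one slot to the right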
            }

            array[j + 1] = k;
        }
    }
}

1.3 Performance Analysis: Direct insertion sort is stable. The closer the initial data is to sorted order, the faster it runs: O(n) in the best case (already sorted) and O(n^2) in the worst and average cases, with O(1) extra space.
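
The effect of the initial order can be made concrete by counting element shifts. The sketch below reuses the insertion loop from section 1.2; the class name InsertionCostDemo and the shift counter are illustrative additions, not part of the original implementation.

```java
final class InsertionCostDemo {

    // Same loop as InsertSort.sort, but returns how many elements were shifted.
    static long sortAndCountShifts(int[] array) {
        long shifts = 0;
        for (int i = 0; i < array.length - 1; i++) {
            int k = array[i + 1];
            int j = i;
            for (; j >= 0 && k < array[j]; j--) {
                array[j + 1] = array[j];
                shifts++;               // one shift per element moved to the right
            }
            array[j + 1] = k;
        }
        return shifts;
    }

    public static void main(String[] args) {
        System.out.println(sortAndCountShifts(new int[]{1, 2, 3, 4, 5})); // 0  (already sorted)
        System.out.println(sortAndCountShifts(new int[]{1, 2, 4, 3, 5})); // 1  (nearly sorted)
        System.out.println(sortAndCountShifts(new int[]{5, 4, 3, 2, 1})); // 10 (reversed: n(n-1)/2)
    }
}
```

A sorted input needs no shifts at all, while a reversed input needs n(n-1)/2 of them, which is where the O(n) best case and O(n^2) worst case come from.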

1.4 Binary insertion sort (for reference)

Because the ordered interval is sorted, binary search can be used to find the position where the element should be inserted. This reduces the number of comparisons, but elements still have to be shifted, so the worst case remains O(n^2).

    public static void bsInsertSort(int[] array) {

        for (int i = 1; i < array.length; i++) {
            
            int v = array[i];
            int left = 0;
            int right = i;
            // Search interval: [left, right)
            // Stability matters: on equal keys go right, so v lands after existing equal elements
            while (left < right) {

                int m = (left + right) / 2;
                if (v >= array[m]) {
                    left = m + 1;
                } else {
                    right = m; 
                }
            }
            // Shift [left, i) one slot to the right
            for (int j = i; j > left; j--) {
                array[j] = array[j - 1];
            }
            array[left] = v;
        }
    }

2. Shell sort

2.1 Principle: [Algorithm idea] First, divide the sequence of records to be sorted into several "sparse" subsequences and sort each one with direct insertion sort. After this coarse adjustment the records in the whole sequence are basically in order, and a final direct insertion sort over all records finishes the job.
        ① Choose the first gap d1. Within the whole sequence, all records that are d1 apart form one group, and each group is sorted with direct insertion sort.
        ② Then take the next gap di (di < d(i-1)). All records that are di apart form one group, and each group is again sorted with direct insertion sort.
        Repeat step ② until the gap is di = 1. At that point there is only one subsequence, and one direct insertion sort over it completes the whole sorting process.

2.2 Implementation

public class ShellSort {
    public static void sort(int[] a){
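        // 1. Choose the initial gap h based on the length of a
        int h = 1;
        while (h < a.length/2){
            h = 2*h+1;
        }
        // 2. Shell sort
        while ( h>=1 ){
            // For each element, insert it into its group (the elements h apart)
            for (int i = h; i < a.length; i++) {
                for (int j = i; j >= h ; j-=h) {
                    if (a[j-h]>a[j]){
                        // Out of order: swap
                        swap(a,j-h,j);
                    }else { // a[j-h] <= a[j]: already in place, stop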
                        break;
                    }
                }
            }
            h=h/2;
        }
    }
    private static void swap(int[] a, int i, int j){
        int emp;
        emp = a[i];
        a[i] = a[j];
        a[j] = emp;
    }
}

2.3 Stability: Unstable

3. Selection sort

3.1 Direct Selection Sort - Principle

[Algorithm idea]
In the first pass, starting from the first record, n-1 key comparisons select the record with the smallest key among the n records, which is then exchanged with the first record.
In the second pass, starting from the second record, n-2 key comparisons select the record with the smallest key among the remaining n-1 records, which is then exchanged with the second record.
In the i-th pass, starting from the i-th record, n-i key comparisons select the record with the smallest key among the remaining n-i+1 records, which is then exchanged with the i-th record.
After n-1 passes, n-1 records are in place and the remaining (largest) record is left at the end, so n-1 passes are required in total.

3.2 Implementation

public class SelectSort {

    public static void sort(int[] array){

        for (int i = 0; i < array.length-1; i++) {
            int k = i;
            for (int j = i+1; j < array.length; j++) {
                if (array[j] < array[k]){
                    k = j;
                }
            }
            swap(array,k,i);
        }
    }

    private static void swap(int[] a, int i, int j){
        int emp;
        emp = a[i];
        a[i] = a[j];
        a[j] = emp;
    }
}

3.3 Stability: Unstable
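
The instability comes from the long-distance swap, which can carry an element past another element with the same key. The sketch below is an illustration only: the class name and the parallel tags array are assumptions added here so that equal keys can be told apart.

```java
import java.util.Arrays;

final class SelectionStabilityDemo {

    // Direct selection sort on keys; each tag travels with its key so the
    // final relative order of equal keys can be observed.
    static void sort(int[] keys, String[] tags) {
        for (int i = 0; i < keys.length - 1; i++) {
            int k = i;
            for (int j = i + 1; j < keys.length; j++) {
                if (keys[j] < keys[k]) {
                    k = j;
                }
            }
            int tmpKey = keys[i];
            keys[i] = keys[k];
            keys[k] = tmpKey;
            String tmpTag = tags[i];
            tags[i] = tags[k];
            tags[k] = tmpTag;
        }
    }

    public static void main(String[] args) {
        int[] keys = {5, 5, 2};
        String[] tags = {"a", "b", "c"};
        sort(keys, tags);
        System.out.println(Arrays.toString(keys)); // [2, 5, 5]
        System.out.println(Arrays.toString(tags)); // [c, b, a]
    }
}
```

The first pass swaps the 2 with the 5 tagged "a", which moves that 5 behind the 5 tagged "b": the two equal keys end up in reversed order.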

3.4 Bidirectional Selection Sort (for reference)

On each pass, select both the smallest and the largest element of the unordered interval and place them at the front and the end of that interval, until no unsorted elements remain.

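// Note: this version moves only the largest element on each pass;
// a fully bidirectional version would also track the smallest one.
public static void selectSort(int[] array) {

    for (int i = 0; i < array.length - 1; i++) {

        // Unordered interval: [0, array.length - i)
        // Ordered interval:   [array.length - i, array.length)
        int max = 0;
        for (int j = 1; j < array.length - i; j++) {
            if (array[j] > array[max]) {
                max = j;
            }
        }

        // Move the largest element to the end of the unordered interval
        int t = array[max];
        array[max] = array[array.length - i - 1];
        array[array.length - i - 1] = t;
    }
}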
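
A version that handles both ends in every pass, as the description in 3.4 suggests, might look like the following. This is a minimal sketch under the assumption that the unordered interval is tracked as [low, high]; the class name BidirectionalSelectSort is hypothetical.

```java
final class BidirectionalSelectSort {

    static void sort(int[] array) {
        int low = 0;
        int high = array.length - 1;
        // Unordered interval: [low, high]
        while (low < high) {
            int min = low;
            int max = low;
            for (int j = low + 1; j <= high; j++) {
                if (array[j] < array[min]) {
                    min = j;
                }
                if (array[j] > array[max]) {
                    max = j;
                }
            }
            swap(array, low, min);
            // If the maximum was sitting at low, the swap above moved it to index min
            if (max == low) {
                max = min;
            }
            swap(array, high, max);
            low++;
            high--;
        }
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

The check after the first swap is the part that is easy to miss: without it, an array such as {3, 1, 2} would have its maximum moved away before the second swap uses the stale index.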

4. Heap Sort

4.1 Principle: Heap sort is also a selection sort, but instead of scanning the unordered interval to find the largest element, it uses a heap to select it.

[Algorithm idea]
① Build the initial heap from the records to be sorted according to the definition of a heap (algorithm 9.9), and output the top element of the heap.
② Adjust the remaining records: sift the first n-i elements into a new heap, then output the top element of the heap.
③ Repeat step ② n-1 times. The heap becomes smaller and smaller while the ordered part grows, until the whole sequence of records is ordered. This process is called heap sort.
The process is rather tedious to draw; it is best traced by hand, following the ideas above and the code.

4.2 Implementation

public class HeapSort {
    
    public static void sort(int[] array){
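        // Build the initial heap: a max-heap for ascending order, a min-heap for descending order.
        for (int i = (array.length-2)/2; i >=0 ; i--) {
            shiftDown(array,array.length,i);
        }
        // Maintain the heap: after the top is swapped with the last element, the heap property at the top is broken and must be restored. The heap shrinks by one each time.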
        for (int i = 0; i < array.length-1; i++) {
            swap(array,0,array.length-i-1);
            shiftDown(array,array.length-i-1,0);
        }
    }

    private static void shiftDown(int[] array, int length, int index) {

        while (index*2+1 < length){
            int left = index*2+1;
            int right = left+1;
            int max = left;

            if (right < length && array[left] < array[right]){
                max = right;
            }

            if (array[index] >= array[max]){
                return;
            }
            swap(array,index,max);
            index = max;
        }
    }

    private static void swap(int[] a, int i, int j){
        int emp;
        emp = a[i];
        a[i] = a[j];
        a[j] = emp;
    }
}

4.3 Stability: Unstable

5. Bubble sort

5.1 Principle: Within the unordered interval, adjacent elements are compared and swapped so that the largest one bubbles to the end of the interval. This repeats until the whole array is ordered.

5.2 Implementation:

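public class BubbleSort {

    public void sort(long[] array) {
        for (int i = 0; i < array.length - 1; i++) {
            boolean sorted = true;  // stays true if this pass makes no swap
            for (int j = 0; j < array.length - i - 1; j++) {
                if (array[j] > array[j + 1]) {
                    swap(array, j, j + 1);
                    sorted = false;
                }
            }
            if (sorted) {
                return;             // already ordered: stop early
            }
        }
    }

    // Swap a[i] and a[j]
    private static void swap(long[] a, int i, int j) {
        long tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}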

5.3 Stability: Stable

6. Quick Sort (Important)

6.1 Principle - Overview

1. Select an element from the interval to be sorted as the pivot (reference value);

2. Partition: traverse the whole interval, placing elements smaller than the pivot (optionally including equal ones) to its left and elements larger than the pivot (optionally including equal ones) to its right;

3. Divide and conquer: process the left and right subintervals in the same way until a subinterval has length 1, which means it is already ordered, or length 0, which means it holds no data.

6.2 Principle - partition: The core of quick sort is the partition operation. It can be implemented in many ways, but they all divide the data around the pivot.

6.3 Stability: Unstable

6.4 Principle - Choosing the pivot

1. Select an edge element (left or right)

2. Random selection

3. Median of several (for example, median of three): among array[left], array[mid], array[right], the one whose value lies in the middle is used as the pivot
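
The median-of-three choice can be sketched as a small helper. The class and method names below are hypothetical, and mid is assumed to be the midpoint of [left, right]; the original text does not say how mid is computed.

```java
final class MedianOfThree {

    // Returns whichever of the indices left, mid, right holds the median
    // of the three sampled values.
    static int medianIndex(int[] array, int left, int right) {
        int mid = left + (right - left) / 2;   // assumed definition of mid
        int a = array[left];
        int b = array[mid];
        int c = array[right];
        if ((a <= b && b <= c) || (c <= b && b <= a)) {
            return mid;
        }
        if ((b <= a && a <= c) || (c <= a && a <= b)) {
            return left;
        }
        return right;
    }
}
```

A partition routine that expects the pivot at array[to] could first swap array[medianIndex(array, from, to)] with array[to] and then proceed unchanged.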

6.5 Code implementation:



public class QuickSort {
    public static void sort(int[] array) {
        
        quickSortRange(array,0,array.length-1);
    }

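    // For convenience we use a closed interval [from, to]:
    // both array[from] and array[to] belong to the interval,
    // so the number of elements is to - from + 1.
    private static void quickSortRange(int[] array, int from, int to) {

        if (to - from +1 <= 1) {
            // The interval holds at most one element
            return;
        }

        // Alternative: a two-way partition with the rightmost element array[to] as the pivot,
        //int pi = partitionA(array, from, to);
        // after which [from, pi) holds elements <= pivot, (pi, to] holds elements >= pivot,
        // and pivot == array[pi].
        // Divide and conquer: handle the same kind of problem the same way, on smaller inputs.
        // partitionD (three-way) returns {left, right}: [from, left] < pivot, [right, to] > pivot.
        int[] index = partitionD(array,from,to);
        int left = index[0];
        int right = index[1];
        quickSortRange(array, from, left);    // elements less than the pivot
        quickSortRange(array, right, to);     // elements greater than the pivot
    }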
    }

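    /**
     * Hoare-style partition. Uses the rightmost element array[to] as the pivot and
     * scans the interval from both ends, swapping elements that are on the wrong side.
     * During the scan:
     *   [from, left)   <= pivot
     *   (right, to]    >= pivot
     *   [left, right]  not yet compared
     * @param array the array being sorted
     * @param from  first index of the interval (inclusive)
     * @param to    last index of the interval (inclusive)
     * @return the final index of the pivot
     */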
    private static int partitionA(int[] array, int from, int to) {
        int left = from;
        int right = to;
        int pivot = array[to];
        while (left < right){
            while (left < right && array[left] <= pivot){
                left++;
            }
            while(left < right && array[right] >= pivot){
                right--;
            }
            swap(array,left,right);
        }
        swap(array,left,to);
        return left;

    }

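    /*
        "Dig a pit" partition: the pivot is saved, which leaves a hole at array[to].
        The hole is then filled alternately from the left and from the right.
        [from, left)  <= pivot
        (right, to]   >= pivot
     */
    public static int partitionB(int[] array, int from, int to){
        int pivot = array[to];
        int left = from;
        int right = to;

        while(left < right){
            // <= and >= (rather than < and >) so that elements equal to the pivot
            // cannot stall both scans and loop forever
            while(left < right && array[left] <= pivot){
                left++;
            }
            array[right] = array[left];
            while(left < right && array[right] >= pivot){
                right--;
            }
            array[left] = array[right];
        }
        array[left] = pivot;
        return left;
    }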

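    /**
     * Partitions the interval [from, to] of array in a single pass with two indices
     * (s behind, i in front).
     * Afterwards the interval is split into [< pivot] pivot [>= pivot].
     * Throughout the partition:
     * [from, s)    less than pivot
     * [s, i)       greater than or equal to pivot
     * [i, to)      not yet compared
     * [to, to]     pivot
     * @param array the array being sorted
     * @param from  first index of the interval (inclusive)
     * @param to    last index of the interval (inclusive)
     * @return the final index of the pivot
     */
    public static int partitionC(int[] array,int from,int to){

        int s = from;
        int pivot = array[to];
        for (int i = from; i < to; i++) {   // traverse [from, to)
            // Using <= here would not make the sort stable either, because of the swaps
            if (array[i] < pivot) {
                // TODO: simple optimization: skip the swap when i == s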
                swap(array,i,s);
                s++;
            }
        }

        array[to] = array[s];
        array[s] = pivot;

        return s;
    }

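    /*
        Three-way partition: elements equal to the pivot are grouped in the middle.
        Throughout the partition:
        [from, s)  less than pivot
        [s, i)     equal to pivot
        [i, g]     not yet compared
        (g, to]    greater than pivot
        Returns {s - 1, i}: the last index of the "less" part and the first index of the "greater" part.
     */
    public static int[] partitionD(int[] array,int from,int to){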

        int s = from;
        int i = from;
        int g = to;

        int pivot = array[to];
        while (g-i+1 > 0){
            if (array[i] == pivot){
                i++;
            }else if (array[i] < pivot){
                swap(array,s,i);
                s++;i++;
            }else {
                swap(array,g,i);
                g--;
            }
        }
        return new int[] {s-1,i};
    }

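    // Forward traversal with the leftmost element array[left] as the pivot.
    // Throughout the loop: [left + 1, d) < pivot, [d, i) >= pivot.
    public static int partitionE(int[]array,int left,int right){
        int d = left + 1;
        int pivot = array[left];

        for (int i = left+1; i <=right ; i++) {
            if(array[i] < pivot) {
                swap(array,i,d);
                d++;
            }
        }
        // array[d - 1] is the last element smaller than the pivot (or the pivot itself),
        // so the pivot belongs at d - 1; index d could lie past the end of the interval
        swap(array,d - 1,left);
        return d - 1;
    }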


    private static void swap(int[] a, int i, int j){
        int emp;
        emp = a[i];
        a[i] = a[j];
        a[j] = emp;
    }
}

6.7 Optimization summary

1. The choice of pivot matters a great deal; the median-of-several method is usually used

2. During partition, also group the elements equal to the pivot, so they are excluded from further recursion

3. When the interval to be sorted is shorter than a threshold, switch to direct insertion sort
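
Point 3 can be sketched as follows. The threshold value 16 is an assumption chosen for illustration (the text does not give one), the class name is hypothetical, and the partition is a plain single-pass partition with array[to] as the pivot.

```java
final class HybridQuickSort {

    static final int THRESHOLD = 16;   // assumed cutoff; tune by measurement

    static void sort(int[] array) {
        sortRange(array, 0, array.length - 1);
    }

    private static void sortRange(int[] array, int from, int to) {
        if (to - from + 1 <= THRESHOLD) {
            insertionSort(array, from, to);   // small interval: insertion sort
            return;
        }
        int pi = partition(array, from, to);
        sortRange(array, from, pi - 1);
        sortRange(array, pi + 1, to);
    }

    // Direct insertion sort restricted to the closed interval [from, to]
    private static void insertionSort(int[] array, int from, int to) {
        for (int i = from + 1; i <= to; i++) {
            int k = array[i];
            int j = i - 1;
            for (; j >= from && k < array[j]; j--) {
                array[j + 1] = array[j];
            }
            array[j + 1] = k;
        }
    }

    // Single-pass partition with array[to] as the pivot; returns the pivot's final index
    private static int partition(int[] array, int from, int to) {
        int pivot = array[to];
        int s = from;
        for (int i = from; i < to; i++) {
            if (array[i] < pivot) {
                int tmp = array[i];
                array[i] = array[s];
                array[s] = tmp;
                s++;
            }
        }
        array[to] = array[s];
        array[s] = pivot;
        return s;
    }
}
```

Insertion sort has very little overhead per element, so handing it the short intervals avoids many tiny recursive calls.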

6.8 Summary

1. Select a pivot in the interval to be sorted

    1. Select the left or right edge

    2. Random selection

    3. Median of several numbers

2. Partition so that small elements are on the left and large elements are on the right

    1. Hoare

    2. Dig a pit

    3. Traverse with front and back indices

    4. Also group the elements equal to the pivot (for reference)

3. Divide and conquer the left and right subintervals; once a subinterval is shorter than a threshold, use insertion sort

7. Merge Sort (Important)

7.1 Principle - Overview: Merge sort is an efficient sorting algorithm built on the merge operation and a typical application of divide and conquer. Ordered subsequences are merged to obtain a completely ordered sequence: first make each subsequence ordered, then merge the ordered subsequences with one another. Merging two sorted lists into one sorted list is called a two-way merge.

7.2 Principle - Merging Two Sorted Arrays
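
The merge step on its own can be illustrated with two separate sorted arrays. This is a minimal sketch; the class name MergeTwoSorted and the choice to return a new array are assumptions made for the example.

```java
import java.util.Arrays;

final class MergeTwoSorted {

    // Merges two ascending arrays into a new ascending array.
    static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, p1 = 0, p2 = 0;
        // Repeatedly take the smaller head; on ties take from a first
        while (p1 < a.length && p2 < b.length) {
            if (a[p1] <= b[p2]) {
                out[i++] = a[p1++];
            } else {
                out[i++] = b[p2++];
            }
        }
        // At most one of the two loops below copies anything
        while (p1 < a.length) {
            out[i++] = a[p1++];
        }
        while (p2 < b.length) {
            out[i++] = b[p2++];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] merged = merge(new int[]{1, 4, 7}, new int[]{2, 3, 9});
        System.out.println(Arrays.toString(merged)); // [1, 2, 3, 4, 7, 9]
    }
}
```

Each element is copied exactly once, so merging takes time proportional to the combined length of the two inputs.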

7.3 Implementation:

public class MergeSort {
    private static int[] assist;
    
    public static void sort(int[] array) {
        assist = new int[array.length];
        int lo = 0, hi = array.length - 1;
        sort(array, lo, hi);
    }

    private static void sort(int[] array, int lo, int hi) {
        if (hi <= lo) return;
        int mid = lo + (hi - lo) / 2;
        sort(array, lo, mid);
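        sort(array, mid + 1, hi);   // the two calls above split the range and sort each half
        merge(array, lo, mid, hi);  // merge array[lo..mid] and array[mid+1..hi] into one ordered range
    }

    private static void merge(int[] array, int lo, int mid, int hi) {
        int i = lo, p1 = lo, p2 = mid + 1;  // i: write index in assist; p1, p2: read indices of the two halves
        while (p1 <= mid && p2 <= hi) {
            if (array[p1] <= array[p2]) {   // <= takes the left element on ties, which keeps the sort stable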
                assist[i++] = array[p1++];
            } else {
                assist[i++] = array[p2++];
            }
        }
        while (p1 <= mid) {
            assist[i++] = array[p1++];
        }
        while (p2 <= hi) {
            assist[i++] = array[p2++];
        }
        System.arraycopy(assist, lo, array, lo, hi - lo + 1);
    }

    private static void exchange(int[] a, int i, int j) {
        int emp;
        emp = a[i];
        a[i] = a[j];
        a[j] = emp;
    }
}

7.4 Stability: Stable

7.5 Optimization summary

Alternate the roles of the original array and the auxiliary array between recursion levels, so that merged data does not have to be copied back after every merge.
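
One common way to realize this optimization is sketched below. It assumes the auxiliary array starts as a full copy of the input and that each recursion level swaps which array is the source and which is the destination; the class name is hypothetical.

```java
final class MergeSortAlternating {

    static void sort(int[] array) {
        int[] aux = array.clone();              // both arrays start with the same contents
        sort(aux, array, 0, array.length - 1);  // final result lands in array
    }

    // Sorts the range [lo, hi], reading from src and leaving the result in dst.
    // Precondition: src and dst hold the same elements on [lo, hi].
    private static void sort(int[] src, int[] dst, int lo, int hi) {
        if (hi <= lo) {
            return;                             // dst[lo] already equals src[lo]
        }
        int mid = lo + (hi - lo) / 2;
        sort(dst, src, lo, mid);                // roles swapped: sorted halves land in src
        sort(dst, src, mid + 1, hi);
        merge(src, dst, lo, mid, hi);           // merge the halves from src into dst
    }

    private static void merge(int[] src, int[] dst, int lo, int mid, int hi) {
        int p1 = lo, p2 = mid + 1;
        for (int i = lo; i <= hi; i++) {
            if (p1 > mid) {
                dst[i] = src[p2++];             // left half exhausted
            } else if (p2 > hi) {
                dst[i] = src[p1++];             // right half exhausted
            } else if (src[p1] <= src[p2]) {
                dst[i] = src[p1++];             // ties go left, preserving stability
            } else {
                dst[i] = src[p2++];
            }
        }
    }
}
```

Compared with the implementation in 7.3, there is no System.arraycopy after each merge; the only full copy is the initial clone.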

8. External sorting

In the sorting methods discussed above, the records to be sorted and their related information are kept in memory, and the whole sorting process is completed in memory without exchanging data between internal and external storage. These methods are collectively referred to as internal sorting. If the number of records is so large that they cannot all be loaded into memory at once, the sorting process must load them from external storage in batches. This kind of sorting is called external sorting. This section introduces the basic idea of external sorting methods based on direct access devices (disk storage) and sequential access devices (tape storage).

8.1 The basic method of external sorting

The most commonly used external sorting method is merge sort. It consists of two stages. In the first stage, the records to be sorted are read into memory in batches: the file is loaded segment by segment, and each segment is sorted with an efficient internal sorting method. A sorted file segment is called a run (or merge segment). As runs are generated they are written to external storage as sub-files, so that many initial runs are formed there. The second stage is a multi-way merge of the sub-files: a merging method (such as 2-way merging) is applied over several passes, so that the runs grow longer and longer until a single run remains, that is, the entire file is ordered. External storage such as tape and disk can be used for external sorting. The length of the initial runs depends on the size of the sorting area available in memory and on the initial sorting strategy. The number of merge ways depends on the number of external devices available.
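
The two stages can be simulated in memory to show the control flow. In the sketch below, plain arrays stand in for the sub-files on external storage and runSize stands in for the size of the in-memory sorting area; both are simplifying assumptions, as are the class and method names. The merge stage uses a priority queue to merge all runs in one pass.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

final class ExternalSortSketch {

    // Stage 1: cut the input into segments of at most runSize elements
    // and sort each segment with an internal sort, producing the initial runs.
    static int[][] makeRuns(int[] data, int runSize) {
        int count = (data.length + runSize - 1) / runSize;
        int[][] runs = new int[count][];
        for (int r = 0; r < count; r++) {
            int from = r * runSize;
            int to = Math.min(from + runSize, data.length);
            runs[r] = Arrays.copyOfRange(data, from, to);
            Arrays.sort(runs[r]);
        }
        return runs;
    }

    // Stage 2: multi-way merge. Each heap entry is {value, run index, position in run}.
    static int[] mergeRuns(int[][] runs) {
        int total = 0;
        for (int[] run : runs) {
            total += run.length;
        }
        PriorityQueue<int[]> heap =
                new PriorityQueue<>((x, y) -> Integer.compare(x[0], y[0]));
        for (int r = 0; r < runs.length; r++) {
            if (runs[r].length > 0) {
                heap.add(new int[]{runs[r][0], r, 0});
            }
        }
        int[] out = new int[total];
        int i = 0;
        while (!heap.isEmpty()) {
            int[] e = heap.poll();              // smallest head among all runs
            out[i++] = e[0];
            int next = e[2] + 1;
            if (next < runs[e[1]].length) {     // refill from the same run
                heap.add(new int[]{runs[e[1]][next], e[1], next});
            }
        }
        return out;
    }

    static int[] sort(int[] data, int runSize) {
        return mergeRuns(makeRuns(data, runSize));
    }
}
```

In a real external sort, makeRuns would write each run to a file and mergeRuns would read the runs back through buffered streams, keeping only one buffer per run in memory.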

9. Sort summary:

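9.1 Sorting methods are generally divided into the following categories: insertion sort (direct insertion sort, Shell sort), selection sort (direct selection sort, heap sort), exchange sort (bubble sort, quick sort), and merge sort.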

9.2 The sorting methods are compared in three respects: average time complexity, worst-case time complexity, and the auxiliary storage space required.

Sort method	Average time complexity	Worst-case time complexity	Auxiliary storage space
Simple sorting methods	O(n^2)	O(n^2)	O(1)
Quick sort	O(n log n)	O(n^2)	O(log n)
Heap sort	O(n log n)	O(n log n)	O(1)
Merge sort	O(n log n)	O(n log n)	O(n)

9.3 A comprehensive comparison of the sorting methods leads to the following conclusions:
        ① Simple sorting methods are generally used only when n is small (for example, n < 30). When the records are "basically ordered", direct insertion sort is the best choice. If the individual records are large, simple selection sort, which moves records the fewest times, should be used.
        ② The average time complexity of quick sort, heap sort and merge sort is O(n log n) in each case, but experiments show that quick sort has the best average time performance of these methods. Unfortunately, its worst-case time complexity is O(n^2). The worst-case time complexity of heap sort and merge sort remains O(n log n). When n is large, merge sort performs better than heap sort, but it needs the most auxiliary space.
        ③ Simple sorting methods can be combined with the better-performing ones. For example, in quick sort, direct insertion sort can be called once a subinterval becomes shorter than a certain length; or the sequence can be divided into several subsequences, each sorted by direct insertion sort, and the ordered subsequences then merged into one complete ordered sequence with merge sort.
        ④ The time complexity of radix sort can be written as O(dn), so it is best suited to sequences where n is large and the number of key digits d is small. When d is much smaller than n, its time complexity approaches O(n).
        ⑤ In terms of stability, among the simple sorting methods only simple selection sort is unstable; the others are stable. Among the methods with better time performance, however, Shell sort, quick sort and heap sort are all unstable; only merge sort and radix sort are stable.
        To sum up, each sorting method has its own characteristics and none is absolutely optimal. The appropriate method should be chosen for the situation at hand, or several methods can be combined.
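
Conclusion ④ cites O(dn) for radix sort, which the sections above do not implement. The following is a minimal sketch of a least-significant-digit radix sort, assuming base 10 and non-negative keys only; the class name is hypothetical.

```java
final class RadixSortSketch {

    // LSD radix sort in base 10. Assumption: every element is >= 0.
    static void sort(int[] array) {
        if (array.length == 0) {
            return;
        }
        int max = array[0];
        for (int v : array) {
            max = Math.max(max, v);
        }
        int[] out = new int[array.length];
        // One stable counting pass per decimal digit: d passes over n elements.
        // exp is a long so that exp *= 10 cannot overflow for large int keys.
        for (long exp = 1; max / exp > 0; exp *= 10) {
            int[] count = new int[10];
            for (int v : array) {
                count[(int) (v / exp % 10)]++;
            }
            for (int d = 1; d < 10; d++) {
                count[d] += count[d - 1];       // prefix sums: end position of each digit
            }
            for (int i = array.length - 1; i >= 0; i--) {   // backwards keeps the pass stable
                int d = (int) (array[i] / exp % 10);
                out[--count[d]] = array[i];
            }
            System.arraycopy(out, 0, array, 0, array.length);
        }
    }
}
```

Each pass costs O(n) and there is one pass per digit of the largest key, which gives the O(dn) bound; the backward scan in every pass is what makes the method stable.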