[Java data structures] Merge sort, explained in detail





1. What is merge sort?

The idea of merge sort:
Keep splitting the array in half until each subsequence has a length less than or equal to 1, then merge the sorted subsequences pairwise until a single ordered sequence remains.

Steps to implement a 2-way merge:
1. Divide the array into two segments and sort the elements of each segment.
2. Create a new array and merge the two sorted sequences into it.
3. Copy the elements from the new array back into the original array.
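
As a rough illustration of step 2, the sketch below merges two already-sorted arrays into a new one. The class name `TwoWayMergeSketch` and the method `merge` are hypothetical names chosen for this example; they are not part of the article's code, which merges two ranges of a single array instead.

```java
import java.util.Arrays;

// Hypothetical helper class, not part of the original article's code.
class TwoWayMergeSketch {

    // Merges two already-sorted arrays into a new sorted array.
    static long[] merge(long[] left, long[] right) {
        long[] merged = new long[left.length + right.length];
        int i = 0, j = 0, k = 0;
        // While both inputs have elements, copy the smaller head element.
        // Using <= takes from the left on ties, which keeps the merge stable.
        while (i < left.length && j < right.length) {
            merged[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        }
        // At most one of these loops runs: copy whatever is left over.
        while (i < left.length) merged[k++] = left[i++];
        while (j < right.length) merged[k++] = right[j++];
        return merged;
    }

    public static void main(String[] args) {
        long[] merged = merge(new long[]{1, 4, 7}, new long[]{2, 3, 9});
        System.out.println(Arrays.toString(merged)); // [1, 2, 3, 4, 7, 9]
    }
}
```

Each element is copied exactly once, so merging two sequences with n elements in total takes time proportional to n.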

2. Implementing merge sort

1. Merge Sort

The code is as follows:

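public class mergeSort {

    public static void mergeSort(long[] array) {
        // The range to sort is half-open: [0, array.length)
        mergeSortInternal(array, 0, array.length);
    }

    public static void mergeSortInternal(long[] array, int lowIndex, int highIndex) {
        int size = highIndex - lowIndex;
        if (size <= 1) {
            // Zero or one element is already sorted
            return;
        }
        // All ranges are half-open: middleIndex is excluded from the left half
        // and included in the right half. Written this way, the midpoint
        // cannot overflow int for very large indices.
        int middleIndex = lowIndex + (highIndex - lowIndex) / 2;
        mergeSortInternal(array, lowIndex, middleIndex);
        mergeSortInternal(array, middleIndex, highIndex);

        合并两个有序区间(array, lowIndex, middleIndex, highIndex);
    }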

2. Merge two ordered sequences

The code is as follows (example):

    private static void 合并两个有序区间(long[] array, int lowIndex, int middleIndex, int highIndex) {
        int size = highIndex - lowIndex;
        long[] extraArray = new long[size];
        int leftIndex = lowIndex;
        int rightIndex = middleIndex;
        int extraIndex = 0;

        // While both sequences still have elements
        while (leftIndex < middleIndex && rightIndex < highIndex) {
            if (array[leftIndex] <= array[rightIndex]) {
                // Take from the left on ties so that the sort stays stable
                extraArray[extraIndex] = array[leftIndex];
                leftIndex++;
                extraIndex++;
            } else {
                extraArray[extraIndex] = array[rightIndex];
                extraIndex++;
                rightIndex++;
            }
        }

        // One of the sequences has run out; copy the rest of the other
        if (leftIndex < middleIndex) {
            while (leftIndex < middleIndex) {
                extraArray[extraIndex++] = array[leftIndex++];
            }
        } else {
            while (rightIndex < highIndex) {
                extraArray[extraIndex++] = array[rightIndex++];
            }
        }

3. Move the merged sequence elements back

        // Copy the merged result back into the original range of array
        for (int i = 0; i < size; i++) {
            array[i + lowIndex] = extraArray[i];
        }
    }
}
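
To try the three pieces together, here is a compact, self-contained sketch that follows the same half-open range convention. The class name `MergeSortDemo` and the shortened method and variable names (`sort`, `merge`, `low`, `mid`, `high`) are assumptions made for this example and differ from the identifiers used in the article; `System.arraycopy` stands in for the copy-back loop.

```java
import java.util.Arrays;

// Hypothetical demo class; it restates the article's approach in one file.
class MergeSortDemo {

    static void mergeSort(long[] array) {
        sort(array, 0, array.length); // half-open range [0, length)
    }

    private static void sort(long[] array, int low, int high) {
        if (high - low <= 1) {
            return; // zero or one element is already sorted
        }
        int mid = low + (high - low) / 2;
        sort(array, low, mid);   // sorts [low, mid)
        sort(array, mid, high);  // sorts [mid, high)
        merge(array, low, mid, high);
    }

    private static void merge(long[] array, int low, int mid, int high) {
        long[] extra = new long[high - low];
        int left = low, right = mid, k = 0;
        // Copy the smaller head element; ties go to the left half.
        while (left < mid && right < high) {
            extra[k++] = (array[left] <= array[right]) ? array[left++] : array[right++];
        }
        while (left < mid) extra[k++] = array[left++];
        while (right < high) extra[k++] = array[right++];
        // Move the merged elements back into the original range.
        System.arraycopy(extra, 0, array, low, extra.length);
    }

    public static void main(String[] args) {
        long[] data = {9, 5, 2, 7, 3, 6, 4, 8, 1};
        mergeSort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```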

3. Performance analysis

Time complexity: O(n*log(n)) in the best, worst, and average cases
Space complexity: O(n) in all cases for the auxiliary array, plus O(log(n)) for the recursion stack
Stability: stable
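
The time bound can be motivated with a short recurrence. This is only a sketch under simplifying assumptions that are not stated in the article: n is a power of two, and merging n elements costs c*n for some constant c.

```latex
% Assumptions: n is a power of two; one merge of n elements costs c n.
\begin{aligned}
T(1) &= c \\
T(n) &= 2\,T(n/2) + c\,n \\
     &= 4\,T(n/4) + 2\,c\,n \\
     &= 2^{k}\,T(n/2^{k}) + k\,c\,n \\
     &= c\,n + c\,n\log_2 n \qquad (k = \log_2 n) \\
     &= O(n \log n)
\end{aligned}
```

The split point never depends on the values in the array, which is why the best, worst, and average cases share the same bound.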

4. Sorting massive data

Premise: only 1 GB of memory is available, but 100 GB of data needs to be sorted. Because the data cannot all fit in memory at once, external sorting (sorting with the help of external storage such as disks) is required. Merge sort is the most commonly used external sort; in this setting it takes the form of a multi-way merge.

1. Cut the data into n parts (here 200 parts of 512 MB each).
2. Sort each 512 MB part separately. Since a part fits in memory, any sorting algorithm can be used, and the work can also be shared between two machines.
3. Merge the 200 sorted parts in one 200-way pass. Load the smallest remaining number of each file into memory (in practice more than one number per file can be buffered), pick the smallest of these, and append it to the end of the final sorted file. Repeat until every file is exhausted.
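
Step 3 can be sketched in memory as follows. This is an illustration only: in-memory lists stand in for the sorted files on disk, and a `PriorityQueue` (min-heap) is one common way to pick the smallest head element, which the article does not prescribe. The names `KWayMergeSketch`, `Head`, and `mergeRuns` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: in-memory lists stand in for the sorted files on disk.
class KWayMergeSketch {

    // One heap entry: the smallest unread value of a run, plus the
    // iterator that yields the rest of that run.
    private static final class Head {
        final long value;
        final Iterator<Long> rest;

        Head(long value, Iterator<Long> rest) {
            this.value = value;
            this.rest = rest;
        }
    }

    // Merges any number of sorted runs into one sorted list.
    static List<Long> mergeRuns(List<List<Long>> sortedRuns) {
        PriorityQueue<Head> heap =
                new PriorityQueue<>((a, b) -> Long.compare(a.value, b.value));
        // Load the first (smallest) element of every non-empty run.
        for (List<Long> run : sortedRuns) {
            Iterator<Long> it = run.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<Long> output = new ArrayList<>();
        while (!heap.isEmpty()) {
            // Take the overall smallest value and append it to the output.
            Head smallest = heap.poll();
            output.add(smallest.value);
            // Refill from the run that the value came from, if it has more.
            if (smallest.rest.hasNext()) {
                heap.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<List<Long>> runs = Arrays.asList(
                Arrays.asList(1L, 5L, 9L),
                Arrays.asList(2L, 6L),
                Arrays.asList(3L, 4L, 8L));
        System.out.println(mergeRuns(runs)); // [1, 2, 3, 4, 5, 6, 8, 9]
    }
}
```

With k runs the heap never holds more than k entries, so memory use depends on the number of files rather than on the total amount of data. A real external sort would read from and write to files through buffers instead of lists.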


Summary

1. When the number of elements is small, another sorting method can be used instead (for example, insertion sort is recommended when there are fewer than 16 elements), because merge sort keeps recursing on very small ranges and the call overhead is not worth it.
2. When designing merge sort, be careful about whether each range's bounds are inclusive or exclusive when computing lengths. The example code uses half-open ranges (left-closed, right-open).
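
The first point can be sketched as a hybrid sort that hands small ranges to insertion sort. The class name `HybridMergeSortSketch` and the constant `CUTOFF` are hypothetical; the value 16 is simply the threshold mentioned above, not a tuned number.

```java
// Hypothetical sketch of a hybrid sort; CUTOFF = 16 follows the article's
// suggestion and is not a tuned value.
class HybridMergeSortSketch {

    private static final int CUTOFF = 16;

    static void sort(long[] array) {
        sort(array, 0, array.length); // half-open range [0, length)
    }

    private static void sort(long[] array, int low, int high) {
        if (high - low < CUTOFF) {
            // Small range: insertion sort avoids further recursive calls.
            insertionSort(array, low, high);
            return;
        }
        int mid = low + (high - low) / 2;
        sort(array, low, mid);
        sort(array, mid, high);
        merge(array, low, mid, high);
    }

    // Sorts the half-open range [low, high) in place.
    private static void insertionSort(long[] array, int low, int high) {
        for (int i = low + 1; i < high; i++) {
            long current = array[i];
            int j = i - 1;
            // Shift strictly larger elements right; equal ones stay put,
            // so the sort remains stable.
            while (j >= low && array[j] > current) {
                array[j + 1] = array[j];
                j--;
            }
            array[j + 1] = current;
        }
    }

    private static void merge(long[] array, int low, int mid, int high) {
        long[] extra = new long[high - low];
        int left = low, right = mid, k = 0;
        while (left < mid && right < high) {
            extra[k++] = (array[left] <= array[right]) ? array[left++] : array[right++];
        }
        while (left < mid) extra[k++] = array[left++];
        while (right < high) extra[k++] = array[right++];
        System.arraycopy(extra, 0, array, low, extra.length);
    }

    public static void main(String[] args) {
        long[] data = new long[40];
        for (int i = 0; i < data.length; i++) {
            data[i] = data.length - i; // 40, 39, ..., 1
        }
        sort(data);
        System.out.println(data[0] + " .. " + data[data.length - 1]); // 1 .. 40
    }
}
```

Whether a cutoff helps, and which value works best, depends on the data and the runtime, so it is worth measuring before adopting one.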


Origin blog.csdn.net/m0_46551861/article/details/109280878