Sorting a large array


A colleague once asked me how to sort a very large array. My first idea was a divide-and-conquer merge sort. Some of the specific details were still unclear to me, though, so I am writing this article to sum them up.

Problem

Suppose an array holds 100 million numbers and we need to sort it in ascending order. How should we go about it?

Approach

For a problem with this much data, the general approach is basically always the same:

  • First, split the data. With this much data we cannot work on the whole array directly, so we use some method to divide it into a number of relatively small arrays;
  • Next, we sort each of these smaller arrays using some method;
  • Finally, we use some method to merge all the sorted arrays into a single array.

Implementation

As the outline above shows, sorting a large array takes three steps, and each step can be carried out in different ways. Let us look in detail at the methods available for each step.

Splitting

Which method should we use to split the array? The answer depends on the later step (merging the arrays). Splitting simply means dividing this very large array into several small blocks.

What, then, is the relationship between these blocks? Are they ordered relative to one another, or unordered?

Ordered between sub-arrays

Suppose we require the blocks to be ordered relative to one another: every number in the first block is no greater than every number in the second block, and so on. Then quicksort comes to mind easily. A single partition pass around a pivot value splits the whole array into two parts, with values smaller than the pivot on the left and values larger than the pivot on the right.

Quicksort is recursive by nature. In this step, however, we only want to group the array, not sort it completely. So we choose a value that represents the block size. During the recursion, once a sub-array's length is no larger than that value, we stop partitioning it. When every sub-array has reached that size, the first step is finished and the quicksort exits early.
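Below is a minimal C++ sketch of this ordered split. It assumes the data is a std::vector<int> held in memory. The names split_ordered and Block, the middle-element pivot, and the three-way partition built from two std::partition calls are illustrative choices, not taken from the original article. The three-way split guarantees progress even when many values are equal.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// A block is a half-open index range [first, second) inside the array.
using Block = std::pair<std::size_t, std::size_t>;

// Quicksort-style partitioning that stops once a sub-array has at most
// `block_size` elements. Afterwards every value in an earlier block is <=
// every value in a later block, but each block is still unsorted inside.
// Blocks are appended to `blocks` from smallest values to largest.
void split_ordered(std::vector<int>& v, std::size_t lo, std::size_t hi,
                   std::size_t block_size, std::vector<Block>& blocks) {
    assert(block_size > 0);
    if (hi - lo <= block_size) {  // small enough: stop partitioning this part
        if (hi > lo) blocks.emplace_back(lo, hi);
        return;
    }
    const int pivot = v[lo + (hi - lo) / 2];  // hypothetical choice: middle element
    auto first = v.begin() + static_cast<std::ptrdiff_t>(lo);
    auto last = v.begin() + static_cast<std::ptrdiff_t>(hi);
    // Three-way partition: [< pivot][== pivot][> pivot].
    auto lt_end = std::partition(first, last, [pivot](int x) { return x < pivot; });
    auto eq_end = std::partition(lt_end, last, [pivot](int x) { return x == pivot; });
    const auto a = static_cast<std::size_t>(lt_end - v.begin());
    const auto b = static_cast<std::size_t>(eq_end - v.begin());
    split_ordered(v, lo, a, block_size, blocks);   // values smaller than the pivot
    blocks.emplace_back(a, b);                     // all equal to the pivot: already sorted
    split_ordered(v, b, hi, block_size, blocks);   // values larger than the pivot
}
```

After this, sorting each recorded block in place (for example with std::sort) leaves the whole array sorted. That works because every value in an earlier block is no greater than every value in a later block. Note that the recursion depth of this version depends on how well the pivots split the data.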

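Note: in my opinion the sub-arrays should not be too small (for example, fewer than 1,000 elements). Suppose the groups are produced by quicksort-style partitioning and 100 million numbers are split into groups of about 1,000. That already takes roughly 17 levels of recursion even if every partition splits its range exactly in half (log2(100,000) ≈ 16.6). When partitions are unbalanced, it takes many more levels. So the recursion depth, and with it the risk of a memory overflow, has to be considered.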

Memory overflow:

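Overflow means going past a limit. The operating system gives each process (more precisely, each thread) a maximum amount of stack space. A program that uses more than that crashes, typically with a core dump. The heap has a limit too. An allocation such as int *pi = new int[100000000]; requests about 400 MB (assuming 4-byte ints). If the heap cannot supply that much memory, new throws std::bad_alloc, which terminates the program if it is not caught.
The default stack size is usually only a few megabytes (for example, 1 MB on Windows and commonly 8 MB on Linux). On a 32-bit machine the whole address space is 4 GB, so the heap can never grow beyond that, and in practice it gets less. A process that uses more stack than its limit overflows the stack, and a process that requests more heap than is available fails to allocate.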

So why can recursion cause a stack overflow? The stack is last-in, first-out. Each function call pushes a frame, and that frame is popped only when the call returns. During recursion, the frames of all calls that have not yet returned stay on the stack, so deep recursion can easily fill it up.

In short, if recursive calls nest too deeply, frames keep being pushed until the stack is exhausted. That is a stack overflow.
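One common way to keep a quicksort-style split shallow is to recurse only into the smaller partition and handle the larger one in a loop. The sketch below shows this. Each recursive call receives less than half of its caller's range, so the depth stays below log2(n) even with bad pivots, although the running time can still degrade. The name split_ordered_bounded, the Block type and the middle-element pivot are hypothetical choices for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Block = std::pair<std::size_t, std::size_t>;  // half-open range [first, second)

// Splits v[lo, hi) into blocks of at most `block_size` elements that are ordered
// relative to each other. Only the smaller side is handled recursively; the
// larger side is processed by the next loop iteration, which bounds the stack depth.
// Blocks are collected in no particular order; sort `blocks` by their start
// index to get them in ascending value order.
void split_ordered_bounded(std::vector<int>& v, std::size_t lo, std::size_t hi,
                           std::size_t block_size, std::vector<Block>& blocks) {
    assert(block_size > 0);
    while (hi - lo > block_size) {
        const int pivot = v[lo + (hi - lo) / 2];
        auto first = v.begin() + static_cast<std::ptrdiff_t>(lo);
        auto last = v.begin() + static_cast<std::ptrdiff_t>(hi);
        auto lt_end = std::partition(first, last, [pivot](int x) { return x < pivot; });
        auto eq_end = std::partition(lt_end, last, [pivot](int x) { return x == pivot; });
        const auto a = static_cast<std::size_t>(lt_end - v.begin());
        const auto b = static_cast<std::size_t>(eq_end - v.begin());
        blocks.emplace_back(a, b);  // the run equal to the pivot is already sorted
        if (a - lo < hi - b) {      // left side is smaller: recurse on it
            split_ordered_bounded(v, lo, a, block_size, blocks);
            lo = b;                 // keep looping on the larger right side
        } else {                    // right side is smaller (or equal)
            split_ordered_bounded(v, b, hi, block_size, blocks);
            hi = a;                 // keep looping on the larger left side
        }
    }
    if (hi > lo) blocks.emplace_back(lo, hi);
}
```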

Unordered between sub-arrays

If we do not require the sub-arrays to be ordered relative to one another, we can simply pick a value for the sub-array length. We then cut the large array into many small arrays of that length.

Since no ordering between the sub-arrays is required, we can make each sub-array as small as we like (trading space for time). This makes sorting each sub-array afterwards easier.
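Below is a minimal sketch of the unordered split followed by sorting each block, assuming an in-memory std::vector<int>. The names split_by_length and sort_blocks are hypothetical helpers. Splitting needs no comparisons at all, because the blocks are just consecutive index ranges.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Block = std::pair<std::size_t, std::size_t>;  // half-open range [first, second)

// Cut [0, n) into consecutive blocks of `block_size` elements; the last block
// may be shorter. The loop runs about n / block_size times.
std::vector<Block> split_by_length(std::size_t n, std::size_t block_size) {
    assert(block_size > 0);
    std::vector<Block> blocks;
    for (std::size_t start = 0; start < n; start += block_size)
        blocks.emplace_back(start, std::min(n, start + block_size));
    return blocks;
}

// Sort each block independently; blocks remain unordered relative to each other.
void sort_blocks(std::vector<int>& v, const std::vector<Block>& blocks) {
    for (const Block& b : blocks)
        std::sort(v.begin() + static_cast<std::ptrdiff_t>(b.first),
                  v.begin() + static_cast<std::ptrdiff_t>(b.second));
}
```

For example, splitting 10 elements with a block size of 4 gives the blocks [0, 4), [4, 8) and [8, 10).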

Sorting the smaller arrays

There are many ways to sort a relatively small array; quicksort and selection sort are both worth considering.
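For small blocks, even a simple quadratic algorithm is acceptable. The sketch below sorts one block of a larger array with selection sort; the name selection_sort and its signature are hypothetical. For anything but tiny blocks, std::sort is the usual choice, since the C++ standard requires it to use O(n log n) comparisons.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Selection sort of the block a[first, last): for each position, find the
// smallest remaining element and swap it into place. A block of k elements
// costs O(k^2) comparisons, so this only makes sense for small blocks.
void selection_sort(std::vector<int>& a, std::size_t first, std::size_t last) {
    for (std::size_t i = first; i < last; ++i) {
        std::size_t min_pos = i;
        for (std::size_t j = i + 1; j < last; ++j)
            if (a[j] < a[min_pos]) min_pos = j;
        std::swap(a[i], a[min_pos]);
    }
}
```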

[Figure: time and space complexity of commonly used sorting algorithms]

Array merge

Suppose that in the first step we split the array into sub-arrays that are ordered relative to one another. Then this step only has to join the sorted blocks back together in order, and the large array is sorted.

If in the first step we chose sub-arrays that are unordered relative to one another, then in the merge step we can use an approach similar to merge sort to merge these shorter arrays.
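One way to do this in the spirit of merge sort is to merge the sorted runs two at a time, round by round, as bottom-up merge sort does. The sketch below assumes all runs fit in memory as std::vector<int>. The name merge_pairwise is hypothetical, and std::merge does each two-way merge.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Merge k sorted runs the way merge sort combines halves: each round merges
// runs 0+1, 2+3, ... with std::merge, roughly halving the number of runs,
// until a single sorted run remains.
std::vector<int> merge_pairwise(std::vector<std::vector<int>> runs) {
    if (runs.empty()) return {};
    while (runs.size() > 1) {
        std::vector<std::vector<int>> next;
        next.reserve((runs.size() + 1) / 2);
        for (std::size_t i = 0; i + 1 < runs.size(); i += 2) {
            std::vector<int> merged;
            merged.reserve(runs[i].size() + runs[i + 1].size());
            std::merge(runs[i].begin(), runs[i].end(),
                       runs[i + 1].begin(), runs[i + 1].end(),
                       std::back_inserter(merged));
            next.push_back(std::move(merged));
        }
        if (runs.size() % 2 == 1) next.push_back(std::move(runs.back()));  // odd run carries over
        runs = std::move(next);
    }
    return std::move(runs.front());
}
```

Each round copies every element once, and k runs need about log2(k) rounds. A k-way merge with a min-heap instead reads each element only once.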

Summary

Either of the methods above can sort the array. For a very large data set, however, it is usually difficult to read all of the data into memory at once. Choosing mutually ordered blocks in the first step would mean loading all of the data into memory. The recursive, quicksort-style partitioning would then increase the memory footprint of the sort even further. So in general we first split the data into small blocks that are unordered relative to one another. We then fully sort each block and finish with a multiway merge sort.
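This strategy is usually called an external sort. The sketch below illustrates it under clearly stated assumptions:

  • the input is a stream of whitespace-separated ints;
  • chunk_size stands for however many values fit in memory;
  • each sorted run is kept in a std::stringstream that stands in for a temporary file on disk.

The function names are hypothetical. The runs are combined with a min-heap that holds one value per run, which is one way to implement the multiway merge.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <ostream>
#include <queue>
#include <sstream>
#include <utility>
#include <vector>

// Phase 1: read at most `chunk_size` values at a time, sort them, and write each
// sorted chunk out as a "run". A std::stringstream stands in for a temp file here.
std::vector<std::stringstream> make_sorted_runs(std::istream& in, std::size_t chunk_size) {
    assert(chunk_size > 0);
    std::vector<std::stringstream> runs;
    std::vector<int> chunk;
    chunk.reserve(chunk_size);
    int x;
    for (;;) {
        chunk.clear();
        while (chunk.size() < chunk_size && (in >> x)) chunk.push_back(x);
        if (chunk.empty()) break;               // input exhausted
        std::sort(chunk.begin(), chunk.end());  // only one chunk in memory at a time
        runs.emplace_back();
        for (int v : chunk) runs.back() << v << ' ';
    }
    return runs;
}

// Phase 2: multiway merge. The min-heap holds one (value, run index) pair per
// run; after writing the smallest value, the next value from the same run is read.
void merge_runs(std::vector<std::stringstream>& runs, std::ostream& out) {
    using Entry = std::pair<int, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    int x;
    for (std::size_t r = 0; r < runs.size(); ++r)
        if (runs[r] >> x) heap.emplace(x, r);
    while (!heap.empty()) {
        const Entry top = heap.top();
        heap.pop();
        out << top.first << ' ';
        if (runs[top.second] >> x) heap.emplace(x, top.second);
    }
}

void external_sort(std::istream& in, std::ostream& out, std::size_t chunk_size) {
    std::vector<std::stringstream> runs = make_sorted_runs(in, chunk_size);
    merge_runs(runs, out);
}
```

Because the stringstreams keep every run in memory, this sketch does not actually save memory. Replacing them with temporary files (for example std::fstream) is what bounds memory use to about one chunk plus one value per run.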



Source: blog.csdn.net/kingmax54212008/article/details/104050404