External sorting notes summary

  Today I read about external sorting. A few points seemed important enough to write down and share here; if anything is wrong, please feel free to point it out.

 

  1. External sorting: the data in external storage is divided into several segments; each segment is read into memory, sorted, and written back out.

  2. The time for external sorting consists of: the time to sort and generate the initial runs + the time to merge the runs + the I/O time for creating the initial runs and for merging them.

  3. Replacement selection sort: build a min-heap of size M, then keep reading data from external storage and writing data out. Each time, output the element r at the top of the heap and read the next element s from external storage. If s >= r, put s at the top of the heap and re-adjust the heap; otherwise move the last element of the heap to the top, store s in the slot just vacated at the end of the heap, shrink the heap size by one, and re-adjust the heap. When the heap becomes empty, the current run ends and the set-aside elements form the heap for the next run. Thus, for any external file, each run produced has length at least M (the final run may be shorter once the input is exhausted). The average length of the generated runs is 2M.
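
  The steps above can be sketched in Python as follows. This is a minimal illustration rather than a production implementation: the function name `replacement_selection` is made up for this example, the input is an in-memory iterable standing in for the external file, and the elements set aside for the next run are kept in a separate `pending` list instead of in the tail of the heap array.

```python
import heapq

_END = object()  # sentinel marking the end of the input


def replacement_selection(data, M):
    """Split an iterable into sorted runs using a min-heap of at most M items."""
    it = iter(data)
    heap = []
    for x in it:                      # fill the heap with the first M elements
        heap.append(x)
        if len(heap) == M:
            break
    heapq.heapify(heap)

    runs, current, pending = [], [], []
    while heap:
        r = heapq.heappop(heap)       # output the smallest element of the run
        current.append(r)
        s = next(it, _END)            # read the next input element, if any
        if s is not _END:
            if s >= r:
                heapq.heappush(heap, s)   # s can still join the current run
            else:
                pending.append(s)         # s must wait for the next run
        if not heap:                  # current run is finished
            runs.append(current)
            current = []
            heap, pending = pending, []
            heapq.heapify(heap)
    return runs


print(replacement_selection([5, 1, 9, 2, 8, 3, 7], 3))
# [[1, 2, 5, 8, 9], [3, 7]]
```

  With M = 3 the first run already has length 5, and an input that is already sorted comes out as a single run.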

  4. Merge cost: with k-way merging of m runs, the merge tree has height ceil(log_k m). Increasing k or decreasing m (that is, making the runs longer) reduces this height and therefore the external I/O time. The order in which runs are merged also affects the cost; choosing it is essentially the optimal (Huffman) tree problem, so pay attention to how the Huffman construction extends to k-way merging.
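
  To make the Huffman idea concrete, here is a small sketch that computes the total number of records moved when runs are always merged k smallest first. The function name `optimal_merge_cost` is hypothetical, k >= 2 is assumed, and zero-length dummy runs are added so that every merge step takes exactly k inputs, which is the usual way of extending Huffman's construction to k-ary trees.

```python
import heapq


def optimal_merge_cost(run_lengths, k):
    """Total records moved when merging runs in k-ary Huffman order (k >= 2)."""
    heap = list(run_lengths)
    # Pad with empty dummy runs until (number of runs - 1) is a multiple of
    # (k - 1), so that every merge step combines exactly k runs.
    while (len(heap) - 1) % (k - 1) != 0:
        heap.append(0)
    heapq.heapify(heap)

    cost = 0
    while len(heap) > 1:
        merged = sum(heapq.heappop(heap) for _ in range(k))  # k shortest runs
        cost += merged                # every record in them is moved once
        heapq.heappush(heap, merged)
    return cost


print(optimal_merge_cost([2, 3, 6, 9, 12, 17, 18, 24], 3))  # 163
```

  In this example one dummy run is added, and the merge steps cost 5, 20, 47 and 91 records.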

  5. Multi-way merge tree analysis: winner trees and loser trees. Their purpose is to find the smallest "first element" among the k runs more efficiently. The leaf (external) nodes are written L[1, ..., n] and the internal nodes B[1, ..., n-1]; what B actually stores is an index into L. Looking at the internal nodes: let s be the depth of their lowest level, so the levels above it hold 2^s - 1 internal nodes and the lowest level holds n - 1 - (2^s - 1) = n - 2^s of them. The lowest level of external nodes therefore holds 2 * (n - 2^s) nodes.

  6. The relationship between L[i] and B[p], where p is the parent of external node i. Let LowExt denote the number of external nodes on the lowest level:

  If i <= LowExt, p = (i + offset) / 2 (integer division), where offset is the total number of nodes above the lowest level of external nodes

  If i > LowExt, p = (i - LowExt + n - 1) / 2 (integer division)
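
  The two cases can be checked with a short sketch. The helper name `parent_index` is made up here, and two details are assumptions not spelled out in the notes above: s is taken as floor(log2(n - 1)), and offset as 2^(s+1) - 1, which is the node count of a full binary tree covering every level above the lowest external nodes.

```python
def parent_index(i, n):
    """Index in B of the parent of external node L[i], for n >= 2 leaves."""
    s = (n - 1).bit_length() - 1     # assumed: s = floor(log2(n - 1))
    low_ext = 2 * (n - 2 ** s)       # external nodes on the lowest level
    offset = 2 ** (s + 1) - 1        # assumed: nodes above the lowest level
    if i <= low_ext:
        return (i + offset) // 2
    return (i - low_ext + n - 1) // 2


# n = 5: internal nodes are 1..4; L[1] and L[2] hang under B[4],
# L[3] under B[2], L[4] and L[5] under B[3].
print([parent_index(i, 5) for i in range(1, 6)])  # [4, 4, 2, 3, 3]
```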

  7. Rebuilding a winner tree proceeds upward: along the path to the root, compare the current node with its sibling, store the winner in the parent, then make the parent the current node. Rebuilding a loser tree does not need the sibling at all: it only compares against the parent node, leaves the loser in the parent, carries the winner in B[0], and then continues upward.

  To make this clearer, here is the code for rebuilding (replaying) the loser tree:

# L and B are defined as above; LowExt, offset and n are assumed to be globals
def Winner(L, b, c):
    # index of the smaller of L[b] and L[c]
    if L[b] < L[c]:
        return b
    return c

def Loser(L, b, c):
    # index of the larger of L[b] and L[c]
    if L[b] < L[c]:
        return c
    return b

def Replay(i):  # rebuild after the value at external node i has been updated
    # assumes i was the previous overall winner, as in a k-way merge
    if i <= LowExt:
        p = (i + offset) // 2
    else:
        p = (i - LowExt + n - 1) // 2
    B[0] = Winner(L, i, B[p])
    B[p] = Loser(L, i, B[p])
    while p // 2 >= 1:  # replay the matches along the path to the root
        temp = Winner(L, B[p // 2], B[0])
        B[p // 2] = Loser(L, B[p // 2], B[0])
        B[0] = temp
        p //= 2
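
  For a version that can be run end to end, here is a sketch of a complete k-way merge driven by a loser tree. Note that it does not use the L/B/LowExt layout described above: it uses a simpler implicit layout in which the parent of leaf j (0-based) is (j + k) // 2, and it initializes the tree with a minus-infinity sentinel leaf. The name `loser_tree_merge` is made up for this example, the runs are in-memory lists, and the values are assumed to be finite numbers.

```python
import math


def loser_tree_merge(runs):
    """Merge sorted lists of finite numbers using a loser tree."""
    k = len(runs)
    if k == 0:
        return []
    pos = [0] * k                     # next unread position in each run
    # leaves[j] is the current head of run j (inf once the run is exhausted);
    # leaves[k] is a -inf sentinel that only matters during initialization.
    leaves = [run[0] if run else math.inf for run in runs] + [-math.inf]
    tree = [k] * k                    # tree[0]: winner, tree[1..k-1]: losers

    def adjust(j):
        t = (j + k) // 2              # parent of leaf j
        while t > 0:
            if leaves[j] > leaves[tree[t]]:
                j, tree[t] = tree[t], j   # loser stays, winner moves up
            t //= 2
        tree[0] = j

    for j in range(k - 1, -1, -1):    # build the tree leaf by leaf
        adjust(j)

    out = []
    while leaves[tree[0]] != math.inf:
        w = tree[0]                   # run holding the current minimum
        out.append(leaves[w])
        pos[w] += 1
        leaves[w] = runs[w][pos[w]] if pos[w] < len(runs[w]) else math.inf
        adjust(w)                     # replay only the path from leaf w
    return out


print(loser_tree_merge([[1, 4, 7], [2, 5, 8], [3, 6, 9]]))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

  As in Replay, each update only compares against the losers stored on the path from the changed leaf to the root.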

  8. The naive way of finding the minimum among the heads of k runs costs O(k), so producing an output sequence of length n costs O(kn) overall. With a winner tree or loser tree, initialization costs O(k) and each update of an external node costs O(log k), so producing a sequence of length n costs O(k + n log k) overall.

  9. External sorting does not have to use a loser tree or winner tree; a heap works as well. This is left for the reader to think about.
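
  One possible answer to that exercise is sketched below, using Python's standard heapq module. The function name `heap_merge` is hypothetical and the runs are in-memory lists standing in for files; each heap entry is a tuple (value, run index, position in run).

```python
import heapq


def heap_merge(runs):
    """Merge sorted lists using a min-heap holding one head per run."""
    heap = [(run[0], j, 0) for j, run in enumerate(runs) if run]
    heapq.heapify(heap)               # O(k) initialization
    out = []
    while heap:
        value, j, pos = heap[0]       # smallest head among all runs
        out.append(value)
        if pos + 1 < len(runs[j]):
            # replace the top with the next element of the same run: O(log k)
            heapq.heapreplace(heap, (runs[j][pos + 1], j, pos + 1))
        else:
            heapq.heappop(heap)       # run j is exhausted
    return out


print(heap_merge([[1, 4, 7], [2, 5, 8], [3, 6, 9]]))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

  The overall cost has the same O(k + n log k) form as with the tournament trees. The standard library also provides heapq.merge, which performs this kind of merge lazily.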

Origin www.cnblogs.com/zyna/p/12079193.html