Sorting - External sorting

First, the idea behind the algorithm

When the records to be sorted do not all fit in memory, so that part of them is in memory and part is on external storage, the sort is called an external sort. External sorting can be summarized in one sentence: memory is used as a workspace to help sort data that lives on external storage. The unit being sorted is called a "record" or a "page".

First, the meaning of the symbols used below:

  • n: total number of records
  • m: total number of record segments
  • p: number of groups the record segments are divided into
  • k: number of record segments in each group
  • l: number of records in each record segment

(1) Generating the initial merge segments

The file containing n records is read into memory piece by piece and, using a given internal sorting algorithm or the replacement-selection sorting algorithm, is divided into m smaller ordered record segments.

(2) Multi-way merge

These m ordered segments are divided into groups of k segments each. The initial segments are then merged in multiple passes, so that the ordered segments grow longer with each pass, until a single ordered segment holding the entire file is formed on external storage. At that point the external sort of the file is complete.
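The two phases can be sketched in Python roughly as follows. This is a minimal illustration, not a real external sort: plain lists stand in for files on external storage, the function names `make_runs`, `merge_pass` and `external_sort` are hypothetical, and the standard-library `heapq.merge` is assumed for the k-way merge of one group.

```python
import heapq

def make_runs(records, w):
    """Phase 1: cut the input into chunks of at most w records and sort each one."""
    return [sorted(records[i:i + w]) for i in range(0, len(records), w)]

def merge_pass(runs, k):
    """One merge pass: merge every group of k ordered segments into one segment."""
    return [list(heapq.merge(*runs[i:i + k])) for i in range(0, len(runs), k)]

def external_sort(records, w, k):
    """Return (sorted records, number of merge passes)."""
    if w < 1 or k < 2:
        raise ValueError("need w >= 1 and k >= 2")
    runs = make_runs(records, w)
    passes = 0
    while len(runs) > 1:          # phase 2: repeat until one segment remains
        runs = merge_pass(runs, k)
        passes += 1
    return (runs[0] if runs else []), passes
```

With ten made-up records, w = 2 and k = 2, the sketch starts from 5 initial segments and needs 3 passes (5, then 3, then 2, then 1 segment).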



Second, the algorithm process


(1) Formation of 5 initial merge segments


(2) Multi-way merge

  1. Because the memory size is w = 2, only 2 records can be read into memory at a time, so the 5 ordered initial merge segments are grouped 2 per group, giving p = ⌈m / k⌉ = ⌈5 / 2⌉ = 3 groups.


  2. Take one group and read the first record of each of its segments (5, 3) into memory, then select the smaller value (3) to write back to external storage. Because the merge segments are ordered (increasing), the smaller value obtained is the smallest record currently in the group.


  3. Write the minimum selected in the previous step back to external storage, and read the next record of the segment it came from into memory to fill the vacated position.

  4. Repeat step 3 until one longer ordered segment has been written to external storage.


  5. Repeat steps 2, 3 and 4 until the final result is obtained.
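Merging a single group of two segments while holding only one record from each in memory might look like the Python sketch below. It is only an illustration: `merge_group` is a hypothetical name, the lists stand in for segments on external storage, and apart from the first records 5 and 3 the values in the usage note are made up.

```python
def merge_group(seg_a, seg_b):
    """Merge two ordered segments using a two-record memory buffer."""
    DONE = object()                      # marks an exhausted segment
    a, b = iter(seg_a), iter(seg_b)
    # Read the first record of each segment into memory.
    slot_a, slot_b = next(a, DONE), next(b, DONE)
    out = []                             # stands in for the output segment on external storage
    while slot_a is not DONE or slot_b is not DONE:
        take_a = slot_b is DONE or (slot_a is not DONE and slot_a <= slot_b)
        if take_a:
            out.append(slot_a)           # write the smaller record back
            slot_a = next(a, DONE)       # refill the slot from the same segment
        else:
            out.append(slot_b)
            slot_b = next(b, DONE)
    return out
```

For example, `merge_group([5, 8], [3, 9])` returns `[3, 5, 8, 9]`: 3 is written first, then 9 is read in to take its place, and so on.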


PS: the corresponding merge tree is as follows
[Figure: merge tree for the example above]


Third, important sub-algorithms

(1) Replacement-selection algorithm (reducing the number of initial merge segments)


When an internal sorting algorithm is used to build the initial segments, each segment has to be loaded into memory in full and sorted there, so the size of a segment cannot exceed the memory size w. With segments capped at w records, a large file yields many initial merge segments, and too many initial segments hurts the efficiency of external sorting. The "replacement-selection sorting algorithm" is therefore used to improve on this.
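A minimal Python sketch of replacement-selection is given below. Two things in it are assumptions rather than part of the original post: the name `replacement_selection` is hypothetical, and a binary heap (`heapq`) is used for the selection step, where textbooks often use a loser tree instead. Each workspace entry is tagged with the run it belongs to, so a record that is smaller than the one just written out waits for the next run.

```python
import heapq
from itertools import islice

def replacement_selection(records, w):
    """Split records into ordered runs using a workspace of w records."""
    DONE = object()
    it = iter(records)
    # Fill the workspace; entries are (run number, key).
    heap = [(0, x) for x in islice(it, w)]
    heapq.heapify(heap)
    runs = []
    while heap:
        run_no, key = heapq.heappop(heap)    # smallest record of the current run
        if run_no == len(runs):              # first record of a new run
            runs.append([])
        runs[-1].append(key)
        nxt = next(it, DONE)
        if nxt is not DONE:
            # A record smaller than the one just output must wait for the next run.
            heapq.heappush(heap, (run_no if nxt >= key else run_no + 1, nxt))
    return runs
```

With the made-up input `[17, 21, 5, 44, 10, 12, 56, 32, 29]` and w = 3, this produces two runs, `[5, 17, 21, 44, 56]` and `[10, 12, 29, 32]`, both longer than w, whereas sorting memory-sized chunks would produce three runs of 3 records.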

(2) Optimal merge tree (reducing the number of merges)


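After "replacement-selection" sorting, the initial records form initial merge segments of unequal length, which then have to be merged. Merging the current p groups of record segments (k segments per group) into p ordered segments stored on external storage is called one merge pass. Every record therefore takes two I/O operations in each pass (one read and one write). To minimize the weighted path length of the merge tree, and with it the number of times records are merged, the "optimal merge tree" is used.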
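One common way to build such a tree is the k-ary Huffman construction: repeatedly merge the k shortest segments, after padding with zero-length dummy segments so that every merge is a full k-way merge. The Python sketch below computes the resulting weighted path length under that assumption; `optimal_merge_cost` is a hypothetical name and the segment lengths in the usage note are made up.

```python
import heapq

def optimal_merge_cost(lengths, k):
    """Weighted path length of a k-ary Huffman merge tree over the segment lengths."""
    if k < 2:
        raise ValueError("need k >= 2")
    heap = list(lengths)
    # Pad with zero-length dummy segments so that (count - 1) is a multiple of (k - 1).
    remainder = (len(heap) - 1) % (k - 1)
    if remainder:
        heap.extend([0] * (k - 1 - remainder))
    heapq.heapify(heap)
    cost = 0
    while len(heap) > 1:
        merged = sum(heapq.heappop(heap) for _ in range(k))  # merge the k shortest segments
        cost += merged            # every record in them is read and written once more
        heapq.heappush(heap, merged)
    return cost
```

For segment lengths `[9, 30, 12, 18, 3, 17, 2, 6, 24]` and k = 3 the function returns 223, so the merge phase takes 2 × 223 = 446 record reads and writes.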

(3) Loser tree (reducing the number of comparisons)


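During a multi-way merge over K runs there are K candidate values at each step, and the minimum has to be found among them. The ordinary approach takes K-1 comparisons, whereas a "loser tree" needs only O(log K) comparisons. The principle is the same as in an ordinary group-stage tournament: once a contestant has advanced from their group, they only need to play the contestants who advanced from the other groups to decide the champion (the extreme value), rather than playing every other contestant.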
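A compact Python sketch of a loser-tree merge is shown below. It follows the usual textbook layout (`tree[0]` holds the current winner, `tree[1..k-1]` hold the loser at each internal node) and makes a few assumptions of its own: the name `loser_tree_merge` is hypothetical, the keys are numbers, and `math.inf` is reserved to mark an exhausted run, so it must not appear in the data.

```python
import math

def loser_tree_merge(runs):
    """Merge ordered lists of numbers using a loser tree."""
    k = len(runs)
    if k == 0:
        return []
    iters = [iter(r) for r in runs]
    # keys[i] is the current head of run i; keys[k] is a -inf sentinel used while building.
    keys = [next(it, math.inf) for it in iters] + [-math.inf]
    tree = [k] * k            # every node starts out pointing at the sentinel

    def adjust(s):
        """Replay the matches from leaf s up to the root."""
        t = (s + k) // 2      # parent of leaf s
        while t > 0:
            if keys[s] > keys[tree[t]]:
                s, tree[t] = tree[t], s   # s loses and stays here; the old loser moves up
            t //= 2
        tree[0] = s           # overall winner

    for i in range(k - 1, -1, -1):
        adjust(i)

    out = []
    while keys[tree[0]] != math.inf:
        w = tree[0]
        out.append(keys[w])
        keys[w] = next(iters[w], math.inf)   # refill from the winner's run
        adjust(w)                            # about log2(k) comparisons
    return out
```

After each output only the path from the winner's leaf to the root is replayed, which is where the O(log K) comparisons per record come from.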


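Fourth, algorithm performance analysis

(1) Time complexity

Because a full time-complexity analysis of external sorting is fairly involved, only some of its details are analyzed here.

  1. In replacement-selection sorting, the time for the step that selects the extreme value depends on the selection algorithm used (in an exam, the one the question specifies).
  2. In replacement-selection sorting, every record takes two I/O operations.
  3. When m initial merge segments are merged k ways, the number of merge passes is $\lceil \log_k m \rceil$.
  4. In each merge pass, every record takes two I/O operations.
  5. The loser tree for a k-way merge has height $\lceil \log_2 k \rceil + 1$, so selecting the extreme value among k records with it takes $\lceil \log_2 k \rceil$ comparisons, i.e. the time complexity is $O(\log_2 k)$.
  6. Building the loser tree for a k-way merge takes $O(k \log_2 k)$ time.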
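Items 2 to 4 can be turned into a small Python calculation. The sketch below assumes I/O is counted per record, as in the list above; `merge_passes` and `total_io` are hypothetical helper names. Integer arithmetic is used for the pass count to avoid floating-point error in the logarithm.

```python
def merge_passes(m, k):
    """Number of k-way merge passes needed for m initial segments: ceil(log_k m)."""
    if k < 2:
        raise ValueError("need k >= 2")
    passes = 0
    while m > 1:
        m = -(-m // k)        # ceil(m / k) segments remain after one pass
        passes += 1
    return passes

def total_io(n, m, k):
    """Record reads plus writes: 2n to build the segments, then 2n per merge pass."""
    return 2 * n * (1 + merge_passes(m, k))
```

For m = 5 and k = 2, as in the worked example, `merge_passes(5, 2)` gives 3.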

(2) Space complexity

Every step uses only a constant amount of space, so the space complexity is $O(1)$.

Origin blog.csdn.net/starter_____/article/details/94436782