Some details of quicksort

This is not a quicksort tutorial. It records some details that deserve attention.

The idea behind quicksort is simple, and anyone who has seen it should find it easy to remember. Choose a pivot by some strategy, then move the pivot to its correct position by some strategy, so that every element before it is smaller and every element after it is larger (elements equal to the pivot may go on either side). Finally, sort the front part and the back part separately. The outline is as follows:

void quicksort(int[] a, int lo, int hi) {
    if (lo >= hi) return;            // 0 or 1 element: already sorted
    int p = partition(a, lo, hi);    // a[p] is now in its final place
    quicksort(a, lo, p - 1);
    quicksort(a, p + 1, hi);
}

So the key is the partition function. If it is handled poorly, quicksort degenerates to O(N^2) time. The best case is O(N log N): each round of partitioning has to traverse all N elements, and there are at least log N rounds. That minimum is reached when every partition splits its range in half.
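A rough way to see both bounds is to write out the recurrences. This assumes one partitioning pass over N elements costs about N steps and T(1) = 0. The exact comparison count is slightly lower, but that does not change the growth rates:

```latex
\begin{aligned}
\text{best (even halves, } N = 2^k\text{):}\quad & T(N) = 2\,T(N/2) + N \;\Rightarrow\; T(N) = N \log_2 N \\
\text{worst (one side empty):}\quad & T(N) = T(N-1) + N \;\Rightarrow\; T(N) = \frac{N(N+1)}{2} - 1 = O(N^2)
\end{aligned}
```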

There are two common partitioning strategies: the single-pointer method and the two-pointer method. The single-pointer method scans from left to right and extends the boundary of the region of elements smaller than the pivot. The code is as follows:

int partition(int[] a, int lo, int hi) {
    int i = lo - 1, j = lo;
    int t, pivot = a[hi];
    while (j < hi) {
        if (a[j] < pivot) {
            i++; // border of smaller elements
            t = a[j];
            a[j] = a[i];
            a[i] = t;
        }
        j++;
    }
    // i + 1 is the pivot's final position
    t = a[hi];
    a[hi] = a[i + 1];
    a[i + 1] = t;
    return i + 1;
}

Here i marks the right boundary of the region of elements smaller than the pivot. Many references explain this in detail, and it is not the focus of this article.
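To make the role of i concrete, here is a small traced run. It is a sketch: the sample array is my own choice, and the class name `LomutoTrace` and the `swap` helper are hypothetical names. The partition logic is the single-pointer method above. The program prints the array each time the smaller-than-pivot region a[lo..i] grows:

```java
import java.util.Arrays;

class LomutoTrace {
    // Single-pointer partition (pivot = a[hi]) that prints each time a[lo..i] grows.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo - 1;                       // invariant: a[lo..i] < pivot
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                i++;
                swap(a, i, j);
                System.out.println("j=" + j + " i=" + i + " " + Arrays.toString(a));
            }
        }
        swap(a, i + 1, hi);                   // pivot goes right after the smaller region
        return i + 1;
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        int[] a = {3, 8, 2, 5, 1, 4};
        int p = partition(a, 0, a.length - 1);
        System.out.println("p=" + p + " " + Arrays.toString(a));
    }
}
```

With pivot 4, running `main` should print:

j=0 i=0 [3, 8, 2, 5, 1, 4]
j=2 i=1 [3, 2, 8, 5, 1, 4]
j=4 i=2 [3, 2, 1, 5, 8, 4]
p=3 [3, 2, 1, 4, 8, 5]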

Next, the two-pointer method:

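int partition(int[] a, int lo, int hi) {
    int i = lo - 1, j = hi;
    int t, v = a[hi];              // v: pivot value, kept at a[hi]
    while (true) {
        while (a[++i] < v)         // scan right for an element >= v
            if (i == hi)
                break;
        while (v < a[--j])         // scan left for an element <= v
            if (j == lo)
                break;
        if (i >= j)                // pointers have crossed: done
            break;
        t = a[i];                  // swap the out-of-place pair
        a[i] = a[j];
        a[j] = t;
    }
    t = a[i];                      // move the pivot to position i
    a[i] = a[hi];
    a[hi] = t;
    return i;
}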

Note that the index choices here are largely arbitrary. Whether you start at +1, -1, or the index itself does not matter, as long as the boundaries stay consistent. That is why code found online often picks its boundaries in odd ways, and some of those choices are not very good. The two versions above are reasonable ones. Even so, there are many issues worth noting.

Problems generally occur at boundaries or in special cases, so first consider what the special cases are:

- input that is already sorted;
- input in fully reverse order;
- input whose elements are all equal. This is really the intersection of the first two cases, since such an array is both sorted and reverse-sorted.

With these cases in mind, let us look at specific issues.

First, the choice of pivot. There are generally three strategies:

  1. Select the first or last element

    When the input is random, this choice causes no problems and is quite effective. For any of the three special cases above, however, the pivot is always the minimum or maximum of the subarray, so all other elements end up on one side. The recursion then degrades from about log N levels to N levels, and the time complexity becomes O(N^2).

  2. Select an element at random

    This is better than the first strategy, but it can still happen to pick the minimum or maximum, which makes the algorithm degrade.

  3. Median of three: sample three elements and use the middle value as the pivot

    This is better than both strategies above. The three sampled elements fall on both sides of the pivot, so neither side ends up empty. It has other benefits as well, discussed below.
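Strategy 2 can be sketched with only a small change to the single-pointer method. The sketch swaps a randomly chosen element into a[hi] and then partitions as before. The class and method names are hypothetical. `ThreadLocalRandom.current().nextInt(origin, bound)` is a standard JDK call whose upper bound is exclusive, hence the `hi + 1`:

```java
import java.util.concurrent.ThreadLocalRandom;

class RandomPivotQuickSort {
    static void quicksort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int p = randomizedPartition(a, lo, hi);
        quicksort(a, lo, p - 1);
        quicksort(a, p + 1, hi);
    }

    // Strategy 2: move a randomly chosen element to a[hi], then run the
    // single-pointer partition, which always uses a[hi] as the pivot.
    static int randomizedPartition(int[] a, int lo, int hi) {
        int r = ThreadLocalRandom.current().nextInt(lo, hi + 1); // lo <= r <= hi
        swap(a, r, hi);
        return partition(a, lo, hi);
    }

    // Single-pointer partition, as in the text.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo - 1;                      // a[lo..i] < pivot
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                i++;
                swap(a, i, j);
            }
        }
        swap(a, i + 1, hi);
        return i + 1;
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

This removes the dependence on sorted or reverse-sorted input. It does nothing for the all-equal case with this particular partition, though, because every possible pivot then equals every other element.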

Next, look at the first partition algorithm above, specifically this section:

while (j < hi) {
    if (a[j] < pivot) {
        i++; // border of smaller elements
        t = a[j];
        a[j] = a[i];
        a[i] = t;
    }
    j++;
}

This is basically fine. The problem lies in the comparison on the second line. If all the data are equal, the condition is never true, so all elements end up on one side. Changing it to <= does not help. When all elements are equal, the condition is then always true, which again puts every element on one side and produces the worst-case time complexity.
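The one-sided split is easy to observe directly. Here is a sketch, assuming a hypothetical `orEqual` flag that switches the comparison between `<` and `<=`. Both variants return an end index on an array of eight equal elements:

```java
class EqualKeysLomuto {
    // Single-pointer partition (pivot = a[hi]). orEqual switches the test
    // from a[j] < pivot to a[j] <= pivot. Returns the pivot's final index.
    static int partition(int[] a, int lo, int hi, boolean orEqual) {
        int pivot = a[hi];
        int i = lo - 1;
        for (int j = lo; j < hi; j++) {
            boolean smallSide = orEqual ? a[j] <= pivot : a[j] < pivot;
            if (smallSide) {
                i++;
                swap(a, i, j);
            }
        }
        swap(a, i + 1, hi);
        return i + 1;
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        int[] a = new int[8];
        java.util.Arrays.fill(a, 7);                    // eight equal elements, indices 0..7
        System.out.println(partition(a, 0, 7, false)); // 0: the other seven go right
        System.out.println(partition(a, 0, 7, true));  // 7: the other seven go left
    }
}
```

Either way, one side receives all N - 1 remaining elements, which is exactly the T(N) = T(N - 1) + N pattern.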

Finally, look at the second partition algorithm. It selects a[hi] as the pivot. That is basically fine, although sorted or reverse-sorted input still leads to the worst-case complexity. The pleasant surprise is that it handles the all-equal case well, thanks to its two inner loops and their comparisons:

while (a[++i] < v)
    if(i == hi)
        break;
while (v < a[--j])
    if (j == lo)
        break;

If all elements are equal, i and j advance alternately and meet in the middle instead of everything going to one side. This works because the strict < and > comparisons make each scan stop at any element equal to the pivot.
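Here is a sketch that runs the two-pointer partition above on equal elements. The class name `EqualKeysTwoPointer` and the `swap` helper are hypothetical. With eight copies of the same value, the scans stop and swap at (0, 6), (1, 5), (2, 4), then cross at index 3:

```java
class EqualKeysTwoPointer {
    // Two-pointer partition from the text, pivot v = a[hi].
    static int partition(int[] a, int lo, int hi) {
        int i = lo - 1, j = hi;
        int v = a[hi];
        while (true) {
            while (a[++i] < v)           // strict <: stops on elements equal to v
                if (i == hi) break;
            while (v < a[--j])           // strict <: stops on elements equal to v
                if (j == lo) break;
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, i, hi);
        return i;
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        int[] a = new int[8];
        java.util.Arrays.fill(a, 7);
        System.out.println(partition(a, 0, 7)); // 3: three elements left, four right
    }
}
```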

In addition, the two breaks guard against running past the ends of the array, but the first one is redundant. a[hi] was chosen as the pivot and the comparison is a strict <. Since no element is less than itself, i can never go beyond hi. In effect, a[hi] acts as a sentinel that prevents out-of-bounds access. The second check is still needed, because nothing at the left end is guaranteed to stop j.

To use a sentinel for the bounds check on j as well, we can apply the median-of-three strategy mentioned above, taking the middle of three elements as the pivot. The pivot-selection function can be sketched as follows:

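int choosePivotValue(int[] a, int lo, int hi) {
    int mid = lo + (hi - lo) / 2;
    if (a[mid] < a[lo]) swap(a, lo, mid); // order the three samples so that
    if (a[hi] < a[lo]) swap(a, lo, hi);   // a[lo] <= a[mid] <= a[hi]
    if (a[hi] < a[mid]) swap(a, mid, hi);
    swap(a, mid, hi - 1);                 // park the median at hi - 1
    return a[hi - 1];
}
void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }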

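The pivot value chosen here is a[hi - 1]. Note that a[lo] <= pivot <= a[hi] now holds. As a result, a[lo] and a[hi] are already on the correct sides and can be left out of the scan. In the code below, a[lo] and the pivot at a[hi - 1] also act as sentinels.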

The partition function above then becomes:

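int partition(int[] a, int lo, int hi) {
    if (hi - lo < 2) {                  // fewer than 3 elements: with 2, --j below
        if (a[hi] < a[lo]) {            // would step past lo, so just order them
            int tmp = a[lo];
            a[lo] = a[hi];
            a[hi] = tmp;
        }
        return lo;
    }
    int i = lo, j = hi - 1; // changed
    int t;
    int pivot = choosePivotValue(a, lo, hi); // pivot now sits at a[hi - 1]
    while (true) {
        while (a[++i] < pivot)          // stops at a[hi - 1] at the latest
            ;
        while (pivot < a[--j])          // stops at a[lo] at the latest
            ;
        if (i >= j)
            break;
        t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
    t = a[i];
    a[i] = a[hi - 1];
    a[hi - 1] = t;
    return i;
}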

There are a few small changes here. After the selection function above runs, a[lo] <= pivot and a[hi] >= pivot must hold, so the indices can start from lo and hi - 1. Note the ++i and --j: they skip a[lo] at the front and a[hi - 1] and a[hi] at the back. This version is correct and more concise, but its boundary handling is easy to get wrong.

However the pivot is chosen, it should eventually be moved to one end of the array so that the loop can ignore it. Otherwise, the problem becomes very complicated. The two-pointer method above puts the pivot at the right end, so when the loop finishes the pivot is swapped with a[i]. If it were placed at the left end, it would have to be swapped with a[j] instead, which is easy to get wrong.

The reason is as follows. Swapping with the right end moves an element into the right part, so that element must be large, and i is the pointer that stops at a large element (one >= pivot). Swapping with the left end moves an element into the left part, so that element must be small, and j is the pointer that stops at a small element (one <= pivot).
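For comparison, here is a sketch of the mirror-image arrangement, with the pivot kept at the left end. It follows the well-known layout from Sedgewick and Wayne's "Algorithms" rather than the author's code, and the class name is hypothetical. Note that the final swap must use j, exactly as the paragraph above says:

```java
class LeftPivotQuickSort {
    static void quicksort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int p = partition(a, lo, hi);
        quicksort(a, lo, p - 1);
        quicksort(a, p + 1, hi);
    }

    // Pivot kept at the LEFT end (v = a[lo]); the final swap must use j.
    static int partition(int[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        int v = a[lo];
        while (true) {
            while (a[++i] < v)          // i stops at an element >= v
                if (i == hi) break;     // no sentinel on the right, so this is needed
            while (v < a[--j])          // j stops at an element <= v
                if (j == lo) break;     // redundant: a[lo] == v acts as a sentinel
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, lo, j);                 // a[j] <= v, so it may move to the left end
        return j;
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

Here the roles of the two checks are swapped: the pivot at a[lo] stops j, so it is i that needs the explicit bounds check.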

Another point to note: some people write the inner loops of the code above as:

while (a[i] < pivot) i++;
while (a[j] > pivot) j--;

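Consider what happens when all elements are equal. This becomes an infinite loop: i and j never move, because each inner loop stops at once and the swap changes nothing. The same thing happens whenever a[i] and a[j] both equal the pivot. Putting ++ and -- inside the condition, as in a[++i] and a[--j], is correct.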

In addition, although quicksort is fast, insertion sort is just as good or better on small arrays, so the two can be combined. When a subarray is smaller than some threshold, switch to insertion sort, which gives better results. To avoid the call-stack overhead of recursion, quicksort can also be written non-recursively.
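Both ideas can be sketched together. Note the assumptions here. The threshold `CUTOFF = 10` is a hypothetical value, not a tuned one. The explicit stack of (lo, hi) ranges is one common way to remove recursion, not the author's code. The partition is the text's two-pointer version with pivot a[hi], so sorted input is still quadratic in time unless it is combined with median of three:

```java
import java.util.ArrayDeque;
import java.util.Deque;

class HybridQuickSort {
    // Hypothetical threshold; the best value depends on the machine and the data.
    static final int CUTOFF = 10;

    // Non-recursive quicksort: an explicit stack of (lo, hi) ranges replaces the call stack.
    static void sort(int[] a) {
        Deque<int[]> stack = new ArrayDeque<>();
        stack.push(new int[] {0, a.length - 1});
        while (!stack.isEmpty()) {
            int[] range = stack.pop();
            int lo = range[0], hi = range[1];
            if (hi - lo + 1 <= CUTOFF) {      // small (or empty) range
                insertionSort(a, lo, hi);
                continue;
            }
            int p = partition(a, lo, hi);
            // Push the larger side first so the smaller side is popped next;
            // this keeps the stack depth logarithmic in the array length.
            if (p - lo > hi - p) {
                stack.push(new int[] {lo, p - 1});
                stack.push(new int[] {p + 1, hi});
            } else {
                stack.push(new int[] {p + 1, hi});
                stack.push(new int[] {lo, p - 1});
            }
        }
    }

    static void insertionSort(int[] a, int lo, int hi) {
        for (int k = lo + 1; k <= hi; k++) {
            int x = a[k];
            int m = k - 1;
            while (m >= lo && a[m] > x) {     // shift larger elements one step right
                a[m + 1] = a[m];
                m--;
            }
            a[m + 1] = x;
        }
    }

    // Two-pointer partition from the text (pivot v = a[hi]); requires hi > lo.
    static int partition(int[] a, int lo, int hi) {
        int i = lo - 1, j = hi;
        int v = a[hi];
        while (true) {
            while (a[++i] < v)
                if (i == hi) break;
            while (v < a[--j])
                if (j == lo) break;
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, i, hi);
        return i;
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```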

So the conclusion is that every character in this code counts, and no symbol can be written carelessly.

I wrote this because I found many details I had not paid attention to before. Understanding an algorithm should not stop at knowing roughly what it does.


Origin www.cnblogs.com/yhxcs/p/12070159.html