Python data structures in practice 5: Sorting

Before looking at specific algorithms, let's first consider the two operations by which sorting algorithms are measured:

  1. Comparisons: the time spent comparing the magnitudes of two numbers.
  2. Swaps: the time spent moving a number into its proper place once it is found to be out of place.

Bubble Sort

This sorting algorithm comes up often in interviews. Although it is simple, writing it without a single mistake is not.

Bubbling, as the name suggests, pops up one bubble per pass, and that bubble is the largest of the remaining numbers. So if there are n numbers to sort, (n-1) passes are needed; that is, the outer loop runs len(a_list)-1 times.

Each bubbling pass is the inner loop. Starting from the first pair, compare the two numbers and swap the larger one toward the upper end (the right, where the bubble rises), then move on to the next pair, advancing one position each time. The number of inner-loop iterations decreases as the outer loop advances, because each outer pass leaves one more number in its sorted position.

The bubbling process is as follows:

def bubble_sort(a_list):
    for pass_num in range(len(a_list)-1,0,-1):
        for i in range(pass_num):
            if a_list[i] > a_list[i+1]:
                a_list[i],a_list[i+1] = a_list[i+1],a_list[i]
a_list = [54, 26, 93, 17, 77, 31, 44, 55, 20]
bubble_sort(a_list)
print(a_list)
[17, 20, 26, 31, 44, 54, 55, 77, 93]

This kind of bubble sort is very time-consuming. The number of comparisons needed in each outer pass is shown in the following figure:

Therefore, a total of \(\frac{1}{2}n^{2}-\frac{1}{2}n\) comparisons are required, so the time complexity is \(O(n^{2})\). In the worst case, every comparison is followed by a swap, roughly doubling the total time.
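To sanity-check that count, a minimal instrumented bubble sort (an illustrative helper, not part of the original post) can tally comparisons and confirm the \(\frac{1}{2}n^{2}-\frac{1}{2}n\) formula:

```python
def bubble_sort_counting(a_list):
    """Bubble sort that also counts element comparisons."""
    comparisons = 0
    for pass_num in range(len(a_list) - 1, 0, -1):
        for i in range(pass_num):
            comparisons += 1
            if a_list[i] > a_list[i + 1]:
                a_list[i], a_list[i + 1] = a_list[i + 1], a_list[i]
    return comparisons

a_list = [54, 26, 93, 17, 77, 31, 44, 55, 20]
n = len(a_list)
count = bubble_sort_counting(a_list)
print(count, n * (n - 1) // 2)  # both are 36 for n = 9
```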

In fact, bubble sort often does not need to complete all the outer passes to sort every number, but the naive version keeps executing them anyway, wasting time. We can therefore improve bubble sort so that it stops as soon as all the data is already in order, optimizing the running time:

def short_bubble_sort(a_list):
    exchanges = True   # records whether any swap happened during a pass
    pass_num = len(a_list) - 1
    while pass_num > 0 and exchanges:
        exchanges = False
        for i in range(pass_num):
            if a_list[i] > a_list[i + 1]:
                exchanges = True
                a_list[i], a_list[i + 1] = a_list[i + 1], a_list[i]
        pass_num -= 1
a_list=[20, 30, 40, 90, 50, 60, 70, 80, 100, 110]
short_bubble_sort(a_list)
print(a_list)
[20, 30, 40, 50, 60, 70, 80, 90, 100, 110]
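To see the early exit in action, a hypothetical instrumented variant (not from the original post) can count how many passes actually run on this nearly sorted list:

```python
def short_bubble_sort_passes(a_list):
    """Short bubble sort that also reports the number of passes performed."""
    exchanges = True
    pass_num = len(a_list) - 1
    passes = 0
    while pass_num > 0 and exchanges:
        exchanges = False
        passes += 1
        for i in range(pass_num):
            if a_list[i] > a_list[i + 1]:
                exchanges = True
                a_list[i], a_list[i + 1] = a_list[i + 1], a_list[i]
        pass_num -= 1
    return passes

a_list = [20, 30, 40, 90, 50, 60, 70, 80, 100, 110]
passes = short_bubble_sort_passes(a_list)
print(passes)  # 2 passes instead of the 9 a plain bubble sort would run
```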

Selection Sort

In fact, each outer pass of selection sort produces a result very similar to bubble sort: each time, the largest of the unsorted elements is found and swapped into the position where it belongs. That is, each outer pass puts one more element in place.

The time complexity of selection sort is still \(O(n^{2})\), but in practice it takes less time than bubble sort because it performs far fewer element swaps.

def selection_sort(a_list):
    for fill_slot in range(len(a_list)-1, 0, -1):
        # fill_slot is the position where this pass's largest element will go
        pos_of_max = 0
        # position of the largest element found so far in this pass
        for location in range(1, fill_slot+1):
            if a_list[location] > a_list[pos_of_max]:
                pos_of_max = location
        a_list[fill_slot], a_list[pos_of_max] = a_list[pos_of_max], a_list[fill_slot]
a_list = [54, 26, 93, 17, 77, 31, 44, 55, 20]
selection_sort(a_list)
print(a_list)   
[17, 20, 26, 31, 44, 54, 55, 77, 93]
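To make the "fewer swaps" claim concrete, the sketch below (an illustrative comparison, not from the original post) counts swaps performed by each algorithm on the same input; bubble sort performs one swap per inversion, while this selection sort performs exactly one swap per outer pass:

```python
def bubble_swaps(a_list):
    """Return the number of swaps bubble sort performs."""
    swaps = 0
    for pass_num in range(len(a_list) - 1, 0, -1):
        for i in range(pass_num):
            if a_list[i] > a_list[i + 1]:
                a_list[i], a_list[i + 1] = a_list[i + 1], a_list[i]
                swaps += 1
    return swaps

def selection_swaps(a_list):
    """Return the number of swaps selection sort performs."""
    swaps = 0
    for fill_slot in range(len(a_list) - 1, 0, -1):
        pos_of_max = 0
        for location in range(1, fill_slot + 1):
            if a_list[location] > a_list[pos_of_max]:
                pos_of_max = location
        a_list[fill_slot], a_list[pos_of_max] = a_list[pos_of_max], a_list[fill_slot]
        swaps += 1
    return swaps

data = [54, 26, 93, 17, 77, 31, 44, 55, 20]
b = bubble_swaps(data[:])
s = selection_swaps(data[:])
print(b, s)  # 20 vs 8 on this input
```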

Insertion Sort

Insertion sort is like picking up playing cards: each new card is inserted into its proper place among the cards already in hand, until all n cards have been inserted. So insertion sort requires n-1 insertion operations, which form the outer loop. Each insertion compares the new element against the already sorted numbers in turn until a suitable position is found. The time complexity of insertion sort is therefore \(O(n)\) in the best case and \(O(n^{2})\) in the worst case.

The insertion sort process is shown in the following figure:

def insertion_sort(a_list):
    for index in range(1, len(a_list)):
        # index is the position of the element to insert in this pass
        current_value = a_list[index]
        position = index
        while position > 0 and a_list[position-1] > current_value:
            a_list[position] = a_list[position-1]
            position = position - 1
        a_list[position] = current_value
a_list = [54, 26, 93, 17, 77, 31, 44, 55, 20]
insertion_sort(a_list)
print(a_list)
[17, 20, 26, 31, 44, 54, 55, 77, 93]
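The best-case/worst-case gap can be checked with a hypothetical instrumented version (not in the original post) that counts element comparisons on a sorted and a reversed nine-element list:

```python
def insertion_sort_comparisons(a_list):
    """Insertion sort that counts comparisons against sorted elements."""
    comparisons = 0
    for index in range(1, len(a_list)):
        current_value = a_list[index]
        position = index
        while position > 0:
            comparisons += 1
            if a_list[position - 1] > current_value:
                a_list[position] = a_list[position - 1]
                position -= 1
            else:
                break
        a_list[position] = current_value
    return comparisons

best = insertion_sort_comparisons(list(range(9)))        # already sorted
worst = insertion_sort_comparisons(list(range(9, 0, -1)))  # reversed
print(best, worst)  # 8 vs 36: n-1 comparisons vs n(n-1)/2
```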

Shell Sort

Shell sort, also known as the diminishing increment sort, is a more efficient, improved version of insertion sort. Shell sort is an unstable sorting algorithm.

Shell sort improves on insertion sort based on the following two properties:

Insertion sort is efficient when operating on almost sorted data, approaching linear time; but in general it is inefficient, because it can only move data one position at a time.

As shown in the figure below, the list has 9 numbers in total. We divide these 9 numbers into three sublists with a position increment (gap) of 3 (the dark cells in each column of the figure form one sublist). Insertion sort is performed once on each sublist, with elements keeping their original positions in the list.

For this new list, we then perform a standard insertion sort. Since the earlier sublist sorting has already partially ordered the data, this final insertion sort needs far fewer move operations.

def shell_sort(a_list):
    increment = len(a_list) // 2  # gap between elements of a sublist
    while increment > 0:
        for start_position in range(increment):
            gap_insertion_sort(a_list, start_position, increment)
        print("After increments of size", increment, "The list is", a_list)
        increment = increment // 2

def gap_insertion_sort(a_list, start, gap):
    for i in range(start+gap, len(a_list), gap):
        # the following is an insertion sort with step size gap
        current_value = a_list[i]
        position = i
        while position >= gap and a_list[position-gap] > current_value:
            a_list[position] = a_list[position-gap]
            position = position - gap
        a_list[position] = current_value
        
a_list = [54, 26, 93, 17, 77, 31, 44, 55, 20]
shell_sort(a_list)
print(a_list)
After increments of size 4 The list is [20, 26, 44, 17, 54, 31, 93, 55, 77]
After increments of size 2 The list is [20, 17, 44, 26, 54, 31, 77, 55, 93]
After increments of size 1 The list is [17, 20, 26, 31, 44, 54, 55, 77, 93]
[17, 20, 26, 31, 44, 54, 55, 77, 93]

The gap sequence is an important parameter in Shell sort. The shell_sort() function above uses a halving sequence: first n/2 sublists are created, then n/4, and so on, with the gap shrinking each round. The following image shows the sublist selection in the first round:
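For a gap of 3 on the nine-element example list, the sublists can be extracted with Python slicing (an illustrative sketch, not code from the original post):

```python
a_list = [54, 26, 93, 17, 77, 31, 44, 55, 20]
gap = 3
# each sublist takes every gap-th element, starting from a different offset
sublists = [a_list[start::gap] for start in range(gap)]
print(sublists)  # [[54, 17, 44], [26, 77, 55], [93, 31, 20]]
```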

Merge Sort

From here on, we introduce the divide-and-conquer strategy.

Merge sort is actually very similar to binary search in that it uses recursion. Each time, the list to be sorted is divided evenly into a left and a right sublist, each sublist is sorted separately, and the recursion continues until the sublist length is <= 1.

The first figure shows the process of dividing the list:

The second figure shows the conquer (merge) process for the sublists:

def merge_sort(a_list):
    print('splitting', a_list)
    if len(a_list)>1:
        mid = len(a_list) // 2
        # these two halves require extra space
        left_half = a_list[:mid]
        right_half = a_list[mid:]
        merge_sort(left_half)
        merge_sort(right_half)
        # once both halves are sorted, repeatedly compare the smallest
        # remaining number of each half and copy the smaller one back into a_list
        i,j,k=0,0,0
        while i<len(left_half) and j<len(right_half):
            if left_half[i] < right_half[j]:
                a_list[k] = left_half[i]
                i = i + 1
            else:
                a_list[k] = right_half[j]
                j = j + 1
            k = k + 1
        while i < len(left_half): 
            a_list[k] = left_half[i] 
            i = i + 1
            k = k + 1
        while j < len(right_half): 
            a_list[k] = right_half[j] 
            j = j + 1 
            k = k + 1
        print("Merging ", a_list)
        
a_list = [54, 26, 93, 17, 77, 31, 44, 55, 20]
merge_sort(a_list)
print(a_list)      
splitting [54, 26, 93, 17, 77, 31, 44, 55, 20]
splitting [54, 26, 93, 17]
splitting [54, 26]
splitting [54]
splitting [26]
Merging  [26, 54]
splitting [93, 17]
splitting [93]
splitting [17]
Merging  [17, 93]
Merging  [17, 26, 54, 93]
splitting [77, 31, 44, 55, 20]
splitting [77, 31]
splitting [77]
splitting [31]
Merging  [31, 77]
splitting [44, 55, 20]
splitting [44]
splitting [55, 20]
splitting [55]
splitting [20]
Merging  [20, 55]
Merging  [20, 44, 55]
Merging  [20, 31, 44, 55, 77]
Merging  [17, 20, 26, 31, 44, 54, 55, 77, 93]
[17, 20, 26, 31, 44, 54, 55, 77, 93]

For the time complexity of merge sort, it's time to bring out this picture again (from "Introduction to Algorithms"). In the figure, each conquer (merge) step costs time proportional to len(sublist), shown as n, n/2, n/4, .... Each row of the figure contains n/len(sublist) merges of size len(sublist), so each row costs n/len(sublist) * len(sublist) = n in total, where n is the length of the whole list. There are \(\log(n)\) such rows, so the time complexity of merge sort is \(O(n\log(n))\).
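The same bound can be obtained from the standard divide-and-conquer recurrence (a textbook derivation, not from the original post), where \(c\) is a constant covering the per-element merge cost:

```latex
T(n) = 2\,T\!\left(\frac{n}{2}\right) + cn, \qquad T(1) = c
```

Unrolling the recurrence \(\log_{2}n\) times gives \(T(n) = cn\log_{2}n + cn\), i.e. \(O(n\log(n))\), matching the figure.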

Quick Sort

Quick sort also uses a divide-and-conquer strategy. Compared with merge sort, quick sort does not require extra space.

Quick sort selects a pivot value in the list, also called the split point; this is usually the first or last element. In the figure below, the pivot value is 54. Next, among the remaining numbers after setting aside the pivot, the leftmost position is the left mark and the rightmost is the right mark. The left mark moves right until it reaches a number greater than the pivot; the right mark moves left until it reaches a number less than the pivot. Then the numbers at the two marks are swapped. This process continues until the two marks cross, at which point they stop moving.

At this point, inserting the pivot between the left mark and the right mark leaves all numbers to the left of the pivot smaller than it, and all numbers to its right greater than it.

Perform a recursive quicksort on the sublist of numbers on the left and right. The whole process is as follows:

def quick_sort(a_list):
    quick_sort_helper(a_list, 0, len(a_list) - 1)

def quick_sort_helper(a_list, first, last):
    # first and last are the bounds of the current sublist; since quick sort
    # uses no extra space, each sublist is tracked by its start and end positions
    if first < last:  # i.e. the sublist has more than one element
        split_point = partition(a_list, first, last)
        quick_sort_helper(a_list, first, split_point - 1)
        quick_sort_helper(a_list, split_point + 1, last)

def partition(a_list, first, last):
    pivot_value = a_list[first]
    left_mark = first + 1
    right_mark = last
    done = False
    while not done:
        # advance the left mark past values <= pivot,
        # and the right mark past values >= pivot
        while left_mark <= right_mark and a_list[left_mark] <= pivot_value:
            left_mark = left_mark + 1
        while left_mark <= right_mark and a_list[right_mark] >= pivot_value:
            right_mark = right_mark - 1
        if right_mark < left_mark:  # the marks have crossed
            done = True
        else:  # both marks stopped on out-of-place values: swap them
            a_list[left_mark], a_list[right_mark] = a_list[right_mark], a_list[left_mark]
    # put the pivot at the split point
    a_list[first], a_list[right_mark] = a_list[right_mark], a_list[first]
    return right_mark

a_list = [54, 26, 93, 17, 77, 31, 44, 55, 20]
quick_sort(a_list)
print(a_list)
[17, 20, 26, 31, 44, 54, 55, 77, 93]

The time complexity of quick sort depends on the choice of pivot. If the pivot's value falls in the middle of the list each time, then the divide process is similar to that of merge sort, and the time complexity is \(O(n\log(n))\).

However, not every case is so favorable. Imagine the worst case: if the pivot is always the smallest or largest number in the remaining list, each divide step splits off only one element. This degenerates into something close to selection sort, and the time complexity grows to \(O(n^{2})\).

Therefore, to avoid this situation, we can choose the pivot at random, which reduces the influence of the data's original ordering on the complexity.
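A minimal sketch of the randomized-pivot idea (an illustrative variant, not code from the original post): swap a randomly chosen element to the front before partitioning, so the partition logic shown earlier can be reused unchanged.

```python
import random

def random_quick_sort(a_list, first=0, last=None):
    """Quick sort with a randomly chosen pivot."""
    if last is None:
        last = len(a_list) - 1
    if first < last:
        # pick a random pivot and move it to the front,
        # then partition exactly as before
        pivot_index = random.randint(first, last)
        a_list[first], a_list[pivot_index] = a_list[pivot_index], a_list[first]
        pivot_value = a_list[first]
        left_mark, right_mark = first + 1, last
        while True:
            while left_mark <= right_mark and a_list[left_mark] <= pivot_value:
                left_mark += 1
            while left_mark <= right_mark and a_list[right_mark] >= pivot_value:
                right_mark -= 1
            if right_mark < left_mark:
                break
            a_list[left_mark], a_list[right_mark] = a_list[right_mark], a_list[left_mark]
        a_list[first], a_list[right_mark] = a_list[right_mark], a_list[first]
        random_quick_sort(a_list, first, right_mark - 1)
        random_quick_sort(a_list, right_mark + 1, last)

a_list = [54, 26, 93, 17, 77, 31, 44, 55, 20]
random_quick_sort(a_list)
print(a_list)
```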

  • **Reference:**
  1. Shellsort - Wikipedia
