Implementing ten classic sorting algorithms in Python (with animations)

Sorting is one of the most fundamental topics in data structures and algorithms.

Sorting algorithms can be divided into internal sorting and external sorting. Internal sorting keeps all records in memory, while external sorting is used when the data set is too large to fit in memory at once, so external storage must be accessed during sorting. Common internal sorting algorithms include insertion sort, Shell sort, selection sort, bubble sort, merge sort, quick sort, heap sort, and radix sort.

About time complexity

  1. Quadratic (O(n²)) sorts: the simple sorting algorithms — direct insertion, direct selection, and bubble sort.

  2. Linearithmic (O(n log n)) sorts: quick sort, heap sort, and merge sort.

  3. O(n^(1+ε)) sorts, where ε is a constant between 0 and 1: Shell sort.

  4. Linear (O(n)) sorts: radix sort, as well as bucket sort and bin sort.

About stability

  • A sort is stable if two records with equal keys appear in the same relative order after sorting as before.

  • Stable sorting algorithms: bubble sort, insertion sort, merge sort, and radix sort.

  • Unstable sorting algorithms: selection sort, quick sort, Shell sort, and heap sort.

Glossary

  • n: data size

  • k: the number of "buckets"

  • In-place: uses only a constant amount of extra memory

  • Out-of-place: requires additional memory beyond the input

1. Bubble sort

Bubble sort is a simple and intuitive sorting algorithm. It repeatedly walks through the array to be sorted, compares adjacent elements two at a time, and swaps them if they are in the wrong order. The passes over the sequence are repeated until no more swaps are needed, which means the sequence is sorted. The algorithm's name comes from the way smaller elements slowly "float" to the top of the sequence through these swaps.

As one of the simplest sorting algorithms, bubble sort feels to me like "abandon" in a vocabulary book: it is always first on the first page, so it is the most familiar. Bubble sort also has an optimization: set a flag, and when a full pass over the sequence performs no swaps, the sequence is already in order and the algorithm can stop early. This improvement does not do much for overall performance, though.
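The flag optimization described above can be sketched like this (the `swapped` flag and the name `bubbleSortOptimized` are my own, not from the article's code):

```python
def bubbleSortOptimized(arr):
    for i in range(1, len(arr)):
        swapped = False  # flag: did this pass perform any swap?
        for j in range(0, len(arr) - i):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:
            # a full pass with no swaps means the array is already sorted
            break
    return arr
```

On an already-sorted input this variant stops after a single pass, but the worst-case complexity remains O(n²).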

(1) Algorithm steps

  1. Compare adjacent elements. If the first is bigger than the second, swap them both.

  2. Do the same for each pair of adjacent elements, from the first pair at the beginning to the last pair at the end. After this step is done, the last element will be the largest number.

  3. Repeat the above steps for all elements except the last one.

  4. Continue repeating the above steps for fewer and fewer elements each time until there are no pairs of numbers to compare.

(2) Animated demonstration

(3) Python code

def bubbleSort(arr):
    for i in range(1, len(arr)):
        for j in range(0, len(arr)-i):
            if arr[j] > arr[j+1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr

2. Selection sort

Selection sort is a simple, intuitive sorting algorithm with O(n²) time complexity regardless of the input, so the smaller the data set, the better suited it is. Its one advantage is that it requires no additional memory.

(1) Algorithm steps

  1. First find the smallest (largest) element in the unsorted sequence and store it at the beginning of the sorted sequence

  2. Then continue to find the smallest (largest) element from the remaining unsorted elements, and then put it at the end of the sorted sequence.

  3. Repeat the second step until all elements are sorted.

(2) Animated demonstration

(3) Python code

def selectionSort(arr):
    for i in range(len(arr) - 1):
        # record the index of the minimum element
        minIndex = i
        for j in range(i + 1, len(arr)):
            if arr[j] < arr[minIndex]:
                minIndex = j
        # if i is not the minimum, swap arr[i] with the minimum
        if i != minIndex:
            arr[i], arr[minIndex] = arr[minIndex], arr[i]
    return arr

3. Insertion sort

Although the code for insertion sort is not as short and blunt as bubble sort or selection sort, its principle may be the easiest to understand: anyone who has arranged a hand of playing cards grasps it immediately. Insertion sort is a simple and intuitive algorithm that builds up a sorted sequence: for each unsorted element, it scans the sorted portion from back to front, finds the proper position, and inserts the element there.

Like bubble sort, insertion sort also has an optimized variant, called binary (half-interval) insertion sort.
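The binary insertion variant mentioned above can be sketched with the standard library's `bisect` module, which binary-searches the insertion point within the sorted prefix (`binaryInsertionSort` is my own name for this sketch):

```python
import bisect

def binaryInsertionSort(arr):
    for i in range(1, len(arr)):
        current = arr[i]
        # binary-search the insertion point in the sorted prefix arr[:i];
        # bisect_right keeps equal elements in their original order (stable)
        pos = bisect.bisect_right(arr, current, 0, i)
        # shift the tail of the sorted prefix right by one slot and insert
        arr[pos + 1:i + 1] = arr[pos:i]
        arr[pos] = current
    return arr
```

The binary search cuts the number of comparisons to O(log n) per element, but the shifting still costs O(n), so the overall worst case stays O(n²).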

(1) Algorithm steps

  1. Treat the first element of the first sequence to be sorted as an ordered sequence, and treat the second element to the last element as an unsorted sequence.

  2. Scan the unsorted sequence from beginning to end, inserting each scanned element into its proper position in the sorted sequence. (If the element to be inserted equals an element in the ordered sequence, insert it after the equal element.)

(2) Animated demonstration

(3) Python code

def insertionSort(arr):
    for i in range(1, len(arr)):
        preIndex = i - 1
        current = arr[i]
        while preIndex >= 0 and arr[preIndex] > current:
            arr[preIndex + 1] = arr[preIndex]
            preIndex -= 1
        arr[preIndex + 1] = current
    return arr

4. Shell sort

Shell sort, also known as the diminishing increment sort, is a more efficient, improved version of insertion sort. However, Shell sort is not a stable sorting algorithm.

Shell sort improves on insertion sort by exploiting two of its properties:

  • insertion sort is highly efficient on data that is almost sorted, approaching linear time;

  • but insertion sort is generally inefficient, because it moves data only one position at a time.

The basic idea of Shell sort is: first divide the full sequence into several gap-separated subsequences and run direct insertion sort on each; once the whole sequence is "basically in order", perform one final direct insertion sort over all records.

(1) Algorithm steps

  1. Choose an increment sequence t1 > t2 > ... > tk, with tk = 1;

  2. Sort the sequence k times, once for each increment;

  3. In each pass, use the corresponding increment ti to split the sequence into subsequences of elements that are ti apart, and run direct insertion sort on each subsequence. When the increment reaches 1, the whole sequence is treated as a single table and sorted directly.

(2) Python code

def shellSort(arr):
    # Knuth's gap sequence: 1, 4, 13, 40, ...
    gap = 1
    while gap < len(arr) / 3:
        gap = gap * 3 + 1
    while gap > 0:
        for i in range(gap, len(arr)):
            temp = arr[i]
            j = i - gap
            while j >= 0 and arr[j] > temp:
                arr[j + gap] = arr[j]
                j -= gap
            arr[j + gap] = temp
        gap = gap // 3
    return arr

5. Merge sort

Merge sort is an efficient sorting algorithm based on the merge operation, and a very typical application of divide and conquer.

As a typical divide-and-conquer algorithm, merge sort can be implemented in two ways:

  • top-down recursion (any recursive method can be rewritten iteratively, hence the second approach);

  • bottom-up iteration.

Like selection sort, merge sort's performance is unaffected by the input data, but it performs much better, because it is always O(n log n). The cost is the additional memory it requires.
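The bottom-up iterative variant mentioned above can be sketched as follows (`mergeSortBottomUp` is my own name, not from the article's code): it merges adjacent sorted runs of width 1, 2, 4, ... until one run covers the whole array.

```python
def mergeSortBottomUp(arr):
    n = len(arr)
    width = 1  # current length of the sorted runs being merged
    while width < n:
        for lo in range(0, n, 2 * width):
            mid = min(lo + width, n)
            hi = min(lo + 2 * width, n)
            left, right = arr[lo:mid], arr[mid:hi]
            # merge the two sorted runs back into arr[lo:hi]
            i = j = 0
            k = lo
            while i < len(left) and j < len(right):
                if left[i] <= right[j]:
                    arr[k] = left[i]
                    i += 1
                else:
                    arr[k] = right[j]
                    j += 1
                k += 1
            arr[k:hi] = left[i:] + right[j:]  # at most one run has leftovers
        width *= 2
    return arr
```

The result and the O(n log n) cost are the same as the recursive version; the difference is only that no recursion stack is needed.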

(1) Algorithm steps

  1. Allocate space equal to the combined size of the two sorted sequences, to hold the merged sequence;

  2. Set two pointers, the initial positions are respectively the starting positions of the two sorted sequences;

  3. Compare the elements pointed by the two pointers, select the relatively small element and put it into the merge space, and move the pointer to the next position;

  4. Repeat step 3 until a pointer reaches the end of the sequence;

  5. Copy all remaining elements of the other sequence directly to the end of the merged sequence.

(2) Animated demonstration

(3) Python code

def mergeSort(arr):
    if len(arr) < 2:
        return arr
    middle = len(arr) // 2
    left, right = arr[0:middle], arr[middle:]
    return merge(mergeSort(left), mergeSort(right))

def merge(left, right):
    result = []
    while left and right:
        if left[0] <= right[0]:
            result.append(left.pop(0))
        else:
            result.append(right.pop(0))
    # at most one of the two halves still has elements
    result.extend(left)
    result.extend(right)
    return result

6. Quick Sort

Quicksort is a sorting algorithm developed by Tony Hoare. On average, sorting n items requires O(n log n) comparisons. In the worst case O(n²) comparisons are needed, but this is uncommon. In practice, quicksort is usually significantly faster than other O(n log n) algorithms, because its inner loop can be implemented very efficiently on most architectures.

Quicksort uses a divide and conquer strategy to divide a list into two sub-lists.

Quick sort is another typical application of the divide-and-conquer idea in sorting algorithms. In essence, quicksort can be seen as a recursive divide-and-conquer refinement built on top of bubble sort.

The name of quick sort says it all: as soon as you hear it, you know why it exists — to be fast and efficient. It is one of the fastest sorting algorithms for large data sets. Although its worst-case time complexity reaches O(n²), in most cases it outperforms other algorithms whose average time complexity is also O(n log n). But why? I didn't know either. Fortunately, my completionist streak kicked in, and after checking many references I finally found a satisfying answer in "Algorithm Art and Informatics Competition":

The worst-case running time of quicksort is O(n²), for example when quicksorting an already-sorted array. But its expected running time is O(n log n), and the constant factor hidden in that O(n log n) notation is small — much smaller than that of merge sort, whose complexity is a stable O(n log n). Therefore, for the vast majority of weakly ordered random sequences, quicksort beats merge sort.
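One common mitigation for the sorted-input worst case described above — not part of this article's code, which always picks the leftmost element as pivot — is to choose the pivot at random. A minimal sketch using list comprehensions rather than in-place partitioning:

```python
import random

def randomizedQuickSort(arr):
    # returns a new sorted list; a random pivot makes the O(n²)
    # worst case unlikely even on already-sorted input
    if len(arr) <= 1:
        return arr
    pivot = random.choice(arr)
    smaller = [x for x in arr if x < pivot]
    equal = [x for x in arr if x == pivot]
    larger = [x for x in arr if x > pivot]
    return randomizedQuickSort(smaller) + equal + randomizedQuickSort(larger)
```

This version trades the in-place property for clarity; an in-place variant would simply swap a random element into the pivot position before partitioning.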

(1) Algorithm steps

① Pick an element from the sequence, called the "pivot";

② Reorder the sequence so that all elements smaller than the pivot come before it and all elements larger than the pivot come after it (equal elements can go to either side). When this partition finishes, the pivot is in its final position. This is called the partition operation;

③ Recursively sort the sub-array of elements smaller than the pivot and the sub-array of elements larger than the pivot.

The base case of the recursion is a sequence of size zero or one, which is sorted by definition. Although the recursion keeps descending, the algorithm always terminates, because each iteration puts at least one element into its final position.

(2) Animated demonstration

(3) Python code

def quickSort(arr, left=None, right=None):
    left = 0 if left is None else left
    right = len(arr) - 1 if right is None else right
    if left < right:
        partitionIndex = partition(arr, left, right)
        quickSort(arr, left, partitionIndex - 1)
        quickSort(arr, partitionIndex + 1, right)
    return arr

def partition(arr, left, right):
    # use the leftmost element as the pivot
    pivot = left
    index = pivot + 1
    i = index
    while i <= right:
        if arr[i] < arr[pivot]:
            swap(arr, i, index)
            index += 1
        i += 1
    swap(arr, pivot, index - 1)
    return index - 1

def swap(arr, i, j):
    arr[i], arr[j] = arr[j], arr[i]

7. Heap sort

Heapsort is a sorting algorithm designed around the heap data structure. A heap is a structure that approximates a complete binary tree while satisfying the heap property: the key of each child node is always smaller (or larger) than that of its parent. Heap sort can be seen as a selection sort that uses the heap to find the next element. There are two variants:

  1. Max-heap: the value of each node is greater than or equal to the values of its children; used for sorting in ascending order;

  2. Min-heap: the value of each node is less than or equal to the values of its children; used for sorting in descending order.

The average time complexity of heap sort is O(nlogn).

(1) Algorithm steps

  1. Create a heap H[0...n-1];

  2. Swap the heap head (maximum value) and heap tail;

  3. Reduce the heap size by 1 and call shift_down(0) to sift the new top element down to its correct position;

  4. Repeat step 2 until the size of the heap is 1.

(2) Animated demonstration

(3) Python code

def buildMaxHeap(arr, heapSize):
    # start from the last internal node and sift each one down
    for i in range(heapSize // 2, -1, -1):
        heapify(arr, i, heapSize)

def heapify(arr, i, heapSize):
    # sift arr[i] down until the subtree rooted at i is a max-heap
    left = 2 * i + 1
    right = 2 * i + 2
    largest = i
    if left < heapSize and arr[left] > arr[largest]:
        largest = left
    if right < heapSize and arr[right] > arr[largest]:
        largest = right

    if largest != i:
        swap(arr, i, largest)
        heapify(arr, largest, heapSize)

def swap(arr, i, j):
    arr[i], arr[j] = arr[j], arr[i]

def heapSort(arr):
    heapSize = len(arr)
    buildMaxHeap(arr, heapSize)
    for i in range(len(arr) - 1, 0, -1):
        swap(arr, 0, i)  # move the current maximum to the end
        heapSize -= 1
        heapify(arr, 0, heapSize)
    return arr
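For comparison, Python's standard library `heapq` module implements a binary min-heap, so repeatedly popping from it yields ascending order (a sketch; `heapqSort` is my own name):

```python
import heapq

def heapqSort(arr):
    # heapq maintains a min-heap, so successive pops come out ascending
    heap = list(arr)          # copy, so the input is left untouched
    heapq.heapify(heap)       # O(n) bottom-up heap construction
    return [heapq.heappop(heap) for _ in range(len(heap))]
```

This is out-of-place, unlike the max-heap version above, but it shows the same heap mechanics with far less code.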

8. Counting sort

The core of counting sort is to use the input values as keys into an auxiliary array of counts. As a linear-time sort, counting sort requires the input to consist of integers within a known, limited range.

(1) Animated demonstration

(2) Python code

def countingSort(arr, maxValue):
    bucketLen = maxValue + 1
    bucket = [0] * bucketLen  # count of each value in 0..maxValue
    sortedIndex = 0
    for i in range(len(arr)):
        bucket[arr[i]] += 1
    for j in range(bucketLen):
        while bucket[j] > 0:
            arr[sortedIndex] = j
            sortedIndex += 1
            bucket[j] -= 1
    return arr

9. Bucket Sort

Bucket sort is an upgraded version of counting sort. It exploits a mapping function, and the key to its efficiency lies in choosing that mapping well. To make bucket sort efficient, we need to do two things:

  1. In the case of sufficient extra space, try to increase the number of buckets

  2. The mapping function used can evenly distribute the input N data into K buckets

At the same time, for the sorting of elements in buckets, the choice of a comparison sorting algorithm is crucial to performance.

When is it fastest?

When the input data can be evenly distributed to each bucket.

When is it slowest?

When the input data is allocated to the same bucket.

Python code

def bucket_sort(s):
    """Bucket sort"""
    min_num = min(s)
    max_num = max(s)
    # bucket width
    bucket_range = (max_num - min_num) / len(s)
    if bucket_range == 0:
        return  # all elements are equal, so the list is already sorted
    # array of buckets
    count_list = [[] for i in range(len(s) + 1)]
    # distribute the elements into the buckets
    for i in s:
        count_list[int((i - min_num) // bucket_range)].append(i)
    s.clear()
    # write back; each bucket is sorted here by simply calling sorted
    for i in count_list:
        for j in sorted(i):
            s.append(j)

if __name__ == "__main__":
    a = [3.2, 6, 8, 4, 2, 6, 7, 3]
    bucket_sort(a)
    print(a)  # [2, 3, 3.2, 4, 6, 6, 7, 8]

10. Radix sort

Radix sort is a non-comparative integer sorting algorithm. Its principle is to split the integers into individual digits by position and then compare the values digit by digit. Since strings (such as names or dates) and floating-point numbers in certain formats can also be encoded as integers, radix sort is not limited to integers.
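As an illustration of sorting non-integers, here is a minimal sketch of a least-significant-character-first (LSD) radix sort over strings, under the assumption that all strings have the same length (`radixSortStrings` is my own name, not from the article):

```python
def radixSortStrings(strings, width):
    # LSD radix sort for strings that are all exactly `width` characters long
    for pos in range(width - 1, -1, -1):
        buckets = {}
        for s in strings:
            buckets.setdefault(s[pos], []).append(s)
        # concatenate buckets in character order; because each per-character
        # pass is stable, the overall result is correctly sorted
        strings = [s for ch in sorted(buckets) for s in buckets[ch]]
    return strings
```

Variable-length strings would need padding or an MSD variant; this sketch only covers the fixed-width case.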

Radix sort vs count sort vs bucket sort

There are two variants of radix sort: LSD, which sorts starting from the least significant digit, and MSD, which starts from the most significant digit.

These three sorting algorithms all use the concept of buckets, but there are obvious differences in the use of buckets:

  • Radix sort: buckets are allocated according to each digit of the key;

  • Counting sort: each bucket only stores a single key value;

  • Bucket sorting: each bucket stores a certain range of values;

Animated demonstration

Python code

def radixSort(arr):
    i = 0  # start with the ones digit
    n = 1  # minimum number of digits (at least 1)
    max_num = max(arr)  # largest number in the array
    while max_num >= 10 ** n:  # count the digits of the largest number
        n += 1
    while i < n:
        bucket = {}  # build the buckets with a dict
        for x in range(10):
            bucket.setdefault(x, [])  # one empty bucket per digit 0-9
        for x in arr:  # distribute by the current digit
            radix = (x // (10 ** i)) % 10  # extract digit i
            bucket[radix].append(x)
        j = 0
        for k in range(10):
            for y in bucket[k]:  # write each bucket back into the array
                arr[j] = y
                j += 1
        i += 1
    return arr


Origin blog.csdn.net/veratata/article/details/128612229