[Data Structure] Python heapq Library: Min-Heap (Small Top Heap)

1. Introduction to the heapq library

The heapq library is part of the Python standard library. It provides functions for building a min-heap (small top heap) and for basic operations on it, such as pushing items onto the heap and popping them off, and it can be used to implement heap sort.

A heap is a basic data structure. It is a complete binary tree that satisfies the heap property: the value of each non-leaf node is greater than or equal to the values of its child nodes (max-heap), or less than or equal to them (min-heap).

Heaps are divided into max-heaps (large top heaps) and min-heaps (small top heaps). heapq implements a min-heap:

1. Max-heap (large top heap): the value of each non-leaf node is greater than or equal to the values of its child nodes, so the root holds the largest value of all nodes.

2. Min-heap (small top heap): the value of each non-leaf node is less than or equal to the values of its child nodes, so the root holds the smallest value of all nodes.

heapq stores the heap in an ordinary Python list. To satisfy the heap property, the value at index k must be less than or equal to the values at index 2*k+1 and index 2*k+2 (a complete binary tree is filled in breadth-first order, so the children of the node at index k are at indexes 2*k+1 and 2*k+2). This is also explained in the heapq source code, which is short and worth reading.
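
The index rule can be checked directly on a list. The following is a minimal sketch, not code from heapq itself: the helper names children and is_min_heap are hypothetical and exist only for this illustration.

```python
def children(k):
    """Return the list indexes of the two children of the node at index k."""
    return 2 * k + 1, 2 * k + 2


def is_min_heap(items):
    """Check items[k] <= items[2*k+1] and items[k] <= items[2*k+2] for every k."""
    n = len(items)
    for k in range(n):
        for child in children(k):
            # A child index past the end of the list means "no such child".
            if child < n and items[k] > items[child]:
                return False
    return True


print(children(0))                    # (1, 2)
print(is_min_heap([1, 3, 2, 7, 4]))   # True
print(is_min_heap([3, 1, 2]))         # False: the root 3 is larger than its child 1
```

Note that a sorted list always passes this check, but a valid heap does not have to be sorted.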

For a Python implementation of heap sort, see: https://blog.csdn.net/weixin_43790276/article/details/104033696

For the properties of a complete binary tree, see: https://blog.csdn.net/weixin_43790276/article/details/104737870

The examples below print heaps with the following helper, saved as heapq_showtree.py:

# heapq_showtree.py
import math
from io import StringIO


def show_tree(tree, total_width=36, fill=' '):
    """Pretty-print a tree."""
    output = StringIO()
    last_row = -1
    for i, n in enumerate(tree):
        if i:
            row = int(math.floor(math.log(i + 1, 2)))
        else:
            row = 0
        if row != last_row:
            output.write('\n')
        columns = 2 ** row
        col_width = int(math.floor(total_width / columns))
        output.write(str(n).center(col_width, fill))
        last_row = row
    print(output.getvalue())
    print('-' * total_width)
    print()

2. Use heapq to create a heap

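import heapq
from heapq_showtree import show_tree

array = [10, 17, 50, 7, 30, 24, 27, 45, 15, 5, 36, 21]

# Method 1: push the items one by one onto an empty heap
heap = []
for num in array:
    heapq.heappush(heap, num)
print("array:", array)
print("heap: ", heap)
show_tree(heap)

# Method 2: rearrange the existing list in place
heapq.heapify(array)
print("array:", array)
show_tree(array)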

There are two ways to create a heap with heapq.

heappush(heap, num): start from an empty list and add the data one item at a time. The heap order is preserved as each new item is added, so the list is a valid min-heap after every call.

heapify(array): rearrange an existing list in place into a min-heap (for how the adjustment works, see the heap sort article linked above; heapq already implements it).

If the data is already in memory, heapify() is the more efficient choice: it rearranges the list in linear time, while pushing n items one by one takes O(n log n) time in the worst case.

The two methods can give different results: for the same data, the list built with heappush(heap, num) and the list rearranged by heapify(array) may hold the items in a different order.

However, both results satisfy the min-heap property, so the difference does not affect how the heap is used (data is only ever taken from the top of the heap, and the structure is readjusted after each item is taken).
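
To see that the layout difference is harmless, the two heaps can be compared side by side. This is a small sketch under the assumption that both heaps are built from the same data; pop_all is a hypothetical helper written for this illustration, not part of heapq.

```python
import heapq

array = [10, 17, 50, 7, 30, 24, 27, 45, 15, 5, 36, 21]

# Method 1: push the items one by one onto an empty heap.
pushed = []
for num in array:
    heapq.heappush(pushed, num)

# Method 2: copy the list and rearrange the copy in place.
heapified = list(array)
heapq.heapify(heapified)


def pop_all(heap):
    """Pop every item from a copy of the heap, smallest first."""
    heap = list(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]


print("same layout:   ", pushed == heapified)
print("same top item: ", pushed[0] == heapified[0] == min(array))
print("same pop order:", pop_all(pushed) == pop_all(heapified))
```

Whatever the first line prints, the last two lines should print True, because every valid min-heap over the same data pops its items in ascending order.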

3. Use heapq to implement heap sorting

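import heapq
from heapq_showtree import show_tree

array = [10, 17, 50, 7, 30, 24, 27, 45, 15, 5, 36, 21]
heap = []
for num in array:
    heapq.heappush(heap, num)
print(heap)

# Pop the two smallest items and show the heap after each pop
for i in range(2):
    smallest = heapq.heappop(heap)
    print('pop    {:>3}:'.format(smallest))
    show_tree(heap)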

First add the data from the list to be sorted to the heap to build a min-heap, then print the heap and confirm that its first item is the minimum. The example above pops only the two smallest values; to sort the whole list, keep taking the value at the top of the heap and appending it to a new list until the heap is empty. The new list is then the sorted list.
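
The full procedure described above can be sketched as follows. The function name heap_sort is an assumption made for this example; it uses heapify() rather than repeated heappush() calls to build the heap, which gives the same sorted result.

```python
import heapq


def heap_sort(items):
    """Return a new list in ascending order; the input is left unchanged."""
    heap = list(items)      # work on a copy
    heapq.heapify(heap)     # build the min-heap
    result = []
    while heap:             # pop until the heap is empty
        result.append(heapq.heappop(heap))
    return result


array = [10, 17, 50, 7, 30, 24, 27, 45, 15, 5, 36, 21]
print(heap_sort(array))
# [5, 7, 10, 15, 17, 21, 24, 27, 30, 36, 45, 50]
```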

4. Get the minimum or maximum value in the heap

import heapq

array = [10, 17, 50, 7, 30, 24, 27, 45, 15, 5, 36, 21]
heapq.heapify(array)
print(heapq.nlargest(2, array))
print(heapq.nsmallest(3, array))

operation result:

[50, 45]
[5, 7, 10]

nlargest(num, heap): returns the num largest items from the heap, starting with the largest. The result is always a list, even when only one item is requested. If num is greater than or equal to the number of items in the heap, all items are returned from largest to smallest without raising an error, which amounts to a descending sort.

nsmallest(num, heap): returns the num smallest items from the heap as a list, starting with the smallest.

Neither function removes items from the heap, and both work the same way on a plain list that has not been heapified.
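
The following short sketch shows both functions on a plain list, plus the optional key argument that both accept. The scores and jobs data are made up for illustration.

```python
import heapq

# A plain, unsorted list: no heapify() call is needed first.
scores = [10, 17, 50, 7, 30]
print(heapq.nlargest(2, scores))    # [50, 30]
print(heapq.nsmallest(2, scores))   # [7, 10]

# Asking for more items than exist returns everything, in order.
print(heapq.nlargest(10, scores))   # [50, 30, 17, 10, 7]

# The input list is not modified.
print(scores)                       # [10, 17, 50, 7, 30]

# key= selects the value to compare (hypothetical job records).
jobs = [("backup", 3), ("deploy", 1), ("email", 2)]
print(heapq.nsmallest(2, jobs, key=lambda job: job[1]))
# [('deploy', 1), ('email', 2)]
```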
 

5. Use heapq to merge two ordered lists

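Combining several sorted sequences into a new sequence is easy for small datasets: chain them together and sort the result (here data is a list of sorted lists).

list(sorted(itertools.chain(*data)))

heapq.merge() produces the same result from inputs that are already sorted:

import heapq

array_a = [10, 7, 15, 8]
array_b = [17, 3, 8, 20, 13]
array_merge = heapq.merge(sorted(array_a), sorted(array_b))
print("merge result:", list(array_merge))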

operation result:

merge result: [3, 7, 8, 8, 10, 13, 15, 17, 20]

merge(list1, list2, ...): merges any number of sorted inputs into one sorted sequence and returns an iterator over it rather than a list. It can serve as the merge step of merge sort.

For larger datasets, chaining and sorting uses a large amount of memory. Instead of sorting the entire combined sequence, use merge() to generate the new sequence one item at a time.

import heapq
import random


random.seed(2022)

data = []
for i in range(4):
    new_data = list(random.sample(range(1, 101), 5))
    new_data.sort()
    data.append(new_data)

for i, d in enumerate(data):
    print('{}: {}'.format(i, d))

print('\nMerged:')
for i in heapq.merge(*data):
    print(i, end=' ')
print()

Because merge() is implemented with a heap, its memory use depends on the number of sequences being merged, not on the total number of items in those sequences.
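
One way to observe this laziness is to merge inputs that could never fit in memory. In the sketch below, multiples is a hypothetical helper that returns an endless ascending generator; merge() still works because it only pulls items as they are requested.

```python
import heapq
import itertools


def multiples(step):
    """Endless ascending generator: step, 2*step, 3*step, ..."""
    return itertools.count(step, step)


# Both inputs are infinite, so they can never be fully loaded;
# merge() only pulls items as the consumer asks for them.
merged = heapq.merge(multiples(3), multiples(5))
first_eight = list(itertools.islice(merged, 8))
print(first_eight)  # [3, 5, 6, 9, 10, 12, 15, 15]
```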

  1. heapq.merge() works iteratively: it never reads an input sequence all at once, so very long sequences can be processed with low overhead.
  2. heapq.merge() requires every input sequence to be sorted already.
  3. heapq.merge() does not pre-sort its inputs.
  4. heapq.merge() does not verify that the inputs meet this requirement.
  5. heapq.merge() looks at the first item of each sequence, puts the smallest one into the new sequence, and then advances the sequence that item came from. It repeats this until the complete new sequence has been generated.
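
The procedure in point 5 can be sketched with a heap that holds one entry per input sequence. This is a simplified illustration, not the actual heapq.merge() source: the name simple_merge and the _DONE sentinel are assumptions made for this example, and features such as key and reverse are left out.

```python
import heapq

_DONE = object()  # sentinel marking an exhausted iterator


def simple_merge(*sequences):
    """Yield items from already-sorted sequences in ascending order."""
    heap = []
    for index, seq in enumerate(sequences):
        iterator = iter(seq)
        first = next(iterator, _DONE)
        if first is not _DONE:
            # index breaks ties, so the iterators are never compared
            heap.append((first, index, iterator))
    heapq.heapify(heap)

    while heap:
        value, index, iterator = heap[0]   # smallest current head
        yield value
        following = next(iterator, _DONE)
        if following is _DONE:
            heapq.heappop(heap)            # this sequence is exhausted
        else:
            # swap in the next item from the same sequence
            heapq.heapreplace(heap, (following, index, iterator))


print(list(simple_merge([1, 4, 9], [2, 3, 10], [5])))
# [1, 2, 3, 4, 5, 9, 10]
```

The heap never holds more than one entry per input, which is why memory use grows with the number of sequences rather than with their length.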

6. Use heapq to replace data

import heapq

array_c = [10, 7, 15, 8]
heapq.heapify(array_c)
print("before:", array_c)
# push first, then pop
item = heapq.heappushpop(array_c, 5)
print("after: ", array_c)
print(item)

array_d = [10, 7, 15, 8]
heapq.heapify(array_d)
print("before:", array_d)
# pop first, then push
item = heapq.heapreplace(array_d, 5)
print("after: ", array_d)
print(item)

operation result:

before: [7, 8, 15, 10]
after:  [7, 8, 15, 10]
5
before: [7, 8, 15, 10]
after:  [5, 8, 15, 10]
7

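heappushpop(heap, num): first adds num to the heap, then removes and returns the item at the top of the heap. If num is smaller than the current top of the heap, num itself is returned and the heap is left unchanged, as in the first example above.

heapreplace(heap, num): first removes the item at the top of the heap, then adds num, and returns the removed item. The returned item can be larger than num, as in the second example above.

Both functions combine a push and a pop, so the size of the heap does not change; only the order of the two steps differs. Either can be used to replace data in the heap.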

import heapq
from heapq_showtree import show_tree

array = [10, 17, 50, 7, 30, 24, 27, 45, 15, 5, 36, 21]
heap = []
for num in array:
    heapq.heappush(heap, num)

show_tree(heap)

# Replace the smallest item twice; the heap keeps the same size
for n in [0, 13]:
    smallest = heapq.heapreplace(heap, n)
    print('replace {:>2} with {:>2}:'.format(smallest, n))
    show_tree(heap)

Replacing elements in place makes it possible to maintain a fixed-size heap, such as a queue of jobs ordered by priority.
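
A common use of a fixed-size heap is keeping only the k largest items seen so far. The sketch below is one possible way to do it with heapreplace(); the function name largest_k is hypothetical and chosen for this example.

```python
import heapq


def largest_k(stream, k):
    """Return the k largest items of an iterable, in descending order."""
    if k <= 0:
        return []
    heap = []                                # min-heap of at most k items
    for item in stream:
        if len(heap) < k:
            heapq.heappush(heap, item)       # still filling up
        elif item > heap[0]:
            heapq.heapreplace(heap, item)    # evict the current smallest
    return sorted(heap, reverse=True)


print(largest_k([10, 17, 50, 7, 30, 24, 27, 45], 3))  # [50, 45, 30]
```

The top of the min-heap is always the smallest of the k items kept, so a new item only needs to be compared against heap[0] to decide whether it belongs in the result.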

Origin blog.csdn.net/zwqjoy/article/details/124529796