Python source code analysis-internal implementation of lists and tuples

Internal implementation of the list

listobject.h: https://github.com/python/cpython/blob/949fe976d5c62ae63ed505ecf729f815d0baccfc/Include/listobject.h#L23

listobject.c: https://github.com/python/cpython/blob/3d75bd15ac82575967db367c517d7e6e703a6de3/Objects/listobject.c#L33

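The specific structure of the list, PyListObject, has three parts: the variable-size object header PyObject_VAR_HEAD (which contains ob_size), the pointer ob_item, and the counter allocated.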

The source code makes it clear that a Python list is essentially an over-allocated array: more space is allocated than the elements actually need. For example, a list holding 5 elements may have space allocated for 10.

Here, ob_item points to an array of pointers, each of which points to one element of the list, and allocated stores the number of element slots that have been allocated for the list.

Note that allocated is not the same as the actual size of the list. The actual size is the value returned by len(list), which is the ob_size mentioned in the source code comments; it indicates how many elements the list currently stores. In practice, to avoid allocating memory every time an element is added, the pre-allocated capacity allocated is usually larger than ob_size (source code line 31).

Therefore, their relationship is: 0 <= ob_size <= allocated, where ob_size == len(list).
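The relationship between ob_size and allocated can be observed from Python itself. The following is a minimal sketch that assumes CPython, where sys.getsizeof of a list grows by one pointer for every allocated slot; the helper names estimated_allocated and trace_appends are made up for this illustration and are not part of any library.

```python
import struct
import sys

POINTER_SIZE = struct.calcsize("P")  # size in bytes of one PyObject* slot
EMPTY_LIST_SIZE = sys.getsizeof([])  # an empty list has no allocated slots


def estimated_allocated(lst):
    """Estimate the allocated field from the reported size (CPython assumption)."""
    return (sys.getsizeof(lst) - EMPTY_LIST_SIZE) // POINTER_SIZE


def trace_appends(n):
    """Return (len, estimated allocated) pairs recorded after each append."""
    items = []
    trace = []
    for value in range(n):
        items.append(value)
        trace.append((len(items), estimated_allocated(items)))
    return trace


if __name__ == "__main__":
    for length, allocated in trace_appends(20):
        print(length, allocated)
```

The exact numbers printed depend on the interpreter version, but the estimated capacity should change only occasionally while len grows by one on every append.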

When all allocated slots are in use, the list requests more memory from the system and, if the memory block has to move, copies all the existing elements over. Each time the list grows, the new allocation follows this pattern:

 
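ob_size = 0    # number of elements stored, i.e. len(list)
allocated = 0  # number of slots currently allocated

print(allocated, end=" ")
for _ in range(100):
    ob_size += 1  # simulate appending one element
    if ob_size > allocated:
        # grow by roughly one eighth of the new size plus a small constant
        allocated = ob_size + (ob_size >> 3) + (3 if ob_size < 9 else 6)
        print(allocated, end=" ")
# 0 4 8 16 25 35 46 58 72 88 106 ···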
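Because the capacity grows in proportion to the current size, resizes become rarer as the list gets longer. The sketch below reuses the growth rule quoted above to count how many resizes happen during n appends and how many elements would be copied if every resize had to move the whole array (a worst-case assumption). The function names are hypothetical.

```python
def next_allocated(ob_size):
    """Growth rule quoted in the article."""
    return ob_size + (ob_size >> 3) + (3 if ob_size < 9 else 6)


def simulate_appends(n):
    """Return (resizes, worst-case elements copied) for n consecutive appends."""
    allocated = 0
    resizes = 0
    copied = 0
    for ob_size in range(1, n + 1):
        if ob_size > allocated:
            copied += ob_size - 1  # elements already stored before this append
            allocated = next_allocated(ob_size)
            resizes += 1
    return resizes, copied


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        resizes, copied = simulate_appends(n)
        print(n, resizes, copied / n)
```

The copies-per-append ratio stays bounded by a small constant, which is why append is usually described as amortized constant time.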

Internal implementation of tuples

tupleobject.h: https://github.com/python/cpython/blob/3d75bd15ac82575967db367c517d7e6e703a6de3/Include/tupleobject.h#L25

tupleobject.c: https://github.com/python/cpython/blob/3d75bd15ac82575967db367c517d7e6e703a6de3/Objects/tupleobject.c#L16

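The specific structure of tuple, PyTupleObject, consists of the variable-size object header PyObject_VAR_HEAD followed directly by ob_item, the array of element pointers.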

From the source code, a tuple is essentially also an array, but one whose size is fixed. Python applies many optimizations to tuples to improve program efficiency.
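The fixed size can be checked from Python. This is a small sketch assuming CPython, where the size reported by sys.getsizeof for a tuple grows by exactly one pointer per element; tuple_slots is a hypothetical helper name.

```python
import struct
import sys

POINTER_SIZE = struct.calcsize("P")   # size in bytes of one PyObject* slot
EMPTY_TUPLE_SIZE = sys.getsizeof(())  # header only, no element slots


def tuple_slots(t):
    """Number of pointer slots a tuple occupies beyond its header (CPython assumption)."""
    return (sys.getsizeof(t) - EMPTY_TUPLE_SIZE) // POINTER_SIZE


if __name__ == "__main__":
    for n in (0, 1, 5, 10):
        t = tuple(range(n))
        print(n, tuple_slots(t))      # slots should match the length exactly
    print(hasattr(tuple, "append"))   # tuples offer no way to grow in place
```

Unlike a list, there is no spare capacity: the number of slots equals len(t).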

Tuple static resource cache

Lists and tuples behave the same when looking up an element by index. Besides being usable as a dictionary key, a tuple has another advantage: it is very fast to allocate, partly because it is immutable and partly because it benefits from a static resource cache.
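A rough way to compare allocation speed is to time the construction of a tuple and a list from the same data. This is only a sketch: the function name time_construction is hypothetical, and the measured times vary with the machine, the interpreter version, and the chosen size, so no particular result is assumed here.

```python
import timeit


def time_construction(size=10, number=200_000):
    """Return (tuple_seconds, list_seconds) for building `number` objects each."""
    data = list(range(size))
    # Build from runtime data so the interpreter cannot reuse a constant tuple.
    tuple_seconds = timeit.timeit(lambda: tuple(data), number=number)
    list_seconds = timeit.timeit(lambda: list(data), number=number)
    return tuple_seconds, list_seconds


if __name__ == "__main__":
    tuple_seconds, list_seconds = time_construction()
    print(f"tuple: {tuple_seconds:.4f}s  list: {list_seconds:.4f}s")
```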

For some static objects such as tuples, when one is no longer in use and takes up little space, Python temporarily keeps its memory in a cache instead of releasing it. The next time a tuple of the same size is created, Python does not have to request memory from the operating system; it hands out the cached memory directly, which can noticeably speed up the program.

It can be understood this way: when a PyTupleObject is destructed, not only is the object itself not reclaimed, but even its underlying pointer array is kept in the cache.
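One way to get a hint of this caching is to compare object addresses. The sketch below assumes CPython, where id() returns the memory address of an object; it destroys a tuple and immediately creates another of the same size. The function name slot_reused is hypothetical, and the result is not guaranteed: it depends on the interpreter version and on what else was allocated in between.

```python
def slot_reused(size=3):
    """Return True if a new tuple lands at the address of a just-destroyed one."""
    values = list(range(size))
    first = tuple(values)
    address = id(first)   # in CPython this is the object's memory address
    del first             # the tuple is destructed; its memory may be cached
    second = tuple(values)
    return id(second) == address


if __name__ == "__main__":
    print(slot_reused())
```

A True result is consistent with the cached memory being handed out again, but a False result does not prove that no caching takes place.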

For follow-up updates to this post, please follow my personal blog: Stardust Blog

Origin blog.csdn.net/u011130655/article/details/113019089