The underlying implementation of PHP arrays

The array is a very powerful and flexible data type in PHP. Unlike in statically typed languages such as Java and C, we do not need to specify the size or the element type when initializing a PHP array. When assigning values, we can use either numeric indexes or string indexes.

Thanks to these powerful features, PHP arrays make it easy to implement more complex data structures such as stacks, queues, lists, sets, and dictionaries. PHP arrays are this powerful because they are built on hash tables.

The underlying data structure of PHP arrays

The hash table that underlies PHP arrays is defined in Zend/zend_types.h.

This hash table has many members; here we look at a few of the more important ones:

arData: The array that stores the hash table's elements. Its memory is contiguous, and arData points to the start of that array;
nTableSize: The total capacity of the array, i.e. the number of elements arData can hold. The amount of memory allocated is determined by this value. It is always a power of 2, with a minimum of 8, growing as 8, 16, 32, ...;
nTableMask: The mask used by the hash function to map an element's key hash value to a slot. Its value is the negative of nTableSize, i.e. nTableMask = -nTableSize, which in bitwise terms is nTableMask = ~nTableSize + 1;
nNumUsed, nNumOfElements: nNumUsed is the number of Buckets currently used in arData, which is not the same as the number of valid elements in the array. When an element is deleted, it is not removed from arData immediately; it is only marked IS_UNDEF, and it is actually removed only when the array is resized. nNumOfElements is the number of valid elements in the array, which is what count() returns. As long as no resize happens, nNumUsed only ever increases, whether or not elements are deleted;
nNextFreeElement: The next integer index to be assigned automatically, starting from 0 by default. For example, each time an element is appended with $arr[] = 200, nNextFreeElement is automatically incremented by 1;
pDestructor: A destructor function pointer. If it is provided, it is called whenever an element is deleted or overwritten, to clean up the old element;
u: A union mainly used by auxiliary functions; among other things, it holds the hash table's flags (u.flags).
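To make the field list concrete, here is a minimal sketch in C, assuming a heavily simplified struct: the type name toy_hashtable and the helpers toy_mask_neg, toy_mask_bits, and toy_slot_of are hypothetical, and the real definition in Zend/zend_types.h has more members and differs in detail. It checks that the two spellings of nTableMask give the same value, and shows how h | nTableMask keeps the low bits of the hash and yields a slot number between -nTableSize and -1.

```c
#include <stdint.h>

/* Hypothetical toy model for illustration; not PHP's actual source.
 * Only the fields discussed above are mirrored here. */
typedef struct {
    void    *arData;           /* start of the contiguous Bucket array */
    uint32_t nTableSize;       /* capacity: a power of 2, at least 8 */
    uint32_t nTableMask;       /* -nTableSize, used to map h to a slot */
    uint32_t nNumUsed;         /* Buckets used, including IS_UNDEF holes */
    uint32_t nNumOfElements;   /* live elements, i.e. what count() returns */
    int64_t  nNextFreeElement; /* next key used by $arr[] = ... */
} toy_hashtable;

/* nTableMask = -nTableSize, computed in unsigned 32-bit arithmetic */
static uint32_t toy_mask_neg(uint32_t size) { return (uint32_t)0 - size; }

/* The same value written as a bit operation: ~nTableSize + 1 */
static uint32_t toy_mask_bits(uint32_t size) { return ~size + 1u; }

/* OR-ing the hash with the mask keeps its low log2(size) bits and sets all
 * the others, so read as int32_t the result lies in [-nTableSize, -1]. */
static int32_t toy_slot_of(uint64_t h, uint32_t mask) {
    return (int32_t)((uint32_t)h | mask);
}
```

With nTableSize = 8, a hash of 13 (binary 1101) keeps its low three bits 101 and lands in slot -3.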

The Bucket structure is relatively simple. It mainly stores an element's key and value, plus h, an integer hash value. If the element has a numeric index, h is simply that index; if it has a string key, h is the hash of the key computed with the Time33 algorithm. h is what gets mapped to the element's final storage position. Bucket is defined in Zend/zend_types.h as well.
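A minimal sketch of a Bucket and of the Time33 hash described above, assuming simplified types: toy_bucket, toy_time33, and toy_key_hash are hypothetical names, the zval is reduced to an int64_t, and the collision link is pulled out into its own field for readability. PHP's own string hash function is an unrolled variant of this loop, and in PHP 7 it also sets the top bit of the result so that a string hash is never 0; the sketch omits that detail.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model for illustration; not PHP's actual source. */
typedef struct {
    int64_t     val;   /* stand-in for the zval value */
    uint32_t    next;  /* stand-in for val.u2.next (collision link) */
    uint64_t    h;     /* integer index, or Time33 hash of the string key */
    const char *key;   /* NULL for integer indexes */
} toy_bucket;

/* Time33 (DJBX33A): start at 5381, then h = h * 33 + c for each byte. */
static uint64_t toy_time33(const char *s, size_t len) {
    uint64_t h = 5381;
    for (size_t i = 0; i < len; i++)
        h = h * 33 + (unsigned char)s[i];
    return h;
}

/* h for a Bucket: the index itself, or the hash of the string key. */
static uint64_t toy_key_hash(const char *key, int64_t index) {
    return key ? toy_time33(key, strlen(key)) : (uint64_t)index;
}
```

For example, "a" (byte 97) hashes to 5381 * 33 + 97 = 177670, while the integer index 42 is used as h unchanged.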

Basic implementation of PHP arrays

A hash table consists of two main parts: an array that stores the elements, and a hash function. The basic implementation of hash tables has been covered in the earlier algorithm discussion. Beyond the basic characteristics of a hash table, PHP arrays have one special property: they are ordered (unlike Java's HashMap, whose iteration order does not follow insertion order). The order of the elements in a PHP array is the same as their insertion order. How is this achieved?

To keep PHP arrays ordered, the underlying hash table adds a mapping table between the hash function and the element array. The mapping table is also an array, the same size as the element array, and its entries are integers: each one stores the subscript of an element in the ordered storage array. Elements are appended to the storage array in insertion order, and each element's subscript is then stored in the mapping table at the position computed by the hash function.

This is how the stored data keeps its insertion order.
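The ordering trick can be sketched with two separate arrays, assuming a fixed table of 8 entries and string keys; toy_ordered_map, toy_insert, toy_get, and the other names are hypothetical. Elements go into data in insertion order, while slots (the mapping table) only records where each element landed. For readability the sketch uses a positive slot h & (size - 1); PHP's h | nTableMask selects the same low bits, just counted from -nTableSize.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model for illustration; not PHP's actual source. */
#define TOY_SIZE    8u          /* fixed capacity for the sketch */
#define TOY_INVALID UINT32_MAX  /* empty slot / end of chain */

typedef struct { const char *key; int value; uint32_t next; } toy_entry;

typedef struct {
    uint32_t  slots[TOY_SIZE]; /* mapping table: slot -> index in data */
    toy_entry data[TOY_SIZE];  /* elements, kept in insertion order */
    uint32_t  used;
} toy_ordered_map;

static uint32_t toy_hash(const char *s) {  /* Time33 */
    uint32_t h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

static void toy_init(toy_ordered_map *m) {
    for (uint32_t i = 0; i < TOY_SIZE; i++) m->slots[i] = TOY_INVALID;
    m->used = 0;
}

/* Append to data, then record the index in the mapping table.
 * No overwrite or growth handling; returns 0 when full. */
static int toy_insert(toy_ordered_map *m, const char *key, int value) {
    if (m->used == TOY_SIZE) return 0;
    uint32_t idx = m->used++;
    uint32_t slot = toy_hash(key) & (TOY_SIZE - 1);
    m->data[idx] = (toy_entry){ key, value, m->slots[slot] };
    m->slots[slot] = idx;
    return 1;
}

/* Lookup goes through the mapping table; iteration just walks data[0..used). */
static const toy_entry *toy_get(const toy_ordered_map *m, const char *key) {
    uint32_t i = m->slots[toy_hash(key) & (TOY_SIZE - 1)];
    while (i != TOY_INVALID && strcmp(m->data[i].key, key) != 0) i = m->data[i].next;
    return i == TOY_INVALID ? NULL : &m->data[i];
}
```

After inserting "z", "a", "m", walking slots from 0 to 7 would visit m, a, z, whereas walking data visits z, a, m, which is the insertion order PHP preserves.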

The mapping table is not a separate member of the HashTable structure; instead it is stored together with arData. When the array's memory is initialized, space is allocated not only for the Buckets but also for the same number of uint32_t slots. The two regions are allocated as one block, and arData is then shifted to point at the start of the element array, so the mapping table can be reached through arData with negative offsets.
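The single allocation can be sketched like this, assuming a simplified toy_bucket; toy_alloc_data, toy_slot, and toy_free_data are hypothetical helpers. One malloc holds nTableSize uint32_t slots followed by nTableSize Buckets, the returned pointer is moved past the slots, and a slot is reached with a negative index from arData, in the spirit of PHP's HT_HASH_EX macro. Empty slots are marked with UINT32_MAX, playing the role of PHP's HT_INVALID_IDX.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy model for illustration; not PHP's actual source. */
typedef struct { int64_t val; uint32_t next; uint64_t h; } toy_bucket;

/* One block: [ size x uint32_t slots | size x Bucket ]; return the Bucket part.
 * size is a power of 2 >= 8, so the slot part is a multiple of 32 bytes
 * and the Buckets stay suitably aligned. */
static toy_bucket *toy_alloc_data(uint32_t size) {
    char *block = malloc((size_t)size * (sizeof(uint32_t) + sizeof(toy_bucket)));
    if (!block) return NULL;
    uint32_t *slots = (uint32_t *)block;
    for (uint32_t i = 0; i < size; i++) slots[i] = UINT32_MAX;  /* all empty */
    return (toy_bucket *)(block + (size_t)size * sizeof(uint32_t));
}

/* nIndex = h | nTableMask is negative as int32_t, so it indexes backwards. */
static uint32_t *toy_slot(toy_bucket *arData, uint32_t nIndex) {
    return &((uint32_t *)arData)[(int32_t)nIndex];
}

/* Free the whole block by stepping back over the slot part. */
static void toy_free_data(toy_bucket *arData, uint32_t size) {
    free((char *)arData - (size_t)size * sizeof(uint32_t));
}
```

Keeping both regions in one block means a single allocation and a single free, and the slots sit right next to the Buckets they point into.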

Array initialization

Array initialization mainly sets the members of the HashTable. It does not allocate arData memory right away; arData is only allocated when the first element is inserted. Initialization is performed by the zend_hash_init macro and ultimately handled by the _zend_hash_init_int function (defined in Zend/zend_hash.c).

At this point the HashTable only has its members, such as the size of the hash table, set to initial values; it cannot store elements yet.
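A minimal sketch of what initialization sets up, assuming the sizing rule stated earlier (a power of 2, at least 8) and nNextFreeElement starting at 0 as described above; toy_check_size, toy_ht, and toy_hash_init are hypothetical names and do not reproduce the body of _zend_hash_init_int. Note that arData stays NULL: nothing that can hold elements is allocated yet.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model for illustration; not PHP's actual source. */
#define TOY_MIN_SIZE 8u

typedef struct {
    uint32_t nTableSize, nTableMask, nNumUsed, nNumOfElements;
    int64_t  nNextFreeElement;
    void    *arData;   /* stays NULL until the first insert */
} toy_ht;

/* Round a requested capacity up to a power of 2, at least TOY_MIN_SIZE.
 * Returns 0 for requests this sketch does not support (> 2^31). */
static uint32_t toy_check_size(uint32_t n) {
    if (n > (UINT32_C(1) << 31)) return 0;
    uint32_t size = TOY_MIN_SIZE;
    while (size < n) size <<= 1;
    return size;
}

/* Only fills in members; no memory for elements is allocated here. */
static void toy_hash_init(toy_ht *ht, uint32_t nSize) {
    ht->nTableSize = toy_check_size(nSize);
    ht->nTableMask = (uint32_t)0 - ht->nTableSize;
    ht->nNumUsed = 0;
    ht->nNumOfElements = 0;
    ht->nNextFreeElement = 0;
    ht->arData = NULL;
}
```

For example, asking for 10 elements gives nTableSize = 16 and nTableMask = 0xFFFFFFF0.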

Insert data

On insertion, the hash table first checks whether storage has already been allocated. Because initialization does not actually allocate arData memory, the first insertion allocates it according to nTableSize and then sets the HASH_FLAG_INITIALIZED mask in HashTable->u.flags, so later insertions see that the memory is already allocated and skip this step. The check lives in the _zend_hash_add_or_update_i function:

if (UNEXPECTED(!(HT_FLAGS(ht) & HASH_FLAG_INITIALIZED))) {
    zend_hash_real_init_mixed(ht);
    if (!ZSTR_IS_INTERNED(key)) {
        zend_string_addref(key);
        HT_FLAGS(ht) &= ~HASH_FLAG_STATIC_KEYS;
        zend_string_hash_val(key);
    }
    goto add_to_hash;
}

If arData has not been allocated yet, the allocation is ultimately completed by zend_hash_real_init_mixed_ex.
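The lazy allocation on first insert can be sketched as follows, assuming a single TOY_FLAG_INITIALIZED bit as a hypothetical stand-in for HASH_FLAG_INITIALIZED; toy_ensure_init and toy_destroy are likewise hypothetical and only mirror the idea, not the body of zend_hash_real_init_mixed_ex. The first call allocates the combined slot and Bucket block; every later call sees the flag and returns immediately.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy model for illustration; not PHP's actual source. */
#define TOY_FLAG_INITIALIZED 1u  /* stand-in for HASH_FLAG_INITIALIZED */

typedef struct { int64_t val; uint32_t next; uint64_t h; } toy_bucket;

typedef struct {
    uint32_t    flags;
    uint32_t    nTableSize;  /* already set by initialization */
    toy_bucket *arData;      /* NULL until the first insert */
} toy_ht;

/* Called at the start of every insert; allocates only the first time. */
static int toy_ensure_init(toy_ht *ht) {
    if (ht->flags & TOY_FLAG_INITIALIZED) return 1;  /* already allocated */
    size_t slot_bytes = (size_t)ht->nTableSize * sizeof(uint32_t);
    char *block = malloc(slot_bytes + (size_t)ht->nTableSize * sizeof(toy_bucket));
    if (!block) return 0;
    for (uint32_t i = 0; i < ht->nTableSize; i++)
        ((uint32_t *)block)[i] = UINT32_MAX;          /* every slot empty */
    ht->arData = (toy_bucket *)(block + slot_bytes);  /* skip past the slots */
    ht->flags |= TOY_FLAG_INITIALIZED;
    return 1;
}

static void toy_destroy(toy_ht *ht) {
    if (ht->flags & TOY_FLAG_INITIALIZED)
        free((char *)ht->arData - (size_t)ht->nTableSize * sizeof(uint32_t));
}
```

Deferring the allocation means an array that is created but never written to costs no element memory.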

Once arData memory has been allocated, insertion can proceed. The new element is first appended to arData in order; then its position in arData is stored in the mapping table, at the slot computed from the key's hash value and nTableMask.

This is only the most basic insertion path; it does not cover overwriting or cleaning up existing data.
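A minimal sketch of this basic insert path, assuming integer keys only, a fixed capacity (no resize), and no overwriting of existing keys; toy_ht, toy_init, toy_add, toy_append, and TOY_SLOT are hypothetical. The Bucket is appended at arData[nNumUsed], then its index is written into the slot chosen by h | nTableMask, with the slot's previous value saved in next so that a colliding element is not lost. toy_append also shows how $arr[] = ... takes its key from nNextFreeElement.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy model for illustration; not PHP's actual source. */
#define TOY_INVALID UINT32_MAX  /* empty slot / end of chain */

typedef struct { int64_t val; uint32_t next; uint64_t h; } toy_bucket;

typedef struct {
    uint32_t    nTableSize, nTableMask, nNumUsed, nNumOfElements;
    int64_t     nNextFreeElement;
    toy_bucket *arData;
} toy_ht;

/* Mapping-table slot in front of arData (cf. HT_HASH_EX). */
#define TOY_SLOT(ht, nIndex) (((uint32_t *)(ht)->arData)[(int32_t)(nIndex)])

/* size must be a power of 2 >= 8 */
static int toy_init(toy_ht *ht, uint32_t size) {
    char *block = malloc((size_t)size * (sizeof(uint32_t) + sizeof(toy_bucket)));
    if (!block) return 0;
    for (uint32_t i = 0; i < size; i++) ((uint32_t *)block)[i] = TOY_INVALID;
    ht->arData = (toy_bucket *)(block + (size_t)size * sizeof(uint32_t));
    ht->nTableSize = size;
    ht->nTableMask = (uint32_t)0 - size;
    ht->nNumUsed = ht->nNumOfElements = 0;
    ht->nNextFreeElement = 0;
    return 1;
}

static void toy_free(toy_ht *ht) {
    free((char *)ht->arData - (size_t)ht->nTableSize * sizeof(uint32_t));
}

/* Append a new element with non-negative integer key h; returns its index. */
static uint32_t toy_add(toy_ht *ht, uint64_t h, int64_t val) {
    if (ht->nNumUsed == ht->nTableSize) return TOY_INVALID;  /* PHP would resize */
    uint32_t idx = ht->nNumUsed++;          /* next free Bucket, in order */
    toy_bucket *p = &ht->arData[idx];
    uint32_t nIndex = (uint32_t)h | ht->nTableMask;
    p->h = h;
    p->val = val;
    p->next = TOY_SLOT(ht, nIndex);         /* keep whatever the slot held */
    TOY_SLOT(ht, nIndex) = idx;             /* slot now points at new Bucket */
    ht->nNumOfElements++;
    if ((int64_t)h >= ht->nNextFreeElement) ht->nNextFreeElement = (int64_t)h + 1;
    return idx;
}

/* $arr[] = val: the key is taken from nNextFreeElement. */
static uint32_t toy_append(toy_ht *ht, int64_t val) {
    return toy_add(ht, (uint64_t)ht->nNextFreeElement, val);
}
```

Appending 100 and 200, then setting key 10, then appending again gives keys 0, 1, 10, 11 in Buckets 0 to 3, in exactly that order.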

Hash collision

The hash table underlying PHP arrays resolves hash collisions with separate chaining (the chain address method): colliding Buckets are linked into a list.

The HashTable does not keep a separate list structure for colliding elements. Instead, a colliding element's Bucket records the arData position of the element it collides with, and this link is not stored as a field of the Bucket structure itself but in the u2 field of the element's zval, i.e. Bucket.val.u2.next. Inserting into the chain takes two steps:

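// Save the mapping table's current value into the new Bucket; it is needed on hash collisions (which are resolved with a linked list)
Z_NEXT(p->val) = HT_HASH_EX(arData, nIndex);
// Then store the new element's position in arData into the mapping table
// i.e. save idx: ((uint32_t *)(ht->arData))[(int32_t)nIndex] = idx
HT_HASH_EX(arData, nIndex) = HT_IDX_TO_HASH(idx);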
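The two steps can be traced with a small sketch, assuming integer keys and a fixed table of 8; toy_ht, toy_init, toy_add, toy_chain_len, and TOY_SLOT are hypothetical, and a plain next field stands in for Bucket.val.u2.next. The sketch stores the plain index in the slot, whereas PHP goes through HT_IDX_TO_HASH. Keys 1 and 9 share their low three bits, so they land in the same slot: the newer Bucket becomes the head of the chain, and its next points at the older one.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy model for illustration; not PHP's actual source. */
#define TOY_INVALID UINT32_MAX  /* empty slot / end of chain */

typedef struct { int64_t val; uint32_t next; uint64_t h; } toy_bucket;  /* next ~ val.u2.next */
typedef struct { uint32_t nTableSize, nTableMask, nNumUsed; toy_bucket *arData; } toy_ht;

#define TOY_SLOT(ht, nIndex) (((uint32_t *)(ht)->arData)[(int32_t)(nIndex)])

static int toy_init(toy_ht *ht, uint32_t size) {  /* size: power of 2 >= 8 */
    char *block = malloc((size_t)size * (sizeof(uint32_t) + sizeof(toy_bucket)));
    if (!block) return 0;
    for (uint32_t i = 0; i < size; i++) ((uint32_t *)block)[i] = TOY_INVALID;
    ht->arData = (toy_bucket *)(block + (size_t)size * sizeof(uint32_t));
    ht->nTableSize = size;
    ht->nTableMask = (uint32_t)0 - size;
    ht->nNumUsed = 0;
    return 1;
}

static void toy_free(toy_ht *ht) {
    free((char *)ht->arData - (size_t)ht->nTableSize * sizeof(uint32_t));
}

/* Caller keeps nNumUsed < nTableSize; returns the new Bucket's index. */
static uint32_t toy_add(toy_ht *ht, uint64_t h, int64_t val) {
    uint32_t idx = ht->nNumUsed++;
    uint32_t nIndex = (uint32_t)h | ht->nTableMask;
    toy_bucket *p = &ht->arData[idx];
    p->h = h;
    p->val = val;
    p->next = TOY_SLOT(ht, nIndex);  /* step 1: Z_NEXT(p->val) = HT_HASH_EX(arData, nIndex) */
    TOY_SLOT(ht, nIndex) = idx;      /* step 2: HT_HASH_EX(arData, nIndex) = idx */
    return idx;
}

/* Number of Buckets reachable from the slot that h maps to. */
static uint32_t toy_chain_len(const toy_ht *ht, uint64_t h) {
    uint32_t n = 0;
    for (uint32_t i = TOY_SLOT(ht, (uint32_t)h | ht->nTableMask); i != TOY_INVALID;
         i = ht->arData[i].next)
        n++;
    return n;
}
```

Prepending to the chain keeps insertion O(1): the new Bucket never has to walk the existing list.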

Array lookup

Once the HashTable structure and its collision handling are clear, lookup is relatively simple. First compute the key's hash value h and combine it with nTableMask to get the slot nIndex. Then read the element's position idx in the ordered storage array from the mapping table at nIndex, and fetch the Bucket at arData[idx]. Finally walk the chain: if the Bucket's key is the key being looked up, the search ends; otherwise follow zval.u2.next to the next Bucket and compare again.

The corresponding underlying source code is as follows:

https://qcdn.xueyuanjun.com/storage/uploads/images/gallery/2019-10/scaled-1680-/2ce1bc0c91ff172f451d8fb8cae16c38adc8202c5d976f87888869086cbabd62.jpg
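The lookup steps can be sketched as follows, assuming string keys, a fixed-size table without deletion or resize, and simplified types; every name here (toy_ht, toy_init, toy_add, toy_find, TOY_SLOT) is hypothetical. Comparing h first is a cheap filter, and the key strings are compared only when the hashes match. With 8 slots, "a" and "i" happen to share a slot under Time33, so finding "i", or failing to find "q" (same slot), means following the chain.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model for illustration; not PHP's actual source. */
#define TOY_INVALID UINT32_MAX

typedef struct { int64_t val; uint32_t next; uint64_t h; const char *key; } toy_bucket;
typedef struct { uint32_t nTableSize, nTableMask, nNumUsed; toy_bucket *arData; } toy_ht;

#define TOY_SLOT(ht, nIndex) (((uint32_t *)(ht)->arData)[(int32_t)(nIndex)])

static uint64_t toy_time33(const char *s) {
    uint64_t h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

static int toy_init(toy_ht *ht, uint32_t size) {  /* size: power of 2 >= 8 */
    char *block = malloc((size_t)size * (sizeof(uint32_t) + sizeof(toy_bucket)));
    if (!block) return 0;
    for (uint32_t i = 0; i < size; i++) ((uint32_t *)block)[i] = TOY_INVALID;
    ht->arData = (toy_bucket *)(block + (size_t)size * sizeof(uint32_t));
    ht->nTableSize = size;
    ht->nTableMask = (uint32_t)0 - size;
    ht->nNumUsed = 0;
    return 1;
}

static void toy_free(toy_ht *ht) {
    free((char *)ht->arData - (size_t)ht->nTableSize * sizeof(uint32_t));
}

/* Basic insert (no overwrite, no resize); caller keeps nNumUsed < nTableSize. */
static void toy_add(toy_ht *ht, const char *key, int64_t val) {
    uint32_t idx = ht->nNumUsed++;
    toy_bucket *p = &ht->arData[idx];
    p->key = key;
    p->val = val;
    p->h = toy_time33(key);
    uint32_t nIndex = (uint32_t)p->h | ht->nTableMask;
    p->next = TOY_SLOT(ht, nIndex);
    TOY_SLOT(ht, nIndex) = idx;
}

/* h -> nIndex -> idx from the mapping table -> walk the chain via next. */
static toy_bucket *toy_find(const toy_ht *ht, const char *key) {
    uint64_t h = toy_time33(key);
    uint32_t idx = TOY_SLOT(ht, (uint32_t)h | ht->nTableMask);
    while (idx != TOY_INVALID) {
        toy_bucket *p = &ht->arData[idx];
        if (p->h == h && strcmp(p->key, key) == 0) return p;  /* found the key */
        idx = p->next;                                        /* try next Bucket */
    }
    return NULL;
}
```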

Delete data

As mentioned earlier when introducing the nNumUsed and nNumOfElements fields, deleting an element from an array does not actually remove it and rehash right away; unneeded data is only cleaned up once arData is full, which improves performance. Elements are truly removed only when the array needs to grow. First, the proportion of deleted elements in the array is checked. If it reaches a threshold, the index is rebuilt: deleted Buckets are removed, and the remaining Buckets are moved forward to fill the gaps. If it does not reach the threshold, a new array twice the size of the original is allocated, the elements of the original array are copied into it, and finally the index is rebuilt, which also removes the deleted Buckets.
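Here is a minimal sketch of that resize decision, assuming integer keys and a simplified Bucket whose undef flag stands in for IS_UNDEF; all names (toy_ht, toy_add, toy_del, toy_do_resize, toy_rehash) are hypothetical. The threshold, holes exceeding 1/32 of the live elements, is assumed to mirror the check in PHP 7's zend_hash_do_resize. Unlike PHP, which also unlinks a deleted Bucket from its collision chain, the sketch leaves it in the chain and simply skips marked Buckets.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model for illustration; not PHP's actual source. */
#define TOY_INVALID UINT32_MAX  /* empty slot / end of chain */

typedef struct { int64_t val; uint32_t next; uint64_t h; int undef; } toy_bucket;

typedef struct {
    uint32_t    nTableSize, nTableMask, nNumUsed, nNumOfElements;
    toy_bucket *arData;
} toy_ht;

#define TOY_SLOT(ht, nIndex) (((uint32_t *)(ht)->arData)[(int32_t)(nIndex)])

/* Allocate [ slots | Buckets ] for a power-of-2 size; counters are untouched. */
static int toy_alloc(toy_ht *ht, uint32_t size) {
    char *block = malloc((size_t)size * (sizeof(uint32_t) + sizeof(toy_bucket)));
    if (!block) return 0;
    ht->arData = (toy_bucket *)(block + (size_t)size * sizeof(uint32_t));
    ht->nTableSize = size;
    ht->nTableMask = (uint32_t)0 - size;
    return 1;
}

static void toy_free(toy_ht *ht) {
    free((char *)ht->arData - (size_t)ht->nTableSize * sizeof(uint32_t));
}

/* Rebuild the index: clear all slots, drop undef holes, move live Buckets
 * forward (keeping their order), and relink every chain. */
static void toy_rehash(toy_ht *ht) {
    for (uint32_t i = 0; i < ht->nTableSize; i++)
        TOY_SLOT(ht, ht->nTableMask | i) = TOY_INVALID;
    uint32_t j = 0;
    for (uint32_t i = 0; i < ht->nNumUsed; i++) {
        if (ht->arData[i].undef) continue;
        if (i != j) ht->arData[j] = ht->arData[i];
        uint32_t nIndex = (uint32_t)ht->arData[j].h | ht->nTableMask;
        ht->arData[j].next = TOY_SLOT(ht, nIndex);
        TOY_SLOT(ht, nIndex) = j++;
    }
    ht->nNumUsed = j;
}

static int toy_init(toy_ht *ht, uint32_t size) {
    if (!toy_alloc(ht, size)) return 0;
    ht->nNumUsed = ht->nNumOfElements = 0;
    toy_rehash(ht);  /* with nNumUsed == 0 this just empties the slots */
    return 1;
}

/* arData is full: compact if enough Buckets are holes, otherwise double. */
static int toy_do_resize(toy_ht *ht) {
    if (ht->nNumUsed > ht->nNumOfElements + (ht->nNumOfElements >> 5)) {
        toy_rehash(ht);
        return 1;
    }
    toy_ht bigger = *ht;  /* copies the counters */
    if (!toy_alloc(&bigger, ht->nTableSize * 2)) return 0;
    memcpy(bigger.arData, ht->arData, (size_t)ht->nNumUsed * sizeof(toy_bucket));
    toy_free(ht);
    *ht = bigger;
    toy_rehash(ht);       /* also drops any holes that were copied */
    return 1;
}

static int toy_add(toy_ht *ht, uint64_t h, int64_t val) {
    if (ht->nNumUsed == ht->nTableSize && !toy_do_resize(ht)) return 0;
    uint32_t idx = ht->nNumUsed++;
    toy_bucket *p = &ht->arData[idx];
    uint32_t nIndex = (uint32_t)h | ht->nTableMask;
    p->h = h; p->val = val; p->undef = 0;
    p->next = TOY_SLOT(ht, nIndex);
    TOY_SLOT(ht, nIndex) = idx;
    ht->nNumOfElements++;
    return 1;
}

/* Deletion only marks the Bucket; nNumUsed is unchanged. */
static int toy_del(toy_ht *ht, uint64_t h) {
    for (uint32_t i = TOY_SLOT(ht, (uint32_t)h | ht->nTableMask); i != TOY_INVALID;
         i = ht->arData[i].next) {
        toy_bucket *p = &ht->arData[i];
        if (!p->undef && p->h == h) {
            p->undef = 1;
            ht->nNumOfElements--;
            return 1;
        }
    }
    return 0;
}
```

With 8 slots, deleting one of 8 elements and then inserting compacts in place (nTableSize stays 8); inserting again when there are no holes doubles the table to 16.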

The corresponding logic is implemented in zend_hash_do_resize in Zend/zend_hash.c, which calls zend_hash_rehash to rebuild the index.

Arrays support many other operations as well, such as copy, merge, destroy, and reset. The code for these operations is in zend_hash.c, and interested readers can explore it there.