Arrays are a powerful and flexible data type in PHP. Unlike statically typed languages such as Java and C, PHP does not require us to specify the size or the element type when initializing an array, and values can be assigned using either numeric indexes or string indexes.
Building on these features, we can easily implement more complex data structures with PHP arrays, such as stacks, queues, lists, sets, and dictionaries. PHP arrays are this powerful because they are built on hash tables.
Underlying data structure of PHP arrays
The hash table structure underlying PHP arrays is defined as follows (in Zend/zend_types.h):
This hash table has many members; we pick out the more important ones:
arData: the array that stores the hash table's elements. Its memory is contiguous, and arData points to the start of that array.
nTableSize: the total capacity of the array, that is, the number of elements arData can hold. The amount of memory allocated is determined by this value. It is always a power of 2, with a minimum of 8, and grows as 8, 16, 32, ...
nTableMask: the mask the hash function uses when mapping a key's hash value to a storage position. Its value is the negative of nTableSize, i.e. nTableMask = -nTableSize, which as a bit operation is nTableMask = ~nTableSize + 1.
nNumUsed, nNumOfElements: nNumUsed is the number of Bucket slots currently used in the array. This is not the number of valid elements, because a deleted element is not removed from the array immediately; it is only marked IS_UNDEF and is really removed when the array is resized. nNumOfElements is the number of valid elements in the array, which is the value returned by the count function. Until a resize happens, nNumUsed only ever increases, whether or not elements are deleted.
nNextFreeElement: the next automatically assigned numeric index, starting from 0 by default. When a value is appended without an explicit index, as in $arr[] = 200, nNextFreeElement is incremented by 1.
pDestructor: if this function handle is provided, it is called whenever an element in the array is deleted or overwritten, to clean up the old element.
u: this union is mainly used for auxiliary purposes.
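The relationship between nTableSize and nTableMask can be checked with a few lines of C. This is a minimal sketch, not engine code: toy_table_mask and toy_slot_index are hypothetical helpers, and the assumption is that the mask is OR-ed with the hash value to produce a negative slot index.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask for a table whose size is a power of two (>= 8).
 * Negating in unsigned arithmetic gives the two's-complement bit pattern. */
static uint32_t toy_table_mask(uint32_t table_size) {
    assert(table_size >= 8 && (table_size & (table_size - 1)) == 0);
    return (uint32_t)0 - table_size;   /* same bits as ~table_size + 1 */
}

/* Hypothetical helper: map a hash value to a slot index in [-table_size, -1]. */
static int32_t toy_slot_index(uint64_t h, uint32_t table_size) {
    return (int32_t)((uint32_t)h | toy_table_mask(table_size));
}
```

With a table size of 8, the mask is 0xFFFFFFF8, and hash values 5 and 13 both map to slot -3, so they would collide.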
The Bucket structure is relatively simple. It mainly stores the element's key and value, plus an integer h (the hash value, also called the hash code). If the element has an integer index, h is the index itself; if the index is a string, h is the hash value computed from key with the Time33 algorithm. h is the value that is ultimately mapped to the element's storage position. The Bucket data structure is as follows:
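As a rough illustration only, the sketch below pairs a simplified bucket with a textbook Time33 hash. toy_bucket and toy_time33 are hypothetical names; the engine's Bucket stores a zval and a zend_string pointer, and its own hash function is optimized and differs in details.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a bucket (assumption: plain fields instead of
 * the zval and zend_string used by the engine). */
typedef struct {
    int64_t     val;  /* the stored value */
    uint64_t    h;    /* integer key, or hash of the string key */
    const char *key;  /* string key, or NULL for an integer key */
} toy_bucket;

/* Textbook Time33: start from 5381, then multiply by 33 and add each byte. */
static uint64_t toy_time33(const char *s, size_t len) {
    uint64_t hash = 5381;
    assert(s != NULL);
    for (size_t i = 0; i < len; i++) {
        hash = hash * 33 + (unsigned char)s[i];
    }
    return hash;
}
```

For an integer key the bucket would simply carry the index in h and leave key as NULL.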
Basic implementation of PHP array
A hash table mainly consists of two parts: the array that stores the elements and the hash function. The basic implementation of a hash table was discussed in the algorithm section above. Beyond these basic characteristics, PHP arrays have one special property: they are ordered (unlike HashMap in Java, which does not keep insertion order). The order of the elements in the array is the same as the order in which they were inserted. How is this achieved?
To keep PHP arrays ordered, the underlying hash table adds a mapping table between the hash function and the element array. This mapping table is also an array, the same size as the array that stores the elements, and its entries are integers that hold the subscript of each element in the ordered storage array. Elements are appended to the storage array in insertion order, and each element's subscript is then stored in the mapping table at the position computed by the hash function:
In this way, the stored data keeps its insertion order.
The mapping table is not an explicit member of the PHP array structure; it is placed together with arData. When the array's memory is initialized, space is allocated not only for the Buckets but also for the same number of uint32_t entries. The two regions are allocated as one block, and arData is then offset to point at the start of the element array, so the mapping table can be reached by indexing backward from arData (toward lower addresses).
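The layout described above can be sketched as follows. This is an illustration under stated assumptions rather than the engine's code: toy_bucket, toy_alloc_data, toy_hash_slot, toy_free_data and TOY_INVALID_IDX are hypothetical names, and empty mapping-table slots are assumed to hold an all-ones marker.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_INVALID_IDX UINT32_MAX  /* assumed marker for an empty slot */

typedef struct { uint64_t h; int64_t val; } toy_bucket;

/* Allocate table_size uint32_t slots followed by table_size buckets in a
 * single block and return a pointer to the first bucket (the arData role). */
static toy_bucket *toy_alloc_data(uint32_t table_size) {
    assert(table_size >= 8 && (table_size & (table_size - 1)) == 0);
    size_t hash_bytes = (size_t)table_size * sizeof(uint32_t);
    size_t data_bytes = (size_t)table_size * sizeof(toy_bucket);
    char *block = malloc(hash_bytes + data_bytes);
    assert(block != NULL);
    toy_bucket *ar_data = (toy_bucket *)(block + hash_bytes);
    for (uint32_t i = 1; i <= table_size; i++) {
        ((uint32_t *)ar_data)[-(int32_t)i] = TOY_INVALID_IDX;
    }
    return ar_data;
}

/* Reach a mapping-table slot with a negative index relative to ar_data. */
static uint32_t *toy_hash_slot(toy_bucket *ar_data, int32_t n_index) {
    return &((uint32_t *)ar_data)[n_index];
}

/* The block starts table_size slots before ar_data, so free from there. */
static void toy_free_data(toy_bucket *ar_data, uint32_t table_size) {
    free((char *)ar_data - (size_t)table_size * sizeof(uint32_t));
}
```

Keeping both regions in one allocation means a single pointer is enough to reach the buckets (non-negative indexes) and the mapping table (negative indexes).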
Array initialization
Initializing an array mainly means setting the members of HashTable. Initialization does not allocate the memory for arData right away; that memory is allocated when the first element is inserted. Initialization is done through the zend_hash_init macro and is ultimately handled by the _zend_hash_init_int function (defined in Zend/zend_hash.c):
At this point only the table size and the initial values of the other HashTable members have been set; the table cannot store elements yet.
Inserting data
An insert first checks whether the array's storage has been allocated, because initialization does not actually allocate the arData memory. On the first insert, memory is allocated according to nTableSize, and HashTable->u.flags is then marked with the HASH_FLAG_INITIALIZED mask, so that later inserts see that the allocation has been done and do not repeat it. This check is located in the _zend_hash_add_or_update_i function:
if (UNEXPECTED(!(HT_FLAGS(ht) & HASH_FLAG_INITIALIZED))) {
    /* First insert: arData has not been allocated yet */
    zend_hash_real_init_mixed(ht);
    if (!ZSTR_IS_INTERNED(key)) {
        zend_string_addref(key);
        HT_FLAGS(ht) &= ~HASH_FLAG_STATIC_KEYS;
        zend_string_hash_val(key);
    }
    goto add_to_hash;
}
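The deferred allocation can be sketched in isolation like this. It is a simplified model, not the engine's implementation: toy_array, toy_round_size, toy_init, toy_push and TOY_FLAG_INITIALIZED are hypothetical names, and the storage is a plain array of integers rather than Buckets.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_FLAG_INITIALIZED 0x1u  /* hypothetical "storage allocated" flag */
#define TOY_MIN_SIZE 8u

typedef struct {
    uint32_t flags;
    uint32_t table_size;
    uint32_t used;
    int64_t *data;   /* stays NULL until the first insert */
} toy_array;

/* Round a requested size up to a power of two, with a minimum of 8. */
static uint32_t toy_round_size(uint32_t n) {
    assert(n <= (1u << 30));
    uint32_t size = TOY_MIN_SIZE;
    while (size < n) size <<= 1;
    return size;
}

/* Initialization only records sizes and flags; no storage is allocated. */
static void toy_init(toy_array *a, uint32_t requested) {
    a->flags = 0;
    a->table_size = toy_round_size(requested);
    a->used = 0;
    a->data = NULL;
}

/* The first insert allocates the storage and sets the flag, so later
 * inserts skip the allocation. Growing a full array is out of scope here. */
static void toy_push(toy_array *a, int64_t v) {
    if (!(a->flags & TOY_FLAG_INITIALIZED)) {
        a->data = malloc((size_t)a->table_size * sizeof *a->data);
        assert(a->data != NULL);
        a->flags |= TOY_FLAG_INITIALIZED;
    }
    assert(a->used < a->table_size);
    a->data[a->used++] = v;
}
```

Deferring the allocation means an array that is created but never filled costs no element storage.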
If arData has not been allocated yet, the memory allocation is ultimately done by zend_hash_real_init_mixed_ex:
Once the arData memory has been allocated, the insert can proceed. The element is first appended to arData in order, and its position in the arData array is then stored in the mapping table, at the slot computed from the key's hash value and nTableMask:
The above is only the most basic insert path; it does not cover overwriting or cleaning up existing data.
Hash collision
The hash table underlying PHP arrays resolves hash collisions by chaining (separate chaining): the colliding Buckets are linked into a list.
A Bucket in HashTable records the arData position of the element it collides with. This link is not kept in the Bucket structure itself but in u2 of the zval that stores the element's value, i.e. Bucket.val.u2.next. Insertion therefore takes two steps:
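// Save the value currently in the mapping table into the new Bucket; it is used on a hash collision (collisions are resolved with a linked list)
Z_NEXT(p->val) = HT_HASH_EX(arData, nIndex);
// Then store the new element's position in the storage array into the mapping table
// i.e. save idx: ((uint32_t *)(ht->arData))[nIndex] = idx
HT_HASH_EX(arData, nIndex) = HT_IDX_TO_HASH(idx);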
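The two steps can be tried out on a toy table. This is a sketch under simplifying assumptions, not the engine's code: the mapping table is a separate array indexed with h & (size - 1) instead of a negative index off arData, keys are integers only, and toy_table, toy_init and toy_insert are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SIZE 8u
#define TOY_INVALID_IDX UINT32_MAX  /* assumed marker for "no bucket" */

typedef struct {
    uint64_t h;     /* integer key (or hash of a string key) */
    int64_t  val;   /* stored value */
    uint32_t next;  /* next bucket in the same chain (val.u2.next in the text) */
} toy_bucket;

typedef struct {
    uint32_t   slots[TOY_SIZE];  /* mapping table: slot -> bucket index */
    toy_bucket data[TOY_SIZE];   /* buckets in insertion order */
    uint32_t   used;             /* plays the role of nNumUsed */
} toy_table;

static void toy_init(toy_table *t) {
    for (uint32_t i = 0; i < TOY_SIZE; i++) t->slots[i] = TOY_INVALID_IDX;
    t->used = 0;
}

/* Append a bucket and link it at the head of its slot's chain.
 * No duplicate-key check and no resizing: the bare insert path only. */
static uint32_t toy_insert(toy_table *t, uint64_t h, int64_t val) {
    assert(t->used < TOY_SIZE);
    uint32_t idx = t->used++;
    uint32_t slot = (uint32_t)(h & (TOY_SIZE - 1));
    toy_bucket *p = &t->data[idx];
    p->h = h;
    p->val = val;
    p->next = t->slots[slot];  /* step 1: remember the old chain head */
    t->slots[slot] = idx;      /* step 2: the new bucket becomes the head */
    return idx;
}
```

Inserting keys 1 and 9 puts both in slot 1: the slot ends up pointing at the bucket for 9, whose next field points back at the bucket for 1.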
Array lookup
With the HashTable implementation and its collision handling clear, lookup is relatively simple. First, the key's hash value is combined with nTableMask to get the final hash position nIndex. Next, the mapping table entry at nIndex gives idx, the element's position in the ordered storage array. Then the Bucket at idx is fetched from the ordered storage array (i.e. arData) and the chain is traversed from there: if the Bucket's key is the key being looked for, the traversal stops; otherwise it moves on to the Bucket at zval.u2.next and compares again.
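The traversal can be sketched as a short function. This is a simplified model with hypothetical names (toy_bucket, toy_find, TOY_INVALID_IDX): it handles integer keys only, uses a separate mapping table indexed with h & (table_size - 1), and leaves out the string comparison the engine needs for string keys.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_INVALID_IDX UINT32_MAX  /* assumed marker for "end of chain" */

typedef struct {
    uint64_t h;     /* integer key */
    int64_t  val;   /* stored value */
    uint32_t next;  /* index of the next bucket in the same chain */
} toy_bucket;

/* Follow the chain that starts at the key's slot; return the matching
 * bucket, or NULL if the chain ends without a match. */
static const toy_bucket *toy_find(const uint32_t *slots, const toy_bucket *data,
                                  uint32_t table_size, uint64_t h) {
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    uint32_t idx = slots[h & (table_size - 1)];
    while (idx != TOY_INVALID_IDX) {
        const toy_bucket *p = &data[idx];
        if (p->h == h) {
            return p;
        }
        idx = p->next;
    }
    return NULL;
}
```

A lookup for a key whose slot is empty returns immediately; a key that shares a slot with others costs one comparison per bucket in the chain.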
The corresponding underlying source code is as follows:
https://qcdn.xueyuanjun.com/storage/uploads/images/gallery/2019-10/scaled-1680-/2ce1bc0c91ff172f451d8fb8cae16c38adc8202c5d976f87888869086cbabd62.jpg
Deleting data
Array deletion was already mentioned when we introduced the nNumUsed and nNumOfElements fields of the hash table: deleting an element does not really remove it or trigger a rehash. Instead, to improve performance, the dead entries are cleaned up only once arData is full. In other words, deleted elements are really removed only when the array needs more capacity. The engine first checks what proportion of the array consists of deleted elements. If that proportion reaches a threshold, the index is rebuilt: the deleted Buckets are dropped and the Buckets behind them are moved forward to fill the gaps. If the threshold is not reached, a new array twice the size of the original is allocated, the elements of the original array are copied to it, and the index is rebuilt; deleted Buckets are removed during this rebuild as well.
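The mark-then-compact behaviour can be sketched as follows. This is a simplified model with hypothetical names (toy_bucket, toy_array, toy_delete, toy_compact): a boolean stands in for the IS_UNDEF marker, there is no mapping table to rebuild, and the threshold check and the doubling path are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SIZE 8u

typedef struct {
    int64_t val;
    bool    undef;  /* stands in for the IS_UNDEF marker */
} toy_bucket;

typedef struct {
    toy_bucket data[TOY_SIZE];
    uint32_t   used;      /* nNumUsed: slots consumed, deleted ones included */
    uint32_t   elements;  /* nNumOfElements: live elements, what count reports */
} toy_array;

/* Deleting only marks the bucket; nothing is moved and used is unchanged. */
static void toy_delete(toy_array *a, uint32_t idx) {
    assert(idx < a->used);
    if (!a->data[idx].undef) {
        a->data[idx].undef = true;
        a->elements--;
    }
}

/* Compaction: slide live buckets forward over deleted ones, keeping order. */
static void toy_compact(toy_array *a) {
    uint32_t j = 0;
    for (uint32_t i = 0; i < a->used; i++) {
        if (!a->data[i].undef) {
            a->data[j++] = a->data[i];
        }
    }
    a->used = j;
}
```

After a delete, used and elements differ by the number of marked buckets; compaction brings them back in line.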
The corresponding underlying code is as follows:
In addition, there are many other array operations, such as copy, merge, destroy, and reset. The code for these operations is in zend_hash.c; interested readers can take a look.