PHP hash table implementation

The hash table is one of the most important data structures in the PHP kernel. Most PHP language features are built on it: variable scopes, function tables, class properties and methods, and much of the Zend Engine's internal data are all stored in hash tables.

Data structure and description

As mentioned in the previous section, PHP's hash table resolves collisions by separate chaining (the "zipper" method): elements that hash to the same slot are stored in a linked list. To also preserve the relationship between elements, Zend links them with doubly linked lists.

Hash table structure

PHP's hash table is implemented in Zend/zend_hash.c. As before, we start with the data structures. PHP uses two structures to implement the hash table: the HashTable structure holds the basic information needed for the whole table, and the Bucket structure holds the actual data. They are shown below:

typedef struct _hashtable {
    uint nTableSize;            // size of the bucket array; minimum 8, grows by 2x
    uint nTableMask;            // nTableSize - 1, used to optimize index computation
    uint nNumOfElements;        // number of elements currently in the table; count() returns this value directly
    ulong nNextFreeElement;     // next free numeric index
    Bucket *pInternalPointer;   // pointer to the current element during iteration (one reason foreach is faster than for)
    Bucket *pListHead;          // pointer to the first element of the array
    Bucket *pListTail;          // pointer to the last element of the array
    Bucket **arBuckets;         // the array of hash slots
    dtor_func_t pDestructor;    // callback executed when an element is deleted, used to release resources
    zend_bool persistent;       // how Bucket memory is allocated: if persistent is TRUE, the operating system's
                                // own allocation functions are used, otherwise PHP's memory allocator
    unsigned char nApplyCount;  // number of times the table is currently being accessed recursively (guards against deep recursion)
    zend_bool bApplyProtection; // whether recursive-access protection is on; when it is, at most 3 levels of recursion are allowed
#if ZEND_DEBUG
    int inconsistent;
#endif
} HashTable;

The nTableSize field indicates the capacity of the hash table; the minimum initial capacity is 8. First look at the hash table's initialization function:

ZEND_API int _zend_hash_init(HashTable *ht, uint nSize, hash_func_t pHashFunction, dtor_func_t pDestructor, zend_bool persistent ZEND_FILE_LINE_DC)
{
    uint i = 3;
    //...
    if (nSize >= 0x80000000) {
        /* prevent overflow */
        ht->nTableSize = 0x80000000;
    } else {
        while ((1U << i) < nSize) {
            i++;
        }
        ht->nTableSize = 1 << i;
    }
    // ...
    ht->nTableMask = ht->nTableSize - 1;

    /* Uses ecalloc() so that Bucket* == NULL */
    if (persistent) {
        tmp = (Bucket **) calloc(ht->nTableSize, sizeof(Bucket *));
        if (!tmp) {
            return FAILURE;
        }
        ht->arBuckets = tmp;
    } else {
        tmp = (Bucket **) ecalloc_rel(ht->nTableSize, sizeof(Bucket *));
        if (tmp) {
            ht->arBuckets = tmp;
        }
    }

    return SUCCESS;
}

For example, if the requested initial size is 10, the algorithm above adjusts it to 16. The size is always rounded up to the nearest power of two that is not smaller than the requested size (and never below 8).
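The rounding rule can be isolated into a small function. This is only a sketch of the loop shown in _zend_hash_init; the name table_size_for is made up for illustration and does not exist in the Zend source.

```c
#include <assert.h>

/* Hypothetical helper mirroring the sizing loop in _zend_hash_init:
 * round the requested size up to a power of two, never below 8. */
static unsigned int table_size_for(unsigned int requested)
{
    unsigned int i = 3;                 /* 1 << 3 == 8, the minimum size */
    unsigned int size;

    if (requested >= 0x80000000U) {
        return 0x80000000U;             /* cap, so the shift below cannot overflow */
    }
    while ((1U << i) < requested) {
        i++;
    }
    size = 1U << i;
    assert((size & (size - 1)) == 0);   /* the result is always a power of two */
    return size;
}
```

With this sketch, a requested size of 10 gives 16, a requested size of 8 stays 8, and anything smaller than 8 is raised to 8.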

Why make this adjustment? Consider how HashTable maps a hash value to a slot. In the previous section we used the modulo operation for this mapping: with a hash table of size 8 and a hash value of 100, the slot index is 100 % 8 = 4 (indexes start from 0, so this is the fifth slot). PHP computes the index in the following way:

h = zend_inline_hash_func(arKey, nKeyLength);
nIndex = h & ht->nTableMask;

As the _zend_hash_init() function above shows, ht->nTableMask equals ht->nTableSize - 1. Here a bitwise AND (&) is used instead of the modulo operation, because modulo is considerably more expensive than a bitwise operation.

The mask maps the hash value into the range of slot indexes that the table can hold. For example, if a key's hash value is 21 and the table size is 8, the mask is 7, and in binary the computation is 10101 & 111 = 101, which is 5 in decimal. A power of two minus 1 has a special binary form: its low N bits are all 1, so the AND simply keeps the low N bits of the hash value. If the mask were an ordinary binary number, its zero bits would throw away part of the hash value, and the slots would no longer be filled evenly even when the hash function itself distributes values uniformly.
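A minimal sketch of the masking step and of why the size must be a power of two. Both function names are hypothetical helpers written for this illustration, not Zend APIs.

```c
#include <assert.h>

/* Hypothetical helper: map a hash value to a slot, as h & nTableMask does. */
static unsigned long slot_index(unsigned long h, unsigned long table_size)
{
    /* The mask trick is only equivalent to modulo for power-of-two sizes. */
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    return h & (table_size - 1);
}

/* Hypothetical helper: count how many slots in [0, table_size) can ever be
 * produced by h & (table_size - 1). A slot is reachable only if all of its
 * set bits are also set in the mask. */
static unsigned int reachable_slots(unsigned int table_size)
{
    unsigned int mask = table_size - 1;
    unsigned int count = 0;
    unsigned int slot;

    for (slot = 0; slot < table_size; slot++) {
        if ((slot & mask) == slot) {
            count++;
        }
    }
    return count;
}
```

For a table of size 8 every slot is reachable, and slot_index(21, 8) is 5, the same as 21 % 8. For a size of 10 the mask would be 9 (binary 1001), and only slots 0, 1, 8 and 9 could ever be used, which is the uneven distribution described above.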

Once the table size is set, memory must be allocated for the hash table to store its data. As the initialization code above shows, a different allocation method is called depending on whether the table is persistent. As described earlier in the discussion of the PHP life cycle, persistence means that persistent content can be accessed across multiple requests, whereas non-persistent storage is freed at the end of the request. The details are covered in the memory management section.

The nNumOfElements field of HashTable is easy to understand: it is updated every time an element is inserted or removed with unset. This lets the count() function return the number of array elements quickly.

The nNextFreeElement field is very useful. Consider the following PHP code:

<?php
$a = array(10 => 'Hello');
$a[] = 'TIPI';
var_dump($a);

// output:
// array(2) {
//   [10]=>
//   string(5) "Hello"
//   [11]=>
//   string(4) "TIPI"
// }

PHP lets you add an element to an array without specifying an index. A number is then used as the default index, much like an enum in C, and which index the new element gets is decided by the nNextFreeElement field. If the array already has numeric keys, the default is the largest numeric key used so far plus 1. In this example the array already contains key 10, so the newly inserted element gets the default index 11.
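The bookkeeping behind this can be sketched as follows. It is a simplified model that tracks only the counter; the struct and function names are invented for illustration, and edge cases of the real engine (such as what happens once the counter reaches its maximum) are not modeled.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for the nNextFreeElement field. */
typedef struct {
    long next_free;
} index_state;

/* Record that an element was stored under the integer key `key`
 * (the $a[10] = ... case). The counter only ever moves forward. */
static void note_explicit_index(index_state *st, long key)
{
    assert(st != NULL);
    if (key >= st->next_free) {
        st->next_free = (key < LONG_MAX) ? key + 1 : LONG_MAX;
    }
}

/* Pick the key for an append without an explicit index (the $a[] = ... case). */
static long take_next_index(index_state *st)
{
    long key;

    assert(st != NULL);
    key = st->next_free;
    note_explicit_index(st, key);
    return key;
}
```

Starting from a counter of 0, noting key 10 and then appending yields key 11, matching the var_dump output above. Inserting a smaller key afterwards, such as 3, does not move the counter back.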

Data container: Slot

The data structure of the slot that stores the hash table's data is shown below:

typedef struct bucket {
    ulong h;                    // hash value of the char *key, or the user-specified numeric index
    uint nKeyLength;            // length of the hash key; 0 if the array index is a number
    void *pData;                // points to the value, generally a copy of the user data;
                                // if the data is a pointer, pData points to pDataPtr
    void *pDataPtr;             // if the data is a pointer, it is stored here and pData points to this field
    struct bucket *pListNext;   // next element in the whole hash table
    struct bucket *pListLast;   // previous element in the whole hash table
    struct bucket *pNext;       // next element in the same hash slot
    struct bucket *pLast;       // previous element in the same hash slot
    // holds the string key of the current element; this field must be defined last
    // so that the structure can be variable-length
    char arKey[1];
} Bucket;

The comments above describe each field. The h field holds the hash value of the key. The hash value is stored here rather than the index in the table because the index depends directly on the table's capacity: when the table is expanded, every index has to be mapped again, and keeping the hash value avoids hashing the key a second time. This is an optimization. In PHP either a string or a number can be used as an array index. A numeric index can be used directly as the hash value, so numbers do not need to be hashed. The nKeyLength field after h records the length of the key; if the index is a number, nKeyLength is 0. In a PHP array, a string index that can be converted to a number is converted to a numeric index, so the string indexes '10' and '11' are no different from the numeric indexes 10 and 11.

The last field of the structure holds the key string. It is declared as a character array of length 1, but this is in fact the common variable-length structure idiom, whose main purpose is flexibility. The following code allocates space when a new element is inserted into the hash table:

p = (Bucket *) pemalloc(sizeof(Bucket) - 1 + nKeyLength, ht->persistent);
if (!p) {
    return FAILURE;
}
memcpy(p->arKey, arKey, nKeyLength);

As the code shows, the allocation adds the length of the key string, and the key is then copied into the newly allocated space. Later, when a lookup needs the key, p->arKey is compared with the key being looked up to decide whether this is the matching element. The allocation size subtracts 1 because the one byte of arKey inside the structure itself can also be used.
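The allocation pattern can be tried outside the engine with plain malloc. This is a sketch under assumptions: demo_bucket, demo_bucket_new and demo_bucket_matches are hypothetical names, only three fields are kept, and malloc stands in for pemalloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down bucket with the key stored inline at the end. */
typedef struct demo_bucket {
    unsigned long h;
    unsigned int key_len;
    char key[1];                /* must be last; the real storage extends past it */
} demo_bucket;

/* Allocate a bucket with room for the whole key and copy the key into it. */
static demo_bucket *demo_bucket_new(const char *key, unsigned int key_len,
                                    unsigned long h)
{
    demo_bucket *b;

    assert(key != NULL && key_len > 0);
    /* sizeof(demo_bucket) already counts one byte of key[], hence the "- 1". */
    b = malloc(sizeof(demo_bucket) - 1 + key_len);
    if (b == NULL) {
        return NULL;
    }
    b->h = h;
    b->key_len = key_len;
    memcpy(b->key, key, key_len);
    return b;
}

/* Same three-step comparison as the lookup loop: hash, length, then bytes. */
static int demo_bucket_matches(const demo_bucket *b, const char *key,
                               unsigned int key_len, unsigned long h)
{
    return b->h == h && b->key_len == key_len
        && memcmp(b->key, key, key_len) == 0;
}
```

A caller would create a bucket with something like demo_bucket_new("name", 5, hash), where the length 5 includes the terminating NUL byte, and release it with free(). In C99 and later the same layout is usually written with a flexible array member (char key[];), in which case the "- 1" disappears.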

In PHP 5.4 the definition of this field was changed to the type const char *arKey.

 

Figure: Zend Engine hash table structures and relationships (figure taken from the Internet)

  • The Bucket structure maintains two doubly linked lists. The pNext and pLast pointers link the buckets within the slot where the bucket is located.
  • The pListNext and pListLast pointers link all the data in the whole hash table. The HashTable structure maintains pListHead and pListTail, pointers to the first and last elements of the whole hash table.

PHP has a great many functions that operate on arrays, for example array_shift() and array_pop(), which pop an element from the head and the tail of the array respectively. Because the hash table keeps head and tail pointers, the target can be found in constant time when these operations are performed. PHP also has some heavily used functions that step through an array, such as next() and prev(). Another hash table pointer serves them: pInternalPointer, which stores the current internal position in the table. It is very useful when iterating.

Suppose that, as in the lower-left corner of the figure, three elements Bucket1, Bucket2, and Bucket3 are inserted in turn:

  1. When Bucket1 is inserted, the hash table is empty, and after hashing the key it is placed in the slot with index 1. Slot 1 now contains only Bucket1. Bucket1's pData or pDataPtr points to the data stored by Bucket1. Since there are no links yet, the pNext, pLast, pListNext, and pListLast pointers are all NULL. The HashTable structure also stores pointers to the first and last elements of the whole table, so at this point both pListHead and pListTail point to Bucket1.
  2. When Bucket2 is inserted, its key collides with Bucket1's key, so Bucket2 is placed at the front of the slot's doubly linked list. Because Bucket2 was inserted later and sits at the front of the list, Bucket2.pNext points to Bucket1. Because Bucket2 was inserted after Bucket1, Bucket1.pListNext points to Bucket2. Bucket2 is now the last element of the hash table, so HashTable.pListTail points to Bucket2.
  3. When Bucket3 is inserted, its key does not hash to slot 1. Bucket2.pListNext points to Bucket3, because Bucket3 was inserted after Bucket2, and HashTable.pListTail is changed to point to Bucket3.

Simply put, the Bucket structure maintains the insertion order of the elements in the hash table, and the HashTable structure maintains the head and tail of the whole table. These relationships are kept up to date throughout every hash table operation.
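The three insertions can be replayed with a stripped-down model of the two lists. This is a sketch, not the Zend code: the node and table types and table_insert are hypothetical, the table size is fixed at 8, keys and values are omitted, and the hash value is passed in directly.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SLOTS 8            /* fixed power-of-two size for the sketch */

typedef struct node {
    unsigned long h;
    struct node *pNext, *pLast;         /* chain inside one slot */
    struct node *pListNext, *pListLast; /* global insertion-order list */
} node;

typedef struct {
    node *slots[DEMO_SLOTS];
    node *pListHead, *pListTail;
} table;

/* Link node n, whose hash value is h, into both lists of table t. */
static void table_insert(table *t, node *n, unsigned long h)
{
    unsigned long idx;

    assert(t != NULL && n != NULL);
    idx = h & (DEMO_SLOTS - 1);
    n->h = h;

    /* 1. Push the node at the front of the chain of its slot. */
    n->pLast = NULL;
    n->pNext = t->slots[idx];
    if (n->pNext != NULL) {
        n->pNext->pLast = n;
    }
    t->slots[idx] = n;

    /* 2. Append the node at the tail of the global list. */
    n->pListNext = NULL;
    n->pListLast = t->pListTail;
    if (t->pListTail != NULL) {
        t->pListTail->pListNext = n;
    } else {
        t->pListHead = n;       /* first element of the table */
    }
    t->pListTail = n;
}
```

Inserting three nodes with hash values 1, 9 and 2 reproduces the walkthrough: the second node collides with the first in slot 1 and becomes the head of that chain, while the global list still runs first, second, third.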

Hash table user interface

As in the previous section, this part briefly introduces the operation interface of the PHP hash table. It provides the following kinds of operations:

  • Initialization, e.g. the zend_hash_init() function, which initializes the hash table and allocates space.
  • Lookup, insertion, deletion, and update, which are the more conventional operations.
  • Iteration and looping, interfaces for traversing the hash table.
  • Copying, sorting, reversing, and destruction.

This section picks insertion as the operation to introduce. In PHP, both adding to an array (zend_hash_add) and updating an array (zend_hash_update) are ultimately carried out by the _zend_hash_add_or_update function. This is comparable to two public methods sharing one private method in object-oriented programming, and it achieves a degree of code reuse.

 
ZEND_API int _zend_hash_add_or_update(HashTable *ht, const char *arKey, uint nKeyLength, void *pData, uint nDataSize, void **pDest, int flag ZEND_FILE_LINE_DC)
{
    //... variable initialization and handling of nKeyLength <= 0 omitted

    h = zend_inline_hash_func(arKey, nKeyLength);
    nIndex = h & ht->nTableMask;

    p = ht->arBuckets[nIndex];
    while (p != NULL) {
        if ((p->h == h) && (p->nKeyLength == nKeyLength)) {
            if (!memcmp(p->arKey, arKey, nKeyLength)) {
                // update operation
                if (flag & HASH_ADD) {
                    return FAILURE;
                }
                HANDLE_BLOCK_INTERRUPTIONS();

                //.. debug output omitted
                if (ht->pDestructor) {
                    ht->pDestructor(p->pData);
                }
                UPDATE_DATA(ht, p, pData, nDataSize);
                if (pDest) {
                    *pDest = p->pData;
                }
                HANDLE_UNBLOCK_INTERRUPTIONS();
                return SUCCESS;
            }
        }
        p = p->pNext;
    }

    p = (Bucket *) pemalloc(sizeof(Bucket) - 1 + nKeyLength, ht->persistent);
    if (!p) {
        return FAILURE;
    }
    memcpy(p->arKey, arKey, nKeyLength);
    p->nKeyLength = nKeyLength;
    INIT_DATA(ht, p, pData, nDataSize);
    p->h = h;
    CONNECT_TO_BUCKET_DLLIST(p, ht->arBuckets[nIndex]); // doubly linked list operation within the slot
    if (pDest) {
        *pDest = p->pData;
    }

    HANDLE_BLOCK_INTERRUPTIONS();
    CONNECT_TO_GLOBAL_DLLIST(p, ht);    // append the new Bucket to the end of the array's linked list
    ht->arBuckets[nIndex] = p;
    HANDLE_UNBLOCK_INTERRUPTIONS();

    ht->nNumOfElements++;
    ZEND_HASH_IF_FULL_DO_RESIZE(ht);    /* if the array is now full, expand it */
    return SUCCESS;
}

The whole insert-or-update procedure is as follows:

  1. Compute the hash value of the key and AND it with nTableMask to get the slot in the arBuckets array.
  2. If the slot already contains elements, traverse its chain looking for an element with the same key. If one exists and this is an update call, update its data; if it is an add call, return failure.
  3. Otherwise create a new Bucket element, initialize its data, and add it to the front of the chain for the slot that the hash value maps to (CONNECT_TO_BUCKET_DLLIST).
  4. Add the new Bucket element to the end of the array's linked list (CONNECT_TO_GLOBAL_DLLIST).
  5. Increase the element count by 1 and expand the array if it is now full. The check compares nNumOfElements with nTableSize: if nNumOfElements > nTableSize, zend_hash_do_resize is called to double the capacity (nTableSize << 1).
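Step 5 is where the stored h value pays off: after the slot array doubles, every element can be put back into a slot using h alone, without hashing its key again. The sketch below models that idea under assumptions. The rnode and rtable types and rtable_grow_if_full are hypothetical, realloc stands in for the engine's allocator, and the real zend_hash_do_resize is not reproduced line by line.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct rnode {
    unsigned long h;                /* stored hash value, reused on resize */
    struct rnode *pNext, *pLast;    /* chain inside one slot */
    struct rnode *pListNext;        /* global insertion-order list */
} rnode;

typedef struct {
    unsigned int nTableSize, nTableMask, nNumOfElements;
    rnode **arBuckets;
    rnode *pListHead;
} rtable;

/* Double the slot array when the table is over-full and redistribute all
 * elements. Returns 1 if the table grew, 0 if not needed, -1 on failure. */
static int rtable_grow_if_full(rtable *t)
{
    rnode **slots;
    rnode *p;
    unsigned int new_size, i, idx;

    assert(t != NULL && t->arBuckets != NULL);
    if (t->nNumOfElements <= t->nTableSize) {
        return 0;
    }
    new_size = t->nTableSize << 1;
    slots = realloc(t->arBuckets, new_size * sizeof(rnode *));
    if (slots == NULL) {
        return -1;                  /* the old array is still valid */
    }
    for (i = 0; i < new_size; i++) {
        slots[i] = NULL;
    }
    t->arBuckets = slots;
    t->nTableSize = new_size;
    t->nTableMask = new_size - 1;

    /* Walk the global list and push each element onto its new chain. */
    for (p = t->pListHead; p != NULL; p = p->pListNext) {
        idx = (unsigned int)(p->h & t->nTableMask);
        p->pLast = NULL;
        p->pNext = slots[idx];
        if (p->pNext != NULL) {
            p->pNext->pLast = p;
        }
        slots[idx] = p;
    }
    return 1;
}
```

Elements that shared a slot before the resize may end up in different slots afterwards, because the new mask looks at one more bit of h. The global insertion-order list is not touched at all.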

Origin www.cnblogs.com/yixiaogo/p/11139871.html