PHP 7 source analysis of the array_unique function

The following source code is based on PHP 7.3.8.

array_unique ( array $array [, int $sort_flags = SORT_STRING ] ) : array
(PHP 4 >= 4.0.1, PHP 5, PHP 7)
array_unique - Removes duplicate values from an array

Parameters:
array: the input array.
sort_flags: (optional) a sorting type flag that modifies the sorting behavior. The main values are:

SORT_REGULAR - compare items normally (don't change types)
SORT_NUMERIC - compare items numerically
SORT_STRING - compare items as strings
SORT_LOCALE_STRING - compare items as strings, based on the current locale.

The array_unique function is defined in the source file /ext/standard/array.c:

PHP_FUNCTION(array_unique){ 
    // code...
}

The function is too long to post in full here; see the source code on GitHub for the complete version.

Variable definitions

    zval *array;
    uint32_t idx;
    Bucket *p;
    struct bucketindex *arTmp, *cmpdata, *lastkept;
    unsigned int i;
    zend_long sort_type = PHP_SORT_STRING; // default sort flag
    compare_func_t cmp;

The function first defines its variables. array_unique uses PHP_SORT_STRING as its default sort flag; PHP_SORT_STRING is defined in the header file /ext/standard/php_array.h:

#define PHP_SORT_STRING             2

This matches the default value of the sort_flags parameter at the PHP level, the predefined constant SORT_STRING.

At first I did not understand what the line compare_func_t cmp is for. compare_func_t is defined in /Zend/zend_types.h:

typedef int  (*compare_func_t)(const void *, const void *);

This declares compare_func_t as a pointer to a function that takes two const void * arguments and returns an int, which is the same shape as the comparator expected by the C standard library's qsort. I couldn't find more material about it at this point, so let's set it aside and keep reading.
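
To make the typedef concrete, here is a small sketch of a comparator with the compare_func_t shape being passed to qsort. The helpers compare_cstr and sort_strings are hypothetical and only for illustration; the elements are assumed to be plain C strings, so each void pointer actually points at a const char *.

```c
#include <stdlib.h>
#include <string.h>

/* Same shape as the typedef in Zend/zend_types.h: takes two const void *
 * and returns <0, 0 or >0. This is also the comparator type qsort expects. */
typedef int (*compare_func_t)(const void *, const void *);

/* Hypothetical comparator: each element is a const char *, so the
 * void pointers point at const char * values. */
static int compare_cstr(const void *a, const void *b)
{
    const char *sa = *(const char *const *)a;
    const char *sb = *(const char *const *)b;
    return strcmp(sa, sb);
}

/* Hypothetical helper: sorts with whichever comparator is passed in. */
static void sort_strings(const char **items, size_t n, compare_func_t cmp)
{
    qsort(items, n, sizeof(items[0]), cmp);
}
```

Because the element type is hidden behind void *, the same sorting routine can be reused with any comparator; array_unique relies on the same idea when it stores whichever comparison function it needs in cmp.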

Parameter parsing

    ZEND_PARSE_PARAMETERS_START(1, 2)
        Z_PARAM_ARRAY(array)
        Z_PARAM_OPTIONAL
        Z_PARAM_LONG(sort_type)
    ZEND_PARSE_PARAMETERS_END();

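In ZEND_PARSE_PARAMETERS_START(1, 2), the first argument is the minimum number of parameters and the second is the maximum, so the function accepts 1 to 2 parameters. Z_PARAM_ARRAY(array) reads the required array, and everything after Z_PARAM_OPTIONAL (here Z_PARAM_LONG(sort_type)) is optional; if it is omitted, sort_type keeps its initial value PHP_SORT_STRING.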
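
The effect of the (1, 2) range plus the optional flag can be sketched in plain C. parse_unique_args is a hypothetical helper, not part of the Zend API; it only models the argument-count check and the "keep the default unless the optional argument was passed" behavior.

```c
/* Value of PHP_SORT_STRING from ext/standard/php_array.h. */
#define PHP_SORT_STRING 2

/* Hypothetical helper modeling ZEND_PARSE_PARAMETERS_START(1, 2):
 * argc must be 1 or 2; flags_arg is only used when the optional second
 * argument was passed. Returns 0 on success, -1 on a bad argument count. */
static int parse_unique_args(int argc, long flags_arg, long *sort_type)
{
    const int min_args = 1, max_args = 2;
    long value = PHP_SORT_STRING;   /* default, like the initializer in array_unique */

    if (argc < min_args || argc > max_args) {
        return -1;                  /* out of range: sort_type is left untouched */
    }
    if (argc == max_args) {         /* optional argument present */
        value = flags_arg;
    }
    *sort_type = value;
    return 0;
}
```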

Checking the number of array elements

    if (Z_ARRVAL_P(array)->nNumOfElements <= 1) {  /* nothing to do */
        ZVAL_COPY(return_value, array);
        return;
    }

This code is straightforward: when the array is empty or has only one element, there is nothing to deduplicate, so array is copied straight into return_value and returned. (ZVAL_COPY only increments the reference count; it does not duplicate the array.)

Fast path for SORT_STRING: deduplication with a hash table

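This step is executed only when sort_type is PHP_SORT_STRING. The code calls zend_hash_init to initialize a temporary HashTable named seen and calls zend_hash_destroy at the end to release it (the last argument of zend_hash_init is 0, so the table is not allocated in persistent memory). Every value is converted to a string and added to seen as a key with zend_hash_add_empty_element, which returns NULL if the key already exists. So only the first occurrence of each value, together with its original key, is copied into return_value, and no sorting is needed at all.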

    if (sort_type == PHP_SORT_STRING) {
        HashTable seen;
        zend_long num_key;
        zend_string *str_key;
        zval *val;
        // Initialize the HashTable
        zend_hash_init(&seen, zend_hash_num_elements(Z_ARRVAL_P(array)), NULL, NULL, 0);
        // Initialize the return array
        array_init(return_value);
        // Iterate over the array
        ZEND_HASH_FOREACH_KEY_VAL_IND(Z_ARRVAL_P(array), num_key, str_key, val) {
            zval *retval;
            // If the element value is a string
            if (Z_TYPE_P(val) == IS_STRING) {
                retval = zend_hash_add_empty_element(&seen, Z_STR_P(val));
            } else {
                zend_string *tmp_str_val;
                zend_string *str_val = zval_get_tmp_string(val, &tmp_str_val);
                retval = zend_hash_add_empty_element(&seen, str_val);
                zend_tmp_string_release(tmp_str_val);
            }
            if (retval) {
                /* First occurrence of the value */
                if (UNEXPECTED(Z_ISREF_P(val) && Z_REFCOUNT_P(val) == 1)) {
                    ZVAL_DEREF(val);
                }
                Z_TRY_ADDREF_P(val);
                if (str_key) {
                    zend_hash_add_new(Z_ARRVAL_P(return_value), str_key, val);
                } else {
                    zend_hash_index_add_new(Z_ARRVAL_P(return_value), num_key, val);
                }
            }
        } ZEND_HASH_FOREACH_END();
        // Free the hash table memory
        zend_hash_destroy(&seen);
        return;
    }
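
The same "first occurrence wins" idea can be sketched outside of PHP with a tiny string hash set. Everything here is a hypothetical stand-in: str_set is an open-addressing set with linear probing (Zend's HashTable works differently internally), the values are assumed to already be NUL-terminated C strings (PHP strings are binary-safe), and instead of building a new array the sketch records the positions of the values that would be kept.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal string set: only "insert if absent" is needed,
 * which is how the SORT_STRING path uses the `seen` table. */
typedef struct {
    const char **slots;
    size_t cap;            /* power of two, at least twice the entry count */
} str_set;

static uint64_t fnv1a(const char *s)
{
    uint64_t h = 14695981039346656037ULL;   /* FNV-1a 64-bit offset basis */
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;              /* FNV-1a 64-bit prime */
    }
    return h;
}

static bool set_init(str_set *set, size_t n)
{
    size_t cap = 8;
    while (cap < 2 * n) cap *= 2;           /* load factor stays <= 0.5 */
    set->slots = calloc(cap, sizeof(*set->slots));
    set->cap = cap;
    return set->slots != NULL;
}

/* Returns true if s was newly inserted, false if it was already present
 * (compare: zend_hash_add_empty_element returning NULL). */
static bool set_add(str_set *set, const char *s)
{
    size_t i = (size_t)(fnv1a(s) & (set->cap - 1));
    while (set->slots[i] != NULL) {
        if (strcmp(set->slots[i], s) == 0) return false;
        i = (i + 1) & (set->cap - 1);       /* linear probing */
    }
    set->slots[i] = s;
    return true;
}

/* Writes the positions of first occurrences (in original order) into
 * out_idx and returns how many there are; -1 on allocation failure. */
static long unique_first_positions(const char **vals, size_t n, size_t *out_idx)
{
    str_set seen;
    size_t kept = 0;
    if (!set_init(&seen, n)) return -1;
    for (size_t i = 0; i < n; i++) {
        if (set_add(&seen, vals[i])) out_idx[kept++] = i;
    }
    free(seen.slots);
    return (long)kept;
}
```

Each value costs one expected O(1) lookup, so this kind of pass is linear in the number of elements on average, which is why the SORT_STRING path can skip sorting entirely.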

Setting the comparison function

    cmp = php_get_data_compare_func(sort_type, 0);
    // Copy the array into the return value
    RETVAL_ARR(zend_array_dup(Z_ARRVAL_P(array)));

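The comparison used by the sort is controlled by the function pointer cmp, which php_get_data_compare_func returns for the given sort_type (a flag such as SORT_STRING) and 0. RETVAL_ARR(zend_array_dup(...)) then makes a copy of the input array as the return value; duplicates are later deleted from this copy.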
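
Choosing a comparator from a flag value boils down to returning a function pointer from a switch. The sketch below is modeled on the SORT_STRING branch of php_get_data_compare_func: it masks off the case-insensitivity flag to pick the branch, then tests that flag to pick the variant. The flag values match the constants in ext/standard/php_array.h; get_compare_func, cmp_string and cmp_string_case are hypothetical names, the reverse parameter is left out, and elements are assumed to be plain C strings.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Flag values as defined in ext/standard/php_array.h. */
#define PHP_SORT_NUMERIC   1
#define PHP_SORT_STRING    2
#define PHP_SORT_FLAG_CASE 8

typedef int (*compare_func_t)(const void *, const void *);

/* Hypothetical comparators; each element is a const char *. */
static int cmp_string(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

static int cmp_string_case(const void *a, const void *b)
{
    const unsigned char *s = (const unsigned char *)*(const char *const *)a;
    const unsigned char *t = (const unsigned char *)*(const char *const *)b;
    while (*s != '\0' && tolower(*s) == tolower(*t)) {
        s++;
        t++;
    }
    return tolower(*s) - tolower(*t);
}

/* Hypothetical selector: strip the case flag to choose the branch,
 * then test the flag to choose the variant. */
static compare_func_t get_compare_func(long sort_type)
{
    switch (sort_type & ~(long)PHP_SORT_FLAG_CASE) {
    case PHP_SORT_STRING:
        return (sort_type & PHP_SORT_FLAG_CASE) ? cmp_string_case : cmp_string;
    default:
        return NULL;   /* other flags are left out of this sketch */
    }
}
```

For example, passing PHP_SORT_STRING | PHP_SORT_FLAG_CASE (10) lands in the same case label as PHP_SORT_STRING but returns the case-insensitive comparator.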

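php_get_data_compare_func is defined in array.c, the same file as array_unique. Its code is long, so only the branch for the default flag SORT_STRING is shown here. (Note that in PHP 7.3.8 a plain SORT_STRING never reaches this point, because of the early return in the hash-table path above; this branch is still taken when SORT_STRING is combined with SORT_FLAG_CASE, which selects the _case variant.)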

static compare_func_t php_get_data_compare_func(zend_long sort_type, int reverse) /* {{{ */
{
    switch (sort_type & ~PHP_SORT_FLAG_CASE) {
        case PHP_SORT_NUMERIC:
            // code...
        case PHP_SORT_STRING:
            if (sort_type & PHP_SORT_FLAG_CASE) {
                if (reverse) {
                    return php_array_reverse_data_compare_string_case;
                } else {
                    return php_array_data_compare_string_case;
                }
            } else {
                if (reverse) {
                    return php_array_reverse_data_compare_string;
                } else {
                    return php_array_data_compare_string;
                }
            }
            break;
        // code...
    }
    // code...
}

In the earlier line cmp = php_get_data_compare_func(sort_type, 0);, the second argument, reverse, is 0. So when sort_type is PHP_SORT_STRING, the function returns php_array_data_compare_string; in other words, SORT_STRING compares elements with php_array_data_compare_string. Let's look at php_array_data_compare_string next:

static int php_array_data_compare_string(const void *a, const void *b) /* {{{ */
{
    Bucket *f;
    Bucket *s;
    zval *first;
    zval *second;
    f = (Bucket *) a;
    s = (Bucket *) b;
    first = &f->val;
    second = &s->val;
    if (UNEXPECTED(Z_TYPE_P(first) == IS_INDIRECT)) {
        first = Z_INDIRECT_P(first);
    }
    if (UNEXPECTED(Z_TYPE_P(second) == IS_INDIRECT)) {
        second = Z_INDIRECT_P(second);
    }
    return string_compare_function(first, second);
}
/* }}} */

This gives the following call chain:

SORT_STRING -> php_get_data_compare_func -> php_array_data_compare_string -> string_compare_function;

string_compare_function is a Zend API function defined in /Zend/zend_operators.c:

ZEND_API int ZEND_FASTCALL string_compare_function(zval *op1, zval *op2) /* {{{ */
{
    if (EXPECTED(Z_TYPE_P(op1) == IS_STRING) &&
        EXPECTED(Z_TYPE_P(op2) == IS_STRING)) {
        if (Z_STR_P(op1) == Z_STR_P(op2)) {
            return 0;
        } else {
            return zend_binary_strcmp(Z_STRVAL_P(op1), Z_STRLEN_P(op1), Z_STRVAL_P(op2), Z_STRLEN_P(op2));
        }
    } else {
        zend_string *tmp_str1, *tmp_str2;
        zend_string *str1 = zval_get_tmp_string(op1, &tmp_str1);
        zend_string *str2 = zval_get_tmp_string(op2, &tmp_str2);
        int ret = zend_binary_strcmp(ZSTR_VAL(str1), ZSTR_LEN(str1), ZSTR_VAL(str2), ZSTR_LEN(str2));
        zend_tmp_string_release(tmp_str1);
        zend_tmp_string_release(tmp_str2);
        return ret;
    }
}
/* }}} */

So SORT_STRING ultimately compares strings with zend_binary_strcmp (non-string values are first converted to strings with zval_get_tmp_string). zend_binary_strcmp is implemented as follows (also in /Zend/zend_operators.c):

ZEND_API int ZEND_FASTCALL zend_binary_strcmp(const char *s1, size_t len1, const char *s2, size_t len2) /* {{{ */
{
    int retval;
    if (s1 == s2) {
        return 0;
    }
    retval = memcmp(s1, s2, MIN(len1, len2));
    if (!retval) {
        return (int)(len1 - len2);
    } else {
        return retval;
    }
}
/* }}} */

This code compares two strings. In other words, SORT_STRING comparison is built on the C library function memcmp: the two strings are compared byte by byte from front to back, and the comparison stops at the first differing byte, whose order decides the result. If the first MIN(len1, len2) bytes are all equal, the result is len1 - len2, so the shorter string sorts first.
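
The comparison rule can be reproduced in a few lines of standalone C. binary_strcmp below is an illustrative re-implementation, not the Zend function: it omits the pointer-equality shortcut and, as a small deliberate deviation, returns only the sign of the length difference instead of casting len1 - len2 to int.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative re-implementation of the logic in zend_binary_strcmp.
 * Lengths are passed explicitly, so the comparison is binary-safe:
 * embedded '\0' bytes are compared like any other byte. */
static int binary_strcmp(const char *s1, size_t len1, const char *s2, size_t len2)
{
    size_t common = len1 < len2 ? len1 : len2;
    int retval = memcmp(s1, s2, common);   /* byte by byte, stops at first difference */

    if (retval != 0) {
        return retval;
    }
    /* Equal common prefix: the shorter string sorts first. */
    return (len1 > len2) - (len1 < len2);
}
```

For instance, "ab" sorts before "abc" because the common prefix is equal and it is shorter, while "b" sorts after "abc" because the first byte already differs.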

Sorting the array

    /* create and sort array with pointers to the target_hash buckets */
    arTmp = (struct bucketindex *) pemalloc((Z_ARRVAL_P(array)->nNumOfElements + 1) * sizeof(struct bucketindex), GC_FLAGS(Z_ARRVAL_P(array)) & IS_ARRAY_PERSISTENT);
    for (i = 0, idx = 0; idx < Z_ARRVAL_P(array)->nNumUsed; idx++) {
        p = Z_ARRVAL_P(array)->arData + idx;
        if (Z_TYPE(p->val) == IS_UNDEF) continue;
        if (Z_TYPE(p->val) == IS_INDIRECT && Z_TYPE_P(Z_INDIRECT(p->val)) == IS_UNDEF) continue;
        arTmp[i].b = *p;
        arTmp[i].i = i;
        i++;
    }
    ZVAL_UNDEF(&arTmp[i].b.val);
    zend_sort((void *) arTmp, i, sizeof(struct bucketindex),
            cmp, (swap_func_t)array_bucketindex_swap);

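This code allocates a temporary array, arTmp, with one extra slot. It copies every valid bucket of the input array into it (skipping IS_UNDEF slots) along with the bucket's original position i, and marks the end with an IS_UNDEF sentinel. It then calls zend_sort to sort arTmp using cmp. The sorting algorithm is implemented in /Zend/zend_sort.c; note this comment in that file: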

Derived from LLVM's libc++ implementation of std::sort.

In other words, the algorithm is derived from the std::sort implementation in LLVM's libc++, an optimized quicksort. It has special handling for 16 or fewer elements, and for 5 or fewer elements it sorts directly with nested if/else comparisons. That code is not reproduced here.
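
The general shape of such a hybrid sort can be sketched on plain ints. This is a simplified hybrid quicksort written for illustration, not zend_sort itself: it assumes a cutoff of 16 below which it uses insertion sort (a common choice for the small-size special case), picks a median-of-three pivot, partitions with Hoare's scheme, and recurses only into the smaller part to bound stack depth.

```c
#include <stddef.h>

/* Assumed cutoff, taken from the "16 or fewer elements" case above. */
#define SMALL_SORT_THRESHOLD 16

static void swap_int(int *x, int *y)
{
    int t = *x;
    *x = *y;
    *y = t;
}

static void insertion_sort(int *a, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];   /* shift larger elements right */
            j--;
        }
        a[j] = key;
    }
}

/* Simplified hybrid quicksort on ints (not zend_sort itself). */
static void hybrid_sort(int *a, size_t n)
{
    while (n > SMALL_SORT_THRESHOLD) {
        size_t mid = (n - 1) / 2;

        /* Order a[0] <= a[mid] <= a[n - 1] and use a[mid] as the pivot. */
        if (a[mid] < a[0]) swap_int(&a[mid], &a[0]);
        if (a[n - 1] < a[0]) swap_int(&a[n - 1], &a[0]);
        if (a[n - 1] < a[mid]) swap_int(&a[n - 1], &a[mid]);
        int pivot = a[mid];

        /* Hoare partition: afterwards a[0..j] <= pivot <= a[j+1..n-1]. */
        ptrdiff_t i = -1, j = (ptrdiff_t)n;
        for (;;) {
            do { i++; } while (a[i] < pivot);
            do { j--; } while (a[j] > pivot);
            if (i >= j) break;
            swap_int(&a[i], &a[j]);
        }

        size_t left = (size_t)j + 1;
        size_t right = n - left;
        /* Recurse into the smaller part, keep looping on the larger one. */
        if (left < right) {
            hybrid_sort(a, left);
            a += left;
            n = right;
        } else {
            hybrid_sort(a + left, right);
            n = left;
        }
    }
    insertion_sort(a, n);
}
```

Insertion sort has a higher asymptotic cost but very low overhead on short ranges, which is why quicksort variants typically hand small subarrays over to it.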

Deduplicating the array

Back in array_unique, let's continue with the code:

    /* go through the sorted array and delete duplicates from the copy */
    lastkept = arTmp;
    for (cmpdata = arTmp + 1; Z_TYPE(cmpdata->b.val) != IS_UNDEF; cmpdata++) {
        if (cmp(lastkept, cmpdata)) {
            lastkept = cmpdata;
        } else {
            if (lastkept->i > cmpdata->i) {
                p = &lastkept->b;
                lastkept = cmpdata;
            } else {
                p = &cmpdata->b;
            }
            if (p->key == NULL) {
                zend_hash_index_del(Z_ARRVAL_P(return_value), p->h);
            } else {
                if (Z_ARRVAL_P(return_value) == &EG(symbol_table)) {
                    zend_delete_global_variable(p->key);
                } else {
                    zend_hash_del(Z_ARRVAL_P(return_value), p->key);
                }
            }
        }
    }
    pefree(arTmp, GC_FLAGS(Z_ARRVAL_P(array)) & IS_ARRAY_PERSISTENT);

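This loop walks the sorted array and deletes duplicates from the copy in return_value. Because equal elements are adjacent after sorting, each element only needs to be compared with lastkept, the most recent element that was kept. When two elements compare equal, the one with the larger original position i is deleted, so the first occurrence in the original array is the one that survives. This check is needed because zend_sort is not a stable sort, so equal elements are not guaranteed to stay in their original order.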
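
The whole sort-then-deduplicate strategy can be sketched in standalone C. struct entry is a hypothetical stand-in for struct bucketindex (a value plus its original position), qsort replaces zend_sort (qsort is not guaranteed to be stable either), and an explicit length bound replaces the IS_UNDEF sentinel. Instead of deleting hash entries, the sketch clears a keep[] flag for each removed position.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct bucketindex: value + original position. */
struct entry {
    const char *val;
    size_t pos;
};

/* Compares values only, like cmp in array_unique. */
static int cmp_entry(const void *a, const void *b)
{
    const struct entry *x = a, *y = b;
    return strcmp(x->val, y->val);
}

/* Sets keep[i] = 0 for every element i that array_unique would delete.
 * Returns the number of kept elements, or -1 on allocation failure. */
static long sort_dedup(const char **vals, size_t n, unsigned char *keep)
{
    if (n == 0) return 0;
    struct entry *tmp = malloc(n * sizeof(*tmp));
    if (tmp == NULL) return -1;
    for (size_t i = 0; i < n; i++) {
        tmp[i].val = vals[i];
        tmp[i].pos = i;
        keep[i] = 1;
    }
    qsort(tmp, n, sizeof(*tmp), cmp_entry);

    long kept = (long)n;
    struct entry *lastkept = tmp;
    for (size_t k = 1; k < n; k++) {
        struct entry *cur = &tmp[k];
        if (cmp_entry(lastkept, cur) != 0) {
            lastkept = cur;               /* new value: keep it */
        } else {
            /* Equal values: drop whichever appeared later in the input. */
            if (lastkept->pos > cur->pos) {
                keep[lastkept->pos] = 0;
                lastkept = cur;
            } else {
                keep[cur->pos] = 0;
            }
            kept--;
        }
    }
    free(tmp);
    return kept;
}
```

For the input {"b", "a", "b", "c", "a", "b"}, positions 0, 1 and 3 survive regardless of how qsort orders the equal entries, because within each run of equal values the smallest original position always wins.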

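Quicksort is well known to have an average time complexity of O(n log n), so on this sorting path array_unique runs in O(n log n) time: the sort dominates, and the deduplication pass afterwards is linear. Because array_unique calls a quicksort-based algorithm under the hood, sorting adds run-time overhead, which can slow the whole function down noticeably on large data sets. The default SORT_STRING flag, however, takes the hash-table path shown earlier and never sorts; that path makes a single pass with hash lookups and runs in expected O(n) time.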


Source: www.cnblogs.com/sunshineliulu/p/11723624.html