The source code below is based on PHP 7.3.8.
array array_unique ( array $array [, int $sort_flags = SORT_STRING ] )
(PHP 4 >= 4.0.1, PHP 5, PHP 7)
array_unique - Removes duplicate values from an array
Parameters:
array: the input array.
sort_flags: (optional) sorting type flag, used to modify the comparison behavior. The main values are:
SORT_REGULAR - compare items normally (do not change types)
SORT_NUMERIC - compare items numerically
SORT_STRING - compare items as strings
SORT_LOCALE_STRING - compare items as strings, based on the current locale.
The source of array_unique is in the file /ext/standard/array.c:
PHP_FUNCTION(array_unique){
    // code...
}
The function is too long to reproduce in full here; refer to the PHP source code on GitHub.
Defining the variables
zval *array;
uint32_t idx;
Bucket *p;
struct bucketindex *arTmp, *cmpdata, *lastkept;
unsigned int i;
zend_long sort_type = PHP_SORT_STRING; // default sort rule
compare_func_t cmp;
First come the variable definitions. By default, array_unique compares using PHP_SORT_STRING. PHP_SORT_STRING is defined in the /ext/standard/php_array.h header file:
#define PHP_SORT_STRING 2
This corresponds to the predefined constant SORT_STRING, the default value of the sort_flags parameter in the PHP function signature shown at the beginning.
I did not understand the line compare_func_t cmp at first and could not tell what it was for. compare_func_t is defined in /Zend/zend_types.h:
typedef int (*compare_func_t)(const void *, const void *);
This should define a function pointer type: the function it points to returns an int and takes two const void pointer parameters. I found no further information on it, so let's set it aside for now and keep reading.
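To make the typedef more concrete, here is a minimal sketch of how a variable of this pointer type can be used with the C standard library's qsort, which expects a comparator of the same shape. The names compare_ints and sort_ints are hypothetical and exist only for this illustration; they are not part of the PHP source.

```c
#include <stdlib.h>

/* Same shape as the typedef in /Zend/zend_types.h: a pointer to a function
 * that takes two untyped const pointers and returns an int. */
typedef int (*compare_func_t)(const void *, const void *);

/* Hypothetical comparator: orders ints ascending.
 * Returns <0, 0 or >0, like every comparator of this type. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sorts n ints in place. The comparator is first stored in a variable of
 * type compare_func_t, the same way array_unique stores its cmp. */
void sort_ints(int *values, size_t n)
{
    compare_func_t cmp = compare_ints;
    qsort(values, n, sizeof(int), cmp);
}
```

Calling sort_ints on {3, 1, 2} leaves the array as {1, 2, 3}. The point is that cmp is just a variable holding a function, so the caller can decide later which comparison rule the sort uses.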
Parameter parsing
ZEND_PARSE_PARAMETERS_START(1, 2)
    Z_PARAM_ARRAY(array)
    Z_PARAM_OPTIONAL
    Z_PARAM_LONG(sort_type)
ZEND_PARSE_PARAMETERS_END();
In ZEND_PARSE_PARAMETERS_START(1, 2), the first argument is the minimum number of parameters that must be passed and the second argument is the maximum number, i.e., the function accepts 1 to 2 parameters.
Determining the number of array elements
if (Z_ARRVAL_P(array)->nNumOfElements <= 1) { /* nothing to do */
    ZVAL_COPY(return_value, array);
    return;
}
This code is easy to understand: when the array is empty or has only one element, there is nothing to deduplicate, so array is copied directly into return_value and returned.
Hash table allocation
This step is executed only when sort_type is PHP_SORT_STRING. In the code below, zend_hash_init initializes the seen hash table and zend_hash_destroy releases its memory. The last argument passed to zend_hash_init is 0, so the table is not persistent.
if (sort_type == PHP_SORT_STRING) {
    HashTable seen;
    zend_long num_key;
    zend_string *str_key;
    zval *val;
    // initialize the HashTable
    zend_hash_init(&seen, zend_hash_num_elements(Z_ARRVAL_P(array)), NULL, NULL, 0);
    // initialize the return array
    array_init(return_value);
    // iterate over the array
    ZEND_HASH_FOREACH_KEY_VAL_IND(Z_ARRVAL_P(array), num_key, str_key, val) {
        zval *retval;
        // if the element value is a string
        if (Z_TYPE_P(val) == IS_STRING) {
            retval = zend_hash_add_empty_element(&seen, Z_STR_P(val));
        } else {
            zend_string *tmp_str_val;
            zend_string *str_val = zval_get_tmp_string(val, &tmp_str_val);
            retval = zend_hash_add_empty_element(&seen, str_val);
            zend_tmp_string_release(tmp_str_val);
        }
        if (retval) {
            /* First occurrence of the value */
            if (UNEXPECTED(Z_ISREF_P(val) && Z_REFCOUNT_P(val) == 1)) {
                ZVAL_DEREF(val);
            }
            Z_TRY_ADDREF_P(val);
            if (str_key) {
                zend_hash_add_new(Z_ARRVAL_P(return_value), str_key, val);
            } else {
                zend_hash_index_add_new(Z_ARRVAL_P(return_value), num_key, val);
            }
        }
    } ZEND_HASH_FOREACH_END();
    // release the hash table memory
    zend_hash_destroy(&seen);
    return;
}
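The branch above is a "seen set" deduplication: each value is converted to a string and added to a hash table, and only values whose insertion succeeds (first occurrence) are copied to the result. The following is a rough standalone sketch of the same idea for plain C strings. The open-addressing table, the FNV-1a hash and the names hash_str, seen_add and unique_strings are all assumptions made for this illustration; the real code relies on the Zend HashTable instead.

```c
#include <stdlib.h>
#include <string.h>

/* FNV-1a hash of a NUL-terminated string (hash choice is an assumption). */
static unsigned long hash_str(const char *s)
{
    unsigned long h = 2166136261UL;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619UL;
    }
    return h;
}

/* Adds s to the set. Returns 1 if s was new, 0 if it was already present.
 * This plays the role of zend_hash_add_empty_element returning non-NULL
 * or NULL. cap must be a power of two and the table must not be full. */
static int seen_add(const char **slots, size_t cap, const char *s)
{
    size_t i = (size_t)hash_str(s) & (cap - 1);
    while (slots[i] != NULL) {
        if (strcmp(slots[i], s) == 0) {
            return 0;
        }
        i = (i + 1) & (cap - 1); /* linear probing */
    }
    slots[i] = s;
    return 1;
}

/* Copies the first occurrence of each string from in[] to out[], keeping
 * the original order. Returns the number of strings kept, or (size_t)-1
 * if memory allocation fails. */
size_t unique_strings(const char **in, size_t n, const char **out)
{
    size_t cap = 8, kept = 0, i;
    const char **slots;

    while (cap < 2 * n) {
        cap *= 2; /* keep the table at most half full */
    }
    slots = malloc(cap * sizeof(*slots));
    if (slots == NULL) {
        return (size_t)-1;
    }
    for (i = 0; i < cap; i++) {
        slots[i] = NULL;
    }
    for (i = 0; i < n; i++) {
        if (seen_add(slots, cap, in[i])) {
            out[kept++] = in[i]; /* first occurrence of the value */
        }
    }
    free(slots);
    return kept;
}
```

For the input {"a", "b", "a", "c", "b"} this keeps "a", "b", "c". Each element is visited once and hash lookups take constant time on average, so this approach needs no sorting.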
Sets the comparison function
cmp = php_get_data_compare_func(sort_type, 0);
// copy the array into the return value
RETVAL_ARR(zend_array_dup(Z_ARRVAL_P(array)));
The actual comparison order is controlled by the function pointer cmp, which is obtained by passing sort_type and 0 to php_get_data_compare_func. Here sort_type is a flag such as SORT_STRING.
php_get_data_compare_func is defined in array.c (the same file as array_unique). The code is long, so only the branch for the default flag SORT_STRING is shown here:
static compare_func_t php_get_data_compare_func(zend_long sort_type, int reverse) /* {{{ */
{
    switch (sort_type & ~PHP_SORT_FLAG_CASE) {
        case PHP_SORT_NUMERIC:
            // code...
        case PHP_SORT_STRING:
            if (sort_type & PHP_SORT_FLAG_CASE) {
                if (reverse) {
                    return php_array_reverse_data_compare_string_case;
                } else {
                    return php_array_data_compare_string_case;
                }
            } else {
                if (reverse) {
                    return php_array_reverse_data_compare_string;
                } else {
                    return php_array_data_compare_string;
                }
            }
            break;
        // code...
In the array_unique code, the call cmp = php_get_data_compare_func(sort_type, 0); passes 0 as the second parameter, reverse. So when sort_type is PHP_SORT_STRING, php_get_data_compare_func returns php_array_data_compare_string; that is, SORT_STRING compares values with php_array_data_compare_string. Let's continue with the php_array_data_compare_string function:
static int php_array_data_compare_string(const void *a, const void *b) /* {{{ */
{
    Bucket *f;
    Bucket *s;
    zval *first;
    zval *second;
    f = (Bucket *) a;
    s = (Bucket *) b;
    first = &f->val;
    second = &s->val;
    if (UNEXPECTED(Z_TYPE_P(first) == IS_INDIRECT)) {
        first = Z_INDIRECT_P(first);
    }
    if (UNEXPECTED(Z_TYPE_P(second) == IS_INDIRECT)) {
        second = Z_INDIRECT_P(second);
    }
    return string_compare_function(first, second);
}
/* }}} */
This gives the following call chain:
SORT_STRING -> php_get_data_compare_func -> php_array_data_compare_string -> string_compare_function;
string_compare_function is a Zend API function, defined in /Zend/zend_operators.c:
ZEND_API int ZEND_FASTCALL string_compare_function(zval *op1, zval *op2) /* {{{ */
{
    if (EXPECTED(Z_TYPE_P(op1) == IS_STRING) &&
        EXPECTED(Z_TYPE_P(op2) == IS_STRING)) {
        if (Z_STR_P(op1) == Z_STR_P(op2)) {
            return 0;
        } else {
            return zend_binary_strcmp(Z_STRVAL_P(op1), Z_STRLEN_P(op1), Z_STRVAL_P(op2), Z_STRLEN_P(op2));
        }
    } else {
        zend_string *tmp_str1, *tmp_str2;
        zend_string *str1 = zval_get_tmp_string(op1, &tmp_str1);
        zend_string *str2 = zval_get_tmp_string(op2, &tmp_str2);
        int ret = zend_binary_strcmp(ZSTR_VAL(str1), ZSTR_LEN(str1), ZSTR_VAL(str2), ZSTR_LEN(str2));
        zend_tmp_string_release(tmp_str1);
        zend_tmp_string_release(tmp_str2);
        return ret;
    }
}
/* }}} */
As we can see, SORT_STRING uses the zend_binary_strcmp function to compare strings. zend_binary_strcmp is implemented as follows (also in /Zend/zend_operators.c):
ZEND_API int ZEND_FASTCALL zend_binary_strcmp(const char *s1, size_t len1, const char *s2, size_t len2) /* {{{ */
{
    int retval;
    if (s1 == s2) {
        return 0;
    }
    retval = memcmp(s1, s2, MIN(len1, len2));
    if (!retval) {
        return (int)(len1 - len2);
    } else {
        return retval;
    }
}
/* }}} */
The code above compares two strings. In other words, the SORT_STRING ordering is ultimately implemented with the C function memcmp: the two strings are compared byte by byte from front to back, and the comparison stops at the first differing byte, which decides the order. If the shared prefix is identical, the difference in length decides.
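A small standalone restatement of this comparison makes the consequences easier to test. This is only a sketch: the name binary_strcmp is hypothetical, and unlike the Zend function it returns -1, 0 or 1 for the length tie-break instead of the cast length difference.

```c
#include <string.h>

/* Compares two byte strings the way the text above describes:
 * byte by byte over the shared prefix, then by length. */
int binary_strcmp(const char *s1, size_t len1, const char *s2, size_t len2)
{
    size_t min_len = len1 < len2 ? len1 : len2;
    int r = memcmp(s1, s2, min_len);

    if (r != 0) {
        return r; /* first differing byte decides */
    }
    /* identical prefix: the shorter string sorts first */
    return (len1 > len2) - (len1 < len2);
}
```

With this function, "abc" sorts before "abcd", and "10" sorts before "9" because the byte '1' is smaller than the byte '9'. This is why string comparison and numeric comparison can order the same values differently.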
Sorting an array
/* create and sort array with pointers to the target_hash buckets */
// build an array from the target_hash buckets and sort it
arTmp = (struct bucketindex *) pemalloc((Z_ARRVAL_P(array)->nNumOfElements + 1) * sizeof(struct bucketindex), GC_FLAGS(Z_ARRVAL_P(array)) & IS_ARRAY_PERSISTENT);
for (i = 0, idx = 0; idx < Z_ARRVAL_P(array)->nNumUsed; idx++) {
    p = Z_ARRVAL_P(array)->arData + idx;
    if (Z_TYPE(p->val) == IS_UNDEF) continue;
    if (Z_TYPE(p->val) == IS_INDIRECT && Z_TYPE_P(Z_INDIRECT(p->val)) == IS_UNDEF) continue;
    arTmp[i].b = *p;
    arTmp[i].i = i;
    i++;
}
ZVAL_UNDEF(&arTmp[i].b.val);
zend_sort((void *) arTmp, i, sizeof(struct bucketindex),
        cmp, (swap_func_t)array_bucketindex_swap);
This code allocates a new array, copies the buckets into it together with their original positions, and then calls the zend_sort function to sort it. The sorting algorithm is implemented in /Zend/zend_sort.c. Note this comment in that file:
Derived from LLVM's libc++ implementation of std::sort.
The algorithm is based on the std::sort implementation in LLVM's libc++. It is an optimized version of quicksort: there is a special optimization when there are 16 or fewer elements, and when there are 5 or fewer elements they are sorted directly with nested if/else checks. The code is not reproduced here.
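The general shape of such a hybrid sort can be sketched as follows. This is a simplified illustration on an int array, not the code from /Zend/zend_sort.c: the names insertion_sort and hybrid_sort, the middle-element pivot and the Hoare-style partition are assumptions, and only the threshold of 16 is taken from the description above.

```c
#include <stddef.h>

#define SMALL_CUTOFF 16 /* small inputs skip quicksort entirely */

/* Plain insertion sort, efficient for short arrays. */
static void insertion_sort(int *a, size_t n)
{
    size_t i, j;
    for (i = 1; i < n; i++) {
        int key = a[i];
        for (j = i; j > 0 && a[j - 1] > key; j--) {
            a[j] = a[j - 1];
        }
        a[j] = key;
    }
}

/* Quicksort that hands small partitions to insertion sort. */
void hybrid_sort(int *a, size_t n)
{
    while (n > SMALL_CUTOFF) {
        int pivot = a[(n - 1) / 2];
        long i = -1, j = (long)n;

        /* Hoare-style partition around the pivot value. */
        for (;;) {
            do { i++; } while (a[i] < pivot);
            do { j--; } while (a[j] > pivot);
            if (i >= j) {
                break;
            }
            int t = a[i];
            a[i] = a[j];
            a[j] = t;
        }
        /* Now every element of a[0..j] is <= every element of a[j+1..n-1]. */
        hybrid_sort(a, (size_t)j + 1); /* recurse into the left part */
        a += j + 1;                    /* loop on the right part */
        n -= (size_t)j + 1;
    }
    insertion_sort(a, n);
}
```

The design idea is that quicksort's partitioning overhead does not pay off for very short ranges, so those are finished with a simpler method.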
Deduplicating the array
Back in array_unique, let's continue with the code:
/* go through the sorted array and delete duplicates from the copy */
lastkept = arTmp;
for (cmpdata = arTmp + 1; Z_TYPE(cmpdata->b.val) != IS_UNDEF; cmpdata++) {
    if (cmp(lastkept, cmpdata)) {
        lastkept = cmpdata;
    } else {
        if (lastkept->i > cmpdata->i) {
            p = &lastkept->b;
            lastkept = cmpdata;
        } else {
            p = &cmpdata->b;
        }
        if (p->key == NULL) {
            zend_hash_index_del(Z_ARRVAL_P(return_value), p->h);
        } else {
            if (Z_ARRVAL_P(return_value) == &EG(symbol_table)) {
                zend_delete_global_variable(p->key);
            } else {
                zend_hash_del(Z_ARRVAL_P(return_value), p->key);
            }
        }
    }
}
pefree(arTmp, GC_FLAGS(Z_ARRVAL_P(array)) & IS_ARRAY_PERSISTENT);
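The code walks through the sorted array and deletes the duplicate elements from the copy. Among equal values it uses the saved position i to keep the element that appeared first in the original array.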
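The sort-then-scan approach can be sketched in standalone C as follows. This is an illustration on an int array under simplifying assumptions: struct indexed stands in for struct bucketindex, a deleted[] flag array stands in for deleting keys from the returned hash table, and the names cmp_value and unique_sorted are hypothetical.

```c
#include <stdlib.h>

/* A value together with its original position. */
struct indexed {
    int value;
    size_t index;
};

static int cmp_value(const void *a, const void *b)
{
    const struct indexed *x = a;
    const struct indexed *y = b;
    return (x->value > y->value) - (x->value < y->value);
}

/* Writes in[] without duplicates to out[], keeping the first occurrence
 * of each value and the original order. Returns the number of elements
 * kept, or (size_t)-1 if memory allocation fails. */
size_t unique_sorted(const int *in, size_t n, int *out)
{
    struct indexed *tmp;
    unsigned char *deleted;
    size_t i, last, kept = 0;

    if (n == 0) {
        return 0;
    }
    tmp = malloc(n * sizeof(*tmp));
    deleted = calloc(n, 1);
    if (tmp == NULL || deleted == NULL) {
        free(tmp);
        free(deleted);
        return (size_t)-1;
    }
    for (i = 0; i < n; i++) {
        tmp[i].value = in[i];
        tmp[i].index = i;
    }
    qsort(tmp, n, sizeof(*tmp), cmp_value); /* equal values become adjacent */

    last = 0; /* position in tmp of the element currently kept */
    for (i = 1; i < n; i++) {
        if (cmp_value(&tmp[last], &tmp[i]) != 0) {
            last = i; /* a new value starts */
        } else if (tmp[last].index > tmp[i].index) {
            deleted[tmp[last].index] = 1; /* kept one came later: drop it */
            last = i;
        } else {
            deleted[tmp[i].index] = 1; /* current one came later: drop it */
        }
    }
    for (i = 0; i < n; i++) {
        if (!deleted[i]) {
            out[kept++] = in[i];
        }
    }
    free(tmp);
    free(deleted);
    return kept;
}
```

For the input {4, 2, 4, 1, 2, 4} the result is {4, 2, 1}. The index comparison is needed because qsort does not guarantee that equal elements stay in their original relative order.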
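As is well known, quicksort has an average time complexity of O(n log n), so on this sorting path the time complexity of array_unique is O(n log n). Because array_unique calls a quicksort-based sort internally, the sort adds to the function's running time, and with a large amount of data the whole function can become slow. Note that in the 7.3.8 code shown earlier, the default SORT_STRING case returns from the hash table branch before any sorting takes place, so this cost applies to the other sort flags.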