Foreword
This article is 2895 words long and takes about 12 minutes to read.
In summary: this article covers 10 common array deduplication methods and compares them.
- Official account: "front-end Advanced Learning". Reply "666" to receive a bundle of front-end technology books.
Let the past fade like smoke; to a selfless heart the world is wide.
Main text
Array deduplication is not a common front-end requirement; it is usually left to the back end. It is still an interesting problem, and interviewers often use it to gauge a candidate's command of JS. This article approaches the problem from the point of view of data types: we first handle arrays that contain only basic data types, then deduplicate objects. First, our test data:
var meta = [
0,
'0',
true,
false,
'true',
'false',
null,
undefined,
Infinity,
{},
[],
function(){},
{ a: 1, b: 2 },
{ b: 2, a: 1 },
];
var meta2 = [
NaN,
NaN,
Infinity,
{},
[],
function(){},
{ a: 1, b: 2 },
{ b: 2, a: 1 },
];
var sourceArr = [...meta, ... Array(1000000)
.fill({})
.map(() => meta[Math.floor(Math.random() * meta.length)]),
...meta2];
All references to sourceArr below refer to the variable above. sourceArr contains 1000022 items: 14 from meta, 1000000 random picks from meta, and 8 from meta2. Note that NaN is the only value in JS that is not strictly equal to itself.
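The self-inequality of NaN can be checked directly. The snippet below is a minimal sketch; the variable names and the helper isNaNValue are illustrative and not part of the article's code.

```javascript
// Each comparison below is evaluated on NaN itself.
const strictlyEqual = NaN === NaN;      // false: === never matches NaN
const sameValue = Object.is(NaN, NaN);  // true: Object.is treats NaN as equal to NaN
const detected = Number.isNaN(NaN);     // true: explicit NaN check (ES6)

// ES5-friendly test (hypothetical helper): NaN is the only value for which x !== x holds.
function isNaNValue(x) {
  return x !== x;
}

console.log(strictlyEqual, sameValue, detected, isNaNValue(NaN), isNaNValue('NaN'));
// → false true true true false
```

This is why every method below that compares with === needs extra care to treat repeated NaN values as duplicates.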
Our goal is then to deduplicate the sourceArr array above and obtain:
// array of length 14
[false, "true", Infinity, true, 0, [], {}, "false", "0", null, undefined, {a: 1, b: 2}, NaN, function(){}]
Basic data types
1. ES6 Set
This is a very common ES6 method. For deduplicating simple data types, you can use the spread operator + Set directly:
console.time('ES6 Set time:');
var res = [...new Set(sourceArr)];
console.timeEnd('ES6 Set time:');
// ES6 Set time:: 28.736328125ms
console.log(res);
// prints an array of length 20: [false, "true", Infinity, true, 0, [], [], {b: 2, a: 1}, {b: 2, a: 1}, {}, {}, "false", "0", null, undefined, {a: 1, b: 2}, {a: 1, b: 2}, NaN, function(){}, function(){}]
Or use Array.from + Set:
console.time('ES6 Set time:');
var res = Array.from(new Set(sourceArr));
console.timeEnd('ES6 Set time:');
// ES6 Set time:: 28.538818359375ms
console.log(res);
// prints an array of length 20: [false, "true", Infinity, true, 0, [], [], {b: 2, a: 1}, {b: 2, a: 1}, {}, {}, "false", "0", null, undefined, {a: 1, b: 2}, {a: 1, b: 2}, NaN, function(){}, function(){}]
Advantages: simple and convenient; recognizes repeated NaN values as duplicates.
Disadvantages: cannot recognize structurally identical objects and arrays as duplicates.
For simple scenarios, this is the recommended way to deduplicate.
2. Use indexOf
Use the built-in indexOf method to look up each element:
function unique(arr) {
if (!Array.isArray(arr)) return;
var result = [];
for (var i = 0; i < arr.length; i++) {
if (result.indexOf(arr[i]) === -1) {
result.push(arr[i])
}
}
return result;
}
console.time('indexOf method time:');
var res = unique(sourceArr);
console.timeEnd('indexOf method time:');
// indexOf method time:: 23.376953125ms
console.log(res);
// prints an array of length 21: [false, "true", Infinity, true, 0, [], [], {b: 2, a: 1}, {b: 2, a: 1}, {}, {}, "false", "0", null, undefined, {a: 1, b: 2}, {a: 1, b: 2}, NaN, NaN, function(){}, function(){}]
Advantages: a general-purpose method available since ES5; highly compatible and easy to understand.
Disadvantages: cannot recognize repeated NaN values as duplicates, so NaN requires special handling.
Suitable for environments that do not support ES6.
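The special handling for NaN mentioned above could look roughly like the following. This is a sketch under the assumption that one NaN should be kept; the function name uniqueWithNaN and the seenNaN flag are illustrative additions, not the article's code.

```javascript
// ES5-compatible variant of the indexOf approach that keeps a single NaN.
function uniqueWithNaN(arr) {
  if (!Array.isArray(arr)) return;
  var result = [];
  var seenNaN = false; // whether a NaN has already been kept
  for (var i = 0; i < arr.length; i++) {
    var item = arr[i];
    if (item !== item) {
      // Only NaN is unequal to itself; indexOf would never find it.
      if (!seenNaN) {
        seenNaN = true;
        result.push(item);
      }
    } else if (result.indexOf(item) === -1) {
      result.push(item);
    }
  }
  return result;
}

console.log(uniqueWithNaN([1, NaN, '1', NaN, 1]));
// → [ 1, NaN, '1' ]
```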
3. Use the includes method
Similar to indexOf, but includes is a new API in ES7 (ES2016):
function unique(arr) {
if (!Array.isArray(arr)) return;
var result = [];
for (var i = 0; i < arr.length; i++) {
if (!result.includes(arr[i])) {
result.push(arr[i])
}
}
return result;
}
console.time('includes method time:');
var res = unique(sourceArr);
console.timeEnd('includes method time:');
// includes method time:: 32.412841796875ms
console.log(res);
// prints an array of length 20: [false, "true", Infinity, true, 0, [], [], {b: 2, a: 1}, {b: 2, a: 1}, {}, {}, "false", "0", null, undefined, {a: 1, b: 2}, {a: 1, b: 2}, NaN, function(){}, function(){}]
Advantages: recognizes repeated NaN values as duplicates.
Disadvantages: requires a newer ES version (ES2016), and is slower than the indexOf method.
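The difference between the two lookups comes down to how each compares values: indexOf uses strict equality, while includes uses the SameValueZero comparison, which matches NaN. A small sketch (the array sample is made up for illustration):

```javascript
const sample = [1, NaN, 'a'];

const viaIndexOf = sample.indexOf(NaN);    // strict equality: NaN is never found
const viaIncludes = sample.includes(NaN);  // SameValueZero: NaN is found

console.log(viaIndexOf, viaIncludes);
// → -1 true

// For every other value in the sample the two methods agree.
console.log(sample.indexOf('a'), sample.includes('a'));
// → 2 true
```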
4. Use filter and indexOf
This method is ingenious: an element is kept only if its current index equals the index that indexOf returns for it, that is, only at its first occurrence:
function unique(arr) {
if (!Array.isArray(arr)) return;
return arr.filter(function(item, index, arr) {
// Keep the element only if its first index in the original array equals the current index
return arr.indexOf(item, 0) === index;
});
}
console.time('filter and indexOf time:');
var res = unique(sourceArr);
console.timeEnd('filter and indexOf time:');
// filter and indexOf time:: 24.135009765625ms
console.log(res);
// prints an array of length 19: [false, "true", Infinity, true, 0, [], [], {b: 2, a: 1}, {b: 2, a: 1}, {}, {}, "false", "0", null, undefined, {a: 1, b: 2}, {a: 1, b: 2}, function(){}, function(){}]
Advantages: higher-order functions keep the code short.
Disadvantages: because indexOf cannot find NaN, every NaN is dropped from the result.
This method is very elegant and needs very little code, but compared with Set-based deduplication it still has this flaw.
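One possible way to keep the filter style without losing NaN is to replace indexOf with findIndex and a comparison that also matches NaN. This is only a sketch: uniqueKeepNaN is a hypothetical name, and findIndex requires ES6, so the compatibility advantage over Set disappears.

```javascript
function uniqueKeepNaN(arr) {
  if (!Array.isArray(arr)) return;
  return arr.filter(function (item, index, self) {
    // Locate the first element equal to item, treating NaN as equal to NaN.
    var firstIndex = self.findIndex(function (other) {
      return other === item || (other !== other && item !== item);
    });
    return firstIndex === index;
  });
}

console.log(uniqueKeepNaN([NaN, 1, NaN, 1, '1']));
// → [ NaN, 1, '1' ]
```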
5. Use reduce + includes
This is also a clever combination of two array methods:
var unique = (arr) => {
if (!Array.isArray(arr)) return;
return arr.reduce((prev,cur) => prev.includes(cur) ? prev : [...prev,cur],[]);
}
console.time('reduce and includes time:');
var res = unique(sourceArr);
console.timeEnd('reduce and includes time:');
// reduce and includes time:: 100.47802734375ms
console.log(res);
// prints an array of length 20: [false, "true", Infinity, true, 0, [], [], {b: 2, a: 1}, {b: 2, a: 1}, {}, {}, "false", "0", null, undefined, {a: 1, b: 2}, {a: 1, b: 2}, NaN, function(){}, function(){}]
Advantages: higher-order functions keep the code short.
Disadvantages: requires a newer ES version; slower.
Also very elegant, but any environment that supports this method can also deduplicate with Set.
6. Use the Map structure
Implement it with a Map:
function unique(arr) {
if (!Array.isArray(arr)) return;
let map = new Map();
let result = [];
for (let i = 0; i < arr.length; i++) {
if (map.has(arr[i])) {
map.set(arr[i], true);
} else {
map.set(arr[i], false);
result.push(arr[i]);
}
}
return result;
}
console.time('Map structure time:');
var res = unique(sourceArr);
console.timeEnd('Map structure time:');
// Map structure time:: 41.483154296875ms
console.log(res);
// prints an array of length 20: [false, "true", Infinity, true, 0, [], [], {b: 2, a: 1}, {b: 2, a: 1}, {}, {}, "false", "0", null, undefined, {a: 1, b: 2}, {a: 1, b: 2}, NaN, function(){}, function(){}]
Compared with Set-based deduplication it takes longer, so it is not recommended.
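The result has the same length as the Set result because Map compares keys the same way: NaN matches NaN, while objects are compared by reference. A minimal sketch (the Map named seen and its values are illustrative):

```javascript
const seen = new Map();

seen.set(NaN, 'first');
seen.set(NaN, 'second');   // same key as before: the value is overwritten

seen.set({}, 'object A');
seen.set({}, 'object B');  // a different object reference: a new key

console.log(seen.size);
// → 3
console.log(seen.get(NaN));
// → second
```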
7. Nested loops, removing duplicates with splice
This is a fairly common approach: traverse the array with two nested loops and remove the repeated elements:
function unique(arr){
if (!Array.isArray(arr)) return;
for(var i = 0; i < arr.length; i++) {
for(var j = i + 1; j< arr.length; j++) {
if(Object.is(arr[i], arr[j])) {// the first equals the second, so splice removes the second
arr.splice(j,1);
j--;
}
}
}
return arr;
}
console.time('Nested loops time:');
var res = unique(sourceArr);
console.timeEnd('Nested loops time:');
// Nested loops time:: 41500.452880859375ms
console.log(res);
// prints an array of length 20: [false, "true", Infinity, true, 0, [], [], {b: 2, a: 1}, {b: 2, a: 1}, {}, {}, "false", "0", null, undefined, {a: 1, b: 2}, {a: 1, b: 2}, NaN, function(){}, function(){}]
Advantages: high compatibility.
Disadvantages: low performance and high time complexity; splice also modifies the input array in place.
Not recommended.
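Because splice removes elements from the array it is called on, this method changes the caller's data. The sketch below repeats the nested-loop logic under the illustrative name uniqueBySplice and shows one way around the mutation, assuming a shallow copy is acceptable.

```javascript
function uniqueBySplice(arr) {
  if (!Array.isArray(arr)) return;
  for (var i = 0; i < arr.length; i++) {
    for (var j = i + 1; j < arr.length; j++) {
      if (Object.is(arr[i], arr[j])) {
        arr.splice(j, 1); // remove the later duplicate in place
        j--;              // re-check the element that shifted into position j
      }
    }
  }
  return arr;
}

var input = [1, 1, 2, NaN, NaN];
var output = uniqueBySplice(input);
console.log(output === input, input.length);
// → true 3

// Passing a shallow copy leaves the original array untouched.
var original = [1, 1, 2, NaN, NaN];
var deduped = uniqueBySplice(original.slice());
console.log(original.length, deduped.length);
// → 5 3
```

When benchmarking several methods against the same sourceArr, the same precaution applies: a method that mutates its input shrinks the data seen by every method that runs after it.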
8. Use the sort method
The idea is very simple: sort the array with the sort method, then loop through it and pick out every element that differs from its neighbor:
function unique(arr) {
if (!Array.isArray(arr)) return;
arr = arr.sort((a, b) => a - b);
var result = [arr[0]];
for (var i = 1; i < arr.length; i++) {
if (arr[i] !== arr[i-1]) {
result.push(arr[i]);
}
}
return result;
}
console.time('sort method time:');
var res = unique(sourceArr);
console.timeEnd('sort method time:');
// sort method time:: 936.071044921875ms
console.log(res);
// array length 357770, remainder omitted
// prints: (357770) [Array(0), Array(0), 0...]
Advantages: none.
Disadvantages: slow, and the sorted order cannot be controlled.
Not recommended: the sort comparator cannot separate values such as the number 0 and the string '0', so equal values do not end up next to each other and a large amount of duplicate data remains.
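A small input makes the failure visible. In the sketch below (the array mixed is made up for illustration) the comparator a - b returns 0 for every pair, so a stable sort, which ES2019 requires, leaves the order unchanged and no two equal values become neighbors.

```javascript
const mixed = [0, '0', 0, '0'];

// 0 - '0', '0' - 0, 0 - 0 and '0' - '0' are all 0, so nothing is reordered.
const sorted = mixed.slice().sort((a, b) => a - b);

// Same adjacent-element comparison as in the method above.
const result = [sorted[0]];
for (let i = 1; i < sorted.length; i++) {
  if (sorted[i] !== sorted[i - 1]) {
    result.push(sorted[i]);
  }
}

console.log(result.length);
// → 4
```

All four elements survive even though the array holds only two distinct values.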
The methods above only target basic data types and do not consider objects, arrays, and functions. Next, let's look at how to deduplicate identical objects.
Objects
The following implementations are similar to the Map approach; they rely on the fact that the keys of an object cannot repeat.
9. Use filter and hasOwnProperty
Use the filter and hasOwnProperty methods:
function unique(arr) {
if (!Array.isArray(arr)) return;
var obj = {};
return arr.filter(function(item, index, arr) {
return obj.hasOwnProperty(typeof item + item) ? false : (obj[typeof item + item] = true)
})
}
console.time('hasOwnProperty method time:');
var res = unique(sourceArr);
console.timeEnd('hasOwnProperty method time:');
// hasOwnProperty method time:: 258.528076171875ms
console.log(res);
// prints an array of length 13: [false, "true", Infinity, true, 0, [], {}, "false", "0", null, undefined, NaN, function(){}]
Advantages: simple code; deduplicates identical objects, arrays, and functions.
Disadvantages: requires ES5 for filter; lower performance, because a string key is built and looked up for every element.
This method relies on object keys being unique to deduplicate objects and arrays. However, the key here is built as type + value, so {a: 1, b: 2} and {} are treated as the same data. This method therefore also has shortcomings.
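Printing the keys shows where the collision comes from. The helper keyOf below is an illustrative stand-in for the expression typeof item + item used in the method above.

```javascript
// Builds the same "type + value" key as the hasOwnProperty method.
const keyOf = (item) => typeof item + item;

console.log(keyOf({}));              // → object[object Object]
console.log(keyOf({ a: 1, b: 2 }));  // → object[object Object]
console.log(keyOf([]));              // → object
console.log(keyOf([1, 2]));          // → object1,2
console.log(keyOf(0), keyOf('0'));   // → number0 string0
```

Every plain object is converted to the string [object Object], so all plain objects share one key, while 0 and '0' stay distinct thanks to the type prefix.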
10. Use the uniqueness of object keys
This method is similar to the Map approach, but the key is composed differently:
function unique(arr) {
if (!Array.isArray(arr)) return;
var result = [];
var obj = {};
for (var i = 0; i < arr.length; i++) {
var key = typeof arr[i] + JSON.stringify(arr[i]) + arr[i];
if (!obj[key]) {
result.push(arr[i]);
obj[key] = 1;
} else {
obj[key]++;
}
}
return result;
}
console.time('Object key method time:');
var res = unique(sourceArr);
console.timeEnd('Object key method time:');
// Object key method time:: 585.744873046875ms
console.log(res);
// prints an array of length 15: [false, "true", Infinity, true, 0, [], {b: 2, a: 1}, {}, "false", "0", null, undefined, {a: 1, b: 2}, NaN, function(){}]
This method is relatively mature and removes duplicate objects and duplicate arrays. However, objects such as {a: 1, b: 2} and {b: 2, a: 1} are not recognized as equal, because JSON.stringify() turns them into {"a":1,"b":2} and {"b":2,"a":1} respectively, so the two computed keys differ. Adding a step that normalizes objects before the key is computed solves this; the code is rewritten as follows:
function isObject(obj) {
return Object.prototype.toString.call(obj) === '[object Object]';
}
function unique(arr) {
if (!Array.isArray(arr)) return;
var result = [];
var obj = {};
for (var i = 0; i < arr.length; i++) {
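// Added here: normalization of arrays and objects
if (Array.isArray(arr[i])) {
arr[i] = arr[i].sort((a, b) => a - b);
}
if (isObject(arr[i])) {
let newObj = {}
// Copy the properties in sorted key order so that key order no longer matters
Object.keys(arr[i]).sort().forEach(key => {
newObj[key]= arr[i][key];
});
arr[i] = newObj;
}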
var key = typeof arr[i] + JSON.stringify(arr[i]) + arr[i];
if (!obj[key]) {
result.push(arr[i]);
obj[key] = 1;
} else {
obj[key]++;
}
}
return result;
}
console.time('Object key method time:');
var res = unique(sourceArr);
console.timeEnd('Object key method time:');
// Object key method time:: 793.142822265625ms
console.log(res);
// prints an array of length 14: [false, "true", Infinity, true, 0, [], {b: 2, a: 1}, {}, "false", "0", null, undefined, NaN, function(){}]
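The version above sorts only the top-level keys and rewrites the elements of the input array. As a possible extension, the key can be computed from a normalized copy that is built recursively, so nested objects are covered and the input stays untouched. This is a sketch, not the article's method: canonicalize and keyFor are hypothetical helpers, and unlike the code above it leaves array order alone, on the assumption that [1, 2] and [2, 1] should count as different values.

```javascript
// Returns a copy in which every plain object has its keys in sorted order.
function canonicalize(value) {
  if (Array.isArray(value)) {
    return value.map(canonicalize); // keep element order, normalize each element
  }
  if (Object.prototype.toString.call(value) === '[object Object]') {
    const sorted = {};
    Object.keys(value).sort().forEach((key) => {
      sorted[key] = canonicalize(value[key]);
    });
    return sorted;
  }
  return value; // primitives and functions are returned unchanged
}

// Same key recipe as the method above, applied to the normalized copy.
function keyFor(value) {
  const normalized = canonicalize(value);
  return typeof normalized + JSON.stringify(normalized) + normalized;
}

console.log(keyFor({ a: { x: 1, y: 2 }, b: 2 }) === keyFor({ b: 2, a: { y: 2, x: 1 } }));
// → true
console.log(keyFor([1, 2]) === keyFor([2, 1]));
// → false
```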
Conclusion
Method | Advantages | Disadvantages |
---|---|---|
ES6 Set | Simple and elegant, fast; recommended for basic data types | Requires ES6; cannot deduplicate structurally identical objects and arrays |
Use indexOf | Available since ES5; highly compatible and easy to understand | Cannot recognize repeated NaN values; NaN requires special handling |
Use the includes method | Recognizes repeated NaN values | Requires a newer ES version (ES2016); slower than indexOf |
Use filter and indexOf | Higher-order functions keep the code short | indexOf cannot find NaN, so NaN is dropped |
Use reduce + includes | Higher-order functions keep the code short | Requires ES7 or later; slower |
Use the Map structure | No significant advantage | Requires ES6 or later |
Nested loops, removing duplicates with splice | High compatibility | Low performance and high time complexity, extremely slow; NaN needs special handling unless Object.is is used for the comparison |
Use the sort method | None | Slow; the sorted order cannot be controlled |
Use filter and hasOwnProperty | Simple code; deduplicates identical objects, arrays, and functions | Requires ES5 for filter; lower performance; all plain objects are treated as the same value |
Use the uniqueness of object keys | Elegant; handles a wide range of data; recommended for objects | The code is more complex |
My ability and experience are limited; corrections are welcome and much appreciated.