During a recent round of code review, I came across some deduplication code that relied on list.contains ...
It made me wonder: do many beginners handle deduplication this way?
So I decided to sort out the options and share them.
Main text
First, create a List of simulated data with 20,000 items in total. Half of them, 10,000 items, are duplicates:
public static List<String> getTestList() {
    List<String> list = new ArrayList<>();
    // "1" to "10000" in ascending order
    for (int i = 1; i <= 10000; i++) {
        list.add(String.valueOf(i));
    }
    // the same values again in descending order, so every value appears twice
    for (int i = 10000; i >= 1; i--) {
        list.add(String.valueOf(i));
    }
    return list;
}
Deduplicating with contains
Let’s first look at the code that uses contains to remove duplicates:
/**
 * Deduplicate using list.contains
 *
 * @param testList the list to deduplicate
 */
private static void useContainDistinct(List<String> testList) {
    System.out.println("contains: starting deduplication, item count: " + testList.size());
    List<String> testListDistinctResult = new ArrayList<>();
    for (String str : testList) {
        // contains scans the result list from the start on every call
        if (!testListDistinctResult.contains(str)) {
            testListDistinctResult.add(str);
        }
    }
    System.out.println("contains: deduplication finished, item count: " + testListDistinctResult.size());
}
Let’s call it and see how long it takes:
public static void main(String[] args) {
    List<String> testList = getTestList();
    // StopWatch is assumed here to be Spring's org.springframework.util.StopWatch
    StopWatch stopWatch = new StopWatch();
    stopWatch.start();
    useContainDistinct(testList);
    stopWatch.stop();
    System.out.println("Deduplication total time (ms): " + stopWatch.getTotalTimeMillis());
}
Time taken: [result screenshot]
Verdict: list.contains is inefficient for deduplication. Know that it exists, but don’t use it for this.
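If Spring's StopWatch is not on the classpath, a rough timing can be taken with the JDK alone. The following is only a sketch under that assumption: the class name TimingSketch and the helpers timeMillis and containsDistinct are hypothetical names introduced for illustration, and a single run like this is noisy (JIT warm-up, garbage collection), so treat the number as an order-of-magnitude hint rather than a benchmark.

```java
import java.util.ArrayList;
import java.util.List;

class TimingSketch {

    // Hypothetical helper: runs the task once and returns elapsed wall-clock milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Same idea as the contains-based method above, returning the result instead of printing it.
    static List<String> containsDistinct(List<String> input) {
        List<String> result = new ArrayList<>();
        for (String s : input) {
            if (!result.contains(s)) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Rebuild the article's test data: 1..10000 followed by 10000..1.
        List<String> data = new ArrayList<>();
        for (int i = 1; i <= 10000; i++) {
            data.add(String.valueOf(i));
        }
        for (int i = 10000; i >= 1; i--) {
            data.add(String.valueOf(i));
        }

        long ms = timeMillis(() -> containsDistinct(data));
        System.out.println("contains-based deduplication took " + ms + " ms");
    }
}
```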
As we all know, a Set contains no duplicate elements, so let’s see how HashSet performs at removing duplicates:
Deduplicating with Set
Note: this relies on the set's add method. The HashSet constructor that takes a collection adds each element in turn.
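/**
 * Deduplicate using a Set
 *
 * @param testList the list to deduplicate
 */
private static void useSetDistinct(List<String> testList) {
    System.out.println("HashSet.add: starting deduplication, item count: " + testList.size());
    // Use the diamond operator instead of the raw HashSet type
    List<String> testListDistinctResult = new ArrayList<>(new HashSet<>(testList));
    System.out.println("HashSet.add: deduplication finished, item count: " + testListDistinctResult.size());
}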
Let’s call it and see how long it takes:
public static void main(String[] args) {
    List<String> testList = getTestList();
    StopWatch stopWatch = new StopWatch();
    stopWatch.start();
    useSetDistinct(testList);
    stopWatch.stop();
    System.out.println("Deduplication total time (ms): " + stopWatch.getTotalTimeMillis());
}
Time taken: [result screenshot]
Verdict: HashSet is efficient. This is the approach I recommend.
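One caveat worth keeping in mind: HashSet makes no promise about iteration order, so the deduplicated list may come back in a different order than the input. If the original order matters, LinkedHashSet is a drop-in alternative that iterates in insertion order. A minimal sketch, where the class and method names are hypothetical and chosen only for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class OrderedSetSketch {

    // Deduplicate while keeping the first occurrence of each element in its original position.
    static List<String> distinctKeepingOrder(List<String> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("3", "1", "3", "2", "1");
        System.out.println(distinctKeepingOrder(input)); // [3, 1, 2]
    }
}
```

The cost is a little extra memory per element for the linked list that records insertion order.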
Why is the time difference so big?
Without further ado, let’s look at the source code:
list.contains(o):
Internally it calls indexOf(o), which walks through the elements one by one until it finds a match.
Time complexity: O(n), where n is the number of elements in the list.
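To make the linear scan concrete, here is a simplified stand-in for what a list's contains and indexOf do. This is a paraphrase written for illustration, not the JDK source; the class LinearScanSketch and its two static methods are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class LinearScanSketch {

    // Check elements one by one; in the worst case every element is visited.
    static int indexOf(List<String> items, String target) {
        for (int i = 0; i < items.size(); i++) {
            String current = items.get(i);
            if (target == null ? current == null : target.equals(current)) {
                return i;
            }
        }
        return -1; // not found
    }

    // contains simply asks whether indexOf found a position.
    static boolean contains(List<String> items, String target) {
        return indexOf(items, target) >= 0;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("1", "2", "3");
        System.out.println(contains(items, "3")); // true
        System.out.println(contains(items, "4")); // false
    }
}
```

Calling this once per input element is what makes the contains-based deduplication quadratic overall.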
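Now let's see what set.add(o) looks like:
It delegates to the put method of the underlying HashMap, which I won’t go over again here. The element is hashed and placed directly into the matching bucket. Time complexity: O(1) on average.
So which one is faster, O(n) or O(1)? The answer is obvious. Over the whole list, the contains approach costs O(n²) in the worst case, while the HashSet approach costs O(n).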
P.S. By the way, a word about the contains method of HashSet.
Its time complexity is also O(1) on average.
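Because the set lookup is constant time on average, the contains-style loop can keep its shape and simply move the membership check from the result list to a HashSet. The sketch below assumes this is acceptable for the data at hand; SeenSetSketch and distinctWithSeenSet are hypothetical names. It uses the fact that Set.add returns false when the element is already present, so no separate contains call is needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SeenSetSketch {

    // Same loop as the contains version, but membership is checked against a HashSet.
    static List<String> distinctWithSeenSet(List<String> input) {
        Set<String> seen = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (String s : input) {
            // add returns true only the first time a value is inserted
            if (seen.add(s)) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "a", "c", "b");
        System.out.println(distinctWithSeenSet(input)); // [a, b, c]
    }
}
```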
Finally, let’s look at a few other ways to deduplicate:
Deduplicating with a double for loop and remove
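/**
 * Deduplicate using a double for loop
 *
 * @param testList the list to deduplicate (modified in place)
 */
private static void use2ForDistinct(List<String> testList) {
    System.out.println("list double loop: starting deduplication, item count: " + testList.size());
    for (int i = 0; i < testList.size(); i++) {
        for (int j = i + 1; j < testList.size(); j++) {
            if (testList.get(i).equals(testList.get(j))) {
                testList.remove(j);
                // step back so the element shifted into index j is not skipped
                j--;
            }
        }
    }
    System.out.println("list double loop: deduplication finished, item count: " + testList.size());
}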
public static void main(String[] args) {
    List<String> testList = getTestList();
    StopWatch stopWatch = new StopWatch();
    stopWatch.start();
    use2ForDistinct(testList);
    stopWatch.stop();
    System.out.println("Deduplication total time (ms): " + stopWatch.getTotalTimeMillis());
}
Time taken: [result screenshot]
Verdict: good to know about, but don’t use it. It is too slow, and the code looks messy.
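If the goal of the double loop is to modify the list in place rather than build a new one, one possible alternative is removeIf combined with a HashSet of values already seen. This is a sketch, not a drop-in recommendation: the names InPlaceSketch and distinctInPlace are hypothetical, and the stateful predicate relies on the list testing each element once, in list order, which is an assumption that holds for ArrayList in the JDK versions I am aware of. It must not be used with parallel processing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class InPlaceSketch {

    // Remove later duplicates from the list itself, keeping the first occurrence of each value.
    static void distinctInPlace(List<String> list) {
        Set<String> seen = new HashSet<>();
        // add returns false for a value seen before, so that element is removed
        list.removeIf(s -> !seen.add(s));
    }

    public static void main(String[] args) {
        // Arrays.asList is fixed-size, so copy into a resizable ArrayList first
        List<String> list = new ArrayList<>(Arrays.asList("a", "a", "a", "b", "a"));
        distinctInPlace(list);
        System.out.println(list); // [a, b]
    }
}
```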
Deduplicating with Stream distinct
/**
 * Deduplicate using Stream
 *
 * @param testList the list to deduplicate
 */
private static void useStreamDistinct(List<String> testList) {
    System.out.println("stream: starting deduplication, item count: " + testList.size());
    List<String> testListDistinctResult = testList.stream().distinct().collect(Collectors.toList());
    System.out.println("stream: deduplication finished, item count: " + testListDistinctResult.size());
}
public static void main(String[] args) {
    List<String> testList = getTestList();
    StopWatch stopWatch = new StopWatch();
    stopWatch.start();
    useStreamDistinct(testList);
    stopWatch.stop();
    System.out.println("Deduplication total time (ms): " + stopWatch.getTotalTimeMillis());
}
Time taken: [result screenshot]
Verdict: not bad. The code is quite concise, which makes it rather tempting.
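A small point in favor of the stream version: for an ordered stream such as one created from a List, distinct() keeps the first occurrence of each element in encounter order, so the result order is predictable. A short sketch to illustrate, with hypothetical class and method names:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StreamDistinctSketch {

    // Sequential stream over a list: duplicates are dropped, first occurrences stay in place.
    static List<String> distinct(List<String> input) {
        return input.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("b", "a", "b", "c", "a");
        System.out.println(distinct(input)); // [b, a, c]
    }
}
```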