How to find duplicate values based on the first 10 digits?

Raj Raichand :

I have a scenario where I have a list like the following:

List<String> a1 = new ArrayList<String>();  
a1.add("1070045028000");
a1.add("1070045028001");
a1.add("1070045052000");
a1.add("1070045086000");
a1.add("1070045052001");
a1.add("1070045089000");

I tried the code below to find duplicate elements, but it compares the whole string instead of only part of it (the first 10 digits).

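Set<String> unique = new HashSet<>();
for (String s : a1) {
    // add() returns false when s is already in the set, i.e. s is a duplicate
    if (!unique.add(s)) {
        System.out.println(s);
    }
}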

Is there any way to identify all duplicates based on the first 10 digits of each number, then find the lowest string among each set of duplicates and add it to another list?

Note: there will always be only 2 duplicates for each 10-digit code.

Holger :

A straightforward loop solution would be:

List<String> a1 = Arrays.asList("1070045028000", "1070045028001",
    "1070045052000", "1070045086000", "1070045052001", "1070045089000");

Set<String> unique = new HashSet<>();
Map<String,String> map = new HashMap<>();

for(String s: a1) {
    String firstTen = s.substring(0, 10);
    if(!unique.add(firstTen)) map.put(firstTen, s);
}
for(String s1: a1) {
    String firstTen = s1.substring(0, 10);
    map.computeIfPresent(firstTen, (k, s2) -> s1.compareTo(s2) < 0? s1: s2);
}
List<String> minDup = new ArrayList<>(map.values());

First, we record every 10-digit prefix that occurs more than once in a Map; then we iterate over the list again and, for each element whose prefix is present in the map, keep the smaller of the stored value and the element.
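For anyone who wants to run the two-pass loop directly, here is a self-contained sketch of it. The class and method names (`TwoPassMinDup`, `minDuplicates`) are hypothetical wrappers, the code assumes every string has at least 10 characters, and it sorts the result only because `HashMap` iteration order is unspecified. Note that `compareTo` compares lexicographically, which matches numeric order here only because all codes have the same length.

```java
import java.util.*;

// Hypothetical wrapper around the two-pass loop from the answer.
// Assumes every input string has at least 10 characters.
class TwoPassMinDup {

    static List<String> minDuplicates(List<String> input) {
        Set<String> seen = new HashSet<>();
        Map<String, String> map = new HashMap<>();

        // Pass 1: remember every prefix that occurs more than once.
        for (String s : input) {
            String firstTen = s.substring(0, 10);
            if (!seen.add(firstTen)) map.put(firstTen, s);
        }
        // Pass 2: for prefixes present in the map, keep the lexicographically smaller value.
        for (String s1 : input) {
            String firstTen = s1.substring(0, 10);
            map.computeIfPresent(firstTen, (k, s2) -> s1.compareTo(s2) < 0 ? s1 : s2);
        }
        List<String> result = new ArrayList<>(map.values());
        Collections.sort(result); // HashMap order is unspecified; sort for stable output
        return result;
    }

    public static void main(String[] args) {
        List<String> a1 = Arrays.asList("1070045028000", "1070045028001",
            "1070045052000", "1070045086000", "1070045052001", "1070045089000");
        System.out.println(minDuplicates(a1)); // [1070045028000, 1070045052000]
    }
}
```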

Alternatively, we can add all elements to a map, grouping them into lists by prefix, and then select the minimum of each list whose size is greater than one:

List<String> minDup = new ArrayList<>();
Map<String,List<String>> map = new HashMap<>();

for(String s: a1) {
    map.computeIfAbsent(s.substring(0, 10), x -> new ArrayList<>()).add(s);
}
for(List<String> list: map.values()) {
    if(list.size() > 1) minDup.add(Collections.min(list));
}
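A runnable sketch of the grouping loop, under the same assumptions as before (hypothetical names `GroupingMinDup` and `minDuplicates`, strings of at least 10 characters). One deliberate change: it uses a `TreeMap` instead of a `HashMap`, so the groups, and therefore the results, come out ordered by prefix. Unlike the two-pass version, it also copes with groups of more than two elements.

```java
import java.util.*;

// Hypothetical wrapper around the "group into lists, then take the minimum" loop.
// Uses a TreeMap (a change from the answer's HashMap) so output is ordered by prefix.
class GroupingMinDup {

    static List<String> minDuplicates(List<String> input) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String s : input) {
            // Create the list for a new prefix on first use, then append the element.
            groups.computeIfAbsent(s.substring(0, 10), x -> new ArrayList<>()).add(s);
        }
        List<String> minDup = new ArrayList<>();
        for (List<String> list : groups.values()) {
            // Only prefixes with more than one entry count as duplicates.
            if (list.size() > 1) minDup.add(Collections.min(list));
        }
        return minDup;
    }

    public static void main(String[] args) {
        List<String> a1 = Arrays.asList("1070045028000", "1070045028001",
            "1070045052000", "1070045086000", "1070045052001", "1070045089000");
        System.out.println(minDuplicates(a1)); // [1070045028000, 1070045052000]
    }
}
```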

This logic is directly expressible with the Stream API:

List<String> minDup = a1.stream()
    .collect(Collectors.groupingBy(s -> s.substring(0, 10)))
    .values().stream()
    .filter(list -> list.size() > 1)
    .map(Collections::min)
    .collect(Collectors.toList());
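The stream pipeline can be tried the same way. This sketch (hypothetical class `StreamMinDup`) passes `TreeMap::new` to the three-argument `Collectors.groupingBy` overload so that the output order is deterministic; apart from that it mirrors the pipeline above.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical wrapper around the Stream-based grouping.
// Assumes every input string has at least 10 characters.
class StreamMinDup {

    static List<String> minDuplicates(List<String> input) {
        // Group by the first 10 characters; TreeMap::new keeps the groups sorted by prefix.
        Map<String, List<String>> groups = input.stream()
            .collect(Collectors.groupingBy(s -> s.substring(0, 10), TreeMap::new, Collectors.toList()));
        return groups.values().stream()
            .filter(list -> list.size() > 1) // only prefixes that occur more than once
            .map(Collections::min)           // smallest string of each group
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> a1 = Arrays.asList("1070045028000", "1070045028001",
            "1070045052000", "1070045086000", "1070045052001", "1070045089000");
        System.out.println(minDuplicates(a1)); // [1070045028000, 1070045052000]
    }
}
```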

Since you said that there will be only 2 duplicates per key, the overhead of collecting a List before selecting the minimum is negligible.
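Taking the "only 2 duplicates per key" guarantee literally, a single pass is enough: the second occurrence of a prefix completes a pair, so the smaller of the two strings can be emitted right away. This is a hypothetical variant, not one of the solutions above, and it relies entirely on that guarantee; if a prefix occurred three or more times, it would emit several values for that prefix. The names `PairMinDup` and `minDuplicates` are illustrative only.

```java
import java.util.*;

// Hypothetical single-pass variant. ASSUMES each prefix occurs at most twice,
// as stated in the question; it is not correct for larger groups.
class PairMinDup {

    static List<String> minDuplicates(List<String> input) {
        Map<String, String> firstSeen = new HashMap<>();
        List<String> minDup = new ArrayList<>();
        for (String s : input) {
            // put() returns the previous value for this prefix, or null if there was none.
            String previous = firstSeen.put(s.substring(0, 10), s);
            if (previous != null) {
                // Second occurrence: the pair is complete, so keep the smaller string.
                minDup.add(previous.compareTo(s) <= 0 ? previous : s);
            }
        }
        return minDup; // ordered by where each pair was completed in the input
    }

    public static void main(String[] args) {
        List<String> a1 = Arrays.asList("1070045028000", "1070045028001",
            "1070045052000", "1070045086000", "1070045052001", "1070045089000");
        System.out.println(minDuplicates(a1)); // [1070045028000, 1070045052000]
    }
}
```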


The solutions above assume that you only want to keep values that have duplicates. If values with a unique prefix should be kept as well (such a prefix simply contributes its single value), you can use

List<String> minDup = a1.stream()
    .collect(Collectors.collectingAndThen(
        Collectors.toMap(s -> s.substring(0, 10), Function.identity(),
            BinaryOperator.minBy(Comparator.<String>naturalOrder())),
        m -> new ArrayList<>(m.values())));

which is equivalent to

Map<String,String> map = new HashMap<>();
for(String s: a1) {
    map.merge(s.substring(0, 10), s, BinaryOperator.minBy(Comparator.naturalOrder()));
}
List<String> minDup = new ArrayList<>(map.values());
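To check the claimed equivalence, this sketch wraps both versions in hypothetical methods (`viaToMap`, `viaMerge`) and compares their results for the sample list. Neither version guarantees an iteration order for the map values, so the results are sorted before being compared.

```java
import java.util.*;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical wrappers around the two "keep unique values too" versions.
class KeepUniqueMin {

    // Stream version: toMap resolves colliding keys with minBy.
    static List<String> viaToMap(List<String> a1) {
        return a1.stream()
            .collect(Collectors.collectingAndThen(
                Collectors.toMap(s -> s.substring(0, 10), Function.identity(),
                    BinaryOperator.minBy(Comparator.<String>naturalOrder())),
                m -> new ArrayList<>(m.values())));
    }

    // Loop version: Map.merge applies the same function when the key is already present.
    static List<String> viaMerge(List<String> a1) {
        Map<String, String> map = new HashMap<>();
        for (String s : a1) {
            map.merge(s.substring(0, 10), s, BinaryOperator.minBy(Comparator.naturalOrder()));
        }
        return new ArrayList<>(map.values());
    }

    public static void main(String[] args) {
        List<String> a1 = Arrays.asList("1070045028000", "1070045028001",
            "1070045052000", "1070045086000", "1070045052001", "1070045089000");
        List<String> x = viaToMap(a1);
        List<String> y = viaMerge(a1);
        Collections.sort(x); // sort both, since map value order is unspecified
        Collections.sort(y);
        System.out.println(x);           // [1070045028000, 1070045052000, 1070045086000, 1070045089000]
        System.out.println(x.equals(y)); // true
    }
}
```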

What these solutions have in common is that you don't have to identify duplicates first: when you want to keep unique values too, the task reduces to selecting the minimum whenever you encounter a duplicate key.
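None of the results above has a guaranteed order, since they come from hash-based maps. If the output should follow the order in which prefixes first appear in the input, one option is the four-argument `Collectors.toMap` overload with `LinkedHashMap::new` as the map supplier. The sketch below assumes that this is the desired order; the names `OrderedMinPerPrefix` and `minPerPrefix` are hypothetical.

```java
import java.util.*;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical variant: minimum per prefix, in order of first appearance of each prefix.
// Assumes every input string has at least 10 characters.
class OrderedMinPerPrefix {

    static List<String> minPerPrefix(List<String> input) {
        Map<String, String> byPrefix = input.stream()
            .collect(Collectors.toMap(
                s -> s.substring(0, 10),                                  // key: first 10 characters
                Function.identity(),                                      // value: the string itself
                BinaryOperator.minBy(Comparator.<String>naturalOrder()), // on collision keep the smaller
                LinkedHashMap::new));                                     // keys in first-encounter order
        return new ArrayList<>(byPrefix.values());
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("2222222222001", "1111111111000", "2222222222000");
        System.out.println(minPerPrefix(sample)); // [2222222222000, 1111111111000]
    }
}
```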
