Filter List for unique elements

John Allison :

I am searching for an elegant way to filter a list for only the elements that are unique. An example:

   [1, 2, 2, 3, 1, 4]
-> [3, 4] // 1 and 2 occur more than once

Most solutions I found manually compute the occurrences of all elements and then filter by the elements that have exactly one occurrence.

That does not sound too elegant to me, maybe there is a better solution, a best practice or a name for a data-structure that solves this already? I was also thinking about maybe utilizing streams, but I do not know how.


Note that I am not asking for duplicate removal, i.e. [1, 2, 3, 4] but for keeping only the unique elements, so [3, 4].

The order of the resulting list or what type of Collection exactly does not matter to me.

Zabuza :

I doubt there is a better approach than actually counting and filtering for the ones that only appeared once. At least, all approaches I can think of will use something similar to that under the hood.

Also, it is not clear what you mean by elegant, readability or performance? So I will just dump some approaches.


Stream counting

Here is a stream-variant that computes number of occurrences (Map) and then filters for elements that appear only once. It is essentially the same as what you described already, or what Bags do under the hood:

List<E> result = elements.stream() // Stream<E>
    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting())) // Map<E, Long>
    .entries() // Set<Entry<E, Long>>
    .stream()  // Stream<Entry<E, Long>>
    .filter(entry -> entry.getValue() == 1)
    .map(Entry::getKey)
    .collect(Collectors.toList());

It requires two full iterations over the data-set. Since it uses the Stream-API, the operations support multi-threading right from the get-go though. So if you have lots of elements, this might be pretty fast due to that.


Manual Set

Here is another approach that reduces iteration and lookup time by manually collecting into a Set to identify duplicates as fast as possible:

Set<E> result = new HashSet<>();
Set<E> appeared = new HashSet<>();

for (E element : elements) {
    if (result.contains(element)) { // 2nd occurrence
        result.remove(element);
        appeared.add(element);
        continue;
    }
    if (appeared.contains(element)) { // >2nd occurrence
        continue;
    }

    result.add(element); // 1st occurrence
}

As you see, this only requires one iteration over the elements instead of multiple.

This approach is elegant in a sense that it does not compute unnecessary information. For what you want, it is completely irrelevant to compute how often exactly elements appear. We only care for "does it appear once or more often?" and not if it appears 5 times or 11 times.

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=323902&siteId=1