How to validate that a Java 8 Stream has two specific elements in it?

Stephane Grenier :

Let's say I have List<Car> and I want to search through that list to verify that I have both a Civic AND a Focus. If it's an OR it's very easy in that I can just apply an OR on the .filter(). Keep in mind that I can't do filter().filter() for this type of AND.

A working solution would be to do:

boolean hasCivic = reportElements.stream()
        .filter(car -> "Civic".equals(car.getModel()))
        .findFirst()
        .isPresent();

boolean hasFocus = reportElements.stream()
        .filter(car -> "Focus".equals(car.getModel()))
        .findFirst()
        .isPresent();

return hasCivic && hasFocus;

But then I'm basically processing the list twice. I can't apply an && in the filter nor can I do filter().filter().

Is there a way to process the stream once to find if the list contains both a Civic and a Focus car?

IMPORTANT UPDATE: The key problem with the solutions provided is that they all guarantee O(n) whereas my solution could be done after just two comparisons. If my list of cars is say 10 million cars then there would be a very significant performance cost. My solution however doesn't feel right, but maybe it is the best solution performance wise...

AJNeufeld :

You could filter the stream on "Civic" or "Focus", and then run a collector on getModel() returning a Set<String>. Then you could test if your set contains both keys.

Set<String> models = reportElements.stream()
       .map(Car::getModel)
       .filter(model -> model.equals("Focus") || model.equals("Civic"))
       .collect(Collectors.toSet());
return models.contains("Focus") && models.contains("Civic");

However, this would process the entire stream; it wouldn't "fast succeed" when both have been found.


The following is a "fast succeed" short-circuiting method. (Updated to incorporate clarifications from the comments below.)

return reportElements.stream()
           .map(Car::getModel)
           .filter(model -> model.equals("Focus") || model.equals("Civic"))
           .distinct()
           .limit(2)
           .count() == 2;

Breaking the stream operations down one at a time, we have:

           .map(Car::getModel)

This operation transforms the stream of cars into a stream of car models. We do this for efficiency. Instead of calling car.getModel() multiple times in various places in the remainder of the pipeline (twice in the filter(...) to test against each of the desired models, and again for the distinct() operation), we apply this mapping operation once. Note that this does not create the "temporary map" mentioned in the comments; it merely translates the car into the car's model for the next stage of the pipeline.

           .filter(model -> model.equals("Focus") || model.equals("Civic"))

This filters the stream of car models, allowing only the "Focus" and "Civic" car models to pass.

           .distinct()

This pipeline operation is a stateful intermediate operation. It remembers each car model that it sees in a temporary Set. (This is likely the "temporary map" mentioned in the comments.) Only if the model is not already in the temporary set will it be (a) added to the set, and (b) passed on to the next stage of the pipeline.
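
To make the "remember what has been seen" behavior concrete, here is a minimal sketch that mimics distinct() with an explicit HashSet. It is only an illustration of the observable behavior on a sequential stream, not a claim about how the JDK implements distinct(); the class and method names are made up for this example.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DistinctSketch {
    // Hand-rolled stand-in for distinct(): Set.add returns true only the
    // first time a value is added, so only first occurrences pass the filter.
    // Assumption: sequential stream only; a plain HashSet is not thread-safe.
    static List<String> firstOccurrences(Stream<String> models) {
        Set<String> seen = new HashSet<>();
        return models.filter(seen::add).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints [Civic, Focus]
        System.out.println(firstOccurrences(Stream.of("Civic", "Civic", "Focus", "Civic")));
    }
}
```

In real code, prefer distinct() itself; the sketch just shows why the stage has to carry state from one element to the next.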

At this point in the pipeline, there can only be at most two elements in the stream: "Focus" or "Civic" or neither or both. We know this because we know the filter(...) will only ever pass those two models, and we know that distinct() will remove any duplicates.

However, the stream pipeline itself does not know that. It would continue to pass car objects to the map stage to be converted into model strings, pass these models to the filter stage, and send on any matching items to the distinct stage. It cannot tell that this is futile, because it doesn't understand that nothing else can pass through the algorithm; it simply executes the instructions.

But we do understand. At most two distinct models can pass through the distinct() stage. So, we follow this with:

           .limit(2)

This is a short-circuiting stateful intermediate operation. It counts the items that pass through and, once the indicated number has been reached, it terminates the stream, so all subsequent items are discarded without even starting down the pipeline.
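
One way to observe the short-circuit on a finite list is to count how many source elements actually get pulled. The sketch below assumes a sequential stream over a plain list of model strings; the class name, method name, and the "Accord" filler values are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class LimitShortCircuit {
    // Returns how many source elements were pulled before the pipeline stopped.
    static int elementsPulled(List<String> models) {
        AtomicInteger pulled = new AtomicInteger();
        models.stream()
              .peek(model -> pulled.incrementAndGet())  // counts every element drawn from the source
              .filter(model -> model.equals("Focus") || model.equals("Civic"))
              .distinct()
              .limit(2)
              .count();
        return pulled.get();
    }

    public static void main(String[] args) {
        List<String> models = new ArrayList<>();
        models.add("Civic");
        models.add("Focus");
        models.addAll(Collections.nCopies(1_000_000, "Accord"));
        System.out.println(elementsPulled(models));
    }
}
```

With both models at the front of the list, a sequential run should report 2 elements pulled, even though a million more follow. If one of the two models is missing, every element is pulled.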

At this point in the pipeline, there can only be at most two elements in the stream: "Focus" or "Civic" or neither or both. But if both, then the stream has been truncated and is at the end.

           .count() == 2;

Count up the number of items that made it through the pipeline, and test against the desired number.

If we found both models, the stream will immediately terminate, count() will return 2, and true will be returned. If both models are not present, of course, the stream is processed until the bitter end, count() will return a value less than two, and false will result.


Example, using an infinite stream of models. Every third model is a "Civic", every 7th model is a "Focus", the remainder are all "Model #":

boolean matched = IntStream.iterate(1, i -> i + 1)
    .mapToObj(i -> i % 3 == 0 ? "Civic" : i % 7 == 0 ? "Focus" : "Model "+i)
    .peek(System.out::println)
    .filter(model -> model.equals("Civic") || model.equals("Focus"))
    .peek(model -> System.out.println("  After filter:   " + model))
    .distinct()
    .peek(model -> System.out.println("  After distinct: " + model))
    .limit(2)
    .peek(model -> System.out.println("  After limit:    " + model))
    .count() == 2;
System.out.println("Matched = "+matched);

Output:

Model 1
Model 2
Civic
  After filter:   Civic
  After distinct: Civic
  After limit:    Civic
Model 4
Model 5
Civic
  After filter:   Civic
Focus
  After filter:   Focus
  After distinct: Focus
  After limit:    Focus
Matched = true

Notice that 3 models got through the filter(), but only 2 made it past distinct() and limit(). More importantly, notice that true was returned long before the end of the infinite stream of models was reached.


Generalizing the solution, since the OP wants something that could work with people, or credit cards, or IP addresses, etc., and the search criteria are probably not a fixed set of two items:

// Set.of requires Java 9+; on Java 8, use new HashSet<>(Arrays.asList("Focus", "Civic")).
Set<String> models = Set.of("Focus", "Civic");

return reportElements.stream()
           .map( Car::getModel )
           .filter( models::contains )
           .distinct()
           .limit( models.size() )
           .count() == models.size();

Here, the models set can hold any number of required car models, not just two. The pipeline returns true only if every model in the set appears in the stream, and it stops as soon as the last missing one is found.
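
For other element types (people, credit cards, IP addresses), the same pipeline could be wrapped in a generic helper. This is only a sketch: containsAllKeys, its parameters, and the minimal Car class are hypothetical names introduced here, and the code sticks to Java 8 APIs.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class ContainsAllSketch {
    // Hypothetical minimal Car; the original post only shows getModel().
    static final class Car {
        private final String model;
        Car(String model) { this.model = model; }
        String getModel() { return model; }
    }

    // True if every key in 'wanted' is produced by at least one item.
    // Stops pulling items once wanted.size() distinct matches have been seen.
    static <T, K> boolean containsAllKeys(Collection<T> items,
                                          Function<? super T, ? extends K> keyOf,
                                          Set<K> wanted) {
        return items.stream()
                .map(keyOf)
                .filter(wanted::contains)
                .distinct()
                .limit(wanted.size())
                .count() == wanted.size();
    }

    public static void main(String[] args) {
        List<Car> cars = Arrays.asList(new Car("Civic"), new Car("Accord"), new Car("Focus"));
        Set<String> wanted = new HashSet<>(Arrays.asList("Focus", "Civic"));
        System.out.println(containsAllKeys(cars, Car::getModel, wanted));  // true
    }
}
```

Passing the key extractor as a function keeps the helper independent of Car; an empty wanted set yields true, since there is nothing left to find.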
