Should I stream multiple times or do all calculations in one stream?

Somaiah Kumbera :

I have the following code:

mostRecentMessageSentDate = messageInfoList
    .stream()
    .findFirst().orElse(new MessageInfo())
    .getSentDate();

unprocessedMessagesCount = messageInfoList
    .stream()
    .filter(messageInfo -> messageInfo.getProcessedDate() == null)
    .count();

hasAttachment = messageInfoList
    .stream()
    .anyMatch(messageInfo -> messageInfo.getAttachmentCount() > 0);

As you can see, I stream the same list three times because I want to find three different values. If I did this in a for-each loop, I could loop just once.

Is it better, performance-wise, to do this in a for loop so that I loop only once? I find the streams much more readable.

Edit: I ran some tests:

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class Main {

public static void main(String[] args) {

    List<Integer> integerList = populateList();

    System.out.println("Stream time: " + timeStream(integerList));
    System.out.println("Loop time: " + timeLoop(integerList));

}

private static List<Integer> populateList() {
    return IntStream.range(0, 10000000)
            .boxed()
            .collect(Collectors.toList());
}

private static long timeStream(List<Integer> integerList) {
    long start = System.currentTimeMillis();

    Integer first = integerList
            .stream()
            .findFirst().orElse(0);

    long containsNumbersGreaterThan10000 = integerList
            .stream()
            .filter(i -> i > 10000)
            .count();

    boolean has10000 = integerList
            .stream()
            .anyMatch(i -> i == 10000);

    long end = System.currentTimeMillis();

    System.out.println("first: " + first);
    System.out.println("containsNumbersGreaterThan10000: " + containsNumbersGreaterThan10000);
    System.out.println("has10000: " + has10000);

    return end - start;
}

private static long timeLoop(List<Integer> integerList) {
    long start = System.currentTimeMillis();

    Integer first = 0;
    boolean has10000 = false;
    int count = 0;
    long containsNumbersGreaterThan10000 = 0L;
    for (Integer i : integerList) {
        if (count == 0) {
            first = i;
        }

        if (i > 10000) {
            containsNumbersGreaterThan10000++;
        }

        if (!has10000 && i == 10000) {
            has10000 = true;
        }

        count++;
    }

    long end = System.currentTimeMillis();

    System.out.println("first: " + first);
    System.out.println("containsNumbersGreaterThan10000: " + containsNumbersGreaterThan10000);
    System.out.println("has10000: " + has10000);

    return end - start;
}
}

As expected, the for loop is always faster than the streams:

first: 0
containsNumbersGreaterThan10000: 9989999
has10000: true
Stream time: 57
first: 0
containsNumbersGreaterThan10000: 9989999
has10000: true
Loop time: 38

But never significantly.

The findFirst was probably a bad example, because it returns as soon as it has seen the first element (or found the stream empty) instead of traversing the list, but I wanted to know if it made a difference.

I was hoping to get a solution that allowed multiple calculations from one stream. IntSummaryStatistics doesn't do exactly what I want. I think I'll heed @florian-schaetz and stick to favouring readability over a marginal performance increase.

Magnilex :

You don't iterate through the collection three times.

mostRecentMessageSentDate = messageInfoList
        .stream()
        .findFirst().orElse(new MessageInfo())
        .getSentDate();

The above only looks at the first element, if there is one, and returns a value depending on that. It doesn't need to go through the whole collection.

unprocessedMessagesCount = messageInfoList
        .stream()
        .filter(messageInfo -> messageInfo.getProcessedDate() == null)
        .count();

This one needs to keep all elements without a processed date and count them, so it goes through the whole collection.

hasAttachment = messageInfoList
        .stream()
        .anyMatch(messageInfo -> messageInfo.getAttachmentCount() > 0);

The above only needs to go through the elements until it finds a message with an attachment.

So, of the three streams, only one is required to go through the whole collection. In the worst case you do the iteration twice (the second and, potentially, the third stream).
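To make the short-circuiting visible, here is a small sketch that counts how many elements each kind of pipeline actually pulls from the list. The class and method names are made up for illustration, and it uses a list of integers instead of MessageInfo objects; peek is only there to count visits.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ShortCircuitDemo {

    // Each method returns how many elements the pipeline pulled from the list.

    static int visitedByFindFirst(List<Integer> list) {
        AtomicInteger visited = new AtomicInteger();
        list.stream()
                .peek(i -> visited.incrementAndGet())
                .findFirst();                      // stops after the first element
        return visited.get();
    }

    static int visitedByFilterCount(List<Integer> list) {
        AtomicInteger visited = new AtomicInteger();
        list.stream()
                .peek(i -> visited.incrementAndGet())
                .filter(i -> i % 2 == 0)
                .count();                          // has to test every element
        return visited.get();
    }

    static int visitedByAnyMatch(List<Integer> list) {
        AtomicInteger visited = new AtomicInteger();
        list.stream()
                .peek(i -> visited.incrementAndGet())
                .anyMatch(i -> i == 3);            // stops at the first match
        return visited.get();
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println("findFirst visited: " + visitedByFindFirst(list));
        System.out.println("filter+count visited: " + visitedByFilterCount(list));
        System.out.println("anyMatch visited: " + visitedByAnyMatch(list));
    }
}
```

On a sequential stream over this five-element list, the three counts should come out as 1, 5 and 3: findFirst stops immediately, filter plus count sees everything, and anyMatch stops at the matching element.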

This could probably be done more efficiently with a regular for-each loop, but do you really need it? If your collection only contains a few objects, I wouldn't bother optimizing it.

However, with a traditional For-Each loop, you could combine the last two streams:

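long unprocessedMessagesCount = 0; // long, to match the type returned by Stream.count()
boolean hasAttachment = false;

for (MessageInfo messageInfo : messageInfoList) {
  // Count every message that has not been processed yet
  if (messageInfo.getProcessedDate() == null) {
    unprocessedMessagesCount++;
  }
  // Once an attachment has been found, skip the check for the remaining messages
  if (!hasAttachment && messageInfo.getAttachmentCount() > 0) {
    hasAttachment = true;
  }
}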

It is really up to you if you think this is a better solution (I also find the streams more readable). I don't see a way to combine the three streams into one, at least not in a more readable way.
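For completeness, one way to get the last two values from a single stream is a custom collector. The sketch below is only an illustration: MessageInfo is a minimal stand-in with the two getters used in the question, and Summary and summarizing() are hypothetical names, not part of any library. Note that a collector cannot short-circuit, so the attachment check here always sees every element, and I would argue the result is not more readable than the loop.

```java
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.stream.Collector;

class SinglePassSummary {

    // Minimal stand-in for the MessageInfo class from the question (assumed shape).
    static class MessageInfo {
        private final Date processedDate;
        private final int attachmentCount;

        MessageInfo(Date processedDate, int attachmentCount) {
            this.processedDate = processedDate;
            this.attachmentCount = attachmentCount;
        }

        Date getProcessedDate() { return processedDate; }
        int getAttachmentCount() { return attachmentCount; }
    }

    // Hypothetical mutable accumulator that holds both results.
    static class Summary {
        long unprocessedMessagesCount;
        boolean hasAttachment;

        // Folds one message into the running totals.
        void accept(MessageInfo messageInfo) {
            if (messageInfo.getProcessedDate() == null) {
                unprocessedMessagesCount++;
            }
            if (messageInfo.getAttachmentCount() > 0) {
                hasAttachment = true;
            }
        }

        // Combines two partial results; only needed for parallel streams.
        Summary merge(Summary other) {
            unprocessedMessagesCount += other.unprocessedMessagesCount;
            hasAttachment |= other.hasAttachment;
            return this;
        }
    }

    static Collector<MessageInfo, Summary, Summary> summarizing() {
        return Collector.of(Summary::new, Summary::accept, Summary::merge);
    }

    public static void main(String[] args) {
        List<MessageInfo> messageInfoList = Arrays.asList(
                new MessageInfo(null, 0),
                new MessageInfo(new Date(), 2),
                new MessageInfo(null, 0));

        Summary summary = messageInfoList.stream().collect(summarizing());
        System.out.println("unprocessedMessagesCount: " + summary.unprocessedMessagesCount);
        System.out.println("hasAttachment: " + summary.hasAttachment);
    }
}
```

With the three sample messages above, two have no processed date and one has attachments, so the summary should report a count of 2 and hasAttachment as true.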
