What is more efficient: sorted stream or sorting a list?

lexicore :

Assume we have some items in a collection and we want to sort them using certain comparator, expecting result in a list:

Collection<Item> items = ...;
Comparator<Item> itemComparator = ...;

One of the approaches is to sort items in a list, something like:

List<Item> sortedItems = new ArrayList<>(items);
Collections.sort(sortedItems, itemComparator);

Anothe approach is using a sorted stream:

List<Item> sortedItems = items
    .stream()
    .sorted(itemComparator)
    .collect(Collectors.toList());

I wonder, which approach is more efficient? Are there any advantages of a sorted stream (like faste sorting on multiple cores)?

Efficient in a sense of runtime complexity/fastest.

I don't trust myself to implement a perfect benchmark and studying SortedOps did not really enlighten me.

Stephen C :

It is safe to say that two forms of sort will have the same complexity ... even without looking at the code. (If they didn't then one form would be severely broken!)

Looking at Java 8 source code for streams (specifically the internal class java.util.stream.SortedOps), the sorted() method adds a component to a stream pipeline that captures all of the stream elements into either an array or an ArrayList.

  • An array is used if and only if the pipeline assembly code can deduce the number of elements in the stream ahead of time.

  • Otherwise, an ArrayList is used to gather the elements to be sorted.

If an ArrayList is used, you incur the extra overhead of building / growing the list.

Then we return to two versions of the code:

List<Item> sortedItems = new ArrayList<>(items);
Collections.sort(sortedItems, itemComparator);

In this version, the ArrayList constructor copies the elements items to an appropriately sized array, and then Collections.sort does an in-place sort of that array. (This happens under the covers).

List<Item> sortedItems = items
    .stream()
    .sorted(itemComparator)
    .collect(Collectors.toList());

In this version, as we have seen above, the code associated with sorted() either builds and sorts an array (equivalent to what happens above) or it builds the ArrayList the slow way. But on top of that, there are the overheads of stream the data from items and to the collector.

Overall (with the Java 8 implementation at least) code examination tells me that first version of the code cannot be slower than the second version, and in most (if not all) cases it will be faster. But as the list gets larger, the O(NlogN) sorting will tend to dominate the O(N) overheads of copying. That will mean that the relative difference between the two versions will get smaller.

If you really care, you should be able to write a benchmark to test the actual difference with a specific implementation of Java, and a specific input dataset. (Or adapt @Eugene's benchmark!)

Guess you like

Origin http://10.200.1.11:23101/article/api/json?id=430042&siteId=1