Detailed explanation of the Java 8 Stream API

Stream class inheritance relationship


Prerequisites

Using the Spliterator interface

Spliterator is an interface introduced in Java 8. It is usually used with streams to traverse and split a sequence of elements.

A Spliterator is needed wherever a stream is used: List, Collection, I/O channels, and so on.

Let's first look at the definition of the stream method in Collection:

default Stream<E> stream() {
        return StreamSupport.stream(spliterator(), false);
    }
default Stream<E> parallelStream() {
        return StreamSupport.stream(spliterator(), true);
    }

We can see that both the parallel and the sequential stream are constructed through StreamSupport.stream(), and both require a Spliterator parameter.

Now that we know where Spliterator is used, let's look at its concrete structure:


The spliterator has four methods that must be implemented, which we will explain in detail next.

tryAdvance
tryAdvance is the method that processes the elements of the stream one at a time: if a remaining element exists, it is processed and the method returns true; otherwise the method returns false.

If we do not want to process the subsequent elements of the stream, we simply return false from tryAdvance. Using this feature, we can interrupt the processing of a stream; an example will appear in a later article.
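As a minimal sketch of this behavior (plain JDK API, names chosen for illustration), tryAdvance can drive a traversal one element at a time:

```java
import java.util.List;
import java.util.Spliterator;

public class TryAdvanceDemo {
    public static void main(String[] args) {
        Spliterator<Integer> sp = List.of(1, 2, 3, 4, 5).spliterator();

        // Each successful call processes exactly one element and returns true;
        // once the elements are exhausted, tryAdvance returns false.
        int processed = 0;
        while (sp.tryAdvance(n -> System.out.println("got " + n))) {
            processed++;
        }
        System.out.println("processed " + processed); // processed 5
    }
}
```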

trySplit
trySplit tries to split the current Spliterator in two. It is generally used with parallelStream, because a parallel stream needs multiple threads to process different elements of the stream; trySplit is the method that splits the elements among them.

Ideally, trySplit should split the stream into two equal parts to maximize performance.

estimateSize
estimateSize returns an estimate of the number of elements still to be processed by this Spliterator. The value generally differs before and after trySplit, as we will see in the concrete example later.

characteristics
characteristics describes the properties of this Spliterator; there are 8 defined characteristics:

public static final int ORDERED    = 0x00000010; // elements have a defined order (each traversal yields the same result)
public static final int DISTINCT   = 0x00000001; // elements are distinct
public static final int SORTED     = 0x00000004; // elements are ordered by some rule (a comparator is specified)
public static final int SIZED      = 0x00000040; // the size is fixed and known
public static final int NONNULL    = 0x00000100; // no element is null
public static final int IMMUTABLE  = 0x00000400; // elements are immutable
public static final int CONCURRENT = 0x00001000; // the source may be safely modified by multiple threads
public static final int SUBSIZED   = 0x00004000; // all child Spliterators from trySplit are SIZED

A Spliterator can have several characteristics at once; they are combined with bitwise OR to produce the final characteristics value.
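For instance (a small sketch using an ArrayList, whose spliterator is ORDERED, SIZED, and SUBSIZED), individual flags can be tested with hasCharacteristics():

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

public class CharacteristicsDemo {
    public static void main(String[] args) {
        Spliterator<String> sp = new ArrayList<>(List.of("a", "b")).spliterator();

        // The characteristics value is a bit mask, so single flags
        // can be checked with hasCharacteristics()
        System.out.println(sp.hasCharacteristics(Spliterator.ORDERED)); // true
        System.out.println(sp.hasCharacteristics(Spliterator.SIZED));   // true
        System.out.println(sp.hasCharacteristics(Spliterator.SORTED));  // false
    }
}
```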

An example
Above we discussed the key methods of Spliterator; now let's look at a concrete example:

@AllArgsConstructor
@Data
public class CustBook {
    private String name;

}

First define a CustBook class and put a name variable in it.

Define a method to generate a list of CustBook:

    public static List<CustBook> generateElements() {
        return Stream.generate(() -> new CustBook("cust book"))
                .limit(1000)
                .collect(Collectors.toList());
    }

We define a call method that invokes tryAdvance and passes in our custom handler; here we set the book's name and append extra information.

    public String call(Spliterator<CustBook> spliterator) {
        int current = 0;
        while (spliterator.tryAdvance(a -> a.setName("test name"
                .concat("- add new name")))) {
            current++;
        }

        return Thread.currentThread().getName() + ":" + current;
    }

Finally, write the test method:

    @Test
    public void useTrySplit(){
        Spliterator<CustBook> split1 = SpliteratorUsage.generateElements().spliterator();
        Spliterator<CustBook> split2 = split1.trySplit();

        log.info("before tryAdvance: {}",split1.estimateSize());
        log.info("Characteristics {}",split1.characteristics());
        log.info(call(split1));
        log.info(call(split2));
        log.info("after tryAdvance {}",split1.estimateSize());
    }

The result of the operation is as follows:

23:10:08.852 [main] INFO com.flydean.SpliteratorUsage - before tryAdvance: 500
23:10:08.857 [main] INFO com.flydean.SpliteratorUsage - Characteristics 16464
23:10:08.858 [main] INFO com.flydean.SpliteratorUsage - main:500
23:10:08.858 [main] INFO com.flydean.SpliteratorUsage - main:500
23:10:08.858 [main] INFO com.flydean.SpliteratorUsage - after tryAdvance 0

The List has a total of 1000 pieces of data. After calling trySplit once, the List is divided into two parts, each with 500 pieces of data.

Note that after the tryAdvance call, estimateSize becomes 0, indicating that all elements have been processed.

Take another look at Characteristics = 16464. Converted to hexadecimal it is 0x4050, which is the bitwise OR of the three flags ORDERED (0x10), SIZED (0x40), and SUBSIZED (0x4000).
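The arithmetic can be verified directly (a throwaway sketch):

```java
import java.util.Spliterator;

public class FlagArithmetic {
    public static void main(String[] args) {
        // ORDERED | SIZED | SUBSIZED = 0x10 | 0x40 | 0x4000
        int flags = Spliterator.ORDERED | Spliterator.SIZED | Spliterator.SUBSIZED;
        System.out.println(flags);                      // 16464
        System.out.println(Integer.toHexString(flags)); // 4050
    }
}
```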

Optional

Create an Optional object

1) An empty Optional object can be created using the static method empty()

Optional<String> empty = Optional.empty();
System.out.println(empty); // Output: Optional.empty

2) You can use the static method of() to create a non-empty Optional object

Optional<String> opt = Optional.of("沉默王二");
System.out.println(opt); // Output: Optional[沉默王二]

Of course, the parameter passed to the of() method must be non-null; if it is null, a NullPointerException will be thrown.

String name = null;
Optional<String> optnull = Optional.of(name); // throws NullPointerException

3) You can use the static method ofNullable() to create an Optional object that may be either empty or non-empty; it accepts null without throwing

String name = null;
Optional<String> optOrNull = Optional.ofNullable(name);
System.out.println(optOrNull); // Output: Optional.empty

Inside the ofNullable() method is a ternary expression: if the parameter is null, the private constant EMPTY is returned; otherwise a new Optional object is created with the new keyword, so no NPE is thrown.
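A rough sketch of what ofNullable() does internally (not the exact JDK source, just the idea):

```java
import java.util.Optional;

public class OfNullableSketch {
    // Equivalent in spirit to Optional.ofNullable(): a ternary that
    // returns the empty Optional for null, and wraps the value otherwise.
    static <T> Optional<T> ofNullable(T value) {
        return value == null ? Optional.empty() : Optional.of(value);
    }

    public static void main(String[] args) {
        System.out.println(ofNullable(null));  // Optional.empty
        System.out.println(ofNullable("abc")); // Optional[abc]
    }
}
```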

Checking whether a value exists

You can use the isPresent() method to check whether an Optional holds a value: it returns true if a value exists and false otherwise, replacing the obj != null check.

Optional<String> opt = Optional.of("沉默王二");
System.out.println(opt.isPresent()); // Output: true

Optional<String> optOrNull = Optional.ofNullable(null);
System.out.println(optOrNull.isPresent()); // Output: false

Since Java 11, the isEmpty() method can be used for the opposite check of isPresent().

Optional<String> opt = Optional.of("沉默王二");
System.out.println(opt.isEmpty()); // Output: false

Optional<String> optOrNull = Optional.ofNullable(null);
System.out.println(optOrNull.isEmpty()); // Output: true

non-null expression

The Optional class has a very modern method, ifPresent(), which lets us execute code in a functional style; I like to call it a non-null expression. Without this method, we would normally have to check that the Optional is non-empty with isPresent() before executing the corresponding code:

Optional<String> optOrNull = Optional.ofNullable(null);
if (optOrNull.isPresent()) {
    System.out.println(optOrNull.get().length());
}

With ifPresent(), things are completely different: we can pass a Lambda expression directly to the method, and the code becomes more concise and intuitive.

Optional<String> opt = Optional.of("沉默王二");
opt.ifPresent(str -> System.out.println(str.length()));

Since Java 9, the method ifPresentOrElse(action, emptyAction) can handle both outcomes: action runs when a value is present, and emptyAction runs when the Optional is empty.

Optional<String> opt = Optional.of("沉默王二");
opt.ifPresentOrElse(str -> System.out.println(str.length()), () -> System.out.println("empty"));

set (get) default value

Sometimes, when obtaining the value of an Optional object, we want a default value for the empty case; this is where the orElse() and orElseGet() methods come in handy.

The orElse() method returns the value wrapped in the Optional if it is non-null; otherwise it returns the given default value. The parameter type of this method is the same as the value type.

String nullName = null;
String name = Optional.ofNullable(nullName).orElse("沉默王二");
System.out.println(name); // Output: 沉默王二

The orElseGet() method is similar to orElse() but takes a different parameter type, a Supplier: if the value in the Optional object is null, the supplier function is executed to produce the default.

String nullName = null;
String name = Optional.ofNullable(nullName).orElseGet(() -> "沉默王二");
System.out.println(name); // Output: 沉默王二

Judging from the output and the shape of the code, the two methods look very similar, which naturally raises the question: did the designers of the Java class library really need both?

Suppose we have a method that produces the default value in a very traditional way.

public static String getDefaultValue() {
    System.out.println("getDefaultValue");
    return "沉默王二";
}

Then, call the getDefaultValue() method to return the default value through the orElse() method and the orElseGet() method respectively.

public static void main(String[] args) {
    String name = null;
    System.out.println("orElse");
    String name2 = Optional.ofNullable(name).orElse(getDefaultValue());

    System.out.println("orElseGet");
    String name3 = Optional.ofNullable(name).orElseGet(OrElseOptionalDemo::getDefaultValue);
}

Note: ClassName::methodName is method-reference syntax introduced in Java 8. There are no parentheses after the method name, which signals that the method will not necessarily be invoked.

The output is as follows:

orElse
getDefaultValue

orElseGet
getDefaultValue

When the value of the Optional object is null, the output is the same and there is little difference between the two. But what if the value of the Optional object is not null?

public static void main(String[] args) {
    String name = "沉默王三";
    System.out.println("orElse");
    String name2 = Optional.ofNullable(name).orElse(getDefaultValue());

    System.out.println("orElseGet");
    String name3 = Optional.ofNullable(name).orElseGet(OrElseOptionalDemo::getDefaultValue);
}

The output is as follows:

orElse
getDefaultValue
orElseGet

Notice that orElseGet() did not call getDefaultValue(). So which method performs better when the value is present? Now you know.

get value

Intuitively and semantically, get() looks like the most natural way to obtain the value of an Optional object, but unfortunately it is flawed: if the value is null, get() throws a NoSuchElementException, which completely defeats the purpose of using the Optional class in the first place.

public class GetOptionalDemo {
    public static void main(String[] args) {
        String name = null;
        Optional<String> optOrNull = Optional.ofNullable(name);
        System.out.println(optOrNull.get());
    }
}

This program throws an exception when run:

Exception in thread "main" java.util.NoSuchElementException: No value present
	at java.base/java.util.Optional.get(Optional.java:141)
	at com.cmower.dzone.optional.GetOptionalDemo.main(GetOptionalDemo.java:9)

Although the exception thrown is NoSuchElementException rather than an NPE, that is hardly an improvement (as the Chinese idiom goes, "fifty steps laughing at a hundred"). It is recommended to use orElseGet() to obtain the value of an Optional object instead.
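If an exception really is the desired behavior for an empty Optional, the orElseThrow(Supplier) method (available since Java 8) at least lets the caller choose a meaningful exception; a small sketch:

```java
import java.util.Optional;

public class OrElseThrowDemo {
    public static void main(String[] args) {
        Optional<String> opt = Optional.ofNullable(null);
        try {
            // The supplier chooses the exception type and message
            opt.orElseThrow(() -> new IllegalArgumentException("name is required"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // name is required
        }
    }
}
```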

filter value

Xiao Wang upgraded the previous code with the Optional class and, once done, happily went to Lao Ma for a new task. Lao Ma thought this young man was sharp and proactive, worth cultivating, so he gave Xiao Wang a new assignment: check the length of the password when a user registers.

After Xiao Wang got the task, he was very happy, because he was just about to learn the filter() method of the Optional class, which came in handy.

public class FilterOptionalDemo {
    public static void main(String[] args) {
        String password = "12345";
        Optional<String> opt = Optional.ofNullable(password);
        System.out.println(opt.filter(pwd -> pwd.length() > 6).isPresent());
    }
}

The parameter type of the filter() method is Predicate (a functional interface added in Java 8), which means a Lambda expression can be passed to the method as a condition. If the predicate evaluates to false, an empty Optional object is returned; otherwise, the Optional object itself is returned.

In the above example, because the length of password is 5, the program prints false. Suppose the password must be between 6 and 10 characters long; one more condition can be added. Let's look at Xiao Wang's code with the added requirement.

Predicate<String> len6 = pwd -> pwd.length() > 6;
Predicate<String> len10 = pwd -> pwd.length() < 10;

password = "1234567";
opt = Optional.ofNullable(password);
boolean result = opt.filter(len6.and(len10)).isPresent();
System.out.println(result);

This time the output of the program is true, because the password becomes 7 digits, between 6 and 10 digits. Imagine how verbose the code would be if Xiao Wang used if-else to accomplish this task.

convert value

After checking the password's length, Xiao Wang still wasn't satisfied: he felt the password's strength should be checked as well. For example, the password cannot be "password"; such a password is too weak. So he began to study the map() method, which converts the original Optional object into a new Optional object according to some rule, leaving the original Optional object unchanged.

Let’s take a look at a simple example written by Xiao Wang:

public class OptionalMapDemo {
    public static void main(String[] args) {
        String name = "沉默王二";
        Optional<String> nameOptional = Optional.of(name);
        Optional<Integer> intOpt = nameOptional
                .map(String::length);
        
        System.out.println( intOpt.orElse(0));
    }
}

In the example above, the String::length argument to map() turns the original String-typed Optional into a new Optional of type Integer holding the string's length.
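A related method is flatMap(): when the mapping function itself returns an Optional, map() would produce a nested Optional<Optional<T>>, while flatMap() flattens it. A small sketch (findNickname is a hypothetical lookup, not from the article):

```java
import java.util.Optional;

public class OptionalFlatMapDemo {
    // Hypothetical lookup that already returns an Optional
    static Optional<String> findNickname(String name) {
        return Optional.of(name + "-nick");
    }

    public static void main(String[] args) {
        Optional<String> name = Optional.of("沉默王二");
        // map() would yield Optional<Optional<String>> here; flatMap() flattens it
        Optional<String> nick = name.flatMap(OptionalFlatMapDemo::findNickname);
        System.out.println(nick.orElse("")); // 沉默王二-nick
    }
}
```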

After figuring out the basic usage of the map() method, Xiao Wang decided to combine map() with filter(): the former converts the password to lowercase, and the latter checks the length and whether it equals "password".

public class OptionalMapFilterDemo {
    public static void main(String[] args) {
        String password = "password";
        Optional<String>  opt = Optional.ofNullable(password);

        Predicate<String> len6 = pwd -> pwd.length() > 6;
        Predicate<String> len10 = pwd -> pwd.length() < 10;
        Predicate<String> eq = pwd -> pwd.equals("password");

        boolean result = opt.map(String::toLowerCase).filter(len6.and(len10).and(eq)).isPresent();
        System.out.println(result);
    }
}

Stream pipeline solution

ReferencePipeline

ReferencePipeline is the structural class that assembles the various operation stages. It defines three inner classes, Head, StatelessOp, and StatefulOp, and implements the interface methods of BaseStream and Stream.

Stream pipeline solution

We can roughly think that some way should be used to record each step of the user's operation, and when the user calls the end operation, the previously recorded operations will be superimposed and executed in one iteration. Along this line of thought, there are several issues to be addressed:

  • How to record the user's operation?

  • How do operations stack up?

  • How to perform operations after superposition?

  • Where are the results (if any) after execution?

How operations are logged

Note that the word "operation" here refers to a Stream intermediate operation. Many Stream operations require a callback function (a Lambda expression), so a complete operation is a triple of <data source, operation, callback function>.

Stream uses the concept of a Stage to describe a complete operation, represented by an instantiation of PipelineHelper. Stages are linked in order to form the entire pipeline. The following shows the inheritance relationship of the Stream-related classes and interfaces.

There are also IntPipeline, LongPipeline, and DoublePipeline, not shown in the figure. These three classes are specialized for the three primitive types (not the wrapper types) and sit parallel to ReferencePipeline.

Head represents the first Stage, i.e. the Stage produced by calling Collection.stream(); obviously this Stage contains no operation. StatelessOp and StatefulOp represent stateless and stateful Stages, corresponding to stateless and stateful intermediate operations.

The schematic diagram of the organizational structure of the Stream pipeline is as follows:

In the figure, Head (stage0) is obtained through Collection.stream(), and each subsequent intermediate operation produces a new Stream. These Stream objects are organized as a doubly linked list, forming the entire pipeline. Since each Stage records the previous Stage plus its own operation and callback function, all operations on the data source can be reconstructed from this structure. This is how Stream records operations.

How operations stack up

The above only solves the problem of recording operations. To make the pipeline do its job, we need a way to chain all the operations together. You might think this is simple: just execute each operation (including its callback function) sequentially from the head of the pipeline.

This sounds feasible, but it overlooks a problem: a Stage does not know what operation the next Stage performs or what its callback function looks like. Only each Stage itself knows how to execute the action it contains. So some kind of protocol is needed to coordinate the calls between adjacent Stages.

This protocol is provided by the Sink interface, whose methods are listed below:

  • void begin(long size): called before traversal of the elements begins, to notify the Sink to get ready.

  • void end(): called after all elements have been traversed, to notify the Sink that there are no more elements.

  • boolean cancellationRequested(): asks whether the operation can finish early, allowing short-circuit operations to end as soon as possible.

  • void accept(T t): called while traversing elements; it accepts an element and processes it. Each Stage encapsulates its own operation and callback into this method, so the upstream Stage only needs to call the downstream Stage's accept(T t).

With this protocol, calls between adjacent Stages become straightforward: each Stage wraps its own operation into a Sink, and the upstream Stage only needs to call the downstream Sink's accept() method, without needing to know how it is handled internally.

Of course, for stateful operations, Sink's begin() and end() methods must also be implemented. For example, Stream.sorted() is a stateful intermediate operation: its Sink.begin() method might create a container for the results, its accept() method adds elements to the container, and finally its end() method sorts the container.

For short-circuit operations, Sink.cancellationRequested() must also be implemented. For example, Stream.findFirst() is a short-circuit operation. As long as an element is found, cancellationRequested() should return true, so that the caller can end the search as soon as possible. The four interface methods of Sink often cooperate with each other to complete computing tasks.

In fact, the essence of the Stream API's internal implementation is how these four Sink methods are overridden.

With Sink wrapping each operation, the problem of calls between Stages is solved. At execution time, we only need to call each Stage's Sink methods {begin(), accept(), cancellationRequested(), end()} in order, from the head of the pipeline toward the data source. A possible flow of the Sink.accept() method is:

void accept(U u) {
    // 1. process u with the callback function wrapped by the current Sink
    // 2. pass the result to the downstream Sink in the pipeline
}

Several other methods of the Sink interface are also implemented according to this [processing -> forwarding] model.
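The model can be illustrated with a much-simplified, hypothetical sink chain (MiniSink and mapStage are illustrative names, not JDK classes): each stage applies its callback, then forwards the result downstream.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

public class MiniSinkDemo {
    // A bare-bones stand-in for Sink: it only has accept()
    interface MiniSink<T> extends Consumer<T> {}

    // Wrap a mapping callback and a downstream sink into a new sink
    static <T, R> MiniSink<T> mapStage(Function<T, R> mapper, MiniSink<R> downstream) {
        return t -> downstream.accept(mapper.apply(t)); // 1. process, 2. forward
    }

    public static void main(String[] args) {
        MiniSink<Integer> terminal = i -> System.out.println(i); // end of the chain
        MiniSink<String> pipeline = mapStage(String::length, terminal);
        List.of("a", "bb", "ccc").forEach(pipeline); // prints 1, 2, 3
    }
}
```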

Let's combine specific examples to see how Stream's intermediate operations package its own operations into Sink and how Sink forwards the processing results to the next Sink. First look at the Stream.map() method:

// Stream.map(): calling this method produces a new Stream
public final <R> Stream<R> map(Function<? super P_OUT, ? extends R> mapper) {
    ...
    return new StatelessOp<P_OUT, R>(this, StreamShape.REFERENCE,
                                 StreamOpFlag.NOT_SORTED | StreamOpFlag.NOT_DISTINCT) {
        @Override /* opWrapSink() returns a Sink wrapping the callback function */
        Sink<P_OUT> opWrapSink(int flags, Sink<R> downstream) {
            return new Sink.ChainedReference<P_OUT, R>(downstream) {
                @Override
                public void accept(P_OUT u) {
                    R r = mapper.apply(u); // 1. process u with the wrapped callback mapper
                    downstream.accept(r);  // 2. pass the result to the downstream Sink
                }
            };
        }
    };
}

The code above may look complicated, but the logic is actually very simple: it wraps the callback function mapper into a Sink. Since Stream.map() is a stateless intermediate operation, the map() method returns a StatelessOp inner-class object (a new Stream); calling the new Stream's opWrapSink() method yields a Sink that wraps the current callback function.

Let's look at a more complex example. The Stream.sorted() method sorts the elements in the Stream. Obviously this is a stateful intermediate operation, because the final order cannot be determined before all elements have been read. Setting the boilerplate aside and going straight to the heart of the problem: how does the sorted() method encapsulate its operation into a Sink? A possible Sink implementation for sorted() looks like this:

// The Sink implementation used by Stream.sorted()
class RefSortingSink<T> extends AbstractRefSortingSink<T> {
    private ArrayList<T> list; // holds the elements to be sorted
    RefSortingSink(Sink<? super T> downstream, Comparator<? super T> comparator) {
        super(downstream, comparator);
    }
    @Override
    public void begin(long size) {
        ...
        // create a list to hold the elements to be sorted
        list = (size >= 0) ? new ArrayList<T>((int) size) : new ArrayList<T>();
    }
    @Override
    public void end() {
        list.sort(comparator); // sorting can only start once all elements have been received
        downstream.begin(list.size());
        if (!cancellationWasRequested) { // the downstream Sink contains no short-circuit operation
            list.forEach(downstream::accept); // 2. pass the results to the downstream Sink
        }
        else { // the downstream Sink contains a short-circuit operation
            for (T t : list) { // ask cancellationRequested() each time whether processing can stop
                if (downstream.cancellationRequested()) break;
                downstream.accept(t); // 2. pass the results to the downstream Sink
            }
        }
        downstream.end();
        list = null;
    }
    @Override
    public void accept(T t) {
        list.add(t); // 1. process t with the current Sink's action: simply add it to the intermediate list
    }
}

The above code perfectly shows how the four interface methods of Sink work together:

  • First, the begin() method tells Sink the number of elements involved in sorting, which is convenient for determining the size of the intermediate result container;

  • Afterwards, elements are added to the intermediate result through the accept() method, and the caller will continue to call this method during final execution until all elements are traversed;

  • Finally, the end() method tells the Sink that all elements have been traversed, start the sorting step, and pass the result to the downstream Sink after the sorting is completed;

  • If the downstream Sink is a short-circuit operation, when the result is passed to the downstream, it is constantly asked whether the downstream cancellationRequested() can end the processing.

How to perform operations after superposition

Sink perfectly encapsulates every Stream operation and provides the [processing -> forwarding] model to chain operations together. The gears are all meshed; the last step is to turn them and start execution.

What sets this chain of operations in motion? You may have guessed: the original driving force is the terminal operation. Once a terminal operation is called, it triggers the execution of the entire pipeline.

No operations can follow a terminal operation, so a terminal operation does not create a new pipeline Stage; intuitively, the pipeline's linked list is not extended any further.

A terminal operation creates a Sink that wraps its own operation, which is also the last Sink in the pipeline. This Sink only needs to process the data; it does not pass results to a downstream Sink (because there is no downstream). In the Sink [processing -> forwarding] model, the terminal operation's Sink is the exit of the call chain.

Let's examine how an upstream Sink finds its downstream Sink. One possible solution is to add a Sink field to PipelineHelper, find the downstream Stage in the pipeline, and access its Sink field.

But the designers of the Stream class library did not do that. Instead they provided the method AbstractPipeline.opWrapSink(int flags, Sink downstream), which returns a new Sink that represents the current Stage's operation and can pass results on to the downstream Sink. Why create a new object rather than return a Sink field?

This is because opWrapSink() combines the current operation with the downstream Sink (the downstream parameter above) into a new Sink. Just imagine: starting from the last Stage of the pipeline and repeatedly calling the previous Stage's opWrapSink() method until the very beginning (excluding stage0, since stage0 represents the data source and contains no operation), we obtain a single Sink representing every operation on the pipeline. In code:

// AbstractPipeline.wrapSink()
// Wraps Sinks from downstream to upstream. If the sink passed in represents
// the terminal operation, the returned Sink represents every operation
// on the pipeline.
final <P_IN> Sink<P_IN> wrapSink(Sink<E_OUT> sink) {
    ...
    for (AbstractPipeline p = AbstractPipeline.this; p.depth > 0; p = p.previousStage) {
        sink = p.opWrapSink(p.previousStage.combinedFlags, sink);
    }
    return (Sink<P_IN>) sink;
}

Now every operation on the pipeline, from beginning to end, is wrapped into a single Sink; executing this Sink is equivalent to executing the whole pipeline. The code that executes the Sink is as follows:

// AbstractPipeline.copyInto(): apply the operations represented by wrappedSink
// to the data represented by spliterator.
final <P_IN> void copyInto(Sink<P_IN> wrappedSink, Spliterator<P_IN> spliterator) {
    ...
    if (!StreamOpFlag.SHORT_CIRCUIT.isKnown(getStreamAndOpFlags())) {
        wrappedSink.begin(spliterator.getExactSizeIfKnown()); // notify: traversal begins
        spliterator.forEachRemaining(wrappedSink);            // iterate
        wrappedSink.end();                                    // notify: traversal finished
    }
    ...
}

The code above first calls wrappedSink.begin() to tell the Sink that data is coming, then calls spliterator.forEachRemaining() to iterate over the data, and finally calls wrappedSink.end() to notify the Sink that processing is over. The logic is that clear.

Where the result goes after execution

The final question: after all the operations on the pipeline have executed, where is the result (if any) that the user needs? First, not every Stream terminal operation returns a result; some exist only for their side effects. For example, printing results with Stream.forEach() is a common side-effect scenario (in fact, scenarios other than printing should avoid side effects). So for terminal operations that genuinely return a result, where is that result stored?

Special note: side effects should not be abused. You might think that collecting elements inside Stream.forEach(), as in the code below, is a good choice; unfortunately, neither the correctness nor the efficiency of that usage can be guaranteed, because the Stream may execute in parallel. Most uses of side effects can be replaced, more safely and efficiently, by a reduction operation.

// the wrong way to collect
ArrayList<String> results = new ArrayList<>();
stream.filter(s -> pattern.matcher(s).matches())
      .forEach(s -> results.add(s));  // Unnecessary use of side-effects!
// the right way to collect
List<String> results =
     stream.filter(s -> pattern.matcher(s).matches())
             .collect(Collectors.toList());  // No side-effects!

Back to the question of where the pipeline's returned results live; this depends on the operation. The following table shows the Stream terminal operations that return a result.

return type          corresponding terminal operations
boolean              anyMatch() allMatch() noneMatch()
Optional             findFirst() findAny()
reduction result     reduce() collect()
array                toArray()
  • For the operations in the table that return boolean or Optional (Optional is a container holding a single value), since a single value is returned, the corresponding Sink only needs to record that value and return it when execution ends.

  • For reduction operations, the final result is placed in a container specified when the operation is invoked (for collect(), the container type is determined by the collector). collect(), reduce(), max(), and min() are all reduction operations; although max() and min() return an Optional, under the hood they are implemented by calling the reduce() method.

  • For operations that return an array, the result naturally ends up in an array; but before the array is finally returned, the result is actually stored in a data structure called Node. Node is a multi-way tree: elements are stored in its leaves, and one leaf node can hold multiple elements. This design makes parallel execution convenient. We will describe the structure of Node in detail in the next section, when we explore how Streams execute in parallel.
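As a quick illustration of the reduction case above (plain public API, no internal classes), reduce() folds the stream into one value, and max() is a reduction too:

```java
import java.util.List;
import java.util.Optional;

public class ReduceDemo {
    public static void main(String[] args) {
        List<Integer> nums = List.of(1, 2, 3, 4);

        // reduce() folds the elements into a single result
        int sum = nums.stream().reduce(0, Integer::sum);

        // max() also returns an Optional and is a reduction under the hood
        Optional<Integer> max = nums.stream().max(Integer::compare);

        System.out.println(sum);            // 10
        System.out.println(max.orElse(-1)); // 4
    }
}
```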

Stream common operations


Four Construction Forms of Streams

Build streams directly from values

@Test
public void streamFromValue() {
    Stream stream = Stream.of(1, 2, 3, 4, 5);

    stream.forEach(System.out::println);
}

Build streams from arrays

@Test
public void streamFromArray() {
    int[] numbers = {1, 2, 3, 4, 5};

    IntStream stream = Arrays.stream(numbers);
    stream.forEach(System.out::println);
}

Generate a stream from a file

@Test
public void streamFromFile() throws IOException {
    // TODO: replace with the full path of a local file
    String filePath = "";

    Stream<String> stream = Files.lines(
            Paths.get(filePath));

    stream.forEach(System.out::println);
}

Generating streams through functions (infinite streams)

@Test
    public void streamFromFunction() {
//        0, 2, 4, 6, 8... continues without bound
//        Stream<Integer> stream = Stream.iterate(0, n -> n + 2);

        Stream<Double> stream = Stream.generate(Math::random);
        stream.limit(100)
                .forEach(System.out::println);
    }
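The commented-out `Stream.iterate` line above also produces an infinite stream; a minimal runnable sketch (class name is illustrative) that bounds it with `limit`:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class IterateDemo {
    public static void main(String[] args) {
        // Take the first five elements of the infinite stream 0, 2, 4, 6, 8...
        List<Integer> evens = Stream.iterate(0, n -> n + 2)
                .limit(5)
                .collect(Collectors.toList());
        System.out.println(evens); // prints [0, 2, 4, 6, 8]
    }
}
```

Without the `limit`, the terminal operation would never finish, so infinite streams must always be truncated.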

Predefined collectors for streams

Collection collector

/**
     * Collection collector
     */
    @Test
    public void toList() {

        List<Sku> list = CartService.getCartSkuList();

        List<Sku> result = list.stream()
                .filter(sku -> sku.getTotalPrice() > 100)

                .collect(Collectors.toList());

        System.out.println(
                JSON.toJSONString(result, true));

    }

Grouping (the return value is Map<K, List<T>>; the key is the grouping criterion, so any number of groups can be produced)

@Test
    public void group() {
        List<Sku> list = CartService.getCartSkuList();

        // Map<grouping key, grouped results>
        Map<Object, List<Sku>> group = list.stream()
                .collect(
                        Collectors.groupingBy(
                                sku -> sku.getSkuCategory()));

        System.out.println(
                JSON.toJSONString(group, true));
    }

Partition (the return value is Map<Boolean, List<T>>; the key is Boolean, so the elements are always split into exactly two groups)

@Test
    public void partition() {
        List<Sku> list = CartService.getCartSkuList();

        Map<Boolean, List<Sku>> partition = list.stream()
                .collect(Collectors.partitioningBy(
                        sku -> sku.getTotalPrice() > 100));

        System.out.println(
                JSON.toJSONString(partition, true));
    }
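A property worth noting: unlike groupingBy, partitioningBy always produces entries for both keys, even when one partition is empty. A minimal sketch (class name is illustrative):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PartitionDemo {
    public static void main(String[] args) {
        // No element is greater than 10, yet the true-partition still exists
        Map<Boolean, List<Integer>> parts = Stream.of(1, 2, 3)
                .collect(Collectors.partitioningBy(n -> n > 10));
        System.out.println(parts.get(false)); // prints [1, 2, 3]
        System.out.println(parts.get(true));  // prints []
    }
}
```

With groupingBy, a group with no elements simply has no key in the result map.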

Common operations with the predefined stream collectors are shown below:

// Accumulate names into a List
List<String> list = people.stream().map(Person::getName).collect(Collectors.toList());

// Accumulate names into a TreeSet
Set<String> set = people.stream().map(Person::getName).collect(Collectors.toCollection(TreeSet::new));

// Convert elements to strings and concatenate them, separated by commas
String joined = things.stream()
                      .map(Object::toString)
                      .collect(Collectors.joining(", "));

// Compute the sum of employee salaries
int total = employees.stream()
                     .collect(Collectors.summingInt(Employee::getSalary));

// Group employees by department
Map<Department, List<Employee>> byDept
    = employees.stream()
               .collect(Collectors.groupingBy(Employee::getDepartment));

// Compute sum of salaries by department
Map<Department, Integer> totalByDept
    = employees.stream()
               .collect(Collectors.groupingBy(Employee::getDepartment,
                                              Collectors.summingInt(Employee::getSalary)));

// Partition students into passing and failing
Map<Boolean, List<Student>> passingFailing =
    students.stream()
            .collect(Collectors.partitioningBy(s -> s.getGrade() >= PASS_THRESHOLD));
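As a small addition to the `joining` example above, `Collectors.joining` also has a three-argument overload that wraps the result in a prefix and suffix (class name is illustrative):

```java
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class JoiningDemo {
    public static void main(String[] args) {
        // delimiter ", ", prefix "[", suffix "]"
        String joined = Stream.of("a", "b", "c")
                .collect(Collectors.joining(", ", "[", "]"));
        System.out.println(joined); // prints [a, b, c]
    }
}
```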

Reduction and aggregation operations (involving parallel execution)

Reduction

@Test
    public void reduceTest() {

        /**
         * Order object
         */
        @Data
        @AllArgsConstructor
        class Order {
            /**
             * Order id
             */
            private Integer id;
            /**
             * Product count
             */
            private Integer productCount;
            /**
             * Total amount spent
             */
            private Double totalAmount;
        }

        /*
            Prepare the data
         */
        ArrayList<Order> list = Lists.newArrayList();
        list.add(new Order(1, 2, 25.12));
        list.add(new Order(2, 5, 257.23));
        list.add(new Order(3, 3, 23332.12));

        /*
            The old way: two separate passes
            1. compute the total product count
            2. compute the total amount spent
         */

        /*
            Aggregate the product count and the total amount in one pass
         */
        Order order = list.stream()
                .parallel()
                .reduce(
                        // identity value
                        new Order(0, 0, 0.0),

                        // accumulator: how two elements are combined
                        (Order order1, Order order2) -> {
                            System.out.println("accumulator invoked!!!");

                            int productCount =
                                    order1.getProductCount()
                                            + order2.getProductCount();

                            double totalAmount =
                                    order1.getTotalAmount()
                                            + order2.getTotalAmount();

                            return new Order(0, productCount, totalAmount);
                        },

                        // combiner: how partial results are merged under parallel execution
                        (Order order1, Order order2) -> {
                            System.out.println("combiner invoked!!!");

                            int productCount =
                                    order1.getProductCount()
                                            + order2.getProductCount();

                            double totalAmount =
                                    order1.getTotalAmount()
                                            + order2.getTotalAmount();

                            return new Order(0, productCount, totalAmount);
                        });

        System.out.println(JSON.toJSONString(order, true));
    }

Output

accumulator invoked!!!
accumulator invoked!!!
accumulator invoked!!!
combiner invoked!!!
combiner invoked!!!
{
    "id":0,
    "productCount":10,
    "totalAmount":23614.469999999998
}
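As a side note, when the goal is only a numeric total, a primitive stream avoids the identity/accumulator/combiner boilerplate of the three-argument reduce. A minimal sketch using the same figures as the example data (class name is illustrative):

```java
import java.util.stream.DoubleStream;
import java.util.stream.IntStream;

public class SumDemo {
    public static void main(String[] args) {
        // Product counts 2, 5, 3 and amounts 25.12, 257.23, 23332.12
        int productCount = IntStream.of(2, 5, 3).sum();
        double totalAmount = DoubleStream.of(25.12, 257.23, 23332.12).sum();
        System.out.println(productCount); // prints 10
        System.out.println(totalAmount);
    }
}
```

Note that DoubleStream.sum may use compensated summation, so its result can differ in the last bits from a naive left-to-right addition.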

Aggregation with collect

@Test
    public void collectTest() {
        /**
         * Order object
         */
        @Data
        @AllArgsConstructor
        class Order {
            /**
             * Order id
             */
            private Integer id;
            /**
             * User account
             */
            private String account;
            /**
             * Product count
             */
            private Integer productCount;
            /**
             * Total amount spent
             */
            private Double totalAmount;
        }

        /*
            Prepare the data
         */
        ArrayList<Order> list = Lists.newArrayList();
        list.add(new Order(1, "zhangxiaoxi", 2, 25.12));
        list.add(new Order(2, "zhangxiaoxi", 5, 257.23));
        list.add(new Order(3, "lisi", 3, 23332.12));

        /*
            Map<user account, Order (aggregated count and amount)>
         */

        Map<String, Order> collect = list.stream()
                .parallel()
                .collect(
                        () -> {
                            System.out.println("supplier: create a container!!!");

                            return new HashMap<String, Order>();
                        },
                        (HashMap<String, Order> map, Order newOrder) -> {
                            System.out.println("accumulator: add an element to the container!!!");

                            /*
                                Two cases: the new element's account either
                                already exists in the map, or it does not
                             */
                            String account = newOrder.getAccount();

                            // If the account already exists, accumulate the new order onto it
                            if (map.containsKey(account)) {
                                Order order = map.get(account);
                                order.setProductCount(
                                        newOrder.getProductCount()
                                                + order.getProductCount());
                                order.setTotalAmount(
                                        newOrder.getTotalAmount()
                                                + order.getTotalAmount());
                            } else {
                                // Otherwise, put the new order into the map directly
                                map.put(account, newOrder);
                            }

                        }, (HashMap<String, Order> map1, HashMap<String, Order> map2) -> {
                            System.out.println("combiner: merge partial containers!!!");

                            map2.forEach((key, value) -> {
                                map1.merge(key, value, (order1, order2) -> {

                                    // NOTE: merge into map1, because collect ultimately returns map1
                                    return new Order(0, key,
                                            order1.getProductCount()
                                                    + order2.getProductCount(),
                                            order1.getTotalAmount()
                                                    + order2.getTotalAmount());
                                });
                            });
                        });

        System.out.println(JSON.toJSONString(collect, true));
    }

Output

supplier: create a container!!!
supplier: create a container!!!
supplier: create a container!!!
accumulator: add an element to the container!!!
accumulator: add an element to the container!!!
accumulator: add an element to the container!!!
combiner: merge partial containers!!!
combiner: merge partial containers!!!
{
    "lisi":{
        "account":"lisi",
        "id":3,
        "productCount":3,
        "totalAmount":23332.12
    },
    "zhangxiaoxi":{
        "account":"zhangxiaoxi",
        "id":0,
        "productCount":7,
        "totalAmount":282.35
    }
}
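The same merge-by-account idea can also be expressed with the predefined Collectors.toMap, whose third argument is a merge function applied when two elements map to the same key. A minimal sketch using plain account/count pairs rather than the full Order object (class name and data are illustrative):

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ToMapDemo {
    public static void main(String[] args) {
        // Each pair is {account, productCount}; duplicate accounts are merged by summing
        Map<String, Integer> countByAccount = Stream.of(
                        new String[]{"zhangxiaoxi", "2"},
                        new String[]{"zhangxiaoxi", "5"},
                        new String[]{"lisi", "3"})
                .collect(Collectors.toMap(
                        pair -> pair[0],
                        pair -> Integer.parseInt(pair[1]),
                        Integer::sum)); // merge function for duplicate keys
        System.out.println(countByAccount.get("zhangxiaoxi")); // prints 7
        System.out.println(countByAccount.get("lisi"));        // prints 3
    }
}
```

Without the merge function, toMap throws IllegalStateException on duplicate keys, so it is required whenever keys can repeat.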

Origin blog.csdn.net/qq_32907491/article/details/131482571