Java Streams (1): API introduction

The java.util.stream package was introduced in Java 8. Its new features are designed to help developers perform series of operations on collections at a higher level of abstraction.
With java.util.stream, we can concisely and declaratively express potentially parallel processing of collections, arrays, and other data sources, and move from external iteration to internal iteration.

higher level abstraction

Consider a problem: suppose we want to collect the students in a class who come from Shaanxi. Before Java 8, we would usually write this:

        public List<Student> isFromShannxi(List<Student> students){
            Objects.requireNonNull(students, "Cannot find students");
            List<Student> shannxiStudents = new ArrayList<>();
            for(Student student : students) {
                if (student.isFrom(Constants.Province.SHANNXI)) {
                    shannxiStudents.add(student);
                }
            }
            return shannxiStudents;
        }

Almost any Java developer could write this in a few minutes, but it requires the same boilerplate every time: create a collection to hold the results, traverse the input collection, filter by the given condition, add each match to the result set, and return it. This kind of code has several disadvantages:
1. It is hard to maintain. Unless you read the whole loop you cannot guess the code's intent, and non-standard naming or missing comments make it worse (imagine the variables above being named xs for student and lzsxdxs for students from Shaanxi). Don't think such code doesn't exist; I have taken over code like this myself, and it is quite painful.
2. It is hard to scale to parallel execution; rewriting a hand-written loop for parallel processing takes considerable effort.

The following implements the same functionality with a stream:

        // requires: import static java.util.stream.Collectors.toList;
        public List<Student> isFromShannxi(Stream<Student> stream){
            Objects.requireNonNull(stream, "Cannot find students");
            return stream
                    .filter(student -> student.isFrom(Constants.Province.SHANNXI))
                    .collect(toList());
        }

From this code the program's intent is clear: filter the stream and collect the filtered result into a list. The concrete List implementation is chosen by the collector, not specified by us. The whole pipeline reads as a description of a single task, with no intermediate variables. Now let's explore what streams offer.

Introduction to stream

The package java.util.stream introduces streams.
Streams differ from collections in the following ways.

  • No storage. A stream is not a data structure that stores elements; instead, it conveys elements from a source such as a data structure, an array, a generator function, or an I/O channel through a pipeline of computational operations.
  • Functional in nature. Operations on a stream produce a result but do not modify the source. For example, filtering a stream obtained from a collection produces a new stream without the filtered-out elements, rather than removing elements from the collection.
  • Lazy evaluation. Many stream operations are evaluated lazily.
  • Eager termination. A terminal operation is invoked after a series of lazy operations to produce the final result.
  • Possibly unbounded. Collections have a finite size, but streams need not.
  • Consumable. The elements of a stream are visited only once during its life cycle. Like an iterator, a new stream must be generated to visit the same elements again.
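To make the consumable property concrete, here is a minimal sketch (the class and method names are mine, not from the original listing): a second terminal operation on the same stream throws IllegalStateException, so a fresh stream must be obtained to traverse the same data again.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class StreamReuse {
    // A stream may be traversed only once; invoking a second terminal
    // operation on it throws IllegalStateException.
    public static boolean isReusable(List<String> source) {
        Stream<String> stream = source.stream();
        stream.count();                 // first (and only allowed) traversal
        try {
            stream.count();             // second traversal is illegal
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isReusable(Arrays.asList("a", "b"))); // prints false
    }
}
```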
Java provides the following ways to obtain a stream.
  • Collection.stream() creates a stream from a collection;
  • Collection.parallelStream() creates a parallel stream from a collection;
  • Arrays.stream(Object[]);
  • The static factory methods Stream.of(Object[]), IntStream.range(int, int), Stream.iterate(Object, UnaryOperator), Stream.empty(), and Stream.generate(Supplier<T>);
  • BufferedReader.lines();
  • Random.ints();
  • BitSet.stream(), Pattern.splitAsStream(java.lang.CharSequence), and JarFile.stream().
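A few of these sources in action; the sketch below (class and method names are mine) builds streams from a collection, a range, an array, and an iteration seed:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class StreamSources {
    // A stream over a primitive array, reduced to a sum.
    public static int sumOfArray(int[] values) {
        return Arrays.stream(values).sum();
    }

    // An infinite iterated stream, made finite with limit().
    public static List<Integer> firstPowersOfTwo(int n) {
        return Stream.iterate(1, x -> x * 2)
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // From a collection
        System.out.println(Arrays.asList("a", "b").stream().count());   // 2
        // From a range
        System.out.println(IntStream.range(0, 3).boxed()
                .collect(Collectors.toList()));                         // [0, 1, 2]
        // From an array and from an iteration seed
        System.out.println(sumOfArray(new int[]{4, 5, 6}));             // 15
        System.out.println(firstPowersOfTwo(4));                        // [1, 2, 4, 8]
    }
}
```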
Intermediate operations on streams (no side effects)
  • filter(Predicate<T>) keeps only the elements that match the predicate
  • map(Function<T, R>) transforms each element with the given function
  • flatMap(Function<T, Stream<R>>) maps each element to a stream and flattens the results into a single stream
  • distinct() removes duplicate elements
  • sorted() sorts the elements in natural order
  • sorted(Comparator<T>) sorts the elements with the given comparator
  • limit(long) truncates the stream to at most the given number of elements
  • skip(long) discards the first n elements of the stream
  • takeWhile(Predicate<T>) (Java 9) keeps elements while the predicate holds, then stops
  • dropWhile(Predicate<T>) (Java 9) drops elements while the predicate holds, then keeps the rest
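Several of these intermediate operations chained into one pipeline; a small sketch (names and data are mine, chosen only for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class IntermediateOps {
    public static List<Integer> pipeline(List<Integer> numbers) {
        return numbers.stream()
                .filter(n -> n % 2 == 0)   // keep even numbers
                .map(n -> n * n)           // square each one
                .distinct()                // drop duplicate squares
                .sorted()                  // natural order
                .limit(3)                  // at most three elements
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // evens 4,2,2,8,6 → squares 16,4,4,64,36 → distinct → sorted → limit
        System.out.println(pipeline(Arrays.asList(5, 4, 2, 2, 8, 6, 3)));
        // prints [4, 16, 36]
    }
}
```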
Terminal operations
  • forEach(Consumer<T>) performs the given action on each element
  • toArray() creates an array from the elements of the stream
  • reduce(...) combines the elements of the stream into a single summary value
  • collect(...) gathers the elements of the stream into a result container
  • min(Comparator<T>) returns the minimum element according to the comparator, if any
  • max(Comparator<T>) returns the maximum element according to the comparator, if any
  • count() returns the number of elements in the stream
  • {any,all,none}Match(Predicate<T>) report whether any/all/none of the elements match the predicate
  • findFirst() returns the first element of the stream, if any
  • findAny() returns some element of the stream, if any
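A quick tour of several terminal operations; a sketch with made-up data (the class and method names are mine):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class TerminalOps {
    // max() with a comparator; an empty stream would yield Optional.empty().
    public static Optional<String> longest(List<String> words) {
        return words.stream().max(Comparator.comparingInt(String::length));
    }

    // A specialized reduction: map each word to its length and sum.
    public static int totalLength(List<String> words) {
        return words.stream().mapToInt(String::length).sum();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("stream", "map", "collect");
        System.out.println(words.stream().count());                       // 3
        System.out.println(words.stream().anyMatch(w -> w.length() < 4)); // true ("map")
        System.out.println(words.stream().findFirst().get());             // stream
        System.out.println(longest(words).get());                         // collect
        System.out.println(totalLength(words));                           // 16
    }
}
```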
Stream operations and pipelines

Stream operations are divided into intermediate and terminal operations, and are combined to form stream pipelines. A stream pipeline consists of a source (such as a Collection, an array, a generator function, or an I/O channel); zero or more intermediate operations, such as Stream.filter or Stream.map; and one terminal operation, such as Stream.forEach or Stream.reduce. Terminal operations such as Stream.forEach or IntStream.sum traverse the stream to produce a result or a side effect. After the terminal operation executes, the stream pipeline is considered consumed and can no longer be used; to traverse the same data source again, you must return to the source and obtain a new stream.
Processing streams lazily enables significant efficiency: filtering, mapping, and summing can be fused into a single pass over the data, with minimal intermediate state. Laziness also avoids examining data when it is not necessary; for an operation like "find the first string longer than 1000 characters", it suffices to examine just enough strings to find one with the desired property, without examining all the strings available from the source. (This matters even more when the input stream is infinite, not merely large.)

Intermediate operations are further divided into stateless and stateful operations. Stateless operations such as filter and map retain no state from previously seen elements when processing a new element; each element can be processed independently of the others. Stateful operations such as distinct and sorted may incorporate state from previously seen elements when processing new ones, and may need to process the entire input before producing any result: sorting a stream, for example, cannot yield any result until every element has been seen. Consequently, under parallel computation, pipelines containing stateful intermediate operations may require multiple passes over the data or may need to buffer significant data, while pipelines containing only stateless intermediate operations can be processed in a single pass, sequentially or in parallel, with minimal data buffering.

Additionally, some operations are considered short-circuiting. An intermediate operation is short-circuiting if, presented with infinite input, it may produce a finite stream; a terminal operation is short-circuiting if, presented with infinite input, it may terminate in finite time. Having a short-circuiting operation in a pipeline is necessary, but not sufficient, for processing an infinite stream to terminate normally in finite time.
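Laziness and short-circuiting can be observed directly: in the sketch below (class and method names are mine), peek() counts how many elements the pipeline actually pulls before findFirst() stops it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class LazyEvaluation {
    // Count how many elements are inspected before findFirst() short-circuits.
    public static int inspectedCount(List<Integer> numbers) {
        AtomicInteger inspected = new AtomicInteger();
        numbers.stream()
                .peek(n -> inspected.incrementAndGet()) // observe each pulled element
                .filter(n -> n > 10)
                .findFirst();                           // stops at the first match
        return inspected.get();
    }

    public static void main(String[] args) {
        // Only three elements are pulled: 3, 7, then 12 (which matches).
        System.out.println(inspectedCount(Arrays.asList(3, 7, 12, 99, 100))); // prints 3
    }
}
```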

Suppose we want to count the students in a class from Shaanxi who took power electronics.
Processing elements with an explicit for-loop is inherently serial. Streams facilitate parallel execution by expressing the computation as a pipeline of aggregate operations rather than as imperative operations on each individual element. All stream operations can execute either serially or in parallel; the stream implementations in the JDK create serial streams unless parallelism is explicitly requested. For example, Collection has the methods Collection.stream() and Collection.parallelStream(), which produce sequential and parallel streams respectively; other stream-bearing methods, such as IntStream.range(int, int), produce sequential streams that can be parallelized efficiently by calling their BaseStream.parallel() method. The serial version of the count looks like this:

        return students.stream()
                .filter(student -> student.isFrom(Constants.Province.SHANNXI))
                .filter(student -> student.getScores().containsKey(Constants.Course.POWER_ELECTRONICS))
                .count();
            

This runs quickly, because a class has only a few dozen students. But if we needed to select, from all the college students in the country, those from Shaanxi who took power electronics, it would take a while. Then we should switch from serial to parallel execution to make full use of multi-core CPU resources. A traditional loop would need its structure rewritten, but with a stream the change is trivial: we just add one line of code. We will analyze the principles and constraints of turning a serial stream into a parallel one later.

        return students.stream()
                .parallel()
                .filter(student -> student.isFrom(Constants.Province.SHANNXI))
                .filter(student -> student.getScores().containsKey(Constants.Course.POWER_ELECTRONICS))
                .count();
            

The only difference between the serial and parallel versions of this example is the added call to parallel() (equivalently, we could have created the initial stream with parallelStream() instead of stream()). When the terminal operation is initiated, the pipeline executes sequentially or in parallel depending on the mode of the stream on which it is invoked. The isParallel() method reports whether a stream will execute serially or in parallel, and the mode can be changed with the BaseStream.sequential() and BaseStream.parallel() operations. Except for operations documented as explicitly nondeterministic, such as findAny(), whether a stream executes sequentially or in parallel should not change the result of the computation. Most stream operations accept parameters that describe user-specified behavior, usually lambda expressions. To preserve correct behavior, these behavioral parameters must be non-interfering and, in most cases, stateless. Such a parameter is always an instance of a functional interface such as Function, and is often a lambda expression or a method reference.
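A small sketch of the mode-switching methods (class and method names are mine); note that for a well-behaved pipeline the mode does not change the result:

```java
import java.util.Arrays;
import java.util.List;

public class ParallelMode {
    // A well-behaved reduction: the result is the same in either mode.
    public static int parallelSum(List<Integer> numbers) {
        return numbers.parallelStream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4);
        // isParallel() reports the mode; sequential()/parallel() switch it.
        System.out.println(numbers.stream().isParallel());             // false
        System.out.println(numbers.parallelStream().isParallel());     // true
        System.out.println(numbers.stream().parallel().isParallel());  // true
        System.out.println(
                numbers.stream().parallel().sequential().isParallel()); // false
        System.out.println(parallelSum(numbers));                      // 10
    }
}
```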

Do not modify the data source

Streams enable you to perform potentially parallel aggregate operations over a variety of data sources, including even non-thread-safe collections such as ArrayList. This is possible only if we can prevent interference with the data source while the stream pipeline is executing. Except for the escape-hatch operations iterator() and spliterator(), execution begins when the terminal operation is invoked and ends when the terminal operation completes. For most data sources, preventing interference means ensuring that the data source is not modified at all during the execution of the stream pipeline. The notable exception is streams whose sources are concurrent collections, which are specifically designed to handle concurrent modification; concurrent stream sources are those whose Spliterator reports the CONCURRENT characteristic. A behavioral parameter is said to interfere with a non-concurrent data source if it modifies, or causes to be modified, that data source. This requirement applies to all pipelines, not just parallel ones; unless the stream source is concurrent, modifying a stream's data source during execution of the pipeline can cause exceptions, incorrect answers, or inconsistent results. For a well-behaved stream source, modifications can be made to the source before the terminal operation begins, and those modifications will be reflected in the stream. For example:

        public static void modifyStream() {
            List<String> list = new ArrayList<>(Arrays.asList("one", "two"));
            Stream<String> stream = list.stream();
            list.add("three");
            System.out.println(stream.collect(Collectors.joining(", ", "[", "]")));
        }

First we create a list of two strings, "one" and "two", then create a stream from that list. Next we modify the list by adding a third string, "three". Finally the elements of the stream are collected and joined together. Since the list was modified before the terminal collect operation began, the result is the string "[one, two, three]". All streams returned from JDK collections, and from most other JDK classes, are well-behaved in this way.

stateless behavior

If the behavioral parameters of a stream operation are stateful, the results of parallel execution may be nondeterministic. A stateful lambda is one whose result depends on state that might change during execution of the pipeline, for example a lambda passed to map() that reads or mutates a shared variable. If such an operation is parallelized, the same input may produce different results from run to run because of thread scheduling, whereas a stateless lambda always produces the same result. So the best practice is to avoid stateful behavioral parameters in stream operations altogether.
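The contrast can be sketched as follows (class and method names are mine). The stateful version below happens to work sequentially, but if its stream were made parallel, the unsynchronized shared list and the dependence on "already seen" elements would make the result, or even normal completion, unpredictable; the stateless version lets the library do the bookkeeping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StatefulLambda {
    // Anti-pattern: the predicate reads and mutates external state (seen).
    // Do not parallelize this; the ArrayList is not thread-safe and the
    // predicate's result depends on what other elements were processed.
    public static List<Integer> statefulDistinct(List<Integer> input) {
        List<Integer> seen = new ArrayList<>();   // shared mutable state
        return input.stream()
                .filter(n -> {
                    if (seen.contains(n)) return false;
                    seen.add(n);
                    return true;
                })
                .collect(Collectors.toList());
    }

    // Stateless alternative: same result, safe in any mode.
    public static List<Integer> statelessDistinct(List<Integer> input) {
        return input.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 2, 3, 3, 3);
        System.out.println(statelessDistinct(input));   // [1, 2, 3]
    }
}
```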

side effect

Side effects of behavioral parameters on stream operations are generally discouraged, as they often lead to inadvertent violations of stateless requirements as well as other thread-safety hazards.
If a behavioral parameter does have side effects, then unless explicitly stated there is no guarantee that those side effects will be visible to other threads, nor that different operations on the "same" element within the same stream pipeline will execute on the same thread. Furthermore, the ordering of those effects can be surprising. Even when a pipeline is constrained to produce a result consistent with the encounter order of the source (e.g., IntStream.range(0,5).parallel().map(x -> x*2).toArray() must produce [0, 2, 4, 6, 8]), no guarantee is made about the order in which the mapper function is applied to individual elements, or about which thread executes a behavioral parameter for a given element. Many computations that might be tempted to use side effects can be expressed more safely and efficiently without them, for instance by using reduction instead of a mutable accumulator. However, side effects such as println() calls for debugging are usually harmless. A small number of stream operations, such as forEach() and peek(), can operate only via side effects; these should be used with care. As an example of how to transform a stream pipeline that inappropriately uses side effects into one that does not, the following code searches a stream of strings for those matching a given regular expression and puts the matches into a list.

         ArrayList<String> results = new ArrayList<>();
         stream.filter(s -> pattern.matcher(s).matches())
               .forEach(s -> results.add(s));  // Unnecessary use of side-effects!

This code uses side effects unnecessarily. A non-thread-safe ArrayList can produce incorrect results if the pipeline executes in parallel, and adding the required synchronization would cause contention that defeats the benefits of parallelism. Moreover, the side effects here are completely unnecessary; the forEach() can simply be replaced with a collect operation that is safer, more efficient, and more amenable to parallelization:

         List<String> results =
             stream.filter(s -> pattern.matcher(s).matches())
                   .collect(Collectors.toList());  // No side-effects!

order

A stream may or may not have a defined encounter order, depending on its source and its intermediate operations. Some stream sources (such as a List or an array) are intrinsically ordered, whereas others (such as a HashSet) are not. Some intermediate operations, such as sorted(), impose an encounter order on an otherwise unordered stream, and others, such as BaseStream.unordered(), may render an ordered stream unordered. Furthermore, some terminal operations, such as forEach(), may ignore encounter order.

If a stream is ordered, most operations are constrained to operate on the elements in their encounter order; if the source of a stream is a List containing [1, 2, 3], then the result of executing map(x -> x*2) must be [2, 4, 6]. However, if the source has no defined encounter order, any permutation of the values [2, 4, 6] would be a valid result.

For parallel streams, relaxing the ordering constraint can sometimes enable more efficient execution. Certain aggregate operations, such as filtering duplicates (distinct()) or grouped reductions (Collectors.groupingBy()), can be implemented more efficiently when the ordering of elements is irrelevant. Similarly, operations intrinsically tied to encounter order, such as limit(), may require buffering to ensure proper ordering, undermining the benefit of parallelism. In cases where the stream has an encounter order but the user does not particularly care about it, explicitly de-ordering the stream with unordered() may improve parallel performance for some stateful or terminal operations. Even under ordering constraints, however, most stream pipelines still parallelize efficiently.
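A brief sketch of these ordering rules (class and method names are mine): an ordered source constrains map() to preserve encounter order even in parallel, while unordered() relinquishes that guarantee where it is not needed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class EncounterOrder {
    // An ordered source (a List) forces the parallel map() to deliver
    // results in encounter order.
    public static List<Integer> doubled(List<Integer> source) {
        return source.parallelStream()
                .map(x -> x * 2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(doubled(Arrays.asList(1, 2, 3)));   // [2, 4, 6]

        // unordered() drops the ordering constraint, which may help
        // parallel performance of operations like distinct(); the set of
        // distinct elements is unchanged, only their order is unspecified.
        long distinctCount = Arrays.asList(1, 1, 2, 2, 3).parallelStream()
                .unordered()
                .distinct()
                .count();
        System.out.println(distinctCount);                     // 3
    }
}
```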

reduce operation

A reduce operation takes a sequence of input elements and combines them into a single summary result by repeatedly applying a combining operation, such as finding the sum or maximum of a set of numbers, or accumulating elements into a list. The stream classes have general reduction operations, the various forms of reduce() and collect(), as well as specialized reduction forms such as sum(), max(), and count(). For example, to sum a list of numbers, the traditional for-loop looks like the first snippet below, and the stream version like the second.

        // for-loop version
        int sum = 0;
        for (int x : numbers) {
            sum += x;
        }
        return sum;

        // stream version
        return numbers.stream()
                .reduce(0, Integer::sum);
mutable reduce

Suppose we want to concatenate the strings in a Collection by processing the elements through a stream, as follows:

        // reduce with String::concat: correct but inefficient
        String result = strings.stream()
                .reduce("", String::concat);

        // mutable reduction into a StringBuilder
        StringBuilder result = strings.stream()
                .collect(StringBuilder::new,
                        (sb, s) -> sb.append(s),
                        (sb, sb2) -> sb.append(sb2));

        // the same, tightened with method references
        StringBuilder result = strings.stream()
                .collect(StringBuilder::new,
                        StringBuilder::append,
                        StringBuilder::append);

The first example above does concatenate the strings, but it is inefficient because each step creates a new String object. We should use a StringBuilder instead; the second and third snippets implement the concatenation with a StringBuilder, and the third tightens the second by using method references. It still feels a bit cumbersome, though. Is there a simpler way? Don't worry: the library designers anticipated this scenario, and Collectors.joining() was designed to solve exactly this problem. Here is an example using the off-the-shelf API:

        String result = strings.stream()
                    .collect(Collectors.joining(""));
            

The joining method has several overloads that accept a delimiter, a prefix, and a suffix, such as Collectors.joining(", ", "[", "]"). If the strings in our test are "Hello" and "World", the result is "[Hello, World]".
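The three joining overloads in one runnable sketch (the class and method names are mine), confirming the bracketed result described above:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class JoiningDemo {
    // joining(delimiter, prefix, suffix): the three-argument overload.
    public static String bracketed(List<String> strings) {
        return strings.stream().collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList("Hello", "World");
        System.out.println(strings.stream().collect(Collectors.joining()));      // HelloWorld
        System.out.println(strings.stream().collect(Collectors.joining(", "))); // Hello, World
        System.out.println(bracketed(strings));                                  // [Hello, World]
    }
}
```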

concluding remarks

The functionality provided by streams is very powerful. The java.util.stream package also contains the Collectors class, a good companion to streams; combining the two achieves even more powerful results in a simpler way. The code listings above can be downloaded from GitHub.

