Java 8 Stream study notes

The fully qualified name of the Stream type is java.util.stream.Stream.

As one of the major highlights of Java 8, Stream is a completely different concept from the InputStream and OutputStream in the java.io package. It is also different from the StAX Stream used for XML parsing, and it is not Amazon Kinesis-style real-time processing of big data. Stream in Java 8 enhances the Collection API: it focuses on performing aggregate operations (bulk data operations) on collections of objects in a very convenient and efficient way. Together with the Lambda expressions that appeared at the same time, the Stream API greatly improves programming efficiency and program readability. It provides both serial and parallel modes for aggregate operations; the concurrent mode takes advantage of multi-core processors, using the fork/join framework to split tasks and accelerate processing. Writing parallel code by hand is usually difficult and error-prone, but with the Stream API you can easily write high-performance concurrent programs without writing a single line of multi-threaded code. So java.util.stream in Java 8 is a product of the combined effects of functional programming and the multi-core era.

A Stream is not a collection of elements and not a data structure; it does not store data. It is about algorithms and computation, more like an advanced version of Iterator. With the original Iterator, the user can only traverse the elements explicitly and perform some operation on each one. With the advanced version, Stream, the user merely specifies what operations to perform on its elements, such as "filter out strings longer than 10 characters" or "get the first letter of each string", and the Stream traverses the data internally and applies the conversions implicitly.
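For instance, the two operations quoted above can be written directly against the Stream API (a small self-contained sketch; the class name and sample data are invented for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamVsIterator {
    // Declarative version of: "filter out strings longer than 10
    // characters", then "get the first letter of each string".
    static List<String> firstLettersOfLongStrings(List<String> input) {
        return input.stream()
                .filter(s -> s.length() > 10)
                .map(s -> s.substring(0, 1))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("short", "averylongstring", "tiny", "anotherlongword");
        System.out.println(firstLettersOfLongStrings(words)); // [a, a]
    }
}
```

Note that no loop over the elements appears anywhere; the Stream performs the traversal internally.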

Like an iterator (Iterator), a Stream is unidirectional and cannot be rewound: the data can only be traversed once. Once traversed it is exhausted, like water flowing past, gone for good.

Unlike an iterator, however, a Stream can be operated on in parallel; an iterator supports only imperative, serial operation. As the name suggests, in serial mode the elements are traversed one after another: each item is read, then the next. In parallel mode the data is divided into several segments, each of which is processed in a different thread, and the results are then combined. Parallel Streams rely on the fork/join framework (JSR166y) introduced in Java 7 to split tasks and accelerate processing. The evolution of Java's parallel APIs is roughly as follows:

  1. java.lang.Thread in 1.0-1.4
  2. java.util.concurrent in 5.0
  3. Phasers and friends in 6.0
  4. The Fork/Join framework in 7.0
  5. Lambda in 8.0
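The serial/parallel contrast described above can be sketched in a few lines; switching a pipeline to parallel execution is a single call, with no explicit thread code (the class name and sample range are illustrative):

```java
import java.util.stream.LongStream;

public class ParallelSum {
    // Serial mode: elements are read one after another on the calling thread.
    static long sequentialSum(long n) {
        return LongStream.rangeClosed(1, n).sum();
    }

    // Parallel mode: the range is split into segments, each segment is
    // summed on a thread of the common fork/join pool, and the partial
    // results are combined -- without a single line of thread code here.
    static long parallelSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    public static void main(String[] args) {
        System.out.println(sequentialSum(1_000_000)); // 500000500000
        System.out.println(parallelSum(1_000_000));   // 500000500000
    }
}
```

Both variants produce the same result; only the execution strategy differs.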

Another major feature of Stream is that the data source itself can be unbounded.

Constructing a Stream

When we use a stream, there are usually three basic steps:

Get a data source (source) → perform data conversion → obtain the desired result. Each conversion leaves the original Stream object unchanged and returns a new Stream object (and there can be multiple conversions), which allows the operations to be arranged in a chain, forming a pipeline, as shown in the figure.

Figure 1. The construction of a stream pipeline (Stream Pipeline)

There are many ways to generate a Stream source:

  • From Collections and arrays
    • Collection.stream()
    • Collection.parallelStream()
    • Arrays.stream(T array) or Stream.of()
  • From a BufferedReader
    • java.io.BufferedReader.lines()
  • From static factories
    • java.util.stream.IntStream.range()
    • java.nio.file.Files.walk()
  • Build your own
    • java.util.Spliterator
  • Others
    • Random.ints()
    • BitSet.stream()
    • Pattern.splitAsStream(java.lang.CharSequence)
    • JarFile.stream()

Stream operations fall into two types:

  • Intermediate: a stream can be followed by zero or more intermediate operations. Their main purpose is to open the stream, perform some level of data mapping/filtering, and return a new stream for the next operation to use. Such operations are lazy: merely calling these methods does not actually start traversing the stream.
  • Terminal: a stream can have only one terminal operation. Once it has been performed, the stream is "used up" and can no longer be operated on, so this must be the last operation on the stream. Executing a terminal operation is what actually starts the traversal of the stream and produces a result or a side effect.

When multiple intermediate operations are applied to a Stream, is each element converted once per operation, so that the time complexity is N (the number of conversions) for-loops, each performing one of the operations? In fact, no: the conversion operations are lazy. Multiple conversion operations are fused at the time of the terminal operation and completed in a single pass. We can understand it simply like this: the Stream holds a set of operation functions, each conversion operation appends its function to this set, and when the terminal operation runs, the Stream is iterated once, applying all of the functions to each element.
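This laziness is easy to observe with a side-effecting counter (an illustrative sketch; the counter exists only to make the single fused pass visible):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyDemo {
    static int calls = 0;

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3);

        // Building the pipeline runs nothing: intermediate ops are lazy.
        Stream<Integer> pipeline = list.stream()
                .filter(n -> { calls++; return n > 1; })
                .map(n -> { calls++; return n * 10; });
        System.out.println(calls); // 0 -- nothing has been traversed yet

        // The terminal operation makes ONE pass; each element goes through
        // filter, then (if it passes) map, before the next element starts.
        List<Integer> out = pipeline.collect(Collectors.toList());
        System.out.println(out);   // [20, 30]
        System.out.println(calls); // 5  (3 filter calls + 2 map calls)
    }
}
```

There is one traversal, not one per operation, which is exactly the fusion described above.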

Another kind of operation is called short-circuiting. It refers to:

  • For an intermediate operation: it accepts an infinite (unbounded) Stream but returns a finite new Stream.
  • For a terminal operation: it accepts an infinite Stream but can compute its result within finite time.

When operating on an infinite Stream and wishing to complete the operation in finite time, having a short-circuiting operation in the pipeline is a necessary, but not sufficient, condition.
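For example (an illustrative sketch): limit() is a short-circuiting intermediate operation and findFirst() a short-circuiting terminal operation, and either lets a pipeline over an unbounded Stream.iterate() source finish in finite time:

```java
import java.util.stream.Stream;

public class ShortCircuitDemo {
    // limit() is a short-circuiting intermediate operation:
    // infinite Stream in, finite Stream out.
    static long sumOfFirstFive() {
        return Stream.iterate(0, x -> x + 1)   // 0, 1, 2, 3, ...
                .limit(5)
                .mapToLong(Integer::longValue)
                .sum();
    }

    // findFirst() is a short-circuiting terminal operation: it stops as
    // soon as one matching element appears, so it terminates even though
    // the source is unbounded.
    static int firstOver100() {
        return Stream.iterate(0, x -> x + 1)
                .filter(x -> x > 100)
                .findFirst()
                .get();
    }

    public static void main(String[] args) {
        System.out.println(sumOfFirstFive()); // 10  (0+1+2+3+4)
        System.out.println(firstOver100());   // 101
    }
}
```

Note the "necessary but not sufficient" caveat: an allMatch on the same source, for instance, might still never return.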

3. An example of stream operations

Here stream() obtains the stream from the source, the list of small items; filter and mapToInt are intermediate operations that filter and convert the data; and the final sum() is the terminal operation, which sums the weights of all the qualifying small items as required.
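The listing itself is not reproduced here; a sketch consistent with the description might look like this (the Widget class, its fields, and the RED filter are assumptions made for illustration):

```java
import java.util.Arrays;
import java.util.List;

public class WidgetSum {
    enum Color { RED, BLUE }

    // Hypothetical "small item" class matching the description above.
    static class Widget {
        final Color color;
        final int weight;
        Widget(Color color, int weight) { this.color = color; this.weight = weight; }
        Color getColor() { return color; }
        int getWeight() { return weight; }
    }

    static int sumOfRedWeights(List<Widget> widgets) {
        return widgets.stream()                         // source
                .filter(w -> w.getColor() == Color.RED) // intermediate: filter
                .mapToInt(Widget::getWeight)            // intermediate: convert
                .sum();                                 // terminal: aggregate
    }

    static int demo() {
        return sumOfRedWeights(Arrays.asList(
                new Widget(Color.RED, 3),
                new Widget(Color.BLUE, 5),
                new Widget(Color.RED, 7)));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 10
    }
}
```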

Using streams in detail

Simply put, using a Stream means implementing a filter-map-reduce process that produces a final result or causes a side effect.

Specific usage

1. Common ways to create a stream

1.1 The stream() and parallelStream() methods on Collection

List<String> list = new ArrayList<>();
Stream<String> stream = list.stream(); // get a sequential stream
Stream<String> parallelStream = list.parallelStream(); // get a parallel stream

1.2 The Arrays.stream() method, which turns an array into a stream

Integer[] nums = new Integer[10];
Stream<Integer> stream = Arrays.stream(nums);

1.3 Static methods on Stream: of(), iterate(), generate()

Stream<Integer> stream1 = Stream.of(1,2,3,4,5);
/**
 * The first argument to iterate is the seed; the second describes how each
 * element value is generated: every element after position 0 takes the
 * previous element value as its input parameter (x).
 * limit restricts the length of the Stream.
 */
Stream<Integer> stream2 = Stream.iterate(0, x -> x + 2).limit(6);
stream2.forEach(System.out::println);//0 2 4 6 8 10
		
Stream<Double> stream3 = Stream.generate(Math::random).limit(2);
stream3.forEach(System.out::println);// e.g. 0.4196684476746345 0.9268584030269439 (random)

1.4 The BufferedReader.lines() method, which turns each line of a file into a stream

BufferedReader reader = new BufferedReader(new FileReader("F:\\test_stream.txt"));
Stream<String> lineStream = reader.lines();
lineStream.forEach(System.out::println);

1.5 The Pattern.splitAsStream() method, which splits a string into a stream

Pattern pattern = Pattern.compile(",");
Stream<String> stringStream = pattern.splitAsStream("a,b,c,d");
stringStream.forEach(System.out::println);

2. Intermediate operations on streams

2.1 Filtering and slicing
        filter: filters out certain elements of the stream
        limit(n): takes the first n elements
        skip(n): skips the first n elements; combined with limit(n), this can implement paging
        distinct: removes duplicate elements from the stream via their hashCode() and equals()

Stream<Integer> stream = Stream.of(6, 4, 6, 7, 3, 9, 8, 10, 12, 14, 14);
 
Stream<Integer> newStream = stream.filter(s -> s > 5) //6 6 7 9 8 10 12 14 14
        .distinct() //6 7 9 8 10 12 14
        .skip(2) //9 8 10 12 14
        .limit(2); //9 8
newStream.forEach(System.out::println);

2.2 Mapping
        map: receives a function as a parameter; the function is applied to each element, mapping it to a new element.
        flatMap: receives a function as a parameter; each value in the stream is replaced with another stream, and all the streams are then concatenated into a single stream.

List<String> list = Arrays.asList("a,b,c", "1,2,3");
 
//convert each element into a new element without commas
Stream<String> s1 = list.stream().map(s -> s.replaceAll(",", ""));
s1.forEach(System.out::println); // abc  123
 
Stream<String> s3 = list.stream().flatMap(s -> {
//convert each element into a stream
    String[] split = s.split(",");
    Stream<String> s2 = Arrays.stream(split);
    return s2;
});
s3.forEach(System.out::println); // a b c 1 2 3

2.3 Sorting
        sorted(): natural ordering; the stream's elements must implement the Comparable interface
        sorted(Comparator com): custom ordering via a custom Comparator

List<String> list = Arrays.asList("aa", "ff", "dd");
//the String class itself implements the Comparable interface
list.stream().sorted().forEach(System.out::println);// aa dd ff
 
Student s1 = new Student("aa", 10);
Student s2 = new Student("bb", 20);
Student s3 = new Student("aa", 30);
Student s4 = new Student("dd", 40);
List<Student> studentList = Arrays.asList(s1, s2, s3, s4);
 
//custom sort: ascending by name first; if names are equal, ascending by age
studentList.stream().sorted(
        (o1, o2) -> {
            if (o1.getName().equals(o2.getName())) {
                return o1.getAge() - o2.getAge();
            } else {
                return o1.getName().compareTo(o2.getName());
            }
        }
).forEach(System.out::println);

2.4 Consuming
        peek: like map, it can access each element of the stream. The difference is that map receives a Function, which has a return value, whereas peek receives a Consumer, which has none.

Student s1 = new Student("aa", 10);
Student s2 = new Student("bb", 20);
List<Student> studentList = Arrays.asList(s1, s2);
 
studentList.stream()
        .peek(o -> o.setAge(100))
        .forEach(System.out::println);   
 
//result:
Student{name='aa', age=100}
Student{name='bb', age=100}            

Summary: peek receives a lambda expression without a return value; it can do some output or other external processing. map receives a lambda expression with a return value, after which the Stream's generic type parameter is converted to the return type of the lambda expression.
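The contrast can be made concrete with a small sketch (class name and sample data invented for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PeekVsMap {
    // map: the Function's return value replaces the element, so the
    // element type can change (String -> Integer here).
    static List<Integer> lengths(List<String> words) {
        return words.stream().map(String::length).collect(Collectors.toList());
    }

    // peek: the Consumer returns nothing; elements pass through unchanged,
    // so the element type stays String.
    static List<String> logged(List<String> words) {
        return words.stream()
                .peek(s -> System.out.println("saw: " + s))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "ccc");
        System.out.println(lengths(words)); // [1, 2, 3]
        System.out.println(logged(words));  // [a, bb, ccc]
    }
}
```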

3. Terminal operations on streams

3.1 Matching and aggregation operations
        allMatch: receives a Predicate; returns true if every element of the stream matches the predicate, otherwise false
        noneMatch: receives a Predicate; returns true if no element of the stream matches the predicate, otherwise false
        anyMatch: receives a Predicate; returns true if at least one element of the stream matches the predicate, otherwise false
        findFirst: returns the first element of the stream
        findAny: returns any element of the stream
        count: returns the total number of elements in the stream
        max: returns the maximum element of the stream
        min: returns the minimum element of the stream

List<Integer> list = Arrays.asList(1, 2, 3, 4, 5);
 
boolean allMatch = list.stream().allMatch(e -> e > 10); //false
boolean noneMatch = list.stream().noneMatch(e -> e > 10); //true
boolean anyMatch = list.stream().anyMatch(e -> e > 4);  //true
 
Integer findFirst = list.stream().findFirst().get(); //1
Integer findAny = list.stream().findAny().get(); //1
 
long count = list.stream().count(); //5
Integer max = list.stream().max(Integer::compareTo).get(); //5
Integer min = list.stream().min(Integer::compareTo).get(); //1

3.2 Reduction operations
        Optional<T> reduce(BinaryOperator<T> accumulator): on the first execution, the accumulator's first parameter is the first element of the stream and its second parameter is the second element; on the second execution, the first parameter is the result of the first execution and the second parameter is the third element of the stream; and so on.
        T reduce(T identity, BinaryOperator<T> accumulator): same process as above, except that on the first execution the accumulator's first parameter is identity and its second parameter is the first element of the stream.
        <U> U reduce(U identity, BiFunction<U, ? super T, U> accumulator, BinaryOperator<U> combiner): on a sequential stream (stream()), this behaves the same as the second method, i.e. the third parameter, combiner, has no effect. On a parallel stream (parallelStream()), the stream is forked across multiple threads; each thread runs the second form, reduce(identity, accumulator), on its own segment, and the combiner function then merges the per-thread results, much as the first form, reduce(accumulator), would reduce them if they formed a new stream.
 

//observed in testing: with fewer than 24 elements, the parallel run used as many threads as elements; with 24 or more, it used 16 threads
List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24);
 
Integer v = list.stream().reduce((x1, x2) -> x1 + x2).get(); // equivalent to summing all the numbers
System.out.println(v);   // 300
 
Integer v1 = list.stream().reduce(10, (x1, x2) -> x1 + x2);
System.out.println(v1);  //310
 
Integer v2 = list.stream().reduce(0,
        (x1, x2) -> {
            System.out.println("stream accumulator: x1:" + x1 + "  x2:" + x2);
            return x1 - x2;
        },
        (x1, x2) -> {
            System.out.println("stream combiner: x1:" + x1 + "  x2:" + x2);
            return x1 * x2;
        });
System.out.println(v2); // -300
 
Integer v3 = list.parallelStream().reduce(0,
        (x1, x2) -> {
            System.out.println("parallelStream accumulator: x1:" + x1 + "  x2:" + x2);
            return x1 - x2;
        },
        (x1, x2) -> {
            System.out.println("parallelStream combiner: x1:" + x1 + "  x2:" + x2);
            return x1 * x2;
        });
System.out.println(v3); // 197474048 (the value depends on how the work is split across threads)

3.3 Collecting operations
        collect: receives a Collector instance and collects the stream's elements into another data structure.
        Collector<T, A, R> is an interface with the following five abstract methods:
            Supplier<A> supplier(): creates a result container A.
            BiConsumer<A, T> accumulator(): a consumer interface; the first parameter is the container A, the second is a stream element T.
            BinaryOperator<A> combiner(): a function interface; like the combiner parameter of reduce mentioned above, it merges the results of the sub-tasks that run in parallel on the stream (the containers A that the accumulator function operated on).
            Function<A, R> finisher(): a function interface; the parameter is the container A, and the return value is R, the final result type that collect should produce.
            Set<Characteristics> characteristics(): returns an immutable Set describing the Collector's characteristics. There are three:
                CONCURRENT: indicates that this collector supports concurrent collection. (The official documentation adds further caveats that we will not explore here.)
                UNORDERED: indicates that the collecting operation does not preserve the stream's original encounter order.
                IDENTITY_FINISH: indicates that the finisher is simply the identity function and can be skipped.
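These five pieces can also be supplied directly through the Collector.of factory method; below is a minimal sketch that hand-builds an equivalent of Collectors.joining(",") (the joining behavior is purely an illustrative choice):

```java
import java.util.StringJoiner;
import java.util.stream.Collector;
import java.util.stream.Stream;

public class CustomCollectorDemo {
    // A hand-rolled collector built from the pieces described above:
    // supplier, accumulator, combiner, finisher (characteristics default
    // to the empty set, so the finisher is always applied).
    static final Collector<String, StringJoiner, String> JOIN =
            Collector.of(
                    () -> new StringJoiner(","),  // supplier: result container A
                    StringJoiner::add,            // accumulator: fold element T into A
                    StringJoiner::merge,          // combiner: merge partial containers (parallel case)
                    StringJoiner::toString);      // finisher: A -> final result R

    static String join(Stream<String> s) {
        return s.collect(JOIN);
    }

    public static void main(String[] args) {
        System.out.println(join(Stream.of("a", "b", "c"))); // a,b,c
    }
}
```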

3.3.1 The Collectors utility library

Student s1 = new Student("aa", 10,1);
Student s2 = new Student("bb", 20,2);
Student s3 = new Student("cc", 10,3);
List<Student> list = Arrays.asList(s1, s2, s3);
 
//collect into a List
List<Integer> ageList = list.stream().map(Student::getAge).collect(Collectors.toList()); // [10, 20, 10]
 
//collect into a Set
Set<Integer> ageSet = list.stream().map(Student::getAge).collect(Collectors.toSet()); // [20, 10]
 
//collect into a Map; note: keys must not collide, otherwise an exception is thrown
Map<String, Integer> studentMap = list.stream().collect(Collectors.toMap(Student::getName, Student::getAge)); // {cc=10, bb=20, aa=10}
 
//join strings with a separator, prefix, and suffix
String joinName = list.stream().map(Student::getName).collect(Collectors.joining(",", "(", ")")); // (aa,bb,cc)
 
//aggregation operations
//1. total number of students
Long count = list.stream().collect(Collectors.counting()); // 3
//2. maximum age (minBy works the same way for the minimum)
Integer maxAge = list.stream().map(Student::getAge).collect(Collectors.maxBy(Integer::compare)).get(); // 20
//3. sum of everyone's ages
Integer sumAge = list.stream().collect(Collectors.summingInt(Student::getAge)); // 40
//4. average age
Double averageAge = list.stream().collect(Collectors.averagingDouble(Student::getAge)); // 13.333333333333334
// all of the above statistics at once
DoubleSummaryStatistics statistics = list.stream().collect(Collectors.summarizingDouble(Student::getAge));
System.out.println("count:" + statistics.getCount() + ",max:" + statistics.getMax() + ",sum:" + statistics.getSum() + ",average:" + statistics.getAverage());
 
//grouping
Map<Integer, List<Student>> ageMap = list.stream().collect(Collectors.groupingBy(Student::getAge));
//multi-level grouping: first by type, then by age
Map<Integer, Map<Integer, List<Student>>> typeAgeMap = list.stream().collect(Collectors.groupingBy(Student::getType, Collectors.groupingBy(Student::getAge)));
 
//partitioning
//split into two parts: one with age > 10, one with age <= 10
Map<Boolean, List<Student>> partMap = list.stream().collect(Collectors.partitioningBy(v -> v.getAge() > 10));
 
//reduction
Integer allAge = list.stream().map(Student::getAge).collect(Collectors.reducing(Integer::sum)).get(); //40

3.3.2 Collectors.toList() analyzed

//toList source code
public static <T> Collector<T, ?, List<T>> toList() {
    return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new, List::add,
            (left, right) -> {
                left.addAll(right);
                return left;
            }, CH_ID);
}
 
//to understand it better, let's expand the lambda expressions from the source
public <T> Collector<T, ?, List<T>> toList() {
    Supplier<List<T>> supplier = () -> new ArrayList<>();
    BiConsumer<List<T>, T> accumulator = (list, t) -> list.add(t);
    BinaryOperator<List<T>> combiner = (list1, list2) -> {
        list1.addAll(list2);
        return list1;
    };
    Function<List<T>, List<T>> finisher = (list) -> list;
    Set<Collector.Characteristics> characteristics = Collections.unmodifiableSet(EnumSet.of(Collector.Characteristics.IDENTITY_FINISH));
 
    return new Collector<T, List<T>, List<T>>() {
        @Override
        public Supplier<List<T>> supplier() {
            return supplier;
        }

        @Override
        public BiConsumer<List<T>, T> accumulator() {
            return accumulator;
        }

        @Override
        public BinaryOperator<List<T>> combiner() {
            return combiner;
        }

        @Override
        public Function<List<T>, List<T>> finisher() {
            return finisher;
        }

        @Override
        public Set<Characteristics> characteristics() {
            return characteristics;
        }
    };
 
}

Conclusion

In short, the features of Stream can be summarized as follows:

  • Not a data structure
    • It has no internal storage; it merely fetches data from a Source (a data structure, array, generator function, or I/O channel) through a pipeline of operations.
    • It never modifies the data in the underlying data structure it wraps. For example, a filter operation produces a new Stream that does not contain the filtered-out elements, rather than removing those elements from the source.
  • All Stream operations take lambda expressions as parameters
  • No support for indexed access
    • You can request the first element, but not the second, third, or last. (But see the next item.)
  • Easily turned into an array or a List
  • Laziness
    • Many Stream operations are deferred until it finally becomes clear how much data is needed.
    • Intermediate operations are always lazy.
  • Parallelizability
    • When a Stream is parallelized, there is no need to write multi-threaded code: all of its operations run in parallel automatically.
  • Can be unbounded
    • A Stream does not need a fixed size. Short-circuiting operations such as limit(n) and findFirst() can complete quickly even on an unbounded Stream.

 

Reference articles:

Detailed usage of Java 8 Stream

Java 8 Streams API explained

The difference between map and peek in Java 8 Stream


Origin blog.csdn.net/qq_33204444/article/details/105029559