Java 8 Stream API Beginner's Tutorial

Past memory big data
Java 8 introduces the Stream API, the subject of this article, which lets us process data in a declarative way. With a style reminiscent of SQL queries, Stream provides a high-level abstraction over operations and expressions on Java collections. It can greatly improve programmer productivity and helps produce efficient, clean, and concise code.

Stream creation

There are many ways to create a stream instance from different sources. Creating a stream does not modify its source, so we can create multiple stream instances from a single source.

Empty Stream

If we want to create an empty Stream, we can use the empty() method, as follows:


Stream<String> iteblogEmptyStream = Stream.empty();

An empty stream is typically returned instead of null when there are no elements:


public Stream<String> streamOf(List<String> list) {
    return list == null || list.isEmpty() ? Stream.empty() : list.stream();
}

Create Stream through Collection

Any class that implements the Collection interface can create a Stream (Lists and Sets in the snippet below are Guava helper classes):


List<String> list = Lists.newArrayList("iteblog", "iteblog_hadoop");
Stream<String> listStream = list.stream();

Set<String> set = Sets.newHashSet();
Stream<String> setStream = set.stream();

Create Stream through Array

A Stream can also be created from array elements:


Stream<String> streamOfArray = Stream.of("a", "b", "c");

Of course, we can also create a Stream from an existing array, or from a range of it:


String[] iteblogArr = new String[]{"iteblog", "iteblog_hadoop", "java 8"};
Stream<String> streamOfArrayFull = Arrays.stream(iteblogArr);
Stream<String> streamOfArrayPart = Arrays.stream(iteblogArr, 1, 3);
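
As a quick check of the range form, the sketch below collects both streams into lists. The class name ArrayStreamDemo and its method names are made up for illustration. The start index is inclusive and the end index is exclusive, so Arrays.stream(iteblogArr, 1, 3) covers indexes 1 and 2:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original article.
class ArrayStreamDemo {
    static final String[] ITEBLOG_ARR = {"iteblog", "iteblog_hadoop", "java 8"};

    // Streams the whole array.
    static List<String> full() {
        return Arrays.stream(ITEBLOG_ARR).collect(Collectors.toList());
    }

    // Streams indexes 1 (inclusive) to 3 (exclusive).
    static List<String> part() {
        return Arrays.stream(ITEBLOG_ARR, 1, 3).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(full()); // [iteblog, iteblog_hadoop, java 8]
        System.out.println(part()); // [iteblog_hadoop, java 8]
    }
}
```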

Create Stream through Stream.builder()

Stream provides a builder() method to create a Stream:


Stream<Object> streamBuilder = Stream.builder().add("iteblog").add("iteblog_hadoop").add("java").build();

The type of the Stream created above is Stream<Object>. To create a Stream of a specific element type, we need to specify the type explicitly:


Stream<String> streamBuilder = Stream.<String>builder().add("iteblog").add("iteblog_hadoop").add("java").build();

Create Stream through Stream.generate()

The Stream.generate() method takes a Supplier that produces the elements. The resulting stream is infinite, so we need to limit its size to avoid running out of memory:


Stream<String> streamGenerated = Stream.generate(() -> "iteblog").limit(88);

Create Stream through Stream.iterate()

We can also create a Stream through Stream.iterate():


Stream<Integer> streamIterated = Stream.iterate(2, n -> n * 2).limit(88);

The first argument of Stream.iterate() becomes the first element of the stream; every later element is produced by applying the function to the previous element, here multiplying it by 2. As with Stream.generate(), we need to limit the size of the stream to avoid running out of memory. Note that Integer arithmetic overflows silently, so with limit(88) only the first 30 elements are true powers of two.
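
To see what the two factory methods produce, here is a minimal sketch that keeps only the first few elements of each infinite stream. The class and method names are illustrative, not from the article:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the original article.
class InfiniteStreamDemo {
    // First n elements of the constant stream produced by generate().
    static List<String> generated(int n) {
        return Stream.generate(() -> "iteblog").limit(n).collect(Collectors.toList());
    }

    // First n elements of 2, 4, 8, ... produced by iterate().
    static List<Integer> iterated(int n) {
        return Stream.iterate(2, x -> x * 2).limit(n).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(generated(3)); // [iteblog, iteblog, iteblog]
        System.out.println(iterated(5));  // [2, 4, 8, 16, 32]
    }
}
```

Without limit() the collect() call would never finish, because both sources keep producing elements.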

Create Stream from primitive types

Java 8 provides streams for the three primitive types int, long and double; the corresponding interfaces are IntStream, LongStream, and DoubleStream.


IntStream intStream = IntStream.range(0, 10);
LongStream longStream = LongStream.rangeClosed(0, 10);
DoubleStream doubleStream = DoubleStream.of(1.0, 2.0);

range(int startInclusive, int endExclusive)

It is equivalent to the following loop:


for (int i = startInclusive; i < endExclusive; i++) { ... }

rangeClosed(int startInclusive, int endInclusive)

It is equivalent to the following loop:


for (int i = startInclusive; i <= endInclusive; i++) { ... }
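
The two loops can be compared directly with a small sketch that materializes both ranges as arrays. RangeDemo and its method names are made up for illustration:

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the original article.
class RangeDemo {
    // End value is excluded.
    static int[] open() {
        return IntStream.range(0, 5).toArray();
    }

    // End value is included.
    static int[] closed() {
        return IntStream.rangeClosed(0, 5).toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(open()));   // [0, 1, 2, 3, 4]
        System.out.println(Arrays.toString(closed())); // [0, 1, 2, 3, 4, 5]
    }
}
```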

Note the difference: the stream produced by rangeClosed() includes the end value, while the one produced by range() does not. The Random class in Java 8 also gained methods that return streams of these three primitive types:


Random random = new Random();
IntStream intStream = random.ints(10);
LongStream longs = random.longs(10);
DoubleStream doubleStream = random.doubles(10);

Create Stream from string

The chars() method of the String class returns an IntStream of the string's characters (the JDK has no CharStream):


IntStream streamOfChars = "abc".chars();

We can also split a string into a Stream<String> with a regular expression:


Stream<String> streamOfString = Pattern.compile(", ").splitAsStream("a, b, c");
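
Because chars() yields int values rather than characters, a common follow-up step is to map each value back to a char. The sketch below shows that step next to splitAsStream(); the class and method names are illustrative assumptions:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original article.
class StringStreamDemo {
    // chars() yields int code units; cast each one back to char.
    static List<Character> characters(String s) {
        return s.chars().mapToObj(c -> (char) c).collect(Collectors.toList());
    }

    // Splits the input around every match of the ", " pattern.
    static List<String> split(String s) {
        return Pattern.compile(", ").splitAsStream(s).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(characters("abc")); // [a, b, c]
        System.out.println(split("a, b, c"));  // [a, b, c]
    }
}
```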

Create Stream from file

The Files class in Java NIO lets us create a Stream through the lines() method; each line of the file becomes one element of the stream. Because the stream keeps the file open, it should be closed after use, for example with try-with-resources:


Path path = Paths.get("/user/iteblog/test.txt");
Stream<String> streamOfStrings = Files.lines(path);
Stream<String> streamWithCharset = Files.lines(path, Charset.forName("UTF-8"));
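
A minimal sketch of reading a file through Files.lines() inside try-with-resources follows. To stay self-contained it writes a temporary file first instead of using the article's /user/iteblog/test.txt path; the class name and the roundTrip() helper are assumptions made for the example:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the original article.
class FileLinesDemo {
    // Writes the lines to a temporary file and reads them back as a stream.
    static List<String> roundTrip(List<String> lines) {
        try {
            Path path = Files.createTempFile("iteblog", ".txt");
            try {
                Files.write(path, lines, StandardCharsets.UTF_8);
                // try-with-resources closes the file handle held by the stream.
                try (Stream<String> stream = Files.lines(path, StandardCharsets.UTF_8)) {
                    return stream.collect(Collectors.toList());
                }
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(Arrays.asList("iteblog", "iteblog_hadoop")));
    }
}
```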

Stream reference

The following code is allowed:


Stream<String> stream = Stream.of("iteblog", "iteblog_hadoop", "spark")
                .filter(element -> element.contains("iteblog"));
Optional<String> anyElement = stream.findAny();

Assigning a Stream to the variable stream is allowed, and calling findAny() on it works as well. But if we use the stream variable again for another terminal operation, an IllegalStateException is thrown at run time:


Optional<String> firstElement = stream.findFirst();

Exception in thread "main" java.lang.IllegalStateException: stream has already been operated upon or closed
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:229)
    at java.util.stream.ReferencePipeline.findFirst(ReferencePipeline.java:464)
    at com.java.iteblog.Java8Test.main(Java8Test.java:19)

The above example shows that a Java 8 stream cannot be reused once a terminal operation has been called on it. Streams are designed to apply a finite sequence of operations to a source of elements in a functional style, not to store the elements. The following version works, because each terminal operation runs on a new stream created from the collected list:


List<String> iteblogList = Stream.of("iteblog", "iteblog_hadoop", "spark")
                .filter(element -> element.contains("iteblog")).collect(Collectors.toList());

Optional<String> anyElement = iteblogList.stream().findAny();
Optional<String> firstElement = iteblogList.stream().findFirst();
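
Both behaviours can be verified in a small sketch: the first method catches the IllegalStateException raised by a second terminal operation, and the second method opens a fresh stream from a list on every call. The class and method names are made up for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the original article.
class StreamReuseDemo {
    // Returns true if a second terminal operation on the same stream fails.
    static boolean secondUseFails() {
        Stream<String> stream = Stream.of("iteblog", "iteblog_hadoop", "spark")
                .filter(element -> element.contains("iteblog"));
        stream.findAny(); // first terminal operation consumes the stream
        try {
            stream.findFirst(); // second terminal operation on the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Each call opens a new stream on the list, so it can be repeated freely.
    static String firstFromFreshStream(List<String> source) {
        return source.stream().findFirst().orElse(null);
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("iteblog", "iteblog_hadoop");
        System.out.println(secondUseFails());           // true
        System.out.println(firstFromFreshStream(list)); // iteblog
        System.out.println(firstFromFreshStream(list)); // iteblog
    }
}
```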

Stream pipeline

To perform a series of operations on the elements of a data source and aggregate their results, three parts are required: a data source, intermediate operations, and a terminal operation.

An intermediate operation returns a new, modified Stream. In the following example, the skip() method skips the first element of the original Stream and returns a new Stream, which we assign to iteblogSkip:


Stream<String> iteblogSkip = Stream.of("iteblog", "iteblog_hadoop", "spark").skip(1);

If you need several modifications, you can chain multiple intermediate operations:


Stream<String> iteblogSkip = Stream.of("iteblog", "iteblog_hadoop", "spark")
                .skip(1).map(element -> element.substring(0, 3));

In the above example, we used both skip() and map() methods, and got a new Stream reference.

A stream by itself is of little value; what users are really interested in is the result of the terminal operation, which can be a value of some type or an action applied to every element of the stream. Only one terminal operation can be used per stream. The correct and most convenient way to use streams is the stream pipeline, a chain of a data source, intermediate operations and a terminal operation, for example:


long count = Stream.of("iteblog", "iteblog_hadoop", "spark")
                .skip(1).map(element -> element.substring(0, 3))
                .count();
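
To inspect what flows through this pipeline, the sketch below swaps count() for collect(), which is a change made only for illustration, so that the remaining elements become visible. The class and method names are assumptions:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the original article.
class PipelineDemo {
    static List<String> prefixes() {
        return Stream.of("iteblog", "iteblog_hadoop", "spark") // data source
                .skip(1)                                  // drops "iteblog"
                .map(element -> element.substring(0, 3))  // keeps the first three characters
                .collect(Collectors.toList());            // terminal operation
    }

    public static void main(String[] args) {
        System.out.println(prefixes());        // [ite, spa]
        System.out.println(prefixes().size()); // 2, the value count() returns
    }
}
```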

Lazy Invocation

Intermediate operations are lazy: they are executed only when a terminal operation requires it. To illustrate this, suppose we have a method called wasCalled() that increments a counter every time it is called:


private static long counter;

private static void wasCalled() {
    counter++;
}

Now we call the wasCalled() method in the filter() intermediate operation:


List<String> list = Arrays.asList("iteblog", "iteblog_hadoop", "spark");
counter = 0;
Stream<String> stream = list.stream().filter(element -> {
            wasCalled();
            return element.contains("iteblog");
});

System.out.println(counter);

Our data source list has three elements, and we call filter() on it. One might expect the filter() predicate to run once per element, so that counter ends up as 3. But if we run the code above, counter is still 0: the predicate was never called. The reason is that intermediate operations are lazy, and the predicate passed to filter() runs only once a terminal operation is added to the pipeline.
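
The before and after values can be captured in one sketch. It uses a local AtomicInteger instead of the article's static counter, purely so the example is self-contained, and the class and method names are illustrative:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the original article.
class LazyDemo {
    // Returns {calls before the terminal operation, calls after it}.
    static int[] filterCalls() {
        List<String> list = Arrays.asList("iteblog", "iteblog_hadoop", "spark");
        AtomicInteger counter = new AtomicInteger();
        Stream<String> stream = list.stream().filter(element -> {
            counter.incrementAndGet();
            return element.contains("iteblog");
        });
        int before = counter.get();          // nothing has run yet
        stream.collect(Collectors.toList()); // terminal operation triggers filter()
        int after = counter.get();           // one call per element
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] calls = filterCalls();
        System.out.println(calls[0] + " -> " + calls[1]); // 0 -> 3
    }
}
```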

Now consider the following code, which adds a map() step and the terminal operation findFirst():


List<String> list = Arrays.asList("iteblog", "iteblog_hadoop", "spark");

list.stream().filter(element -> {
            System.out.println("filter() was called");
            return element.contains("hadoop");
}).map(element -> {
            System.out.println("map() was called");
            return element.toUpperCase();
}).findFirst();

Output:
filter() was called
filter() was called
map() was called

The output shows that filter() was called twice and map() once. The Stream pipeline is executed vertically: each element passes through filter() and then map() before the next element is processed. map() is called only when filter() returns true, and findFirst() stops the pipeline as soon as it has found the first matching element.

Stream running sequence

From a performance point of view, the order in which operations are chained in the Stream pipeline matters. The two code snippets below produce the same result, but the second one is recommended.


long size = list.stream().map(element -> {
    wasCalled();
    return element.substring(0, 3);
}).skip(2).count();

long size = list.stream().skip(2).map(element -> {
    wasCalled();
    return element.substring(0, 3);
}).count();

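The reason is that the first snippet runs map() three times, while the second runs map() only once. (These counts are for Java 8; on later versions count() may avoid running the pipeline when it can compute the size directly from the source.) Therefore, operations that reduce the number of elements should come first in a pipeline; the recommended order is: skip() -> filter() -> distinct().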
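
The difference in map() calls can be measured with a sketch like the one below. It uses collect() rather than count() as the terminal operation, which is an intentional deviation from the snippets above: collect() always traverses the elements, so the counts do not depend on the JDK version. The class and method names are assumptions:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original article.
class OrderDemo {
    // Number of map() calls when map() comes before skip().
    static int mapThenSkip(List<String> list) {
        AtomicInteger calls = new AtomicInteger();
        list.stream().map(element -> {
            calls.incrementAndGet();
            return element.substring(0, 3);
        }).skip(2).collect(Collectors.toList());
        return calls.get();
    }

    // Number of map() calls when skip() comes before map().
    static int skipThenMap(List<String> list) {
        AtomicInteger calls = new AtomicInteger();
        list.stream().skip(2).map(element -> {
            calls.incrementAndGet();
            return element.substring(0, 3);
        }).collect(Collectors.toList());
        return calls.get();
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("iteblog", "iteblog_hadoop", "spark");
        System.out.println(mapThenSkip(list)); // 3
        System.out.println(skipThenMap(list)); // 1
    }
}
```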

Stream aggregation

The Stream API has many terminal operations that aggregate a stream into a single value, such as count(), max(), min() and sum() (the last one on the primitive streams), but these work according to a predefined implementation. What if developers need custom aggregation logic? That is the job of the reduce() and collect() methods introduced in this section.

Introduction to reduce() method

reduce() has three overloads, which take some combination of the following parameters:

• identity: the initial value of the accumulation, and the default result if the stream is empty and there is nothing to accumulate;
• accumulator: a function that specifies how elements are aggregated. Each accumulator call produces a new intermediate value, so as many temporary values are created as there are elements, although only the last one is kept; with object types this can hurt performance;
• combiner: a function that merges the results of the accumulator. The combiner is called only in parallel mode, to merge the partial results produced by different threads.

Now let's look at the three forms of reduce():

Example one


OptionalInt sum = IntStream.range(1, 4).reduce((a, b) -> a + b);
System.out.println(sum);

Output:
OptionalInt[6] (that is, 1 + 2 + 3)

Example two


int reducedTwoParams = IntStream.range(1, 4).reduce(10, (a, b) -> a + b);
System.out.println(reducedTwoParams);

Output:
16 (that is, 10 + 1 + 2 + 3)

Example three


int reducedParams = Stream.of(1, 2, 3)
  .reduce(10, (a, b) -> a + b, (a, b) -> {
     System.out.println("combiner was called");
     return a + b;
  });

System.out.println(reducedParams);

Output:
16 (that is, 10 + 1 + 2 + 3)

In example three, although we specified a combiner, the console never prints "combiner was called", which means the combiner was not actually invoked. To have the combiner called, we can switch to a parallel stream:


int reducedParallel = Arrays.asList(1, 2, 3).parallelStream()
    .reduce(10, (a, b) -> a + b, (a, b) -> {
       System.out.println("combiner was called");
       return a + b;
    });

System.out.println(reducedParallel);

Output:
combiner was called
combiner was called
36

The result this time is 36, and "combiner was called" is printed twice. In parallel mode the stream is split into chunks and the accumulator is applied to each chunk starting from the identity value, so here the accumulator runs three times, each time starting from 10: 10 + 1 = 11, 10 + 2 = 12 and 10 + 3 = 13. The combiner then merges these partial results in two steps (12 + 13 = 25; 25 + 11 = 36), which gives 36. This also shows that 10 is not a true identity for addition: it is counted once per chunk, so the sequential and parallel results differ. With 0 as the identity, both would yield 6.
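
The effect of the identity value can be checked with a short sketch that runs the same reduction sequentially and in parallel. The class and method names are made up for illustration; the parallel result for a non-identity start value depends on how the source is split, so it is printed without a fixed expectation:

```java
import java.util.Arrays;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the original article.
class ReduceIdentityDemo {
    // Sequential reduction: the identity is used exactly once.
    static int sequentialSum(int identity) {
        return Stream.of(1, 2, 3)
                .reduce(identity, (a, b) -> a + b, (a, b) -> a + b);
    }

    // Parallel reduction: the identity seeds every chunk of the source.
    static int parallelSum(int identity) {
        return Arrays.asList(1, 2, 3).parallelStream()
                .reduce(identity, (a, b) -> a + b, (a, b) -> a + b);
    }

    public static void main(String[] args) {
        System.out.println(sequentialSum(0));  // 6
        System.out.println(parallelSum(0));    // 6
        System.out.println(sequentialSum(10)); // 16
        // Varies with the number of chunks; the article's run printed 36.
        System.out.println(parallelSum(10));
    }
}
```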

Introduction to collect() method

The collect() method is the other way to implement aggregation logic. Its signature is R collect(Collector<? super T, A, R> collector). The Collectors class in Java 8 provides implementations of the most commonly used collectors, which we can use directly. The following examples show how to use them:


static class Product {
        private int price;
        private String name;

        Product(int price, String name) {
            this.price = price;
            this.name = name;
        }

        public int getPrice() {
            return price;
        }

        public void setPrice(int price) {
            this.price = price;
        }

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }
}

List<Product> productList = Arrays.asList(new Product(23, "potatoes"),
  new Product(14, "orange"), new Product(13, "lemon"),
  new Product(23, "bread"), new Product(13, "sugar"));

Extract all the product names in productList into a list:


List<String> collectorCollection = productList.stream().map(Product::getName).collect(Collectors.toList());

Take out all the product names in productList and combine them into a string:


String listToString = productList.stream().map(Product::getName)
                .collect(Collectors.joining(", ", "[", "]"));

Calculate the average price of all products in productList:


double averagePrice = productList.stream().collect(Collectors.averagingInt(Product::getPrice));

Calculate the total price of all products in productList:


int summingPrice = productList.stream().collect(Collectors.summingInt(Product::getPrice));

Collect summary statistics for the prices of all products in productList:


IntSummaryStatistics statistics = productList.stream().collect(Collectors.summarizingInt(Product::getPrice));
System.out.println(statistics);

Output:


IntSummaryStatistics{count=5, sum=86, min=13, average=17.200000, max=23}

Group products by price:


Map<Integer, List<Product>> collectorMapOfLists = productList.stream()
                .collect(Collectors.groupingBy(Product::getPrice));

As a result, products with the same price are placed in the same List.

Split products into two groups according to a predicate:


Map<Boolean, List<Product>> mapPartioned = productList.stream()
  .collect(Collectors.partitioningBy(element -> element.getPrice() > 15));

The result is a map with two entries: products priced above 15 are in the List under the key true, and the rest are in the List under the key false.

Convert the list to an unmodifiable set:


Set<Product> unmodifiableSet = productList.stream()
  .collect(Collectors.collectingAndThen(Collectors.toSet(),
  Collections::unmodifiableSet));
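
To make the shape of the grouped and partitioned maps easier to see, the sketch below reruns both collectors on the same five products and keeps only the product names. It uses a trimmed-down Product class and adds a Collectors.mapping() downstream collector; both are assumptions of this example rather than code from the article:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original article.
class GroupingDemo {
    // Trimmed-down version of the article's Product class.
    static class Product {
        private final int price;
        private final String name;

        Product(int price, String name) {
            this.price = price;
            this.name = name;
        }

        int getPrice() { return price; }
        String getName() { return name; }
    }

    static final List<Product> PRODUCTS = Arrays.asList(
            new Product(23, "potatoes"), new Product(14, "orange"),
            new Product(13, "lemon"), new Product(23, "bread"),
            new Product(13, "sugar"));

    // Product names grouped by price.
    static Map<Integer, List<String>> namesByPrice() {
        return PRODUCTS.stream().collect(Collectors.groupingBy(Product::getPrice,
                Collectors.mapping(Product::getName, Collectors.toList())));
    }

    // Product names split by whether the price is above 15.
    static Map<Boolean, List<String>> namesByPriceAbove15() {
        return PRODUCTS.stream().collect(Collectors.partitioningBy(p -> p.getPrice() > 15,
                Collectors.mapping(Product::getName, Collectors.toList())));
    }

    public static void main(String[] args) {
        System.out.println(namesByPrice().get(13));           // [lemon, sugar]
        System.out.println(namesByPrice().get(23));           // [potatoes, bread]
        System.out.println(namesByPriceAbove15().get(true));  // [potatoes, bread]
        System.out.println(namesByPriceAbove15().get(false)); // [orange, lemon, sugar]
    }
}
```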

Custom collector

Sometimes the built-in collectors do not meet our needs. In that case we can define a custom collector. For example, the collector below puts all the products into a LinkedList:


Collector<Product, ?, LinkedList<Product>> toLinkedList =
  Collector.of(LinkedList::new, LinkedList::add, 
    (first, second) -> { 
       first.addAll(second); 
       return first; 
    });

LinkedList<Product> linkedListOfProducts = productList.stream().collect(toLinkedList);
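
For this particular case the JDK also offers Collectors.toCollection(), which takes a collection supplier. The sketch below compares it with a custom collector of the same shape as the one above, using String elements to stay self-contained. The class and method names are assumptions:

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original article.
class LinkedListCollectorDemo {
    // Custom collector: supplier, accumulator, and combiner for parallel use.
    static Collector<String, ?, LinkedList<String>> toLinkedList() {
        return Collector.of(LinkedList::new, LinkedList::add,
                (first, second) -> {
                    first.addAll(second);
                    return first;
                });
    }

    static LinkedList<String> custom(List<String> names) {
        return names.stream().collect(toLinkedList());
    }

    // Built-in alternative with the same result.
    static LinkedList<String> builtIn(List<String> names) {
        return names.stream().collect(Collectors.toCollection(LinkedList::new));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("potatoes", "orange", "lemon");
        System.out.println(custom(names));  // [potatoes, orange, lemon]
        System.out.println(builtIn(names)); // [potatoes, orange, lemon]
    }
}
```

A hand-written Collector.of() remains useful when the accumulation logic goes beyond adding elements to a collection.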

Summary

The Stream API is a powerful yet easy-to-understand tool for processing sequences of elements. It lets us write less code, create more readable programs, and improve developer productivity.