Java 8 in Action - 2. Functional-Style Data Processing (Part 3)

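3. Collecting Data with Streams
You can think of Java 8 streams as fancy, lazy iterators over data sets. They support two kinds of operations: intermediate operations (such as filter or map) and terminal operations (such as count, findFirst, forEach, and reduce). Intermediate operations can be chained to convert one stream into another. They do not consume the stream; their purpose is to set up a pipeline. Terminal operations, by contrast, consume the stream to produce a final result, for example the largest element in the stream. They can often shorten computation time by optimizing the pipeline.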
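
The laziness described above can be observed directly. The following is a minimal sketch, not code from the book: the class name LazyPipelineDemo and the visited list are hypothetical names introduced only for illustration. It builds a pipeline from intermediate operations, checks that nothing has run yet, and then triggers the work with a terminal operation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyPipelineDemo {

    // Hypothetical helper state: records every element the filter step actually inspects.
    static final List<Integer> visited = new ArrayList<>();

    // Only intermediate operations are chained here, so no element is processed yet.
    static Stream<Integer> buildPipeline(List<Integer> source) {
        return source.stream()
                .filter(n -> {
                    visited.add(n);
                    return n % 2 == 0;
                })
                .map(n -> n * n);
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);

        Stream<Integer> pipeline = buildPipeline(numbers);
        System.out.println("visited before the terminal operation: " + visited);

        // collect is the terminal operation; limit(2) lets the pipeline stop early.
        List<Integer> firstTwoSquares = pipeline.limit(2).collect(Collectors.toList());
        System.out.println("result: " + firstTwoSquares);
        System.out.println("visited after the terminal operation: " + visited);
    }
}
```

On a sequential stream the filter should inspect only the elements 1 through 4, because limit(2) is satisfied once two even numbers have been found; this is one example of how a pipeline can shorten the computation.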

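3.1 Collectors
One major advantage of functional programming over imperative programming is that you only state the result you want (the "what") without worrying about the steps needed to get there (the "how").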

Collectors as advanced reductions

// 1. Imperative code before Java 8
Map<Currency, List<Transaction>> transactionsByCurrencies = new HashMap<>();
for (Transaction transaction : transactions) {
	Currency currency = transaction.getCurrency();
	List<Transaction> transactionsForCurrency = transactionsByCurrencies.get(currency);
	if (transactionsForCurrency == null) {
		transactionsForCurrency = new ArrayList<>();
		transactionsByCurrencies.put(currency, transactionsForCurrency);
	}
	transactionsForCurrency.add(transaction);
}

// 2. Functional style with Java 8
Map<Currency, List<Transaction>> transactionsByCurrencies=transactions.stream().collect(Collectors.groupingBy(Transaction::getCurrency));
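
The two snippets above depend on Transaction and Currency types that are not shown. As a runnable sketch under stated assumptions, the version below uses a hypothetical, minimal Transaction class and a plain String in place of Currency, and confirms that the imperative and the collector-based versions produce the same map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupByCurrencyDemo {

    // Hypothetical stand-in for the Transaction type; a String plays the role of Currency.
    static final class Transaction {
        private final String currency;
        private final int value;

        Transaction(String currency, int value) {
            this.currency = currency;
            this.value = value;
        }

        String getCurrency() { return currency; }

        @Override
        public String toString() { return currency + ":" + value; }
    }

    // Imperative style: spells out how to build the map step by step.
    static Map<String, List<Transaction>> imperative(List<Transaction> transactions) {
        Map<String, List<Transaction>> byCurrency = new HashMap<>();
        for (Transaction t : transactions) {
            List<Transaction> bucket = byCurrency.get(t.getCurrency());
            if (bucket == null) {
                bucket = new ArrayList<>();
                byCurrency.put(t.getCurrency(), bucket);
            }
            bucket.add(t);
        }
        return byCurrency;
    }

    // Functional style: states what is wanted and leaves the bookkeeping to the collector.
    static Map<String, List<Transaction>> functional(List<Transaction> transactions) {
        return transactions.stream().collect(Collectors.groupingBy(Transaction::getCurrency));
    }

    public static void main(String[] args) {
        List<Transaction> transactions = Arrays.asList(
                new Transaction("EUR", 10), new Transaction("USD", 20), new Transaction("EUR", 30));
        System.out.println(imperative(transactions));
        System.out.println(functional(transactions));
        System.out.println("same result: " + imperative(transactions).equals(functional(transactions)));
    }
}
```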


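Predefined collectors
Predefined collectors are those created by the factory methods of the Collectors class (for example, groupingBy). They offer three main kinds of functionality:

Reducing and summarizing stream elements into a single value
Grouping elements
Partitioning elements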

Reducing and summarizing

        List<Apple> list = Arrays.asList(new Apple("g", 1), new Apple("r", 2), new Apple("g", 3));
        // Counting
        // Count the green apples
        long g = list.stream().filter(a -> Objects.equals("g", a.getColor())).count();
        Long g1 = list.stream().filter(a -> Objects.equals("g", a.getColor())).collect(Collectors.counting());
        System.out.println(g + " " + g1);
        // Find the maximum and minimum in a stream: Collectors.maxBy and Collectors.minBy
        // Find the heaviest apple
        Optional<Apple> apple = list.stream().collect(Collectors.maxBy(Comparator.comparing(Apple::getWeight)));
        System.out.println(apple.get());
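        /**
         * Summarizing
         * 1. The Collectors class provides a dedicated factory method for summing: Collectors.summingInt.
         *    It accepts a function that maps each object to the int to be summed and returns a collector;
         *    passing that collector to the usual collect method performs the summation.
         * 2. Collectors.summingLong and Collectors.summingDouble behave in exactly the same way and are
         *    used when the field to be summed is a long or a double.
         * 3. Collectors.averagingInt, together with its averagingLong and averagingDouble counterparts,
         *    computes the average of numeric values.
         */
        // Total weight of all apples
        Integer totalWeight = list.stream().collect(Collectors.summingInt(Apple::getWeight));
        // The earlier approach: map-reduce
        Integer totalWeight2 = list.stream().map(Apple::getWeight).reduce(0, Integer::sum);
        System.out.println(totalWeight + " " + totalWeight2);
        // Average weight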
        Double averWeight = list.stream().collect(Collectors.averagingDouble(Apple::getWeight));
        System.out.println(averWeight);
        System.out.println("=====================");
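        /**
         * The Collectors.summarizingInt factory method computes all of the values above in a single operation.
         * The corresponding summarizingLong and summarizingDouble factory methods return the related
         * LongSummaryStatistics and DoubleSummaryStatistics types, for properties of primitive type long or double.
         */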
        IntSummaryStatistics statistics = list.stream().collect(Collectors.summarizingInt(Apple::getWeight));
        long totalWeight3 = statistics.getSum();
        double averWeight2 = statistics.getAverage();
        System.out.println(totalWeight3 + " " + averWeight2);
        System.out.println("========================");
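        /**
         * Joining strings: Collectors.joining()
         * The collector returned by the joining factory method concatenates the string form of every
         * element in the stream into a single string. The elements must already be CharSequence values,
         * which is why the examples below map each trader to a name first.
         * Internally, joining uses a StringBuilder to append the strings one by one.
         */
        List<Trader> traderList = Arrays.asList(new Trader("a", "beijing"), new Trader("b", "shanghai"));
        // Join the names of all traders into a single string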
        String nameStr = traderList.stream().map(Trader::getName).collect(Collectors.joining());
        System.out.println(nameStr);
        // Join the names of all traders into a single string, separated by commas
        String nameStr1 = traderList.stream().map(Trader::getName).collect(Collectors.joining(","));
        System.out.println(nameStr1);
        System.out.println("===========================");

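        /**
         * Generalized reduction
         * In fact, all the collectors discussed so far are just special cases of a reduction process
         * that can be defined with the reducing factory method. Collectors.reducing is the
         * generalization of all of them.
         */
        // Total weight of all apples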
        Integer totalWeight4 = list.stream().collect(Collectors.reducing(0, Apple::getWeight, (i, j) -> i + j));
        Integer totalWeight5 = list.stream().collect(Collectors.reducing(
                0,                 // initial value
                Apple::getWeight,  // transformation function
                Integer::sum));    // accumulating function
        Integer totalWeight6 = list.stream().map(Apple::getWeight).reduce(Integer::sum).get();
        // More concise and faster: this avoids auto-unboxing, the implicit conversion from Integer to int, which serves no purpose here.
        int totalWeight7 = list.stream().mapToInt(Apple::getWeight).sum();
        System.out.println(totalWeight4 + " " + totalWeight5 + " " + totalWeight6 + " " + totalWeight7);
        // Find the heaviest apple
        Optional<Apple> highestWeightApple = list.stream().collect(Collectors.reducing((a, b) -> a.getWeight() > b.getWeight() ? a : b));
        System.out.println(highestWeightApple.get());
        System.out.println("==========================");

        // An example of misusing the reduce() method
        Stream<Integer> stream = Arrays.asList(1, 2, 3, 4, 5, 6).stream();
        List<Integer> numbers = stream.reduce(
                new ArrayList<>(),
                (List<Integer> l, Integer e) -> {
                    l.add(e);
                    return l; },
                (List<Integer> l1, List<Integer> l2) -> {
                    l1.addAll(l2);
                    return l1; });
        numbers.forEach(a -> System.out.print(a + ","));
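        // The correct version uses the collect method
        List<Integer> numbers2 = Arrays.asList(1, 2, 3, 4, 5, 6).stream().collect(Collectors.toList());
        /**
         * How do the collect() and reduce() methods of the Stream interface differ?
         * The incorrect example above has two problems: a semantic one and a practical one.
         * Semantically, reduce is meant to combine two values into a new one; it is an immutable reduction.
         * collect, by contrast, is designed to mutate a container in order to accumulate the result.
         * The snippet above therefore misuses reduce, because it mutates in place the List used as the accumulator.
         * Using reduce with the wrong semantics also causes a practical problem: the reduction cannot run
         * in parallel, because concurrent modification of the same data structure by multiple threads can
         * corrupt the List itself. To make it thread-safe you would have to allocate a new List on every
         * step, and that allocation hurts performance. This is why collect is the right way to express a
         * reduction over a mutable container and, more importantly, why it is suited to parallel execution.
         */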
        System.out.println("=======================");
        // Joining strings with reducing
        String color = list.stream().map(Apple::getColor).collect(Collectors.joining());
        String color1 = list.stream().map(Apple::getColor).collect(Collectors.reducing((c1,c2) -> c1 + c2)).get();
        String color2 = list.stream().collect(Collectors.reducing("",Apple::getColor,(c1,c2) -> c1 + c2));
        System.out.println(color + " " + color1 + " " + color2);
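
The difference between collect and reduce discussed above can be sketched as follows. This is an illustrative example rather than code from the book; the class and method names are hypothetical. withCollect uses the three-argument collect, where each parallel task gets its own container from the supplier, while withImmutableReduce shows what a correct reduce would require: a fresh list at every step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CollectVsReduceDemo {

    // Mutable reduction: supplier, accumulator, and combiner; safe on a parallel stream
    // because every partial result lives in its own ArrayList.
    static List<Integer> withCollect(List<Integer> source) {
        return source.parallelStream()
                .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    // Immutable reduction: correct in parallel too, but it copies the list on every step.
    static List<Integer> withImmutableReduce(List<Integer> source) {
        return source.parallelStream().reduce(
                Collections.<Integer>emptyList(),
                (List<Integer> acc, Integer e) -> {
                    List<Integer> copy = new ArrayList<>(acc);
                    copy.add(e);
                    return copy;
                },
                (List<Integer> left, List<Integer> right) -> {
                    List<Integer> copy = new ArrayList<>(left);
                    copy.addAll(right);
                    return copy;
                });
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(withCollect(numbers));
        System.out.println(withImmutableReduce(numbers));
    }
}
```

Both methods keep the encounter order of the source list; the collect version simply does so without the repeated copying.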

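        /**
         * Grouping
         * A common database operation is to group the items of a set by one or more properties. As the
         * earlier example of grouping transactions by currency showed, this can be cumbersome, verbose,
         * and error-prone when written in an imperative style. Rewritten in the functional style that
         * Java 8 encourages, it becomes a single, very readable statement.
         * The result of a grouping operation is a Map whose keys are the values returned by the
         * classification function and whose values are lists of all the stream items having that
         * classification value.
         */
        // New snippet: list is declared again here, so this part belongs in a separate method
        List<Apple> list = Arrays.asList(new Apple("g", 2), new Apple("g", 8), new Apple("r", 16), new Apple("g", 21));
        // Group apples by color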
        Map<String, List<Apple>> colorMap = list.stream().collect(Collectors.groupingBy(Apple::getColor));
        colorMap.forEach((k, v) -> System.out.println(v.size()));
        // Group apples by weight class: [0,10] light ("l"), [11,20] normal ("c"), [21,∞) heavy ("h")
        Map<String, List<Apple>> appleList = list.stream().collect(Collectors.groupingBy(apple -> {
            if (apple.getWeight() <= 10) return "l";
            else if (11 <= apple.getWeight() && apple.getWeight() <= 20) return "c";
            else return "h";
        }));
        appleList.forEach((k, v) -> {
            System.out.println(k);
            v.forEach(System.out::print);
            System.out.println();
        });
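        /**
         * Multilevel grouping
         * To group on more than one level, use the collector created by the two-argument version of the
         * Collectors.groupingBy factory method, which accepts a second argument of type Collector in
         * addition to the usual classification function. For a two-level grouping, pass an inner
         * groupingBy to the outer groupingBy to define the second-level criterion for classifying the
         * items of the stream.
         * The one-argument groupingBy(f), where f is the classification function, is in fact shorthand
         * for groupingBy(f, toList()).
         */
        // Group apples by color first, then by weight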
        Map<String, Map<String, List<Apple>>> map = list.stream().collect(Collectors.groupingBy(Apple::getColor, Collectors.groupingBy(apple -> {
            if (apple.getWeight() <= 10) return "l";
            else if (11 <= apple.getWeight() && apple.getWeight() <= 20) return "c";
            else return "h";
        })));
        map.forEach((k,v) -> {
            v.forEach((k1,v1) -> {
                System.out.println(k + "." + k1);
                v1.forEach(System.out::print);
                System.out.println();
            });
        });
        // Count the apples of each color
        Map<String, Long> map1 = list.stream().collect(Collectors.groupingBy(Apple::getColor, Collectors.counting()));
        map1.forEach((k,v) -> System.out.println(k + " " + v));
        // Group apples by color and find the heaviest apple in each group
        Map<String, Optional<Apple>> map2 = list.stream().collect(Collectors.groupingBy(Apple::getColor, Collectors.maxBy(Comparator.comparing(Apple::getWeight))));
        map2.forEach((k,v) -> System.out.println(k + " " + v.get()));
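        /**
         * Adapting a collector's result to a different type
         */
        Map<String, Apple> map3 = list.stream().collect(Collectors.groupingBy(Apple::getColor, Collectors.collectingAndThen(
                Collectors.maxBy(Comparator.comparing(Apple::getWeight)),  // wrapped collector
                Optional::get  // transformation function; safe here because groupingBy never creates an empty group
        )));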
        map3.forEach((k,v) -> System.out.println(k + " " + v));
        System.out.println("+++++++++++++++++++++++");
        // Which weight classes occur in each color group
        Map<String, Set<String>> map4 = list.stream().collect(Collectors.groupingBy(Apple::getColor, Collectors.mapping(apple -> {
            if (apple.getWeight() <= 10) return "l";
            else if (11 <= apple.getWeight() && apple.getWeight() <= 20) return "c";
            else return "h";
        }, Collectors.toSet())));
        map4.forEach((k,v) -> {
            System.out.print(k + ":");
            v.forEach((v1) -> {
                System.out.print(v1 + " ");
            });
            System.out.println();
        });
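
The grouping snippets above repeat the same weight-classification lambda three times and rely on an Apple class that is not shown. The sketch below is one possible way to tidy this up, under stated assumptions: the Apple class, the WeightClass enum, and the classify method are hypothetical names introduced for illustration, with the same thresholds as the original lambdas.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingDemo {

    // Hypothetical enum replacing the "l", "c", and "h" string keys used above.
    enum WeightClass { LIGHT, NORMAL, HEAVY }

    // Hypothetical minimal Apple class with the two properties the snippets use.
    static final class Apple {
        private final String color;
        private final int weight;

        Apple(String color, int weight) {
            this.color = color;
            this.weight = weight;
        }

        String getColor() { return color; }
        int getWeight() { return weight; }

        @Override
        public String toString() { return color + ":" + weight; }
    }

    // The classification function, written once and reused as a method reference.
    static WeightClass classify(Apple apple) {
        if (apple.getWeight() <= 10) return WeightClass.LIGHT;
        if (apple.getWeight() <= 20) return WeightClass.NORMAL;
        return WeightClass.HEAVY;
    }

    // Two-level grouping with a counting collector as the innermost downstream collector.
    static Map<String, Map<WeightClass, Long>> countByColorAndWeight(List<Apple> apples) {
        return apples.stream().collect(
                Collectors.groupingBy(Apple::getColor,
                        Collectors.groupingBy(GroupingDemo::classify, Collectors.counting())));
    }

    public static void main(String[] args) {
        List<Apple> apples = Arrays.asList(
                new Apple("g", 2), new Apple("g", 8), new Apple("r", 16), new Apple("g", 21));
        System.out.println(countByColorAndWeight(apples));
    }
}
```

Using an enum for the keys is a design choice, not something the original requires; it mainly prevents typos in the key strings.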

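        /**
         * Partitioning
         * Partitioning is a special case of grouping: a predicate (a function returning a boolean),
         * called the partitioning function, is used as the classification function.
         * Because the partitioning function returns a boolean, the resulting Map has Boolean keys, so
         * there are at most two groups: one for true and one for false.
         * The advantage of partitioning is that it keeps both lists of stream elements: those for which
         * the partitioning function returns true and those for which it returns false.
         */
        // Partition apples into red and not red
        // New snippet: list and the map variables are declared again, so this part belongs in a separate method
        List<Apple> list = Arrays.asList(new Apple("g", 2), new Apple("g", 8), new Apple("r", 16), new Apple("g", 21));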
        Map<Boolean, List<Apple>> map = list.stream().collect(Collectors.partitioningBy(a -> Objects.equals("r", a.getColor())));
        map.forEach((k,v) -> System.out.println(k + ":" + v));
        // Alternatively, filter first and then collect
        List<Apple> redAppleList = list.stream().filter(a -> Objects.equals("r", a.getColor())).collect(Collectors.toList());
        System.out.println("=================");
        // Partition apples into red and not red, then group each partition by weight (partition first, group second)
        Map<Boolean, Map<Integer, List<Apple>>> map1 = list.stream().collect(Collectors.partitioningBy(a -> Objects.equals("r", a.getColor()), Collectors.groupingBy(Apple::getWeight)));
        map1.forEach((k,v) -> System.out.println(k + "=" +v));
        System.out.println("==================");
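        // Find the heaviest apple among the red and among the non-red apples
        // (Optional::get would throw if a partition were empty; both are non-empty for this data)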
        Map<Boolean, Apple> map2 = list.stream().collect(
                Collectors.partitioningBy(
                        a -> Objects.equals("r", a.getColor()),
                        Collectors.collectingAndThen(
                                Collectors.maxBy(Comparator.comparing(Apple::getWeight)),
                                Optional::get
                        )
                )
        );
        map2.forEach((k,v) -> System.out.println(k + "=" + v));
        System.out.println("==================");
        // Partition apples into red and not red, then partition again by weight > 10
        Map<Boolean, Map<Boolean, List<Apple>>> map3 = list.stream().collect(
                Collectors.partitioningBy(
                        a -> Objects.equals("r", a.getColor()),
                        Collectors.partitioningBy(
                                b -> b.getWeight() > 10
                        )
                )
        );
        map3.forEach((k,v) -> System.out.println(k + "=" + v));
        // Count the red and the non-red apples separately
        Map<Boolean, Long> map4 = list.stream().collect(
                Collectors.partitioningBy(
                        a -> Objects.equals("r", a.getColor()),
                        Collectors.counting()
                )
        );
        map4.forEach((k,v) -> System.out.println(k + "=" + v));
        System.out.println("======================");

        // Partition numbers into primes and non-primes
        Map<Boolean, List<Integer>> map5 = IntStream.rangeClosed(2, 100).boxed().collect(
                Collectors.partitioningBy(n ->
                        IntStream.rangeClosed(2, (int) Math.sqrt((double) n)).noneMatch(i -> n % i == 0)
                )
        );
        map5.forEach((k,v) -> System.out.println(k + "=" + v));
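
One practical consequence of partitioning keeping both lists is worth illustrating. In the sketch below (class and method names are hypothetical), the same predicate is used with partitioningBy and with groupingBy on an input where no element satisfies it: the partitioned map still has an entry for true holding an empty list, whereas the grouped map has no such key.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PartitionVsGroupDemo {

    // Partitioning: the result has entries for both false and true.
    static Map<Boolean, List<Integer>> partitionEven(List<Integer> numbers) {
        return numbers.stream().collect(Collectors.partitioningBy(n -> n % 2 == 0));
    }

    // Grouping with the same predicate: only keys that actually occur are present.
    static Map<Boolean, List<Integer>> groupEven(List<Integer> numbers) {
        return numbers.stream().collect(Collectors.groupingBy(n -> n % 2 == 0));
    }

    public static void main(String[] args) {
        List<Integer> oddOnly = Arrays.asList(1, 3, 5);
        System.out.println("partitioningBy: " + partitionEven(oddOnly));
        System.out.println("groupingBy:     " + groupEven(oddOnly));
    }
}
```

With partitioningBy, callers can therefore read get(true) and get(false) without a null check.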


public static void main(String[] args) throws IOException {
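        /**
         * The Collector interface
         * The Collector interface consists of a set of methods that provide a blueprint for implementing
         * specific reduction operations (that is, collectors).
         * public interface Collector<T, A, R> {
         *  Supplier<A> supplier();
         *  BiConsumer<A, T> accumulator();
         *  Function<A, R> finisher();
         *  BinaryOperator<A> combiner();
         *  Set<Characteristics> characteristics();
         * }
         * T is the generic type of the items in the stream to be collected.
         * A is the type of the accumulator, the object in which partial results are accumulated during collection.
         * R is the type of the object (typically, but not necessarily, a collection) produced by the collect operation.
         */
        // Use a custom collector for better performance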
        Map<Boolean, List<Integer>> map = partitionPrimesWithCustomCollector(100);
        map.forEach((k,v) -> System.out.println(k + "=" + v));
    }

    public static Map<Boolean, List<Integer>> partitionPrimesWithCustomCollector(int n){
        // Pass the core logic directly to the three-argument overload of collect
        return IntStream.rangeClosed(2, n).boxed().collect(
                () -> new HashMap<Boolean, List<Integer>>() {{   // supplier
                    put(Boolean.TRUE, new ArrayList<>());
                    put(Boolean.FALSE, new ArrayList<>());
                }},
                (acc, candidate) -> {  // accumulator
                    acc.get(isPrime(acc.get(Boolean.TRUE), candidate)).add(candidate);
                },
                (map1, map2) -> {  // combiner; used only by parallel streams, which this order-dependent algorithm does not support
                    map1.get(Boolean.TRUE).addAll(map2.get(Boolean.TRUE));
                    map1.get(Boolean.FALSE).addAll(map2.get(Boolean.FALSE));
                }
        );
    }

    public static boolean isPrime(List<Integer> primes, int candidate){
        int candidateRoot = (int) Math.sqrt((double) candidate);
        return takeWhile(primes, i -> i <= candidateRoot)
                .stream()
                .noneMatch(p -> candidate % p == 0);
    }

    public static <A> List<A> takeWhile(List<A> list, Predicate<A> p) {
        int i = 0;
        for (A item : list) {
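            if (!p.test(item)) {  // check whether the current item of the list satisfies the predicate
                return list.subList(0, i);  // if it does not, return the prefix sublist before this item
            }
            i++;
        }
        return list;  // every item in the list satisfies the predicate, so return the list itself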
    }
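
The custom collector above is passed to collect as three lambdas. As a complement, here is a minimal sketch of a class that implements all five methods of the Collector interface listed in the comment earlier. It simply gathers elements into a List, much like Collectors.toList(); the class name ToListCollector is an illustrative choice, and the sketch declares only the IDENTITY_FINISH characteristic because an ArrayList is not safe for concurrent accumulation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

class ToListCollector<T> implements Collector<T, List<T>, List<T>> {

    @Override
    public Supplier<List<T>> supplier() {
        return ArrayList::new;  // creates the empty accumulator
    }

    @Override
    public BiConsumer<List<T>, T> accumulator() {
        return List::add;  // adds the current element to the accumulator
    }

    @Override
    public BinaryOperator<List<T>> combiner() {
        return (left, right) -> {  // merges two partial results (parallel streams)
            left.addAll(right);
            return left;
        };
    }

    @Override
    public Function<List<T>, List<T>> finisher() {
        return Function.identity();  // the accumulator already is the final result
    }

    @Override
    public Set<Collector.Characteristics> characteristics() {
        return Collections.unmodifiableSet(EnumSet.of(Collector.Characteristics.IDENTITY_FINISH));
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);
        List<Integer> collected = numbers.stream().collect(new ToListCollector<Integer>());
        System.out.println(collected);
    }
}
```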

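Summary:

collect is a terminal operation that takes as its argument various recipes (called collectors) for accumulating the elements of a stream into a summary result.
Predefined collectors include those that reduce and summarize stream elements into a single value, such as the minimum, maximum, or average.
Predefined collectors let you group stream elements with groupingBy and partition them with partitioningBy.
Collectors compose effectively to create multilevel groupings, partitions, and reductions.
You can develop your own collectors by implementing the methods defined in the Collector interface.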
