Four Ways to Use Stream Collectors.groupingBy: Grouped Statistics (Count, Sum, Average, etc.), Range Statistics, Grouped Merging, and Custom Mapping of Grouped Results

Preface

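Recently, business needs required me to compute some simple page metrics. Implementing every one of them in SQL felt tedious, so I turned to the grouping features of Stream. For simple metrics like these, Stream grouping is more flexible: once the relevant data has been extracted, it can be processed in any way needed, without writing a separate SQL query for each metric.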

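This article summarizes the Collectors.groupingBy methods and techniques I have used in my previous work. Based on how I usually use it, I roughly divide the functionality of Collectors.groupingBy into four kinds. The boundaries are fuzzy rather than absolute, and the kinds can be freely combined; the division only makes it easier to understand how each method is used.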

The four kinds of grouping functionality are:

Basic grouping
Grouped statistics
Grouped merging
Custom mapping of grouped results

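For other uses of Stream, see the following article:

A very detailed guide to Java 8 Stream: filtering, sorting, maximum, minimum, counting, sum, average, grouping, merging, mapping, deduplication, and more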

Syntax

Basic syntax

Collector<T, ?, Map<K, List<T>>> groupingBy(Function<? super T, ? extends K> classifier)

Collector<T, ?, Map<K, D>> groupingBy(Function<? super T, ? extends K> classifier, Collector<? super T, A, D> downstream)

Collector<T, ?, M> groupingBy(Function<? super T, ? extends K> classifier, Supplier<M> mapFactory, Collector<? super T, A, D> downstream)

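classifier: key mapping. Its return value is used as the key of each map entry.

mapFactory: a no-argument supplier (typically a constructor reference) that creates the new Map container in which the key-value pairs are stored.

downstream: value mapping. A collector that aggregates the elements sharing the same key into the specified type; its result is used as the value of each map entry.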

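Sample data (Student is a simple class with name, age, clazz, course, and score fields; the builder() calls suggest it uses Lombok's @Builder)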

List<Student> students = Stream.of(
        Student.builder().name("Xiao Zhang").age(16).clazz("Grade 10 Class 1").course("History").score(88).build(),
        Student.builder().name("Xiao Li").age(16).clazz("Grade 10 Class 3").course("Math").score(12).build(),
        Student.builder().name("Xiao Wang").age(17).clazz("Grade 11 Class 1").course("Geography").score(44).build(),
        Student.builder().name("Xiao Hong").age(18).clazz("Grade 11 Class 1").course("Physics").score(67).build(),
        Student.builder().name("Li Hua").age(15).clazz("Grade 11 Class 2").course("Math").score(99).build(),
        Student.builder().name("Xiao Pan").age(19).clazz("Grade 12 Class 4").course("English").score(100).build(),
        Student.builder().name("Xiao Nie").age(20).clazz("Grade 12 Class 4").course("Physics").score(32).build()
).collect(Collectors.toList());

Four ways to use grouping

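1. Basic grouping

Description: the basic function groups the elements and returns a Map container. A user-defined value derived from each element serves as the key, and elements with the same key are stored together in a List as the value.

Collectors.groupingBy: basic grouping

The following forms are equivalent: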

// Group students by course
Map<String, List<Student>> groupByCourse = students.stream().collect(Collectors.groupingBy(Student::getCourse));
Map<String, List<Student>> groupByCourse1 = students.stream().collect(Collectors.groupingBy(Student::getCourse, Collectors.toList()));
// In the calls above, the container and value types are defaults: the current JDK uses HashMap and ArrayList,
// although the Javadoc does not guarantee either type
// The overload below lets you choose the result container type explicitly
Map<String, List<Student>> groupByCourse2 = students.stream()
        .collect(Collectors.groupingBy(Student::getCourse, HashMap::new, Collectors.toList()));

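The key type, container type, and value type can all be customized. The key and value types can be whatever you need, but the container type must be a Map implementation (M extends Map<K, D>).
Choose a container according to the characteristics of each Map implementation: HashMap, LinkedHashMap, ConcurrentHashMap, WeakHashMap, TreeMap, Hashtable, and so on.
For example, to keep the groups in the order in which their keys first appear in students, use LinkedHashMap as the container; to keep the keys sorted, use TreeMap.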
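The effect of the container choice can be seen in a small sketch. It assumes Java 16+ and uses a minimal Student record (name, course, and score only) as a stand-in for the article's Lombok class; the class name ContainerChoice is illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.TreeMap;
import java.util.stream.Collectors;

class ContainerChoice {
    // Minimal stand-in for the article's Student (assumption: Java 16+ record)
    record Student(String name, String course, int score) {}

    static final List<Student> STUDENTS = List.of(
            new Student("Xiao Zhang", "History", 88),
            new Student("Xiao Li", "Math", 12),
            new Student("Xiao Wang", "Geography", 44),
            new Student("Li Hua", "Math", 99));

    // LinkedHashMap: keys keep the order in which each course first appears in the stream
    static List<String> encounterOrder() {
        LinkedHashMap<String, List<Student>> byCourse = STUDENTS.stream()
                .collect(Collectors.groupingBy(Student::course, LinkedHashMap::new, Collectors.toList()));
        return new ArrayList<>(byCourse.keySet());
    }

    // TreeMap: keys are kept in natural order (alphabetical for String)
    static List<String> sortedOrder() {
        TreeMap<String, List<Student>> byCourse = STUDENTS.stream()
                .collect(Collectors.groupingBy(Student::course, TreeMap::new, Collectors.toList()));
        return new ArrayList<>(byCourse.keySet());
    }
}
```

With this data, encounterOrder() returns [History, Math, Geography] and sortedOrder() returns [Geography, History, Math].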

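Collectors.groupingBy: custom keys (field mapping)

Usually we group a batch of Java objects. Depending on the requirement, the key may be built from one field or several fields, or from a formatted version of some fields.

For example:

ID card number, phone number, ID
Year, month, or a date in a given format
A combination of several IDs
Date + type attribute
...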

// Field mapping: list the students of each course
Map<String, List<Student>> fieldKey = students.stream().collect(Collectors.groupingBy(Student::getCourse));
// Combined fields: list the students of each class, per course
Map<String, List<Student>> combineFieldKey = students.stream().collect(Collectors.groupingBy(student -> student.getClazz() + "#" + student.getCourse()));
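Joining fields with "#" works, but the key is then an opaque string, and different field values could in principle produce the same text. A sketch of an alternative (not what the original code uses) is a small record as a composite key. ClassCourse is a hypothetical type, and Java 16+ is assumed for records.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CompositeKeyDemo {
    // Minimal stand-in for the article's Student (assumption: Java 16+ record)
    record Student(String name, String clazz, String course, int score) {}

    // Hypothetical key type: a record derives equals/hashCode from its components,
    // so students with the same class and course land in the same group
    record ClassCourse(String clazz, String course) {}

    static final List<Student> STUDENTS = List.of(
            new Student("Xiao Wang", "Grade 11 Class 1", "Geography", 44),
            new Student("Xiao Hong", "Grade 11 Class 1", "Physics", 67),
            new Student("Xiao Nie", "Grade 12 Class 4", "Physics", 32));

    static Map<ClassCourse, List<Student>> byClassAndCourse() {
        return STUDENTS.stream()
                .collect(Collectors.groupingBy(s -> new ClassCourse(s.clazz(), s.course())));
    }
}
```

The groups can then be looked up with new ClassCourse("Grade 12 Class 4", "Physics"), and the individual key parts stay accessible without splitting a string.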

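Collectors.groupingBy: custom keys (ranges)

Besides grouping by specific fields, we sometimes need to assign different keys to data falling into different intervals. Unlike field keys, range keys are usually produced by comparisons, and they are often used for statistical metrics.

For example:

Counting whether elements have a certain attribute or type
Counting the number of people, or their proportion, in each of several intervals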

// Two-level range: split students into pass (>= 60) and fail
Map<Boolean, List<Student>> customRangeKey = students.stream().collect(Collectors.groupingBy(student -> student.getScore() >= 60));
// Multi-level range: grade students by score
Map<String, List<Student>> customMultiRangeKey = students.stream().collect(Collectors.groupingBy(student -> {
    if (student.getScore() < 60) {
        return "C";
    } else if (student.getScore() < 80) {
        return "B";
    }
    return "A";
}));
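The list above also mentions proportions per interval. One way to get them, sketched here under the assumption that the scores are plain Integer values, is to count per band with counting() and divide each count by the total. The band thresholds match the grading code above; GradeBandShare is an illustrative name.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GradeBandShare {
    // Same thresholds as the grading code: < 60 -> C, < 80 -> B, otherwise A
    static String band(int score) {
        if (score < 60) return "C";
        if (score < 80) return "B";
        return "A";
    }

    // Count per band, then turn each count into a share of the total
    static Map<String, Double> share(List<Integer> scores) {
        Map<String, Long> counts = scores.stream()
                .collect(Collectors.groupingBy(GradeBandShare::band, TreeMap::new, Collectors.counting()));
        Map<String, Double> result = new TreeMap<>();
        counts.forEach((grade, count) -> result.put(grade, count.doubleValue() / scores.size()));
        return result;
    }
}
```

For the seven sample scores (88, 12, 44, 67, 99, 100, 32), this gives A = 3/7, B = 1/7, and C = 3/7.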

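The remaining three features all customize the value type. They are all built on the downstream parameter, Collector<? super T, A, D> downstream: each is a Collector implementation used as a downstream operation, and it determines what value is stored under each key.

2. Grouped statistics

Description: after grouping, compute over the elements of each group: count, average, sum, maximum and minimum, and statistics over ranges.

Collectors.counting: count

Counting syntax: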
Collector<T, ?, Long> counting()

// Count the students in each course
Map<String, Long> groupCount = students.stream()
        .collect(Collectors.groupingBy(Student::getCourse, Collectors.counting()));

Collectors.summingInt: sum

Sum syntax:
Collector<T, ?, Integer> summingInt(ToIntFunction<? super T> mapper)
Collector<T, ?, Long> summingLong(ToLongFunction<? super T> mapper)
Collector<T, ?, Double> summingDouble(ToDoubleFunction<? super T> mapper)

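Summation comes in three variants, Int, Long, and Double, for different kinds of values. The mapper must return a value of the variant's primitive type (or one that widens to it, such as an int for summingLong), and the result type follows the variant: Integer, Long, or Double.

// Total score per course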
Map<String, Integer> groupSum = students.stream()
        .collect(Collectors.groupingBy(Student::getCourse, Collectors.summingInt(Student::getScore)));
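The variant also determines the accumulator type, which matters for large totals. A minimal sketch (the class name is illustrative): summingInt adds into an int and can silently overflow, while summingLong accepts the same int-valued mapper and accumulates in a long.

```java
import java.util.List;
import java.util.stream.Collectors;

class SumOverflow {
    // summingInt accumulates in an int, so a large total silently wraps around
    static int sumAsInt(List<Integer> values) {
        return values.stream().collect(Collectors.summingInt(Integer::intValue));
    }

    // summingLong accepts the same int-valued mapper (int widens to long) and accumulates in a long
    static long sumAsLong(List<Integer> values) {
        return values.stream().collect(Collectors.summingLong(Integer::intValue));
    }
}
```

For Integer.MAX_VALUE and 1, sumAsInt returns Integer.MIN_VALUE, while sumAsLong returns 2147483648.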

Collectors.averagingInt: average

Average syntax:
Collector<T, ?, Double> averagingInt(ToIntFunction<? super T> mapper)
Collector<T, ?, Double> averagingLong(ToLongFunction<? super T> mapper)
Collector<T, ?, Double> averagingDouble(ToDoubleFunction<? super T> mapper)

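Points to note about averages:

There are three variants: Int, Long, and Double.
The variant only affects how the values are read and accumulated, and therefore the precision of the computation.
The result is always a Double.

// Average score per course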
Map<String, Double> groupAverage = students.stream()
        .collect(Collectors.groupingBy(Student::getCourse, Collectors.averagingInt(Student::getScore)));
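A small sketch of the result type, assuming a minimal Student record (Java 16+) as a stand-in for the article's class: even though the scores are int, averagingInt divides in floating point, so 12 and 99 average to 55.5 rather than 55.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class AverageByCourse {
    // Minimal stand-in for the article's Student (assumption: Java 16+ record)
    record Student(String name, String course, int score) {}

    static Map<String, Double> averageByCourse(List<Student> students) {
        // averagingInt reads int scores but divides in floating point, returning a Double
        return students.stream()
                .collect(Collectors.groupingBy(Student::course, Collectors.averagingInt(Student::score)));
    }
}
```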

Collectors.minBy / Collectors.maxBy: minimum and maximum

Min/max syntax:
Collector<T, ?, Optional<T>> minBy(Comparator<? super T> comparator)
Collector<T, ?, Optional<T>> maxBy(Comparator<? super T> comparator)

Collectors.collectingAndThen syntax:
Collector<T,A,RR> collectingAndThen(Collector<T,A,R> downstream, Function<R,RR> finisher)

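Function<R,RR>: takes an argument of type R and returns a result of type RR.

Collectors.minBy returns Optional<T>, so normally the Optional has to be checked for emptiness before its value is read. (Inside groupingBy, a group is only created once some element maps to its key, so the Optional there is never actually empty.)

This step can be handled with Collectors.collectingAndThen, which returns the unwrapped result. Collectors.collectingAndThen post-processes the result of a collector once the collection is finished.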

// Lowest-scoring student in each course (compare by score; comparing by course is meaningless inside a course group)
Map<String, Optional<Student>> groupMin = students.stream()
        .collect(Collectors.groupingBy(Student::getCourse, Collectors.minBy(Comparator.comparingInt(Student::getScore))));
// Use Collectors.collectingAndThen to unwrap the Optional
Map<String, Student> groupMin2 = students.stream()
        .collect(Collectors.groupingBy(Student::getCourse,
                Collectors.collectingAndThen(Collectors.minBy(Comparator.comparingInt(Student::getScore)), op -> op.orElse(null))));
// Highest-scoring student in each course
Map<String, Optional<Student>> groupMax = students.stream()
        .collect(Collectors.groupingBy(Student::getCourse, Collectors.maxBy(Comparator.comparingInt(Student::getScore))));
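Because groupingBy never produces an empty group, the Optional from maxBy can be unwrapped directly inside collectingAndThen. A sketch under the same assumption of a minimal Student record (Java 16+); it returns the name of each course's top scorer, comparing by score with Comparator.comparingInt.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TopScorerByCourse {
    // Minimal stand-in for the article's Student (assumption: Java 16+ record)
    record Student(String name, String course, int score) {}

    static Map<String, String> topScorerName(List<Student> students) {
        return students.stream().collect(Collectors.groupingBy(
                Student::course,
                Collectors.collectingAndThen(
                        // compare by score within each course
                        Collectors.maxBy(Comparator.comparingInt(Student::score)),
                        // every group has at least one element, so the Optional is always present here
                        top -> top.map(Student::name).orElseThrow())));
    }
}
```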

Collectors.summarizingInt: full statistics (all of the results above at once)

Full statistics syntax:
Collector<T, ?, IntSummaryStatistics> summarizingInt(ToIntFunction<? super T> mapper)
Collector<T, ?, LongSummaryStatistics> summarizingLong(ToLongFunction<? super T> mapper)
Collector<T, ?, DoubleSummaryStatistics> summarizingDouble(ToDoubleFunction<? super T> mapper)

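The statistics methods also come in three variants: Int, Long, and Double. Each converts the input elements to the corresponding primitive type and then computes the statistics. A single Collectors.summarizingXxx call yields everything a typical statistic needs: count, sum, minimum, maximum, and average.

Narrowing conversions are not possible: for example, a mapper that returns long cannot be used with summarizingInt.

The result type depends on the variant used: IntSummaryStatistics, LongSummaryStatistics, or DoubleSummaryStatistics.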

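// Collect the maximum, minimum, count, sum, and average of each course in one pass
HashMap<String, IntSummaryStatistics> groupStat = students.stream()
        .collect(Collectors.groupingBy(Student::getCourse, HashMap::new, Collectors.summarizingInt(Student::getScore)));
groupStat.forEach((k, v) -> {
    // The getter types depend on the variant; for IntSummaryStatistics they are:
    double average = v.getAverage();
    long count = v.getCount();
    int max = v.getMax();
    int min = v.getMin();
    long sum = v.getSum();
    System.out.println(k + ": count=" + count + ", sum=" + sum + ", min=" + min + ", max=" + max + ", avg=" + average);
});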
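A sketch that formats the five values of one group, assuming a minimal Student record (Java 16+); ScoreStats and describe are illustrative names. Note the differing getter types: getCount() and getSum() return long, getMin() and getMax() return int, and getAverage() returns double.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ScoreStats {
    // Minimal stand-in for the article's Student (assumption: Java 16+ record)
    record Student(String name, String course, int score) {}

    static Map<String, IntSummaryStatistics> statsByCourse(List<Student> students) {
        return students.stream()
                .collect(Collectors.groupingBy(Student::course, Collectors.summarizingInt(Student::score)));
    }

    // Render one group's statistics as a single line
    static String describe(IntSummaryStatistics s) {
        return "count=" + s.getCount() + " sum=" + s.getSum() + " min=" + s.getMin()
                + " max=" + s.getMax() + " avg=" + s.getAverage();
    }
}
```

For the two Physics scores 67 and 32, describe(...) returns "count=2 sum=99 min=32 max=67 avg=49.5".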

Collectors.partitioningBy: range statistics

Collectors.partitioningBy syntax:
Collector<T, ?, Map<Boolean, List<T>>> partitioningBy(Predicate<? super T> predicate)
Collector<T, ?, Map<Boolean, D>> partitioningBy(Predicate<? super T> predicate, Collector<? super T, A, D> downstream)

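predicate: the condition that splits the elements into two parts, true and false.

The statistics above are all based on a single metric. To compute statistics over ranges, for example the students who scored at least 60 and those who scored below 60, use Collectors.partitioningBy to split each group further.

// Split each course's students into those scoring at least 60 and those scoring below 60
Map<String, Map<Boolean, List<Student>>> groupPartition = students.stream()
        .collect(Collectors.groupingBy(Student::getCourse, Collectors.partitioningBy(s -> s.getScore() >= 60)));
// Likewise, count the students in each of the two parts
Map<String, Map<Boolean, Long>> groupPartitionCount = students.stream()
        .collect(Collectors.groupingBy(Student::getCourse, Collectors.partitioningBy(s -> s.getScore() >= 60, Collectors.counting())));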

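Collectors.partitioningBy can only split data into two parts. To split into more ranges, you can nest partitioningBy calls, but some of the resulting parts are redundant and have to be discarded manually afterwards (with an outer split at 60 and an inner split at 90, the "below 60 and above 90" part is always empty). Alternatively, take the result of the first partitioningBy and compute range statistics on each part separately.

Map<String, Map<Boolean, Map<Boolean, List<Student>>>> groupNestedPartition = students.stream()
        .collect(Collectors.groupingBy(Student::getCourse, Collectors.partitioningBy(s -> s.getScore() >= 60,
                Collectors.partitioningBy(s -> s.getScore() > 90))));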
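A sketch of the nested split on plain Integer scores (an assumption made to keep it short), with counting() as the innermost downstream. partitioningBy always returns both the true and false keys, so the redundant "below 60 but above 90" cell shows up as 0 rather than disappearing.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class NestedPartition {
    // Outer split: pass (>= 60) or not; inner split: excellent (> 90) or not
    static Map<Boolean, Map<Boolean, Long>> bands(List<Integer> scores) {
        return scores.stream().collect(Collectors.partitioningBy(s -> s >= 60,
                Collectors.partitioningBy(s -> s > 90, Collectors.counting())));
    }
}
```

For the sample scores (88, 12, 44, 67, 99, 100, 32), the counts are 2 (at least 60 and above 90), 2 (at least 60 and at most 90), 3 (below 60), and 0 in the redundant cell.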

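3. Grouped merging

Description: merge the values under the same key into a single result, using one of several methods.

Collectors.reducing: merge grouped results

Collectors.reducing syntax:
Collector<T, ?, Optional<T>> reducing(BinaryOperator<T> op)
Collector<T, ?, T> reducing(T identity, BinaryOperator<T> op)
Collector<T, ?, U> reducing(U identity, Function<? super T, ? extends U> mapper, BinaryOperator<U> op)

identity: the identity value of the reduction. It is returned when there are no elements; otherwise it is the starting value with which the first element is combined. It should be a true identity for op (combining it with any x yields x), because in a parallel reduction it may enter the computation more than once.

mapper: maps each element of the stream to the value that is actually merged.

op: the merge function, which combines the mapped values two at a time, starting by combining the first value with the identity.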

// Merge results: total score per course
Map<String, Integer> groupCalcSum = students.stream()
        .collect(Collectors.groupingBy(Student::getCourse, Collectors.reducing(0, Student::getScore, Integer::sum)));
// Merge results: the student with the highest score in each course
Map<String, Optional<Student>> groupCourseMax = students.stream()
        .collect(Collectors.groupingBy(Student::getCourse, Collectors.reducing(BinaryOperator.maxBy(Comparator.comparing(Student::getScore)))));
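The identity matters when one is supplied. A sketch assuming a minimal Student record (Java 16+): 0 is neutral for Integer::sum and Integer.MIN_VALUE is neutral for Integer::max, so neither distorts the result.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ReducingDemo {
    // Minimal stand-in for the article's Student (assumption: Java 16+ record)
    record Student(String name, String course, int score) {}

    // 0 is the identity for addition, so it does not change the total
    static Map<String, Integer> totalByCourse(List<Student> students) {
        return students.stream().collect(Collectors.groupingBy(Student::course,
                Collectors.reducing(0, Student::score, Integer::sum)));
    }

    // Integer.MIN_VALUE is the identity for max, so it never wins over a real score
    static Map<String, Integer> bestScoreByCourse(List<Student> students) {
        return students.stream().collect(Collectors.groupingBy(Student::course,
                Collectors.reducing(Integer.MIN_VALUE, Student::score, Integer::max)));
    }
}
```

Unlike the one-argument reducing, both return plain Integer values, so no Optional has to be unwrapped.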

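Collectors.joining: concatenate strings

Collectors.joining syntax:
Collector<CharSequence, ?, String> joining()
Collector<CharSequence, ?, String> joining(CharSequence delimiter)
Collector<CharSequence, ?, String> joining(CharSequence delimiter, CharSequence prefix, CharSequence suffix)

delimiter: the separator placed between elements

prefix: added once at the beginning of the joined result

suffix: added once at the end of the joined result

Collectors.joining only accepts CharSequence elements, so it is usually combined with another downstream method such as Collectors.mapping.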

// Names of the students in each course
Map<String, String> groupCourseSelectSimpleStudent = students.stream()
        .collect(Collectors.groupingBy(Student::getCourse, Collectors.mapping(Student::getName, Collectors.joining(","))));
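Since prefix and suffix wrap the whole joined string rather than each element, a sketch using both may help. It assumes a minimal Student record (Java 16+); JoiningDemo is an illustrative name.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class JoiningDemo {
    // Minimal stand-in for the article's Student (assumption: Java 16+ record)
    record Student(String name, String course, int score) {}

    // The delimiter goes between names; prefix and suffix appear once around the whole result
    static Map<String, String> namesByCourse(List<Student> students) {
        return students.stream().collect(Collectors.groupingBy(Student::course,
                Collectors.mapping(Student::name, Collectors.joining(", ", "[", "]"))));
    }
}
```

For Math with Xiao Li and Li Hua (in that order), the value is "[Xiao Li, Li Hua]".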

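4. Custom mapping of grouped results

Description: the downstream parameter of Collectors.groupingBy is what maps the elements of each group to a value, and every feature above is built on downstream. This section introduces some more methods for customizing the value.

Collectors.toXXX: map results to a collection object

Map the result to a List (the current JDK returns an ArrayList, but the type is not guaranteed):
Collector<T, ?, List<T>> toList()

Map the result to a Set (the current JDK returns a HashSet, but the type is not guaranteed):
Collector<T, ?, Set<T>> toSet()

Map the result to a HashMap or another Map class: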
Collector<T, ?, Map<K,U>> toMap(Function<? super T, ? extends K> keyMapper, Function<? super T, ? extends U> valueMapper)
Collector<T, ?, Map<K,U>> toMap(Function<? super T, ? extends K> keyMapper, Function<? super T, ? extends U> valueMapper, BinaryOperator<U> mergeFunction)
Collector<T, ?, M> toMap(Function<? super T, ? extends K> keyMapper, Function<? super T, ? extends U> valueMapper, BinaryOperator<U> mergeFunction, Supplier<M> mapSupplier)

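keyMapper: key mapping

valueMapper: value mapping

mergeFunction: how to merge the values when the stream contains duplicate keys. Without it, a duplicate key causes an IllegalStateException.

mapSupplier: a no-argument supplier for the Map container, which lets you choose the returned Map type.

Collectors.toConcurrentMap has the same syntax as Collectors.toMap, but there are some differences between them:

toConcurrentMap returns a ConcurrentHashMap by default, while toMap returns a HashMap.

They behave differently on parallel streams: toMap calls mapSupplier multiple times to create several map containers and finally merges them with Map.merge(), whereas toConcurrentMap creates a single container into which all threads add key-value pairs. On parallel streams, toMap's merging is therefore generally slower than toConcurrentMap; the trade-off is that toConcurrentMap is unordered and does not preserve encounter order.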

Map<String, Map<String, Integer>> courseWithStudentScore = students.stream()
        .collect(Collectors.groupingBy(Student::getCourse, Collectors.toMap(Student::getName, Student::getScore)));
Map<String, LinkedHashMap<String, Integer>> courseWithStudentScore2 = students.stream()
        .collect(Collectors.groupingBy(Student::getCourse, Collectors.toMap(Student::getName, Student::getScore, (v1, v2) -> v2, LinkedHashMap::new)));
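The duplicate-key rule for toMap can be checked with a short sketch. The Student record and the repeated name are hypothetical test data; Integer::max is just one possible merge function (keep the higher score).

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ToMapDuplicates {
    // Minimal stand-in for the article's Student (assumption: Java 16+ record)
    record Student(String name, String course, int score) {}

    // Without a merge function, a repeated key makes toMap throw IllegalStateException
    static boolean throwsOnDuplicate(List<Student> students) {
        try {
            students.stream().collect(Collectors.toMap(Student::name, Student::score));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // With a merge function the conflict is resolved; here the higher score is kept
    static Map<String, Integer> bestScoreByName(List<Student> students) {
        return students.stream().collect(Collectors.toMap(Student::name, Student::score, Integer::max));
    }
}
```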

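Collectors.mapping: custom mapping of results

Collectors.mapping syntax:
Collector<T, ?, R> mapping(Function<? super T, ? extends U> mapper, Collector<? super U, A, R> downstream)

Collectors.mapping is quite versatile: besides mapping the grouped elements to whatever values you need, it can be combined with any of the downstream methods mentioned above.

Map the result to a specific field: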

Map<String, List<String>> groupMapping = students.stream()
        .collect(Collectors.groupingBy(Student::getCourse, Collectors.mapping(Student::getName, Collectors.toList())));

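Convert to another bean type (BeanUtil.copyProperties appears to be a third-party helper such as Hutool's BeanUtil, and OutstandingStudent is a target class not shown here):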

Map<String, List<OutstandingStudent>> groupMapping2 = students.stream()
        .filter(s -> s.getScore() > 60)
        .collect(Collectors.groupingBy(Student::getCourse, Collectors.mapping(s -> BeanUtil.copyProperties(s, OutstandingStudent.class), Collectors.toList())));
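If a third-party BeanUtil is not available, the conversion can be done with a plain constructor call inside mapping. A sketch that keeps the article's > 60 filter: OutstandingStudent's fields are not shown in the article, so the record below (name and score only) is an assumption, as is the Student record.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MappingToDto {
    // Minimal stand-in for the article's Student (assumption: Java 16+ record)
    record Student(String name, String course, int score) {}

    // Hypothetical target type; its fields are assumed for illustration
    record OutstandingStudent(String name, int score) {}

    static Map<String, List<OutstandingStudent>> outstandingByCourse(List<Student> students) {
        return students.stream()
                .filter(s -> s.score() > 60)
                .collect(Collectors.groupingBy(Student::course,
                        // map each Student to the target type without reflection-based copying
                        Collectors.mapping(s -> new OutstandingStudent(s.name(), s.score()), Collectors.toList())));
    }
}
```

An explicit constructor call is checked by the compiler, whereas reflection-based copying silently skips properties whose names differ.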

Combining with joining

// Combined with joining
Map<String, String> groupMapperThenJoin = students.stream()
        .collect(Collectors.groupingBy(Student::getCourse, Collectors.mapping(Student::getName, Collectors.joining(","))));
// Use collectingAndThen to post-process the joined result
Map<String, String> groupMapperThenLink = students.stream()
        .collect(Collectors.groupingBy(Student::getCourse,
                Collectors.collectingAndThen(Collectors.mapping(Student::getName, Collectors.joining(", ")), s -> "Student list: " + s)));

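Collector: custom downstream

For reference, see: [Java8 Stream]: Exploring the core of the Stream implementation, Collector, by simulating Stream

Meaning of the generic type parameters of Collector<T, A, R>:

<T>: the type of the input elements of the reduction operation

<A>: the mutable accumulation (intermediate result) type of the reduction operation. It can be any collection container, or a custom object that supports accumulation (such as add).

<R>: the final result type of the reduction operation, obtained by transforming the accumulated result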

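Methods defined in Collector; the return value of each can be viewed as a function:

Supplier<A> supplier(): creates and returns a new result container.

BiConsumer<A, T> accumulator(): folds an element into the result container; it modifies the container in place and returns nothing.

BinaryOperator<A> combiner(): merges two containers, each holding a partial result of processing the stream. It may return one of the two containers (after folding the other into it) or a new container.

Function<A, R> finisher(): performs the final transformation, turning the accumulated result A into R. If the IDENTITY_FINISH characteristic is set, the finisher may be skipped and A is cast directly to R.

Set<Characteristics> characteristics(): returns the set of characteristics of this Collector (CONCURRENT, UNORDERED, IDENTITY_FINISH). These characteristics affect how the functions above are used.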

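Syntax of the functional interfaces above:

Supplier<T>#T get(): takes no arguments and returns a result; usually a method reference to a constructor.

BiConsumer<T, U>#void accept(T t, U u): performs an operation on the two given arguments.

BinaryOperator<T> extends BiFunction<T,T,T>#T apply(T t, T u): merges t and u, returning one of them or a newly created object.

Function<T, R>#R apply(T t): processes the given argument and returns a new value.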

public interface Collector<T, A, R> {
    // Abstract methods only; the static of(...) factories and the Characteristics enum are omitted

    Supplier<A> supplier();

    BiConsumer<A, T> accumulator();

    BinaryOperator<A> combiner();

    Function<A, R> finisher();

    Set<Characteristics> characteristics();
}
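A custom downstream does not require implementing the interface by hand: the static factory Collector.of(supplier, accumulator, combiner, finisher, characteristics...) assembles one from the four functions described above. The sketch below (names illustrative, minimal Student record assumed, Java 16+) computes the score spread (max minus min) of each course, with int[] {min, max} as the accumulation type A and Integer as R.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class ScoreSpreadCollector {
    // Minimal stand-in for the article's Student (assumption: Java 16+ record)
    record Student(String name, String course, int score) {}

    // A = int[] {min, max}, R = Integer (max - min)
    static Collector<Student, int[], Integer> scoreSpread() {
        return Collector.of(
                () -> new int[] {Integer.MAX_VALUE, Integer.MIN_VALUE}, // supplier: fresh container
                (acc, s) -> {                                           // accumulator: fold one element in
                    acc[0] = Math.min(acc[0], s.score());
                    acc[1] = Math.max(acc[1], s.score());
                },
                (left, right) -> {                                      // combiner: merge two partial results
                    left[0] = Math.min(left[0], right[0]);
                    left[1] = Math.max(left[1], right[1]);
                    return left;
                },
                acc -> acc[1] - acc[0]);                                // finisher: A -> R
    }

    static Map<String, Integer> spreadByCourse(List<Student> students) {
        return students.stream().collect(Collectors.groupingBy(Student::course, scoreSpread()));
    }
}
```

The combiner folds the right partial result into the left one and returns it, which the contract allows. The initial values would give a meaningless spread for an empty input, but groupingBy only creates a container once an element arrives for that key.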