Running MapReduce on Redis Data with Java

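Introduction:

MapReduce (MR for short) is a framework for writing applications that process large amounts of data in a distributed fashion. Because it lets data be processed in parallel across large clusters of commodity hardware, MapReduce can significantly speed up data processing. This article shows how to use MapReduce from Java to process data stored in Redis, using Redisson, a Redis-based in-memory data grid.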

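What is MapReduce?

MapReduce is a programming model for distributed computing that can be implemented in Java. The algorithm consists of two key tasks, called Map and Reduce.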

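The Map task converts a data set into another data set in which the elements are broken down into key/value pairs called tuples. The Reduce task then takes the output of the Map task as its input and combines those tuples into a smaller set of tuples.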
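To make the two phases concrete, here is a minimal sketch in plain Java with no Redis involved. The class name `MapReducePhases` and its three helper methods are hypothetical and chosen only for illustration. The grouping ("shuffle") step between the phases is written out explicitly, although a framework normally does it for you.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of Redisson or any library.
final class MapReducePhases {

    // Map phase: turn each input line into (word, 1) tuples.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> tuples = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                if (!word.isEmpty()) {
                    tuples.add(new SimpleEntry<>(word, 1));
                }
            }
        }
        return tuples;
    }

    // Shuffle step: group the emitted values by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> tuples) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> tuple : tuples) {
            grouped.computeIfAbsent(tuple.getKey(), k -> new ArrayList<>()).add(tuple.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the values of each key into a single tuple.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> reduced = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            int sum = 0;
            for (int value : group.getValue()) {
                sum += value;
            }
            reduced.put(group.getKey(), sum);
        }
        return reduced;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("to be or not", "to be");
        System.out.println(reduce(shuffle(map(lines)))); // prints {be=2, not=1, or=1, to=2}
    }
}
```

Six input tuples go into the reduce phase and four come out, which is the "smaller set of tuples" described above.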

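Distributed computing means splitting a job into several separate processes that can then run in parallel on a large cluster of commodity hardware. Once MapReduce has broken the elements of a large data set into tuples and reduced them to a smaller set, the remaining data can be processed in parallel, which can significantly speed up whatever processing the data needs.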
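The idea of splitting work into independent pieces and merging the partial results can be sketched with a thread pool on a single machine. This is only an analogy for a cluster: the class name `ParallelCountSketch`, the partition size, and the worker count are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration class; threads stand in for cluster nodes.
final class ParallelCountSketch {

    // Count the words of one partition; each call is independent of the others.
    static Map<String, Integer> countPartition(List<String> partition) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : partition) {
            for (String word : line.split(" ")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    static Map<String, Integer> countInParallel(List<String> lines, int partitionSize, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Submit one task per partition.
            List<CompletableFuture<Map<String, Integer>>> partials = new ArrayList<>();
            for (int start = 0; start < lines.size(); start += partitionSize) {
                List<String> partition =
                        lines.subList(start, Math.min(start + partitionSize, lines.size()));
                partials.add(CompletableFuture.supplyAsync(() -> countPartition(partition), pool));
            }
            // Merge the partial results into one map.
            Map<String, Integer> total = new HashMap<>();
            for (CompletableFuture<Map<String, Integer>> partial : partials) {
                partial.join().forEach((word, count) -> total.merge(word, count, Integer::sum));
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a b", "b c", "c c", "a");
        System.out.println(countInParallel(lines, 2, 2).get("c")); // prints 3
    }
}
```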

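When should you use MapReduce on Redis data?

There are many situations in which using MapReduce to process Redis data is helpful. What they usually have in common is that the amount of data to process is very large.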

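As a simple example, suppose you have monthly energy consumption data for a large number of organizations, and you need to process it to produce results such as the year of maximum usage and the year of minimum usage for each organization. Writing an algorithm for this is not hard for an experienced programmer, but many such algorithms take a long time to run when they have to go through a large amount of data.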
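As a rough sketch of the energy example, the map step can key each monthly reading by organization and year, and the reduce step can sum the readings per year before picking the highest and lowest year. The `Reading` type, the sample figures, and the class name `EnergyUsageSketch` are all invented for illustration; the code runs on one machine and only shows the shape of the computation.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class with invented sample data.
final class EnergyUsageSketch {

    // Hypothetical input type: one monthly reading for one organization.
    static final class Reading {
        final String organization;
        final int year;
        final int month;
        final double kwh;

        Reading(String organization, int year, int month, double kwh) {
            this.organization = organization;
            this.year = year;
            this.month = month;
            this.kwh = kwh;
        }
    }

    // Invented sample data, for illustration only.
    static List<Reading> sampleReadings() {
        return List.of(
                new Reading("Acme", 2019, 1, 100), new Reading("Acme", 2019, 2, 120),
                new Reading("Acme", 2020, 1, 90), new Reading("Acme", 2020, 2, 95),
                new Reading("Acme", 2021, 1, 130), new Reading("Acme", 2021, 2, 140),
                new Reading("Globex", 2019, 1, 50), new Reading("Globex", 2020, 1, 40));
    }

    // Map + reduce: group by organization and year, summing the monthly readings.
    static Map<String, Map<Integer, Double>> yearlyTotals(List<Reading> readings) {
        Map<String, Map<Integer, Double>> totals = new TreeMap<>();
        for (Reading r : readings) {
            totals.computeIfAbsent(r.organization, k -> new TreeMap<>())
                  .merge(r.year, r.kwh, Double::sum);
        }
        return totals;
    }

    // Year with the highest total usage.
    static int maxYear(Map<Integer, Double> perYear) {
        return Collections.max(perYear.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    // Year with the lowest total usage.
    static int minYear(Map<Integer, Double> perYear) {
        return Collections.min(perYear.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        yearlyTotals(sampleReadings()).forEach((org, perYear) ->
                System.out.println(org + " max=" + maxYear(perYear) + " min=" + minYear(perYear)));
        // prints:
        // Acme max=2021 min=2020
        // Globex max=2019 min=2020
    }
}
```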

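As a remedy for long processing times, you can use MapReduce to reduce the overall size of the data set, which makes processing faster. For many tasks this reduction in processing time matters, because it frees up hardware for other computing tasks.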

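There are many more cases in which running MapReduce over distributed data stored in Redis through Redisson is useful. For example, MapReduce is particularly useful when you need a fast, reliable, and accurate word count for a very large file or collection of files.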

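Example: running a distributed MapReduce in Java on data stored in Redis

The following steps show how Redisson's MapReduce takes text data and processes it to produce an accurate word count.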

1. Configure Redisson

Load the configuration from a JSON or YAML source, or build it programmatically:

// Pick one of the following ways to obtain a Config.
// from JSON
Config config = Config.fromJSON(...);
// from YAML
// Config config = Config.fromYAML(...);
// or dynamically
// Config config = new Config();
// …

2. Create a Redisson instance:

RedissonClient redisson = Redisson.create(config);

3. Define a Mapper object. It is applied to each Map entry and splits the value into words on every non-letter character:

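import org.redisson.api.mapreduce.RCollector;
import org.redisson.api.mapreduce.RMapper;

public class WordMapper implements RMapper<String, String, String, Integer> {
    @Override
    public void map(String key, String value, RCollector<String, Integer> collector) {
        // Every non-letter character acts as a separator.
        String[] words = value.split("[^a-zA-Z]");
        for (String word : words) {
            // Adjacent separators produce empty strings; skip them.
            if (!word.isEmpty()) {
                collector.emit(word, 1);
            }
        }
    }
}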
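One detail worth checking in isolation is what `String.split` returns for the pattern `[^a-zA-Z]`. Every non-letter character is a separator, so two adjacent separators (for example a comma followed by a space) produce an empty string, which a mapper would usually want to skip. The sketch below uses only the JDK; `SplitSketch` and `words` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; shows the tokens the split pattern yields.
final class SplitSketch {

    // Split on non-letters and drop the empty tokens left by adjacent separators.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        for (String token : text.split("[^a-zA-Z]")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The raw split keeps an empty token between "," and " ".
        System.out.println(Arrays.toString("Alice, was".split("[^a-zA-Z]"))); // prints [Alice, , was]
        // Filtering removes it.
        System.out.println(words("Alice, was")); // prints [Alice, was]
    }
}
```

The sample lines used later in this article contain only letters and single spaces, so the empty-token case does not arise there, but it would with real text that contains punctuation or digits.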

4. Define a Reducer object that sums the counts for each word:

import java.util.Iterator;
import org.redisson.api.mapreduce.RReducer;

public class WordReducer implements RReducer<String, Integer> {
    @Override
    public Integer reduce(String reducedKey, Iterator<Integer> iter) {
        // Add up every value the mappers emitted for this word.
        int sum = 0;
        while (iter.hasNext()) {
            sum += iter.next();
        }
        return sum;
    }
}

5. Optionally, define a Collator object that computes the total number of words:

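import java.util.Map;
import org.redisson.api.mapreduce.RCollator;

public class WordCollator implements RCollator<String, Integer, Integer> {
    @Override
    public Integer collate(Map<String, Integer> resultMap) {
        // Sum the per-word counts produced by the reducer.
        int result = 0;
        for (Integer count : resultMap.values()) {
            result += count;
        }
        return result;
    }
}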

6. Run the job:

RMap<String, String> map = redisson.getMap("wordsMap");
map.put("line1", "Alice was beginning to get very tired");
map.put("line2", "of sitting by her sister on the bank and");
map.put("line3", "of having nothing to do once or twice she");
map.put("line4", "had peeped into the book her sister was reading");
map.put("line5", "but it had no pictures or conversations in it");
map.put("line6", "and what is the use of a book");
map.put("line7", "thought Alice without pictures or conversation");

RMapReduce<String, String, String, Integer> mapReduce
        = map.<String, Integer>mapReduce()
             .mapper(new WordMapper())
             .reducer(new WordReducer());

// count occurrences of each word
Map<String, Integer> mapToNumber = mapReduce.execute();
// count the total number of words
Integer totalWordsAmount = mapReduce.execute(new WordCollator());
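To know what to expect from the job without a running Redis server, the same counting can be reproduced in plain Java. This sketch does not call Redisson; `ExpectedCounts` is a hypothetical helper, and it assumes the job behaves like an ordinary word count over the seven sample lines.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; reproduces the word count locally, without Redis.
final class ExpectedCounts {

    // The same seven lines that the example stores in "wordsMap".
    static final List<String> LINES = List.of(
            "Alice was beginning to get very tired",
            "of sitting by her sister on the bank and",
            "of having nothing to do once or twice she",
            "had peeped into the book her sister was reading",
            "but it had no pictures or conversations in it",
            "and what is the use of a book",
            "thought Alice without pictures or conversation");

    // Local equivalent of mapper + reducer: word -> number of occurrences.
    static Map<String, Integer> wordCounts(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split("[^a-zA-Z]")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Local equivalent of the collator: sum of all per-word counts.
    static int totalWords(Map<String, Integer> counts) {
        int total = 0;
        for (int count : counts.values()) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = wordCounts(LINES);
        System.out.println(counts.get("Alice")); // prints 2
        System.out.println(counts.get("the"));   // prints 3
        System.out.println(totalWords(counts));  // prints 57
    }
}
```

Note that the keys are case-sensitive, so "Alice" and "alice" would be counted as different words.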

MapReduce is also available for collection-type objects, including Set, SetCache, List, SortedSet, ScoredSortedSet, Queue, BlockingQueue, Deque, BlockingDeque, PriorityQueue, and PriorityDeque.

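Summary:

This article used Redisson to run MapReduce on data stored in Redis. Redisson is one of the most popular Redis clients for Java and offers a wide range of options for programming and data processing. It provides distributed implementations of services, objects, collections, locks, and synchronizers, and it supports a range of Redis configurations, including single-node, cluster, sentinel, and master-slave setups.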

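If you already use Redisson to store large amounts of data in Redis, MapReduce is a good choice. Redisson provides a Java-based MapReduce programming model that makes it easy to process large amounts of data stored in Redis.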

For more on using Redisson, see:

      https://blog.csdn.net/u011663149/article/details/87877522
      https://blog.csdn.net/u011663149/article/details/87875421
      https://blog.csdn.net/u011663149/article/details/85297483