MapReduce distributed parallel programming exercises

1. Purpose

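1. Understand the basic concepts and principles of MapReduce distributed parallel programming.
2. Master the MapReduce execution flow and the shuffle process.
3. Understand the design of the WordCount word-frequency example.
4. Learn to apply the MapReduce distributed parallel programming model to solve practical data-processing problems.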

2. Content

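Process the raw data below: collect all the phone numbers that called the same public service number, and display each public service number together with the user numbers that called it.
Raw data: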
13718855152  112
18610117315  110
89451849  112
13718855153  110
13718855154  112
18610117315  114
18910117315  114
Output:
110  13718855153|18610117315|
112  13718855154|89451849|13718855152|
114  18910117315|18610117315|

3. Process

1. Create the TellMapper class
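A minimal sketch of the map step, written in plain Java so it runs without a Hadoop installation. The class name `TellMapperSketch` and its `map` helper are hypothetical stand-ins, not the original TellMapper source. The assumption is that each input line holds a caller number and a service number separated by whitespace, and that the mapper swaps them so the service number becomes the key. In a real Hadoop job, the same logic would typically sit inside the `map` method of a `Mapper` subclass and emit the pair with `context.write`.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for TellMapper: plain Java, no Hadoop dependency.
class TellMapperSketch {

    // Turns one input line "<caller> <service>" into a (service, caller) pair.
    // Blank or malformed lines yield Optional.empty() so the caller can skip them.
    static Optional<Map.Entry<String, String>> map(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 2) {
            return Optional.empty();
        }
        String caller = fields[0];
        String service = fields[1];
        // The service number is the key, so the shuffle groups callers by it.
        Map.Entry<String, String> pair = new SimpleImmutableEntry<>(service, caller);
        return Optional.of(pair);
    }

    public static void main(String[] args) {
        // Prints the key and value for one sample record, separated by a tab.
        map("13718855152  112")
            .ifPresent(e -> System.out.println(e.getKey() + "\t" + e.getValue()));
    }
}
```

Swapping the fields is the whole point of this step: because MapReduce groups records by key during the shuffle, every caller of `112` ends up in the same reduce call.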

2. Create the TellRedcer class
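A minimal sketch of the reduce step, again in plain Java. The class name `TellReducerSketch` and its `reduce` helper are hypothetical and do not reproduce the original reducer source. It assumes the reducer receives one service number plus all caller numbers grouped under it, and joins them with a trailing `|` after each number, as in the expected output. The tab between key and value is also an assumption, chosen because it is the default separator of Hadoop's `TextOutputFormat`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the reducer class: plain Java, no Hadoop dependency.
class TellReducerSketch {

    // Joins all callers of one service number into "service<TAB>caller1|caller2|".
    static String reduce(String service, Iterable<String> callers) {
        StringBuilder joined = new StringBuilder();
        for (String caller : callers) {
            // Each number is followed by '|', matching the expected output format.
            joined.append(caller).append('|');
        }
        return service + "\t" + joined;
    }

    public static void main(String[] args) {
        List<String> callers = Arrays.asList("13718855153", "18610117315");
        System.out.println(reduce("110", callers));
    }
}
```

In a real job the values arrive as an `Iterable<Text>`, and Hadoop does not guarantee their order within a key, so the callers on one output line may appear in a different order from run to run.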

3. Create the TellCount class and run its main method to test the job
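To see how the three phases fit together, the following sketch simulates the whole job in memory: map, shuffle (grouping and sorting by key), and reduce. It is not the original TellCount driver. The class name `TellCountSketch` and the `run` method are hypothetical, and a `TreeMap` stands in for the shuffle. A real driver would instead configure a Hadoop `Job` (mapper class, reducer class, output key and value types, input and output paths) and call `waitForCompletion`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of the job: plain Java, no Hadoop dependency.
class TellCountSketch {

    static List<String> run(List<String> lines) {
        // Map + shuffle: emit (service, caller) and group callers by service.
        // TreeMap keeps the keys sorted, as the shuffle does before the reduce phase.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length != 2) {
                continue; // skip blank or malformed lines
            }
            grouped.computeIfAbsent(fields[1], k -> new ArrayList<>()).add(fields[0]);
        }

        // Reduce: join the callers of each service number with a trailing '|'.
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> group : grouped.entrySet()) {
            StringBuilder joined = new StringBuilder();
            for (String caller : group.getValue()) {
                joined.append(caller).append('|');
            }
            output.add(group.getKey() + "\t" + joined);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "13718855152  112",
            "18610117315  110",
            "89451849  112",
            "13718855153  110",
            "13718855154  112",
            "18610117315  114",
            "18910117315  114");
        run(input).forEach(System.out::println);
    }
}
```

This sketch keeps callers in input order, so the numbers on each line can appear in a different order than in the expected output above. The grouping itself is the same: one line per service number, listing every caller.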

4. Open a terminal and view the contents of the output files generated by the job.

Origin blog.csdn.net/m0_63599362/article/details/131984297