Hadoop7days-4: Implementing an Inverted Index with MapReduce

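Implementing an inverted index here means taking words that appear in different files and counting how many times each word occurs in each file. The result should take the form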

"hello","a.txt->3,b.txt->2,c.txt->2"
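To pin down what the finished job should produce, here is a minimal plain-Java sketch that computes the same word -> (file -> count) table directly in memory, with no Hadoop involved. The class name `InvertedIndexOracle`, the whitespace tokenization and the sample file contents are assumptions made for illustration. A small reference like this can be useful for checking a job's output on tiny inputs.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory reference (not Hadoop code): builds word -> (file -> count)
// and formats each word's entry as "file->count" pairs joined by commas.
public class InvertedIndexOracle {

    // files: file name -> file contents; words are assumed to be whitespace-separated.
    public static Map<String, String> buildIndex(Map<String, String> files) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            for (String word : file.getValue().split("\\s+")) {
                if (word.isEmpty()) continue; // leading whitespace yields an empty token
                counts.computeIfAbsent(word, w -> new TreeMap<>())
                      .merge(file.getKey(), 1, Integer::sum);
            }
        }
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<String, Integer> fc : e.getValue().entrySet()) {
                if (sb.length() > 0) sb.append(',');
                sb.append(fc.getKey()).append("->").append(fc.getValue());
            }
            result.put(e.getKey(), sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> files = new TreeMap<>();
        files.put("a.txt", "hello world hello hello");
        files.put("b.txt", "hello tom hello");
        files.put("c.txt", "hello jerry\nhello");
        System.out.println(buildIndex(files).get("hello")); // a.txt->3,b.txt->2,c.txt->2
    }
}
```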

To achieve this, we need several mapper and reducer classes. We can work backwards from the desired output to decide what each mapper and reducer must do. The steps are:

Mapper output:
context.write("hello->a.txt","1");
context.write("hello->a.txt","1");
context.write("hello->a.txt","1");


After the shuffle this becomes:
<"hello->a.txt", {1,1,1}>
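The shuffle step can be imitated in plain Java as a sort-and-group over the emitted pairs. This is a minimal sketch under stated assumptions. `ShuffleSketch` and `shuffle` are hypothetical names. A `TreeMap` stands in for Hadoop's sort by key, and the sketch does not model partitioning across multiple reducers.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical imitation of the shuffle phase (not Hadoop code): pairs emitted by the
// mappers are sorted by key, and all values for the same key are collected into one list.
public class ShuffleSketch {

    public static Map<String, List<String>> shuffle(List<Map.Entry<String, String>> emitted) {
        Map<String, List<String>> grouped = new TreeMap<>(); // keys kept sorted, roughly like Hadoop's sort
        for (Map.Entry<String, String> kv : emitted) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> emitted = new ArrayList<>();
        // the three writes for a.txt from the walkthrough, plus one from b.txt
        for (int i = 0; i < 3; i++) emitted.add(new SimpleEntry<>("hello->a.txt", "1"));
        emitted.add(new SimpleEntry<>("hello->b.txt", "1"));
        System.out.println(shuffle(emitted)); // {hello->a.txt=[1, 1, 1], hello->b.txt=[1]}
    }
}
```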
------------------------------reducer
Reducer input:
<"hello->a.txt", {1,1,1}>


The reducer output should be:
"hello","a.txt->3"
"hello","b.txt->2"
"hello","c.txt->2"
------------------------------mapper
The mapper input should be:
"hello","a.txt->3"
"hello","b.txt->2"
"hello","c.txt->2"

The mapper output should be:
context.write("hello","a.txt->3");
context.write("hello","b.txt->2");
context.write("hello","c.txt->2");
After the shuffle this becomes:

<"hello", {"a.txt->3","b.txt->2","c.txt->2"}>
-----------------------------final reducer
The reducer input should be:
<"hello", {"a.txt->3","b.txt->2","c.txt->2"}>

The reducer output should be:

context.write("hello","a.txt->3 b.txt->2 c.txt->2");

Now let's move on to the actual design.

The first map should turn each file into pairs of the form "word->name","1", where name is the file name.
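Here is a minimal plain-Java sketch of this first map step. It assumes words are separated by whitespace, and it takes the file name as a parameter. In a real new-API (`org.apache.hadoop.mapreduce`) mapper, the file name is commonly read with `((FileSplit) context.getInputSplit()).getPath().getName()`. `IndexMapperSketch.map` is a hypothetical helper that returns the pairs instead of calling `context.write`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one call of the first map() (not Hadoop code).
// The file name is passed in explicitly instead of being read from the input split.
public class IndexMapperSketch {

    // Emits ("word->fileName", "1") for every whitespace-separated word in the line.
    public static List<Map.Entry<String, String>> map(String fileName, String line) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (word.isEmpty()) continue; // a blank line splits into a single empty token
            out.add(new SimpleEntry<>(word + "->" + fileName, "1"));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(map("a.txt", "hello world hello"));
        // [hello->a.txt=1, world->a.txt=1, hello->a.txt=1]
    }
}
```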


The first reducer should turn "word->name","1" into "word","name->count". Instead of writing a separate reducer for this, we add a combiner and let the combiner do the job.
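The combiner logic can be sketched in plain Java as follows. `IndexCombinerSketch.combine` is a hypothetical helper, not Hadoop's `Reducer` API, and the sketch assumes "->" never appears inside a word. Keep one caveat in mind: Hadoop treats a combiner as an optional optimization that may run zero, one or several times. A design that relies on the combiner to rewrite keys therefore works reliably only when the combiner runs exactly once over all of a file's records, for example with small files that each fit in one map task.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java version of the combiner step (not Hadoop's Reducer API):
// ("hello->a.txt", ["1","1","1"]) becomes ("hello", "a.txt->3").
public class IndexCombinerSketch {

    public static Map.Entry<String, String> combine(String key, List<String> values) {
        int sep = key.indexOf("->"); // split "word->file" at the first "->"
        if (sep < 0) {
            // happens if this logic is applied to an already-combined key such as "hello"
            throw new IllegalArgumentException("expected a word->file key: " + key);
        }
        String word = key.substring(0, sep);
        String file = key.substring(sep + 2);
        long sum = 0;
        for (String v : values) sum += Long.parseLong(v); // add up the "1"s
        return new SimpleEntry<>(word, file + "->" + sum);
    }

    public static void main(String[] args) {
        System.out.println(combine("hello->a.txt", Arrays.asList("1", "1", "1")));
        // hello=a.txt->3
    }
}
```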


Reducer:
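The final reducer only has to concatenate the "file->count" values it receives for a word. Below is a minimal plain-Java sketch. `IndexReducerSketch` is a hypothetical class, not Hadoop's `Reducer` API, and the tab separator in `outputLine` mirrors the default key/value separator of Hadoop's `TextOutputFormat`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java version of the final reducer (not Hadoop's Reducer API).
public class IndexReducerSketch {

    // Joins every "file->count" value for one word with single spaces.
    public static String reduce(Iterable<String> values) {
        return String.join(" ", values);
    }

    // Formats the result like TextOutputFormat does by default: key, tab, value.
    public static String outputLine(String word, Iterable<String> values) {
        return word + "\t" + reduce(values);
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a.txt->3", "b.txt->2", "c.txt->2");
        System.out.println(outputLine("hello", values)); // hello<TAB>a.txt->3 b.txt->2 c.txt->2
    }
}
```

By default, Hadoop does not sort the values passed to a reducer, so the order of files within each output line may differ from run to run.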




Reposted from blog.csdn.net/qq_22772465/article/details/80106891