Running C++ with Hadoop Streaming

Reference article:

https://blog.csdn.net/huangmeng1214/article/details/11731531

The full text follows:

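Hadoop Streaming is a utility that ships with Hadoop and lets users write MapReduce programs in languages other than Java. The user only supplies a Mapper and a Reducer (any executables that read from standard input and write to standard output), and Hadoop runs them as a Map/Reduce job.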

1. Taking WordCount as an example, write the Mapper and Reducer in C++.

Mapper.cpp:

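```cpp
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string key;
    const int value = 1;

    // Read whitespace-separated words from stdin and emit "word<TAB>1".
    // By default Hadoop Streaming treats the text before the first tab
    // as the key and the rest of the line as the value.
    while (cin >> key)
    {
        cout << key << '\t' << value << '\n';
    }

    return 0;
}
```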

Reducer.cpp:

```cpp
#include <iostream>
#include <map>
#include <string>
using namespace std;

int main()
{
    string key;
    int value;
    map<string, int> result;   // word -> total count, kept in sorted key order

    // Each input line is "word<TAB>count"; operator>> skips the separator.
    while (cin >> key >> value)
    {
        // Add the incoming count rather than incrementing by 1, so the
        // total stays correct even when a value is greater than 1.
        result[key] += value;
    }

    for (map<string, int>::const_iterator it = result.begin(); it != result.end(); ++it)
    {
        cout << it->first << '\t' << it->second << '\n';
    }

    return 0;
}
```

2. Compile them into the executables Mapper and Reducer:

```shell
g++ Mapper.cpp -o Mapper
g++ Reducer.cpp -o Reducer
```

3. Write a script named runJob.sh:

```shell
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-1.1.2.jar \
    -mapper Mapper \
    -reducer Reducer \
    -input /test/input/a.txt \
    -output /test/output/test3 \
    -file Mapper \
    -file Reducer
```

-input is the location of the job's input file in HDFS.

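-output is the HDFS directory where the job writes its results. It must not exist before the job runs; otherwise the job fails.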

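-file names the local Mapper and Reducer executables, which are packaged with the job and shipped to the task nodes. Without -file, the -mapper and -reducer commands may fail because the executables are not present on those nodes.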

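In addition, -jobconf can be used to set MapReduce job parameters such as the number of map and reduce tasks; see the official Hadoop Streaming documentation. In newer Hadoop versions -jobconf is deprecated in favor of the generic -D option.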

4. Run sh runJob.sh; the MapReduce job should complete normally.

The result is the same as that of the wordcount example in hadoop-examples-1.1.2.jar.

--------------------------------分割线-----------------------------------------

Since I deployed Hadoop by following my earlier blog post, my runJob.sh is slightly different:

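```shell
/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.9.0.jar \
    -file ~/testc/Mapper \
    -file ~/testc/Reducer \
    -input /user/hadoop/input/a.txt \
    -output /user/hadoop/output/test3 \
    -mapper Mapper \
    -reducer Reducer
```

Here ~/testc is the local directory where I compiled the Mapper and Reducer executables, and the HDFS input and output paths live under /user/hadoop. In Hadoop 2.x, -file still works but prints a deprecation warning recommending the generic option -files instead.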
