Writing a Mapper with IDEA and running it on Hadoop (1)

 

 

 

 

1. Right-click the project -> Open Module Settings

 

 

 

2. Artifacts -> + -> JAR -> From modules with dependencies...

 

 

3. For Main Class, choose the class in your project that contains the main method you want to run.

 

 

4. Set the directory for META-INF/MANIFEST.MF.

Remember: do not keep the default location (at least, it did not work for me when I used the default).

 

5. Select your project's root directory; the manifest must be placed in the root directory.

 

 

6. With that set, there are two options under "JAR files from libraries":

Choose the first, and everything is packed into a single JAR.

Choose the second, and the packaged JAR contains only your own code, with the JARs the project depends on kept outside it. I personally recommend the second.

 

 

7. When the settings are done, click OK.

8. On this page, tick "Build on make"; it is fine if the other settings differ.

 

 

9. As the final step: Build Artifacts... -> XXX.jar -> Build

 

10. Use Copy Path here and go to that location to find the JAR.
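
Before copying the JAR to the cluster, it can help to confirm that the manifest really records the main class chosen in step 3. The sketch below is one possible check and not part of the original setup: the helper name manifest_main_class is made up for illustration, and the usage line assumes that unzip is installed and that the JAR sits at /root/hadoop.jar, the path used by the script in step 11.

```shell
# Hypothetical helper: reads MANIFEST.MF text on stdin and prints the
# value of the Main-Class attribute (nothing if the attribute is absent).
manifest_main_class() {
    # Manifests may use CRLF line endings, so strip a trailing \r.
    awk -F': ' '$1 == "Main-Class" { sub(/\r$/, "", $2); print $2 }'
}

# Example usage (assumes unzip is installed and the JAR is at /root/hadoop.jar):
#   unzip -p /root/hadoop.jar META-INF/MANIFEST.MF | manifest_main_class
```

If nothing is printed, no main class was recorded, and the class name has to be given on the hadoop jar command line instead.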

 

11. Write the startup shell script

vi data-clean.sh

Insert the following text:

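#!/bin/bash
# Current date minus one day: process the previous day's data
day_str=$(date -d '-1 day' +'%Y-%m-%d')

# Input path
inpath=/app-log-data/data/$day_str
# Output path
outpath=/app-log-data/clean/$day_str

echo "Preparing to clean the data for $day_str ......"

# Run the mapper to clean the data
# If no main class was specified when building the JAR, add the fully qualified main class name after the JAR path below
/home/hadoop/apps/hadoop-2.7.5/bin/hadoop jar /root/hadoop.jar "$inpath" "$outpath"

# Check whether the previous command succeeded; a non-zero exit status means it failed
if [[ $? -ne 0 ]]; then
    echo "failed"
else
    echo "succeed"
fi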
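
The remark about the main class can be made concrete. The following sketch only prints the command line that would be run, so the two variants can be compared side by side. build_clean_cmd and com.example.DataClean are hypothetical names used for illustration; the Hadoop path, the JAR path, and the HDFS directories are the ones from the script above.

```shell
# Values taken from data-clean.sh
HADOOP_BIN=/home/hadoop/apps/hadoop-2.7.5/bin/hadoop
JAR_PATH=/root/hadoop.jar

# Hypothetical helper: prints the hadoop jar command for one day.
#   $1 = day string (YYYY-MM-DD)
#   $2 = optional main class, needed only if the manifest has no Main-Class
build_clean_cmd() {
    day_str=$1
    main_class=${2:-}
    cmd="$HADOOP_BIN jar $JAR_PATH"
    if [ -n "$main_class" ]; then
        cmd="$cmd $main_class"
    fi
    echo "$cmd /app-log-data/data/$day_str /app-log-data/clean/$day_str"
}

# Example usage (the class name is a placeholder, not the real one):
#   build_clean_cmd 2019-06-02
#   build_clean_cmd 2019-06-02 com.example.DataClean
```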

 

12. Upload the uncleaned data to HDFS

Create the directory for the uncleaned data:

hadoop fs -mkdir -p /app-log-data/data

 

Create the directory for the cleaned data:

hadoop fs -mkdir -p /app-log-data/clean
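
These two directories are only the parents: the script in step 11 reads from a dated subdirectory such as /app-log-data/data/2019-06-02. A minimal sketch of the upload itself might look like the following. The function name upload_day and the local file path in the usage line are assumptions for illustration; the HADOOP variable is added here only so that the commands can be previewed with HADOOP=echo.

```shell
# Hypothetical helper: upload one local log file into the dated input
# directory that data-clean.sh expects.
#   $1 = local file, $2 = day string (YYYY-MM-DD)
# Set HADOOP=echo to print the commands instead of running them.
upload_day() {
    local_file=$1
    day_str=$2
    target=/app-log-data/data/$day_str
    "${HADOOP:-hadoop}" fs -mkdir -p "$target" &&
        "${HADOOP:-hadoop}" fs -put "$local_file" "$target/"
}

# Example usage (the local path is a placeholder):
#   upload_day /tmp/access.log 2019-06-02
```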

 

 

 

13. Run the JAR

bash data-clean.sh

 

 

 

Error summary:

1. Connection refused on port 8032

Solution: (1) on hadoop4, run start-yarn.sh to start YARN; (2) on hadoop3, run yarn-daemon.sh start resourcemanager to start the standby ResourceManager.
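
A quick way to see whether this is the cause is to look for a ResourceManager process in the jps output on each of the two hosts. The sketch below assumes the JDK's jps tool is on the PATH; rm_running is a made-up helper name.

```shell
# Hypothetical helper: reads `jps` output on stdin and succeeds only if
# a process named ResourceManager is listed.
rm_running() {
    awk '$2 == "ResourceManager" { found = 1 } END { exit !found }'
}

# Example usage on hadoop4 or hadoop3:
#   jps | rm_running || echo "ResourceManager is not running on this host"
```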

 

2. A message says the input or output path does not exist

Solution: create the input and output directories on HDFS
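
The paths can also be checked before the job is submitted. The following is a sketch under two assumptions: that the job uses Hadoop's standard file output format, which refuses to write into an output directory that already exists, so only the parent /app-log-data/clean should be created in advance; and that check_paths, a hypothetical helper, is called with the same dated paths as data-clean.sh. The HADOOP variable is introduced here only to allow a dry run.

```shell
# Hypothetical pre-flight check for one run of the job.
#   $1 = dated input path, $2 = dated output path
check_paths() {
    inpath=$1
    outpath=$2
    # The dated input directory must exist before the job starts.
    if ! "${HADOOP:-hadoop}" fs -test -d "$inpath"; then
        echo "input path missing: $inpath"
        return 1
    fi
    # The dated output directory must not exist yet (assumed job behavior).
    if "${HADOOP:-hadoop}" fs -test -e "$outpath"; then
        echo "output path already exists: $outpath"
        return 1
    fi
    echo "paths ok"
}

# Example usage:
#   check_paths /app-log-data/data/2019-06-02 /app-log-data/clean/2019-06-02
```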

 

Origin www.cnblogs.com/h-kang/p/10966752.html