Oozie (2): Scheduling a simple MapReduce WordCount job with Oozie

I. Goal

Use Oozie to schedule and run a simple WordCount MapReduce job.

II. Steps

1. Create a directory under the Oozie root directory

mkdir oozie-apps

2. Extract the examples archive that ships in the Oozie root directory

tar -zxf oozie-examples.tar.gz

3. Copy the template you need into oozie-apps and rename it

cp -ra examples/apps/map-reduce/ oozie-apps/
cd oozie-apps/
mv map-reduce/ mr-wordcount
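
Steps 1 to 3 can also be collected into a single helper, sketched below. The function name prepare_app is hypothetical (it is not part of Oozie). The sketch assumes oozie-examples.tar.gz sits in the Oozie root, as in the steps above, and it refuses to run if oozie-apps/mr-wordcount already exists.

```shell
#!/bin/sh
# Hypothetical helper (not part of Oozie): performs steps 1-3 in one go.
# $1 = Oozie root directory containing oozie-examples.tar.gz.
prepare_app() {
    oozie_home=$1
    if [ -e "$oozie_home/oozie-apps/mr-wordcount" ]; then
        echo "oozie-apps/mr-wordcount already exists, not overwriting" >&2
        return 1
    fi
    (
        cd "$oozie_home" || exit 1
        mkdir -p oozie-apps &&
        tar -zxf oozie-examples.tar.gz &&
        # Copy the map-reduce template directly under its new name.
        cp -R examples/apps/map-reduce oozie-apps/mr-wordcount
    )
}

# Example call, using the install path that appears later in this article:
# prepare_app /opt/modules/oozie-4.1.0-cdh5.7.0
```

Copying straight to the new name replaces the separate cp and mv, and the existence check keeps a second run from nesting a copy inside the first one.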

4. Edit the two key files

(1) job.properties

Purpose: job.properties defines the property values that workflow.xml then references.

nameNode=hdfs://hadoop:8020
jobTracker=hadoop:8032
queueName=default
examplesRoot=oozie-apps

oozie.wf.application.path=${nameNode}/user/hadoop/${examplesRoot}/mr-wordcount/workflow.xml
outputDir=map-reduce
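
Oozie substitutes each ${...} reference with the value defined in job.properties. Shell happens to use the same syntax, so the resolved application path can be previewed with a few lines of shell. This is only an imitation of the substitution for a quick sanity check, not how Oozie itself evaluates the file.

```shell
#!/bin/sh
# Same values as in job.properties above; plain shell assignments are used
# here only to imitate Oozie's ${...} substitution.
nameNode=hdfs://hadoop:8020
examplesRoot=oozie-apps

app_path="${nameNode}/user/hadoop/${examplesRoot}/mr-wordcount/workflow.xml"
echo "$app_path"
# prints hdfs://hadoop:8020/user/hadoop/oozie-apps/mr-wordcount/workflow.xml
```

The printed path is where workflow.xml has to exist on HDFS once the application is uploaded in step 7.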

(2) Edit workflow.xml

<start to="mr-node"/>
<action name="mr-node">
	<map-reduce>
		<job-tracker>${jobTracker}</job-tracker>
		<name-node>${nameNode}</name-node>
		<prepare>
			<delete path="${nameNode}/user/hadoop/${examplesRoot}/mr-wordcount/output"/>
		</prepare>
		<configuration>
			<property>
				<name>mapred.job.queue.name</name>
				<value>${queueName}</value>
			</property>
			
			<!--New API-->
			<property>
				<name>mapred.mapper.new-api</name>
				<value>true</value>
			</property>
			<property>
				<name>mapred.reducer.new-api</name>
				<value>true</value>
			</property>
			

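			<!-- The property names and values below can be found by searching for these keywords in the Configuration page of the job history for the WordCount job run earlier. -->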
			
			<!--mapper class-->
			<property>
				<name>mapreduce.job.map.class</name>
				<value>com.bigdata.hadoop.WordCountMR$WordCountMapper</value>
			</property>
			
			<property>
				<name>mapreduce.map.output.key.class</name>
				<value>org.apache.hadoop.io.Text</value>
			</property>
			<property>
				<name>mapreduce.map.output.value.class</name>
				<value>org.apache.hadoop.io.IntWritable</value>
			</property>
		
			<!--reducer class-->
			<property>
				<name>mapreduce.job.reduce.class</name>
				<value>com.bigdata.hadoop.WordCountMR$WordCountReducer</value>
			</property>
			<property>
				<name>mapreduce.job.output.key.class</name>
				<value>org.apache.hadoop.io.Text</value>
			</property>
			<property>
				<name>mapreduce.job.output.value.class</name>
				<value>org.apache.hadoop.io.IntWritable</value>
			</property>
			
			<!--INPUT-->
			<property>
				<name>mapred.input.dir</name>
				<value>${nameNode}/user/hadoop/${examplesRoot}/mr-wordcount/input</value>
			</property>
			
			<!--OUTPUT-->
			<property>
				<name>mapred.output.dir</name>
				<value>${nameNode}/user/hadoop/${examplesRoot}/mr-wordcount/output</value>
			</property>
		</configuration>
	</map-reduce>
	<ok to="end"/>
	<error to="fail"/>
</action>
<kill name="fail">
	<message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
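
Every ${name} placeholder used in workflow.xml needs a matching entry in job.properties, otherwise the submission will typically fail when Oozie cannot resolve the variable. The sketch below lists placeholders that have no definition. The helper names list_wf_vars and missing_props are hypothetical, and the check only looks for simple name= lines.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Oozie).

# Print each distinct ${name} placeholder used in the file $1.
# EL function calls such as ${wf:errorMessage(...)} are not matched.
list_wf_vars() {
    grep -o '\${[A-Za-z_][A-Za-z0-9_]*}' "$1" | sort -u | sed 's/^\${\(.*\)}$/\1/'
}

# Print the placeholders of workflow $1 that have no "name=" line
# in the properties file $2.
missing_props() {
    for v in $(list_wf_vars "$1"); do
        grep -q "^$v=" "$2" || echo "$v"
    done
}

# Example call, from the mr-wordcount directory:
# missing_props workflow.xml job.properties
```

An empty result means every placeholder in the workflow is covered by job.properties.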

5. Add the newly built WordCount test jar

From /opt/modules/oozie-4.1.0-cdh5.7.0/oozie-apps/mr-wordcount, delete the original oozie-examples jar from lib (the exact file name depends on your Oozie version):

rm lib/oozie-examples-4.1.0-cdh5.7.0.jar

Copy in your own test jar:

cp /opt/datas/wc.jar lib/
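
The mapper and reducer class names configured in workflow.xml have to exist inside the jar placed in lib. A minimal check, assuming unzip is installed, is sketched below. The helper names class_to_entry and jar_has_class are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Oozie or Hadoop).

# Convert a Java class name to its entry path inside a jar,
# e.g. a.b.C$D becomes a/b/C$D.class.
class_to_entry() {
    printf '%s' "$1" | tr . /
    printf '.class\n'
}

# Succeed if jar $1 contains class $2. Assumes unzip is installed.
jar_has_class() {
    unzip -l "$1" | grep -qF "$(class_to_entry "$2")"
}

# Example call, from the mr-wordcount directory
# (single quotes keep the $ in the class name literal):
# jar_has_class lib/wc.jar 'com.bigdata.hadoop.WordCountMR$WordCountMapper' && echo found
```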

Note: the test jar can be downloaded from https://download.csdn.net/download/u010886217/10831358

6. Create the test data directory

In /opt/modules/oozie-4.1.0-cdh5.7.0/oozie-apps/mr-wordcount:
mkdir input

Copy the test data:

cp /opt/datas/wc.data /opt/modules/oozie-4.1.0-cdh5.7.0/oozie-apps/mr-wordcount/input/

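7. Upload to HDFS

This step is required: Oozie runs the application from the files on HDFS. If you change the configuration later, delete the files on HDFS first and then upload them again. The bin/hdfs commands below are run from the Hadoop installation directory.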

bin/hdfs dfs -mkdir -p /user/hadoop
bin/hdfs dfs -put /opt/modules/oozie-4.1.0-cdh5.7.0/oozie-apps/ /user/hadoop
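
Because every change requires deleting and re-uploading the HDFS copy, the delete and upload can be wrapped in one helper. The sketch below is one way to do it. The function name redeploy_apps and the HDFS_CMD variable are assumptions of this sketch, while the dfs subcommands (-mkdir -p, -rm -r -f, -put) are standard HDFS shell commands.

```shell
#!/bin/sh
# Hypothetical helper. HDFS_CMD is an assumption of this sketch: set it to the
# path of your hdfs launcher (bin/hdfs when run from the Hadoop root).
HDFS_CMD=${HDFS_CMD:-bin/hdfs}

# Replace the HDFS copy of local directory $1 under HDFS directory $2.
redeploy_apps() {
    local_dir=${1%/}                 # drop one trailing slash, if any
    hdfs_parent=$2
    name=$(basename "$local_dir")
    "$HDFS_CMD" dfs -mkdir -p "$hdfs_parent" &&
    # -f keeps the delete from failing when nothing was uploaded yet.
    "$HDFS_CMD" dfs -rm -r -f "$hdfs_parent/$name" &&
    "$HDFS_CMD" dfs -put "$local_dir" "$hdfs_parent"
}

# Example call, from the Hadoop root:
# redeploy_apps /opt/modules/oozie-4.1.0-cdh5.7.0/oozie-apps /user/hadoop
```

Note that this removes the whole uploaded oozie-apps directory on HDFS, including any previous output directory inside it.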

III. Run the test

1. Prerequisites: make sure the following services are running

(1) HDFS

(2) YARN


(3) JobHistoryServer (this must also be running; Oozie uses it to get feedback about the job)
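
A quick way to confirm these services are up on a single-node setup is to inspect the output of jps. The sketch below assumes all daemons run on the same host. The function name missing_daemons is hypothetical, and the daemon names are the process names that jps reports for a Hadoop 2 installation.

```shell
#!/bin/sh
# Hypothetical helper: reads `jps` output on stdin and prints every
# required daemon that is not listed.
missing_daemons() {
    running=$(cat)
    for d in NameNode DataNode ResourceManager NodeManager JobHistoryServer; do
        # -w matches whole words, so SecondaryNameNode does not count as NameNode.
        printf '%s\n' "$running" | grep -qw "$d" || echo "$d"
    done
}

# Example call:
# jps | missing_daemons
# On a typical Hadoop 2 layout the daemons are started from the Hadoop root with
# sbin/start-dfs.sh, sbin/start-yarn.sh and
# sbin/mr-jobhistory-daemon.sh start historyserver
```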

2. Start Oozie

bin/oozied.sh start

3. Submit the job

bin/oozie job -oozie http://hadoop:11000/oozie -config oozie-apps/mr-wordcount/job.properties -run
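
On success the submit command prints a line of the form "job: <id>". Capturing that ID makes it possible to query the job status from the command line with the -info option. The helper name extract_job_id below is hypothetical, and the example lines are commented out because they need a running Oozie server.

```shell
#!/bin/sh
# Hypothetical helper: extract the workflow ID from the "job: <id>" line
# that `oozie job ... -run` prints on success. Other lines are ignored.
extract_job_id() {
    sed -n 's/^job: *//p'
}

# Example use from the Oozie root (needs a running Oozie server):
# OOZIE_URL=http://hadoop:11000/oozie
# job_id=$(bin/oozie job -oozie "$OOZIE_URL" \
#     -config oozie-apps/mr-wordcount/job.properties -run | extract_job_id)
# bin/oozie job -oozie "$OOZIE_URL" -info "$job_id"
```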

4. Check the results

Open the Oozie web console:

http://hadoop:11000/oozie/

In the YARN web UI, two new applications appear: the Oozie launcher job runs first, followed by the MapReduce job from the jar.

http://hadoop:9090/cluster

Check the contents of the output directory: /user/hadoop/oozie-apps/mr-wordcount/output should now contain the results.
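
The output can also be inspected from the command line. The sketch below assumes the job ran with default settings, in which case a successful MapReduce job leaves a _SUCCESS marker in the output directory and the reducers write files named part-r-*. The function name show_output and the HDFS_CMD variable are assumptions of this sketch.

```shell
#!/bin/sh
# Hypothetical helper. HDFS_CMD is an assumption of this sketch: set it to the
# path of your hdfs launcher (bin/hdfs when run from the Hadoop root).
HDFS_CMD=${HDFS_CMD:-bin/hdfs}

# Print the reducer output in HDFS directory $1, or fail if the job
# did not leave a _SUCCESS marker there.
show_output() {
    out_dir=$1
    if ! "$HDFS_CMD" dfs -test -e "$out_dir/_SUCCESS"; then
        echo "no _SUCCESS marker in $out_dir" >&2
        return 1
    fi
    # The glob is quoted so that HDFS expands it, not the local shell.
    "$HDFS_CMD" dfs -cat "$out_dir/part-r-*"
}

# Example call, from the Hadoop root:
# show_output /user/hadoop/oozie-apps/mr-wordcount/output
```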


Reposted from blog.csdn.net/u010886217/article/details/84846199