Oozie (2): Exporting a Hive Table to HDFS

1. Oozie Overview

Apache Oozie is a top-level Apache project.

Oozie is the task-scheduling member of the four big-data collaboration frameworks; the other three are the data-transfer tool Sqoop, the log-collection framework Flume, and the big-data web UI Hue.

It schedules and coordinates Hadoop MapReduce jobs, Spark (Streaming) jobs, Hive jobs, and so on, managing the actions as a directed acyclic graph (DAG).

2. Oozie's Three Functional Modules

1. Workflow: defines how job tasks are executed.

2. Coordinator: triggers workflows periodically, i.e. runs recurring workflows.

3. Bundle: binds multiple Coordinators so that they can be submitted or triggered together.
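As a hedged sketch of the Coordinator module (the app name, dates, and frequency below are hypothetical, and the `app-path` assumes the workflow application used later in this post), a Coordinator that triggers a workflow once a day might look like:

```xml
<!-- coordinator.xml: a hypothetical daily trigger for a deployed workflow -->
<coordinator-app name="daily_demo_coord" frequency="${coord:days(1)}"
                 start="2020-04-01T00:00Z" end="2020-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <!-- HDFS directory that contains workflow.xml -->
            <app-path>${nameNode}/events/demo</app-path>
        </workflow>
    </action>
</coordinator-app>
```

When a coordinator is used, `oozie.coord.application.path` replaces `oozie.wf.application.path` in job.properties.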

Oozie defines control flow nodes (Control Flow Nodes) and action nodes (Action Nodes). Control flow nodes define the beginning and end of a workflow and control its execution path (Execution Path); examples are start, end, kill, fork, join, and decision. Action nodes perform the actual work; examples are map-reduce, pig, hive, ssh, java, email, and sub-workflow.

In essence, Oozie is a job-coordination tool: under the hood it translates the XML workflow definition into a MapReduce program, but one that does all its work on the map side, so the shuffle phase is avoided.
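To illustrate the control flow nodes listed above, a hedged fragment (the node and action names here are hypothetical, not part of the workflow built later) using fork and join to run two actions in parallel might look like:

```xml
<!-- fork splits the execution path; join waits for all paths to finish -->
<fork name="parallel-load">
    <path start="load-users"/>
    <path start="load-events"/>
</fork>
<!-- ...the two actions each transition to the join on success... -->
<join name="merge-results" to="next-step"/>
```

A decision node plays the analogous role for branching, choosing one path based on an EL predicate instead of running all paths.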

3. Workflow Configuration Files

1. job.properties: defines the parameters and properties of the job.

2. workflow.xml: defines the control flow nodes and action nodes.

3. lib: a directory holding the JAR files and other resources the job needs at runtime.
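Putting the three together, a typical application directory on HDFS might look like the sketch below (the layout is illustrative; the jar name is a placeholder):

```
/events/demo/                  # oozie.wf.application.path
├── workflow.xml               # control flow and action nodes
├── lib/
│   └── some-udf.jar           # jars needed by the job at runtime
└── ...                        # scripts, conf, etc.

job.properties                 # kept locally; passed to the CLI at submit time
```

Note that job.properties is read by the Oozie client on submission and does not need to be uploaded to HDFS.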

4. Case: Exporting a Hive Table to HDFS

4.1 File preparation

1. job.properties

nameNode=hdfs://sandbox-hdp.hortonworks.com:8020
jobTracker=sandbox-hdp.hortonworks.com:8032
queueName=default
#Set Oozie environment
oozie.wf.application.path=${nameNode}/events/demo
oozie.use.system.libpath=true
targetDir=/events/demo/users

2. workflow.xml

<workflow-app name="hive_oozie_demo" xmlns="uri:oozie:workflow:0.5">
    <start to="run"/>

    <action name="run">
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <jdbc-url>jdbc:hive2://localhost:10000/default</jdbc-url>
            <script>hql/demo.hql</script>
        </hive2>
        <ok to="export"/>
        <error to="fail"/>
    </action>

    <action name="export">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${targetDir}"/>
            </prepare>
            <job-xml>conf/hive-site.xml</job-xml>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <script>hql/export.hql</script>
            <param>outputDir=${targetDir}</param>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>ETL task failed, the error message is [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>

    <end name="end"/>
</workflow-app>

3. export.hql

EXPORT TABLE demo.users TO '${outputDir}';
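The first action in workflow.xml runs hql/demo.hql, which is not shown in this post. A minimal hypothetical version, assuming its only job is to make sure the table to be exported exists, could be:

```sql
-- demo.hql (hypothetical): ensure the table that export.hql will export exists
CREATE DATABASE IF NOT EXISTS demo;
CREATE TABLE IF NOT EXISTS demo.users (
    user_id   INT,
    user_name STRING
);
```

The column definitions here are placeholders; any HiveQL the pipeline needs to run before the export could live in this script.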

4.2 Running the Oozie job

1. Create the corresponding directories on HDFS and upload the deployment files:

hdfs dfs -mkdir -p /events/demo/hql
hdfs dfs -put demo.hql export.hql /events/demo/hql
hdfs dfs -put workflow.xml /events/demo
hdfs dfs -mkdir -p /events/demo/conf
hdfs dfs -put hive-site.xml /events/demo/conf

2. Submit the Oozie job:

oozie job -oozie http://sandbox-hdp.hortonworks.com:11000/oozie -config ./job.properties -run
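On success, the `-run` command prints a workflow job ID; its progress can then be inspected with `-info` (the job ID below is a placeholder, not a real ID from this run):

```shell
# Replace the ID with the one printed by the -run command above
oozie job -oozie http://sandbox-hdp.hortonworks.com:11000/oozie \
    -info 0000001-200401000000000-oozie-oozi-W
```

The output lists each node of the workflow with its status (PREP, RUNNING, SUCCEEDED, KILLED), which is the quickest way to see which action failed.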

3. Verify the result:

hdfs dfs -ls /events/demo
hdfs dfs -ls /events/demo/users/data

5. Common Oozie Shell Commands

# List Oozie jobs
oozie jobs
# Check the status of a specific Oozie job
oozie job -info <job-id>
# View the log of a specific Oozie job
oozie job -log <job-id>
# Kill a specific Oozie job
oozie job -kill <job-id>

Origin: blog.csdn.net/weixin_45568892/article/details/105159781