Oozie (2): Exporting a Hive Table to HDFS
1. Oozie Overview
Apache Oozie is an Apache top-level project.
Oozie is the task-scheduling framework among the four major Hadoop collaborative frameworks; the other three are Sqoop (data transfer), Flume (log/data collection), and Hue (big data web UI).
It schedules and coordinates tasks such as Hadoop MapReduce jobs, Spark (Streaming) jobs, and Hive jobs; the jobs it manages are organized as directed acyclic graphs (DAGs).
2. Oozie's Three Functional Modules
1. Workflow: defines the execution of job tasks.
2. Coordinator: triggers workflows periodically, so they run on a schedule.
3. Bundle: binds multiple coordinators together so they can be submitted or triggered as a group.
Oozie defines control flow nodes (Control Flow Nodes) and action nodes (Action Nodes). Control flow nodes define the beginning and end of a workflow and control its execution path (Execution Path); examples are start, end, kill, fork, join, and decision. Action nodes perform the actual work; examples are map-reduce, pig, hive, ssh, java, email, and sub-workflow.
Essentially, Oozie is a workflow-coordination tool: under the hood it translates the XML workflow definition into a MapReduce launcher job (map-only, so there is no shuffle phase).
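To make the node types concrete, here is a minimal, hypothetical workflow skeleton (not part of the case study below) that combines fork/join control flow nodes with two filesystem action nodes running in parallel:

```xml
<workflow-app name="fork_join_demo" xmlns="uri:oozie:workflow:0.5">
    <start to="split"/>
    <!-- control flow node: fork starts both branches in parallel -->
    <fork name="split">
        <path start="taskA"/>
        <path start="taskB"/>
    </fork>
    <!-- action nodes: built-in fs actions, one per branch -->
    <action name="taskA">
        <fs><mkdir path="${nameNode}/tmp/a"/></fs>
        <ok to="merge"/>
        <error to="fail"/>
    </action>
    <action name="taskB">
        <fs><mkdir path="${nameNode}/tmp/b"/></fs>
        <ok to="merge"/>
        <error to="fail"/>
    </action>
    <!-- control flow node: join waits for every forked branch to finish -->
    <join name="merge" to="end"/>
    <kill name="fail">
        <message>[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```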
3. Workflow Configuration
1. job.properties: defines the parameters and properties of the job.
2. workflow.xml: defines the control flow nodes and action nodes.
3. lib: directory holding the files (jar packages) the job needs at runtime.
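Putting these pieces together, and assuming the application path used in the case study below, the deployment layout typically looks like this (note that job.properties is usually kept locally and passed at submit time rather than uploaded to HDFS):

```
/events/demo/                     <- oozie.wf.application.path on HDFS
├── workflow.xml                  <- control flow and action nodes
├── hql/                          <- HiveQL scripts referenced by actions
├── conf/                         <- hive-site.xml and other job-xml files
└── lib/                          <- jar dependencies, picked up automatically
job.properties                    <- local file, passed with -config
```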
4. Case Study: Exporting a Hive Table to HDFS
4.1 File preparation
1. job.properties
nameNode=hdfs://sandbox-hdp.hortonworks.com:8020
jobTracker=sandbox-hdp.hortonworks.com:8032
queueName=default
#Set Oozie environment
oozie.wf.application.path=${nameNode}/events/demo
oozie.use.system.libpath=true
targetDir=/events/demo/users
2. workflow.xml
<workflow-app name="hive_oozie_demo" xmlns="uri:oozie:workflow:0.5">
    <start to="run"/>
    <action name="run">
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <jdbc-url>jdbc:hive2://localhost:10000/default</jdbc-url>
            <script>hql/demo.hql</script>
        </hive2>
        <ok to="export"/>
        <error to="fail"/>
    </action>
    <action name="export">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${targetDir}"/>
            </prepare>
            <job-xml>conf/hive-site.xml</job-xml>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <script>hql/export.hql</script>
            <param>outputDir=${targetDir}</param>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>ETL task failed, the error message is [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
3. export.hql
EXPORT TABLE DEMO.users to '${outputDir}';
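Hive's EXPORT TABLE writes both the table metadata (_metadata) and its data files (data/) to the target directory, which is why the verification step below lists /events/demo/users/data. The first action in workflow.xml also runs hql/demo.hql, which is not shown in the original post; as a hypothetical placeholder, it could be any preparatory statement, for example:

```sql
-- hypothetical contents of demo.hql: ensure the source table exists
CREATE DATABASE IF NOT EXISTS demo;
CREATE TABLE IF NOT EXISTS demo.users (id INT, name STRING);
```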
4.2 Running the Oozie job
1. Create the corresponding directories on HDFS and upload the deployment files
hdfs dfs -mkdir -p /events/demo/hql /events/demo/conf
hdfs dfs -put export.hql /events/demo/hql
hdfs dfs -put demo.hql /events/demo/hql
hdfs dfs -put workflow.xml /events/demo
hdfs dfs -put hive-site.xml /events/demo/conf
2. Submit the Oozie job
oozie job -oozie http://sandbox-hdp.hortonworks.com:11000/oozie -config ./job.properties -run
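A successful submission prints the new workflow job id. Its status and the state of each action node can then be checked with the -info option:

```shell
# show the job's status and each action's state/transition
oozie job -oozie http://sandbox-hdp.hortonworks.com:11000/oozie -info <job-id>
```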
3. Verify the result
hdfs dfs -ls /events/demo
hdfs dfs -ls /events/demo/users/data
5. Commonly Used Oozie Shell Commands
# list Oozie jobs
oozie jobs
# kill a specific Oozie job
oozie job -kill <job-id>
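A few more everyday commands are worth knowing (if the OOZIE_URL environment variable is set, the -oozie option can be omitted everywhere):

```shell
# fetch the job's log
oozie job -log <job-id>
# rerun a workflow, skipping nodes that already succeeded
oozie job -rerun <job-id> -Doozie.wf.rerun.failnodes=true
# suspend and resume a running job
oozie job -suspend <job-id>
oozie job -resume <job-id>
```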