Oozie (1): basic concepts and a case of writing HBase table data into Hive

1. About Oozie

Oozie is a top-level project of the Apache Software Foundation.
Oozie is the task scheduling framework among the four major big data collaboration frameworks; the other three are the data transfer tool Sqoop, the log collection framework Flume, and the big data web tool Hue.
It provides scheduling and coordination for Hadoop MapReduce jobs, Spark (Streaming) jobs, Hive jobs and so on, and the managed actions are organized as a directed acyclic graph (DAG).

2. The three functional modules of Oozie

1. Workflow: defines the job tasks to be executed.

2. Coordinator: triggers workflows periodically, i.e. runs a workflow on a recurring schedule (a minimal Coordinator sketch appears at the end of this section).

3. Bundle Job: binds multiple Coordinators together so that they can be submitted or triggered as a group.

Oozie defines control flow nodes (Control Flow Nodes) and action nodes (Action Nodes). Control flow nodes define the beginning and end of a workflow and control its execution path (Execution Path), e.g. start, kill, end, fork, join, decision and so on; action nodes include map-reduce, pig, hive, ssh, java, email, sub-workflow and so on.
Oozie is essentially a job coordination tool (under the hood it turns the XML definition into a MapReduce program to run; the launcher only does map-side processing, which avoids the shuffle phase).
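
To make the Coordinator module from the list above concrete, here is a minimal sketch of a coordinator.xml that would run the workflow from this article once a day. The name, start/end times and frequency are illustrative assumptions; such a coordinator is submitted by pointing oozie.coord.application.path (instead of oozie.wf.application.path) at its directory.

<!-- Minimal sketch: trigger the /events/demo workflow once a day (illustrative values) -->
<coordinator-app name="demo_coord" frequency="${coord:days(1)}"
                 start="2020-03-20T00:00Z" end="2020-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <!-- HDFS directory that holds workflow.xml -->
            <app-path>${nameNode}/events/demo</app-path>
        </workflow>
    </action>
</coordinator-app>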

3. Workflow configuration

1. job.properties: defines the parameters and related properties of the job.

2. workflow.xml: defines the control flow nodes and action nodes.

3. lib: holds the files (jar packages) that the job tasks need at run time (see the layout sketch after this list).
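
For reference, a typical layout for this case might look like the sketch below. Only workflow.xml and the HQL script need to live in the HDFS application directory; job.properties stays on the client and is passed to the oozie CLI, and lib/ can be left out here because oozie.use.system.libpath=true pulls the Hive action jars from the Oozie sharelib. The paths follow the files shown in the next section.

# HDFS application directory (oozie.wf.application.path)
/events/demo/workflow.xml      # control flow + action nodes
/events/demo/hql/demo.hql      # script referenced by the hive2 action
/events/demo/lib/              # optional: extra jars for the actions

# Local client side
./job.properties               # passed to the oozie CLI via -config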

4. Case: writing HBase table data into Hive

4.1 File preparation

1. job.properties

nameNode=hdfs://sandbox-hdp.hortonworks.com:8020
jobTracker=sandbox-hdp.hortonworks.com:8032
queueName=default

#Set Oozie environment
oozie.wf.application.path=${nameNode}/events/demo

oozie.use.system.libpath=true

2. workflow.xml

<workflow-app name="hive_oozie_demo" xmlns="uri:oozie:workflow:0.5">
    <start to="run"/>
    <action name="run">
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <jdbc-url>jdbc:hive2://localhost:10000/default</jdbc-url>
            <script>hql/demo.hql</script>
        </hive2>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>ETL task failed, the error message is [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

3. demo.hql

create database if not exists demo;

use demo;

drop table if exists employee;
create external table employee(account string, firstName string, lastName string, department string, emailAddress string, phone string)
 stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
 with serdeproperties('hbase.columns.mapping'=':key,profile:firstName,profile:lastName,department:name,contact:emailAddress,contact:phone')
 tblproperties('hbase.table.name'='employee');

drop table if exists users;
create table users as
 select * from employee;

4.2 Oozie case execution steps

1. Enter the HBase shell

hbase shell

2. Create the employee table in HBase and insert sample data

create 'employee','profile','department','contact'
put 'employee','nml','profile:firstName','ml'
put 'employee','nml','profile:lastName','nie'
put 'employee','nml','department:name','bigdata'
put 'employee','nml','contact:emailAddress','[email protected]'
put 'employee','nml','contact:phone','168-666-2786'
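
Optionally, a quick scan confirms the rows landed in HBase before wiring up the Hive mapping:

scan 'employee'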

3. Create the corresponding directory in HDFS and upload the files

hdfs dfs -mkdir -p /events/demo/hql
hdfs dfs -put demo.hql /events/demo/hql
hdfs dfs -put workflow.xml /events/demo
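
A recursive listing (optional) verifies that both files sit where job.properties expects them:

hdfs dfs -ls -R /events/demo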

4. Submit the Oozie job

oozie job -oozie http://sandbox-hdp.hortonworks.com:11000/oozie -config ./job.properties -run
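
The submit command prints a workflow job ID; you can then poll the job until it reaches SUCCEEDED. The job ID below is a placeholder, substitute the one returned by -run:

oozie job -oozie http://sandbox-hdp.hortonworks.com:11000/oozie -info <job-id>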

5. View the imported employee table in Hive

0: jdbc:hive2://localhost:10000> select * from employee;
+-------------------+---------------------+--------------------+----------------------+------------------------+-----------------+--+
| employee.account  | employee.firstname  | employee.lastname  | employee.department  | employee.emailaddress  | employee.phone  |
+-------------------+---------------------+--------------------+----------------------+------------------------+-----------------+--+
| nml               | ml                  | nie                | bigdata              | ml.nie@bigdata.com     | 168-666-2786    |
+-------------------+---------------------+--------------------+----------------------+------------------------+-----------------+--+
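
Note that employee is only an external mapping onto the live HBase table, so the query above would return rows even without running the workflow. To confirm the Oozie job itself ran, you can also query the users table that demo.hql creates as a snapshot; the output should mirror the row above:

0: jdbc:hive2://localhost:10000> select * from demo.users;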


Origin blog.csdn.net/weixin_45568892/article/details/104976165