Extending and configuring Oozie, the MapReduce workflow engine

After struggling with Oozie for a long time,
I finally have it in a usable state, at least for now.

This post covers two main points:
one, deployment
two, operation

-----------------------------
One, deployment
The Oozie version is oozie-3.1.3-incubating. For reasons I don't know, Oozie has not supported SqoopAction and HiveAction since 3.0.
To be able to run these two actions, I made some modifications to its jar files.

1. Compile the HiveAction and SqoopAction classes.
There are four classes in total: HiveAction.java, HiveMain.java, SqoopAction.java, SqoopMain.java
These four classes are taken from the Cloudera release oozie-2.3.2-cdh3u2.
There is nothing special about the compilation: create a project from the oozie 3.1.3 source, add the four classes, compile, locate the resulting class files, and put them into oozie-core-3.1.3.jar under the path that matches their package.
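
Since a jar is an ordinary zip archive, the last part of this step can be scripted. The following is a minimal Python sketch, not the procedure actually used for this post. The helper names collect_classes and patch_jar are hypothetical, and the build output directory is whatever your project produces. It writes a new jar instead of modifying the original in place, so the original stays available as a backup.

```python
import zipfile
from pathlib import Path


def collect_classes(classes_dir, class_names):
    """Map jar entry names to compiled files for the named classes.

    The entry name is the path relative to classes_dir, so the package
    layout produced by the compiler is preserved inside the jar.
    """
    root = Path(classes_dir)
    wanted = {}
    for path in root.rglob("*.class"):
        # Inner classes compile to Name$Inner.class and must be copied too.
        base = path.stem.split("$")[0]
        if base in class_names:
            wanted[path.relative_to(root).as_posix()] = path
    return wanted


def patch_jar(src_jar, dst_jar, additions):
    """Copy src_jar to dst_jar, adding or replacing the given entries."""
    with zipfile.ZipFile(src_jar) as src, \
            zipfile.ZipFile(dst_jar, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            # Skip entries that are about to be replaced, so the new jar
            # never holds two entries with the same name.
            if info.filename not in additions:
                dst.writestr(info, src.read(info.filename))
        for entry_name, local_path in additions.items():
            dst.write(local_path, arcname=entry_name)
```

A call might look like patch_jar("oozie-core-3.1.3.jar", "oozie-core-patched.jar", collect_classes("target/classes", {"HiveAction", "HiveMain", "SqoopAction", "SqoopMain"})), where target/classes and the output jar name are placeholders.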

2. Add the corresponding xsd files.
In Cloudera's oozie-2.3.2-cdh3u2\src\client\src\main\resources,
find hive-action-0.2.xsd and sqoop-action-0.2.xsd, and add these two files to oozie-client-3.1.3-incubating.jar.
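
This step can also be scripted. Below is a small Python sketch under one assumption that the post does not state: the schema files belong at the top level of the client jar, which is where a Maven build normally places files from src/main/resources. The function name add_to_jar_root is hypothetical. It appends to the jar in place, so keep a backup copy of the original.

```python
import zipfile
from pathlib import Path


def add_to_jar_root(jar_path, files):
    """Append files to the top level of a jar, skipping names already present.

    Returns the list of entry names that were actually added.
    """
    added = []
    with zipfile.ZipFile(jar_path, "a", zipfile.ZIP_DEFLATED) as jar:
        existing = set(jar.namelist())
        for f in map(Path, files):
            # Appending a name that already exists would create a duplicate
            # entry, so such files are left alone.
            if f.name not in existing:
                jar.write(f, arcname=f.name)
                added.append(f.name)
    return added
```

For instance, add_to_jar_root("oozie-client-3.1.3-incubating.jar", ["hive-action-0.2.xsd", "sqoop-action-0.2.xsd"]) would return the names of the schemas it added.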

3. Set up oozie again.
Replace the original jar files with the modified ones, then run the oozie setup again. As described in the earlier post at http://taoo.iteye.com/blog/1518580, after setup, add the corresponding jar files to lib.

4. Restart oozie.



Two, operation
1. sharelib configuration
Every action depends on this sharelib when it runs, and the sharelib must be placed on HDFS.
Oozie provides a sharelib: oozie-sharelib-3.1.3-incubating.tar.gz. Unpack it.
Then add the relevant jar files according to your versions of Hive, Pig and Sqoop. Besides the jars of these three tools themselves, the jars in Hive's lib directory also need to be added. Note that the jars in Hive's lib are the version baseline: if a jar already exists in sharelib but in a different version from the one in Hive's lib, you must, must, must use the version from Hive's lib.
By the way, the relevant JDBC driver jar should also be added to sharelib.
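
The rule that Hive's lib wins on version conflicts can be sketched in Python as follows. This is an illustration, not a tool that ships with Oozie. The names artifact_of and sync_from_hive are hypothetical, both directories are local paths you supply (the unpacked sharelib is assumed to be synced before it is uploaded to HDFS), and the file name pattern artifact-version.jar is a heuristic that may misjudge unusually named jars.

```python
import re
import shutil
from pathlib import Path

# Heuristic: everything before the first "-<digit>" is the artifact name.
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def artifact_of(jar_name):
    """Return the artifact part of 'artifact-version.jar', or the bare stem."""
    m = JAR_NAME.match(jar_name)
    return m.group("artifact") if m else Path(jar_name).stem


def sync_from_hive(hive_lib, sharelib_dir):
    """Copy every jar from hive_lib into sharelib_dir, Hive's version winning.

    Returns (removed_name, hive_name) pairs for each conflicting jar replaced.
    """
    hive_lib, sharelib_dir = Path(hive_lib), Path(sharelib_dir)
    replaced = []
    for hive_jar in sorted(hive_lib.glob("*.jar")):
        artifact = artifact_of(hive_jar.name)
        for old in sorted(sharelib_dir.glob("*.jar")):
            # Same artifact under a different file name means a version clash.
            if artifact_of(old.name) == artifact and old.name != hive_jar.name:
                old.unlink()
                replaced.append((old.name, hive_jar.name))
        shutil.copy2(hive_jar, sharelib_dir / hive_jar.name)
    return replaced
```

Reviewing the returned list before uploading the sharelib is a cheap way to see which versions were overridden.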

2. Specify the relevant configuration file.
In particular, when running Hive you must specify the Hive configuration file; otherwise the Hive client does not know where to find the metastore.
To specify it, add the following property to the action's configuration:

    <property> 
        <name>oozie.hive.defaults</name> 
        <value>my-hive-default.xml</value> 
    </property> 


Here my-hive-default.xml is simply hive-site.xml under another name; it needs to be placed in the corresponding workflow path on HDFS.

In fact, other actions that involve Hive, such as Sqoop, may need the same configuration.
