Scheduling shell scripts (sqoop) with HUE+OOZIE

Table of contents

Background

Solution

Method of execution

1. Write a shell file

2. Put the sh file on hdfs

3. Create a workflow

4. Execute the test

5. Create a coordinator schedule

6. Execute the coordinator


Background

The business requires data to be moved between mysql and hive on a regular schedule using sqoop. Business data is imported from mysql into hive with sqoop; ETL result data is exported from hive back to mysql with sqoop.

Solution

Use the big data components HUE+OOZIE to schedule shell scripts that execute the sqoop commands. This makes the jobs easier to manage and troubleshoot.

Method of execution

1. Write a shell file

sqoop-mysql2hive.sh

#!/bin/bash
# Use the date passed as the first argument if one is given;
# otherwise use the day before the current date.
#do_date=$(date -d "-1 day" +%F)

if [ -n "$1" ]; then
  do_date=$1
else
  do_date=$(date -d "-1 day" +%F)
fi

jdbc_url_dduser="jdbc:mysql://xxx:13306/dduser?serverTimezone=Asia/Shanghai&characterEncoding=utf8&tinyInt1isBit=false"

jdbc_username=root
jdbc_password=123456

echo "=== Start extracting business data for $do_date from mysql ==="

#sqoop-mysql2hive-appconfig
# The variables are quoted so that the ? and & in the JDBC URL are passed to sqoop literally.
# --null-string / --null-non-string write NULL values as \N, which hive reads as NULL.
sqoop import \
  --connect "$jdbc_url_dduser" \
  --username "$jdbc_username" \
  --password "$jdbc_password" \
  --table app_config \
  --hive-import \
  --hive-overwrite \
  --hive-table dd_database_bigdata.ods_app_config \
  --target-dir /warehouse/dd/bigdata/ods/tmp/ods_app_config \
  --hive-drop-import-delims \
  -m 1 \
  --null-string '\\N' \
  --null-non-string '\\N'

echo "=== Finished extracting data for $do_date from mysql ==="

Notes on the script:

To define a variable in a shell file, assign it directly, for example jdbc_username=root, and reference it as $jdbc_username.

In the shell, `date -d "-1 day" +%F` returns the previous day's date and `date +%F` returns the current date.

When the shell action needs to receive parameters, the script must read them as the positional parameters $1, $2, and $3. This comes up again when creating the schedule.
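The date-defaulting logic described above can be checked on its own before the script is scheduled. The following is a minimal sketch, assuming GNU date (which the script itself already relies on); the function name resolve_do_date is hypothetical and is not part of the original script.

```shell
#!/bin/bash
# resolve_do_date is a hypothetical helper that wraps the same logic as the script:
# print the first argument if it is non-empty, otherwise print yesterday's date.
resolve_do_date() {
  if [ -n "$1" ]; then
    echo "$1"
  else
    date -d "-1 day" +%F   # GNU date; %F formats as YYYY-MM-DD
  fi
}
```

Calling `resolve_do_date 2021-11-10` prints the date that was passed in, while calling `resolve_do_date` with no argument prints the previous day's date.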

2. Put the sh file on hdfs

/warehouse/dd/oozie/workspace/workspace-sqoop-hive2mysql-now/shell/sqoop-hive2mysql-now-shell.sh
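The upload can also be done from the command line with the standard hdfs dfs commands. This is only a sketch, assuming the script is in the current local directory and the hdfs client is on the PATH; the helper names run and upload_script and the DRY_RUN switch are hypothetical conveniences for previewing the commands before running them.

```shell
#!/bin/bash
# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# upload_script LOCAL_FILE HDFS_DIR
# Creates the target directory if it does not exist, then uploads the file,
# overwriting any older copy (-f).
upload_script() {
  local local_file=$1 hdfs_dir=$2
  run hdfs dfs -mkdir -p "$hdfs_dir" &&
  run hdfs dfs -put -f "$local_file" "$hdfs_dir/"
}
```

For example, `DRY_RUN=1 upload_script sqoop-hive2mysql-now-shell.sh /warehouse/dd/oozie/workspace/workspace-sqoop-hive2mysql-now/shell` prints the two hdfs commands without executing them; dropping DRY_RUN=1 performs the upload.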

3. Create a workflow

4. Execute the test

The progress of the execution can be viewed in the job.

5. Create a coordinator schedule

If the start time (From) is set to a time in the past, then once the schedule is created and executed, several jobs are run first to make up for the gap between the selected start time and the current time.

For example, suppose a task runs at minute 10 of every hour, the current time is 12:15, and From is set to 12:00. When this coordinator is executed, one workflow job runs immediately: the one that was due at 12:10. So for From, it is enough to keep the default value, which is the time at which the schedule is created.

6. Execute the coordinator
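The catch-up arithmetic in the example can be worked through in a few lines of shell. This is a simplified model, not Oozie's own logic: it assumes GNU date, a fixed schedule of minute 10 of every hour, times interpreted as UTC, and it ignores any coordinator-level limits. The function name count_catchup_runs and the dates used with it are hypothetical.

```shell
#!/bin/bash
# count_catchup_runs START NOW
# Counts the scheduled times (minute 10 of every hour) that fall between
# START and NOW, inclusive. Both arguments are timestamps understood by GNU date.
count_catchup_runs() {
  local start_epoch now_epoch t count=0
  start_epoch=$(date -u -d "$1" +%s)
  now_epoch=$(date -u -d "$2" +%s)

  # First candidate: minute 10 of the hour that contains START.
  t=$(( start_epoch - start_epoch % 3600 + 600 ))
  # If that moment is already before START, move to the next hour.
  if (( t < start_epoch )); then
    t=$(( t + 3600 ))
  fi

  while (( t <= now_epoch )); do
    count=$(( count + 1 ))
    t=$(( t + 3600 ))
  done
  echo "$count"
}
```

With the values from the example, `count_catchup_runs "2021-11-10 12:00" "2021-11-10 12:15"` prints 1, the single 12:10 run. Moving the start back to 09:00 on the same day prints 4, which is why an early From triggers a burst of jobs.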

 

Origin blog.csdn.net/xieedeni/article/details/121249522