Using Kettle to migrate data on Linux

We had been running Kettle migrations over a Windows connection at our workplace, but one of the tables grows by roughly 2 million rows per day. Migrating locally was too slow, the VPN connection to the server was unstable and frequently dropped, and Kettle does not support HTTP, so we decided to take the Kettle configuration from Windows and run it in a Linux environment instead.

One: Install the JDK on Linux

Reference: https://www.cnblogs.com/nothingonyou/p/11936850.html

Two: Deploy Kettle on Linux

Deploy Kettle directly on the MySQL server that the data is being migrated into, which avoids network overhead in the middle of the transfer. Compress the files that were already unzipped under Windows, upload the archive to the appropriate directory on Linux, and unpack it there, as shown:

1. Create a kettle directory and unzip the archive

[root@localhost opt]# mkdir kettle
[root@localhost opt]# mv data-integration.zip kettle/
[root@localhost opt]# cd kettle
[root@localhost kettle]# unzip data-integration.zip
[root@localhost kettle]# rm -rf data-integration.zip 
[root@localhost kettle]# cd data-integration/
[root@localhost data-integration]# chmod +x *.sh
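A minimal sketch of the permission step above, run in a scratch directory so it needs no root access: after chmod +x *.sh, every launcher script should carry the execute bit (kitchen.sh and pan.sh here are empty stand-ins, not the real Kettle scripts).

```shell
# Create stand-in launcher scripts and make them executable, as in step 1.
mkdir -p di_demo
touch di_demo/kitchen.sh di_demo/pan.sh
chmod +x di_demo/*.sh
# List them; both should now show the x permission bits.
ls -l di_demo
```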

2. Test whether the installation is successful

[root@localhost /]# cd /opt/kettle/data-integration/
[root@localhost data-integration]# ./kitchen.sh 

If the following message appears, the installation was successful:

[root@localhost data-integration]# ./kitchen.sh 
#######################################################################
WARNING:  no libwebkitgtk-1.0 detected, some features will be unavailable
    Consider installing the package with apt-get or yum.
    e.g. 'sudo apt-get install libwebkitgtk-1.0-0'
#######################################################################
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=51200m; support was removed in 8.0
Options:
  -rep            = Repository name
  -user           = Repository username
  -pass           = Repository password
  -job            = The name of the job to launch
  -dir            = The directory (dont forget the leading /)
  -file           = The filename (Job XML) to launch
  -level          = The logging level (Basic, Detailed, Debug, Rowlevel, Error, Minimal, Nothing)
  -logfile        = The logging file to write to
  -listdir        = List the directories in the repository
  -listjobs       = List the jobs in the specified directory
  -listrep        = List the available repositories
  -norep          = Do not log into the repository
  -version        = show the version, revision and build date
  -param          = Set a named parameter <NAME>=<VALUE>. For example -param:FILE=customers.csv
  -listparam      = List information concerning the defined parameters in the specified job.
  -export         = Exports all linked resources of the specified job. The argument is the name of a ZIP file.
  -custom         = Set a custom plugin specific option as a String value in the job using <NAME>=<Value>, for example: -custom:COLOR=Red
  -maxloglines    = The maximum number of log lines that are kept internally by Kettle. Set to 0 to keep all rows (default)
  -maxlogtimeout  = The maximum age (in minutes) of a log line while being kept internally by Kettle. Set to 0 to keep all rows indefinitely (default)

Remarks:

kitchen.sh : executes jobs (.kjb files)
pan.sh : executes transformations (.ktr files)

Three: Call the Kettle program from a script

1. Create the kettle working directories

[root@localhost opt]# mkdir -p /opt/kettle/kettle_file/job
[root@localhost opt]# mkdir -p /opt/kettle/kettle_file/transition
[root@localhost opt]# mkdir -p /opt/kettle/kettle_sh
[root@localhost opt]# mkdir -p /opt/kettle/kettle_log
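The four mkdir -p calls above can be combined into a single command, since mkdir -p accepts multiple paths. BASE is /opt/kettle in the article; the sketch points it at a scratch directory so it runs without root.

```shell
# One mkdir -p call creating the whole working tree at once.
BASE=./kettle_demo
mkdir -p "$BASE"/kettle_file/job "$BASE"/kettle_file/transition \
         "$BASE"/kettle_sh "$BASE"/kettle_log
# Show the resulting layout.
ls "$BASE"
```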

2. Create an executable script in the /opt/kettle/kettle_sh directory: vim o2m.sh

#!/bin/sh
cd /opt/kettle/data-integration/
export JAVA_HOME=/opt/apps/java/jdk1.8.0_191
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
./pan.sh -file=/opt/kettle/kettle_file/transition/o2m.ktr >>/opt/kettle/kettle_log/o2m_$(date +%Y%m%d).log &
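The >> redirection in the script appends to a per-day log file; this sketch reproduces the filename pattern it produces (for example, o2m_20191128.log on 2019-11-28).

```shell
# Build the dated log filename exactly as the redirection in o2m.sh does.
logname="o2m_$(date +%Y%m%d).log"
echo "$logname"
```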

3. Modify the execute permissions

chmod +x o2m.sh

4. Configure the .ktr file on Windows; once it tests successfully, upload it to the corresponding transition directory on Linux.

5. The Oracle and MySQL JDBC driver locations on Linux differ from those on Windows; place the drivers under:

/opt/kettle/data-integration/libswt/linux/x86_64
[root@localhost x86_64]# pwd
/opt/kettle/data-integration/libswt/linux/x86_64
[root@localhost x86_64]# ll
total 7444
-rw-r--r--. 1 root root  992808 Nov 26 09:03 mysql-connector-java-5.1.41-bin.jar
-rw-r--r--. 1 root root 2001778 Nov 26 09:03 mysql-connector-java-6.0.6.jar
-rw-r--r--. 1 root root 2739670 Nov 26 09:04 ojdbc6.jar
-rw-r--r--. 1 root root 1880133 May 16  2017 swt.jar

6. Run the shell script

./o2m.sh &
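The trailing & runs the script in the background, but it will still be killed if the SSH session closes; wrapping it with nohup keeps it running. o2m_demo.sh below is a self-contained stand-in for o2m.sh so the sketch can run anywhere.

```shell
# Create a stand-in for o2m.sh.
printf '#!/bin/sh\necho running\n' > o2m_demo.sh
chmod +x o2m_demo.sh
# Launch it with nohup in the background so it survives a closed session.
nohup ./o2m_demo.sh > demo.out 2>&1 &
# Wait for it to finish and show its output; demo.out should contain "running".
wait $!
cat demo.out
```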

7. Check the log to monitor the transformation:

[root@localhost kettle_log]# tail -20f 1127_20191128.log
2019/11/28 14:55:26 - Table output.4 - Finished processing (I=0, O=510558, R=510558, W=510558, U=0, E=0)
2019/11/28 14:55:26 - Table output.0 - Finished processing (I=0, O=510558, R=510558, W=510558, U=0, E=0)
2019/11/28 14:55:26 - Table output.5 - Finished processing (I=0, O=510558, R=510558, W=510558, U=0, E=0)
2019/11/28 14:55:26 - Table output.1 - Finished processing (I=0, O=510558, R=510558, W=510558, U=0, E=0)
2019/11/28 14:55:26 - Table output.2 - Finished processing (I=0, O=510558, R=510558, W=510558, U=0, E=0)
2019/11/28 14:55:26 - Table output.3 - Finished processing (I=0, O=510558, R=510558, W=510558, U=0, E=0)
2019/11/28 14:55:26 - Table output.7 - Finished processing (I=0, O=510558, R=510558, W=510558, U=0, E=0)
2019/11/28 14:55:26 - Pan - Finished!
2019/11/28 14:55:26 - Pan - start=2019/11/28 14:52:41.492, stop=2019/11/28 14:55:26.329
2019/11/28 14:55:26 - Pan - Processing ended after 2 minutes and 44 seconds (164 seconds total).
2019/11/28 14:55:26 - 2 -  
2019/11/28 14:55:26 - 2 - Step Table input.0 ended successfully, processed 4084464 rows. ( 24905 rows/s)
2019/11/28 14:55:26 - 2 - Step Table output.0 ended successfully, processed 510558 rows. ( 3113 rows/s)
2019/11/28 14:55:26 - 2 - Step Table output.1 ended successfully, processed 510558 rows. ( 3113 rows/s)
2019/11/28 14:55:26 - 2 - Step Table output.2 ended successfully, processed 510558 rows. ( 3113 rows/s)
2019/11/28 14:55:26 - 2 - Step Table output.3 ended successfully, processed 510558 rows. ( 3113 rows/s)
2019/11/28 14:55:26 - 2 - Step Table output.4 ended successfully, processed 510558 rows. ( 3113 rows/s)
2019/11/28 14:55:26 - 2 - Step Table output.5 ended successfully, processed 510558 rows. ( 3113 rows/s)
2019/11/28 14:55:26 - 2 - Step Table output.6 ended successfully, processed 510558 rows. ( 3113 rows/s)
2019/11/28 14:55:26 - 2 - Step Table output.7 ended successfully, processed 510558 rows. ( 3113 rows/s)

As the log shows, over 4 million rows were converted in under 3 minutes; the speed improved dramatically.
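A quick cross-check of the log's throughput figure: 4084464 input rows over the reported 164 seconds matches the "24905 rows/s" line for the input step.

```shell
# Divide total rows by total seconds; %d truncates to whole rows per second.
awk 'BEGIN { printf "%d rows/s\n", 4084464 / 164 }'
```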


Origin www.cnblogs.com/nothingonyou/p/11950749.html