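We are working with multiple data sources, and the big data cluster needs to extract the relevant data according to specific requirements, so we use Kettle to pull the data.
First install three CentOS 7 machines and configure each of them with a static IP, passwordless SSH login, a disabled firewall, JDK 1.8 and NTP time synchronization; see https://blog.csdn.net/weixin_42575806/article/details/110185977
I won't go through those steps in detail here!
I connect to the servers remotely with the Xshell client.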
First configure the hostname and the hostname-to-IP mapping:
[root@Kettlemaster data-integration]# vim /etc/hostname
Do the same on the other two nodes; I won't repeat it here.
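The hostname-to-IP mapping itself lives in /etc/hosts. A minimal sketch, assuming the three addresses this post uses later for the web UI checks (192.168.2.111 to 192.168.2.113); the helper name add_kettle_hosts is hypothetical, and it only appends entries that are not already present, so it is safe to re-run on every node:

```shell
# Hypothetical helper: append the cluster's IP/hostname mappings to a hosts file.
# The IPs are the ones used for the browser checks in this post; adjust to your network.
add_kettle_hosts() {
  local hosts_file="${1:-/etc/hosts}"
  local entry
  for entry in \
    "192.168.2.111 Kettlemaster" \
    "192.168.2.112 Kettleslave1" \
    "192.168.2.113 Kettleslave2"; do
    # -x matches the whole line, -F treats it as a fixed string; skip entries already present
    if ! grep -qxF "$entry" "$hosts_file" 2>/dev/null; then
      echo "$entry" >> "$hosts_file"
    fi
  done
}
```

Run it as root with no argument to update /etc/hosts, then check with ping -c 1 Kettleslave1.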
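I upload all the required software to /usr/local/src first.
Next, install the JDK.
Upload the JDK package from your local machine, then extract it into /usr/local so that the path matches JAVA_HOME below:
tar -xzvf jdk-8u181-linux-x64.tar.gz -C /usr/local
Configure the JDK environment variables: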
export JAVA_HOME=/usr/local/jdk1.8.0_181
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
export PATH=$PATH:${JAVA_PATH}
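The post does not say which file these lines go into. One common choice on CentOS 7 is a drop-in under /etc/profile.d/, which login shells source automatically. A minimal sketch; the file name jdk.sh and the helper write_jdk_profile are assumptions for illustration:

```shell
# Hypothetical helper: write the JDK variables above into a profile drop-in file.
# The quoted heredoc ('EOF') keeps ${JAVA_HOME} etc. literal so they expand at login time.
write_jdk_profile() {
  local target="${1:-/etc/profile.d/jdk.sh}"
  cat > "$target" <<'EOF'
export JAVA_HOME=/usr/local/jdk1.8.0_181
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
export PATH=$PATH:${JAVA_PATH}
EOF
}
```

After running write_jdk_profile as root, load it into the current shell with source /etc/profile.d/jdk.sh and confirm with java -version.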
Copy the JDK to the other two nodes:
scp -r /usr/local/jdk1.8.0_181 Kettleslave1:/usr/local
scp -r /usr/local/jdk1.8.0_181 Kettleslave2:/usr/local
Configure the environment variables on the other two nodes as well; the steps are the same as above, so I won't repeat them.
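Copying to each node by hand gets repetitive as the cluster grows. A minimal sketch of the same scp step as a loop, assuming passwordless SSH as root is already set up (one of the prerequisites listed at the start); the function name distribute_dir and the RUN dry-run switch are hypothetical conveniences, not part of any tool:

```shell
# Hypothetical helper: copy a directory to the same parent path on each listed host.
# Arguments: source dir, destination parent dir, then one or more hostnames.
# Set RUN=echo to print the scp commands instead of executing them (dry run).
distribute_dir() {
  local src="$1" dest="$2"
  shift 2
  local host
  for host in "$@"; do
    ${RUN:-} scp -r "$src" "${host}:${dest}" || return 1
  done
}
```

For example, distribute_dir /usr/local/jdk1.8.0_181 /usr/local Kettleslave1 Kettleslave2 reproduces the two scp commands above, and the same helper works for the Kettle directory later on.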
Installing Kettle
Download:
Link: https://pan.baidu.com/s/1bO_A9zNnwWTV8shIqsO2Sw
Extraction code: is8s
Official download page: https://community.hitachivantara.com/s/article/data-integration-kettle
Upload the Kettle package and unzip it:
[root@Kettlemaster src]# unzip pdi-ce-8.2.0.0-342.zip
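Unzipping creates a data-integration directory; feel free to look around inside it.
Now configure Kettle. Change into /usr/local/src/data-integration/pwd, which holds the Carte configuration files.
Start with the configuration on the master host. In <hostname> you can use either the hostname or the static IP: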
[root@Kettlemaster pwd]#vim carte-config-master-8080.xml
<slave_config>
<!--
Document description...
- masters: You can list the slave servers to which this slave has to report back to.
If this is a master, we will contact the other masters to get a list of all the slaves in the cluster.
- report_to_masters : send a message to the defined masters to let them know we exist (Y/N)
- slaveserver : specify the slave server details of this carte instance.
IMPORTANT : the username and password specified here are used by the master instances to connect to this slave.
-->
<slaveserver>
<name>master1</name>
<hostname>Kettlemaster</hostname>
<port>8080</port>
<master>Y</master>
</slaveserver>
</slave_config>
Next, edit the configuration files for the other two servers:
[root@Kettlemaster pwd]# vim carte-config-8081.xml
<slave_config>
<!--
Document description...
- masters: You can list the slave servers to which this slave has to report back to.
If this is a master, we will contact the other masters to get a list of all the slaves in the cluster.
- report_to_masters : send a message to the defined masters to let them know we exist (Y/N)
- slaveserver : specify the slave server details of this carte instance.
IMPORTANT : the username and password specified here are used by the master instances to connect to this slave.
-->
<masters>
<slaveserver>
<name>master1</name>
<hostname>Kettlemaster</hostname>
<port>8080</port>
<username>cluster</username>
<password>cluster</password>
<master>Y</master>
</slaveserver>
</masters>
<report_to_masters>Y</report_to_masters>
<slaveserver>
<name>slave1-8081</name>
<hostname>Kettleslave1</hostname>
<port>8081</port>
<username>cluster</username>
<password>cluster</password>
<master>N</master>
</slaveserver>
</slave_config>
[root@Kettlemaster pwd]# vim carte-config-8082.xml
<slave_config>
<!--
Document description...
- masters: You can list the slave servers to which this slave has to report back to.
If this is a master, we will contact the other masters to get a list of all the slaves in the cluster.
- report_to_masters : send a message to the defined masters to let them know we exist (Y/N)
- slaveserver : specify the slave server details of this carte instance.
IMPORTANT : the username and password specified here are used by the master instances to connect to this slave.
-->
<masters>
<slaveserver>
<name>master1</name>
<hostname>Kettlemaster</hostname>
<port>8080</port>
<username>cluster</username>
<password>cluster</password>
<master>Y</master>
</slaveserver>
</masters>
<report_to_masters>Y</report_to_masters>
<slaveserver>
<name>slave2-8082</name>
<hostname>Kettleslave2</hostname>
<port>8082</port>
<username>cluster</username>
<password>cluster</password>
<master>N</master>
</slaveserver>
</slave_config>
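Since I'm building a three-node cluster, I configured three files; you could equally configure four or five nodes, depending on your needs.
Now copy the configured Kettle directory to the other two nodes: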
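Because every slave file differs only in name, hostname and port, extra nodes can be generated instead of hand-edited. A minimal sketch that prints a config with the same structure as carte-config-8081.xml (the stock comment is omitted); the function name gen_slave_config, and the Kettleslave3 host and port 8083 in the usage line, are hypothetical:

```shell
# Hypothetical helper: print a slave Carte config that reports to master1 (Kettlemaster:8080).
# Arguments: slave name, slave hostname, slave port. Credentials match this post (cluster/cluster).
gen_slave_config() {
  local name="$1" host="$2" port="$3"
  cat <<EOF
<slave_config>
  <masters>
    <slaveserver>
      <name>master1</name>
      <hostname>Kettlemaster</hostname>
      <port>8080</port>
      <username>cluster</username>
      <password>cluster</password>
      <master>Y</master>
    </slaveserver>
  </masters>
  <report_to_masters>Y</report_to_masters>
  <slaveserver>
    <name>${name}</name>
    <hostname>${host}</hostname>
    <port>${port}</port>
    <username>cluster</username>
    <password>cluster</password>
    <master>N</master>
  </slaveserver>
</slave_config>
EOF
}
```

For a fourth node you might run gen_slave_config slave3-8083 Kettleslave3 8083 > /usr/local/src/data-integration/pwd/carte-config-8083.xml.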
[root@Kettlemaster src]# scp -r /usr/local/src/data-integration Kettleslave1:/usr/local/src
[root@Kettlemaster src]# scp -r /usr/local/src/data-integration Kettleslave2:/usr/local/src
Start the service on the master node:
/usr/local/src/data-integration/carte.sh Kettlemaster 8080 &
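Open http://192.168.2.111:8080 in a browser; the username and password are both cluster.
If the page loads, the master has started successfully.
Start the service on each slave node the same way, running each command on its own host: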
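The same check can be scripted without a browser, which is handy over Xshell. A minimal sketch, assuming curl is installed; it requests Carte's status page at /kettle/status/ (xml=Y asks for an XML response) with the cluster/cluster credentials, and the function name check_carte is hypothetical:

```shell
# Hypothetical helper: succeed only if the Carte instance answers its status page.
# -s silences progress output, -f makes HTTP errors (e.g. 401) return non-zero,
# --max-time bounds how long we wait for an unresponsive node.
check_carte() {
  local host="$1" port="$2"
  curl -sf --max-time 5 -u cluster:cluster \
    "http://${host}:${port}/kettle/status/?xml=Y" > /dev/null
}
```

For example, check_carte Kettlemaster 8080 && echo up, and likewise with Kettleslave1 8081 and Kettleslave2 8082 once the slaves are running.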
/usr/local/src/data-integration/carte.sh Kettleslave1 8081 &
Then open http://192.168.2.112:8081 in a browser.
/usr/local/src/data-integration/carte.sh Kettleslave2 8082 &
Then open http://192.168.2.113:8082 in a browser.
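One thing to be aware of: as far as I know, when carte.sh is given a hostname and port it builds a default configuration and does not read the XML files edited above, so the <masters> and <report_to_masters> settings only take effect when the config file itself is passed to carte.sh. A minimal sketch that picks the right file for the current host; the helper names, the KETTLE_HOME_DIR variable, and the use of nohup with a carte.log file are assumptions:

```shell
# Hypothetical helpers: map this node's hostname to its Carte config file and start Carte with it.
KETTLE_HOME_DIR=/usr/local/src/data-integration  # assumed install path from this post

carte_config_for() {
  case "$1" in
    Kettlemaster) echo "$KETTLE_HOME_DIR/pwd/carte-config-master-8080.xml" ;;
    Kettleslave1) echo "$KETTLE_HOME_DIR/pwd/carte-config-8081.xml" ;;
    Kettleslave2) echo "$KETTLE_HOME_DIR/pwd/carte-config-8082.xml" ;;
    *) return 1 ;;  # unknown host: no config file for it
  esac
}

start_carte() {
  local cfg
  cfg=$(carte_config_for "$(hostname)") || { echo "no Carte config for $(hostname)" >&2; return 1; }
  # nohup keeps Carte running after the SSH session closes; output goes to carte.log
  cd "$KETTLE_HOME_DIR" && nohup ./carte.sh "$cfg" > carte.log 2>&1 &
}
```

The case match is case-sensitive, so it assumes /etc/hostname contains exactly Kettlemaster, Kettleslave1 or Kettleslave2.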