Apache Kylin 4.0.2 Cluster Mode Installation and Deployment Guide

Preface

This article belongs to the author's original column "Big Data Installation and Deployment". Please credit the source when citing it, and feel free to point out any shortcomings or mistakes in the comments. Thanks!


Software Requirements

Hadoop: cdh5.x, cdh6.x, hdp2.x, EMR5.x, EMR6.x, HDI4.x
Hive: 0.13 - 1.2.1+
Spark: 2.4.7/3.1.1
MySQL: 5.1.17 or later
JDK: 1.8+
OS: Linux only, CentOS 6.5+ or Ubuntu 16.04+


Preparation

Perform the following operations on node1:

It is recommended to use the following installation package:

Apache Kylin 4.0.2 binary package
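
For example, it can be fetched from the Apache archive. This is a sketch; the directory path below is assumed, so verify the exact file name on the Apache Kylin download page:

wget https://archive.apache.org/dist/kylin/apache-kylin-4.0.2/apache-kylin-4.0.2-bin.tar.gz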

Extract it to the target directory:

tar -zxvf apache-kylin-4.0.2-bin.tar.gz -C /opt/bigdata/

Configure the environment variable $KYLIN_HOME to point to the Kylin directory.

vim /etc/profile
JAVA_HOME=/usr/java/jdk1.8.0_341-amd64
KYLIN_HOME=/opt/bigdata/apache-kylin-4.0.2-bin
PATH=$PATH:$JAVA_HOME/bin:$KYLIN_HOME/bin

export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL JAVA_HOME KYLIN_HOME
source /etc/profile
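
After sourcing the profile, verify that the variables are in effect:

echo $KYLIN_HOME
java -version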

Download Spark with the bundled script:

download-spark.sh

The downloaded Spark version is 3.1.3, and it is installed under the $KYLIN_HOME/spark directory.
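
If a suitable Spark distribution is already installed, the download can be skipped by pointing SPARK_HOME at it instead; a sketch, assuming an existing installation at the path below (when SPARK_HOME is not set, Kylin falls back to $KYLIN_HOME/spark):

export SPARK_HOME=/opt/bigdata/spark-3.1.1-bin-hadoop2.7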


JDBC Driver

We need to place the MySQL JDBC driver jar under the $KYLIN_HOME/ext directory (create this directory manually if it does not exist):

mkdir -p /opt/bigdata/apache-kylin-4.0.2-bin/ext
cp mysql-connector-java.jar /opt/bigdata/apache-kylin-4.0.2-bin/ext/

Copy the kylin installation directory from node1 to node2 and node3:

scp -r /opt/bigdata/apache-kylin-4.0.2-bin/ node2:/opt/bigdata/

scp -r /opt/bigdata/apache-kylin-4.0.2-bin/ node3:/opt/bigdata/
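
Note that node2 and node3 also need the $JAVA_HOME and $KYLIN_HOME entries in /etc/profile; assuming identical paths on all three machines, one way is to copy the profile over as well:

scp /etc/profile node2:/etc/profile
scp /etc/profile node3:/etc/profile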

kylin.properties Configuration

vim /opt/bigdata/apache-kylin-4.0.2-bin/conf/kylin.properties

The kylin configuration for the 3 nodes is shown below; node1 serves queries, while node2 and node3 execute jobs.


node1

kylin.metadata.url=kylin_test@jdbc,driverClassName=com.mysql.jdbc.Driver,url=jdbc:mysql://node1:3306/kylin,username=kylin,password=1qaz@WSX
kylin.env.zookeeper-connect-string=node2:2181

kylin.server.cluster-servers-with-mode=node1:7070:query,node2:7070:job,node3:7070:job
kylin.job.scheduler.default=100

kylin.server.mode=query

kylin.cube.cubeplanner.enabled=true
kylin.server.query-metrics2-enabled=true
kylin.metrics.reporter-query-enabled=true
kylin.metrics.reporter-job-enabled=true
kylin.metrics.monitor-enabled=true

kylin.web.dashboard-enabled=true

kylin.metadata.url configures the MySQL metadata store connection.

kylin.env.zookeeper-connect-string configures the ZooKeeper connection string.

kylin.server.cluster-servers-with-mode lists all kylin nodes with their roles, enabling service discovery.

kylin.job.scheduler.default=100 selects the Curator-based scheduler (described in the node2/node3 section below), which supports service self-discovery.

kylin.server.mode sets the role of the current kylin node (query for a query node, job for a job node, all for both).

kylin.cube.cubeplanner.enabled / kylin.server.query-metrics2-enabled / kylin.metrics.* enable Cube Planner.

Cube Planner makes Apache Kylin more resource-efficient: it intelligently builds only part of the Cube to minimize build cost while maximizing the benefit for end-user queries, then learns patterns from running queries and dynamically recommends cuboids accordingly.

kylin.web.dashboard-enabled enables the Kylin Dashboard; it only takes effect on query/all nodes.

The Kylin Dashboard shows useful Cube usage statistics, which are very valuable to users.
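
The metadata database and account referenced by kylin.metadata.url must already exist on node1 before Kylin starts. A minimal sketch, using MySQL 5.x syntax and the credentials from the configuration above:

mysql -u root -p -e "CREATE DATABASE kylin DEFAULT CHARACTER SET utf8; GRANT ALL PRIVILEGES ON kylin.* TO 'kylin'@'%' IDENTIFIED BY '1qaz@WSX'; FLUSH PRIVILEGES;"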


node2/node3

kylin.metadata.url=kylin_test@jdbc,driverClassName=com.mysql.jdbc.Driver,url=jdbc:mysql://node1:3306/kylin,username=kylin,password=1qaz@WSX
kylin.env.zookeeper-connect-string=node2:2181

kylin.server.cluster-servers-with-mode=node1:7070:query,node2:7070:job,node3:7070:job
kylin.server.mode=job

kylin.cube.cubeplanner.enabled=true
kylin.server.query-metrics2-enabled=true
kylin.metrics.reporter-query-enabled=true
kylin.metrics.reporter-job-enabled=true
kylin.metrics.monitor-enabled=true

kylin.web.dashboard-enabled=false

kylin.job.scheduler.default=100
kylin.server.self-discovery-enabled=true
kylin.job.lock=org.apache.kylin.job.lock.zookeeper.ZookeeperJobLock

node2 and node3 use exactly the same configuration.

kylin.job.scheduler.default=100
kylin.server.self-discovery-enabled=true

These two settings enable CuratorScheduler, a Curator-based master-slave scheduler for multiple job engines.

kylin.job.lock=org.apache.kylin.job.lock.zookeeper.ZookeeperJobLock

This setting provides a ZooKeeper-based distributed job lock, making the job engines highly available.


Fixing Insufficient HDFS Permissions for the root User

If kylin is started directly at this point it will fail, because on HDFS only the hdfs user has superuser privileges, so the following setup is needed:

[root@node1 ~]# sudo -u hdfs hadoop fs -chown -R root:root /kylin
[root@node1 ~]# hadoop fs -ls /
Found 3 items
drwxr-xr-x   - root root                0 2022-10-23 16:58 /kylin
drwxrwxrwt   - hdfs supergroup          0 2022-10-23 15:47 /tmp
drwxr-xr-x   - hdfs supergroup          0 2022-10-23 15:47 /user
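
If the chown command complains that /kylin does not exist (Kylin has not created its working directory yet), create it first as the hdfs superuser:

sudo -u hdfs hadoop fs -mkdir -p /kylin
sudo -u hdfs hadoop fs -chown -R root:root /kylin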

Now run the environment check script on all three nodes:

[root@node1 ~]# check-env.sh 
Retrieving hadoop conf dir...
...................................................[PASS]
KYLIN_HOME is set to /opt/bigdata/apache-kylin-4.0.2-bin
Checking hive
...................................................[PASS]
Checking hadoop shell
...................................................[PASS]
Checking hdfs working dir
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
...................................................[PASS]
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.

Checking environment finished successfully. To check again, run 'bin/check-env.sh' manually.

This means kylin is ready to start normally.


Startup

Run the following start command on all 3 nodes:

kylin.sh start

For example, on node2:

[root@node2 ~]# kylin.sh start
Retrieving hadoop conf dir...
...................................................[PASS]
KYLIN_HOME is set to /opt/bigdata/apache-kylin-4.0.2-bin
Checking hive
...................................................[PASS]
Checking hadoop shell
...................................................[PASS]
Checking hdfs working dir
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
...................................................[PASS]
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.

Checking environment finished successfully. To check again, run 'bin/check-env.sh' manually.
Retrieving hadoop conf dir...
Retrieving Spark dependency...
Start replace hadoop jars under /opt/bigdata/apache-kylin-4.0.2-bin/spark/jars.
Find platform specific jars:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop/client/hadoop-annotations-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop/client/hadoop-auth-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop/client/hadoop-common-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop/hadoop-annotations-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop/hadoop-common-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop/hadoop-auth-3.0.0-cdh6.3.2.jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-hdfs/hadoop-hdfs-client.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-hdfs/hadoop-hdfs-httpfs.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-hdfs/hadoop-hdfs-native-client.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-hdfs/hadoop-hdfs-native-client-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-hdfs/hadoop-hdfs-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-hdfs/hadoop-hdfs-httpfs-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-hdfs/hadoop-hdfs-client-3.0.0-cdh6.3.2.jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-app-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-core-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-shuffle-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-common-3.0.0-cdh6.3.2.jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-yarn/hadoop-yarn-common-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-yarn/hadoop-yarn-api-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-yarn/hadoop-yarn-server-web-proxy-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-yarn/hadoop-yarn-client-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-yarn/hadoop-yarn-server-common-3.0.0-cdh6.3.2.jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/htrace-core4-4.2.0-incubating.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/htrace-core4-4.1.0-incubating.jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/woodstox-core-5.0.3.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/woodstox-core-5.1.0.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/commons-configuration2-2.1.1.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/woodstox-core-asl-4.4.1.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/re2j-1.1.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/commons-configuration2-2.1.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/stax2-api-3.1.4.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/re2j-1.0.jar  , will replace with these jars under /opt/bigdata/apache-kylin-4.0.2-bin/spark/jars.
Done hadoop jars replacement under /opt/bigdata/apache-kylin-4.0.2-bin/spark/jars.
Start to check whether we need to migrate acl tables
Not HBase metadata. Skip check.

A new Kylin instance is started by root. To stop it, run 'kylin.sh stop'
Check the log at /opt/bigdata/apache-kylin-4.0.2-bin/logs/kylin.log
Web UI is at http://node2:7070/kylin
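
As the log output notes, a node can later be stopped with:

kylin.sh stop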

WEB UI

Access the WEB UI on node1:

http://node1:7070/kylin/login

The username and password are ADMIN/KYLIN.
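
The login can also be verified from the command line through the REST API; a sketch using HTTP basic authentication with the default credentials:

curl -u ADMIN:KYLIN -X POST http://node1:7070/kylin/api/user/authentication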


After logging in, you will see the Kylin home page.


Under System => Instances we can see the status information of all 3 nodes.



Reprinted from blog.csdn.net/Shockang/article/details/127483023