Apache Kylin 4.0.2 Cluster Mode Installation and Deployment Guide

Foreword

This article belongs to the author's original column "Big Data Installation and Deployment"; please indicate the source when citing it. Please point out any deficiencies or mistakes in the comment section, thank you!


Software requirements

Hadoop: CDH 5.x, CDH 6.x, HDP 2.x, EMR 5.x, EMR 6.x, HDI 4.x
Hive: 0.13 - 1.2.1+
Spark: 2.4.7/3.1.1
MySQL: 5.1.17 or later
JDK: 1.8+
OS: Linux only, CentOS 6.5+ or Ubuntu 16.0.4+
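
Before starting, it is worth confirming the component versions on each node with the standard version commands (a quick sanity check; adjust to your own environment):

java -version
hadoop version
hive --version
mysql --version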


Preparation

Do the following on node1:

It is recommended to use the following installation package:

Apache Kylin 4.0.2 installation package (apache-kylin-4.0.2-bin.tar.gz)

Extract it to the installation directory:

tar -zxvf apache-kylin-4.0.2-bin.tar.gz -C /opt/bigdata/

Configure the environment variable $KYLIN_HOME to point to the Kylin folder.

vim /etc/profile
JAVA_HOME=/usr/java/jdk1.8.0_341-amd64
KYLIN_HOME=/opt/bigdata/apache-kylin-4.0.2-bin
PATH=$PATH:$JAVA_HOME/bin:$KYLIN_HOME/bin

export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL JAVA_HOME KYLIN_HOME
source /etc/profile
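
After reloading the profile, verify that the variables took effect:

echo $JAVA_HOME
echo $KYLIN_HOME
which kylin.sh    # should resolve to $KYLIN_HOME/bin/kylin.sh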

Download Spark using the bundled script:

download-spark.sh

The downloaded Spark version is 3.1.3, and it is installed into the $KYLIN_HOME/spark directory.
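
If the node has no direct internet access, Spark can also be downloaded manually and placed where Kylin expects it. A sketch, assuming the Spark 3.1.3 binary built for Hadoop 3.2 from the Apache archive (verify the exact version expected by your Kylin release):

wget https://archive.apache.org/dist/spark/spark-3.1.3/spark-3.1.3-bin-hadoop3.2.tgz
tar -zxvf spark-3.1.3-bin-hadoop3.2.tgz -C /opt/bigdata/
ln -s /opt/bigdata/spark-3.1.3-bin-hadoop3.2 $KYLIN_HOME/spark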


JDBC driver package

Put the MySQL JDBC driver in the $KYLIN_HOME/ext directory (if the directory does not exist, create it manually):

cp mysql-connector-java.jar /opt/bigdata/apache-kylin-4.0.2-bin/ext/
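
The metadata database and user referenced later in kylin.metadata.url must also exist in MySQL. A minimal sketch, assuming the database name kylin, the user kylin and the password used in the kylin.properties below (on MySQL 8 create the user with a separate CREATE USER statement instead of GRANT ... IDENTIFIED BY):

mysql -u root -p -e "
CREATE DATABASE IF NOT EXISTS kylin DEFAULT CHARACTER SET utf8;
GRANT ALL PRIVILEGES ON kylin.* TO 'kylin'@'%' IDENTIFIED BY '1qaz@WSX';
FLUSH PRIVILEGES;
"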

Copy the Kylin installation directory from node1 to node2 and node3:

scp -r /opt/bigdata/apache-kylin-4.0.2-bin/ node2:/opt/bigdata/

scp -r /opt/bigdata/apache-kylin-4.0.2-bin/ node3:/opt/bigdata/
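
The environment variables from /etc/profile also need to be present on node2 and node3. A sketch, assuming the three nodes have identical layouts and passwordless SSH (alternatively, append just the JAVA_HOME/KYLIN_HOME lines to each node's /etc/profile by hand):

for host in node2 node3; do
  scp /etc/profile ${host}:/etc/profile
  ssh ${host} 'source /etc/profile && echo $KYLIN_HOME'
done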

kylin.properties configuration

vim /opt/bigdata/apache-kylin-4.0.2-bin/conf/kylin.properties

The Kylin configuration for the three nodes is as follows: node1 serves queries, while node2 and node3 execute build jobs.


node1

kylin.metadata.url=kylin_test@jdbc,driverClassName=com.mysql.jdbc.Driver,url=jdbc:mysql://node1:3306/kylin,username=kylin,password=1qaz@WSX
kylin.env.zookeeper-connect-string=node2:2181

kylin.server.cluster-servers-with-mode=node1:7070:query,node2:7070:job,node3:7070:job
kylin.job.scheduler.default=100

kylin.server.mode=query

kylin.cube.cubeplanner.enabled=true
kylin.server.query-metrics2-enabled=true
kylin.metrics.reporter-query-enabled=true
kylin.metrics.reporter-job-enabled=true
kylin.metrics.monitor-enabled=true

kylin.web.dashboard-enabled=true

kylin.metadata.url configures the MySQL metadata store connection.

kylin.env.zookeeper-connect-string configures the ZooKeeper connection string.

kylin.server.cluster-servers-with-mode lists all Kylin nodes and their roles, which facilitates service self-discovery.

kylin.job.scheduler.default=100 selects the Curator-based job scheduler (explained further in the node2/node3 section below).

kylin.server.mode configures the role of the current Kylin node (query for a query node, job for a build node, all for both).

kylin.cube.cubeplanner.enabled / kylin.server.query-metrics2-enabled / kylin.metrics.* enable Cube Planner.

Cube Planner makes Apache Kylin more resource-efficient. It intelligently builds only part of the Cube, minimizing the cost of building the Cube while maximizing the benefit of serving end-user queries; it then learns patterns from the running queries and dynamically recommends cuboids accordingly.

kylin.web.dashboard-enabled enables the Kylin Dashboard, which is only available on query/all nodes.

The Kylin Dashboard displays useful Cube usage statistics, which is very important to users.


node2/node3

kylin.metadata.url=kylin_test@jdbc,driverClassName=com.mysql.jdbc.Driver,url=jdbc:mysql://node1:3306/kylin,username=kylin,password=1qaz@WSX
kylin.env.zookeeper-connect-string=node2:2181

kylin.server.cluster-servers-with-mode=node1:7070:query,node2:7070:job,node3:7070:job
kylin.server.mode=job

kylin.cube.cubeplanner.enabled=true
kylin.server.query-metrics2-enabled=true
kylin.metrics.reporter-query-enabled=true
kylin.metrics.reporter-job-enabled=true
kylin.metrics.monitor-enabled=true

kylin.web.dashboard-enabled=false

kylin.job.scheduler.default=100
kylin.server.self-discovery-enabled=true
kylin.job.lock=org.apache.kylin.job.lock.zookeeper.ZookeeperJobLock

The configuration of node2 and node3 is exactly the same.

kylin.job.scheduler.default=100
kylin.server.self-discovery-enabled=true

The two settings above enable CuratorScheduler, a Curator-based multi-job-engine scheduler that works in master-slave mode.

kylin.job.lock=org.apache.kylin.job.lock.zookeeper.ZookeeperJobLock

This setting makes the job engine highly available by coordinating the job lock through ZooKeeper.


Fix insufficient permissions for the root user on HDFS

If you start Kylin directly at this point, an error will be reported, because the hdfs user, not root, is the HDFS superuser, so root cannot operate on the Kylin working directory. Change its ownership as follows:

[root@node1 ~]# sudo -u hdfs hadoop fs -chown -R root:root /kylin
[root@node1 ~]# hadoop fs -ls /
Found 3 items
drwxr-xr-x   - root root                0 2022-10-23 16:58 /kylin
drwxrwxrwt   - hdfs supergroup          0 2022-10-23 15:47 /tmp
drwxr-xr-x   - hdfs supergroup          0 2022-10-23 15:47 /user
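
If the /kylin working directory does not exist on HDFS yet, the chown above will fail; create it first as the hdfs user (assuming the default kylin.env.hdfs-working-dir of /kylin):

sudo -u hdfs hadoop fs -mkdir -p /kylin
sudo -u hdfs hadoop fs -chown -R root:root /kylin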

Now run the runtime environment check script on all three nodes:

[root@node1 ~]# check-env.sh 
Retrieving hadoop conf dir...
...................................................[PASS]
KYLIN_HOME is set to /opt/bigdata/apache-kylin-4.0.2-bin
Checking hive
...................................................[PASS]
Checking hadoop shell
...................................................[PASS]
Checking hdfs working dir
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
...................................................[PASS]
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.

Checking environment finished successfully. To check again, run 'bin/check-env.sh' manually.

This indicates that Kylin can be started normally.


Start up

Execute the following startup command on all three nodes:

kylin.sh start

For example, on node2:

[root@node2 ~]# kylin.sh start
Retrieving hadoop conf dir...
...................................................[PASS]
KYLIN_HOME is set to /opt/bigdata/apache-kylin-4.0.2-bin
Checking hive
...................................................[PASS]
Checking hadoop shell
...................................................[PASS]
Checking hdfs working dir
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
...................................................[PASS]
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.

Checking environment finished successfully. To check again, run 'bin/check-env.sh' manually.
Retrieving hadoop conf dir...
Retrieving Spark dependency...
Start replace hadoop jars under /opt/bigdata/apache-kylin-4.0.2-bin/spark/jars.
Find platform specific jars:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop/client/hadoop-annotations-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop/client/hadoop-auth-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop/client/hadoop-common-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop/hadoop-annotations-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop/hadoop-common-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop/hadoop-auth-3.0.0-cdh6.3.2.jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-hdfs/hadoop-hdfs-client.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-hdfs/hadoop-hdfs-httpfs.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-hdfs/hadoop-hdfs-native-client.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-hdfs/hadoop-hdfs-native-client-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-hdfs/hadoop-hdfs-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-hdfs/hadoop-hdfs-httpfs-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-hdfs/hadoop-hdfs-client-3.0.0-cdh6.3.2.jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-app-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-core-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-shuffle-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-common-3.0.0-cdh6.3.2.jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-yarn/hadoop-yarn-common-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-yarn/hadoop-yarn-api-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-yarn/hadoop-yarn-server-web-proxy-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-yarn/hadoop-yarn-client-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-yarn/hadoop-yarn-server-common-3.0.0-cdh6.3.2.jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/htrace-core4-4.2.0-incubating.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/htrace-core4-4.1.0-incubating.jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/woodstox-core-5.0.3.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/woodstox-core-5.1.0.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/commons-configuration2-2.1.1.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/woodstox-core-asl-4.4.1.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/re2j-1.1.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/commons-configuration2-2.1.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/stax2-api-3.1.4.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/re2j-1.0.jar  , will replace with these jars under /opt/bigdata/apache-kylin-4.0.2-bin/spark/jars.
Done hadoop jars replacement under /opt/bigdata/apache-kylin-4.0.2-bin/spark/jars.
Start to check whether we need to migrate acl tables
Not HBase metadata. Skip check.

A new Kylin instance is started by root. To stop it, run 'kylin.sh stop'
Check the log at /opt/bigdata/apache-kylin-4.0.2-bin/logs/kylin.log
Web UI is at http://node2:7070/kylin
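
Before moving to the Web UI, you can confirm that each instance actually came up (standard Linux checks; port 7070 matches kylin.server.cluster-servers-with-mode above):

ps -ef | grep -i kylin | grep -v grep    # the Kylin JVM process should be running
ss -tlnp | grep 7070                     # the web/REST port should be listening
tail -n 100 $KYLIN_HOME/logs/kylin.log   # watch for startup errors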

WEB UI

Access the WEB UI on node1:

http://node1:7070/kylin/login

The default username and password are ADMIN/KYLIN.
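
Besides the browser, the login can be verified from the command line against the Kylin REST API (a sketch; the authentication endpoint follows the documented Kylin REST API, and QURNSU46S1lMSU4= is simply base64 of ADMIN:KYLIN):

curl -X POST -H "Authorization: Basic QURNSU46S1lMSU4=" \
     -H "Content-Type: application/json" \
     http://node1:7070/kylin/api/user/authentication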


After logging in, you will see the Kylin home page.


Under System => Instances you can see the status information of the three nodes.


Origin blog.csdn.net/Shockang/article/details/127483023