Foreword
This article belongs to the column "Big Data Installation and Deployment". The column is the author's original work; please credit the source when citing. If you find errors or omissions, please point them out in the comment area. Thank you!
Software requirements
Hadoop: cdh5.x, cdh6.x, hdp2.x, EMR5.x, EMR6.x, HDI4.x
Hive: 0.13 - 1.2.1+
Spark: 2.4.7/3.1.1
Mysql: 5.1.17 and above
JDK: 1.8+
OS: Linux only, CentOS 6.5+ or Ubuntu 16.0.4+
Prepare
Do the following on node1:
It is recommended to use the following installation package:
Apache Kylin 4.0.2 installation package
Unzip to the specified folder:
tar -zxvf apache-kylin-4.0.2-bin.tar.gz -C /opt/bigdata/
Configure the environment variable $KYLIN_HOME to point to the Kylin folder.
vim /etc/profile
JAVA_HOME=/usr/java/jdk1.8.0_341-amd64
KYLIN_HOME=/opt/bigdata/apache-kylin-4.0.2-bin
PATH=$PATH:$JAVA_HOME/bin:$KYLIN_HOME/bin
export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL JAVA_HOME KYLIN_HOME
source /etc/profile
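The environment setup above can be sketched as a self-contained snippet; the paths are the ones used in this article, so adjust them to your installation:

```shell
# Paths taken from this article; adjust to your installation.
export JAVA_HOME=/usr/java/jdk1.8.0_341-amd64
export KYLIN_HOME=/opt/bigdata/apache-kylin-4.0.2-bin
export PATH="$PATH:$JAVA_HOME/bin:$KYLIN_HOME/bin"
# Sanity check: both variables must be non-empty before starting Kylin.
[ -n "$JAVA_HOME" ] && [ -n "$KYLIN_HOME" ] && echo "env OK"
```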
Download Spark using the provided script:
download-spark.sh
The downloaded Spark version is 3.1.3, and it is placed in the $KYLIN_HOME/spark directory.
JDBC driver package
We need to put the Mysql JDBC driver package in the $KYLIN_HOME/ext directory (this directory does not exist by default and needs to be created manually):
cp mysql-connector-java.jar /opt/bigdata/apache-kylin-4.0.2-bin/ext/
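A minimal sketch of the create-then-copy step; it runs against a temporary directory so it can execute anywhere, while in the real deployment KYLIN_HOME is /opt/bigdata/apache-kylin-4.0.2-bin:

```shell
# Demo against a temp dir so the sketch is runnable anywhere;
# in the real deployment KYLIN_HOME is /opt/bigdata/apache-kylin-4.0.2-bin.
demo_home=$(mktemp -d)/apache-kylin-4.0.2-bin
mkdir -p "$demo_home/ext"     # mkdir -p is also safe if the dir already exists
[ -d "$demo_home/ext" ] && echo "ext directory ready"
# then: cp mysql-connector-java.jar "$demo_home/ext/"
```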
Copy the kylin installation directory from node1 to node2 and node3:
scp -r /opt/bigdata/apache-kylin-4.0.2-bin/ node2:/opt/bigdata/
scp -r /opt/bigdata/apache-kylin-4.0.2-bin/ node3:/opt/bigdata/
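The two scp commands can also be written as a loop; this sketch only prints each command so the targets can be reviewed before running (it assumes passwordless SSH to node2 and node3):

```shell
# Print (rather than run) one scp command per worker node.
for host in node2 node3; do
  cmd="scp -r /opt/bigdata/apache-kylin-4.0.2-bin/ ${host}:/opt/bigdata/"
  echo "$cmd"
done
```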
kylin.properties configuration
vim /opt/bigdata/apache-kylin-4.0.2-bin/conf/kylin.properties
The kylin configuration for the three nodes is shown below; node1 handles queries, while node2 and node3 execute build jobs.
node1
kylin.metadata.url=kylin_test@jdbc,driverClassName=com.mysql.jdbc.Driver,url=jdbc:mysql://node1:3306/kylin,username=kylin,password=1qaz@WSX
kylin.env.zookeeper-connect-string=node2:2181
kylin.server.cluster-servers-with-mode=node1:7070:query,node2:7070:job,node3:7070:job
kylin.job.scheduler.default=100
kylin.server.mode=query
kylin.cube.cubeplanner.enabled=true
kylin.server.query-metrics2-enabled=true
kylin.metrics.reporter-query-enabled=true
kylin.metrics.reporter-job-enabled=true
kylin.metrics.monitor-enabled=true
kylin.web.dashboard-enabled=true
kylin.metadata.url configures the Mysql metadata database connection.
kylin.env.zookeeper-connect-string configures the ZooKeeper connection string.
kylin.server.cluster-servers-with-mode lists all kylin nodes and their roles so that services can discover each other.
kylin.job.scheduler.default=100 selects the Curator-based job scheduler (see the node2/node3 notes below).
kylin.server.mode sets the node type: query (query node), job (build/task node), or all (both).
kylin.cube.cubeplanner.enabled, kylin.server.query-metrics2-enabled and the kylin.metrics.* options together enable Cube Planner.
Cube Planner
Cube Planner makes Apache Kylin more resource-efficient: it intelligently builds only part of the cube, minimizing build cost while maximizing the benefit to end-user queries, and then learns from the running query workload and dynamically recommends cuboids accordingly.
kylin.web.dashboard-enabled enables the Kylin Dashboard, which is only available on query/all nodes.
Kylin Dashboard
It displays Cube usage data, which is very useful to users.
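After editing the file, a node's role can be double-checked by pulling a property value out of kylin.properties with grep and cut; this sketch uses a small inline fragment, so point the path at your real conf/kylin.properties instead:

```shell
# Write a small demo fragment; in practice read $KYLIN_HOME/conf/kylin.properties.
demo_conf=$(mktemp)
cat > "$demo_conf" <<'EOF'
kylin.server.mode=query
kylin.web.dashboard-enabled=true
EOF
mode=$(grep '^kylin.server.mode=' "$demo_conf" | cut -d= -f2)
echo "server mode: $mode"
```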
node2/node3
kylin.metadata.url=kylin_test@jdbc,driverClassName=com.mysql.jdbc.Driver,url=jdbc:mysql://node1:3306/kylin,username=kylin,password=1qaz@WSX
kylin.env.zookeeper-connect-string=node2:2181
kylin.server.cluster-servers-with-mode=node1:7070:query,node2:7070:job,node3:7070:job
kylin.server.mode=job
kylin.cube.cubeplanner.enabled=true
kylin.server.query-metrics2-enabled=true
kylin.metrics.reporter-query-enabled=true
kylin.metrics.reporter-job-enabled=true
kylin.metrics.monitor-enabled=true
kylin.web.dashboard-enabled=false
kylin.job.scheduler.default=100
kylin.server.self-discovery-enabled=true
kylin.job.lock=org.apache.kylin.job.lock.zookeeper.ZookeeperJobLock
The configuration of node2 and node3 is exactly the same.
kylin.job.scheduler.default=100
kylin.server.self-discovery-enabled=true
These two settings enable the Curator-based master-slave multi-job-engine scheduler, CuratorScheduler.
kylin.job.lock=org.apache.kylin.job.lock.zookeeper.ZookeeperJobLock
This configuration can make the task engine highly available.
Fixing insufficient root permissions on HDFS
If you start kylin directly at this point, it reports an error: the Kylin working directory on HDFS is controlled by the hdfs superuser, and root cannot write to it. Change the ownership first:
[root@node1 ~]# sudo -u hdfs hadoop fs -chown -R root:root /kylin
[root@node1 ~]# hadoop fs -ls /
Found 3 items
drwxr-xr-x - root root 0 2022-10-23 16:58 /kylin
drwxrwxrwt - hdfs supergroup 0 2022-10-23 15:47 /tmp
drwxr-xr-x - hdfs supergroup 0 2022-10-23 15:47 /user
Now run the runtime environment check script on all three nodes:
[root@node1 ~]# check-env.sh
Retrieving hadoop conf dir...
...................................................[PASS]
KYLIN_HOME is set to /opt/bigdata/apache-kylin-4.0.2-bin
Checking hive
...................................................[PASS]
Checking hadoop shell
...................................................[PASS]
Checking hdfs working dir
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
...................................................[PASS]
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
Checking environment finished successfully. To check again, run 'bin/check-env.sh' manually.
This means that kylin can start normally.
Start up
Execute the following startup commands on all three nodes:
kylin.sh start
For example, on node2:
[root@node2 ~]# kylin.sh start
Retrieving hadoop conf dir...
...................................................[PASS]
KYLIN_HOME is set to /opt/bigdata/apache-kylin-4.0.2-bin
Checking hive
...................................................[PASS]
Checking hadoop shell
...................................................[PASS]
Checking hdfs working dir
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
...................................................[PASS]
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
Checking environment finished successfully. To check again, run 'bin/check-env.sh' manually.
Retrieving hadoop conf dir...
Retrieving Spark dependency...
Start replace hadoop jars under /opt/bigdata/apache-kylin-4.0.2-bin/spark/jars.
Find platform specific jars:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop/client/hadoop-annotations-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop/client/hadoop-auth-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop/client/hadoop-common-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop/hadoop-annotations-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop/hadoop-common-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop/hadoop-auth-3.0.0-cdh6.3.2.jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-hdfs/hadoop-hdfs-client.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-hdfs/hadoop-hdfs-httpfs.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-hdfs/hadoop-hdfs-native-client.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-hdfs/hadoop-hdfs-native-client-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-hdfs/hadoop-hdfs-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-hdfs/hadoop-hdfs-httpfs-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-hdfs/hadoop-hdfs-client-3.0.0-cdh6.3.2.jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-app-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-core-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-shuffle-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-common-3.0.0-cdh6.3.2.jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-yarn/hadoop-yarn-common-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-yarn/hadoop-yarn-api-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-yarn/hadoop-yarn-server-web-proxy-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-yarn/hadoop-yarn-client-3.0.0-cdh6.3.2.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../hadoop-yarn/hadoop-yarn-server-common-3.0.0-cdh6.3.2.jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/htrace-core4-4.2.0-incubating.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/htrace-core4-4.1.0-incubating.jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/woodstox-core-5.0.3.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/woodstox-core-5.1.0.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/commons-configuration2-2.1.1.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/woodstox-core-asl-4.4.1.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/re2j-1.1.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/commons-configuration2-2.1.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/stax2-api-3.1.4.jar
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/../../jars/re2j-1.0.jar , will replace with these jars under /opt/bigdata/apache-kylin-4.0.2-bin/spark/jars.
Done hadoop jars replacement under /opt/bigdata/apache-kylin-4.0.2-bin/spark/jars.
Start to check whether we need to migrate acl tables
Not HBase metadata. Skip check.
A new Kylin instance is started by root. To stop it, run 'kylin.sh stop'
Check the log at /opt/bigdata/apache-kylin-4.0.2-bin/logs/kylin.log
Web UI is at http://node2:7070/kylin
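The last log line tells you which address to open; extracting it can be sketched with plain shell parameter expansion (the log line format is taken from the session above):

```shell
# Strip the fixed prefix from the final kylin.sh start log line.
log_line="Web UI is at http://node2:7070/kylin"
url="${log_line#Web UI is at }"
echo "$url"
```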
WEB UI
Access the Web UI on node1:
http://node1:7070/kylin/login
The default username and password are ADMIN/KYLIN.
After logging in, you can see:
In System => Instances we can see the status information of all three nodes.