Apache Kyuubi (Incubating) is a distributed, multi-tenant gateway for serving serverless SQL on the Lakehouse. This article is an introduction to Kyuubi: it covers basic installation and usage, and uses the Spark engine as an example to show how to submit a first Spark SQL job.
You can also read the article "Comprehensive Comparative Analysis of Kyuubi and Spark ThriftServer" to understand the similarities and differences between Kyuubi and Spark ThriftServer.
Installation package download
Go to the following page to download the Kyuubi installation package: https://kyuubi.apache.org/releases.html . The 1.5.0-incubating release is used as an example below.
mkdir /data && cd /data
wget https://dlcdn.apache.org/incubator/kyuubi/kyuubi-1.5.0-incubating/apache-kyuubi-1.5.0-incubating-bin.tgz
tar zxvf apache-kyuubi-1.5.0-incubating-bin.tgz
ln -s apache-kyuubi-1.5.0-incubating-bin kyuubi
Since we are using the Spark engine here, we also need to download a Spark distribution.
cd /data
wget https://archive.apache.org/dist/spark/spark-3.2.1/spark-3.2.1-bin-hadoop3.2.tgz
tar zxvf spark-3.2.1-bin-hadoop3.2.tgz
ln -s spark-3.2.1-bin-hadoop3.2 spark
Modify the Spark configuration file
cd /data/spark/conf
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
Set HADOOP_CONF_DIR
export HADOOP_CONF_DIR=/etc/hadoop/conf
Test whether Spark jobs can be submitted to YARN
cd /data/spark
bin/spark-submit --master yarn --class org.apache.spark.examples.SparkPi ./examples/jars/spark-examples_2.12-3.2.1.jar
If the job succeeds, the driver output contains a line similar to "Pi is roughly 3.14...", which confirms that Spark can submit jobs to YARN.
Modify the Kyuubi configuration files
cd /data/kyuubi
cd conf
cp kyuubi-defaults.conf.template kyuubi-defaults.conf
cp kyuubi-env.sh.template kyuubi-env.sh
cp log4j2.properties.template log4j2.properties
For configuration parameters of the above files, please refer to: https://kyuubi.apache.org/docs/latest/deployment/settings.html
The following takes HDP 3.1.4 as an example
The contents of kyuubi-env.sh are as follows
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.322.b06-1.el7_9.x86_64
export SPARK_HOME=/data/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export KYUUBI_JAVA_OPTS="-Xmx6g -XX:+UnlockDiagnosticVMOptions -XX:ParGCCardsPerStrideChunk=4096 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:+UseCondCardMark -XX:MaxDirectMemorySize=1024m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./logs -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -Xloggc:./logs/kyuubi-server-gc-%t.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=5M -XX:NewRatio=3 -XX:MetaspaceSize=512m"
export KYUUBI_BEELINE_OPTS="-Xmx2g -XX:+UnlockDiagnosticVMOptions -XX:ParGCCardsPerStrideChunk=4096 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:+UseCondCardMark"
The contents of kyuubi-defaults.conf are as follows
kyuubi.ha.zookeeper.quorum hadoop1:2181,hadoop2:2181,hadoop3:2181
spark.master yarn
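Beyond these two required settings, a couple of commonly tuned options are worth knowing about. A minimal sketch (the values shown are the documented defaults, not mandatory settings for this walkthrough):

```
# Engine sharing level: e.g. CONNECTION (one engine per connection)
# or USER (default, one engine shared by each user's sessions)
kyuubi.engine.share.level USER
# Release an idle Spark engine after this period to free YARN resources
kyuubi.session.engine.idle.timeout PT30M
```

The share level is the main lever for trading isolation against startup latency: CONNECTION gives every connection its own Spark application, while USER reuses one engine across a user's sessions.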
The log4j2.properties file does not need to be modified.
Start Kyuubi
bin/kyuubi start
If startup succeeds, the log under /data/kyuubi/logs shows that the Kyuubi server has started; by default it listens on port 10009.
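To confirm the server is actually up, you can check the process, the startup log, and the listening port. A quick sketch, assuming the default installation path from above and the default frontend port 10009 (the exact log file name pattern may differ on your machine):

```shell
# The Kyuubi server runs as a Java process named KyuubiServer
jps | grep KyuubiServer
# Startup messages go to the logs directory under the Kyuubi home
tail -n 20 /data/kyuubi/logs/kyuubi-*.out
# By default the Thrift frontend listens on port 10009
ss -tlnp | grep 10009
```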
Use Beeline to connect to Kyuubi
bin/beeline -u "jdbc:hive2://hadoop1:2181,hadoop2:2181,hadoop3:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi" -n hdfs
Beeline connects through ZooKeeper service discovery and presents an interactive SQL prompt.
On the YARN UI you can see the Spark application that Kyuubi launched as the SQL engine for this connection.
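With the connection working, you can submit a first Spark SQL job non-interactively from Beeline as well. A minimal sketch, assuming the same JDBC URL as above; the table name `demo` and its contents are purely illustrative:

```shell
# Run a few Spark SQL statements through Kyuubi; Beeline accepts
# multiple -e options and executes them in order.
bin/beeline -u "jdbc:hive2://hadoop1:2181,hadoop2:2181,hadoop3:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi" -n hdfs \
  -e "CREATE TABLE IF NOT EXISTS demo (id INT, name STRING)" \
  -e "INSERT INTO demo VALUES (1, 'kyuubi')" \
  -e "SELECT * FROM demo"
```

The first statement in a session triggers the launch of the Spark engine on YARN, so it takes noticeably longer than subsequent ones; later queries reuse the running engine.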