Apache Kyuubi (Incubating) is a distributed, multi-tenant gateway for serving serverless SQL on the Lakehouse. This article is an introduction to Kyuubi: it covers basic installation and usage, and uses the Spark engine as an example to show how to submit a first Spark SQL job.
You can also read the article "Comprehensive Comparative Analysis of Kyuubi and Spark ThriftServer" to understand the similarities and differences between Kyuubi and Spark ThriftServer.
Installation package download
Go to the following page to download the Kyuubi installation package: https://kyuubi.apache.org/releases.html . The 1.5.0-incubating release is used as an example below.
mkdir /data && cd /data
wget https://dlcdn.apache.org/incubator/kyuubi/kyuubi-1.5.0-incubating/apache-kyuubi-1.5.0-incubating-bin.tgz
tar zxvf apache-kyuubi-1.5.0-incubating-bin.tgz
ln -s apache-kyuubi-1.5.0-incubating-bin kyuubi
Since we are using the Spark engine here, we also need to download a Spark distribution.
cd /data
wget https://archive.apache.org/dist/spark/spark-3.2.1/spark-3.2.1-bin-hadoop3.2.tgz
tar zxvf spark-3.2.1-bin-hadoop3.2.tgz
ln -s spark-3.2.1-bin-hadoop3.2 spark
Modify the Spark configuration file
cd /data/spark/conf
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
Set HADOOP_CONF_DIR
export HADOOP_CONF_DIR=/etc/hadoop/conf
Test whether Spark jobs can be submitted to YARN
cd /data/spark
bin/spark-submit --master yarn --class org.apache.spark.examples.SparkPi ./examples/jars/spark-examples_2.12-3.2.1.jar
If the job succeeds, the driver output contains a line similar to "Pi is roughly 3.14...", which confirms that Spark can submit jobs to YARN.
Modify the Kyuubi configuration files
cd /data/kyuubi
cd conf
cp kyuubi-defaults.conf.template kyuubi-defaults.conf
cp kyuubi-env.sh.template kyuubi-env.sh
cp log4j2.properties.template log4j2.properties
For configuration parameters of the above files, please refer to: https://kyuubi.apache.org/docs/latest/deployment/settings.html
The following takes HDP 3.1.4 as an example
The contents of kyuubi-env.sh are as follows
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.322.b06-1.el7_9.x86_64
export SPARK_HOME=/data/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export KYUUBI_JAVA_OPTS="-Xmx6g -XX:+UnlockDiagnosticVMOptions -XX:ParGCCardsPerStrideChunk=4096 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:+UseCondCardMark -XX:MaxDirectMemorySize=1024m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./logs -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -Xloggc:./logs/kyuubi-server-gc-%t.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=5M -XX:NewRatio=3 -XX:MetaspaceSize=512m"
export KYUUBI_BEELINE_OPTS="-Xmx2g -XX:+UnlockDiagnosticVMOptions -XX:ParGCCardsPerStrideChunk=4096 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:+UseCondCardMark"
The contents of kyuubi-defaults.conf are as follows
kyuubi.ha.zookeeper.quorum hadoop1:2181,hadoop2:2181,hadoop3:2181
spark.master yarn
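Beyond these two required settings, a couple of commonly tuned options are worth knowing about. A minimal sketch (the values shown are the documented defaults, not mandatory settings for this walkthrough):

```
# Engine sharing level: e.g. CONNECTION (one engine per connection)
# or USER (default, one engine shared by each user's sessions)
kyuubi.engine.share.level USER
# Release an idle Spark engine after this period to free YARN resources
kyuubi.session.engine.idle.timeout PT30M
```

The share level is the main lever for trading isolation against startup latency: CONNECTION gives every connection its own Spark application, while USER reuses one engine across a user's sessions.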
The log4j2.properties file does not need to be modified.
Start Kyuubi
bin/kyuubi start
If startup succeeds, the log under /data/kyuubi/logs shows that the Kyuubi server has started; by default it listens on port 10009.
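To confirm the server is actually up, you can check the process, the startup log, and the listening port. A quick sketch, assuming the default installation path from above and the default frontend port 10009 (the exact log file name pattern may differ on your machine):

```shell
# The Kyuubi server runs as a Java process named KyuubiServer
jps | grep KyuubiServer
# Startup messages go to the logs directory under the Kyuubi home
tail -n 20 /data/kyuubi/logs/kyuubi-*.out
# By default the Thrift frontend listens on port 10009
ss -tlnp | grep 10009
```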
Use Beeline to connect to Kyuubi
bin/beeline -u "jdbc:hive2://hadoop1:2181,hadoop2:2181,hadoop3:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi" -n hdfs
Beeline connects through ZooKeeper service discovery and presents an interactive SQL prompt.
On the YARN UI you can see the Spark application that Kyuubi launched as the SQL engine for this connection.
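With the connection working, you can submit a first Spark SQL job non-interactively from Beeline as well. A minimal sketch, assuming the same JDBC URL as above; the table name `demo` and its contents are purely illustrative:

```shell
# Run a few Spark SQL statements through Kyuubi; Beeline accepts
# multiple -e options and executes them in order.
bin/beeline -u "jdbc:hive2://hadoop1:2181,hadoop2:2181,hadoop3:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi" -n hdfs \
  -e "CREATE TABLE IF NOT EXISTS demo (id INT, name STRING)" \
  -e "INSERT INTO demo VALUES (1, 'kyuubi')" \
  -e "SELECT * FROM demo"
```

The first statement in a session triggers the launch of the Spark engine on YARN, so it takes noticeably longer than subsequent ones; later queries reuse the running engine.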