Install
Prerequisite: Hadoop (HDFS/YARN), Spark, and related components must already be installed, with their environment variables configured.
1. Download the Livy installation package
Download the package from the official Livy website:
cd /opt
wget https://dlcdn.apache.org/incubator/livy/0.7.1-incubating/apache-livy-0.7.1-incubating-bin.zip
2. Unzip the installation package
unzip apache-livy-0.7.1-incubating-bin.zip
3. Configuration
- Modify livy-env.sh
JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_211.jdk/Contents/Home
HADOOP_CONF_DIR=/Users/xxx/Documents/software/hadoop-3.3.1/etc/hadoop
SPARK_HOME=/Users/xxx/Documents/software/spark-3.2.1
SPARK_CONF_DIR=/Users/xxx/Documents/software/spark-3.2.1/conf
- Configure livy.conf
# Spark master used by Livy sessions
livy.spark.master = yarn
# Spark deploy mode used by Livy sessions
livy.spark.deploy-mode = cluster
# Use HiveContext by default
livy.repl.enable-hive-context = true
# Enable user impersonation
livy.impersonation.enabled = true
# Idle timeout after which a session expires
livy.server.session.timeout = 1h
# Thrift server settings
livy.server.thrift.enabled = true
livy.server.thrift.port = 10002
# Recovery settings
livy.server.recovery.mode = recovery
livy.server.recovery.state-store = filesystem
livy.server.recovery.state-store.url = hdfs://10.253.128.30:9000/livy/
- Configure log4j
cp log4j.properties.template log4j.properties
- Copy the jersey-core-1.9.jar package to the jars directory
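Livy configuration keys mix dots and hyphens (for example `livy.spark.deploy-mode`), so a mistyped key is easy to write and may go unnoticed. The following is a minimal sketch of a sanity check, not part of Livy itself: `parse_conf`, `unknown_keys`, and the `KNOWN_KEYS` list are hypothetical helpers, the list covers only the keys used in this guide, and the parser assumes simple `key = value` lines.

```python
# Hypothetical helper, not shipped with Livy: flag livy.conf keys that are
# not in a hand-maintained list of expected keys.
KNOWN_KEYS = {
    "livy.spark.master",
    "livy.spark.deploy-mode",
    "livy.repl.enable-hive-context",
    "livy.impersonation.enabled",
    "livy.server.session.timeout",
    "livy.server.thrift.enabled",
    "livy.server.thrift.port",
    "livy.server.recovery.mode",
    "livy.server.recovery.state-store",
    "livy.server.recovery.state-store.url",
}


def parse_conf(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a 'key = value' line; ignored in this sketch
        settings[key.strip()] = value.strip()
    return settings


def unknown_keys(settings, known=KNOWN_KEYS):
    """Return the keys that do not appear in the known-key list, sorted."""
    return sorted(k for k in settings if k not in known)


sample = """
# Spark master
livy.spark.master = yarn
livy.spark.deploy.mode = cluster
"""
print(unknown_keys(parse_conf(sample)))  # ['livy.spark.deploy.mode']
```

Extend `KNOWN_KEYS` with any other keys you rely on before using a check like this on a real configuration file.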
4. Start Livy
# Change to the Livy directory
cd /opt/livy-0.7.1
bin/livy-server start
Visit the Livy UI
curl http://ip:8998/ui
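Besides opening the UI, the server can be checked through its REST API: `GET /sessions` returns a JSON object with a `total` count and a `sessions` list. The sketch below uses only the Python standard library; the function names are illustrative, and the host is whatever address the server runs on (the `ip` placeholder above).

```python
import json
from urllib.request import urlopen


def sessions_url(host, port=8998):
    """Build the URL of the session listing endpoint."""
    return f"http://{host}:{port}/sessions"


def summarize_sessions(body):
    """Return (total, [(id, state), ...]) from a GET /sessions response body."""
    data = json.loads(body)
    sessions = [(s["id"], s["state"]) for s in data.get("sessions", [])]
    return data.get("total", 0), sessions


def list_sessions(host, port=8998, timeout=5):
    """Query a running Livy server; raises URLError if it is unreachable."""
    with urlopen(sessions_url(host, port), timeout=timeout) as resp:
        return summarize_sessions(resp.read().decode("utf-8"))
```

Calling `list_sessions("ip")` with the real server address should return a count of zero and an empty list on a freshly started server; a connection error suggests the server is not up or the port is blocked.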
Livy configuration items
Configuration item | Default value | Description |
---|---|---|
livy.server.spark-home | | Spark installation directory |
livy.spark.master | | |
livy.spark.deploy-mode | | |
livy.spark.scala-version | | |
livy.spark.version | | |
livy.session.staging-dir | | |
livy.file.upload.max.size | | |
livy.file.local-dir-whitelist | | |
livy.repl.enable-hive-context | | |
livy.environment | | |
livy.server.host | | |
livy.server.port | 8998 | |
livy.ui.basePath | | |
livy.ui.enabled | | |
livy.server.request-header.size | 131072 | |
livy.server.response-header.size | 131072 | |
livy.server.csrf-protection.enabled | false | |
livy.impersonation.enabled | false | |
livy.superusers | null | |
livy.server.access-control.enabled | false | |
livy.server.access-control.allowed-users | * | |
livy.server.access-control.modify-users | null | |
livy.server.access-control.view-users | null | |
livy.keystore | | |
livy.keystore.password | | |
livy.key-password | | |
Using Livy
livy-session
With livy-session, Spark shell code can be executed through the REST API to handle interactive requests.
- Create a session
curl -XPOST 'http://10.253.128.30:8998/sessions' -H 'Content-Type:application/json' --data '{"kind": "spark"}'
- View sessions
http://10.253.128.30:8998/ui
- Use a session
curl -XPOST 'http://10.253.128.30:8998/sessions/2/statements' -H 'Content-Type:application/json' -d '{"code": "sc.textFile(\"\")"}'
Note: A statement is executed only once the session state is idle. While the statement runs, the session state is busy; when execution completes, the state returns to idle.
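The idle/busy cycle can be handled from a client by polling. The sketch below waits for a session to report `idle` via `GET /sessions/{id}/state`, posts a statement, and then polls the statement until it reaches a final state. It is a minimal illustration using the Python standard library: the helper names, polling interval, and retry limit are assumptions, and it does not send the extra header required when CSRF protection is enabled.

```python
import json
import time
from urllib.request import Request, urlopen


def http_json(method, url, payload=None, timeout=10):
    """Send a JSON request and return the decoded JSON response."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = Request(url, data=data, method=method,
                  headers={"Content-Type": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_idle(base_url, session_id, call=http_json, interval=1.0, max_polls=60):
    """Poll the session state until it is 'idle'; fail on terminal states."""
    for _ in range(max_polls):
        state = call("GET", f"{base_url}/sessions/{session_id}/state")["state"]
        if state == "idle":
            return
        if state in ("error", "dead", "killed", "shutting_down"):
            raise RuntimeError(f"session {session_id} is {state}")
        time.sleep(interval)
    raise TimeoutError(f"session {session_id} did not become idle")


def run_statement(base_url, session_id, code, call=http_json, interval=1.0, max_polls=60):
    """Submit code to an idle session and return the statement's output."""
    wait_for_idle(base_url, session_id, call, interval, max_polls)
    stmt = call("POST", f"{base_url}/sessions/{session_id}/statements", {"code": code})
    url = f"{base_url}/sessions/{session_id}/statements/{stmt['id']}"
    for _ in range(max_polls):
        if stmt["state"] in ("available", "error", "cancelled"):
            return stmt.get("output")
        time.sleep(interval)
        stmt = call("GET", url)
    raise TimeoutError(f"statement {stmt['id']} did not finish")
```

For example, `run_statement("http://10.253.128.30:8998", 2, "1 + 1")` would submit the code to session 2 once it is idle. The `call` parameter exists only so the HTTP layer can be swapped out, for instance in tests.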
livy-batch
Non-interactive requests are handled by livy-batch, which is equivalent to a spark-submit operation.
Example:
curl -XPOST -H 'Content-Type:application/json' http://10.253.128.30:8998/batches --data '{"conf": {"spark.master": "yarn-cluster"}, "file": "hdfs://", "className":"", "name":"", "executorCores": "","executorMemory":"512m", "driverCores": 1, "driverMemory":"512m", "queue":"default","args":["100"] }'
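Hand-writing the batch JSON inside shell quotes is error-prone, so it can help to build the body programmatically. The sketch below assembles a `POST /batches` body using the field names from the example above and serializes it with `json.dumps`. The function name is illustrative, and the jar path and class name in the usage line are hypothetical placeholders, not values from this guide.

```python
import json


def build_batch_payload(file, class_name, args=(), name=None, queue="default",
                        executor_memory="512m", driver_memory="512m",
                        driver_cores=1, executor_cores=None, conf=None):
    """Assemble the JSON body for POST /batches, omitting unset fields."""
    payload = {
        "file": file,
        "className": class_name,
        "args": [str(a) for a in args],  # Livy expects a list of strings
        "queue": queue,
        "executorMemory": executor_memory,
        "driverMemory": driver_memory,
        "driverCores": driver_cores,
    }
    if name is not None:
        payload["name"] = name
    if executor_cores is not None:
        payload["executorCores"] = int(executor_cores)
    if conf:
        payload["conf"] = dict(conf)
    return payload


# Hypothetical jar path and class name, for illustration only.
body = build_batch_payload("hdfs:///path/to/app.jar", "com.example.App", args=[100])
print(json.dumps(body))
```

The printed string can be passed to `curl --data` or sent with any HTTP client. Leaving out unset fields avoids sending empty strings for numeric fields such as `executorCores`.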