[Big Data] Apache Livy, an open source REST service for Spark: installation and usage

Installation

Prerequisite: Hadoop (HDFS and YARN), Spark, and any other required components must already be installed, with their environment variables configured.

1. Download the Livy installation package

Download the binary package from the official Livy website:

cd /opt
wget https://dlcdn.apache.org/incubator/livy/0.7.1-incubating/apache-livy-0.7.1-incubating-bin.zip

2. Unzip the installation package

unzip apache-livy-0.7.1-incubating-bin.zip

3. Configuration

  1. Edit conf/livy-env.sh
JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_211.jdk/Contents/Home
HADOOP_CONF_DIR=/Users/xxx/Documents/software/hadoop-3.3.1/etc/hadoop
SPARK_HOME=/Users/xxx/Documents/software/spark-3.2.1
SPARK_CONF_DIR=/Users/xxx/Documents/software/spark-3.2.1/conf
  2. Configure livy.conf
# Spark master used by Livy sessions
livy.spark.master = yarn
# Spark deploy mode used by Livy sessions
livy.spark.deploy-mode = cluster
# Use HiveContext by default
livy.repl.enable-hive-context = true
# Enable user impersonation
livy.impersonation.enabled = true
# Idle timeout after which a session expires
livy.server.session.timeout = 1h
# Thrift server
livy.server.thrift.enabled = true
livy.server.thrift.port = 10002
# Recovery
livy.server.recovery.mode = recovery
livy.server.recovery.state-store = filesystem
livy.server.recovery.state-store.url = hdfs://10.253.128.30:9000/livy/
  3. Configure log4j
cp log4j.properties.template log4j.properties
  4. Copy the jersey-core-1.9.jar package to the jars directory

4. Start livy

# Change to the Livy directory (this assumes the unzipped directory was renamed to /opt/livy-0.7.1)
cd /opt/livy-0.7.1
bin/livy-server start

Visit the Livy UI

curl http://ip:8998/ui
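
Besides opening the UI, the REST API itself can be queried to confirm that the server is up. The following is a minimal Python sketch, assuming Livy listens on the default port 8998 and that GET /sessions returns a JSON object containing a `sessions` list. The function names and the injectable `opener` parameter are illustrative and not part of Livy.

```python
import json
import urllib.request


def sessions_url(host, port=8998):
    """Build the URL of Livy's session listing endpoint."""
    return f"http://{host}:{port}/sessions"


def count_sessions(host, port=8998, opener=urllib.request.urlopen):
    """Return how many sessions GET /sessions reports.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without a running server.
    """
    with opener(sessions_url(host, port), timeout=10) as resp:
        body = json.load(resp)
    return len(body.get("sessions", []))
```

On a freshly started server, `count_sessions("ip")` should return 0; a connection error usually means the server did not start or the port is not reachable.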

Livy configuration items

Configuration item                          Default value   Description
livy.server.spark-home                                      Spark directory
livy.spark.master
livy.spark.deploy-mode
livy.spark.scala-version
livy.spark.version
livy.session.staging-dir
livy.file.upload.max.size
livy.file.local-dir-whitelist
livy.repl.enable-hive-context
livy.environment
livy.server.host
livy.server.port                            8998
livy.ui.basePath
livy.ui.enabled
livy.server.request-header.size             131072
livy.server.response-header.size            131072
livy.server.csrf-protection.enabled         false
livy.impersonation.enabled                  false
livy.superusers                             null
livy.server.access-control.enabled          false
livy.server.access-control.allowed-users    *
livy.server.access-control.modify-users     null
livy.server.access-control.view-users       null
livy.keystore
livy.keystore.password
livy.key-password
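
Two of these items change what a client has to send. The sketch below is a hedged Python illustration, assuming that with livy.impersonation.enabled = true a session can be requested on behalf of another user through the `proxyUser` field, and that with livy.server.csrf-protection.enabled = true modifying requests must carry an `X-Requested-By` header. The function name, the user name `alice`, and the header value are hypothetical.

```python
import json
import urllib.request


def build_session_request(base_url, kind="spark", proxy_user=None, csrf=False):
    """Build (but do not send) a POST /sessions request."""
    payload = {"kind": kind}
    if proxy_user is not None:
        # Only meaningful when impersonation is enabled on the server.
        payload["proxyUser"] = proxy_user
    headers = {"Content-Type": "application/json"}
    if csrf:
        # Assumed requirement when CSRF protection is on; the value is arbitrary.
        headers["X-Requested-By"] = "livy-client"
    return urllib.request.Request(
        f"{base_url}/sessions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Hypothetical usage: request a session that runs as user "alice".
req = build_session_request("http://10.253.128.30:8998", proxy_user="alice", csrf=True)
```

The request object can then be passed to `urllib.request.urlopen(req)` once the server is reachable.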

Using Livy

livy-session

With livy-session, code can be run interactively over REST, much as it would be typed into spark-shell.

  1. Create a session
curl -XPOST 'http://10.253.128.30:8998/sessions' -H 'Content-Type:application/json' --data '{"kind": "spark"}'
  2. View the session
    http://10.253.128.30:8998/ui

  3. Use the session
curl -XPOST 'http://10.253.128.30:8998/sessions/2/statements' -H 'Content-Type:application/json' -d '{"code": "sc.textFile(\"\")"}'

Note: Wait until the session state becomes idle before submitting a statement. While the statement is executing, the session state is busy; once execution completes, the state returns to idle.
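
The create, wait, submit, and poll cycle described above can be sketched in Python. This is a minimal sketch and not an official client: it assumes the session and statement objects returned by Livy carry `id` and `state` fields, that a finished statement has state `available` with an `output` field, and that the listed failure states are the relevant ones. The helper names (`http_json`, `wait_until`, `run_statement`) and the injectable `call` parameter are illustrative.

```python
import json
import time
import urllib.request


def http_json(method, url, payload=None):
    """Send a JSON request with urllib and decode the JSON response."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_until(fetch, wanted, failed, interval=1.0, attempts=300):
    """Poll fetch() until its "state" is in `wanted`; raise on failure or timeout."""
    for _ in range(attempts):
        obj = fetch()
        if obj["state"] in wanted:
            return obj
        if obj["state"] in failed:
            raise RuntimeError(f"unexpected state: {obj['state']}")
        time.sleep(interval)
    raise TimeoutError(f"gave up waiting for state in {sorted(wanted)}")


def run_statement(base_url, code, kind="spark", call=http_json, interval=1.0):
    """Create a session, wait for idle, run one statement, return its output."""
    session = call("POST", f"{base_url}/sessions", {"kind": kind})
    session_url = f"{base_url}/sessions/{session['id']}"
    # The session must be idle before it accepts work.
    wait_until(lambda: call("GET", session_url),
               {"idle"}, {"error", "dead", "killed"}, interval=interval)
    stmt = call("POST", f"{session_url}/statements", {"code": code})
    stmt_url = f"{session_url}/statements/{stmt['id']}"
    # While the statement runs the session is busy; poll the statement itself.
    done = wait_until(lambda: call("GET", stmt_url),
                      {"available"}, {"error", "cancelled"}, interval=interval)
    return done["output"]
```

A call such as `run_statement("http://10.253.128.30:8998", "1 + 1")` leaves the session running afterwards, so it remains subject to the livy.server.session.timeout setting configured earlier.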

livy-batch

Non-interactive requests are handled by livy-batch, which is equivalent to a spark-submit operation.
Example:

curl -XPOST -H 'Content-Type:application/json' http://10.253.128.30:8998/batches --data '{"conf": {"spark.master": "yarn", "spark.submit.deployMode": "cluster"}, "file": "hdfs://", "className":"", "name":"", "executorCores": "","executorMemory":"512m", "driverCores": 1, "driverMemory":"512m", "queue":"default","args":["100"] }'
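
Hand-written JSON inside a shell string is easy to get wrong, so it can help to build the batch body programmatically. Below is a minimal Python sketch, assuming the field names used in the request above and that Livy expects integers for the core counts and a list of strings for `args`. The function name `batch_payload` and the example jar path, class name, and job name are hypothetical placeholders.

```python
import json


def batch_payload(file, class_name, name, args=(), queue="default",
                  driver_memory="512m", driver_cores=1,
                  executor_memory="512m", executor_cores=1, conf=None):
    """Return the JSON body for POST /batches as a string."""
    payload = {
        "file": file,
        "className": class_name,
        "name": name,
        "args": [str(a) for a in args],   # arguments are sent as strings
        "queue": queue,
        "driverMemory": driver_memory,
        "driverCores": int(driver_cores),
        "executorMemory": executor_memory,
        "executorCores": int(executor_cores),
    }
    if conf:
        payload["conf"] = dict(conf)
    return json.dumps(payload)


# Hypothetical values; replace them with a real jar path and main class.
body = batch_payload("hdfs:///path/to/app.jar", "com.example.Main",
                     "example-job", args=[100])
```

The resulting string can be passed to curl with `--data` or sent with any HTTP client using the `Content-Type: application/json` header.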

Origin blog.csdn.net/u013412066/article/details/129793483