1. Use singularities/spark:2.2 to build a Spark cluster
Refer to https://hub.docker.com/r/singularities/spark
In singularities/spark:2.2:
Hadoop version: 2.8.2
Spark version: 2.2.1
Scala version: 2.11.8
Java version: 1.8.0_151
Create a docker-compose.yml file:
version: "2"
services:
  master:
    image: singularities/spark
    command: start-spark master
    hostname: master
    ports:
      - "6066:6066"
      - "7070:7070"
      - "8080:8080"
      - "50070:50070"
  worker:
    image: singularities/spark
    command: start-spark worker master
    environment:
      SPARK_WORKER_CORES: 1
      SPARK_WORKER_MEMORY: 2g
    links:
      - master
2. Start Spark
Start the containers:
docker-compose up -d
List the containers:
docker-compose ps
Stop the containers:
docker-compose stop
Remove the containers:
docker-compose rm
3. If startup succeeds, you can check the status in a browser at localhost:8080 (Spark master web UI) and localhost:50070 (HDFS NameNode web UI)
4. Create a Python + pyspark + Jupyter environment
Create a new Dockerfile with the following contents:
FROM python:3.7
LABEL maintainer="Haiyang Lv"
USER root
RUN pip install jupyter pyspark -i http://pypi.douban.com/simple --trusted-host pypi.douban.com
CMD jupyter notebook --allow-root --ip=0.0.0.0
5. Create the Python + pyspark + Jupyter Docker image
Run the following command in the directory containing the Dockerfile:
docker build -t python_jupyter:latest .
6. Start the jupyter container
docker run -itd -p 8888:8888 --name jupyter python_jupyter
7. Visit Jupyter in the browser at 127.0.0.1:8888
When prompted for a token, enter the container with docker exec -it jupyter bash, run jupyter notebook list to view the token, and copy it into the browser.
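The output of jupyter notebook list is a line of the form http://0.0.0.0:8888/?token=... :: /root. As a small sketch, the token can also be pulled out of that line with a regular expression; the token value below is invented for illustration.

```python
import re

# Sample output line from `jupyter notebook list`
# (the token value here is made up for illustration).
listing = "http://0.0.0.0:8888/?token=4ae9bd0f3c21 :: /root"

# Jupyter tokens are hex strings; capture everything after "token=".
match = re.search(r"token=([0-9a-f]+)", listing)
if match:
    print(match.group(1))  # 4ae9bd0f3c21
```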