A big data environment built with Docker, with one-click start and stop: the environment marches before the code
I am a Docker fan. While learning big data technologies, an idea came to me:
build a big data development environment with Docker!
What do I gain from this?
As long as I have the docker-compose.yml container orchestration file, I can bring up my big data environment on any machine that has Docker installed.
Do it once and benefit forever: isn't that exactly what we programmers strive for every day?
So how do you do it?
I searched through domestic blogs and forum posts, but found no suitable answer.
I had to do it myself.
Docker Hub
First I went to Docker Hub. This is the GitHub of Docker images.
I searched keywords such as Hadoop and Spark, and found an organization:
This organization (bde2020, the Big Data Europe project) has packaged almost every big data component as a Docker image, fine-grained and split up by role. It's really great.
For example, the image shown here is the one they built for the NameNode role in Hadoop. With a little packaging and personalization on top of it, everything becomes much easier.
So I looked through their registry for the big data components I wanted:
Hadoop
Hive
Spark
Easy. All of them were there.
Virtual machine
Next, we need to install Docker inside a virtual machine.
Why a virtual machine at all?
Let me explain: installing Docker directly on Windows is inconvenient. (Mac friends can skip this part.)
For the virtual machine I used VirtualBox with Ubuntu installed.
Then I started installing Docker.
Along with Docker, you also need to install its twin brother, docker-compose.
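On Ubuntu, one way to install both is from the distribution's own repositories; a sketch (package names assume Ubuntu, and the versions you get depend on your release):

```shell
# Install Docker and docker-compose from Ubuntu's repositories (sketch).
sudo apt-get update
sudo apt-get install -y docker.io docker-compose

# Let the current user talk to the Docker daemon without sudo
# (takes effect after logging out and back in).
sudo usermod -aG docker "$USER"

# Check that both installed correctly:
docker --version
docker-compose --version
```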
docker-compose.yml
docker-compose makes orchestrating Docker containers easy.
docker-compose.yml records how the containers are arranged. It is a description file!
Below is the docker-compose.yml of my big data environment:
version: '2'
services:
  namenode:
    image: bde2020/hadoop-namenode:1.1.0-hadoop2.8-java8
    container_name: namenode
    volumes:
      - ./data/namenode:/hadoop/dfs/name
    environment:
      - CLUSTER_NAME=test
    env_file:
      - ./hadoop-hive.env
    ports:
      - 50070:50070
      - 8020:8020
  datanode:
    image: bde2020/hadoop-datanode:1.1.0-hadoop2.8-java8
    depends_on:
      - namenode
    volumes:
      - ./data/datanode:/hadoop/dfs/data
    env_file:
      - ./hadoop-hive.env
    ports:
      - 50075:50075
  hive-server:
    image: bde2020/hive:2.1.0-postgresql-metastore
    container_name: hive-server
    env_file:
      - ./hadoop-hive.env
    environment:
      - "HIVE_CORE_CONF_javax_jdo_option_ConnectionURL=jdbc:postgresql://hive-metastore/metastore"
    ports:
      - "10000:10000"
  hive-metastore:
    image: bde2020/hive:2.1.0-postgresql-metastore
    container_name: hive-metastore
    env_file:
      - ./hadoop-hive.env
    command: /opt/hive/bin/hive --service metastore
    ports:
      - 9083:9083
  hive-metastore-postgresql:
    image: bde2020/hive-metastore-postgresql:2.1.0
    ports:
      - 5432:5432
    volumes:
      - ./data/postgresql/:/var/lib/postgresql/data
  spark-master:
    image: bde2020/spark-master:2.1.0-hadoop2.8-hive-java8
    container_name: spark-master
    ports:
      - 8080:8080
      - 7077:7077
    env_file:
      - ./hadoop-hive.env
  spark-worker:
    image: bde2020/spark-worker:2.1.0-hadoop2.8-hive-java8
    depends_on:
      - spark-master
    environment:
      - SPARK_MASTER=spark://spark-master:7077
    ports:
      - "8081:8081"
    env_file:
      - ./hadoop-hive.env
  mysql-server:
    image: mysql:5.7
    container_name: mysql-server
    ports:
      - "3306:3306"
    environment:
      - MYSQL_ROOT_PASSWORD=zhangyang517
    volumes:
      - ./data/mysql:/var/lib/mysql
  elasticsearch:
    image: elasticsearch:6.5.3
    environment:
      - discovery.type=single-node
    ports:
      - "9200:9200"
      - "9300:9300"
    networks:
      - es_network
  kibana:
    image: kibana:6.5.3
    ports:
      - "5601:5601"
    networks:
      - es_network
networks:
  es_network:
    external: true
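One detail worth noticing in the file above: es_network is declared with external: true, which means Compose will not create it for you; it has to exist before the first start, or docker-compose will refuse to bring up Elasticsearch and Kibana. Creating it is a one-time step:

```shell
# The compose file declares es_network as external,
# so create it once before the first docker-compose up:
docker network create es_network
```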
Later I needed Elasticsearch and Kibana, so I added them directly to the file. It's really convenient.
Best of all, the file can easily be shared with your friends and colleagues.
Next we write a start script and a stop script. That is how one-click start and stop becomes real.
run.sh
#!/bin/bash
# Bring up HDFS and the Hive metastore database first
docker-compose -f docker-compose.yml up -d namenode hive-metastore-postgresql
docker-compose -f docker-compose.yml up -d datanode hive-metastore
# Give the metastore a few seconds before starting HiveServer2
sleep 5
docker-compose -f docker-compose.yml up -d hive-server
docker-compose -f docker-compose.yml up -d spark-master spark-worker
docker-compose -f docker-compose.yml up -d mysql-server
#docker-compose -f docker-compose.yml up -d elasticsearch
#docker-compose -f docker-compose.yml up -d kibana
# Print the host IP and the web UIs to visit
my_ip=`ip route get 1|awk '{print $NF;exit}'`
echo "Namenode: http://${my_ip}:50070"
echo "Datanode: http://${my_ip}:50075"
echo "Spark-master: http://${my_ip}:8080"
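Once run.sh finishes, a quick smoke test is to hit the web UIs it prints and to open a Hive session. A sketch, assuming the stack came up on this host and the container names from the compose file above:

```shell
# The NameNode UI should answer on 50070, the Spark master on 8080.
curl -sf http://localhost:50070 > /dev/null && echo "namenode ui ok"
curl -sf http://localhost:8080  > /dev/null && echo "spark ui ok"

# HiveServer2 listens on 10000; connect with beeline inside the container.
docker exec -it hive-server \
  /opt/hive/bin/beeline -u jdbc:hive2://localhost:10000 -e "show databases;"
```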
stop.sh
#!/bin/bash
docker-compose stop
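One design note: docker-compose stop only halts the containers and keeps them around, so the next run.sh resumes quickly. If you ever want to remove the containers entirely, a teardown script is a small addition (hypothetical, not part of the original setup; the data survives either way, since it lives in the ./data bind mounts):

```shell
#!/bin/bash
# teardown.sh (hypothetical): stop AND remove the containers.
# Bind-mounted data under ./data is untouched.
docker-compose -f docker-compose.yml down
```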
Look at the result: everything starts successfully. Now let's verify it.