A big data environment built with Docker

A big data environment built with Docker, with one-click start and stop: before the code moves, the environment goes first.

I am a Docker fan. While learning big data technologies, an idea came to me:

Build the big data development environment with Docker!
What are the benefits of doing this?

As long as I have this docker-compose.yml container orchestration file, I can start my big data environment on any machine that has Docker installed.
Do it once and benefit forever, isn't that exactly what we programmers strive for every day?
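Concretely, once the environment exists, bringing it up on a fresh machine is nothing more than the following (a minimal sketch, assuming Docker and docker-compose are installed and the files from this post have been copied over; the directory name is only an example):

# copy docker-compose.yml (plus the env file and scripts it references) to the new machine, then:
cd bigdata-docker        # hypothetical project directory
docker-compose up -d     # one-click start of every service
docker-compose ps        # confirm all containers are running
docker-compose stop      # one-click stop

The run.sh script later in this post does the same thing, just starting the services in a more careful order.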

How do we do it?

I searched through domestic blogs and forum posts, but there was no suitable answer.
So I had to do it myself.

Docker Hub
First I went to Docker Hub, which is the GitHub of the Docker world.
I searched keywords such as Hadoop and Spark, and found an organization (bde2020).

This organization has packaged almost every big data component into a Docker image, at a fine granularity, split by role. It's really great.
For example, they provide an image dedicated to the namenode role of Hadoop. Doing a bit of packaging and personalization on top of it becomes especially easy.

So I looked for the big data components I wanted in their registry:

Hadoop
Hive
Spark
Easy, all found.
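To get a feel for these images, you can pull one directly; for example, the namenode image with the same tag that appears in the compose file below:

# pull the Hadoop namenode image published by bde2020
docker pull bde2020/hadoop-namenode:1.1.0-hadoop2.8-java8
# list local images to confirm the download
docker images | grep bde2020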

Virtual machine

Next, we need to install Docker inside a virtual machine.
Why a virtual machine at all?
Let me explain: running Docker directly on Windows is not convenient, so I use a virtual machine. (Mac friends can skip this part.)

For the virtual machine, I use VirtualBox with Ubuntu installed.
Then I installed Docker.
After installing Docker, you also need its twin brother, docker-compose.
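A minimal install sketch on Ubuntu (the docker-compose version number is only an example; pick whichever release you want):

# install Docker with the official convenience script
curl -fsSL https://get.docker.com | sudo sh
# allow the current user to run docker without sudo (log out and back in afterwards)
sudo usermod -aG docker $USER

# install docker-compose as a standalone binary (1.29.2 is an example version)
sudo curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
docker-compose --version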

docker-compose.yml

docker-compose makes orchestrating Docker containers easy.
docker-compose.yml records how the containers are arranged; it is just a description file!
The following is the docker-compose.yml of my big data environment:


version: '2' 
services:
  namenode:
    image: bde2020/hadoop-namenode:1.1.0-hadoop2.8-java8
    container_name: namenode
    volumes:
      - ./data/namenode:/hadoop/dfs/name
    environment:
      - CLUSTER_NAME=test
    env_file:
      - ./hadoop-hive.env
    ports:
      - 50070:50070
      - 8020:8020  
  datanode:
    image: bde2020/hadoop-datanode:1.1.0-hadoop2.8-java8
    depends_on: 
      - namenode
    volumes:
      - ./data/datanode:/hadoop/dfs/data
    env_file:
      - ./hadoop-hive.env
    ports:
      - 50075:50075
  hive-server:
    image: bde2020/hive:2.1.0-postgresql-metastore
    container_name: hive-server
    env_file:
      - ./hadoop-hive.env
    environment:
      - "HIVE_CORE_CONF_javax_jdo_option_ConnectionURL=jdbc:postgresql://hive-metastore/metastore"
    ports:
      - "10000:10000"
  hive-metastore:
    image: bde2020/hive:2.1.0-postgresql-metastore
    container_name: hive-metastore
    env_file:
      - ./hadoop-hive.env
    command: /opt/hive/bin/hive --service metastore
    ports:
      - 9083:9083
  hive-metastore-postgresql:
    image: bde2020/hive-metastore-postgresql:2.1.0
    ports:
      - 5432:5432
    volumes:
      - ./data/postgresql/:/var/lib/postgresql/data
  spark-master:
    image: bde2020/spark-master:2.1.0-hadoop2.8-hive-java8
    container_name: spark-master
    ports:
      - 8080:8080
      - 7077:7077
    env_file:
      - ./hadoop-hive.env
  spark-worker:
    image: bde2020/spark-worker:2.1.0-hadoop2.8-hive-java8
    depends_on:
      - spark-master
    environment:
      - SPARK_MASTER=spark://spark-master:7077
    ports:
      - "8081:8081"
    env_file:
      - ./hadoop-hive.env
  mysql-server:
    image: mysql:5.7
    container_name: mysql-server
    ports:
      - "3306:3306"
    environment:
      - MYSQL_ROOT_PASSWORD=zhangyang517
    volumes:
      - ./data/mysql:/var/lib/mysql

  elasticsearch:
    image: elasticsearch:6.5.3
    environment:
      - discovery.type=single-node
    ports:
      - "9200:9200"
      - "9300:9300"
    networks: 
      - es_network
  kibana:
    image: kibana:6.5.3
    ports:
      - "5601:5601"
    networks: 
      - es_network

networks:
  es_network:
    external: true

Later I needed Elasticsearch and Kibana, so I simply added them to the same file. It's really convenient.
Best of all, the file can easily be shared with your friends.
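One detail to note: the es_network at the bottom of the compose file is declared external, so Docker expects it to exist before the containers start; create it once up front:

# create the external network shared by elasticsearch and kibana
docker network create es_network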

Next we write a start script and a stop script, so the whole environment can be started and stopped with a single command.
run.sh

#!/bin/bash

# bring up HDFS and the metastore database first
docker-compose -f docker-compose.yml up -d namenode hive-metastore-postgresql
docker-compose -f docker-compose.yml up -d datanode hive-metastore
# give the namenode and metastore a moment before starting HiveServer2
sleep 5
docker-compose -f docker-compose.yml up -d hive-server
docker-compose -f docker-compose.yml up -d spark-master spark-worker
docker-compose -f docker-compose.yml up -d mysql-server
#docker-compose -f docker-compose.yml up -d elasticsearch
#docker-compose -f docker-compose.yml up -d kibana
# print the host IP together with the exposed web UI ports
my_ip=`ip route get 1|awk '{print $NF;exit}'`
echo "Namenode: http://${my_ip}:50070"
echo "Datanode: http://${my_ip}:50075"
echo "Spark-master: http://${my_ip}:8080"

stop.sh

#!/bin/bash
docker-compose stop
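
With both scripts next to docker-compose.yml, starting and stopping really is one command each (make the scripts executable first):

chmod +x run.sh stop.sh
./run.sh     # start everything and print the web UI addresses
./stop.sh    # stop everything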

Look at the effect: everything starts successfully.
To verify, open the web UIs printed by run.sh in a browser; a quick command-line check is sketched below.
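For example:

# list the running services
docker-compose ps
# the namenode web UI should answer on port 50070
curl -s http://localhost:50070 | head -n 5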

Original link: https://www.toutiao.com/a6781686481690821132/?tt_from=weixin&utm_campaign=client_share&app=news_article&utm_source=weixin&iid=4309686640186884&utm_medium=toutiao_ios&wxshare_count=1
