A detailed tutorial on quickly deploying a Hadoop cluster with docker-compose

1. Overview

docker-compose is an official open source Docker project for rapidly orchestrating groups of containers: it lets you define, run, and manage multi-container applications easily and efficiently.

  • Deploying applications with docker-compose is simple and fast. However, because docker-compose manages only a single machine, it is generally used in non-production scenarios such as testing, PoC environments, and learning. For containerized deployments in production, Kubernetes (k8s) is recommended.

  • Deploying a Hadoop cluster by hand is still somewhat cumbersome, so for readers who want a working Hadoop cluster quickly, this article deploys one with docker-compose.

For an introduction to docker-compose, see my earlier articles on the topic.

If you need to deploy a Hadoop environment on k8s, you can refer to my earlier k8s articles.

Hadoop NameNode HA architecture:
[architecture diagram]
Hadoop YARN HA architecture:
[architecture diagram]

2. Install docker and docker-compose

1) Install docker

# Install the yum-config-manager tool
yum -y install yum-utils

# The Aliyun yum mirror is recommended:
#yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

# Install docker-ce
yum install -y docker-ce
# Start docker and enable it at boot
systemctl enable --now docker
docker --version

2) Install docker-compose

Official installation guide: https://docs.docker.com/compose/install/other/

curl -SL https://github.com/docker/compose/releases/download/v2.16.0/docker-compose-linux-x86_64 -o /usr/local/bin/docker-compose

chmod +x /usr/local/bin/docker-compose
docker-compose --version

3. docker-compose deployment basics

Before getting to Hadoop, a few important docker-compose concepts need to be covered. They were also mentioned in the k8s articles, but are explained again here in docker-compose terms.

1) Setting the number of replicas

replicas_test.yaml

version: '3'
services:
  replicas_test:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/centos:7.7.1908
    restart: always
    command: ["sh","-c","sleep 36000"]
    deploy:
      replicas: 2
    healthcheck:
      test: ["CMD-SHELL", "hostname"]
      interval: 10s
      timeout: 5s
      retries: 3

Run:

docker-compose -f replicas_test.yaml up -d
docker-compose -f replicas_test.yaml ps

[screenshot: container list]
As the figure above shows, the number of containers created for a service is controlled by deploy.replicas. It is not applicable in every scenario, though: some of the Hadoop components below cannot use it. For example, when a fixed hostname and container name are required, replicas cannot be used to adjust the container count.
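For example, a hypothetical orchestration like the following fails with more than one replica, because a fixed container_name can be held by only one container:

```yaml
version: '3'
services:
  replicas_test:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/centos:7.7.1908
    container_name: fixed_name   # a fixed name conflicts with replicas > 1
    command: ["sh","-c","sleep 36000"]
    deploy:
      replicas: 2                # docker-compose rejects this combination
```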

2) Resource isolation

Resource isolation in docker-compose works the same way as in k8s, so it is easy to understand through the following example:

resources_test.yaml

version: '3'
services:
  resources_test:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/centos:7.7.1908
    restart: always
    command: ["sh","-c","sleep 36000"]
    deploy:
      replicas: 2
      resources:
        # Upper bound: the container can use at most these resources
        limits:
          cpus: '1'
          memory: 100M
        # Lower bound, like "requests" in k8s: the minimum needed to run the container
        reservations:
          cpus: '0.5'
          memory: 50M
    healthcheck:
      test: ["CMD-SHELL", "hostname"]
      interval: 10s
      timeout: 5s
      retries: 3

Run:

docker-compose -f resources_test.yaml up -d
docker-compose -f resources_test.yaml ps
# Check resource usage
docker stats deploy-test-resources_test-1


4. docker-compose networks

Networking is a very important topic for containers, so this section focuses on whether containers from different docker-compose projects can reach each other by name. By default, each docker-compose deployment is a project (one project per directory; multiple compose files in the same directory belong to the same project), and each project creates its own network. Note that, by default, only containers on the same network can reach each other by name. How, then, can containers in different projects resolve each other by name? The following examples explain.

test1/test1.yaml

version: '3'
services:
  test1:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/centos:7.7.1908
    container_name: c_test1
    hostname: h_test1
    restart: always
    command: ["sh","-c","sleep 36000"]
    healthcheck:
      test: ["CMD-SHELL", "hostname"]
      interval: 10s
      timeout: 5s
      retries: 3

test2/test2.yaml

version: '3'
services:
  test2:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/centos:7.7.1908
    container_name: c_test2
    hostname: h_test2
    restart: always
    command: ["sh","-c","sleep 36000"]
    healthcheck:
      test: ["CMD-SHELL", "hostname"]
      interval: 10s
      timeout: 5s
      retries: 3

Verify as follows:

docker-compose -f test1/test1.yaml up -d
docker-compose -f test2/test2.yaml up -d
# List the networks. Two are created here. If both yaml files were in the same
# directory they would share one project (and one network) and could reach each
# other by name. Since they are in different directories, two networks are
# created, and by default different networks are isolated: name resolution does
# not work across them. The project name defaults to the directory containing
# the yaml file; it can also be set with a flag, as explained below.
docker network ls

# Ping each other
docker exec -it c_test1 ping c_test2
docker exec -it c_test1 ping h_test2

docker exec -it c_test2 ping c_test1
docker exec -it c_test2 ping h_test1

# Tear down
docker-compose -f test1/test1.yaml down
docker-compose -f test2/test2.yaml down

Next, we add a network definition and verify again.

  • test1/network_test1.yaml

Define a new network in test1/network_test1.yaml, and have test2/network_test2.yaml reference the network created by test1. The two projects are then on the same network. Pay attention to the execution order.

version: '3'
services:
  network_test1:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/centos:7.7.1908
    container_name: c_network_test1
    hostname: h_network_test1
    restart: always
    command: ["sh","-c","sleep 36000"]
    # Join the network
    networks:
      - test1_network
    healthcheck:
      test: ["CMD-SHELL", "hostname"]
      interval: 10s
      timeout: 5s
      retries: 3

# Define a new network
networks:
  test1_network:
    driver: bridge

  • test2/network_test2.yaml

version: '3'
services:
  network_test2:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/centos:7.7.1908
    container_name: c_network_test2
    hostname: h_network_test2
    restart: always
    networks:
      - test1_test1_network
    command: ["sh","-c","sleep 36000"]
    healthcheck:
      test: ["CMD-SHELL", "hostname"]
      interval: 10s
      timeout: 5s
      retries: 3

# Reference test1's network
networks:
  # Network names are <project>_<network>; check with `docker network ls`
  test1_test1_network:
    external: true

Verify as follows:

docker-compose -f test1/network_test1.yaml up -d
docker-compose -f test2/network_test2.yaml up -d

# List the networks
docker network ls

# Ping each other
docker exec -it c_network_test1 ping -c3 c_network_test2
docker exec -it c_network_test1 ping -c3 h_network_test2
docker exec -it c_network_test2 ping -c3 c_network_test1
docker exec -it c_network_test2 ping -c3 h_network_test1

# Tear down. Order matters: remove the consumer project first, otherwise the network is still in use and cannot be deleted.
docker-compose -f test2/network_test2.yaml down
docker-compose -f test1/network_test1.yaml down

The experiments above show that containers in different projects can reach each other by hostname or container name only when they are on the same network.
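As an alternative sketch (assuming a Compose version that supports the top-level `name` key for networks), you can create the network once outside any project and let every compose file join it by that fixed name, so project startup order no longer matters:

```yaml
# Pre-create once on the host (hypothetical network name):
#   docker network create shared_net
version: '3'
services:
  some_service:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/centos:7.7.1908
    command: ["sh","-c","sleep 36000"]
    networks:
      - shared

networks:
  shared:
    external: true
    name: shared_net
```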

5. docker-compose project names

By default, the project name is the name of the directory containing the yaml file, and the network names generated in the network section above are prefixed with it. The project name can also be customized, as the following example shows:

# test.yaml
version: '3'
services:
  test:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/centos:7.7.1908
    restart: always
    command: ["sh","-c","sleep 36000"]
    healthcheck:
      test: ["CMD-SHELL", "hostname"]
      interval: 10s
      timeout: 5s
      retries: 3

Run:

# First run without flags
docker-compose -f test.yaml up -d
# List the networks
docker network ls

# Customize the project name with -p / --project-name; four equivalent forms:
docker-compose -p=p001 -f test.yaml up -d
docker-compose -p p002 -f test.yaml up -d
docker-compose --project-name=p003 -f test.yaml up -d
docker-compose --project-name p004 -f test.yaml up -d
# List the networks
docker network ls

# List all projects
docker-compose ls


6. Hadoop deployment (non-high availability)

1) Install JDK

# The JDK package is in the resource bundle provided at the bottom; you can also download it from the official site.
tar -xf jdk-8u212-linux-x64.tar.gz

# Append the following to /etc/profile:
echo "export JAVA_HOME=`pwd`/jdk1.8.0_212" >> /etc/profile
echo "export PATH=\$JAVA_HOME/bin:\$PATH" >> /etc/profile
echo "export CLASSPATH=.:\$JAVA_HOME/lib/dt.jar:\$JAVA_HOME/lib/tools.jar" >> /etc/profile

# Reload to apply
source /etc/profile

2) Download hadoop related software

### 1. Hadoop
# Download: https://dlcdn.apache.org/hadoop/common/
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.5/hadoop-3.3.5.tar.gz --no-check-certificate

### 2. Hive
# Download: http://archive.apache.org/dist/hive
wget http://archive.apache.org/dist/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz

### 3. Spark
# Download: http://spark.apache.org/downloads.html
wget https://dlcdn.apache.org/spark/spark-3.3.2/spark-3.3.2-bin-hadoop3.tgz --no-check-certificate

### 4. Flink
wget https://dlcdn.apache.org/flink/flink-1.17.0/flink-1.17.0-bin-scala_2.12.tgz --no-check-certificate

3) Build the image (Dockerfile)

FROM registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/centos:7.7.1908

RUN rm -f /etc/localtime && ln -sv /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && echo "Asia/Shanghai" > /etc/timezone

RUN export LANG=zh_CN.UTF-8

# Create the user and group, matching "user: 10000:10000" in the yaml orchestration below
RUN groupadd --system --gid=10000 hadoop && useradd --system --home-dir /home/hadoop --uid=10000 --gid=hadoop hadoop

# Install sudo
RUN yum -y install sudo ; chmod 640 /etc/sudoers

# Grant the hadoop user passwordless sudo
RUN echo "hadoop ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers

RUN yum -y install net-tools telnet wget nc

RUN mkdir /opt/apache/

# Install the JDK
ADD jdk-8u212-linux-x64.tar.gz /opt/apache/
ENV JAVA_HOME /opt/apache/jdk1.8.0_212
ENV PATH $JAVA_HOME/bin:$PATH

# Configure Hadoop
ENV HADOOP_VERSION 3.3.5
ADD hadoop-${HADOOP_VERSION}.tar.gz /opt/apache/
ENV HADOOP_HOME /opt/apache/hadoop
RUN ln -s /opt/apache/hadoop-${HADOOP_VERSION} $HADOOP_HOME

ENV HADOOP_COMMON_HOME=${HADOOP_HOME} \
    HADOOP_HDFS_HOME=${HADOOP_HOME} \
    HADOOP_MAPRED_HOME=${HADOOP_HOME} \
    HADOOP_YARN_HOME=${HADOOP_HOME} \
    HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop \
    PATH=${PATH}:${HADOOP_HOME}/bin

# Configure Hive
ENV HIVE_VERSION 3.1.3
ADD apache-hive-${HIVE_VERSION}-bin.tar.gz /opt/apache/
ENV HIVE_HOME=/opt/apache/hive
ENV PATH=$HIVE_HOME/bin:$PATH
RUN ln -s /opt/apache/apache-hive-${HIVE_VERSION}-bin ${HIVE_HOME}

# Configure Spark
ENV SPARK_VERSION 3.3.2
ADD spark-${SPARK_VERSION}-bin-hadoop3.tgz /opt/apache/
ENV SPARK_HOME=/opt/apache/spark
ENV PATH=$SPARK_HOME/bin:$PATH
RUN ln -s /opt/apache/spark-${SPARK_VERSION}-bin-hadoop3 ${SPARK_HOME}

# Configure Flink
ENV FLINK_VERSION 1.17.0
ADD flink-${FLINK_VERSION}-bin-scala_2.12.tgz /opt/apache/
ENV FLINK_HOME=/opt/apache/flink
ENV PATH=$FLINK_HOME/bin:$PATH
RUN ln -s /opt/apache/flink-${FLINK_VERSION} ${FLINK_HOME}

# Create NameNode and DataNode storage directories
RUN mkdir -p /opt/apache/hadoop/data/{hdfs,yarn} /opt/apache/hadoop/data/hdfs/namenode /opt/apache/hadoop/data/hdfs/datanode/data{1..3} /opt/apache/hadoop/data/yarn/{local-dirs,log-dirs,apps}

COPY bootstrap.sh /opt/apache/

COPY config/hadoop-config/* ${HADOOP_HOME}/etc/hadoop/

RUN chown -R hadoop:hadoop /opt/apache

ENV ll "ls -l"

WORKDIR /opt/apache

Start building the image

docker build -t hadoop:v1 . --no-cache

# To let readers pull and use the image directly, it is pushed to an Aliyun image registry here
docker tag hadoop:v1 registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop:v1
docker push registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop:v1

### Flag explanations
# -t: image name (and tag)
# . : build context, the directory containing the Dockerfile
# -f: path to the Dockerfile
# --no-cache: build without using the cache

4) Configuration

The configuration files are all included in the resource package provided at the bottom.

1. Hadoop configuration

The main files are: core-site.xml, dfs.hosts, dfs.hosts.exclude, hdfs-site.xml, mapred-site.xml, yarn-hosts-exclude, yarn-hosts-include, yarn-site.xml
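For orientation only (the real file ships in the resource package), core-site.xml points the default filesystem at the NameNode service's hostname; the RPC port shown here (8020, the common Hadoop 3 default) is an assumption:

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <!-- hostname matches the hadoop-hdfs-nn service in the compose file; port assumed -->
    <value>hdfs://hadoop-hdfs-nn:8020</value>
  </property>
</configuration>
```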

2. Hive configuration

The main files are hive-env.sh and hive-site.xml. This article does not cover the Hive part; it will be explained in the next article.

5) Startup script bootstrap.sh

#!/usr/bin/env sh
# bootstrap.sh

wait_for() {
    echo Waiting for $1 to listen on $2...
    while ! nc -z $1 $2; do echo waiting...; sleep 1s; done
}

start_hdfs_namenode() {

        if [ ! -f /tmp/namenode-formated ];then
                ${HADOOP_HOME}/bin/hdfs namenode -format >/tmp/namenode-formated
        fi

        ${HADOOP_HOME}/bin/hdfs --loglevel INFO --daemon start namenode

        tail -f ${HADOOP_HOME}/logs/*namenode*.log
}

start_hdfs_datanode() {

        wait_for $1 $2

        ${HADOOP_HOME}/bin/hdfs --loglevel INFO --daemon start datanode

        tail -f ${HADOOP_HOME}/logs/*datanode*.log
}

start_yarn_resourcemanager() {

        ${HADOOP_HOME}/bin/yarn --loglevel INFO --daemon start resourcemanager

        tail -f ${HADOOP_HOME}/logs/*resourcemanager*.log
}

start_yarn_nodemanager() {

        wait_for $1 $2

        ${HADOOP_HOME}/bin/yarn --loglevel INFO --daemon start nodemanager

        tail -f ${HADOOP_HOME}/logs/*nodemanager*.log
}

start_yarn_proxyserver() {

        wait_for $1 $2

        ${HADOOP_HOME}/bin/yarn --loglevel INFO --daemon start proxyserver

        tail -f ${HADOOP_HOME}/logs/*proxyserver*.log
}

start_mr_historyserver() {

        wait_for $1 $2

        ${HADOOP_HOME}/bin/mapred --loglevel INFO --daemon start historyserver

        tail -f ${HADOOP_HOME}/logs/*historyserver*.log
}

case $1 in
        hadoop-hdfs-nn)
                start_hdfs_namenode
                ;;
        hadoop-hdfs-dn)
                start_hdfs_datanode $2 $3
                ;;
        hadoop-yarn-rm)
                start_yarn_resourcemanager
                ;;
        hadoop-yarn-nm)
                start_yarn_nodemanager $2 $3
                ;;
        hadoop-yarn-proxyserver)
                start_yarn_proxyserver $2 $3
                ;;
        hadoop-mr-historyserver)
                start_mr_historyserver $2 $3
                ;;
        *)
                echo "Please enter a valid service start command."
        ;;
esac
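The wait_for helper above loops forever if the upstream service never comes up, and it requires nc to be installed in the image. A hedged variant with a timeout, assuming bash is available (it uses bash's built-in /dev/tcp redirection instead of nc), might look like this; wait_for_with_timeout is a hypothetical name:

```shell
#!/usr/bin/env bash

# Hypothetical wait_for variant: gives up after a configurable timeout and
# needs no nc binary. Returns 0 once the port accepts a TCP connection,
# 1 on timeout.
wait_for_with_timeout() {
    local host=$1 port=$2 timeout=${3:-60} waited=0
    echo "Waiting for ${host} to listen on ${port} (timeout: ${timeout}s)..."
    # bash treats /dev/tcp/<host>/<port> as a TCP connection attempt
    while ! (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
        waited=$((waited + 1))
        if [ "${waited}" -ge "${timeout}" ]; then
            echo "Timed out waiting for ${host}:${port}" >&2
            return 1
        fi
        sleep 1
    done
}
```

A caller such as start_hdfs_datanode could then pass a third argument to bound how long the container waits before restarting.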

6) YAML orchestration docker-compose.yaml

version: '3'
services:
  hadoop-hdfs-nn:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop:v1
    user: "hadoop:hadoop"
    container_name: hadoop-hdfs-nn
    hostname: hadoop-hdfs-nn
    restart: always
    env_file:
      - .env
    ports:
      - "30070:${HADOOP_HDFS_NN_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-nn"]
    networks:
      - hadoop_network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_HDFS_NN_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  hadoop-hdfs-dn-0:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop:v1
    user: "hadoop:hadoop"
    container_name: hadoop-hdfs-dn-0
    hostname: hadoop-hdfs-dn-0
    restart: always
    depends_on:
      - hadoop-hdfs-nn
    env_file:
      - .env
    ports:
      - "30864:${HADOOP_HDFS_DN_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-dn hadoop-hdfs-nn ${HADOOP_HDFS_NN_PORT}"]
    networks:
      - hadoop_network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_HDFS_DN_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  hadoop-hdfs-dn-1:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop:v1
    user: "hadoop:hadoop"
    container_name: hadoop-hdfs-dn-1
    hostname: hadoop-hdfs-dn-1
    restart: always
    depends_on:
      - hadoop-hdfs-nn
    env_file:
      - .env
    ports:
      - "30865:${HADOOP_HDFS_DN_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-dn hadoop-hdfs-nn ${HADOOP_HDFS_NN_PORT}"]
    networks:
      - hadoop_network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_HDFS_DN_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  hadoop-hdfs-dn-2:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop:v1
    user: "hadoop:hadoop"
    container_name: hadoop-hdfs-dn-2
    hostname: hadoop-hdfs-dn-2
    restart: always
    depends_on:
      - hadoop-hdfs-nn
    env_file:
      - .env
    ports:
      - "30866:${HADOOP_HDFS_DN_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-dn hadoop-hdfs-nn ${HADOOP_HDFS_NN_PORT}"]
    networks:
      - hadoop_network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_HDFS_DN_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  hadoop-yarn-rm:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop:v1
    user: "hadoop:hadoop"
    container_name: hadoop-yarn-rm
    hostname: hadoop-yarn-rm
    restart: always
    env_file:
      - .env
    ports:
      - "30888:${HADOOP_YARN_RM_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-rm"]
    networks:
      - hadoop_network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HADOOP_YARN_RM_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  hadoop-yarn-nm-0:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop:v1
    user: "hadoop:hadoop"
    container_name: hadoop-yarn-nm-0
    hostname: hadoop-yarn-nm-0
    restart: always
    depends_on:
      - hadoop-yarn-rm
    env_file:
      - .env
    ports:
      - "30042:${HADOOP_YARN_NM_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-nm hadoop-yarn-rm ${HADOOP_YARN_RM_PORT}"]
    networks:
      - hadoop_network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_YARN_NM_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  hadoop-yarn-nm-1:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop:v1
    user: "hadoop:hadoop"
    container_name: hadoop-yarn-nm-1
    hostname: hadoop-yarn-nm-1
    restart: always
    depends_on:
      - hadoop-yarn-rm
    env_file:
      - .env
    ports:
      - "30043:${HADOOP_YARN_NM_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-nm hadoop-yarn-rm ${HADOOP_YARN_RM_PORT}"]
    networks:
      - hadoop_network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_YARN_NM_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  hadoop-yarn-nm-2:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop:v1
    user: "hadoop:hadoop"
    container_name: hadoop-yarn-nm-2
    hostname: hadoop-yarn-nm-2
    restart: always
    depends_on:
      - hadoop-yarn-rm
    env_file:
      - .env
    ports:
      - "30044:${HADOOP_YARN_NM_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-nm hadoop-yarn-rm ${HADOOP_YARN_RM_PORT}"]
    networks:
      - hadoop_network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_YARN_NM_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  hadoop-yarn-proxyserver:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop:v1
    user: "hadoop:hadoop"
    container_name: hadoop-yarn-proxyserver
    hostname: hadoop-yarn-proxyserver
    restart: always
    depends_on:
      - hadoop-yarn-rm
    env_file:
      - .env
    ports:
      - "30911:${HADOOP_YARN_PROXYSERVER_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-proxyserver hadoop-yarn-rm ${HADOOP_YARN_RM_PORT}"]
    networks:
      - hadoop_network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HADOOP_YARN_PROXYSERVER_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  hadoop-mr-historyserver:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop:v1
    user: "hadoop:hadoop"
    container_name: hadoop-mr-historyserver
    hostname: hadoop-mr-historyserver
    restart: always
    depends_on:
      - hadoop-yarn-rm
    env_file:
      - .env
    ports:
      - "31988:${HADOOP_MR_HISTORYSERVER_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-mr-historyserver hadoop-yarn-rm ${HADOOP_YARN_RM_PORT}"]
    networks:
      - hadoop_network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HADOOP_MR_HISTORYSERVER_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3

networks:
  hadoop_network:
    driver: bridge

The content of the .env file is as follows:

HADOOP_HDFS_NN_PORT=9870
HADOOP_HDFS_DN_PORT=9864
HADOOP_YARN_RM_PORT=8088
HADOOP_YARN_NM_PORT=8042
HADOOP_YARN_PROXYSERVER_PORT=9111
HADOOP_MR_HISTORYSERVER_PORT=19888

[Tips]

  • If containers created by different compose files are not placed on the same network, they cannot reach each other by hostname.
  • depends_on only controls the startup order of containers, not of the services inside them, so on its own it helps little. That is why the bootstrap.sh script above adds a wait_for function to truly control service startup order.
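Relatedly, newer Compose releases (the Compose v2 CLI implementing the Compose specification; the classic version: '3' file format did not support this long form) can drive startup ordering from the healthchecks already defined above, as in this sketch:

```yaml
services:
  hadoop-hdfs-dn-0:
    depends_on:
      hadoop-hdfs-nn:
        condition: service_healthy   # start only after the NameNode healthcheck passes
```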

7) Start the service

# -f docker-compose.yaml can be omitted here; it is only required when the file is not named docker-compose.yaml. -d runs in the background.
docker-compose -f docker-compose.yaml up -d

# Check status
docker-compose -f docker-compose.yaml ps


8) Test verification

HDFS: http://ip:30070/

YARN: http://ip:30888/

That concludes the non-high-availability Hadoop deployment with docker-compose; next we deploy the highly available environment.

7. Hadoop HA deployment (high availability)

1) Install JDK

# The JDK package is in the resource bundle provided at the bottom; you can also download it from the official site.
tar -xf jdk-8u212-linux-x64.tar.gz

# Append the following to /etc/profile:
echo "export JAVA_HOME=`pwd`/jdk1.8.0_212" >> /etc/profile
echo "export PATH=\$JAVA_HOME/bin:\$PATH" >> /etc/profile
echo "export CLASSPATH=.:\$JAVA_HOME/lib/dt.jar:\$JAVA_HOME/lib/tools.jar" >> /etc/profile

# Reload to apply
source /etc/profile

2) Download hadoop related software

### 1. ZooKeeper
# Download: https://zookeeper.apache.org/releases.html
# (Not needed in the non-HA deployment)
wget https://dlcdn.apache.org/zookeeper/zookeeper-3.8.0/apache-zookeeper-3.8.0-bin.tar.gz --no-check-certificate
tar -xf  apache-zookeeper-3.8.0-bin.tar.gz

### 2. Hadoop
# Download: https://dlcdn.apache.org/hadoop/common/
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.5/hadoop-3.3.5.tar.gz --no-check-certificate

### 3. Spark
# Download: http://spark.apache.org/downloads.html
wget https://dlcdn.apache.org/spark/spark-3.3.2/spark-3.3.2-bin-hadoop3.tgz --no-check-certificate

### 4. Flink
wget https://dlcdn.apache.org/flink/flink-1.17.0/flink-1.17.0-bin-scala_2.12.tgz --no-check-certificate

### 5. Hive (referenced by the Dockerfile below)
# Download: http://archive.apache.org/dist/hive
wget http://archive.apache.org/dist/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz

3) Build the image (Dockerfile)

FROM registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/centos:7.7.1908

RUN rm -f /etc/localtime && ln -sv /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && echo "Asia/Shanghai" > /etc/timezone

RUN export LANG=zh_CN.UTF-8

# Create the user and group, matching "user: 10000:10000" in the yaml orchestration below
RUN groupadd --system --gid=10000 hadoop && useradd --system --home-dir /home/hadoop --uid=10000 --gid=hadoop hadoop

# Install sudo
RUN yum -y install sudo ; chmod 640 /etc/sudoers

# Grant the hadoop user passwordless sudo
RUN echo "hadoop ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers

RUN yum -y install net-tools telnet wget

RUN mkdir /opt/apache/

# Install the JDK
ADD jdk-8u212-linux-x64.tar.gz /opt/apache/
ENV JAVA_HOME /opt/apache/jdk1.8.0_212
ENV PATH $JAVA_HOME/bin:$PATH

# Configure ZooKeeper
ENV ZOOKEEPER_VERSION 3.8.0
ADD apache-zookeeper-${ZOOKEEPER_VERSION}-bin.tar.gz /opt/apache/
ENV ZOOKEEPER_HOME /opt/apache/zookeeper
RUN ln -s /opt/apache/apache-zookeeper-${ZOOKEEPER_VERSION}-bin $ZOOKEEPER_HOME
COPY config/zookeeper-config/* ${ZOOKEEPER_HOME}/conf/

# Configure Hadoop
ENV HADOOP_VERSION 3.3.5
ADD hadoop-${HADOOP_VERSION}.tar.gz /opt/apache/
ENV HADOOP_HOME /opt/apache/hadoop
RUN ln -s /opt/apache/hadoop-${HADOOP_VERSION} $HADOOP_HOME

ENV HADOOP_COMMON_HOME=${HADOOP_HOME} \
    HADOOP_HDFS_HOME=${HADOOP_HOME} \
    HADOOP_MAPRED_HOME=${HADOOP_HOME} \
    HADOOP_YARN_HOME=${HADOOP_HOME} \
    HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop \
    PATH=${PATH}:${HADOOP_HOME}/bin

# Configure Hive
ENV HIVE_VERSION 3.1.3
ADD apache-hive-${HIVE_VERSION}-bin.tar.gz /opt/apache/
ENV HIVE_HOME=/opt/apache/hive
ENV PATH=$HIVE_HOME/bin:$PATH
RUN ln -s /opt/apache/apache-hive-${HIVE_VERSION}-bin ${HIVE_HOME}

# Configure Spark
ENV SPARK_VERSION 3.3.2
ADD spark-${SPARK_VERSION}-bin-hadoop3.tgz /opt/apache/
ENV SPARK_HOME=/opt/apache/spark
ENV PATH=$SPARK_HOME/bin:$PATH
RUN ln -s /opt/apache/spark-${SPARK_VERSION}-bin-hadoop3 ${SPARK_HOME}

# Configure Flink
ENV FLINK_VERSION 1.17.0
ADD flink-${FLINK_VERSION}-bin-scala_2.12.tgz /opt/apache/
ENV FLINK_HOME=/opt/apache/flink
ENV PATH=$FLINK_HOME/bin:$PATH
RUN ln -s /opt/apache/flink-${FLINK_VERSION} ${FLINK_HOME}

# Create NameNode and DataNode storage directories
RUN mkdir -p /opt/apache/hadoop/data/{hdfs,yarn} /opt/apache/hadoop/data/hdfs/{journalnode,namenode} /opt/apache/hadoop/data/hdfs/datanode/data{1..3} /opt/apache/hadoop/data/yarn/{local-dirs,log-dirs,apps}

COPY bootstrap.sh /opt/apache/

COPY config/hadoop-config/* ${HADOOP_HOME}/etc/hadoop/

RUN chown -R hadoop:hadoop /opt/apache

ENV ll "ls -l"

WORKDIR /opt/apache

Start building the image

docker build -t hadoop-ha:v1 . --no-cache

# To let readers pull and use the image directly, it is pushed to an Aliyun image registry here
docker tag hadoop-ha:v1 registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop-ha:v1
docker push registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop-ha:v1

### Flag explanations
# -t: image name (and tag)
# . : build context, the directory containing the Dockerfile
# -f: path to the Dockerfile
# --no-cache: build without using the cache

4) Configuration

The configuration files are all included in the resource package provided at the bottom.

1. Hadoop configuration

The main files are: core-site.xml, dfs.hosts, dfs.hosts.exclude, hdfs-site.xml, mapred-site.xml, yarn-hosts-exclude, yarn-hosts-include, yarn-site.xml
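For orientation only (the real files ship in the resource package), an HA hdfs-site.xml typically wires the NameNode pair and the JournalNode quorum together along these lines; the nameservice ID and ports here are illustrative assumptions:

```xml
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn0,nn1</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn0</name>
    <value>hadoop-hdfs-nn-0:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>hadoop-hdfs-nn-1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop-hdfs-jn-0:8485;hadoop-hdfs-jn-1:8485;hadoop-hdfs-jn-2:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
</configuration>
```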

2. Hive configuration

The main files are hive-env.sh and hive-site.xml. This article does not cover the Hive part; it will be explained in the next article.

5) Startup script bootstrap.sh

#!/usr/bin/env sh

wait_for() {
    echo Waiting for $1 to listen on $2...
    while ! nc -z $1 $2; do echo waiting...; sleep 1s; done
}

start_zookeeper() {

        ${ZOOKEEPER_HOME}/bin/zkServer.sh start

        tail -f ${ZOOKEEPER_HOME}/logs/zookeeper-*.out
}

start_hdfs_journalnode() {

        wait_for $1 $2

        ${HADOOP_HOME}/bin/hdfs --loglevel INFO --daemon start journalnode

        tail -f ${HADOOP_HOME}/logs/*journalnode*.log
}

start_hdfs_namenode() {

        wait_for $1 $2

        if [ ! -f /opt/apache/hadoop/data/hdfs/namenode/formated ];then
                ${ZOOKEEPER_HOME}/bin/zkCli.sh -server zookeeper:${ZOOKEEPER_PORT} ls /hadoop-ha 1>/dev/null
                if [ $? -ne 0 ];then
                        $HADOOP_HOME/bin/hdfs zkfc -formatZK
                        $HADOOP_HOME/bin/hdfs namenode -format -force -nonInteractive && echo 1 > /opt/apache/hadoop/data/hdfs/namenode/formated
                else
                        $HADOOP_HOME/bin/hdfs namenode -bootstrapStandby && echo 1 > /opt/apache/hadoop/data/hdfs/namenode/formated
                fi
        fi

        $HADOOP_HOME/bin/hdfs --loglevel INFO --daemon start zkfc
        $HADOOP_HOME/bin/hdfs --loglevel INFO --daemon start namenode

        tail -f ${HADOOP_HOME}/logs/*.out
}

start_hdfs_datanode() {

        wait_for $1 $2

        ${HADOOP_HOME}/bin/hdfs --loglevel INFO --daemon start datanode

        tail -f ${HADOOP_HOME}/logs/*datanode*.log
}

start_yarn_resourcemanager() {

        wait_for $1 $2

        ${HADOOP_HOME}/bin/yarn --loglevel INFO --daemon start resourcemanager

        tail -f ${HADOOP_HOME}/logs/*resourcemanager*.log
}

start_yarn_nodemanager() {

        wait_for $1 $2

        ${HADOOP_HOME}/bin/yarn --loglevel INFO --daemon start nodemanager

        tail -f ${HADOOP_HOME}/logs/*nodemanager*.log
}

start_yarn_proxyserver() {

        wait_for $1 $2

        ${HADOOP_HOME}/bin/yarn --loglevel INFO --daemon start proxyserver

        tail -f ${HADOOP_HOME}/logs/*proxyserver*.log
}

start_mr_historyserver() {

        wait_for $1 $2

        ${HADOOP_HOME}/bin/mapred --loglevel INFO --daemon start historyserver

        tail -f ${HADOOP_HOME}/logs/*historyserver*.log
}

case $1 in
        zookeeper)
                start_zookeeper
                ;;
        hadoop-hdfs-jn)
                start_hdfs_journalnode $2 $3
                ;;
        hadoop-hdfs-nn)
                start_hdfs_namenode $2 $3
                ;;
        hadoop-hdfs-dn)
                start_hdfs_datanode $2 $3
                ;;
        hadoop-yarn-rm)
                start_yarn_resourcemanager $2 $3
                ;;
        hadoop-yarn-nm)
                start_yarn_nodemanager $2 $3
                ;;
        hadoop-yarn-proxyserver)
                start_yarn_proxyserver $2 $3
                ;;
        hadoop-mr-historyserver)
                start_mr_historyserver $2 $3
                ;;
        *)
                echo "Please enter a valid service start command."
        ;;
esac

6) YAML orchestration docker-compose.yaml

version: '3'
services:
  zookeeper:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop-ha:v1
    user: "hadoop:hadoop"
    container_name: zookeeper
    hostname: zookeeper
    restart: always
    env_file:
      - .env
    ports:
      - ${ZOOKEEPER_PORT}
    command: ["sh","-c","/opt/apache/bootstrap.sh zookeeper"]
    networks:
      - hadoopha_network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${ZOOKEEPER_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  hadoop-hdfs-jn-0:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop-ha:v1
    user: "hadoop:hadoop"
    container_name: hadoop-hdfs-jn-0
    hostname: hadoop-hdfs-jn-0
    restart: always
    depends_on:
      - zookeeper
    env_file:
      - .env
    expose:
      - ${HADOOP_HDFS_JN_PORT}
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-jn zookeeper ${ZOOKEEPER_PORT}"]
    networks:
      - hadoopha_network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HADOOP_HDFS_JN_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  hadoop-hdfs-jn-1:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop-ha:v1
    user: "hadoop:hadoop"
    container_name: hadoop-hdfs-jn-1
    hostname: hadoop-hdfs-jn-1
    restart: always
    depends_on:
      - hadoop-hdfs-jn-0
    env_file:
      - .env
    expose:
      - ${HADOOP_HDFS_JN_PORT}
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-jn zookeeper ${ZOOKEEPER_PORT}"]
    networks:
      - hadoopha_network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HADOOP_HDFS_JN_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  hadoop-hdfs-jn-2:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop-ha:v1
    user: "hadoop:hadoop"
    container_name: hadoop-hdfs-jn-2
    hostname: hadoop-hdfs-jn-2
    restart: always
    depends_on:
      - hadoop-hdfs-jn-1
    env_file:
      - .env
    expose:
      - "${HADOOP_HDFS_JN_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-jn zookeeper ${ZOOKEEPER_PORT}"]
    networks:
      - hadoopha_network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HADOOP_HDFS_JN_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  hadoop-hdfs-nn-0:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop-ha:v1
    user: "hadoop:hadoop"
    container_name: hadoop-hdfs-nn-0
    hostname: hadoop-hdfs-nn-0
    restart: always
    depends_on:
      - hadoop-hdfs-jn-2
    env_file:
      - .env
    ports:
      - "30070:${HADOOP_HDFS_NN_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-nn hadoop-hdfs-jn-2 ${HADOOP_HDFS_JN_PORT}"]
    networks:
      - hadoopha_network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HADOOP_HDFS_NN_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  hadoop-hdfs-nn-1:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop-ha:v1
    user: "hadoop:hadoop"
    container_name: hadoop-hdfs-nn-1
    hostname: hadoop-hdfs-nn-1
    restart: always
    depends_on:
      - hadoop-hdfs-nn-0
    env_file:
      - .env
    ports:
      - "30071:${HADOOP_HDFS_NN_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-nn hadoop-hdfs-nn-0 ${HADOOP_HDFS_NN_PORT}"]
    networks:
      - hadoopha_network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HADOOP_HDFS_NN_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 6
  hadoop-hdfs-dn-0:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop-ha:v1
    user: "hadoop:hadoop"
    container_name: hadoop-hdfs-dn-0
    hostname: hadoop-hdfs-dn-0
    restart: always
    depends_on:
      - hadoop-hdfs-nn-1
    env_file:
      - .env
    ports:
      - "30864:${HADOOP_HDFS_DN_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-dn hadoop-hdfs-nn-1 ${HADOOP_HDFS_NN_PORT}"]
    networks:
      - hadoopha_network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_HDFS_DN_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 8
  hadoop-hdfs-dn-1:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop-ha:v1
    user: "hadoop:hadoop"
    container_name: hadoop-hdfs-dn-1
    hostname: hadoop-hdfs-dn-1
    restart: always
    depends_on:
      - hadoop-hdfs-nn-1
    env_file:
      - .env
    ports:
      - "30865:${HADOOP_HDFS_DN_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-dn hadoop-hdfs-nn-1 ${HADOOP_HDFS_NN_PORT}"]
    networks:
      - hadoopha_network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_HDFS_DN_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 8
  hadoop-hdfs-dn-2:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop-ha:v1
    user: "hadoop:hadoop"
    container_name: hadoop-hdfs-dn-2
    hostname: hadoop-hdfs-dn-2
    restart: always
    depends_on:
      - hadoop-hdfs-nn-1
    env_file:
      - .env
    ports:
      - "30866:${HADOOP_HDFS_DN_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-dn hadoop-hdfs-nn-1 ${HADOOP_HDFS_NN_PORT}"]
    networks:
      - hadoopha_network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_HDFS_DN_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 8
  hadoop-yarn-rm-0:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop-ha:v1
    user: "hadoop:hadoop"
    container_name: hadoop-yarn-rm-0
    hostname: hadoop-yarn-rm-0
    restart: always
    depends_on:
      - zookeeper
    env_file:
      - .env
    ports:
      - "30888:${HADOOP_YARN_RM_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-rm zookeeper ${ZOOKEEPER_PORT}"]
    networks:
      - hadoopha_network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HADOOP_YARN_RM_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  hadoop-yarn-rm-1:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop-ha:v1
    user: "hadoop:hadoop"
    container_name: hadoop-yarn-rm-1
    hostname: hadoop-yarn-rm-1
    restart: always
    depends_on:
      - hadoop-yarn-rm-0
    env_file:
      - .env
    ports:
      - "30889:${HADOOP_YARN_RM_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-rm hadoop-yarn-rm-0 ${HADOOP_YARN_RM_PORT}"]
    networks:
      - hadoopha_network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HADOOP_YARN_RM_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  hadoop-yarn-nm-0:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop-ha:v1
    user: "hadoop:hadoop"
    container_name: hadoop-yarn-nm-0
    hostname: hadoop-yarn-nm-0
    restart: always
    depends_on:
      - hadoop-yarn-rm-1
    env_file:
      - .env
    ports:
      - "30042:${HADOOP_YARN_NM_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-nm hadoop-yarn-rm-1 ${HADOOP_YARN_RM_PORT}"]
    networks:
      - hadoopha_network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_YARN_NM_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  hadoop-yarn-nm-1:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop-ha:v1
    user: "hadoop:hadoop"
    container_name: hadoop-yarn-nm-1
    hostname: hadoop-yarn-nm-1
    restart: always
    depends_on:
      - hadoop-yarn-rm-1
    env_file:
      - .env
    ports:
      - "30043:${HADOOP_YARN_NM_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-nm hadoop-yarn-rm-1 ${HADOOP_YARN_RM_PORT}"]
    networks:
      - hadoopha_network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_YARN_NM_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  hadoop-yarn-nm-2:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop-ha:v1
    user: "hadoop:hadoop"
    container_name: hadoop-yarn-nm-2
    hostname: hadoop-yarn-nm-2
    restart: always
    depends_on:
      - hadoop-yarn-rm-1
    env_file:
      - .env
    ports:
      - "30044:${HADOOP_YARN_NM_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-nm hadoop-yarn-rm-1 ${HADOOP_YARN_RM_PORT}"]
    networks:
      - hadoopha_network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_YARN_NM_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  hadoop-yarn-proxyserver:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop-ha:v1
    user: "hadoop:hadoop"
    container_name: hadoop-yarn-proxyserver
    hostname: hadoop-yarn-proxyserver
    restart: always
    depends_on:
      - hadoop-yarn-rm-1
    env_file:
      - .env
    ports:
      - "30911:${HADOOP_YARN_PROXYSERVER_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-proxyserver hadoop-yarn-rm-1 ${HADOOP_YARN_RM_PORT}"]
    networks:
      - hadoopha_network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HADOOP_YARN_PROXYSERVER_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
  hadoop-mr-historyserver:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop-ha:v1
    user: "hadoop:hadoop"
    container_name: hadoop-mr-historyserver
    hostname: hadoop-mr-historyserver
    restart: always
    depends_on:
      - hadoop-yarn-rm-1
    env_file:
      - .env
    ports:
      - "31988:${HADOOP_MR_HISTORYSERVER_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-mr-historyserver hadoop-yarn-rm-1 ${HADOOP_YARN_RM_PORT}"]
    networks:
      - hadoopha_network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HADOOP_MR_HISTORYSERVER_PORT} || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 6

networks:
  hadoopha_network:
    driver: bridge
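Note that the compose file above declares no volumes, so HDFS metadata and block data live inside the container filesystems and are lost when the containers are removed. As a hedged sketch only, persistence could be added per service with named volumes; the container-side path below is an assumption, so check the image's hdfs-site.xml for the real `dfs.namenode.name.dir` / `dfs.datanode.data.dir` values before using it:

```yaml
# Sketch only: add a volumes entry under each NameNode/DataNode service,
# plus a top-level volumes section. The in-container path is assumed.
services:
  hadoop-hdfs-nn-0:
    volumes:
      - hadoop_nn0_data:/opt/apache/hadoop/data/hdfs/namenode

volumes:
  hadoop_nn0_data:
```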
    

.env

ZOOKEEPER_PORT=2181
HADOOP_HDFS_JN_PORT=8485
HADOOP_HDFS_NN_PORT=9870
HADOOP_HDFS_DN_PORT=9864
HADOOP_YARN_RM_PORT=8088
HADOOP_YARN_NM_PORT=8042
HADOOP_YARN_PROXYSERVER_PORT=9111
HADOOP_MR_HISTORYSERVER_PORT=19888
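Every port in the compose file comes from `.env` via `${VAR}` substitution, so a missing variable silently breaks the port mappings. A small pre-flight sketch (a hypothetical helper, not part of the project) that checks all eight variables are defined before running `docker-compose up`:

```shell
# Pre-flight sketch: check that every variable the compose file substitutes
# is defined in the given env file; print any that are missing.
check_env() {
  envfile="$1"
  required="ZOOKEEPER_PORT HADOOP_HDFS_JN_PORT HADOOP_HDFS_NN_PORT \
HADOOP_HDFS_DN_PORT HADOOP_YARN_RM_PORT HADOOP_YARN_NM_PORT \
HADOOP_YARN_PROXYSERVER_PORT HADOOP_MR_HISTORYSERVER_PORT"
  missing=0
  for v in $required; do
    if ! grep -q "^${v}=" "$envfile" 2>/dev/null; then
      echo "missing in $envfile: $v"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] && echo "all variables present"
}

# Usage: check_env .env   # run in the compose directory before docker-compose up
```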

7) Start the service

# -f docker-compose.yaml can be omitted here; it is only required when the file is not named docker-compose.yaml. -d runs in the background
docker-compose -f docker-compose.yaml up -d

# Check the status
docker-compose -f docker-compose.yaml ps
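Because every service defines a healthcheck, the status column of `docker ps` / `docker-compose ps` reports each container's health. A small helper sketch (hypothetical, not part of the project) that filters that output down to the containers not yet `(healthy)`, which is handy for polling right after startup:

```shell
# Sketch: read `docker ps`-style "NAME STATUS" lines on stdin and print the
# names of containers whose status does not yet contain "(healthy)".
not_healthy() {
  grep -v '(healthy)' | awk '{print $1}'
}

# Intended usage on the docker host:
#   docker ps --format '{{.Names}} {{.Status}}' | not_healthy
```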


8) Test verification

HDFS Web UI: http://ip:30070 and http://ip:30071

NameNode active node: (web UI screenshot)
NameNode standby node: (web UI screenshot)
DataNode node: (web UI screenshot)
Verify from the command line:

# Log in to any of the containers
docker exec -it hadoop-hdfs-jn-0 bash

hdfs dfs -ls /
hdfs dfs -touchz /test
hdfs dfs -mkdir /test123
hdfs dfs -ls /


YARN Web UI: http://ip:30888 and http://ip:30889

ResourceManager active node: (web UI screenshot)
ResourceManager standby node: (web UI screenshot)
NodeManager node: (web UI screenshot)


Git repository: https://gitee.com/hadoop-bigdata/docker-compose-hadoop

This concludes the detailed walkthrough of quickly deploying a Hadoop cluster through docker-compose. If you have any questions, please leave me a message. You can follow my official account [Big Data and Cloud Native Technology Sharing] and reply [dch] to get the full set of resources above~


Origin blog.csdn.net/qq_35745940/article/details/129763706