A detailed tutorial on quickly deploying Hive through docker-compose

1. Overview

This docker-compose deployment of Hive builds directly on the Hadoop deployment from the previous article. Hive is the most commonly used data warehouse service, so it is well worth integrating; please read the content below carefully. The point of deploying services through docker-compose is to bring them up quickly with minimal resource and time cost, which makes it convenient for learning, testing, and verifying functionality~

For Hadoop deployment, please refer to my following articles:

It is best to read the Hadoop deployment article first; this article builds directly on it, and parts of it may be hard to follow otherwise.

For an introduction to Hive, please refer to my following article: Big Data Hadoop - Data Warehouse Hive

2. Preliminary preparation

1) Deploy docker

# Install the yum-config-manager utility
yum -y install yum-utils

# The Aliyun mirror of the docker-ce repo is recommended:
#yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

# Install docker-ce
yum install -y docker-ce
# Start Docker and enable it at boot
systemctl enable --now docker
docker --version

2) Deploy docker-compose

curl -SL https://github.com/docker/compose/releases/download/v2.16.0/docker-compose-linux-x86_64 -o /usr/local/bin/docker-compose

chmod +x /usr/local/bin/docker-compose
docker-compose --version

3. Create a network

# Create the network. Note: do NOT use the name hadoop_network, or the hs2 service will fail to start!!!
docker network create hadoop-network

# Check
docker network ls
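
To double-check the network and, once the services are up, see which containers have joined it (standard Docker CLI):

# The Containers field fills in as services attach to the network
docker network inspect hadoop-network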

4. MySQL deployment

1) Pull the MySQL image

docker pull mysql:5.7
docker tag mysql:5.7 registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/mysql:5.7

2) Configuration

mkdir -p conf/ data/db/

cat >conf/my.cnf<<EOF
[mysqld]
character-set-server=utf8
log-bin=mysql-bin
server-id=1
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
datadir         = /var/lib/mysql
sql_mode=STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
symbolic-links=0
secure_file_priv =
wait_timeout=120
interactive_timeout=120
default-time_zone = '+8:00'
skip-external-locking
skip-name-resolve
open_files_limit = 10240
max_connections = 1000
max_connect_errors = 6000
table_open_cache = 800
max_allowed_packet = 40m
sort_buffer_size = 2M
join_buffer_size = 1M
thread_cache_size = 32
query_cache_size = 64M
transaction_isolation = READ-COMMITTED
tmp_table_size = 128M
max_heap_table_size = 128M
log-bin = mysql-bin
sync-binlog = 1
binlog_format = ROW
binlog_cache_size = 1M
key_buffer_size = 128M
read_buffer_size = 2M
read_rnd_buffer_size = 4M
bulk_insert_buffer_size = 64M
lower_case_table_names = 1
explicit_defaults_for_timestamp=true
skip_name_resolve = ON
event_scheduler = ON
log_bin_trust_function_creators = 1
innodb_buffer_pool_size = 512M
innodb_flush_log_at_trx_commit = 1
innodb_file_per_table = 1
innodb_log_buffer_size = 4M
innodb_log_file_size = 256M
innodb_max_dirty_pages_pct = 90
innodb_read_io_threads = 4
innodb_write_io_threads = 4
EOF

3) Orchestration (mysql-compose.yaml)

version: '3'
services:
  db:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/mysql:5.7 # MySQL image
    container_name: mysql
    hostname: mysql
    volumes:
      - ./data/db:/var/lib/mysql
      - ./conf/my.cnf:/etc/mysql/mysql.conf.d/mysqld.cnf
    restart: always
    ports:
      - 13306:3306
    networks:
      - hadoop-network
    environment:
      MYSQL_ROOT_PASSWORD: 123456 # root password
      secure_file_priv:
    healthcheck:
      test: ["CMD-SHELL", "curl -I localhost:3306 || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3

# Attach to the pre-created external network
networks:
  hadoop-network:
    external: true

4) Deploy mysql

docker-compose -f mysql-compose.yaml up -d
docker-compose -f mysql-compose.yaml ps

# Log in to the container
docker exec -it mysql mysql -uroot -p123456
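
MySQL can also be reached from the host through the published port; a quick check, assuming a local mysql client is installed:

# 13306 on the host maps to 3306 in the container
mysql -h127.0.0.1 -P13306 -uroot -p123456 -e "SHOW DATABASES;"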


5. Hive deployment

1) Download hive

Download address: http://archive.apache.org/dist/hive

# Download
wget http://archive.apache.org/dist/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz

# Extract
tar -zxvf apache-hive-3.1.3-bin.tar.gz
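
The image build in step 4 below also expects the MySQL Connector/J 5.1.49 driver next to the Dockerfile. A sketch of fetching it (the download URL is an assumption based on MySQL's archive site; adjust as needed):

# Download and unpack Connector/J 5.1.49;
# yields mysql-connector-java-5.1.49/mysql-connector-java-5.1.49-bin.jar
wget https://downloads.mysql.com/archives/get/p/3/file/mysql-connector-java-5.1.49.tar.gz
tar -zxvf mysql-connector-java-5.1.49.tar.gz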

2) Configuration

images/hive-config/hive-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
        <!-- 配置hdfs存储目录 -->
        <property>
                <name>hive.metastore.warehouse.dir</name>
                <value>/user/hive_remote/warehouse</value>
        </property>

        <property>
                <name>hive.metastore.local</name>
                <value>false</value>
        </property>

        <!-- 所连接的 MySQL 数据库的地址,hive_local是数据库,程序会自动创建,自定义就行 -->
        <property>
                <name>javax.jdo.option.ConnectionURL</name>
                <value>jdbc:mysql://mysql:3306/hive_metastore?createDatabaseIfNotExist=true&amp;useSSL=false&amp;serverTimezone=Asia/Shanghai</value>
        </property>

        <!-- MySQL 驱动 -->
        <property>
                <name>javax.jdo.option.ConnectionDriverName</name>
                <!--<value>com.mysql.cj.jdbc.Driver</value>-->
                <value>com.mysql.jdbc.Driver</value>
        </property>

        <!-- mysql连接用户 -->
        <property>
                <name>javax.jdo.option.ConnectionUserName</name>
                <value>root</value>
        </property>

        <!-- mysql连接密码 -->
        <property>
                <name>javax.jdo.option.ConnectionPassword</name>
                <value>123456</value>
        </property>

        <!--元数据是否校验-->
        <property>
                <name>hive.metastore.schema.verification</name>
                <value>false</value>
        </property>

        <property>
                <name>system:user.name</name>
                <value>root</value>
                <description>user name</description>
        </property>

        <property>
                <name>hive.metastore.uris</name>
                <value>thrift://hive-metastore:9083</value>
        </property>

        <!-- host -->
        <property>
                <name>hive.server2.thrift.bind.host</name>
                <value>0.0.0.0</value>
                <description>Bind host on which to run the HiveServer2 Thrift service.</description>
        </property>

        <!-- hs2端口 默认是10000-->
        <property>
                <name>hive.server2.thrift.port</name>
                <value>10000</value>
        </property>

        <property>
                <name>hive.server2.active.passive.ha.enable</name>
                <value>true</value>
        </property>

</configuration>

3) Startup script (bootstrap.sh)

#!/usr/bin/env sh


# Block until host $1 is listening on port $2
wait_for() {
    echo Waiting for $1 to listen on $2...
    while ! nc -z $1 $2; do echo waiting...; sleep 1s; done
}

start_hdfs_namenode() {
        # Format HDFS only on first start; the marker file prevents reformatting
        if [ ! -f /tmp/namenode-formated ]; then
                ${HADOOP_HOME}/bin/hdfs namenode -format >/tmp/namenode-formated
        fi

        ${HADOOP_HOME}/bin/hdfs --loglevel INFO --daemon start namenode

        # Keep the container in the foreground by tailing the log
        tail -f ${HADOOP_HOME}/logs/*namenode*.log
}

start_hdfs_datanode() {
        # Wait for the namenode ($1:$2) before starting
        wait_for $1 $2

        ${HADOOP_HOME}/bin/hdfs --loglevel INFO --daemon start datanode

        tail -f ${HADOOP_HOME}/logs/*datanode*.log
}

start_yarn_resourcemanager() {
        ${HADOOP_HOME}/bin/yarn --loglevel INFO --daemon start resourcemanager

        tail -f ${HADOOP_HOME}/logs/*resourcemanager*.log
}

start_yarn_nodemanager() {
        # Wait for the resourcemanager ($1:$2) before starting
        wait_for $1 $2

        ${HADOOP_HOME}/bin/yarn --loglevel INFO --daemon start nodemanager

        tail -f ${HADOOP_HOME}/logs/*nodemanager*.log
}

start_yarn_proxyserver() {
        # Wait for the resourcemanager ($1:$2) before starting
        wait_for $1 $2

        ${HADOOP_HOME}/bin/yarn --loglevel INFO --daemon start proxyserver

        tail -f ${HADOOP_HOME}/logs/*proxyserver*.log
}

start_mr_historyserver() {
        # Wait for the resourcemanager ($1:$2) before starting
        wait_for $1 $2

        ${HADOOP_HOME}/bin/mapred --loglevel INFO --daemon start historyserver

        tail -f ${HADOOP_HOME}/logs/*historyserver*.log
}

start_hive_metastore() {
        # Initialize the metastore schema in MySQL only once; the marker file prevents re-running
        if [ ! -f ${HIVE_HOME}/formated ]; then
                schematool -initSchema -dbType mysql --verbose > ${HIVE_HOME}/formated
        fi

        $HIVE_HOME/bin/hive --service metastore
}

start_hive_hiveserver2() {
        $HIVE_HOME/bin/hive --service hiveserver2
}


case $1 in
        hadoop-hdfs-nn)
                start_hdfs_namenode
                ;;
        hadoop-hdfs-dn)
                start_hdfs_datanode $2 $3
                ;;
        hadoop-yarn-rm)
                start_yarn_resourcemanager
                ;;
        hadoop-yarn-nm)
                start_yarn_nodemanager $2 $3
                ;;
        hadoop-yarn-proxyserver)
                start_yarn_proxyserver $2 $3
                ;;
        hadoop-mr-historyserver)
                start_mr_historyserver $2 $3
                ;;
        hive-metastore)
                start_hive_metastore $2 $3
                ;;
        hive-hiveserver2)
                start_hive_hiveserver2 $2 $3
                ;;
        *)
                echo "请输入正确的服务启动命令~"
        ;;
esac
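
Before building, note the layout the Dockerfile's COPY instructions imply for the build context (inferred from the instructions, not stated in the original):

.
├── Dockerfile
├── bootstrap.sh                        # the startup script above
├── hive-config/
│   └── hive-site.xml                   # the hive-site.xml above
└── mysql-connector-java-5.1.49/
    └── mysql-connector-java-5.1.49-bin.jar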

4) Build the image (Dockerfile)

FROM registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop:v1

# Hive configuration
COPY hive-config/* ${HIVE_HOME}/conf/

# Startup script
COPY bootstrap.sh /opt/apache/

# MySQL JDBC driver for the metastore
COPY mysql-connector-java-5.1.49/mysql-connector-java-5.1.49-bin.jar ${HIVE_HOME}/lib/

RUN sudo mkdir -p /home/hadoop/ && sudo chown -R hadoop:hadoop /home/hadoop/

#RUN yum -y install which

Start building the image

# Build the image
docker build -t registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1 . --no-cache

# Push the image (optional)
docker push registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1

### Option reference
# -t: image name (and tag)
# . : build context containing the Dockerfile
# -f: path to the Dockerfile (when it is not ./Dockerfile)
# --no-cache: build without using the cache
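
A quick check that the build produced the image locally:

docker images | grep hadoop_hive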

5) Orchestration (docker-compose.yaml)

version: '3'
services:
  hadoop-hdfs-nn:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1
    user: "hadoop:hadoop"
    container_name: hadoop-hdfs-nn
    hostname: hadoop-hdfs-nn
    restart: always
    privileged: true
    env_file:
      - .env
    ports:
      - "30070:${HADOOP_HDFS_NN_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-nn"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_HDFS_NN_PORT} || exit 1"]
      interval: 20s
      timeout: 20s
      retries: 3
  hadoop-hdfs-dn-0:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1
    user: "hadoop:hadoop"
    container_name: hadoop-hdfs-dn-0
    hostname: hadoop-hdfs-dn-0
    restart: always
    depends_on:
      - hadoop-hdfs-nn
    env_file:
      - .env
    ports:
      - "30864:${HADOOP_HDFS_DN_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-dn hadoop-hdfs-nn ${HADOOP_HDFS_NN_PORT}"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_HDFS_DN_PORT} || exit 1"]
      interval: 30s
      timeout: 30s
      retries: 3
  hadoop-hdfs-dn-1:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1
    user: "hadoop:hadoop"
    container_name: hadoop-hdfs-dn-1
    hostname: hadoop-hdfs-dn-1
    restart: always
    depends_on:
      - hadoop-hdfs-nn
    env_file:
      - .env
    ports:
      - "30865:${HADOOP_HDFS_DN_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-dn hadoop-hdfs-nn ${HADOOP_HDFS_NN_PORT}"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_HDFS_DN_PORT} || exit 1"]
      interval: 30s
      timeout: 30s
      retries: 3
  hadoop-hdfs-dn-2:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1
    user: "hadoop:hadoop"
    container_name: hadoop-hdfs-dn-2
    hostname: hadoop-hdfs-dn-2
    restart: always
    depends_on:
      - hadoop-hdfs-nn
    env_file:
      - .env
    ports:
      - "30866:${HADOOP_HDFS_DN_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-dn hadoop-hdfs-nn ${HADOOP_HDFS_NN_PORT}"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_HDFS_DN_PORT} || exit 1"]
      interval: 30s
      timeout: 30s
      retries: 3
  hadoop-yarn-rm:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1
    user: "hadoop:hadoop"
    container_name: hadoop-yarn-rm
    hostname: hadoop-yarn-rm
    restart: always
    env_file:
      - .env
    ports:
      - "30888:${HADOOP_YARN_RM_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-rm"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HADOOP_YARN_RM_PORT} || exit 1"]
      interval: 20s
      timeout: 20s
      retries: 3
  hadoop-yarn-nm-0:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1
    user: "hadoop:hadoop"
    container_name: hadoop-yarn-nm-0
    hostname: hadoop-yarn-nm-0
    restart: always
    depends_on:
      - hadoop-yarn-rm
    env_file:
      - .env
    ports:
      - "30042:${HADOOP_YARN_NM_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-nm hadoop-yarn-rm ${HADOOP_YARN_RM_PORT}"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_YARN_NM_PORT} || exit 1"]
      interval: 30s
      timeout: 30s
      retries: 3
  hadoop-yarn-nm-1:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1
    user: "hadoop:hadoop"
    container_name: hadoop-yarn-nm-1
    hostname: hadoop-yarn-nm-1
    restart: always
    depends_on:
      - hadoop-yarn-rm
    env_file:
      - .env
    ports:
      - "30043:${HADOOP_YARN_NM_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-nm hadoop-yarn-rm ${HADOOP_YARN_RM_PORT}"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_YARN_NM_PORT} || exit 1"]
      interval: 30s
      timeout: 30s
      retries: 3
  hadoop-yarn-nm-2:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1
    user: "hadoop:hadoop"
    container_name: hadoop-yarn-nm-2
    hostname: hadoop-yarn-nm-2
    restart: always
    depends_on:
      - hadoop-yarn-rm
    env_file:
      - .env
    ports:
      - "30044:${HADOOP_YARN_NM_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-nm hadoop-yarn-rm ${HADOOP_YARN_RM_PORT}"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_YARN_NM_PORT} || exit 1"]
      interval: 30s
      timeout: 30s
      retries: 3
  hadoop-yarn-proxyserver:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1
    user: "hadoop:hadoop"
    container_name: hadoop-yarn-proxyserver
    hostname: hadoop-yarn-proxyserver
    restart: always
    depends_on:
      - hadoop-yarn-rm
    env_file:
      - .env
    ports:
      - "30911:${HADOOP_YARN_PROXYSERVER_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-proxyserver hadoop-yarn-rm ${HADOOP_YARN_RM_PORT}"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HADOOP_YARN_PROXYSERVER_PORT} || exit 1"]
      interval: 30s
      timeout: 30s
      retries: 3
  hadoop-mr-historyserver:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1
    user: "hadoop:hadoop"
    container_name: hadoop-mr-historyserver
    hostname: hadoop-mr-historyserver
    restart: always
    depends_on:
      - hadoop-yarn-rm
    env_file:
      - .env
    ports:
      - "31988:${HADOOP_MR_HISTORYSERVER_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-mr-historyserver hadoop-yarn-rm ${HADOOP_YARN_RM_PORT}"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HADOOP_MR_HISTORYSERVER_PORT} || exit 1"]
      interval: 30s
      timeout: 30s
      retries: 3
  hive-metastore:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1
    user: "hadoop:hadoop"
    container_name: hive-metastore
    hostname: hive-metastore
    restart: always
    depends_on:
      - hadoop-hdfs-dn-2
    env_file:
      - .env
    ports:
      - "30983:${HIVE_METASTORE_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hive-metastore hadoop-hdfs-dn-2 ${HADOOP_HDFS_DN_PORT}"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HIVE_METASTORE_PORT} || exit 1"]
      interval: 30s
      timeout: 30s
      retries: 5
  hive-hiveserver2:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1
    user: "hadoop:hadoop"
    container_name: hive-hiveserver2
    hostname: hive-hiveserver2
    restart: always
    depends_on:
      - hive-metastore
    env_file:
      - .env
    ports:
      - "31000:${HIVE_HIVESERVER2_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hive-hiveserver2 hive-metastore ${HIVE_METASTORE_PORT}"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HIVE_HIVESERVER2_PORT} || exit 1"]
      interval: 30s
      timeout: 30s
      retries: 5

# Attach to the pre-created external network
networks:
  hadoop-network:
    external: true
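
Note that the compose file reads every service port from a .env file in the same directory, which the original does not show. A sketch with common Hadoop 3.x / Hive defaults (every value here is an assumption; match them to the ports your image actually uses):

# .env — assumed values
HADOOP_HDFS_NN_PORT=9870
HADOOP_HDFS_DN_PORT=9864
HADOOP_YARN_RM_PORT=8088
HADOOP_YARN_NM_PORT=8042
HADOOP_YARN_PROXYSERVER_PORT=9111
HADOOP_MR_HISTORYSERVER_PORT=19888
HIVE_METASTORE_PORT=9083
HIVE_HIVESERVER2_PORT=10000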

6) Start deployment

docker-compose -f docker-compose.yaml up -d

# Check status
docker-compose -f docker-compose.yaml ps

Simple test verification:
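A minimal smoke test through HiveServer2, using the beeline client that ships with Hive (the "-n hadoop" user and the sample SQL are illustrative assumptions, not from the original):

# Connect to HiveServer2 from inside its container
docker exec -it hive-hiveserver2 beeline -u jdbc:hive2://hive-hiveserver2:10000 -n hadoop

-- Once connected, a few statements to exercise the metastore and HDFS:
CREATE DATABASE IF NOT EXISTS test;
CREATE TABLE test.t1 (id INT, name STRING);
INSERT INTO test.t1 VALUES (1, 'hive');
SELECT * FROM test.t1;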

[Problem] If errors like the following appear, it is because the stack was started more than once: the old HDFS data is still there, but the datanodes' IP addresses have changed (a deployment directly on the host would not hit this, since host IPs are fixed). The fix is to refresh the nodes; you could also wipe the old data, but that is not recommended and refreshing is the better route. (In my case there is no external volume mount; the old data survives only because the previous containers are still around.) Several solutions follow:

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException): Datanode denied communication with namenode because the host is not in the include-list: DatanodeRegistration(172.30.0.12:9866, datanodeUuid=f8188476-4a88-4cd6-836f-769d510929e4, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-f998d368-222c-4a9a-88a5-85497a82dcac;nsid=1840040096;c=1680661390829)

[Solutions]

1. Delete the old containers and restart

# Clean up the old containers
docker rm `docker ps -a|grep 'Exited'|awk '{print $1}'`

# Restart the services
docker-compose -f docker-compose.yaml up -d

# Check status
docker-compose -f docker-compose.yaml ps

2. Log in to the namenode and refresh the datanodes

docker exec -it hadoop-hdfs-nn hdfs dfsadmin -refreshNodes

3. Log in to any node and refresh the datanodes

# Using hadoop-hdfs-dn-0 as an example
docker exec -it hadoop-hdfs-dn-0 hdfs dfsadmin -fs hdfs://hadoop-hdfs-nn:9000 -refreshNodes
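
Whichever option you choose, you can confirm afterwards that the datanodes have re-registered:

# Should report the expected number of live datanodes
docker exec -it hadoop-hdfs-nn hdfs dfsadmin -report | grep 'Live datanodes'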

At this point, the containerized deployment of Hive is complete. If you have any questions, feel free to leave me a message. I will keep publishing related technical articles; you can also follow my official account [Big Data and Cloud Native Technology Sharing] for in-depth technical exchange, or send me a private message with your questions~
