Small knowledge points: ARM architecture Linux big data cluster basic environment construction (Hadoop, MySQL, Hive, Spark, Flink, ZK, Kafka, Nginx, Node)

  The Mac with the M2 chip is replaced. The previous x86 version of the Linux big data cluster basic environment built on the ARM architecture virtual machine cluster is somewhat useless. Now I am writing a new one based on the ARM architecture. Except for a few incompatibilities, the others are almost the same. Equivalent to the previous version of the update

Part of it is the same as x86
x86 version: Linux virtual machine: Big data cluster basic environment construction (Hadoop, Spark, Flink, Hive, Zookeeper, Kafka, Nginx)

1. Relevant file download address

  • Centos 7 with ARM architecture
  • Java-1.8
    • https://www.oracle.com/java/technologies/downloads/#java8
      • Choose the ARM 64 version, and you can download it by searching an Oracle account online
  • Python-3.9
    • https://www.python.org/ftp/python/3.9.6/Python-3.9.6.tgz
  • Scala-2.12
    • https://www.scala-lang.org/download/2.12.12.html
  • Hadoop-3.2.1
    • http://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
  • Spark-3.1.2
    • http://archive.apache.org/dist/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz
  • Flink-1.13.1
    • http://archive.apache.org/dist/flink/flink-1.13.1/flink-1.13.1-bin-scala_2.12.tgz
  • Hive-3.1.3
    • http://archive.apache.org/dist/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz
  • Zookeeper-3.8.0
    • http://archive.apache.org/dist/zookeeper/zookeeper-3.8.0/apache-zookeeper-3.8.0-bin.tar.gz
  • Kafka-3.2.0
    • http://archive.apache.org/dist/kafka/3.2.0/kafka_2.12-3.2.0.tgz
  • Nginx-1.23.1
    • https://nginx.org/download/nginx-1.23.1.tar.gz
  • Node-18.14.1
    • https://nodejs.org/dist/v17.9.1/node-v17.9.1-linux-arm64.tar.gz
  • Doris-1.2.2
    • https://mirrors.tuna.tsinghua.edu.cn/apache/doris/1.2/1.2.2-rc01/apache-doris-fe-1.2.2-bin-arm.tar.xz
    • https://mirrors.tuna.tsinghua.edu.cn/apache/doris/1.2/1.2.2-rc01/apache-doris-be-1.2.2-bin-arm.tar.xz
    • https://mirrors.tuna.tsinghua.edu.cn/apache/doris/1.2/1.2.2-rc01/apache-doris-dependencies-1.2.2-bin-arm.tar.xz

2. Basic environment configuration

  • Create a user and add sudo permissions and password-free
# root 下
useradd -m ac_cluster
passwd ac_cluster
# sudo 权限
sudo vi /etc/sudoers
# 添加下面两条
ac_cluster	ALL=(ALL)	ALL
ac_cluster	ALL=(ALL)	NOPASSWD: ALL
# 保存退出
:wq!
  • Modify static IP
# 编辑该文件
sudo vi /etc/sysconfig/network-scripts/ifcfg-ens160
# 修改参考如下,具体的在安装虚拟机的时候有展示,
BOOTPROTO=static
GATEWAY="192.168.104.2"
IPADDR="192.168.104.101"
NETMASK="255.255.255.0"
DNS1="8.8.8.8"
DNS2="114.114.114.114"
# 重启网络
sudo systemctl restart network
  • Modify yum source
    • The yum source of Alibaba Cloud is used here
# 先下载 wget
sudo yum -y install wget
# 修改阿里云的配置
sudo wget http://mirrors.aliyun.com/repo/Centos-altarch-7.repo -O /etc/yum.repos.d/CentOS-Base.repo
# 清理软件源
yum clean all
# 元数据缓存
yum makecache
# 测试
sudo yum -y install vim
  • modify hostname
# 根据实际修改
vim /etc/hostname
# 重启
reboot
  • turn off firewall
sudo systemctl stop firewalld
sudo systemctl disable firewalld
  • Modify Domain Name Mapping
sudo vim /etc/hosts
  • Configure password-free login between machines
# 输入后三次回车
ssh-keygen -t rsa
# 配置节点
ssh-copy-id hybrid01

insert image description here

  • Configure Time Synchronization
sudo yum -y install ntpdate
sudo ntpdate ntp1.aliyun.com
# 可选,也可以配置自动执行时间同步,根据自己需求来
crontab -e */1 * * * * sudo /usr/sbin/ntpdate ntp1.aliyun.com

3. Language environment configuration

3.1 Java environment installation

  • ARM64-ready Java packages
# 使用 rz -be 上传文件,没有的先下载,或者使用其他工具上传
sudo yum -y install lrzsz
# 上传,选择本地下载好的位置
rz -be
# 解压 Java 包
tar -zxvf jdk-8u361-linux-aarch64.tar.gz -C ../program/
# 配置环境变量
vim ~/.bash_profile
# 添加下面两条
export JAVA_HOME=/xx/xx
export PATH=$JAVA_HOME/bin:$PATH
# 更新环境
source ~/.bash_profile
# 测试
java -version
# 传给其他节点并配置环境变量即可
scp -r jdk1.8.0_361/ ac_cluster@hybrid02:$PWD

insert image description here

3.2 Python environment installation

  • Download the source package, you can use wget to download
# 下载 Python 源码包
wget https://www.python.org/ftp/python/3.9.6/Python-3.9.6.tgz
# 解压
tar -zxvf Python-3.9.6.tgz
# 安装依赖环境
sudo yum -y install unzip net-tools bzip2 zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel libpcap-devel xz-devel libglvnd-glx gcc gcc-c++ make
# 预配置
./configure --prefix=/home/ac_cluster/program/python3.9.6 
# 编译安装
make && make install
# 配置环境变量或者软链接到 /usr/bin 中
sudo ln -s /home/ac_cluster/program/python3.9.6/bin/python3.9 /usr/bin/python3
sudo ln -s /home/ac_cluster/program/python3.9.6/bin/pip3.9 /usr/bin/pip3
# 测试
python3 --version

3.3 Scala environment installation

  • Download related packages
# 上传,也可以使用其他方式上传到虚拟机
rz -be
# 解压
tar -zxvf scala-2.12.10.tgz -C ../program/
# 配置环境
vim ~/.bash_profile
# 添加下面两条
export SCALA_HOME=/home/ac_cluster/program/scala-2.12.10
export PATH=$PATH:$SCALA_HOME/bin
# 更新环境
source ~/.bash_profile
# 测试
scala -version

insert image description here

4. Big data component installation

4.1 Hadoop cluster installation

  • Download the relevant packages and upload the virtual machine
# 上传
rz -be
# 解压
tar -zxvf hadoop-3.2.1.tar.gz -C ../program/
# 进入配置目录
cd ../program/hadoop-3.2.1/etc/hadoop
# 修改 hadoop-env.sh,添加下面的内容
export JAVA_HOME=/home/ac_cluster/program/jdk1.8.0_361
  • Modify core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hybrid01:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/ac_cluster/runtime/hadoop_repo</value>
    </property>
</configuration>
  • Modify hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hybrid01:50090</value>
    </property>
</configuration>
  • Modify mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
  • Modify yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hybrid02</value>
  </property>
</configuration>
  • Modify workers
    • Add relevant datanode node hostname
  • Modify hdfs, yarn startup script
# start-dfs.sh、stop-dfs.sh
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
# start-yarn.sh、stop-yarn.sh
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
  • Environment configuration
# 根据实际情况修改配置文件
vim ~/.bash_profile
# 添加下面的配置
export HADOOP_HOME=/home/ac_cluster/program/hadoop-3.2.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_CLASSPATH=`hadoop classpath`
# 更新环境
source ~/.bash_profile
  • remaining operations
# 分发各节点,根据自己实际情况操作
scp -r hadoop-3.2.1/ ac_cluster@hybrid02:$PWD
scp -r hadoop-3.2.1/ ac_cluster@hybrid03:$PWD
# 格式化 namenode
hdfs namenode -format
# hdfs 启动
start-dfs.sh
# yarn 启动(进入部署 resourcemanager 的节点进行部署)
start-yarn.sh

insert image description here
insert image description here

4.2 Docker installation

# 依赖安装
sudo yum -y install yum-utils device-mapper-persistent-data lvm2
# 配置 docker yum 源
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
# 下载 docker
sudo yum -y install docker-ce docker-ce-cli containerd.io
# 启动 docker
sudo service docker start
# 如果直接用不了,需要 sudo 的话可以直接去掉
sudo gpasswd -a ac_cluster(你的用户名) docker
newgrp docker

4.3 MySQL installation

  • MySQL is installed with version 5.7, which is installed based on docker
    • At present, only 8 and above support ARM 64
    • Prepare the mounted directories conf and data in advance
      insert image description here
# 拉取镜像
docker pull mysql:8
# 启动容器
docker run -d --name=mysql -p 3306:3306 --restart=always --privileged=true -v /home/ac_cluster/docker/mysql/data:/var/lib/mysql -v /home/ac_cluster/docker/mysql/conf:/etc/mysql/conf.d -e MYSQL_ROOT_PASSWORD=123456 mysql:8 --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
# 下载驱动留着备用
wget http://ftp.ntu.edu.tw/MySQL/Downloads/Connector-J/mysql-connector-java-8.0.26.tar.gz

4.4 Hive installation

  • Download related packages and upload
# 上传
rz -be
# 解压
tar -zxvf apache-hive-3.1.3-bin.tar.gz -C ../program/
# 进入配置目录
cd ../program/apache-hive-3.1.3-bin/conf/
# 修改 hive-env.sh
cp hive-env.sh.template hive-env.sh
vim hive-env.sh
# 添加下面的内容
HADOOP_HOME=/home/ac_cluster/program/hadoop-3.2.1
export HIVE_CONF_DIR=/home/ac_cluster/program/apache-hive-3.1.3-bin/conf
# 修改 hive-site.xml
vim hive-site.xml
  • hive-site.xml configuration
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <!-- jdbc 连接的 URL -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hybrid03:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
  </property>
  <!-- jdbc 连接的 Driver-->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <!-- jdbc 连接的 username-->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <!-- jdbc 连接的 password -->
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
  <!-- Hive 默认在 HDFS 的工作目录 -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <!-- 指定存储元数据要连接的地址 -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hybrid03:9083</value>
  </property>
</configuration>
  • other operations
# 进入 hive 目录
cd ~/program/apache-hive-3.1.3-bin
# 将 MySQL 启动复制一份到 lib 下
cp ~/package/mysql-connector-java-8.0.26/mysql-connector-java-8.0.26.jar ./lib/
# guava 包问题,将 Hadoop 下的 guava 包复制到 lib 下
cp ~/program/hadoop-3.2.1/share/hadoop/common/lib/guava-27.0-jre.jar ./lib/
# 备份
mv ./lib/guava-19.0.jar ./lib/guava-19.0.jar.bak
# 初始化数据库
bin/schematool -dbType mysql -initSchema
# 配置环境
vim ~/.bash_profile
# 添加下面的配置
export HIVE_HOME=/home/ac_cluster/program/apache-hive-3.1.3-bin
export PATH=$PATH:$HIVE_HOME/bin
# 刷新环境
source ~/.bash_profile
# 启动服务
nohup hive --service metastore &
# 启动交互
hive

4.5 Spark installation

  • Usually Spark is used to run tasks based on Yarn, here is just a simple decompression, no cluster installation
# 上传
rz -be
# 解压
tar -zxvf spark-3.1.2-bin-hadoop3.2.tgz -C ../program/
# 配置环境
vim ~/.bash_profile
# 添加下面配置
export SPARK_HOME=/home/ac_cluster/program/spark-3.1.2-bin-hadoop3.2
export PATH=$PATH:$SPARK_HOME/bin
# 刷新
source ~/.bash_profile
  • Test sample yarn submission
    • You can see related information in Hadoop Yarn's 8088 port page
# 进入样例 jar 包所在位置
cd ~/program/spark-3.1.2-bin-hadoop3.2/examples/jars
# 提交任务
spark-submit --master yarn --class org.apache.spark.examples.SparkPi spark-examples_2.12-3.1.2.jar 100

insert image description here
insert image description here

  • Configure the Hive environment
    • Put Hive's hive-site.xml under Spark's conf
    • Put the MySQL driver into the Spark jars
    • test
      • spark-shell
      • spark.sql(“show databases”).show()

insert image description here

4.6 Flink installation

  • Flink, like Spark, runs on Yarn without installing a cluster
  • Download related packages and upload
# 上传
rz -be
# 解压
tar -zxvf flink-1.13.1-bin-scala_2.12.tgz -C ../program/
# 配置环境
vim ~/.bash_profile
# 添加下面的配置
export FLINK_HOME=/home/ac_cluster/program/flink-1.13.1
export PATH=$PATH:$FLINK_HOME/bin
# 刷新环境
source ~/.bash_profile
  • Test sample yarn submission
    • You can see related information in Hadoop Yarn's 8088 port page
# 进入样例目录
cd ~/program/flink-1.13.1/examples/batch
# 提交任务
flink run -m yarn-cluster WordCount.jar

insert image description here
insert image description here

4.7 Zookeeper installation

  • Download the corresponding package upload
# 上传
rz -be
# 解压
tar -zxvf apache-zookeeper-3.8.0-bin.tar.gz -C ../program/
# 创建数据目录和日志目录(所有节点都要操作)
mkdir -p ~/runtime/zookeeper/data ~/runtime/zookeeper/logs
# 进入配置目录
cd ~/program/apache-zookeeper-3.8.0-bin/conf
# 修改 zoo.cfg
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg
# 修改下面的配置
dataDir=/home/ac_cluster/runtime/zookeeper/data
dataLogDir=/home/ac_cluster/runtime/zookeeper/logs
server.1=hybrid01:2888:3888
server.2=hybrid02:2888:3888
server.3=hybrid03:2888:3888
# 分发节点
scp -r apache-zookeeper-3.8.0-bin/ ac_cluster@hybrid02:$PWD
# 在 dataDir 目录下添加 myid 并赋值,其他节点同样操作,不同 id
echo 1 > ~/runtime/zookeeper/data/myid
# 配置环境(三个节点都要配)
vim ~/.bash_profile
# 添加下面的配置
export ZOOKEEPER_HOME=/home/ac_cluster/program/apache-zookeeper-3.8.0-bin
export PATH=$PATH:$ZOOKEEPER_HOME/bin
# 刷新
source ~/.bash_profile
# 三个节点逐个启动
zkServer.sh start
# 查看状态
zkServer.sh status

4.8 Kafka installation

  • Download the corresponding package and upload
# 上传
rz -be
# 解压
tar -zxvf kafka_2.12-3.2.0.tgz -C ../program/
# 创建 log 目录
mkdir -p ~/runtime/kafka/logs
# 进入 conf 目录修改 server.properties
cd ~/program/kafka_2.12-3.2.0/config/
vim server.properties
# 查看并修改下面的配置,id 必须唯一,传给其他节点需要修改
broker.id=0
log.dirs=/home/ac_cluster/runtime/kafka/logs
zookeeper.connect=hybrido1:2181,hybrid02:2181,hybrid03:2181
listeners=PLAINTEXT://192.168.104.101:9092
advertised.listeners=PLAINTEXT://192.168.104.101:9092
# 配置环境
vim ~/.bash_profile
# 添加下面的配置
export KAFKA_HOME=/home/ac_cluster/program/kafka_2.12-3.2.0
export PATH=$PATH:$KAFKA_HOME/bin
# 刷新
source ~/.bash_profile
# 分发节点,并修改相关配置,配置环境变量
scp -r kafka_2.12-3.2.0/ ac_cluster@hybrid02:$PWD
scp -r kafka_2.12-3.2.0/ ac_cluster@hybrid03:$PWD
# 逐个启动节点,如果有报错,查看下方解决方案
nohup kafka-server-start.sh $KAFKA_HOME/config/server.properties > ~/runtime/kafka/logs/nohup.log 2>&1 &

4.9 Nginx installation

  • Download the corresponding package and upload it, or use wget to download
# 环境依赖安装
sudo yum -y install gcc zlib zlib-devel pcre-devel openssl openssl-devel
# wget 下载包
wget https://nginx.org/download/nginx-1.22.1.tar.gz
# 解压
tar -zxvf nginx-1.22.1.tar.gz
# 创建配置目录
mkdir ~/program/nginx-1.22.1
# 预编译,一般只需要配置 --prefix,其他的可选安装
./configure --prefix=/home/ac_cluster/program/nginx-1.22.1 --with-http_stub_status_module --with-http_ssl_module --with-http_realip_module --with-stream --with-http_auth_request_module
# 编译安装
make && make install
# 配置环境
vim ~/.bash_profile
# 添加下面的配置
export NGINX_HOME=/home/ac_cluster/program/nginx-1.22.1
export PATH=$PATH:$NGINX_HOME/sbin
# 刷新
source ~/.bash_profile
# 启动,启动报错查看下方解决方案
nginx

insert image description here

4.10 Node installation

  • Download related packages and upload
# 使用 wget 下载
wget https://nodejs.org/dist/v17.9.1/node-v17.9.1-linux-arm64.tar.gz
# 解压
tar -zxvf node-v17.9.1-linux-arm64.tar.gz -C ../program/
# 配置环境变量
vim ~/.bash_profile
# 添加下面的配置
export NODE_HOME=/home/ac_cluster/program/node-v17.9.1-linux-arm64
export PATH=$PATH:$NODE_HOME/bin
# 刷新
source ~/.bash_profile
# 查看版本
node -v
npm -v
# 安装淘宝镜像
npm install -g cnpm --registry=https://registry.npm.taobao.org
# 查看
cnpm -v
# 安装 vue-cli 脚手架
cnpm install -g vue-cli
# 查看
vue -V

insert image description here

4.11 Doris installation

Continuously updating~~~

5. Problem solving

5.1 Environment configuration error, lost command

  • Add temporary command
export PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

5.2 start-yarn.sh cannot start resourcemanager

  • View log error java.net.BindException: Port in use: hybrid02:8088
  • Solution: Enter the node deployed by resourcemanager to start
    - start-yarn.sh

5.3 mysql remote connection error Public Key Retrieval is not allowed

  • I am using DBeaver, modify the configuration to true in the driver properties

insert image description here

5.4 Flink submits tasks and reports error classloader.check-leaked-classloader

Exception in thread “Thread-5” java.lang.IllegalStateException: Trying to access closed classloader. Please check if you store classloaders directly or indirectly in static fields. If the stacktrace suggests that the leak occurs in a third party library and cannot be fixed immediately, you can disable this check with the configuration ‘classloader.check-leaked-classloader’.
insert image description here

  • solve
    • Edit flink-conf.yaml under the flink directory conf
      • 添加:classloader.check-leaked-classloader: false

5.5 Kafka startup error UseG1GC

VM option ‘UseG1GC’ is experimental and must be enabled via -XX:+UnlockExperimentalVMOptions
insert image description here

  • solve
    • Enter the bin directory under the Kafka installation directory, modify kafka-run-class.sh, delete the configuration in the figure below, and restart

insert image description here

5.6 Nginx startup error report Permission denied

nginx: [emerg] bind() to 0.0.0.0:80 failed (13: Permission denied)
sudo: nginx: command not found
insert image description here

  • Ordinary users start Nginx without permission to start port 80, and the command cannot be found after sudo
    • Solution: Go directly to the directory and use sudo to start
      • sudo sbin/nginx

5.7 Nginx 403 Forbidden

insert image description here- 403 is reported when the browser accesses nginx
- Solution: modify the nginx.conf configuration file, add user root configuration
- save and reload the configuration sudo sbin/nginx -s reload

insert image description here

5.8 Node reports error GLIBC_2.27 not found

node: /lib64/libm.so.6: version ‘GLIBC_2.27’ not found (required by node)
node: /lib64/libc.so.6: version ‘GLIBC_2.28’ not found (required by node)

  • After Node 18 version, GLIBC has been improved to 2.28 or later
    • Solution: Reduce the Node version or increase the GLIBC version. It is recommended to reduce the Node version to version 17. GLIBC is not easy to handle

insert image description here

Guess you like

Origin blog.csdn.net/baidu_40468340/article/details/129095396