Basic information: CentOS-7.9, Java-1.8, Python-3.9, Scala-2.12, Hadoop-3.2.1, Spark-3.1.2, Flink-1.13.1, Hive-3.1.3, Zookeeper-3.8.0, Kafka-3.2.0, Nginx-1.23.1
Table of contents
All configurations below are for personal learning; adjust them as appropriate for a production environment
1. Relevant file download addresses
- CentOS-7.9
- http://mirrors.aliyun.com/centos/7.9.2009/isos/x86_64
- Java-1.8
- https://www.oracle.com/java/technologies/downloads/#java8
- Python-3.9
- https://www.python.org/ftp/python/3.9.6/Python-3.9.6.tgz
- Scala-2.12
- https://www.scala-lang.org/download/2.12.12.html
- Hadoop-3.2.1
- http://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
- Spark-3.1.2
- http://archive.apache.org/dist/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz
- Flink-1.13.1
- http://archive.apache.org/dist/flink/flink-1.13.1/flink-1.13.1-bin-scala_2.12.tgz
- Hive-3.1.3
- http://archive.apache.org/dist/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz
- Zookeeper-3.8.0
- http://archive.apache.org/dist/zookeeper/zookeeper-3.8.0/apache-zookeeper-3.8.0-bin.tar.gz
- Kafka-3.2.0
- http://archive.apache.org/dist/kafka/3.2.0/kafka_2.12-3.2.0.tgz
- Nginx-1.23.1
- https://nginx.org/download/nginx-1.23.1.tar.gz
2. Virtual machine basic configuration
- Modify static IP
- vi /etc/sysconfig/network-scripts/ifcfg-eth0
- Restart the network after modification
- systemctl restart network
- Adjust the following values to match your own machine
BOOTPROTO="static"
ONBOOT="yes"
GATEWAY="10.211.55.1"
IPADDR="10.211.55.101"
NETMASK="255.255.255.0"
DNS1="114.114.114.114"
DNS2="8.8.8.8"
- create user
- create
- useradd -m ac_cluster
- password
- passwd ac_cluster
- sudo permissions
- vi /etc/sudoers
- Add an entry for the new user under the root line (see the example below)
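For example, for the ac_cluster user created above, the line under "root ALL=(ALL) ALL" would be:
ac_cluster ALL=(ALL) ALL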
- Modify yum source
- configuration location
- /etc/yum.repos.d
- Install wget
- sudo yum -y install wget
- Get the repo file
- wget http://mirrors.aliyun.com/repo/Centos-7.repo
- Backup the original repo file
- mv CentOS-Base.repo CentOS-Base.repo.bak
- Rename the downloaded repo file
- mv Centos-7.repo CentOS-Base.repo
- Refresh the yum cache
- yum clean all
- yum makecache
- Install vim
- yum -y install vim
- modify hostname
- vim /etc/hostname
- reboot
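- On CentOS 7, hostnamectl can also apply the change without editing the file or rebooting:
hostnamectl set-hostname hybrid01    # example name; use each node's own hostname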
- turn off firewall
- systemctl stop firewalld
- systemctl disable firewalld
- Modify Domain Name Mapping
- vim /etc/hosts
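A sketch of the mapping, assuming the three nodes use consecutive addresses in the subnet configured earlier (adjust to your own IPs):
10.211.55.101 hybrid01
10.211.55.102 hybrid02
10.211.55.103 hybrid03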
- Configure ssh password-free
- ssh-keygen -t rsa
- Press Enter three times to accept the defaults
- ssh-copy-id hybrid01
- Repeat ssh-copy-id for each remaining node; run it once per node
- Configure Time Synchronization
- yum -y install ntpdate
- ntpdate ntp1.aliyun.com
- Cron can also run the synchronization automatically: run crontab -e and add
*/1 * * * * sudo /usr/sbin/ntpdate ntp1.aliyun.com
3. Language environment installation
1. Java environment installation
- After downloading the installation package, extract it to the specified directory
- tar -zxvf xxx -C /xx/xx
- Or download directly with wget
- wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz
- Environment configuration
- /etc/profile or ~/.bash_profile in the user directory
- After changing, remember to source
export JAVA_HOME=/xx/xx
export PATH=$JAVA_HOME/bin:$PATH
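- After sourcing, verify the installation:
java -version    # should report version 1.8.0_131 for the package above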
2. Python environment installation
- Download source package or wget download
- wget https://www.python.org/ftp/python/3.9.6/Python-3.9.6.tgz
- Unzip to the specified directory
- tar -zxvf xxx -C /xx/xx
- Depend on environment installation
- sudo yum -y install vim unzip net-tools wget bzip2 gcc gcc-c++ zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel libglvnd-glx
- Configure the build
- ./configure --prefix=/xxx/program/python3
- Compile and install
- make && make install
- Configure environment variables, or symlink python3 into a directory already on PATH (see the example below)
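For example, to link the binaries built with the --prefix above into /usr/local/bin (paths are placeholders; adjust to your install):
sudo ln -s /xxx/program/python3/bin/python3 /usr/local/bin/python3
sudo ln -s /xxx/program/python3/bin/pip3 /usr/local/bin/pip3
python3 --version    # verify the interpreter resolves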
3. Scala environment installation
- After downloading the installation package, extract it to the specified directory
- tar -zxvf xxx -C /xx/xx
- Environment configuration
- /etc/profile or ~/.bash_profile in the user directory
- After changing, remember to source
export SCALA_HOME=/xx/xx
export PATH=$SCALA_HOME/bin:$PATH
4. Big data component installation
1. Hadoop cluster installation
- decompress
- tar -zxvf xx -C /xx/xx
- Enter the Hadoop directory to modify the files under etc/hadoop
- Modify hadoop-env.sh
- export JAVA_HOME=/xxx
- Modify core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hybrid01:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/xxx/runtime/hadoop_repo</value>
    </property>
</configuration>
- Modify hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hybrid01:50090</value>
    </property>
</configuration>
- Modify mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
- Modify yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hybrid02</value>
    </property>
</configuration>
- Modify the workers file to configure the datanodes
- List the hostname of each datanode node, one per line (example below)
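For example, assuming hybrid02 and hybrid03 host the datanodes:
hybrid02
hybrid03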
- Modify the hdfs startup script: start-dfs.sh, stop-dfs.sh
- Add the following content
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
- Modify yarn start and stop scripts: start-yarn.sh, stop-yarn.sh
- Add the following content
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
- Environment configuration
- /etc/profile or ~/.bash_profile in the user directory
- After changing, remember to source
export HADOOP_HOME=/xxx/hadoop-3.2.1
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_CLASSPATH=`hadoop classpath`
- Distribute to each node
- scp -r hadoop-xxx user@hybrid01:$PWD
- format namenode
- hdfs namenode -format
- Start the cluster
- Start everything at once
- start-all.sh
- Start daemons individually
- hadoop-daemons.sh start/stop namenode/datanode/secondarynamenode
- Start YARN only
- start-yarn.sh
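- A quick sanity check after starting, run on the namenode:
jps                      # expect NameNode and SecondaryNameNode here, DataNode on the workers
hdfs dfsadmin -report    # should list the live datanodes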
2. MySQL installation
Other components will use MySQL, so install it first. I install it with Docker here; I'm too lazy to install it from the package.
- Docker installation
- sudo yum -y install yum-utils device-mapper-persistent-data lvm2
- sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
- sudo yum -y install docker-ce docker-ce-cli containerd.io
- sudo service docker start
- sudo systemctl enable docker
- If docker only works with sudo, add your user to the docker group so you can run it directly
- sudo gpasswd -a <your username> docker
- newgrp docker
- MySQL installation
- Create mount directories: data and conf
- docker pull mysql:5.7
- docker run -d --name=mysql -p 3306:3306 --restart=always --privileged=true -v /xxx/metadata/docker/mysql/data:/var/lib/mysql -v /xxx/metadata/docker/mysql/conf:/etc/mysql/conf.d -e MYSQL_ROOT_PASSWORD=123456 mysql:5.7
- Add remote login (the Docker image seems to allow it by default)
- Create a remote user: CREATE USER 'root'@'%' IDENTIFIED WITH mysql_native_password BY '123456';
- Grant privileges: GRANT ALL PRIVILEGES ON *.* TO 'root'@'%';
- Modify the character set
- alter database <database name> character set utf8;
- Restart MySQL: docker restart mysql
- MySQL driver download
- wget http://ftp.ntu.edu.tw/MySQL/Downloads/Connector-J/mysql-connector-java-5.1.48.tar.gz
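- To confirm the container works (container name and password as set in the docker run command above):
docker ps --filter name=mysql                                        # should show the container as Up
docker exec -it mysql mysql -uroot -p123456 -e "select version();"   # should print 5.7.x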
3. Spark installation
I did not set up a standalone Spark cluster here; jobs are usually submitted to Hadoop YARN, so I just extract the package and configure environment variables.
- decompress
- tar -zxvf xxx -C /xx/xx
- Environment configuration
- /etc/profile or ~/.bash_profile in the user directory
- After changing, remember to source
export SPARK_HOME=/xxx/xx
export PATH=$SPARK_HOME/bin:$PATH
- Configure the Hive environment
- Copy Hive's hive-site.xml into Spark's conf directory
- Copy the MySQL driver into Spark's jars directory
- test
- start spark-shell
- spark.sql("show databases").show()
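Since jobs go to YARN rather than a standalone cluster, a further smoke test is to submit the SparkPi example that ships with the distribution (the jar name below matches the Spark 3.1.2 package):
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.12-3.1.2.jar 100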
4. Flink installation
- decompress
- tar -zxvf flink-xxx -C /xx/xx
- Modify flink-conf.yaml
- Change jobmanager.rpc.address from localhost to the master hostname
- Modify workers with the worker hostnames
- Modify masters with the master hostname
- Distribute to the cluster
- scp -r flink-xxx user@hybrid01:$PWD
- Environment configuration
- /etc/profile or ~/.bash_profile in the user directory
- After changing, remember to source
export FLINK_HOME=/xxx/xx
export PATH=$FLINK_HOME/bin:$PATH
- start up
- start-cluster.sh
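- Optional smoke test with the bundled example (the jar ships in the Flink 1.13 distribution; the web UI defaults to port 8081 on the master):
flink run $FLINK_HOME/examples/streaming/WordCount.jar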
5. Hive installation
- decompress
- tar -zxvf hive-xxx -C /xx/xx
- hive-env.sh
- HADOOP_HOME=/xx/hadoop
- export HIVE_CONF_DIR=/xx/hive/conf
- hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- JDBC connection URL -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hybrid03:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
    </property>
    <!-- JDBC connection driver -->
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <!-- JDBC connection username -->
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <!-- JDBC connection password -->
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>
    <!-- Hive's default working directory on HDFS -->
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <!-- Address of the metastore service -->
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hybrid03:9083</value>
    </property>
</configuration>
- MySQL driver
- Copy the driver downloaded during the MySQL installation into Hive's lib directory
- guava package problem
- Copy guava-27.0-jre.jar from share/hadoop/common/lib under Hadoop into lib under Hive, replacing the older guava jar that ships with Hive
- Initialize the database
- schematool -dbType mysql -initSchema
- Environment configuration
- /etc/profile or ~/.bash_profile in the user directory
- After changing, remember to source
export HIVE_HOME=/xxx/xx
export PATH=$HIVE_HOME/bin:$PATH
- start service
- nohup hive --service metastore &
- start interaction
- hive
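- A quick check that the metastore connection works:
hive -e "show databases;"    # should list at least "default"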
6. Zookeeper installation
The remaining components follow a similar pattern, so I won't go into as much detail.
- decompress
- tar -zxvf xxx -C /xx/xx
- zoo.cfg (copied from conf/zoo_sample.cfg)
dataDir=/acware/data/zookeeper
dataLogDir=/acware/logs/zookeeper
server.1=hybrid01:2888:3888
server.2=hybrid02:2888:3888
server.3=hybrid03:2888:3888
- Create data and log directories
- Create a myid file under the dataDir directory containing this node's server number (see the example after this section)
- Distribute the package to each node, creating the corresponding directories and myid on each
- Configure environment variables
- start up
- zkServer.sh start
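For example, on hybrid01 (server.1 above; write 2 and 3 on the other nodes):
mkdir -p /acware/data/zookeeper /acware/logs/zookeeper
echo 1 > /acware/data/zookeeper/myid
Once all three nodes are started, verify the ensemble:
zkServer.sh status    # one node should report "leader", the others "follower"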
7. Kafka installation
- decompress
- Configure environment variables
- Create log directory
- Modify server.properties
# Globally unique broker ID; must not be duplicated
broker.id=0
# Directory where Kafka stores its log data
log.dirs=/acware/logs/kafka
# ZooKeeper cluster connection addresses
zookeeper.connect=hybrid01:2181,hybrid02:2181,hybrid03:2181
- Distribute to the cluster and modify broker.id on each node
- start up
- kafka-server-start.sh $KAFKA_HOME/config/server.properties
- Note
- Set delete.topic.enable=true in server.properties to delete topics completely; otherwise they are only marked for deletion
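- A quick smoke test, assuming all three brokers are up (Kafka 3.x manages topics via --bootstrap-server):
kafka-topics.sh --bootstrap-server hybrid01:9092 --create --topic test --partitions 3 --replication-factor 2
kafka-topics.sh --bootstrap-server hybrid01:9092 --list    # "test" should appear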
8. Nginx installation
- Install build dependencies
- yum -y install gcc zlib zlib-devel pcre-devel openssl openssl-devel
- Download from the official site or via wget
- wget https://nginx.org/download/nginx-1.23.1.tar.gz
- Pre-compile configuration
- I enable some extra modules for my own needs; a basic install usually only needs --prefix
- ./configure --prefix=/xxx/nginx-1.23.1 --with-openssl=/xxx/openssl --with-http_stub_status_module --with-http_ssl_module --with-http_realip_module --with-stream --with-http_auth_request_module
- Compile and install
- make && make install
- Configure environment variables
- start up
- nginx
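- To check the build, validate the config and confirm the server responds (paths follow the --prefix chosen above):
/xxx/nginx-1.23.1/sbin/nginx -t    # test the configuration files
curl -I http://localhost           # expect an HTTP 200 response header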
5. Problems in the process
1. A broken environment configuration makes basic commands unavailable
- Temporarily restore the default PATH
- export PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin