Spark 3.0.1 is mainly built against Hadoop 3.2, MySQL, HBase 2.3.3, Hive 3.1.2, ZooKeeper 3.5.5, Flume, Kafka, and Redis.

In a non-production environment it is worth using newer versions and stepping on the pitfalls in advance. Version selection is a real headache; take a look at the compatibility chart on the official Apache website:

For a pseudo-distributed setup, look here:

Before configuring: if you use pseudo-distributed mode, you must generate an SSH key pair (ssh-keygen) and ssh-copy-id it to the local machine, add the local hostname for 127.0.0.1 to /etc/hosts, and turn off the firewall. Otherwise it will report the errors below (a sketch of these prerequisites follows the error messages):

ryan.pub: ssh: connect to host ryan.pub port 22: No route to host

ryan.pub: Warning: Permanently added 'ryan.pub' (ECDSA) to the list of known hosts.
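For reference, a minimal sketch of these prerequisites (assuming CentOS 7 with firewalld, and that the local hostname is ryan.pub; substitute your own hostname):

# generate a key pair and authorise it for the local machine itself
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
ssh-copy-id root@ryan.pub

# map the hostname to 127.0.0.1 in /etc/hosts
echo "127.0.0.1 ryan.pub" >> /etc/hosts

# stop and disable the firewall
systemctl stop firewalld
systemctl disable firewalld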

Choose Spark first : 3.0.1

Corresponding Hadoop: choose between 3.2 and 2.7. Based on the chart above, 2.7 cannot be used with HBase, so 3.2 is the only option.

#hadoop software:

http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1-src.tar.gz

#spark software:

http://archive.apache.org/dist/spark/spark-3.0.1/spark-3.0.1-bin-hadoop3.2.tgz

#spark source code

http://archive.apache.org/dist/spark/spark-3.0.1/spark-3.0.1.tgz

#hadoop source code

http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz

HBase:2.3.3

http://archive.apache.org/dist/hbase/2.3.3/hbase-2.3.3-bin.tar.gz

Hive: 3.1.2

http://archive.apache.org/dist/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz

ZooKeeper: 3.5.5

http://archive.apache.org/dist/zookeeper/zookeeper-3.5.5/apache-zookeeper-3.5.5-bin.tar.gz

Kafka:2.6-scala2.12

http://mirror.bit.edu.cn/apache/kafka/2.6.0/kafka_2.12-2.6.0.tgz

Flume:1.9

http://mirror.bit.edu.cn/apache/flume/1.9.0/apache-flume-1.9.0-bin.tar.gz

Transfer all installation packages to linux01 in one go and start configuring.

Cluster environment configuration:

hostname/IP                  spark  hadoop  mysql  hbase  hive  zookeeper  flume  kafka  redis
linux01.pub/192.168.10.10      1      1       1      1     1
linux02.pub/192.168.10.11      1      1              1
linux03.pub/192.168.10.12      1      1              1
linux04.pub/192.168.10.13      1      1              1               1        1      1      1
linux05.pub/192.168.10.14      1      1              1               1        1      1      1
linux06.pub/192.168.10.15      1      1              1               1        1      1      1

1. First install MySQL on linux01

Remember to remove any existing MySQL or MariaDB packages from the machine before installing:

 
 
#!/bin/bash
# stop any running MySQL service (either service name), ignoring errors
service mysql stop 2>/dev/null
service mysqld stop 2>/dev/null
# remove all installed mysql/mariadb RPMs
rpm -qa | grep -i mysql | xargs -n1 rpm -e --nodeps 2>/dev/null
rpm -qa | grep -i mariadb | xargs -n1 rpm -e --nodeps 2>/dev/null
# remove leftover data and config files
rm -rf /var/lib/mysql
rm -rf /usr/lib64/mysql
rm -rf /etc/my.cnf
rm -rf /usr/my.cnf

For the MySQL installation itself, refer directly to this earlier article; it will not be repeated here:

Installation under my-sql centos7_pub.ryan's Blog-CSDN Blog

Check whether MySQL was installed successfully. You can use netstat; if it is not available, install it with the following command:

# Install network tools

yum install -y net-tools

# View port or program

netstat -nltp | grep mysqld   # or grep for port 3306
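Hive (section 5) will later connect to this MySQL instance from linux01.pub with a user name and password, so make sure such an account exists and is allowed to connect. A hedged sketch, assuming MySQL 5.x and a placeholder password (adjust user, host and password to your own setup):

# grant a login usable from other hosts; the password below is a placeholder
mysql -uroot -p -e "GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'YourPassword123'; FLUSH PRIVILEGES;"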

2. Install Spark 3.0.1 and the Hadoop 3.2.1 ecosystem

I wrote an article about Hadoop3.1.1 before: Quickly build a Hadoop cluster environment_windows deploy hadoop hbase_pub.ryan's blog-CSDN Blog

Just to be safe, do it all over again

2.1 Install Hadoop 3.2.1

HDFS is the foundation of everything, so configure it for all machines. namenode: linux01.pub; secondary namenode: linux02.pub; datanodes: linux01.pub ~ linux06.pub.

# unzip

tar -zxf hadoop-3.2.1.tar.gz  -C /opt/apps/

2.1.1 Configure environment variables, add paths and login users:

vim /etc/profile

 
 
# Hadoop 3.2.1 configuration
export HADOOP_HOME=/opt/apps/hadoop-3.2.1/
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HDFS_DATANODE_USER=root
export HDFS_DATANODE_SECURE_USER=root
export HDFS_NAMENODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

source /etc/profile

hadoop version


Create directories: a temporary file directory, the HDFS metadata directory, and the HDFS data storage directory. Everything under /opt will later be distributed to hosts 1-6, so create them under /opt:

mkdir -p /opt/data/hdfs/name /opt/data/hdfs/data /opt/log/hdfs /opt/tmp

Switch to the configuration file directory and start configuring hadoop

cd /opt/apps/hadoop-3.2.1/etc/hadoop

core-site.xml      core configuration file
hdfs-site.xml      HDFS storage related configuration
mapred-site.xml    MapReduce related configuration
yarn-site.xml      YARN related configuration
workers            specifies the worker nodes; the file defaults to localhost
hadoop-env.sh      Hadoop related environment variables

First modify hadoop-env.sh and add the JAVA_HOME variable (plus the run-as users) to prevent startup errors:

export JAVA_HOME=/home/apps/jdk1.8.0_212
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

2.1.2 Start to configure core: core-site.xml

 
 
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://linux01.pub:9000</value>
    <description>The default file system used by HDFS</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/tmp/hdfs</value>
    <description>Location of HDFS temporary files</description>
  </property>
</configuration>

2.1.3 Configure HDFS: hdfs-site.xml

Specify the secondary namenode address, the number of replicas, the metadata and data directories, and web (WebHDFS) access:

 
 
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>linux02.pub:50090</value>
    <description>Address of the HDFS secondary namenode</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Number of HDFS data replicas</description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/data/hdfs/name</value>
    <final>true</final>
    <description>Storage location of the namenode metadata</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/data/hdfs/data</value>
    <final>true</final>
    <description>Storage location of the datanode data blocks</description>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
    <description>WebHDFS access to the namenode uses port 50070 and WebHDFS access to a datanode uses port 50075 (those are the Hadoop 2.x defaults; in Hadoop 3 they are 9870 and 9864). File and directory metadata is accessed through the namenode's IP and HTTP port, while file content operations such as open, upload, modify and download go through a datanode's IP and HTTP port. To issue all WebHDFS operations through the namenode address without distinguishing ports, dfs.webhdfs.enabled must be set to true in hdfs-site.xml on every datanode.</description>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
    <description>Permission checks on file operations; simply disabled here</description>
  </property>
</configuration>

2.1.4 Configure YARN: yarn-site.xml

YARN uses linux01.pub for all ResourceManager addresses:

 
 
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>linux01.pub:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>linux01.pub:8030</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>linux01.pub:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>linux01.pub:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>linux01.pub:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
</configuration>

2.1.5 Configure MapReduce: mapred-site.xml

Specify that MapReduce runs on YARN, the job history node linux01.pub, and the classpath:

 
 
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>linux01.pub:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>linux01.pub:19888</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>
      /opt/apps/hadoop-3.2.1/etc/hadoop,
      /opt/apps/hadoop-3.2.1/share/hadoop/common/*,
      /opt/apps/hadoop-3.2.1/share/hadoop/common/lib/*,
      /opt/apps/hadoop-3.2.1/share/hadoop/hdfs/*,
      /opt/apps/hadoop-3.2.1/share/hadoop/hdfs/lib/*,
      /opt/apps/hadoop-3.2.1/share/hadoop/mapreduce/*,
      /opt/apps/hadoop-3.2.1/share/hadoop/mapreduce/lib/*,
      /opt/apps/hadoop-3.2.1/share/hadoop/yarn/*,
      /opt/apps/hadoop-3.2.1/share/hadoop/yarn/lib/*
    </value>
  </property>
</configuration>

2.1.6 Specify all datanodes: workers

linux01.pub
linux02.pub
linux03.pub
linux04.pub
linux05.pub
linux06.pub

2.1.7 Cluster distribution

Distribute everything under /opt on linux01 to the other hosts, and distribute /etc/profile along with it (a loop version is sketched after the commands):

scp -r /opt linux02.pub:/
scp -r /opt linux03.pub:/
scp -r /opt linux04.pub:/
scp -r /opt linux05.pub:/
scp -r /opt linux06.pub:/
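The same distribution can be scripted; a minimal loop sketch, assuming passwordless SSH from linux01.pub to the other five hosts is already set up:

# copy /opt and /etc/profile to linux02.pub ~ linux06.pub, then verify hadoop is on the PATH
for i in {2..6}; do
  scp -r /opt linux0${i}.pub:/
  scp /etc/profile linux0${i}.pub:/etc/profile
  ssh linux0${i}.pub "source /etc/profile && hadoop version"
done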

2.1.8 Cluster one-key start and one-key stop

Add the following startup configurations to start-dfs.sh and stop-dfs.sh

HDFS_DATANODE_USER=root

HADOOP_SECURE_DN_USER=hdfs

HDFS_NAMENODE_USER=root

HDFS_SECONDARYNAMENODE_USER=root

2.1.9 Initialize (format) the namenode on linux01.pub

hadoop  namenode  -format

When this message appears, it means that the initialization has been successful:

2020-11-26 21:03:07,909 INFO common.Storage: Storage directory /opt/data/hdfs/name has been successfully formatted.

2.1.10 Start hadoop and test

Start HDFS on the master node:

start-dfs.sh

 Processes on namenode:

Processes on datanodes:

Test the namenode web UI:

Open: http://linux01.pub:9870/

To start HDFS and YARN together:

start-all.sh

YARN web UI: http://linux01.pub:8088/

Upload a file to HDFS:

hdfs dfs -put ./hadoop-3.2.1.tar.gz /

You can see that the files have been stored in different blocks in different servers:

Data information on the sixth machine:

Metadata information on the master node:
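Block placement can also be checked from the command line instead of the web UI, for example with fsck:

# list the blocks of the uploaded file and the datanodes holding each replica
hdfs fsck /hadoop-3.2.1.tar.gz -files -blocks -locations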

2.2 Install Spark 3.0.1

Upload the package and decompress it to the designated folder.

2.2.1 Configure system variables:

# Spark 3.0.1 configuration
export SPARK_HOME=/opt/apps/spark-3.0.1-bin-hadoop3.2
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

2.2.2 Modify configuration file

Copy all the .template files and remove the suffix:

for i in *.template; do cp ${i} ${i%.*}; done

Then create another directory and move the original .template files into it as a backup.

Modify spark-env.sh to add:

vim spark-env.sh

# Add the following configuration

export JAVA_HOME=/home/apps/jdk1.8.0_212
export HADOOP_CONF_DIR=/opt/apps/hadoop-3.2.1/etc/hadoop
export SPARK_MASTER_HOST=linux01.pub
export SPARK_MASTER_PORT=7077
export SPARK_LOCAL_DIRS=/opt/apps/spark-3.0.1-bin-hadoop3.2

Add the worker hosts to the slaves file:

linux01.pub
linux02.pub
linux03.pub
linux04.pub
linux05.pub
linux06.pub

2.2.3 Distribute to the other 5 machines

scp -r /opt/apps/spark-3.0.1-bin-hadoop3.2/ linux02.pub:$PWD
scp -r /opt/apps/spark-3.0.1-bin-hadoop3.2/ linux03.pub:$PWD
scp -r /opt/apps/spark-3.0.1-bin-hadoop3.2/ linux04.pub:$PWD
scp -r /opt/apps/spark-3.0.1-bin-hadoop3.2/ linux05.pub:$PWD
scp -r /opt/apps/spark-3.0.1-bin-hadoop3.2/ linux06.pub:$PWD

2.2.4 Run Spark

Enter the sbin directory under the Spark installation directory:

./start-all.sh

To distinguish it from Hadoop's start-all.sh, you can copy the script to another name; starting Spark with that name will then never clash with Hadoop's script. In practice the two are rarely started at the same time, and usually only HDFS is needed from Hadoop, which is started with start-dfs.sh anyway:

 cp start-all.sh  spark-start-all.sh

At this point a Worker process has been started on every node.

Web page view: http://linux01.pub:8080/

So far, the spark construction has been completed, which is relatively simple.
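As a quick smoke test, you can submit the bundled SparkPi example to the standalone master; the examples jar below is the one shipped with spark-3.0.1-bin-hadoop3.2 (adjust the name if your build differs):

spark-submit \
  --master spark://linux01.pub:7077 \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.12-3.0.1.jar 100

The job should print a line like "Pi is roughly 3.14..." and show up as finished in the web UI.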

3. Install ZooKeeper

Install it on linux04.pub, linux05.pub and linux06.pub.

3.1 Unzip and add environment variables

tar -zxf apache-zookeeper-3.5.5-bin.tar.gz -C /opt/apps/

mv apache-zookeeper-3.5.5-bin/ zookeeper3.5.5

cd /opt/apps/zookeeper3.5.5

 vim /etc/profile

# ZooKeeper configuration
export ZK_HOME=/opt/apps/zookeeper3.5.5
export PATH=$PATH:$ZK_HOME/bin

 source /etc/profile

3.2 Configuration file

Create a new directory on linux04/05/06.pub to store the ZooKeeper data:

mkdir -p /opt/data/zkdata

And configure id:

On linux04.pub: echo 1 > /opt/data/zkdata/myid

On linux05.pub: echo 2 > /opt/data/zkdata/myid

On linux06.pub: echo 3 > /opt/data/zkdata/myid

The id here corresponds to the server.1/2/3 in the zoo.cfg below

 cp zoo_sample.cfg zoo.cfg
vim zoo.cfg

 
 
dataDir=/opt/data/zkdata
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=linux04.pub:2888:3888
server.2=linux05.pub:2888:3888
server.3=linux06.pub:2888:3888

3.3 Write a one-key startup script

Create bin/zk-startall.sh:

 
 
#!/bin/bash
if [ $# -eq 0 ]
then
    echo "please input param: start stop"
else
    # run zkServer.sh start/stop on linux04.pub ~ linux06.pub
    for i in {4..6}
    do
        echo "${1}ing linux0${i}.pub"
        ssh linux0${i}.pub "source /etc/profile;/opt/apps/zookeeper3.5.5/bin/zkServer.sh ${1}"
    done
    # after a start, wait a moment and check the status of every node
    if [ "$1" = "start" ]
    then
        sleep 3
        for i in {4..6}
        do
            echo "checking linux0${i}.pub"
            ssh linux0${i}.pub "source /etc/profile;/opt/apps/zookeeper3.5.5/bin/zkServer.sh status"
        done
    fi
fi

Make it executable: chmod +x ./zk-startall.sh

3.4 Distribute to linux05.pub and linux06.pub

scp -r ./zookeeper3.5.5/ linux05.pub:$PWD

scp -r ./zookeeper3.5.5/ linux06.pub:$PWD

3.5 Running and Testing

zk-startall.sh start
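To verify, check each node's role and connect with the ZooKeeper CLI, for example:

# each of linux04-06.pub should report Mode: leader or Mode: follower
zkServer.sh status

# connect to any node and list the root znodes
zkCli.sh -server linux04.pub:2181
# inside the CLI, run: ls /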

4. Install HBase

4.1 Environment variables

 
 
# HBase configuration
export HBASE_HOME=/opt/apps/hbase-2.3.3
export PATH=$PATH:$HBASE_HOME/bin

4.2 Configure HBase

Install Scala 2.12.12 first:

https://downloads.lightbend.com/scala/2.12.12/scala-2.12.12.tgz

Decompress it, then configure the Scala and HBase system variables:

 
 
# Java, Scala, MySQL configuration
export JAVA_HOME=/home/apps/jdk1.8.0_212
export SCALA_HOME=/home/apps/scala-2.12.12
export PATH=$PATH:$JAVA_HOME/bin:/usr/local/mysql/bin:$SCALA_HOME/bin
# HBase configuration
export HBASE_HOME=/opt/apps/hbase-2.3.3
export PATH=$PATH:$HBASE_HOME/bin

4.2.1 Copy core-site.xml and hdfs-site.xml under hadoop to hbase/conf/

cp /opt/apps/hadoop-3.2.1/etc/hadoop/core-site.xml /opt/apps/hadoop-3.2.1/etc/hadoop/hdfs-site.xml /opt/apps/hbase-2.3.3/conf

4.2.2  Configure hbase-env.sh

export HBASE_MANAGES_ZK=false

export  JAVA_HOME=/home/apps/jdk1.8.0_212

4.2.3 Configure hbase-site.xml

 
 
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://linux01.pub:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>linux01.pub:60000</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>linux04.pub,linux05.pub,linux06.pub</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/opt/data/hbase</value>
  </property>
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
</configuration>

4.2.4 regionservers configuration

 
 
linux01.pub
linux02.pub
linux03.pub
linux04.pub
linux05.pub
linux06.pub

4.2.5 Resolving log conflicts

# under hbase/lib/client-facing-thirdparty
cd /opt/apps/hbase-2.3.3/lib/client-facing-thirdparty
mv slf4j-log4j12-1.7.30.jar slf4j-log4j12-1.7.30.jar.bak

# Rename HBase's slf4j binding instead of deleting it (kept as a backup) so that it does not conflict with Hadoop's logging jars

4.3 Distributing packages

Distributing the directory, the profile and the HBase installation to linux02.pub ~ linux06.pub has been shown several times already, so it is omitted here.

4.4 Start and test

Start: start-hbase.sh Stop: stop-hbase.sh

# Test: enter the shell with hbase shell and create a table:

create 't1', {NAME => 'f1', VERSIONS => 5}
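A few more shell commands to confirm that reads and writes work (a minimal sketch; the row key, column and value are just examples):

hbase shell <<'EOF'
put 't1', 'row1', 'f1:name', 'spark'
get 't1', 'row1'
scan 't1'
list
EOF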

Enable web UI access: vim hbase-site.xml

<!-- New web access configuration -->
<property>
<name>hbase.master.info.port</name>
<value>60010</value>
</property>

Visit: http://linux01.pub:60010

Check integration with zk:

Check integration with hadoop:

5. Install Hive

Upload and decompress the Hive package.

5.1 Download mysql connector

MySQL :: Download Connector/J

5.2 Check whether mysql is started

5.3 Put the connector jar into lib under the Hive installation directory

[root@linux01 home]# tar -zxf mysql-connector-java-5.1.49.tar.gz -C /opt/apps/hive3.1.2/lib/
[root@linux01 home]# cd /opt/apps/hive3.1.2/lib/

[root@linux01 lib]# mv mysql-connector-java-5.1.49/*.jar ./
[root@linux01 lib]# rm -rf mysql-connector-java-5.1.49

5.4 Copy guava-27.0-jre.jar from Hadoop's lib into Hive's lib and delete Hive's original guava-19.0.jar

[root@linux01 lib]# ls /opt/apps/hadoop-3.2.1/share/hadoop/common/lib/guava-*
/opt/apps/hadoop-3.2.1/share/hadoop/common/lib/guava-27.0-jre.jar

[root@linux01 lib]# cp /opt/apps/hadoop-3.2.1/share/hadoop/common/lib/guava-27.0-jre.jar ./
[root@linux01 lib]# rm -rf ./guava-19.0.jar

5.5 Configure Hive

5.5.1 Configure environment variables

# Hive configuration
export HIVE_BASE=/opt/apps/hive3.1.2
export PATH=$PATH:$HIVE_BASE/bin

5.5.2 Configure hive-env.sh

Create hive-env.sh by copying hive-env.sh.template and renaming it to hive-env.sh, then set:

# Hadoop installation directory
export HADOOP_HOME=/opt/apps/hadoop-3.2.1
# Hive configuration file directory
export HIVE_CONF_DIR=/opt/apps/hive3.1.2/conf
# Hive auxiliary jar directory
export HIVE_AUX_JARS_PATH=/opt/apps/hive3.1.2/lib

5.5.3 Configure hive-site.xml

Create hive-site.xml by copying hive-default.xml.template and renaming it to hive-site.xml.

Remove the special character at around line 3210 (an invalid character in a description field that otherwise breaks XML parsing).

Start HADOOP: start-all.sh

Create two hdfs folders:

[root@linux01 ~]# hdfs dfs -mkdir -p /user/hive/warehouse; hdfs dfs -mkdir -p /tmp/hive
[root@linux01 ~]# hdfs dfs -chmod -R 777 /user/hive/warehouse; hdfs dfs -chmod -R 777  /tmp/hive
[root@linux01 ~]# hdfs dfs -ls /
Found 3 items
drwxr-xr-x   - root supergroup          0 2020-11-27 00:58 /hbase
drwxr-xr-x   - root supergroup          0 2020-11-27 11:57 /tmp
drwxr-xr-x   - root supergroup          0 2020-11-27 11:57 /user

Change ${system:java.io.tmpdir} and ${system:user.name} to the temporary folder you created and to your own account.

They appear at lines 143, 149, 1848 and 4407.

A quick way to replace them all in vim:

:%s#${system:java.io.tmpdir}#/opt/tmp/hive#g
:%s#${system:user.name}#root#g

Configure the database connection:

Line 585: javax.jdo.option.ConnectionURL, value changed to: jdbc:mysql://linux01.pub:3306/hive?createDatabaseIfNotExist=true&characterEncoding=UTF-8 (remember that & must be written as &amp; inside the XML file)

Line 1104: javax.jdo.option.ConnectionDriverName, value changed to: com.mysql.jdbc.Driver (use com.mysql.cj.jdbc.Driver with Connector/J 8.0)

Line 1131: javax.jdo.option.ConnectionUserName, value changed to: your database user name

Line 571: javax.jdo.option.ConnectionPassword, value changed to: your password

Line 800: hive.metastore.schema.verification, value changed to: false

5.5.4 Create a log configuration file: hive-log4j2.properties

# Copy the template:

[root@linux01 conf]# cp hive-log4j2.properties.template hive-log4j2.properties
[root@linux01 conf]# vim hive-log4j2.properties

# Create a log directory:

property.hive.log.dir = /opt/log/hive

Big pitfall: the log4j binding jar shipped with Hive conflicts with the one in Hadoop, and "SLF4J: Class path contains multiple SLF4J bindings" appears.

The fix is simple: the warning prints exactly which jars are in conflict; just delete (or rename) the conflicting binding jar under Hive's lib directory.

5.6 Start hive and test

Initialization: schematool -initSchema -dbType mysql

Start: hive

Test: show tables;
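A slightly fuller smoke test that exercises both the MySQL metastore and MapReduce on YARN (the table name is just an example):

hive -e "
CREATE TABLE IF NOT EXISTS t_test (id INT, name STRING);
INSERT INTO t_test VALUES (1, 'spark'), (2, 'hadoop');
SELECT * FROM t_test;
"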

6. Install Flume

6.1 Under conf, create the env and agent configuration files from the provided template files

6.2 Add export JAVA_HOME=... to flume-env.sh

6.3 Test: enter bin and run ./flume-ng version

6.4 Distribute to machines 5 and 6, and Flume is done (a minimal example agent is sketched below)
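For a quick functional check of the agent itself, the standard netcat-to-logger example from the Flume user guide can be used; the paths below assume Flume was unpacked to /opt/apps/apache-flume-1.9.0-bin (adjust to your own install directory), and the agent name a1 and file name example.conf are arbitrary:

# write a minimal agent definition: netcat source -> memory channel -> logger sink
cd /opt/apps/apache-flume-1.9.0-bin
cat > conf/example.conf <<'EOF'
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
EOF

# start the agent, then in another terminal: telnet localhost 44444 and type something
bin/flume-ng agent --conf conf --conf-file conf/example.conf --name a1 -Dflume.root.logger=INFO,console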

7. Install Kafka

7.1 start zookeeper

Using our own script: zk-startall.sh start

7.2 Modify server.properties under config

Modify broker.id:

linux04.pub: broker.id=0

linux05.pub: broker.id=1

linux06.pub: broker.id=2

Modify the bound address: listeners=PLAINTEXT://linux04.pub:9092 (use each node's own hostname)

Modify the log directory (create the directory first): log.dirs=/opt/log/kafka

Modify the ZooKeeper cluster: zookeeper.connect=linux04.pub:2181,linux05.pub:2181,linux06.pub:2181

7.3 Distribute to the other nodes:

On each node, change the broker.id, change the listener address, create and set the log directory, and configure the environment variables:

# Kafka configuration
export KAFKA_HOME=/opt/apps/kafka_2.12-2.6.0
export PATH=$PATH:$KAFKA_HOME/bin

7.4 Write a one-click startup script

 
 
#!/bin/bash
if [ $# -eq 0 ]
then
    echo "please input param: start stop"
else
    if [ "$1" = "start" ]
    then
        for i in {4..6}
        do
            echo "${1}ing linux0${i}.pub"
            ssh linux0${i}.pub "source /etc/profile;/opt/apps/kafka_2.12-2.6.0/bin/kafka-server-start.sh -daemon /opt/apps/kafka_2.12-2.6.0/config/server.properties"
        done
    fi
    if [ "$1" = "stop" ]
    then
        for i in {4..6}
        do
            ssh linux0${i}.pub "source /etc/profile;/opt/apps/kafka_2.12-2.6.0/bin/kafka-server-stop.sh"
        done
    fi
fi

Start the cluster with the script and test:
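A minimal round-trip sketch (the topic name is arbitrary):

# create a test topic with 3 partitions and 3 replicas
kafka-topics.sh --create --bootstrap-server linux04.pub:9092 --topic test --partitions 3 --replication-factor 3

# produce a few messages (type them in, then Ctrl+C)
kafka-console-producer.sh --bootstrap-server linux04.pub:9092 --topic test

# consume them from the beginning, via another broker, in another terminal
kafka-console-consumer.sh --bootstrap-server linux05.pub:9092 --topic test --from-beginning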

8. Install Redis

8.1 Download, decompress and compile

Download the latest stable version from the official Redis site; the unstable branch is not considered.

Install gcc:

yum -y install centos-release-scl

yum -y install devtoolset-9-gcc devtoolset-9-gcc-c++ devtoolset-9-binutils

scl enable devtoolset-9 bash

echo "source /opt/rh/devtoolset-9/enable" >> /etc/profile

source /etc/profile

Compile and install (run inside the unpacked Redis source directory):

make && make install PREFIX=/opt/apps/redis6

8.2 Configure environment variables

# Redis configuration
export REDIS_HOME=/opt/apps/redis6
export PATH=$PATH:$REDIS_HOME/bin

8.3 configure redis

Copy redis.conf in the source package to the installation directory:

cp /opt/jars/redis-6.0.9/redis.conf ./

Modify bind to the local name: bind linux04.pub

Modify background operation: daemonize yes

8.4 Start the test

redis-server ./redis.conf

Client connection:

redis-cli -h linux04.pub -p 6379
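A one-line read/write check (the key name is just an example):

redis-cli -h linux04.pub -p 6379 set greeting "hello spark"
redis-cli -h linux04.pub -p 6379 get greeting    # should print "hello spark"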

8.5 Distribute to machines 5 and 6, then modify the bound hostname and the environment variables on each

9 Redis cluster construction

(I won't use this for now; this part is copied from someone else's article and will be set up properly when needed.)

Redis cluster construction (very detailed, suitable for novices) - Cool cool watermelon blog - CSDN blog

9.1. Introduction to Redis Cluster (Redis Cluster)

  • Redis is an open-source key-value store favored by many Internet companies. Before version 3.0 it only supported standalone mode; clustering has been supported since 3.0. Redis 3.0.0 is used here;

  • The Redis cluster uses a P2P model: it is completely decentralized, with no central node and no proxy node;

  • The Redis cluster has no unified entry point. A client only needs to connect to any node in the cluster; the nodes communicate with each other internally (the PING-PONG mechanism), and every node is a Redis instance;

  • To achieve high availability, i.e. to judge whether a node is healthy, redis-cluster uses a voting fault-tolerance mechanism: if more than half of the nodes in the cluster vote that a node is unreachable, that node is considered failed. This is how a failed node is detected;

  • So how is the whole cluster judged to be down? If any master node goes down and it has no slave (backup) node, the cluster is down. This is how a failed cluster is detected;

  • Why does the cluster stop working when a node without a slave fails? Because the cluster has 16384 built-in hash slots, all of which are mapped onto (divided among) the physical nodes. When a key-value pair is stored, Redis first runs CRC16 on the key and takes the result modulo 16384; the remainder corresponds to one of the slots [0-16383], which determines the node that stores the key. Once a node fails, the slots assigned to it become unusable and the cluster can no longer work normally.

  • To sum up, a Redis cluster can in theory have at most 16384 nodes.

9.2. The environment required for cluster construction


2.1 A Redis cluster needs at least 3 master nodes, because the voting fault-tolerance mechanism requires more than half of the nodes to agree that a node is down before it is marked as failed, and 2 nodes cannot form such a majority.
2.2 To guarantee high availability, every master needs a slave (backup) node, so a Redis cluster needs at least 6 servers. Since I don't have that many servers and cannot start that many virtual machines, a pseudo-distributed cluster is built here: one server runs 6 Redis instances on ports 7001-7006. A real production cluster is built in exactly the same way.
2.3 Install Ruby.


    9.3. The specific steps of cluster construction are as follows (note that the firewall must be turned off)


3.1 Create a new redis-cluster directory under /usr/local to store the cluster nodes.
3.2 Copy all the files in the bin directory under the redis directory into /usr/local/redis-cluster/redis01; don't worry that the redis01 directory does not exist yet, it will be created automatically. The command is as follows (note the current path):

 cp -r redis/bin/ redis-cluster/redis01

3.3 Delete the snapshot file dump.rdb in the redis01 directory, then modify redis.conf in that directory. Two changes are needed: change the port from the default 6379 to 7001, and uncomment cluster-enabled yes to turn on cluster mode.
3.4 Make 5 more copies of redis-cluster/redis01 inside the redis-cluster directory (redis02-redis06), creating 6 Redis instances to simulate the 6 nodes of the cluster. Then change the port number in redis.conf of the remaining 5 copies to 7002-7006 respectively.
3.5 Now start all the redis nodes. Starting them one by one is too troublesome, so create a script that starts them in batch, named start-all.sh, with the following content:

 
 
cd redis01
./redis-server redis.conf
cd ..
cd redis02
./redis-server redis.conf
cd ..
cd redis03
./redis-server redis.conf
cd ..
cd redis04
./redis-server redis.conf
cd ..
cd redis05
./redis-server redis.conf
cd ..
cd redis06
./redis-server redis.conf
cd ..

3.6 After creating the startup script, make it executable:

chmod +x start-all.sh

3.7 Execute the start-all.sh script to start the 6 redis nodes.

3.8 OK, the 6 redis nodes are now running; next the cluster itself is created. Everything above was just preparation; despite the many screenshots it boils down to one sentence: create 6 redis instances (6 nodes) and start them.
Building the cluster requires a tool (a script file) that ships in the Redis source tree. Because the tool is a Ruby script, it needs a Ruby runtime, just as Java programs need a JVM, so install Ruby first:

yum install ruby

Then the Ruby Redis client gem must be installed on the server; redis-3.0.0.gem is used here. Note that the Redis version and the Ruby gem version should match.

Install the gem (download it first):

gem install redis-3.0.0.gem

3.9 The Ruby runtime and gem required by the tool are now installed. Next, copy the Ruby script tool into /usr/local/redis-cluster. Where is it? As mentioned above, it is in the Redis source tree: the redis-trib.rb file in the redis/src directory.

3.10 Copy the tool (redis-trib.rb) into the redis-cluster directory:

cp redis-trib.rb /usr/local/redis-cluster

Then use the script to create the cluster:

./redis-trib.rb create --replicas 1 47.106.219.251:7001 47.106.219.251:7002 47.106.219.251:7003 47.106.219.251:7004 47.106.219.251:7005 47.106.219.251:7006

Note: enter the IP address of your own server here!

At one point you need to type yes manually. After that the Redis cluster has been built successfully. Pay attention to the last part of the output, which shows the hash slots assigned to each node: of the 6 nodes, 3 are slaves, and the 3 masters map slots 0-5460, 5461-10922 and 10923-16383.

3.11 Finally connect to the cluster; connecting to any one node is enough:

redis01/redis-cli -p 7001 -c

Note: be sure to add -c, otherwise the client cannot automatically redirect between nodes. The stored key-value pairs are distributed evenly across the different nodes.

9.4 Conclusion

Finally, two basic Redis cluster commands:
1. View current cluster information

cluster info

2. Check how many nodes are in the cluster

cluster nodes

Origin: blog.csdn.net/eagle89/article/details/131473292