Environment configuration

Modify hostname

On CentOS 7 and later, set it this way; the change persists across reboots:
hostnamectl set-hostname test-x

Modify /etc/hosts

Add the hostname-to-IP mappings for all cluster nodes here, as needed (see the example below).


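A minimal sketch of the entries, assuming test-1, test-2 and test-3 map to the IPs used in the ZooKeeper and Kafka sections further down:

10.123.9.53 test-1
10.123.9.54 test-2
10.123.9.55 test-3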

Configuring passwordless SSH

  1. On each of Master, Slave1, and Slave2:
ssh-keygen -t rsa -P ''   # just press Enter at each prompt; use the generated key consistently below
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys   # append our own public key to authorized_keys
  2. On each of Master, Slave1, and Slave2, copy the public key to the other two nodes:
# on master
ssh-copy-id root@slave1
ssh-copy-id root@slave2
# on slave1
ssh-copy-id root@master
ssh-copy-id root@slave2
# on slave2
ssh-copy-id root@master
ssh-copy-id root@slave1
  3. Test the passwordless login; if it works, the configuration is correct, otherwise find and fix the cause:
ssh root@localhost
ssh root@other_server 

Configuring Java

  1. Download Java
    I tried wget from the Oracle server here, but in the end downloaded it on Windows and copied it over with scp.
  2. Configure the JDK environment variables
    Extract the JDK to /usr/local/jdk1.8.
    Edit /etc/profile, add the following configuration, then run source /etc/profile:
export JAVA_HOME=/usr/local/jdk1.8
export JRE_HOME=$JAVA_HOME/jre
export CLASS_PATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
  3. Run java -version and javac -version to check that the configuration took effect.

Configuring Scala

  1. Download the latest version and extract it to /usr/local/scala-2.12.8
  2. Edit /etc/profile, add the following line, then run source /etc/profile:
export PATH="$PATH:/usr/local/scala-2.12.8/bin"
  3. Typing scala should now open the Scala REPL.

Installing Hadoop

  1. Download the latest version and extract it to /usr/local/hadoop-3.1.2 (see the commands below)
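A minimal sketch of the download and extraction; the mirror URL below is an assumption, substitute whichever mirror you actually use:

cd /usr/local
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.2/hadoop-3.1.2.tar.gz   # assumed mirror
tar -xzf hadoop-3.1.2.tar.gz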
  2. Modify the configuration files
    2.1 core-site.xml // overall Hadoop configuration parameters
<property>
     <name>hadoop.tmp.dir</name>
     <value>/usr/local/hadoop-3.1.2/tmp</value>
</property>
<property>
     <name>fs.default.name</name>
     <value>hdfs://test-1:9000</value>
</property>

2.2 hdfs-site.xml // HDFS module configuration

<property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop-3.1.2/dfs/name</value>
</property>
<property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop-3.1.2/dfs/data</value>
</property>
<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>

2.3 mapred-site.xml // MapReduce module configuration
The Hadoop installation path used below is /usr/local/hadoop-3.1.2, same as elsewhere.

<property>
    <name>mapred.job.tracker</name>
    <value>test-1:49001</value>
</property>
<property>
     <name>mapred.local.dir</name>
     <value>/usr/local/hadoop-3.1.2/var</value>
</property>
<property>
     <name>mapreduce.framework.name</name>
     <value>yarn</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>

2.4 yarn-site.xml // YARN module configuration. This one matters: overlook a small detail and jobs will not run. Memory, for example: even the simplest wordcount needs more than 1 GB of memory, so with less than that it will not run. See "Scheduling and isolation of memory and CPU resources in YARN" for reference.

<property>
        <name>yarn.resourcemanager.hostname</name>
        <value>test-1</value>
   </property>
   <property>
        <description>The address of the applications manager interface in the RM.</description>
        <name>yarn.resourcemanager.address</name>
        <value>${yarn.resourcemanager.hostname}:8032</value>
   </property>
   <property>
        <description>The address of the scheduler interface.</description>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>${yarn.resourcemanager.hostname}:8030</value>
   </property>
   <property>
        <description>The http address of the RM web application.</description>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>${yarn.resourcemanager.hostname}:8088</value>
   </property>
   <property>
        <description>The https address of the RM web application.</description>
        <name>yarn.resourcemanager.webapp.https.address</name>
        <value>${yarn.resourcemanager.hostname}:8090</value>
   </property>
   <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>${yarn.resourcemanager.hostname}:8031</value>
   </property>
   <property>
        <description>The address of the RM admin interface.</description>
        <name>yarn.resourcemanager.admin.address</name>
        <value>${yarn.resourcemanager.hostname}:8033</value>
   </property>
   <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
   </property>
 <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
   </property>
<property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
   </property>
   <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2048</value>
   </property>
   <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>1</value>
   </property>

2.5 hadoop-env.sh

Add one line:
export JAVA_HOME=YOUR_JAVA_HOME_PATH
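Given the JDK path configured earlier, in this setup that would presumably be:

export JAVA_HOME=/usr/local/jdk1.8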

2.6 workers // worker node configuration
Delete the localhost entry in the file and replace it with:

test-2
test-3

2.7 sbin/start-dfs.sh and sbin/stop-dfs.sh
Add the following to both scripts (near the top):

HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

2.8 sbin/start-yarn.sh and sbin/stop-yarn.sh
Add the following to both scripts (near the top):

YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
  3. scp the Hadoop directory to all nodes, or perform the same configuration on every node
  4. Run the commands below
# on the namenode
hdfs namenode -format
# then, from the sbin directory
./start-all.sh

If everything is normal you should see the following processes; there may be slight differences depending on the configuration.
NameNode is the HDFS/MapReduce master process and DataNode runs on the workers; for YARN, NodeManager runs on the worker nodes and ResourceManager on the master node.
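As a rough sketch (process lists can vary with the configuration), jps output might look like this:

# on test-1 (master)
jps
NameNode
SecondaryNameNode
ResourceManager
# on test-2 / test-3 (workers)
jps
DataNode
NodeManager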



You can open the HDFS management interface and the YARN management interface in a browser to check cluster status.

You can run the demo jobs to make sure nothing errors out; if something does report an error, fix whatever it complains about.
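For example, a quick smoke test with the bundled examples jar; the jar path below assumes the standard 3.1.2 distribution layout:

cd /usr/local/hadoop-3.1.2
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar pi 2 10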

Installing Spark

  1. Download the latest version (pre-built for Hadoop 2.7 and later) and extract it to /usr/local/spark-2.4.3
cd /usr/local/spark-2.4.3/conf
cp spark-env.sh.template spark-env.sh
vi spark-env.sh

export JAVA_HOME=/usr/local/jdk1.8
export SCALA_HOME=/usr/local/scala-2.12.8
export HADOOP_HOME=/usr/local/hadoop-3.1.2
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
SPARK_MASTER_IP=test-1
SPARK_LOCAL_DIRS=/usr/local/spark-2.4.3

cp slaves.template slaves
vi slaves

test-2
test-3
  2. Distribute it to the other nodes
scp -r /usr/local/spark-2.4.3 test-2:/usr/local
scp -r /usr/local/spark-2.4.3 test-3:/usr/local
  3. Run start-all.sh under the sbin directory (much like Hadoop), then open Spark's management interface.

    With Hadoop already started, the corresponding processes should all be running.
  4. Tests
    4.1 Local, multi-threaded
    ./bin/run-example SparkPi 10 --master local[2]
    4.2 Standalone cluster mode
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://test-1:7077 \
examples/jars/spark-examples_2.11-2.4.3.jar \
100

4.3 YARN mode

./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
examples/jars/spark-examples*.jar \
10

Installing ZooKeeper

  1. Download the latest version and extract it to /usr/local/zookeeper
cd /usr/local/zookeeper/conf/
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg

# change the dataDir entry
dataDir=/usr/local/zookeeper/data
# add a log directory
dataLogDir=/usr/local/zookeeper/logs
# recent ZooKeeper releases embed an admin console started via Jetty, which would also occupy port 8080
admin.serverPort=4040

server.1=10.123.9.53:4001:4002
server.2=10.123.9.54:4001:4002
server.3=10.123.9.55:4001:4002

dataDir and dataLogDir must be created before starting:

mkdir /usr/local/zookeeper/data
mkdir /usr/local/zookeeper/logs

clientPort is ZooKeeper's client service port.
server.1, server.2 and server.3 describe the three nodes in the zk cluster, in the format hostname:port1:port2, where port1 is used for communication between nodes and port2 for leader election; make sure both ports are mutually reachable on all three hosts.

  2. Change the log configuration
    By default ZooKeeper writes its console output to zookeeper.out in the startup directory. With the following changes it writes size-rotated log files instead:
    1) In /usr/local/zookeeper/conf/log4j.properties, change
    zookeeper.root.logger=INFO, CONSOLE
    to
    zookeeper.root.logger=INFO, ROLLINGFILE
    2) In /usr/local/zookeeper/bin/zkEnv.sh, change
    ZOO_LOG4J_PROP="INFO,CONSOLE"
    to
    ZOO_LOG4J_PROP="INFO,ROLLINGFILE"

  3. Create a file named myid under the dataDir path on the master host
    1) The myid file on the first host (master) contains 1, on the second host (slave1) 2, and on the third host (slave2) 3. The contents must match the server.N numbers configured in /usr/local/zookeeper/conf/zoo.cfg.
    2) You can also copy the zk directory to the other nodes first and then edit each node's myid by hand.
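A sketch of the corresponding commands, assuming the dataDir above and the server numbering in zoo.cfg (run the matching line on the matching host):

echo 1 > /usr/local/zookeeper/data/myid   # on the host configured as server.1
echo 2 > /usr/local/zookeeper/data/myid   # on the host configured as server.2
echo 3 > /usr/local/zookeeper/data/myid   # on the host configured as server.3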

  4. Distribute zookeeper to the other nodes and adjust each node's myid accordingly

  5. Start zookeeper on each node
    ./bin/zkServer.sh start


    jps should show a QuorumPeerMain process on every node.
    Run ./zkServer.sh status on each node.
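    One node should report itself as leader and the others as follower; the status output looks roughly like this (banner lines omitted):
    ./zkServer.sh status
    Mode: follower   # or Mode: leader on exactly one node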

    Simulate the leader going down:

    Simulate the machine rejoining the cluster after recovery:
    This failed: once the leader goes down, it cannot rejoin the ensemble afterwards. From a quick search this looks like a 3.4.x bug that should in theory have been fixed already, but it still shows up here; the cause is unclear for now.

Installing Kafka

  1. Download the latest version and extract it to /usr/local/kafka
  2. See this blog post. Kafka is deployed on only two nodes, test-2 and test-3, at 10.123.9.54:9092 and 10.123.9.55:9092 respectively (a config sketch follows below).
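A minimal sketch of the per-broker settings in config/server.properties, assuming the addresses above, ZooKeeper's default clientPort of 2181, and the three-node ensemble configured earlier:

# on test-2
broker.id=1
listeners=PLAINTEXT://10.123.9.54:9092
zookeeper.connect=10.123.9.53:2181,10.123.9.54:2181,10.123.9.55:2181
# on test-3 (same zookeeper.connect)
broker.id=2
listeners=PLAINTEXT://10.123.9.55:9092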

Installing MySQL

See this blog post. MySQL is installed only on test-3.
Username: root
Password: adcLab2019
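A quick connectivity check from another node, assuming the root account has been granted remote access (not covered here):

mysql -h test-3 -u root -p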

Overview
test-1 (screenshot omitted)

test-2 (screenshot omitted)

test-3 (screenshot omitted)

Reproduced from: https://www.jianshu.com/p/59b0e230f0d5
