Modify the hostname
On CentOS 7 and later, set it this way; it persists across reboots:
hostnamectl set-hostname test-x
Modify /etc/hosts
Add the IP-to-hostname mappings for all nodes here.
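A sketch of the entries, assuming the node addresses that appear later in the ZooKeeper and Kafka sections (10.123.9.53-55); adjust to your own network:

```
10.123.9.53 test-1
10.123.9.54 test-2
10.123.9.55 test-3
```

Every node should carry the same mappings so that the hostnames used in the configuration files below resolve everywhere.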
Configuring passwordless SSH
- [Master, Slave1, Slave2]
ssh-keygen -t rsa -P '' // just press Enter through the prompts
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys // append your own public key to authorized_keys
- [Master, Slave1, Slave2]
Copy keys to one another:
// master
ssh-copy-id root@slave1
ssh-copy-id root@slave2
// slave1
ssh-copy-id root@master
ssh-copy-id root@slave2
// slave2
ssh-copy-id root@master
ssh-copy-id root@slave1
- If login succeeds without a password prompt, the configuration is correct; otherwise, investigate and fix the cause.
ssh root@localhost
ssh root@other_server
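The checks above can be scripted. A minimal sketch, using the master/slave1/slave2 hostnames from the copy step; BatchMode makes ssh fail instead of prompting for a password, so a success here really means key-based login works:

```shell
#!/bin/sh
# Probe each node with BatchMode, which forbids interactive password
# prompts: success therefore means key-based login works.
n=0
for host in master slave1 slave2; do
  n=$((n + 1))
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true 2>/dev/null; then
    echo "$host: passwordless ssh OK"
  else
    echo "$host: passwordless ssh FAILED (or host unreachable)"
  fi
done
```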
Configuring Java
- Download the JDK
wget from the Oracle server did not work for me, so in the end I downloaded it on Windows and copied it over with scp.
- Configure the JDK environment variables
Extract to /usr/local/jdk1.8.
Edit /etc/profile, add the following configuration, then run source /etc/profile:
export JAVA_HOME=/usr/local/jdk1.8
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
- Run java -version and javac to check that everything is configured correctly.
Configuring Scala
- Download the latest version and extract to /usr/local/scala-2.12.8
- Edit /etc/profile, add the following configuration, then run source /etc/profile:
export PATH="$PATH:/usr/local/scala-2.12.8/bin"
- Typing scala in the terminal should now drop you into the Scala REPL.
Installing Hadoop
- Download the latest version and unzip to /usr/local/hadoop-3.1.2
- Modify the configuration files
2.1 core-site.xml // overall Hadoop parameters
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop-3.1.2/tmp</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://test-1:9000</value>
</property>
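Note that these property snippets (in this and the following *-site.xml files) must sit inside the file's <configuration> root element; a minimal complete core-site.xml under the assumptions above would be:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop-3.1.2/tmp</value>
  </property>
  <property>
    <!-- fs.defaultFS is the current name; fs.default.name is its deprecated alias -->
    <name>fs.defaultFS</name>
    <value>hdfs://test-1:9000</value>
  </property>
</configuration>
```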
2.2 hdfs-site.xml // HDFS settings
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop-3.1.2/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop-3.1.2/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
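If you prefer to create the local directories referenced above ahead of time (the NameNode format and DataNode startup will usually create them themselves), a minimal sketch, assuming the install prefix used throughout this guide:

```shell
#!/bin/sh
# Create the local directories referenced in core-site.xml and hdfs-site.xml.
# HADOOP_DIR defaults to the install prefix used in this guide.
HADOOP_DIR="${HADOOP_DIR:-/usr/local/hadoop-3.1.2}"
# Fall back to a temporary prefix when the real one is not writable
# (e.g. a non-root demo run).
if ! mkdir -p "$HADOOP_DIR/tmp" 2>/dev/null; then
  HADOOP_DIR="$(mktemp -d)"
fi
mkdir -p "$HADOOP_DIR/tmp" "$HADOOP_DIR/dfs/name" "$HADOOP_DIR/dfs/data"
echo "created tmp, dfs/name and dfs/data under $HADOOP_DIR"
```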
2.3 mapred-site.xml // MapReduce settings
Assuming the Hadoop path is /usr/local/hadoop-3.1.2, same below.
<property>
<name>mapred.job.tracker</name>
<value>test-1:49001</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/usr/local/hadoop-3.1.2/var</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
2.4 yarn-site.xml // YARN settings. This file matters: if it is configured carelessly, jobs will not run at all. For example, even the simplest wordcount needs more than 1 GB of memory, so with a lower limit it fails. See the reference on scheduling and isolation of memory and CPU resources in YARN.
<property>
<name>yarn.resourcemanager.hostname</name>
<value>test-1</value>
</property>
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<description>The http address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
<description>The https address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>${yarn.resourcemanager.hostname}:8090</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<description>The address of the RM admin interface.</description>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>1</value>
</property>
2.5 hadoop-env.sh
Add one line:
export JAVA_HOME=YOUR_JAVA_HOME_PATH
2.6 workers // worker node list
Delete the localhost entry inside and replace it with:
test-2
test-3
2.7 sbin/start-dfs.sh and sbin/stop-dfs.sh
Add at the top of both scripts:
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
2.8 sbin/start-yarn.sh and sbin/stop-yarn.sh
Add at the top of both scripts:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
- scp the whole directory to all nodes, or repeat the same configuration on every node.
- Run the commands
On the namenode, run hdfs namenode -format
Then change into the sbin directory and run
./start-all.sh
When all is well the expected daemons are running; the exact jps output may differ slightly depending on the configuration.
For HDFS/MR, the NameNode runs on the master and the DataNodes on the workers; for YARN, the ResourceManager runs on the master and the NodeManagers on the worker nodes.
HDFS web UI (NameNode, port 9870 by default in Hadoop 3), YARN web UI (ResourceManager, port 8088 as configured above).
Run a demo job to see whether anything errors out; fix whatever it reports and retry.
Installing Spark
- Download the latest version (pre-built for Hadoop 2.7 and later) and extract to /usr/local/spark-2.4.3
cd /usr/local/spark-2.4.3/conf
cp spark-env.sh.template spark-env.sh
vi spark-env.sh
export JAVA_HOME=/usr/local/jdk1.8
export SCALA_HOME=/usr/local/scala-2.12.8
export HADOOP_HOME=/usr/local/hadoop-3.1.2
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_MASTER_HOST=test-1   # SPARK_MASTER_IP is the deprecated name
export SPARK_LOCAL_DIRS=/usr/local/spark-2.4.3
cp slaves.template slaves
vi slaves
test-2
test-3
- Distribute to the other nodes
scp -r /usr/local/spark-2.4.3 test-2:/usr/local
scp -r /usr/local/spark-2.4.3 test-3:/usr/local
- Run start-all.sh under Spark's sbin directory (much like Hadoop's; invoke it as ./sbin/start-all.sh from the Spark directory to avoid clashing with Hadoop's script of the same name), then check Spark's web UI (port 8080 on the master by default).
With Hadoop already started, jps should additionally show Spark's Master process on test-1 and Worker processes on test-2 and test-3.
- Testing
5.1 Local, multi-threaded
./bin/run-example --master local[2] SparkPi 10
5.2 Running in Standalone cluster mode
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://test-1:7077 \
examples/jars/spark-examples_2.11-2.4.3.jar \
100
5.3 Running on YARN
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
examples/jars/spark-examples*.jar \
10
Installing ZooKeeper
- Download the latest version and extract to /usr/local/zookeeper
cd /usr/local/zookeeper/conf/
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg
# change the dataDir entry
dataDir=/usr/local/zookeeper/data
# append the following
dataLogDir=/usr/local/zookeeper/logs
# recent ZooKeeper releases ship an embedded Jetty admin console that would otherwise occupy port 8080
admin.serverPort=4040
server.1=10.123.9.53:4001:4002
server.2=10.123.9.54:4001:4002
server.3=10.123.9.55:4001:4002
dataDir and dataLogDir must exist before startup:
mkdir /usr/local/zookeeper/data
mkdir /usr/local/zookeeper/logs
clientPort is the port ZooKeeper serves clients on.
server.1, server.2 and server.3 describe the three nodes of the zk ensemble, in the format hostname:port1:port2, where port1 is used for quorum communication between nodes and port2 for leader election; make sure both ports are mutually reachable among the three hosts.
Changing the log configuration
By default ZooKeeper writes console output to zookeeper.out in the launch directory; the following makes it write size-rotated log files instead:
1) In /usr/local/zookeeper/conf/log4j.properties, change
zookeeper.root.logger=INFO, CONSOLE
to
zookeeper.root.logger=INFO, ROLLINGFILE
2) In /usr/local/zookeeper/bin/zkEnv.sh, change
ZOO_LOG4J_PROP="INFO,CONSOLE"
to
ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
Then create a file named myid under the dataDir path on each host.
1. The myid file on the first host (master) contains 1, on the second host (slave1) it contains 2, and on the third host (slave2) it contains 3. The myid contents must match the server.N ids configured in /usr/local/zookeeper/conf/zoo.cfg.
2. You can first copy the zk directory to the other nodes and then edit each node's myid by hand: distribute zookeeper to the other nodes and fix each myid accordingly.
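The id bookkeeping above can be automated. A minimal sketch that derives a node's myid from the server.N lines of zoo.cfg; it uses the sample addresses from the config above, and for illustration writes to a temporary data dir (on a real node ZK_DATA would be the configured dataDir and NODE_IP the node's own address):

```shell
#!/bin/sh
# Derive this node's myid from the server.N=host:port1:port2 lines of zoo.cfg.
ZK_DATA="${ZK_DATA:-$(mktemp -d)}"    # real value: /usr/local/zookeeper/data
node_ip="${NODE_IP:-10.123.9.54}"     # real value: this node's address (here: test-2)
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
server.1=10.123.9.53:4001:4002
server.2=10.123.9.54:4001:4002
server.3=10.123.9.55:4001:4002
EOF
# Extract the N from the server.N line whose host matches this node.
myid=$(sed -n "s/^server\.\([0-9]*\)=$node_ip:.*/\1/p" "$cfg")
echo "$myid" > "$ZK_DATA/myid"
echo "wrote myid=$myid to $ZK_DATA/myid"
rm -f "$cfg"
```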
- Start zookeeper on every node:
./bin/zkServer.sh start
jps should show a QuorumPeerMain process on every node.
Run ./zkServer.sh status on each node.
Simulate the leader going down.
Simulate the machine rejoining the cluster after recovery.
This failed: once the leader is killed, the node cannot rejoin. From some searching, this looks like a 3.4.x bug that should already have been fixed, yet it still occurs; the cause is unclear for now.
Installing Kafka
- Download the latest version and extract to /usr/local/kafka
- See this blog post. Kafka is deployed on only two nodes, test-2 and test-3, at 10.123.9.54:9092 and 10.123.9.55:9092 respectively.
Installing MySQL
See this blog post. MySQL is installed only on test-3.
Username: root
Password: adcLab2019
Overview
test-1
test-2
test-3
Reproduced from: https://www.jianshu.com/p/59b0e230f0d5