Fully distributed Hadoop Cluster Setup:
1) Configuration files
1.hadoop-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_171
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
2.core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9820</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-3.1.1/tmp</value>
</property>
</configuration>
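hadoop.tmp.dir must exist and be writable on every node before formatting. A minimal sketch run from master, assuming passwordless SSH as root (which start-dfs.sh needs anyway):
for h in master slave1 slave2 slave3; do
  ssh $h "mkdir -p /opt/module/hadoop-3.1.1/tmp"
done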
3.hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>slave1:9868</value>
</property>
</configuration>
4.workers
slave1
slave2
slave3
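All of these files must be identical on every node. A distribution sketch from master, assuming the stock etc/hadoop config layout and passwordless SSH as root:
for h in slave1 slave2 slave3; do
  scp /opt/module/hadoop-3.1.1/etc/hadoop/{hadoop-env.sh,core-site.xml,hdfs-site.xml,workers} root@$h:/opt/module/hadoop-3.1.1/etc/hadoop/
done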
2) Format the file system
$ bin/hdfs namenode -format
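If formatting succeeds, the NameNode metadata lands under hadoop.tmp.dir (dfs.namenode.name.dir defaults to ${hadoop.tmp.dir}/dfs/name). A quick sanity check on master:
$ cat /opt/module/hadoop-3.1.1/tmp/dfs/name/current/VERSION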
3) Start the cluster
$ sbin/start-dfs.sh
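Each daemon should then show up in jps on its assigned host: NameNode on master, SecondaryNameNode plus a DataNode on slave1, DataNodes on slave2 and slave3. A check loop, assuming jps is on each remote PATH:
for h in master slave1 slave2 slave3; do
  echo "== $h =="
  ssh $h jps
done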
4) Check the Hadoop cluster in the browser:
http://master:9870
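The UI should report three live DataNodes; the same information is available from the command line:
$ bin/hdfs dfsadmin -report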
Hadoop HA Setup:
1) Configuration files
1.hadoop-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_171
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_ZKFC_USER=root
export HDFS_JOURNALNODE_USER=root
2.core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-3.1.1/tmp</value>
</property>
<property>
<name>hadoop.http.staticuser.user</name>
<value>root</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>slave1:2181,slave2:2181,slave3:2181</value>
</property>
</configuration>
3.hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>master:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>slave1:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>master:9870</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>slave1:9870</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://master:8485;slave1:8485;slave2:8485/mycluster</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/journal</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
</configuration>
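sshfence only works if each NameNode can reach the other over passwordless SSH with the key listed above. A quick check, run from master and then in the reverse direction from slave1:
$ ssh -i /root/.ssh/id_rsa root@slave1 hostname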
4.workers
slave1
slave2
slave3
2) ZooKeeper cluster setup
zoo.cfg
tickTime=2000
dataDir=/opt/module/zookeeper-3.4.12/data
clientPort=2181
initLimit=5
syncLimit=2
server.1=slave1:2888:3888
server.2=slave2:2888:3888
server.3=slave3:2888:3888
On each ZooKeeper node, create a myid file under the dataDir (/opt/module/zookeeper-3.4.12/data); its content is 1, 2, or 3, matching the server.N entries above.
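A sketch for creating the data directory and myid on all three nodes from master, assuming passwordless SSH:
i=1
for h in slave1 slave2 slave3; do
  ssh $h "mkdir -p /opt/module/zookeeper-3.4.12/data && echo $i > /opt/module/zookeeper-3.4.12/data/myid"
  i=$((i+1))
done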
3) On each ZooKeeper node, start the service: zkServer.sh start
Check whether it started successfully: zkServer.sh status
4) Start the JournalNodes (run on each JournalNode host)
hdfs --daemon start journalnode
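The JournalNode hosts are the three named in dfs.namenode.shared.edits.dir. A sketch for starting all of them from master, using the install path from this guide and assuming passwordless SSH:
for h in master slave1 slave2; do
  ssh $h "/opt/module/hadoop-3.1.1/bin/hdfs --daemon start journalnode"
done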
5) Synchronize the edit log
If converting an existing cluster with a single NameNode:
hdfs namenode -initializeSharedEdits (run on the already-formatted NameNode)
hdfs --daemon start namenode
hdfs namenode -bootstrapStandby (run on the NameNode that has not been formatted)
If this is a new cluster:
hdfs namenode -format
hdfs --daemon start namenode
hdfs namenode -bootstrapStandby (run on the NameNode that has not been formatted)
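Put together, the new-cluster order looks like this (nn1 = master, nn2 = slave1, per hdfs-site.xml); a sketch, with the second NameNode started once it has bootstrapped:
# on master (nn1)
hdfs namenode -format
hdfs --daemon start namenode
# on slave1 (nn2)
hdfs namenode -bootstrapStandby
hdfs --daemon start namenode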
6) Format ZooKeeper and start the ZKFCs
$HADOOP_HOME/bin/hdfs zkfc -formatZK (run on either one of the NameNodes)
$HADOOP_HOME/bin/hdfs --daemon start zkfc (run on both ZKFC nodes, i.e. both NameNodes)
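With both ZKFCs running, one NameNode should report active and the other standby:
$ bin/hdfs haadmin -getServiceState nn1
$ bin/hdfs haadmin -getServiceState nn2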
7) YARN setup
yarn-env.sh
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster1</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>slave2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>slave3</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>slave2:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>slave3:8088</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>slave1:2181,slave2:2181,slave3:2181</value>
</property>
</configuration>
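With ResourceManager HA on slave2 and slave3, YARN can be started and verified, then exercised with a sample job. A sketch run from $HADOOP_HOME; the examples jar path matches the stock Hadoop 3.1.1 distribution:
$ sbin/start-yarn.sh
$ bin/yarn rmadmin -getServiceState rm1
$ bin/yarn rmadmin -getServiceState rm2
$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar pi 2 10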