Notes
Note 1: This is the second post in the big data series. For the first, see https://blog.csdn.net/focuson_/article/details/80153371 . Machine preparation and the ZooKeeper installation were covered in that post.
Note 2: This article covers the Hadoop installation. The cluster layout is designed as:

| machine | installed software | processes |
| --- | --- | --- |
| focuson1 | zookeeper; hadoop NameNode; hadoop DataNode | JournalNode; DataNode; QuorumPeerMain; NameNode; DFSZKFailoverController; NodeManager |
| focuson2 | zookeeper; hadoop NameNode; hadoop DataNode; yarn | JournalNode; DataNode; QuorumPeerMain; NameNode; DFSZKFailoverController; NodeManager; ResourceManager |
| focuson3 | zookeeper; hadoop DataNode; yarn | JournalNode; DataNode; QuorumPeerMain; NodeManager; ResourceManager |
Installation steps:
1. Upload the tarball to the home directory on focuson1, then extract it:

cd /usr/local/src/
mkdir hadoop
cd hadoop
mv ~/hadoop-2.6.0.tar.gz .
tar -xvf hadoop-2.6.0.tar.gz
rm -f hadoop-2.6.0.tar.gz
2. Modify the configuration file
1》hadoop-env.sh
export JAVA_HOME=/usr/local/src/java/jdk1.7.0_51  # must be set explicitly
2》YARN and Hadoop integration
2.1 mapred-site.xml
<configuration>
  <!-- Specify the MapReduce framework as yarn -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
2.2 yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <!-- Enable ResourceManager high availability -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Specify the cluster id of the RM -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yrc</value>
  </property>
  <!-- Specify the logical names of the RMs -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- Specify the address of each RM -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>focuson2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>focuson3</value>
  </property>
  <!-- Specify the zookeeper cluster address -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>focuson1:2181,focuson2:2181,focuson3:2181</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
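With two ResourceManagers configured (rm1 on focuson2, rm2 on focuson3), a client has to try each logical RM id in turn until it finds the active one. The following is a minimal conceptual sketch of that behavior, not Hadoop code; `probe` is a hypothetical stand-in for a real connection attempt, and the hostnames mirror the yarn-site.xml above.

```python
# Conceptual sketch of RM HA client behavior: try each ResourceManager id
# in order until one is active. `probe` is a hypothetical stand-in for a
# real connection attempt; hostnames mirror yarn-site.xml.
RM_HOSTS = {"rm1": "focuson2", "rm2": "focuson3"}

def find_active_rm(probe):
    """Return the first (rm_id, host) whose probe reports 'active'."""
    for rm_id, host in RM_HOSTS.items():
        if probe(host) == "active":
            return rm_id, host
    raise RuntimeError("no active ResourceManager found")

# Example: pretend focuson2's RM is down and focuson3's is active.
state = {"focuson2": "down", "focuson3": "active"}
print(find_active_rm(lambda h: state[h]))  # ('rm2', 'focuson3')
```

This is why killing one RM does not break YARN: clients simply fail over to the other id in `yarn.resourcemanager.ha.rm-ids`.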
3》hdfs-site.xml (ports: RPC 9000; HTTP 50070)
<configuration>
  <!-- The nameservice is ns1; it must be consistent with core-site.xml -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <!-- ns1 has two NameNodes: nn1 and nn2 -->
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
  </property>
  <!-- RPC address of nn1 -->
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>focuson1:9000</value>
  </property>
  <!-- HTTP address of nn1 -->
  <property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>focuson1:50070</value>
  </property>
  <!-- RPC address of nn2 -->
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>focuson2:9000</value>
  </property>
  <!-- HTTP address of nn2 -->
  <property>
    <name>dfs.namenode.http-address.ns1.nn2</name>
    <value>focuson2:50070</value>
  </property>
  <!-- Where the NameNode's shared edit log is stored on the JournalNodes -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://focuson1:8485;focuson2:8485;focuson3:8485/ns1</value>
  </property>
  <!-- Where each JournalNode stores its data on the local disk -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/usr/local/src/hadoop/hadoop-2.6.0/journal</value>
  </property>
  <!-- Enable automatic failover of the NameNode -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- Failover proxy provider used by clients -->
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Fencing methods; multiple methods are separated by newlines, one per line -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      shell(/bin/true)
    </value>
  </property>
  <!-- sshfence requires passwordless SSH login -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  <!-- sshfence connection timeout (ms) -->
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
</configuration>
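To see how the HA properties above fit together, here is a minimal sketch of how a client resolves the logical nameservice "ns1" into the two NameNode RPC addresses. The XML string is a trimmed copy of the hdfs-site.xml properties; the parsing logic is illustrative, not Hadoop's actual implementation.

```python
# Sketch: resolve the logical nameservice "ns1" to its NameNode RPC
# addresses, using the same property names as hdfs-site.xml above.
import xml.etree.ElementTree as ET

conf_xml = """
<configuration>
  <property><name>dfs.nameservices</name><value>ns1</value></property>
  <property><name>dfs.ha.namenodes.ns1</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.ns1.nn1</name><value>focuson1:9000</value></property>
  <property><name>dfs.namenode.rpc-address.ns1.nn2</name><value>focuson2:9000</value></property>
</configuration>
"""

# Flatten <property><name>/<value> pairs into a plain dict.
props = {p.find("name").text: p.find("value").text
         for p in ET.fromstring(conf_xml).findall("property")}

ns = props["dfs.nameservices"]
addrs = [props[f"dfs.namenode.rpc-address.{ns}.{nn}"]
         for nn in props[f"dfs.ha.namenodes.{ns}"].split(",")]
print(addrs)  # ['focuson1:9000', 'focuson2:9000']
```

This is why clients can address the cluster as hdfs://ns1 (see core-site.xml below) without knowing which NameNode is currently active.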
4》core-site.xml
<configuration>
  <!-- Specify the nameservice of hdfs as ns1 -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1</value>
  </property>
  <!-- Specify the hadoop temporary directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/src/hadoop/hadoop-2.6.0/tmp</value>
  </property>
  <!-- Specify the zookeeper address -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>focuson1:2181,focuson2:2181,focuson3:2181</value>
  </property>
</configuration>
5》slaves
focuson1
focuson2
focuson3
3. Copy the installation to the other two machines

scp -r /usr/local/src/hadoop focuson2:/usr/local/src/
scp -r /usr/local/src/hadoop focuson3:/usr/local/src/
4. Format the NameNode

Execute on focuson1: hdfs namenode -format
This generates the tmp folder under /usr/local/src/hadoop/hadoop-2.6.0 (the path configured as hadoop.tmp.dir); copy this folder to the same path on focuson2.
*If this operation is not performed, an error will be reported
5. Start step one: dfs. Execute it on focuson1 only; it automatically starts the namenode/datanode/journalnode/zkfc processes.

Enter /usr/local/src/hadoop/hadoop-2.6.0 and execute sbin/start-dfs.sh
The output log is as follows:
18/04/28 19:02:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [focuson1 focuson2]
focuson1: starting namenode, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/hadoop-root-namenode-focuson1.out
focuson2: starting namenode, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/hadoop-root-namenode-focuson2.out
focuson1: starting datanode, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/hadoop-root-datanode-focuson1.out
focuson2: starting datanode, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/hadoop-root-datanode-focuson2.out
focuson3: starting datanode, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/hadoop-root-datanode-focuson3.out
Starting journal nodes [focuson1 focuson2 focuson3]
focuson3: starting journalnode, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/hadoop-root-journalnode-focuson3.out
focuson1: starting journalnode, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/hadoop-root-journalnode-focuson1.out
focuson2: starting journalnode, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/hadoop-root-journalnode-focuson2.out
18/04/28 19:03:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting ZK Failover Controllers on NN hosts [focuson1 focuson2]
focuson2: starting zkfc, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/hadoop-root-zkfc-focuson2.out
focuson1: starting zkfc, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/hadoop-root-zkfc-focuson1.out
Start step two: yarn, on focuson2:

Enter /usr/local/src/hadoop/hadoop-2.6.0 and execute sbin/start-yarn.sh
[root@focuson2 hadoop-2.6.0]# ./sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/yarn-root-resourcemanager-focuson1.out
focuson2: starting nodemanager, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/yarn-root-nodemanager-focuson2.out
focuson3: starting nodemanager, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/yarn-root-nodemanager-focuson3.out
focuson1: starting nodemanager, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/yarn-root-nodemanager-focuson1.out
Execute the same on focuson3 (only the standby resourcemanager still needs starting, for high availability):
[root@focuson3 hadoop-2.6.0]# ./sbin/start-yarn.sh
starting yarn daemons
resourcemanager running as process 4258. Stop it first.
focuson3: nodemanager running as process 4689. Stop it first.
focuson2: nodemanager running as process 5783. Stop it first.
focuson1: nodemanager running as process 7596. Stop it first.
6. Verify. On focuson1, jps:
[root@focuson1 hadoop-2.6.0]# jps
6977 DataNode
7089 JournalNode
7177 DFSZKFailoverController
7596 NodeManager
7790 Jps
4255 QuorumPeerMain
6911 NameNode
On focuson2:

[root@focuson2 hadoop-2.6.0]# jps
6144 Jps
5505 DFSZKFailoverController
2963 QuorumPeerMain
5140 DataNode
5783 NodeManager
5047 NameNode
6056 ResourceManager
5321 JournalNode
On focuson3:
[root@focuson3 hadoop-2.6.0]# jps
5136 Jps
4689 NodeManager
4258 ResourceManager
4419 DataNode
3044 QuorumPeerMain
4504 JournalNode
Open the web interface (http://focuson1:50070 and http://focuson2:50070) to view:
It can be seen that the namenode of focuson2 is standby, and that of focuson1 is active.
Kill the namenode process on focuson1, and you will find that the namenode on focuson2 becomes active:

[root@focuson1 hadoop-2.6.0]# jps
6977 DataNode
7089 JournalNode
7177 DFSZKFailoverController
7596 NodeManager
7790 Jps
4255 QuorumPeerMain
6911 NameNode
[root@focuson1 hadoop-2.6.0]# kill -9 6911
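The automatic failover just demonstrated can be sketched conceptually: the ZKFC processes watch the NameNodes, and when no NameNode is active, a standby is promoted. This is an illustrative toy model, not Hadoop's actual election logic (which goes through a ZooKeeper lock and fencing).

```python
# Toy model of ZKFC automatic failover (not Hadoop code): when the active
# NameNode dies, promote a standby to active.
def failover(states):
    """states: dict host -> 'active' | 'standby' | 'dead'.
    Promote one standby if no NameNode is currently active."""
    if "active" not in states.values():
        for host, s in states.items():
            if s == "standby":
                states[host] = "active"
                break
    return states

nn = {"focuson1": "active", "focuson2": "standby"}
nn["focuson1"] = "dead"  # kill -9 the active NameNode, as above
print(failover(nn))      # {'focuson1': 'dead', 'focuson2': 'active'}
```

In the real cluster the promotion is triggered by the DFSZKFailoverController processes seen in the jps output, coordinated through the ZooKeeper quorum configured in core-site.xml.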
7. Try it out

Execute some hdfs commands on focuson1:

touch first.txt
hdfs dfs -put first.txt /
hdfs dfs -ls /
......
success!