hadoop 2.9.2 fully distributed installation

Fully distributed installation
Deploying Hadoop in a fully distributed environment
A fully distributed deployment runs Hadoop on multiple real Linux hosts: the machines are planned as a cluster, and the various Hadoop daemons are spread across different machines;

1. Environment preparation
Install the virtual machines; KVM virtual machines are used here;

2. Network configuration
After the network is configured, each host should be able to reach the external network;
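A quick way to confirm outbound connectivity on each node (a simple check; substitute any reachable external address):
# run on every host; a few replies means the external network is reachable
ping -c 3 www.baidu.com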

3. Hostname configuration
Give each of the three hosts its own distinct hostname;

4. hosts configuration
Set the hostname on each machine (run the matching command on the corresponding host):
hostname hadoop-node1
hostname hadoop-node2
hostname hadoop-node3

Write the mapping between the three hostnames and their IP addresses into the hosts file on each machine;
vim /etc/hosts
10.10.2.177 hadoop-node1
10.10.2.178 hadoop-node2
10.10.2.179 hadoop-node3
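A quick sanity check that the names resolve correctly (a small sketch; run it on each of the three hosts):
# every hostname should resolve to the address listed in /etc/hosts and answer the ping
ping -c 1 hadoop-node1
ping -c 1 hadoop-node2
ping -c 1 hadoop-node3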

5. Server role planning
hadoop-node1    hadoop-node2      hadoop-node3
NameNode        ResourceManager
DataNode        DataNode          DataNode
NodeManager     NodeManager       NodeManager
HistoryServer                     SecondaryNameNode

6. Install Hadoop on one machine
# Extract and configure Hadoop on the first machine, then distribute the configured files to the other two machines to build the cluster;
1) Extract the Hadoop archive
tar -zxvf /opt/hadoop-2.9.2.tar.gz -C /opt/modules/app/
2) Configure the JDK path for Hadoop: set JAVA_HOME in hadoop-env.sh, mapred-env.sh and yarn-env.sh;
export JAVA_HOME="/opt/modules/jdk1.7.0_80"
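One way to apply the same JAVA_HOME to all three env files in one go (a sketch; appending works because the last definition wins when the scripts are sourced, but you can also edit each file by hand):
cd /opt/modules/app/hadoop-2.9.2/etc/hadoop
for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
  echo 'export JAVA_HOME="/opt/modules/jdk1.7.0_80"' >> "$f"
done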
3) Configure core-site.xml
cd /opt/modules/app/hadoop-2.9.2/etc/hadoop
vim core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-node1:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/tmp</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
</configuration>
# fs.defaultFS is the address of the NameNode;
# hadoop.tmp.dir is Hadoop's temporary/base data directory: by default the NameNode and DataNode data files live in subdirectories under it. If this directory does not exist, it must be created manually;
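A minimal way to create that directory (run it on each machine; the path matches the hadoop.tmp.dir value configured above):
# create the base data directory referenced by hadoop.tmp.dir
mkdir -p /data/tmp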
4) Configure slaves
# The slaves file lists the DataNode nodes
cd /opt/modules/app/hadoop-2.9.2/etc/hadoop
vim slaves
hadoop-node1
hadoop-node2
hadoop-node3
5) Configure hdfs-site.xml
cd /opt/modules/app/hadoop-2.9.2/etc/hadoop
vim hdfs-site.xml

<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop-node3:50090</value>
  </property>
</configuration>
# dfs.namenode.secondary.http-address specifies the HTTP address and port of the SecondaryNameNode; here hadoop-node3 is planned as the SecondaryNameNode server;
6) Configure yarn-site.xml
cd /opt/modules/app/hadoop-2.9.2/etc/hadoop
vim yarn-site.xml

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-node2</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>106800</value>
  </property>
</configuration>
# yarn.resourcemanager.hostname points the ResourceManager at hadoop-node2, as planned
# yarn.log-aggregation-enable controls whether log aggregation is enabled
# yarn.log-aggregation.retain-seconds sets how long aggregated logs are kept on HDFS
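Once log aggregation is enabled and a job has finished, its aggregated logs can be pulled back from HDFS with the yarn CLI (the application id below is a placeholder):
# find a real id with "yarn application -list" or in the ResourceManager web UI
yarn logs -applicationId application_xxxxxxxxxxxxx_xxxx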
7) Configure mapred-site.xml
# Copy mapred-site.xml.template to create mapred-site.xml;
cd /opt/modules/app/hadoop-2.9.2/etc/hadoop
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop-node1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop-node1:19888</value>
  </property>
</configuration>
# mapreduce.framework.name makes MapReduce jobs run on YARN
# mapreduce.jobhistory.address puts the MapReduce JobHistory server on the hadoop-node1 machine
# mapreduce.jobhistory.webapp.address sets the JobHistory server's web UI address and port

7. Set up passwordless SSH
The machines in a Hadoop cluster access each other over SSH; entering a password for every access is impractical, so configure passwordless SSH between the machines;
1) Generate a key pair on hadoop-node1
ssh-keygen -t rsa
# Press Enter to accept all defaults. The public key id_rsa.pub and the private key id_rsa are generated under .ssh in the current user's home directory;
2) Distribute the public key
ssh-copy-id hadoop-node1
ssh-copy-id hadoop-node2
ssh-copy-id hadoop-node3
3) Set up passwordless login from hadoop-node2 / hadoop-node3 to the other machines
Repeat the same steps as on hadoop-node1: generate the key pair, then distribute the public key to the three machines; a quick verification is shown below;
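A minimal verification that passwordless login works (run from hadoop-node1; each command should print the remote hostname without asking for a password, and the same check can be repeated from the other two nodes):
ssh hadoop-node2 hostname
ssh hadoop-node3 hostname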

8. Distribute the Hadoop files
1) Create the Hadoop directory on all three machines
mkdir -p /opt/modules/app
2) Distribute the Hadoop files via scp
# The share/doc directory under the Hadoop root holds the documentation and is fairly large; it can be deleted before distribution to speed things up;
scp -r /opt/modules/app/hadoop-2.9.2/ hadoop-node2:/opt/modules/app/
scp -r /opt/modules/app/hadoop-2.9.2/ hadoop-node3:/opt/modules/app/
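The same distribution written as a small loop, with the documentation stripped first (a sketch; skip the rm if you prefer to keep the docs):
# remove the bulky documentation, then copy the whole installation to the other nodes
rm -rf /opt/modules/app/hadoop-2.9.2/share/doc
for node in hadoop-node2 hadoop-node3; do
  scp -r /opt/modules/app/hadoop-2.9.2/ "$node":/opt/modules/app/
done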

9. Format the NameNode
# Run the format on the NameNode (master) machine
/opt/modules/app/hadoop-2.9.2/bin/hdfs namenode -format
# Note: if the NameNode ever needs to be reformatted, first delete all files under the original NameNode and DataNode directories, otherwise errors will occur. These directories are configured by the hadoop.tmp.dir property (core-site.xml) and the dfs.namenode.name.dir and dfs.datanode.data.dir properties (hdfs-site.xml);
<property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/data/tmp</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file://${hadoop.tmp.dir}/dfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file://${hadoop.tmp.dir}/dfs/data</value>
</property>
# Each format creates a new cluster ID by default and writes it into the VERSION files of the NameNode and DataNode (the VERSION files live under dfs/name/current and dfs/data/current). If the old directories are not deleted, the NameNode's VERSION file carries the new cluster ID while the DataNodes keep the old one, and the mismatch causes errors;
# The other option is to pass a cluster ID parameter when formatting, set to the old cluster ID value;
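A sketch of that second option (the cluster ID below is a placeholder; read the real value from an existing VERSION file and adjust the path to your hadoop.tmp.dir):
# look up the old cluster ID on a DataNode
grep clusterID /data/tmp/dfs/data/current/VERSION
# reformat the NameNode reusing that ID
/opt/modules/app/hadoop-2.9.2/bin/hdfs namenode -format -clusterid CID-xxxxxxxx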

10. Start the cluster
1) Start HDFS
# Start HDFS from the hadoop-node1 node
/opt/modules/app/hadoop-2.9.2/sbin/start-dfs.sh
2) Start YARN
# Start YARN from the hadoop-node2 node
/opt/modules/app/hadoop-2.9.2/sbin/start-yarn.sh
# Start the ResourceManager on hadoop-node2
cd /opt/modules/app/hadoop-2.9.2
sbin/yarn-daemon.sh start resourcemanager
3) Start the JobHistory server
# According to the plan, start the MapReduce JobHistory server on hadoop-node1 (the host set in mapreduce.jobhistory.address)
cd /opt/modules/app/hadoop-2.9.2
sbin/mr-jobhistory-daemon.sh start historyserver
# Check the running daemons
jps
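If everything started according to the plan, jps on each host should show roughly the following daemons (process IDs omitted; the exact list depends on which host you run it on):
# hadoop-node1: NameNode, DataNode, NodeManager, JobHistoryServer
# hadoop-node2: ResourceManager, DataNode, NodeManager
# hadoop-node3: SecondaryNameNode, DataNode, NodeManager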
4) View the HDFS Web Page
http://10.10.2.177:50070
5) View the YARN web page
http://10.10.2.178:8088

11. Test job
Use the wordcount example that ships with Hadoop to test running MapReduce on the cluster.
The test steps below are executed on the node running YARN (hadoop-node2): create the HDFS directory, upload the wc.input test file, run the job, and check the output;

1) Prepare the MapReduce input file wc.input
cd /opt/data/
touch wc.input
vim wc.input
hadoop mapreduce hive
hbase spark storm
sqoop hadoop hive
spark hadoop
2) Create the input directory on HDFS
cd /opt/modules/app/hadoop-2.9.2/
bin/hdfs dfs -mkdir /input
3) Upload wc.input to HDFS
cd /opt/modules/app/hadoop-2.9.2/
bin/hdfs dfs -put /opt/data/wc.input /input/wc.input
4) Run the wordcount MapReduce example that ships with Hadoop
cd /opt/modules/app/hadoop-2.9.2/
bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /input/wc.input /output
5) View the output files
cd /opt/modules/app/hadoop-2.9.2/
bin/hdfs dfs -ls /output
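If the job succeeded, /output should contain a _SUCCESS marker and a part-r-00000 result file; with the wc.input shown above, the word counts would look roughly like this:
bin/hdfs dfs -cat /output/part-r-00000
# expected output (one word and its count per line):
# hadoop 3
# hbase 1
# hive 2
# mapreduce 1
# spark 2
# sqoop 1
# storm 1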

12. Status screenshots
Version 2.9.2
[Screenshot: HDFS NameNode web UI, http://10.10.2.177:50070/dfshealth.html#tab-overview]
[Screenshot: YARN web UI, http://10.10.2.178:8088/cluster/nodes]

Version 3.0.0
[Screenshots omitted]

Origin blog.51cto.com/driver2ice/2486106