Due to hardware constraints I only have three machines, so one of them has to act as the NameNode and the other two as DataNodes.
In a Hadoop cluster the management node (NameNode) and the data nodes are best deployed on separate machines, because the data nodes run tasks under a relatively heavy load, which can easily affect the stability of the management node.
The build steps are as follows:
1. Modify the /etc/hosts file
A note on hostnames: many guides change the machines' hostnames to something like master, slave1, slave2, or namenode, datanode1, datanode2. I could not modify the hostnames on my machines, so I use the existing ones directly. As long as the mapping between IP and hostname is correct, either approach works, so don't get too hung up on this.
All three machines get the same configuration, roughly as follows; adjust it to your actual situation. In the examples below I will use these three names:
192.168.9.1 namenode
192.168.9.2 datanode1
192.168.9.3 datanode2
2. After all three machines have been modified, check that the three machines can ping each other. The command is as follows:
ping -c 3 datanode1
Run this test on all three machines, checking from each machine that it can reach the other two, as in the example below.
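For example, using the hostnames from the mapping above, on the namenode machine you would run:
ping -c 3 datanode1
ping -c 3 datanode2
and the equivalent commands on datanode1 and datanode2 against the other two hosts.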
3. Passwordless SSH login. It will be needed later when starting the Hadoop cluster, so set up passwordless login between the three machines.
First generate a key pair on all three machines:
ssh-keygen -t rsa -P ''
4. Create the authorized_keys file on all three machines:
touch /root/.ssh/authorized_keys
5. Check that the authorized_keys file was created successfully:
ls /root/.ssh/
6. Copy the id_rsa.pub key from each of the three machines, append all three keys to authorized_keys, and make sure the authorized_keys file has identical contents on all three machines, roughly as in the sketch below.
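A minimal sketch of one way to collect the keys, assuming root logins and the hostnames from the mapping above (ssh-copy-id appends the local public key to the remote machine's authorized_keys; adjust the target hosts when you run this on each machine):
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
ssh-copy-id -i /root/.ssh/id_rsa.pub root@datanode1
ssh-copy-id -i /root/.ssh/id_rsa.pub root@datanode2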
7. Check that passwordless login works:
ssh datanode1
Test this on each of the three machines against the other two.
8. Check the JDK environment
Check the JDK environment on all three machines. The JDK version I use is jdk1.8.0_65.
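To see which version is currently installed on each machine:
java -version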
If the JDK needs to be upgraded, the steps are as follows:
Download the JDK installation package
Create a java directory at /opt/java
Then place the installation package in that directory and extract it:
mkdir /opt/java
tar -zxvf jdk-8u65-linux-x64.tar.gz
Modify the configuration file
vim /etc/profile
Add the following lines to the file:
export JAVA_HOME=/opt/java/jdk1.8.0_65
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib/
export PATH=$PATH:$JAVA_HOME/bin
Run the following command to make the configuration take effect:
source /etc/profile
Test whether the installation was successful:
java -version
If java -version still shows the old version, the solution is as follows:
First check where the current binaries point:
which java
which javac
Then update the corresponding symbolic links:
rm -rf /usr/bin/java
rm -rf /usr/bin/javac
ln -s $JAVA_HOME/bin/javac /usr/bin/javac
ln -s $JAVA_HOME/bin/java /usr/bin/java
After running these, check again:
java -version
javac -version
9. Create a folder for Hadoop and download it from a mirror
mkdir /opt/hadoop
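One way is to pull the 2.8.4 release from the Apache archive (a sketch; any mirror that carries this version also works):
cd /opt/hadoop
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.8.4/hadoop-2.8.4.tar.gz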
10. Go into the hadoop folder and extract the installation package:
tar -xvf hadoop-2.8.4.tar.gz
11. Create the following directories on all three machines:
mkdir /root/hadoop
mkdir /root/hadoop/tmp
mkdir /root/hadoop/var
mkdir /root/hadoop/dfs
mkdir /root/hadoop/dfs/name
mkdir /root/hadoop/dfs/data
The configuration steps below must be applied to all three machines, and the files must be identical, so you can configure just one machine and copy the files to the other machines afterwards (see the scp sketch after the yarn-site.xml step below).
12. Modify the core-site.xml file
Add the following between the <configuration> and </configuration> nodes:
vim /opt/hadoop/hadoop-2.8.4/etc/hadoop/core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/root/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://namenode:9000</value>
</property>
13. Modify the hadoop-env.sh file, which configures the Java environment Hadoop runs with
vim /opt/hadoop/hadoop-2.8.4/etc/hadoop/hadoop-env.sh
Change
export JAVA_HOME=${JAVA_HOME}
to
export JAVA_HOME=/opt/java/jdk1.8.0_65
(use your own JDK path)
14. Modify the hdfs-site.xml file, which configures the storage paths
Add the following between the <configuration> and </configuration> nodes:
vim /opt/hadoop/hadoop-2.8.4/etc/hadoop/hdfs-site.xml
<property>
<name>dfs.name.dir</name>
<value>/root/hadoop/dfs/name</value>
<description>Path on the local filesystem where theNameNode stores the namespace and transactions logs persistently.</description>
</property>
<property>
<name>dfs.data.dir</name>
<value>/root/hadoop/dfs/data</value>
<description>Comma separated list of paths on the localfilesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>true</value>
<description>need not permissions</description>
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
Note: if dfs.permissions is set to false, permission checks on files generated in HDFS are skipped, which is convenient; but to guard against accidental deletion, set it to true or simply remove that property node, since the default is true.
15. Create and modify the mapred-site.xml file
cp /opt/hadoop/hadoop-2.8.4/etc/hadoop/mapred-site.xml.template /opt/hadoop/hadoop-2.8.4/etc/hadoop/mapred-site.xml
vim /opt/hadoop/hadoop-2.8.4/etc/hadoop/mapred-site.xml
Modify the new mapred-site.xml file, adding the following between the <configuration> and </configuration> nodes:
<property>
<name>mapred.job.tracker</name>
<value>namenode:49001</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/root/hadoop/var</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
16. Modify the slaves file; note that it holds the hostnames of the DataNode machines
vim /opt/hadoop/hadoop-2.8.4/etc/hadoop/slaves
Delete the localhost entry inside it and add the following:
datanode1
datanode2
17. Modify the yarn-site.xml file, which mainly configures the YARN services
Add the following between the <configuration> and </configuration> nodes (note that the memory settings should match your machines; mine only have 2 GB of RAM, so I use modest values here):
vim /opt/hadoop/hadoop-2.8.4/etc/hadoop/yarn-site.xml
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>namenode</value>
</property>
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<description>The http address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
<description>The https address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>${yarn.resourcemanager.hostname}:8090</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<description>The address of the RM admin interface.</description>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>2048</value>
<description>Maximum allocation per container (2048 MB)</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
18. Perform the initialization on the namenode machine
cd /opt/hadoop/hadoop-2.8.4/bin
19. Run the initialization script, i.e. execute the command:
./hadoop namenode -format
20. Run the startup command on the namenode machine
Enter the directory:
cd /opt/hadoop/hadoop-2.8.4/sbin/
Start command:
./start-all.sh
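start-all.sh is marked deprecated in Hadoop 2.x and simply starts HDFS and YARN in turn, so you can equivalently run:
./start-dfs.sh
./start-yarn.sh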
21. To test Hadoop, first turn off the firewall:
systemctl stop firewalld.service
22. Check whether the Hadoop cluster has started
Enter the following command:
jps
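If the cluster came up, the namenode machine should show roughly the following processes (the process IDs here are placeholders and will differ):
2120 NameNode
2345 SecondaryNameNode
2567 ResourceManager
2789 Jps
and each datanode machine should show DataNode and NodeManager.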
23. View the Hadoop overview page
Access the namenode machine's IP on port 50070 (I access it through a port mapping here, but the underlying port is 50070).
The page will redirect automatically.
You can then access IP:8088 to view the cluster and its DataNodes in the ResourceManager web UI.