Overview: The core idea of distributed computing is that many hands make light work: when many computers are pooled to process a task, their combined storage and computing capacity grows, and the work can run in parallel. The trade-off is that maintaining and managing that many machines is itself a burden. You cannot have it both ways; the best you can do is weigh the two drawbacks, accept the lesser one, and maximize the benefit.
This experiment uses three virtual machines: master, node1, and node2. The master serves as the NameNode, SecondaryNameNode, and JobTracker; the other two nodes serve as DataNodes and TaskTrackers. The specific setup process is as follows:
1. Configure the hosts file (or use a DNS server)
Modify the /etc/hosts file
Each entry maps an IP address to a hostname:
[root@bogon ~]# vi /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.1.106 node1
192.168.1.107 master
192.168.1.110 node2
[root@bogon ~]# scp /etc/hosts master:/etc/hosts
The authenticity of host 'master (192.168.1.107)' can't be established.
RSA key fingerprint is 42:d9:0b:a6:15:c2:23:c0:2d:d4:bd:88:4b:c5:dd:ff.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'master,192.168.1.107' (RSA) to the list of known hosts.
hosts 100% 252 0.3KB/s 00:00
[root@bogon ~]# scp /etc/hosts node2:/etc/hosts
2. Create a hadoop running account
Configure a dedicated user for running hadoop. Using the superuser root also works (and is what this walkthrough does), though a dedicated account is the safer practice.
3. Configure ssh password-free access
Each node generates a public/private key pair; the public keys of all nodes are then gathered into the authorized_keys file, and that file is distributed back to every node.
Generate the key pair (stored under ~/.ssh in root's home directory):
ssh-keygen -t rsa
The public key file is put into authorized_keys
cd .ssh/
cp id_rsa.pub authorized_keys
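The two commands above handle a single node; for the whole cluster, each node's key must reach every other node. A minimal sketch of one way to script that, assuming root on all three hosts and the stock ssh-copy-id tool; the NODES list and the DRY_RUN guard are illustrative additions, not part of the original setup:

```shell
#!/bin/sh
# Push this node's public key to every host in the cluster.
# DRY_RUN=1 (the default here) only prints the commands, so the
# script can be inspected before it touches any machine.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@"; fi
}
NODES="master node1 node2"

# Generate a passphrase-less key pair if one does not exist yet
[ -f "$HOME/.ssh/id_rsa" ] || run ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"

# ssh-copy-id appends the public key to the remote authorized_keys
for host in $NODES; do
    run ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "root@$host"
done
```

Run it once per node with DRY_RUN=0; after that, ssh between any pair of hosts should no longer prompt for a password.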
4. Install JDK
[root@bogon bin]# vi ~/.bash_profile
JAVA_HOME=/usr/java/jdk1.7.0_67
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
export PATH JAVA_HOME
verify:
[root@bogon bin]# ssh node1
Last login: Tue Dec 8 11:22:14 2015 from 192.168.1.103
[root@node1 ~]# source .bash_profile
[root@node1 ~]# echo $JAVA_HOME
/usr/java/jdk1.7.0_67
[root@node1 ~]# jps
==========================================================
5. Download and unzip the hadoop installation package
1) Unzip the package, then configure the hadoop environment variables: set HADOOP_HOME to the install directory and add $HADOOP_HOME/bin to PATH.
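Concretely, the same ~/.bash_profile used for the JDK in step 4 can carry the hadoop variables as well; a sketch assuming the package was unpacked to /opt/hadoop (the path used later in step 8):

```shell
# Append to ~/.bash_profile, then reload with: source ~/.bash_profile
# /opt/hadoop is an assumption; use wherever the tarball was unpacked.
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
```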
6. Modify the configuration files
For each file, the required modification is listed below its name.
【hadoop-env.sh】
Set JAVA_HOME to the JDK path from step 4, e.g. JAVA_HOME=/usr/java/jdk1.7.0_67
【core-site.xml】
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop_data</value>
</property>
</configuration>
【hdfs-site.xml】
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
【mapred-site.xml】
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:9001</value>
</property>
</configuration>
7. Configure the master and slaves files
The masters file lists the node that runs the SecondaryNameNode; the slaves file lists the DataNode/TaskTracker nodes.
[root@node2 conf]# cat masters
master
[root@node2 conf]# cat slaves
node1
node2
8. Copy hadoop to each node
[root@master ~]# scp .bash_profile node1:~/
[root@master ~]# scp .bash_profile node2:~/
[root@node2 opt]#scp -r hadoop node1:/opt
[root@node2 opt]#scp -r hadoop master:/opt
==========================================================
9. Format the namenode
Run the format only on the master node:
hadoop namenode -format
[root@master ~]# hadoop namenode -format
Warning: $HADOOP_HOME is deprecated.
15/12/08 12:41:19 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = master/192.168.1.107
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.1.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1 -r 1440782; compiled by 'hortonfo' on Thu Jan 31 02:03:24 UTC 2013
************************************************************/
15/12/08 12:41:25 INFO util.GSet: VM type = 64-bit
15/12/08 12:41:25 INFO util.GSet: 2% max memory = 19.33375 MB
15/12/08 12:41:25 INFO util.GSet: capacity = 2^21 = 2097152 entries
15/12/08 12:41:25 INFO util.GSet: recommended=2097152, actual=2097152
15/12/08 12:41:29 INFO namenode.FSNamesystem: fsOwner = root
15/12/08 12:41:29 INFO namenode.FSNamesystem: supergroup=supergroup
15/12/08 12:41:29 INFO namenode.FSNamesystem: isPermissionEnabled=true
15/12/08 12:41:29 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
15/12/08 12:41:29 INFO namenode.FSNamesystem: isAccessTokenEnabled = false accessKeyUpdateInterval = 0 min (s), accessTokenLifetime = 0 min (s)
15/12/08 12:41:29 INFO namenode.NameNode: Caching file names occuring more than 10 times
15/12/08 12:41:33 INFO common.Storage: Image file of size 110 saved in 0 seconds.
15/12/08 12:41:33 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/opt/hadoop_data/dfs/name/current/edits
15/12/08 12:41:33 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/opt/hadoop_data/dfs/name/current/edits
15/12/08 12:41:34 INFO common.Storage: Storage directory /opt/hadoop_data/dfs/name has been successfully formatted.
15/12/08 12:41:34 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.1.107
************************************************************/
10. Start hadoop
Start from the master node; like a locomotive, the master pulls the rest of the cluster along.
start-all.sh
[root@master ~]# start-all.sh
Warning: $HADOOP_HOME is deprecated.
starting namenode, logging to /opt/hadoop/libexec/../logs/hadoop-root-namenode-master.out
node2: starting datanode, logging to /opt/hadoop/libexec/../logs/hadoop-root-datanode-node2.out
node1: starting datanode, logging to /opt/hadoop/libexec/../logs/hadoop-root-datanode-node1.out
The authenticity of host 'master (192.168.1.107)' can't be established.
RSA key fingerprint is 42:d9:0b:a6:15:c2:23:c0:2d:d4:bd:88:4b:c5:dd:ff.
Are you sure you want to continue connecting (yes/no)? yes
master: Warning: Permanently added 'master,192.168.1.107' (RSA) to the list of known hosts.
master: starting secondarynamenode, logging to /opt/hadoop/libexec/../logs/hadoop-root-secondarynamenode-master.out
starting jobtracker, logging to /opt/hadoop/libexec/../logs/hadoop-root-jobtracker-master.out
node2: starting tasktracker, logging to /opt/hadoop/libexec/../logs/hadoop-root-tasktracker-node2.out
node1: starting tasktracker, logging to /opt/hadoop/libexec/../logs/hadoop-root-tasktracker-node1.out
11. Verification process
Use jps to verify that each background process is successfully started
[root@master ~]# jps
3614 NameNode
3763 SecondaryNameNode
3916 Jps
3837 JobTracker
[root@node1 ~]# jps
3513 Jps
[root@node1 ~]# jps
3626 TaskTracker
3555 DataNode
3667 Jps
[root@node2 ~]# jps
3573 DataNode
3627 TaskTracker
3698 Jps
[root@node2 ~]#
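Checking each node by hand works, but the round trip can be scripted. A sketch that polls every node's jps output over ssh, assuming passwordless root ssh from the master (set up in step 3); the remote_jps helper is split out purely so it can be stubbed, and the expected daemon layout matches this cluster:

```shell
#!/bin/sh
# Report, for every node, whether each expected daemon appears in jps.
remote_jps() { ssh "$1" jps; }    # isolated so it can be stubbed in tests

check_node() {    # usage: check_node <host> <daemon> [<daemon>...]
    host=$1; shift
    out=$(remote_jps "$host" 2>/dev/null)
    for d in "$@"; do
        # grep -w avoids "NameNode" matching inside "SecondaryNameNode"
        if echo "$out" | grep -qw "$d"; then
            echo "$host: $d OK"
        else
            echo "$host: $d MISSING"
        fi
    done
}

check_node master NameNode SecondaryNameNode JobTracker
check_node node1  DataNode TaskTracker
check_node node2  DataNode TaskTracker
```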
[root@master bin]# hadoop dfsadmin -report
Warning: $HADOOP_HOME is deprecated.
Configured Capacity: 36889264128 (34.36 GB)
Present Capacity: 28400594944 (26.45 GB)
DFS Remaining: 28400537600 (26.45 GB)
DFS Used: 57344 (56 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)
Name: 192.168.1.106:50010
Decommission Status : Normal
Configured Capacity: 18444632064 (17.18 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 4213334016 (3.92 GB)
DFS Remaining: 14231269376(13.25 GB)
DFS Used%: 0%
DFS Remaining%: 77.16%
Last contact: Tue Dec 08 12:58:40 PST 2015
Name: 192.168.1.110:50010
Decommission Status : Normal
Configured Capacity: 18444632064 (17.18 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 4275335168 (3.98 GB)
DFS Remaining: 14169268224(13.2 GB)
DFS Used%: 0%
DFS Remaining%: 76.82%
Last contact: Tue Dec 08 12:58:39 PST 2015
To stop the cluster:
[root@master bin]# stop-all.sh
Warning: $HADOOP_HOME is deprecated.
no jobtracker to stop
node1: no tasktracker to stop
node2: no tasktracker to stop
stopping namenode
node2: stopping datanode
node1: stopping datanode
master: stopping secondarynamenode