[Eight] Building a Hadoop distributed environment

 Overview: The core idea of distributed computing is strength in numbers: many hands make light work, and the fire burns higher when everyone adds wood to it. When many computers are pooled to process a task, their combined storage and computing capacity grows and parallel computing becomes possible. On the other hand, maintaining and managing many machines is a problem in itself. As the saying goes, you cannot have both the fish and the bear's paw; of two evils, weigh them and choose the lesser to maximize the benefit.

This experiment uses three virtual machines: master, node1, and node2. The master serves as the NameNode, SecondaryNameNode, and JobTracker, while the other two nodes serve as DataNodes and TaskTrackers. The specific setup process is as follows:

1. Configure the hosts file (or use a DNS server)
      Modify the /etc/hosts file
      IP address hostname
[root@bogon ~]# vi /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
 
192.168.1.106  node1
192.168.1.107  master
192.168.1.110 node2
[root@bogon ~]# scp /etc/hosts  master:/etc/hosts
The authenticity of host 'master (192.168.1.107)' can't be established.
RSA key fingerprint is 42:d9:0b:a6:15:c2:23:c0:2d:d4:bd:88:4b:c5:dd:ff.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'master,192.168.1.107' (RSA) to the list of known hosts.
hosts                                                                                                                                                        100%  252     0.3KB/s   00:00    
[root@bogon ~]# scp /etc/hosts  node2:/etc/hosts
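Before moving on, it is worth confirming that each hostname resolves on every node, e.g.:
ping -c 1 master
ping -c 1 node1
ping -c 1 node2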
2. Create a hadoop running account
      Configure a dedicated user for running hadoop; using the root superuser (as in this walkthrough) also works, though a dedicated account is the cleaner choice.
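For example, a dedicated account could be created on every node like this (the user name hadoop is just an illustration; this walkthrough continues as root):
useradd hadoop
passwd hadoop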
3. Configure ssh password-free access
      Each node generates a public/private key pair, and the public key is appended to authorized_keys
      Public key distribution: copy each node's public key into the authorized_keys file on every node
      Generate the key pair in the user's home directory:
ssh-keygen -t rsa
Append the public key to authorized_keys:
cd .ssh/
cp id_rsa.pub authorized_keys
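The single cp above only authorizes the local key. One way to gather every node's public key into a shared authorized_keys, run from the master once the key pairs exist on node1 and node2 (each ssh/scp prompts for a password until the keys take effect):
ssh node1 'cat ~/.ssh/id_rsa.pub' >> ~/.ssh/authorized_keys
ssh node2 'cat ~/.ssh/id_rsa.pub' >> ~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys node1:~/.ssh/
scp ~/.ssh/authorized_keys node2:~/.ssh/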
4. Install JDK
[root@bogon bin]# vi ~/.bash_profile 
JAVA_HOME=/usr/java/jdk1.7.0_67
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
export PATH  JAVA_HOME
Verify:
 [root@bogon bin]# ssh node1
Last login: Tue Dec  8 11:22:14 2015 from 192.168.1.103
[root@node1 ~]# source .bash_profile 
[root@node1 ~]# echo $JAVA_HOME
/usr/java/jdk1.7.0_67
[root@node1 ~]# jps
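Running java -version on each node is another quick check that the JDK is on the PATH (the exact version string depends on the installed build):
java -version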
 
==========================================================
5. Download and unzip the hadoop installation package
1) Unzip the package, then set HADOOP_HOME and add $HADOOP_HOME/bin to the PATH.
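A minimal ~/.bash_profile addition, assuming hadoop was unpacked to /opt/hadoop (the path used later in this walkthrough):
HADOOP_HOME=/opt/hadoop
PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_HOME PATH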
 
6. Configuration file modification
【hadoop-env.sh】
JAVA_HOME
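In conf/hadoop-env.sh, uncomment and set the JAVA_HOME line to match the JDK path from step 4:
export JAVA_HOME=/usr/java/jdk1.7.0_67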
【core-site.xml】
<configuration>
<property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
</property>
<property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop_data</value>
</property>
</configuration>
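Since hadoop.tmp.dir points at /opt/hadoop_data, it is worth making sure that directory exists on every node before formatting; a sketch, assuming the same path on all three machines:
mkdir -p /opt/hadoop_data
ssh node1 mkdir -p /opt/hadoop_data
ssh node2 mkdir -p /opt/hadoop_data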
 
【hdfs-site.xml】
<configuration>
<property>
        <name>dfs.replication</name>
        <value>2</value>
</property>
<property>
        <name>dfs.permissions</name>
        <value>false</value>
</property>
</configuration>
 
【mapred-site.xml】
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
        <name>mapred.job.tracker</name>
        <value>master:9001</value>
</property>
</configuration>
 
7. Configure the master and slaves files
masters configures the master-side daemons (in Hadoop 1.x this file actually lists the host(s) where the SecondaryNameNode runs)
slaves configures the slave nodes (the DataNode/TaskTracker hosts)
[root@node2 conf]# cat masters 
master
[root@node2 conf]# cat slaves 
node1
node2
 
8. Copy hadoop to each node
[root@master ~]# scp .bash_profile  node1:~/
[root@master ~]# scp .bash_profile  node2:~/
 
[root@node2 opt]#scp -r hadoop node1:/opt
[root@node2 opt]#scp -r hadoop master:/opt
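If the configuration files are changed again later, only the conf directory needs to be re-synced; for example (assuming hadoop lives under /opt/hadoop on every node):
scp -r /opt/hadoop/conf node1:/opt/hadoop/
scp -r /opt/hadoop/conf node2:/opt/hadoop/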
==========================================================
 
9. Format the namenode
Format only on the master node:
hadoop namenode -format
 
[root@master ~]# hadoop namenode -format
Warning: $HADOOP_HOME is deprecated.
15/12/08 12:41:19 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = master/192.168.1.107
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.1.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1 -r 1440782; compiled by 'hortonfo' on Thu Jan 31 02:03:24 UTC 2013
************************************************************/
15/12/08 12:41:25 INFO util.GSet: VM type       = 64-bit
15/12/08 12:41:25 INFO util.GSet: 2% max memory = 19.33375 MB
15/12/08 12:41:25 INFO util.GSet: capacity      = 2^21 = 2097152 entries
15/12/08 12:41:25 INFO util.GSet: recommended=2097152, actual=2097152
15/12/08 12:41:29 INFO namenode.FSNamesystem: fsOwner = root
15/12/08 12:41:29 INFO namenode.FSNamesystem: supergroup=supergroup
15/12/08 12:41:29 INFO namenode.FSNamesystem: isPermissionEnabled=true
15/12/08 12:41:29 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
15/12/08 12:41:29 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
15/12/08 12:41:29 INFO namenode.NameNode: Caching file names occuring more than 10 times 
15/12/08 12:41:33 INFO common.Storage: Image file of size 110 saved in 0 seconds.
15/12/08 12:41:33 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/opt/hadoop_data/dfs/name/current/edits
15/12/08 12:41:33 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/opt/hadoop_data/dfs/name/current/edits
15/12/08 12:41:34 INFO common.Storage: Storage directory /opt/hadoop_data/dfs/name has been successfully formatted.
15/12/08 12:41:34 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.1.107
************************************************************/
 
10. Start hadoop
Start from the master node; the master is the locomotive that pulls the rest of the cluster along:
start-all.sh
[root@master ~]# start-all.sh
Warning: $HADOOP_HOME is deprecated.
starting namenode, logging to /opt/hadoop/libexec/../logs/hadoop-root-namenode-master.out
node2: starting datanode, logging to /opt/hadoop/libexec/../logs/hadoop-root-datanode-node2.out
node1: starting datanode, logging to /opt/hadoop/libexec/../logs/hadoop-root-datanode-node1.out
The authenticity of host 'master (192.168.1.107)' can't be established.
RSA key fingerprint is 42:d9:0b:a6:15:c2:23:c0:2d:d4:bd:88:4b:c5:dd:ff.
Are you sure you want to continue connecting (yes/no)? yes
master: Warning: Permanently added 'master,192.168.1.107' (RSA) to the list of known hosts.
master: starting secondarynamenode, logging to /opt/hadoop/libexec/../logs/hadoop-root-secondarynamenode-master.out
starting jobtracker, logging to /opt/hadoop/libexec/../logs/hadoop-root-jobtracker-master.out
node2: starting tasktracker, logging to /opt/hadoop/libexec/../logs/hadoop-root-tasktracker-node2.out
node1: starting tasktracker, logging to /opt/hadoop/libexec/../logs/hadoop-root-tasktracker-node1.out
 
11. Verification process
  Use jps to verify that each background process is successfully started
[root@master ~]# jps
3614 NameNode
3763 SecondaryNameNode
3916 Jps
3837 JobTracker
 
[root@node1 ~]# jps
3513 Jps
[root@node1 ~]# jps
3626 TaskTracker
3555 DataNode
3667 Jps
 
[root@node2 ~]# jps
3573 DataNode
3627 TaskTracker
3698 Jps
[root@node2 ~]# 
 
[root@master bin]# hadoop dfsadmin -report
Warning: $HADOOP_HOME is deprecated.
Configured Capacity: 36889264128 (34.36 GB)
Present Capacity: 28400594944 (26.45 GB)
DFS Remaining: 28400537600 (26.45 GB)
DFS Used: 57344 (56 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)
Name: 192.168.1.106:50010
Decommission Status : Normal
Configured Capacity: 18444632064 (17.18 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 4213334016 (3.92 GB)
DFS Remaining: 14231269376(13.25 GB)
DFS Used%: 0%
DFS Remaining%: 77.16%
Last contact: Tue Dec 08 12:58:40 PST 2015
 
Name: 192.168.1.110:50010
Decommission Status : Normal
Configured Capacity: 18444632064 (17.18 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 4275335168 (3.98 GB)
DFS Remaining: 14169268224(13.2 GB)
DFS Used%: 0%
DFS Remaining%: 76.82%
Last contact: Tue Dec 08 12:58:39 PST 2015
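As an extra smoke test, a small HDFS round-trip can be run from the master (the /test path here is just an example); the NameNode web UI at http://master:50070 and the JobTracker web UI at http://master:50030 (the Hadoop 1.x defaults) should likewise show both slave nodes:
hadoop fs -mkdir /test
hadoop fs -put /etc/hosts /test
hadoop fs -ls /test
hadoop fs -cat /test/hosts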
 
To stop the cluster:
[root@master bin]# stop-all.sh
Warning: $HADOOP_HOME is deprecated.
no jobtracker to stop
node1: no tasktracker to stop
node2: no tasktracker to stop
stopping namenode
node2: stopping datanode
node1: stopping datanode
master: stopping secondarynamenode

 
