1. Environmental description
There are three machines, all running RHEL 6 Linux. Their IPs are 192.168.1.99, 192.168.1.98, and 192.168.1.97.
Set the hostname of each machine:
192.168.1.99 namenode
192.168.1.98 datanode1
192.168.1.97 datanode2
How to set the hostname:
http://stranger2008.iteye.com/blog/1825953
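For reference, a hedged sketch of what the linked how-to does on RHEL6: the persistent hostname lives in /etc/sysconfig/network. The demo below writes to a scratch file so it can be tried safely; point NETWORK_FILE at the real file (as root) on each machine.

```shell
# Sketch: set the persistent hostname on RHEL6. A scratch file stands in
# for /etc/sysconfig/network so this is safe to experiment with.
NETWORK_FILE=/tmp/network.demo
printf 'NETWORKING=yes\nHOSTNAME=namenode\n' > "$NETWORK_FILE"
# hostname namenode    # would apply the name immediately (requires root)
grep '^HOSTNAME=' "$NETWORK_FILE"
```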
Java is already installed on each machine under /usr/local/java; the installation method:
http://stranger2008.iteye.com/blog/1820548
Add the following lines to /etc/hosts on each machine:
192.168.1.99 namenode
192.168.1.98 datanode1
192.168.1.97 datanode2
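The entries can be appended safely even on a machine where some of them already exist; a small sketch (a scratch file stands in for /etc/hosts here):

```shell
# Append each cluster entry to the hosts file only if it is not already
# there, so re-running the script is harmless. On the real machines set
# HOSTS_FILE=/etc/hosts and run as root.
HOSTS_FILE=/tmp/hosts.demo
: > "$HOSTS_FILE"
for entry in "192.168.1.99 namenode" "192.168.1.98 datanode1" "192.168.1.97 datanode2"; do
  grep -qxF "$entry" "$HOSTS_FILE" || echo "$entry" >> "$HOSTS_FILE"
done
```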
2. Set up SSH passwordless login
Running a Hadoop cluster requires passwordless SSH login between all machines. Here I log in directly as root.
Enter root's home directory and generate a key pair:
#cd ~
#ssh-keygen -t rsa
Press Enter at every prompt. A .ssh folder will be created in root's home directory containing two files: id_rsa and id_rsa.pub.
Repeat these steps on each machine.
Once all three servers have their id_rsa and id_rsa.pub files, create a file named authorized_keys locally, download id_rsa.pub from each of the three servers, append their contents to authorized_keys, and then upload authorized_keys into the /root/.ssh/ directory of each server.
Then test the connections; each machine should also be able to connect to itself. The first time you connect you need to enter a password, but after that you do not.
ssh namenode
ssh datanode1
ssh datanode2
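The merge step above can be sketched as follows. The key contents and file names are placeholders (on the real machines each id_rsa.pub would be fetched with scp); the point is the cat/chmod pattern:

```shell
# Merge the public keys collected from the three machines into a single
# authorized_keys file. The keys below are fake placeholders standing in
# for the real id_rsa.pub contents downloaded from each server.
WORKDIR=/tmp/sshkeys.demo
mkdir -p "$WORKDIR"
printf 'ssh-rsa AAAA...1 root@namenode\n'  > "$WORKDIR/id_rsa.pub.namenode"
printf 'ssh-rsa AAAA...2 root@datanode1\n' > "$WORKDIR/id_rsa.pub.datanode1"
printf 'ssh-rsa AAAA...3 root@datanode2\n' > "$WORKDIR/id_rsa.pub.datanode2"
cat "$WORKDIR"/id_rsa.pub.* > "$WORKDIR/authorized_keys"
chmod 600 "$WORKDIR/authorized_keys"   # sshd ignores key files with loose permissions
```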
3. Install hadoop
download link:
http://labs.xiaonei.com/apache-mirror/hadoop/core/hadoop-0.20.1/hadoop-0.20.2.tar.gz
a. Create the installation directory
mkdir /usr/local/hadoop/
b. Extract the installation file hadoop-0.20.2.tar.gz (the archive from the link above) into the installation directory
tar -zxvf hadoop-0.20.2.tar.gz
c. Set environment variables
Add the following to /etc/profile
#config hadoop
export HADOOP_HOME=/usr/local/hadoop/
export PATH=$HADOOP_HOME/bin:$PATH
#hadoop logs file path
export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
Make the settings take effect: source /etc/profile
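A quick sanity check after sourcing the profile (a sketch; it simply re-exports the same values and prints them, so you can confirm they resolve as intended):

```shell
# Mirror the /etc/profile settings and print the resulting values.
export HADOOP_HOME=/usr/local/hadoop/
export PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
echo "HADOOP_HOME=$HADOOP_HOME"
echo "HADOOP_LOG_DIR=$HADOOP_LOG_DIR"
```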
d. Set the master/slave configuration
The configuration of /usr/local/hadoop/conf/masters is as follows:
namenode
The configuration of /usr/local/hadoop/conf/slaves is as follows:
datanode1
datanode2
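The two files can be written in one short script (a sketch; CONF points at a scratch directory here, but on the machines it would be /usr/local/hadoop/conf):

```shell
# Write the masters and slaves files for this three-node layout.
CONF=/tmp/hadoop-conf.demo
mkdir -p "$CONF"
echo "namenode" > "$CONF/masters"
printf 'datanode1\ndatanode2\n' > "$CONF/slaves"
```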
e. Modify the configuration files
/usr/local/hadoop/conf/hadoop-env.sh
Set JAVA_HOME to the path where jdk is installed
# The java implementation to use. Required.
export JAVA_HOME=/usr/local/java/
core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode:9000/</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp/</value>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- number of block replicas -->
    <value>1</value>
  </property>
</configuration>
mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>namenode:9001</value>
  </property>
</configuration>
f. Initialize Hadoop
#cd /usr/local/hadoop/
#./bin/hadoop namenode -format
Steps a through f above are exactly the same on all three machines.
4. Start hadoop on the namenode machine
#cd /usr/local/hadoop/
#./bin/start-all.sh
After startup, run the jps command; the output should look like this:
[root@namenode hadoop]# jps
1806 Jps
1368 NameNode
1694 JobTracker
1587 SecondaryNameNode
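This check can be scripted; here is a small hedged helper (the jps output is stubbed with the sample above, so it runs anywhere; on the cluster, pass "$(jps)" instead):

```shell
# Hypothetical helper: succeed only if every named daemon appears in the
# given jps output. The sample output below is the one shown above.
check_daemons() {
  out=$1; shift
  for d in "$@"; do
    case "$out" in *"$d"*) ;; *) echo "missing: $d"; return 1 ;; esac
  done
  echo "all daemons running"
}
sample='1806 Jps
1368 NameNode
1694 JobTracker
1587 SecondaryNameNode'
check_daemons "$sample" NameNode JobTracker SecondaryNameNode
```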
Then check on the datanode machines by running jps; the output looks like this:
[root@datanode2 hadoop]# jps
1440 Jps
1382 TaskTracker
1303 DataNode
This indicates that Hadoop has been installed successfully across the cluster.
5. Check the status
View cluster status: $ hadoop dfsadmin -report
Hadoop web UI (HDFS status): http://192.168.1.99:50070
View running jobs and results: http://192.168.1.99:50030