Linux Hadoop installation and configuration

1. Environment description

There are three machines, all running Linux (RHEL 6), with the IPs 192.168.1.99, 192.168.1.98, and 192.168.1.97.

Set the hostname of each machine:

192.168.1.99 namenode

192.168.1.98 datanode1

192.168.1.97 datanode2

How to set the hostname:

http://stranger2008.iteye.com/blog/1825953

Java is already installed on each machine in /usr/local/java. Installation method:

http://stranger2008.iteye.com/blog/1820548

Add the following entries to /etc/hosts on each machine:

192.168.1.99 namenode
192.168.1.98 datanode1
192.168.1.97 datanode2
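
A quick sanity check that the entries took effect (run on any of the machines):

# each name should resolve to the IP listed above
ping -c 1 namenode
ping -c 1 datanode1
ping -c 1 datanode2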

 

2. Set up SSH passwordless login

Running a Hadoop cluster requires passwordless SSH between the machines. Here I log in directly as root.

Go to root's home directory and generate a key pair:

 

#cd ~
#ssh-keygen -t rsa

Run the commands above and press Enter at each prompt. A .ssh folder will be created in root's home directory, containing the two files id_rsa and id_rsa.pub.

Repeat the above steps on each machine.

 

After id_rsa.pub and id_rsa have been generated, create a file named authorized_keys locally, download id_rsa.pub from each of the three servers, append their contents to authorized_keys, and then upload authorized_keys to /root/.ssh/ on every server.
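
A minimal sketch of one way to do this, run from the namenode (it will prompt for each datanode's root password, since passwordless login is not working yet):

cd /root/.ssh
# start authorized_keys with the local public key
cat id_rsa.pub > authorized_keys
# append the public keys of the two datanodes
ssh root@datanode1 cat /root/.ssh/id_rsa.pub >> authorized_keys
ssh root@datanode2 cat /root/.ssh/id_rsa.pub >> authorized_keys
chmod 600 authorized_keys
# push the combined file back out to the datanodes
scp authorized_keys root@datanode1:/root/.ssh/
scp authorized_keys root@datanode2:/root/.ssh/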

 

Then test the connections; each machine should also be able to connect to itself. On the first connection you are asked to confirm the host key; after that the login is fully passwordless.

 

ssh namenode
ssh datanode1
ssh datanode2

 

3. Install Hadoop

Download link:

http://labs.xiaonei.com/apache-mirror/hadoop/core/hadoop-0.20.1/hadoop-0.20.2.tar.gz

a. Create the installation directory

mkdir /usr/local/hadoop/

b. Unpack the downloaded hadoop-0.20.2.tar.gz into the installation directory

tar -zxvf hadoop-0.20.2.tar.gz
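
A fuller sketch of steps a and b together, so that the unpacked files end up directly under /usr/local/hadoop/ (the archive name is taken from the download link above; adjust it if you use a different version):

mkdir -p /usr/local/hadoop/
cd /usr/local/hadoop/
wget http://labs.xiaonei.com/apache-mirror/hadoop/core/hadoop-0.20.1/hadoop-0.20.2.tar.gz
tar -zxvf hadoop-0.20.2.tar.gz
# the archive unpacks into a hadoop-0.20.2/ subdirectory; move its contents
# up so that HADOOP_HOME=/usr/local/hadoop/ points at the real files
mv hadoop-0.20.2/* .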

c. Set environment variables

Add the following to /etc/profile

#config hadoop
export HADOOP_HOME=/usr/local/hadoop/
export PATH=$HADOOP_HOME/bin:$PATH
#hadoop logs file path
export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

Make the settings take effect: source /etc/profile
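
A quick check that the variables are visible in the current shell:

echo $HADOOP_HOME      # should print /usr/local/hadoop/
hadoop version         # should print the Hadoop release that was unpacked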

d. Set the master/slave configuration

The configuration of /usr/local/hadoop/conf/masters is as follows:

namenode

The configuration of /usr/local/hadoop/conf/slaves is as follows:

datanode1
datanode2

e. Modify the configuration files

/usr/local/hadoop/conf/hadoop-env.sh

Set JAVA_HOME to the path where the JDK is installed:

# The java implementation to use.  Required.
export JAVA_HOME=/usr/local/java/

  

core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode:9000/</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp/</value>
  </property>
</configuration>
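
hadoop.tmp.dir points at a directory that does not exist yet. Hadoop normally creates it on first use, but creating it up front on every node avoids surprises:

mkdir -p /usr/local/hadoop/tmp/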

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- number of block replicas -->
    <value>1</value>
  </property>
</configuration>

 

mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>namenode:9001</value>
  </property>
</configuration>

f. Initialize Hadoop

#cd /usr/local/hadoop/
#./bin/hadoop namenode -format

Steps a through f above are performed identically on all three machines.
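
Instead of repeating steps a through f by hand on the datanodes, one common shortcut (a sketch, assuming the passwordless SSH from section 2 already works) is to copy the configured installation from the namenode:

scp -r /usr/local/hadoop/ root@datanode1:/usr/local/
scp -r /usr/local/hadoop/ root@datanode2:/usr/local/
# /etc/profile still has to be edited and sourced on each datanode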

 

4. Start Hadoop on the namenode machine

#cd /usr/local/hadoop/
#./bin/start-all.sh

 

After startup, use the jps command to check the running processes:

 

[root@namenode hadoop]# jps
1806 Jps
1368 NameNode
1694 JobTracker
1587 SecondaryNameNode

Then check datanode1 and datanode2 by running jps there as well:

[root@datanode1 hadoop]# jps
1440 Jps
1382 TaskTracker
1303 DataNode

[root@datanode2 hadoop]# jps
1382 TaskTracker
1303 DataNode
1452 Jps

This indicates that Hadoop has been installed successfully across the cluster.

 

5. Check the status

 

View the cluster status from the command line: $ hadoop dfsadmin -report

NameNode web UI: http://192.168.1.99:50070

JobTracker web UI (running jobs and results): http://192.168.1.99:50030
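
A small smoke test (the file names here are just examples): put a file into HDFS from the namenode and read it back.

echo "hello hadoop" > /tmp/hello.txt
hadoop fs -put /tmp/hello.txt /hello.txt
hadoop fs -ls /
hadoop fs -cat /hello.txt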
