Hadoop-0.20.2 installation and configuration

Abstract: This article describes the basic steps for installing three Ubuntu virtual machines under VirtualBox and setting up a Hadoop environment on them. Finally, it runs the wordcount program from the Hadoop examples.

1 Experimental environment

VirtualBox version: 4.3.2 r90405

Ubuntu virtual machine version: Ubuntu 11.04

Ubuntu virtual machine JDK version: jdk-1.6.0_45

Ubuntu virtual machine Hadoop version: hadoop-0.20.2

2 General overview

To run Hadoop multi-node distributed computing on a single computer, several hosts have to be created as virtual machines. This article uses VirtualBox to build a multi-node platform. The procedure is: create the virtual machines, install ssh and configure keys for passwordless access, install the JDK, install and configure Hadoop, and finally run the wordcount program that ships with Hadoop to verify the configuration.

3 Detailed steps

3.1 Virtual machine installation

Since several virtual machines have to run at the same time, and to keep the system load down, this experiment uses an early Ubuntu release, 10.04. After downloading the system image, open VirtualBox and create a new virtual machine; a few simple settings are enough. Start the virtual machine, select the image file, and step through the installer options to finish installing the system. Build the other two machines the same way. The three virtual machines are named UB01, UB02 and UB03; on each one the user name is vbox and the login password is also vbox. Once they are built, use ifconfig to check the IP addresses of the three virtual machines. Their IPs differ and they can ping each other, as shown in the figure below. This completes the virtual machine installation.

Once ping works, configure host aliases for the three machines so that they can reach each other by name instead of by IP. Open /etc/hosts with root privileges and add the following lines:

223.3.77.207 UB01

223.3.73.102 UB02

223.3.85.84 UB03

The lines above are for UB01; UB02 and UB03 get the same entries. (Each IP address here must be the one ifconfig reports on your own machines; it varies from setup to setup.)
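Editing the file by hand is enough, but when the setup is repeated on three machines a small shell function can append the entries instead. This is a minimal sketch, not part of the original procedure: the function name add_cluster_hosts and the file name add_cluster_hosts.sh are hypothetical, and the addresses are the example IPs above. It compares whole lines, so re-running it does not duplicate entries, and Ubuntu's existing "127.0.1.1 <hostname>" line does not stop the real address from being added.

```shell
#!/bin/sh
# Hypothetical helper: append the cluster aliases to a hosts file.
# An entry is skipped only if the exact same line is already present,
# so the function is safe to run more than once.
# The addresses are this article's examples; use the ones ifconfig shows you.
add_cluster_hosts() {
  local file="$1" entry
  for entry in "223.3.77.207 UB01" "223.3.73.102 UB02" "223.3.85.84 UB03"; do
    # -x: whole-line match, -F: literal string (dots are not regex wildcards)
    if ! grep -qxF "$entry" "$file" 2>/dev/null; then
      echo "$entry" >> "$file"
    fi
  done
}

# Usage (needs root; run on each of UB01, UB02 and UB03):
# sudo sh -c '. ./add_cluster_hosts.sh && add_cluster_hosts /etc/hosts'
```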

After the aliases are set, each virtual machine can be pinged by name. The result is as follows:


3.2 ssh installation and configuration

Being able to ping is not enough. For the distributed computing system to work, the three machines must be able to log in to each other without passwords (or at least the master must be able to log in to the slaves without a password). First install the OpenSSH server (the ssh package) and rsync on all three virtual machines:

sudo apt-get install ssh rsync

Then create a .ssh folder in the home directory /home/vbox/ and run the following inside .ssh:

ssh-keygen -t rsa

ssh-keygen prompts for a few settings (key file location and passphrase). None of them are needed for this experiment, so just press Enter at each prompt; leaving the passphrase empty is what allows passwordless login later. When it finishes, two files, id_rsa and id_rsa.pub, are generated under .ssh/. Repeat this on all three machines.
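The same key pair can be generated without any prompts, which helps when the step is repeated on three machines. A minimal sketch, assuming OpenSSH's ssh-keygen; make_ssh_key is a hypothetical name. The option -N "" gives the empty passphrase that pressing Enter produces, and the existence check avoids ssh-keygen's "Overwrite?" prompt if a key is already there.

```shell
#!/bin/sh
# Hypothetical helper: create the .ssh directory with owner-only
# permissions and generate an RSA key pair non-interactively.
make_ssh_key() {
  local dir="${1:-$HOME/.ssh}"
  mkdir -p "$dir"
  chmod 700 "$dir"
  # Keep an existing key; otherwise ssh-keygen would stop and ask.
  if [ ! -f "$dir/id_rsa" ]; then
    # -q: quiet, -t rsa: key type, -N "": empty passphrase, -f: output file
    ssh-keygen -q -t rsa -N "" -f "$dir/id_rsa"
  fi
}

# Usage, as the vbox user on each of UB01, UB02 and UB03:
# make_ssh_key
```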

After the keys are generated, the three virtual machines need to exchange their public keys. For example, on UB01 run:

scp ~/.ssh/id_rsa.pub UB02:/home/vbox/.ssh/id_rsa.pub.UB01

scp ~/.ssh/id_rsa.pub UB03:/home/vbox/.ssh/id_rsa.pub.UB01

These two commands copy the local id_rsa.pub (UB01's public key) to the same directory on UB02 and UB03, renaming it id_rsa.pub.UB01. Since passwordless access is not set up yet, scp asks for the vbox password here.

Do the same on UB02 and UB03. Afterwards, ~/.ssh/ on each machine should hold three public keys: its own and one from each of the other two machines. Append all three to the authorized_keys file (on UB01):

cat id_rsa.pub >> authorized_keys;

cat id_rsa.pub.UB02 >> authorized_keys;

cat id_rsa.pub.UB03 >> authorized_keys;

These three commands append the three keys to the same authorized_keys file.

Do the same on UB02 and UB03. Each machine can then log in to the other two without a password.
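The three cat commands can be wrapped in a function that works on any machine, whichever peer keys it received. This is a minimal sketch with a hypothetical name, merge_authorized_keys; it assumes the scp exchange above has already been done and that the received keys follow the id_rsa.pub.<host> naming. Keys that are already present are skipped, and the permissions are tightened because sshd's StrictModes check ignores an authorized_keys file that other users can modify.

```shell
#!/bin/sh
# Hypothetical helper: append the local public key and every received copy
# (id_rsa.pub.UB01, id_rsa.pub.UB02, ...) to authorized_keys, skipping keys
# that are already authorized so the function can be re-run safely.
merge_authorized_keys() {
  local dir="${1:-$HOME/.ssh}" key
  touch "$dir/authorized_keys"
  for key in "$dir"/id_rsa.pub "$dir"/id_rsa.pub.*; do
    [ -f "$key" ] || continue        # the glob matched no file
    # -f: use the key file as the pattern; -x -F: literal whole-line match
    if ! grep -qxF -f "$key" "$dir/authorized_keys"; then
      cat "$key" >> "$dir/authorized_keys"
    fi
  done
  # Owner-only permissions so sshd (StrictModes) accepts the file
  chmod 700 "$dir"
  chmod 600 "$dir/authorized_keys"
}

# Usage on each machine, once the two peer keys have arrived:
# merge_authorized_keys
```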

Next, check that passwordless ssh access works. On UB01, enter:

ssh UB02;

If access succeeds, a welcome message is displayed. On the first connection ssh asks you to confirm the host key (type yes); after that the login goes straight through. The screenshot of the experiment is as follows:
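Checking each peer by hand works; the loop below does the same check non-interactively. It is a minimal sketch with a hypothetical function name, check_passwordless. BatchMode=yes makes ssh fail instead of asking for a password, so a missing key shows up as FAIL. Because BatchMode also refuses unknown host keys, do the first interactive ssh login (answering yes) to each host once before using it.

```shell
#!/bin/sh
# Hypothetical helper: report whether this machine can run a command on
# each listed host over ssh without being asked for a password.
# Returns non-zero if any host fails.
check_passwordless() {
  local failed=0 host
  for host in "$@"; do
    # BatchMode=yes: never prompt; ConnectTimeout: give up on dead hosts
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
         </dev/null >/dev/null 2>&1; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      failed=1
    fi
  done
  return "$failed"
}

# Usage on UB01:
# check_passwordless UB02 UB03
```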

3.3 JDK installation and configuration

The JDK must be present on all three virtual machines, but it does not have to be installed separately on each one: install and configure it on one machine, then copy the jdk folder to the other two (for example with scp -r).

The JDK package used here is jdk-6u45-linux-i586.bin. After downloading it, move the .bin file to the home directory /home/vbox/ and run:

chmod u+x jdk-6u45-linux-i586.bin;

sudo -s ./jdk-6u45-linux-i586.bin;

Wait for the installation to finish; a jdk1.6.0_45 directory is generated in the current path. Next, set the environment variables by adding JAVA_HOME, CLASSPATH and PATH to /etc/environment. After a reboot, entering java -version in a terminal prints the version information, which confirms that the installation succeeded. The installation result is shown below:
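The original screenshot of /etc/environment is not available, so the lines below are only a sketch. They assume the JDK sits in /home/vbox/jdk1.6.0_45 (the path used for hadoop-env.sh later) and that the file still holds Ubuntu's default PATH; the CLASSPATH value is an assumption as well. /etc/environment is read by pam_env, which does not expand variables, so PATH spells out the full JDK path instead of referring to $JAVA_HOME.

```shell
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/home/vbox/jdk1.6.0_45/bin"
JAVA_HOME="/home/vbox/jdk1.6.0_45"
CLASSPATH=".:/home/vbox/jdk1.6.0_45/lib"
```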



3.4 Hadoop installation and configuration

Hadoop is also needed on all three machines. As with the JDK, you can set it up on one machine and then copy it to the others.

Move the downloaded hadoop-0.20.2.tar.gz to the home directory /home/vbox/ and extract it:

tar -xzvf hadoop-0.20.2.tar.gz    # decompress the archive

This creates a hadoop-0.20.2 folder in the current path. Then change the owner of the folder and everything in it:

sudo chown -R vbox:vbox hadoop-0.20.2

Then add the Hadoop environment variables to /etc/environment. After the change, the file looks as shown below:
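Since the screenshot is missing, here is a sketch of the Hadoop additions, assuming Hadoop stays in the extracted /home/vbox/hadoop-0.20.2 directory and the JDK entry from section 3.3 is already on PATH. HADOOP_HOME is added, and the Hadoop bin directory is appended to the existing PATH line (again with full paths, because /etc/environment does not expand variables).

```shell
HADOOP_HOME="/home/vbox/hadoop-0.20.2"
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/home/vbox/jdk1.6.0_45/bin:/home/vbox/hadoop-0.20.2/bin"
```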

Next, modify the configuration files in the hadoop/conf/ directory. Six files need to be changed: masters, slaves, core-site.xml, mapred-site.xml, hdfs-site.xml and hadoop-env.sh. The modifications are shown below:


In hadoop-env.sh, uncomment the JAVA_HOME line and set it to: export JAVA_HOME=/home/vbox/jdk1.6.0_45. This configuration is identical on UB01, UB02 and UB03.
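Because the configuration screenshots are missing, the function below writes one plausible minimal version of the six files. Treat it as a sketch: the name write_hadoop_conf is hypothetical, and UB01 as NameNode/JobTracker, ports 9000/9001 and a replication factor of 2 are assumptions, not values read from the original figures. The property names fs.default.name, dfs.replication and mapred.job.tracker are the ones Hadoop 0.20.x uses. In this version conf/masters names the SecondaryNameNode host, while conf/slaves lists the DataNode/TaskTracker hosts. dfs.replication defaults to 3, which two DataNodes cannot satisfy, hence the value 2.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal 3-node configuration into a
# Hadoop 0.20.2 conf directory. ASSUMPTIONS: UB01 is the master,
# UB02/UB03 are workers, ports 9000/9001, two replicas per block.
write_hadoop_conf() {
  local conf="$1"
  mkdir -p "$conf"

  echo "UB01" > "$conf/masters"              # SecondaryNameNode host
  printf '%s\n' UB02 UB03 > "$conf/slaves"   # DataNode/TaskTracker hosts

  # Where clients and DataNodes find the NameNode
  cat > "$conf/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://UB01:9000</value>
  </property>
</configuration>
EOF

  # Only two DataNodes, so at most two replicas can be stored
  cat > "$conf/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
EOF

  # Where TaskTrackers and job clients find the JobTracker
  cat > "$conf/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>UB01:9001</value>
  </property>
</configuration>
EOF

  # The Hadoop scripts source hadoop-env.sh before starting each daemon
  if ! grep -q '^export JAVA_HOME=' "$conf/hadoop-env.sh" 2>/dev/null; then
    echo 'export JAVA_HOME=/home/vbox/jdk1.6.0_45' >> "$conf/hadoop-env.sh"
  fi
}

# Usage on UB01, then copy the conf directory to UB02 and UB03:
# write_hadoop_conf /home/vbox/hadoop-0.20.2/conf
```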

3.5 wordcount program test

The test input consists of text files, each 128 MB in size. For the test, change to the /home/vbox/hadoop/ directory, format the file system and start all services:

hadoop namenode -format

start-all.sh

After the services start, the Hadoop status can be checked with the jps command and through the web pages (by default the NameNode page on port 50070 and the JobTracker page on port 50030 of the master), as shown below:
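Reading the jps listing by eye is easy to get wrong when some daemons fail to start, so the helper below compares it with a list of expected daemons. It is a minimal sketch with a hypothetical name, check_daemons; the expected lists in the usage lines assume UB01 is the master and is also named in conf/masters (so it runs the SecondaryNameNode), with UB02 and UB03 as workers.

```shell
#!/bin/sh
# Hypothetical helper: read `jps` output ("<pid> <MainClass>" per line)
# on stdin and print every expected daemon that is not running.
# Returns non-zero if anything is missing.
check_daemons() {
  local running missing=0 d
  running=$(awk '{ print $2 }')   # keep only the class names
  for d in "$@"; do
    if ! printf '%s\n' "$running" | grep -qx "$d"; then
      echo "missing: $d"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# jps | check_daemons NameNode SecondaryNameNode JobTracker   # on UB01
# jps | check_daemons DataNode TaskTracker                    # on UB02, UB03
```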

Create an input directory in HDFS and upload the text file into it:

hadoop fs -mkdir input;

hadoop fs -put file input    # file is the path of the local text file; it is uploaded into input on HDFS

Run wordcount and view the counting results:

hadoop jar hadoop-0.20.2-examples.jar wordcount input output

hadoop fs -cat 'output/part-*'
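One way to sanity-check the job's result is to count the same file locally with standard tools and compare. This is a sketch with a hypothetical function name, local_wordcount, not something the article does. It splits on spaces, tabs, carriage returns and form feeds, like the StringTokenizer in the example's mapper, and prints "word<TAB>count" in byte order, the layout WordCount writes. For plain ASCII text the two outputs should match, but treat differences on unusual input (for example control characters) as expected.

```shell
#!/bin/sh
# Hypothetical cross-check: count words in a local file the way the
# WordCount example tokenizes them, printing "word<TAB>count" lines
# sorted byte-wise.
local_wordcount() {
  tr -s ' \t\r\f' '\n\n\n\n' < "$1" |   # one token per line, squeeze blanks
    grep -v '^$' |                    # drop the empty line a leading blank leaves
    LC_ALL=C sort |                   # byte order, like Hadoop's Text keys
    uniq -c |                         # "   <count> <word>"
    awk '{ print $2 "\t" $1 }'        # reorder to "<word><TAB><count>"
}

# Usage (bash, on the machine holding the local copy of the input):
# diff <(local_wordcount file) <(hadoop fs -cat 'output/part-*' | LC_ALL=C sort)
```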

The experimental process and results are as follows:



Origin: blog.csdn.net/dy01dy/article/details/40621377