Hadoop fully distributed installation: detailed process (Ubuntu version)

Reference for commonly used domestic mirror sources: https://blog.csdn.net/m0_46202060/article/details/106251733

1. Download the installation packages and test files
Switch the working directory to /tmp:

-------------------------------------------------------------------- (The corresponding installation packages can be downloaded from the mirror source) --------------------------------------------------------------------

cd /tmp  

Download the Hadoop installation package

---------------------------------------------------------------------hadoop-2.6.0-cdh5.4.5.tar.gz----------------------------------------------------------------------------

Download the JDK installation package

---------------------------------------------------------------------jdk-7u75-linux-x64.tar.gz --------------------------------------------------------------------------------

2. Install Java JDK
The version installed here is jdk-7u75-linux-x64.tar.gz.

The current user is a normal (non-root) user, and only the superuser can write to the /opt directory, so use the sudo command to obtain the necessary privileges. The following command extracts the archive to the /opt directory:

sudo tar -zxvf /tmp/jdk-7u75-linux-x64.tar.gz  -C /opt/  

Rename the extracted folder jdk1.7.0_75 to java:

sudo mv /opt/jdk1.7.0_75 /opt/java  

Modify the user and group of the java directory:

sudo chown -R zhangyu.zhangyu /opt/java

After installing the JDK, configure the environment variables by editing /etc/profile:

sudo vim /etc/profile  

Add the following at the end of the document:

export JAVA_HOME=/opt/java  
export PATH=$JAVA_HOME/bin:$PATH  

Refresh environment variables:

source /etc/profile  

After refreshing the environment variables, the commands under the Java home directory are available on the PATH. Verify the installation by checking the version number:

java -version  

The normal result shows the Java version information.

3. Install Hadoop
The version installed here is hadoop-2.6.0-cdh5.4.5.tar.gz. The following command extracts it to the /opt directory:

sudo tar -zxvf  /tmp/hadoop-2.6.0-cdh5.4.5.tar.gz  -C /opt/  

Rename the extracted folder hadoop-2.6.0-cdh5.4.5 to hadoop:

sudo mv /opt/hadoop-2.6.0-cdh5.4.5 /opt/hadoop  

Modify the user and group of the hadoop directory:

sudo chown -R zhangyu.zhangyu /opt/hadoop  

After installing Hadoop, configure the environment variables by editing /etc/profile:

sudo vim /etc/profile  

Add the following at the end:

export HADOOP_HOME=/opt/hadoop  
export PATH=$HADOOP_HOME/bin:$PATH  

Refresh environment variables:

source /etc/profile  

Verify the installation by checking the Hadoop version number:

hadoop version  

The normal result shows the Hadoop version information.

4. Modify the hosts file
The network interface information can be obtained with ifconfig or ip a. Use either command to find the IP address of the current node, then edit the /etc/hosts file:

sudo vim /etc/hosts  

Add a hostname mapping for the local machine's IP address and mappings for the IP addresses of the other nodes:

0.0.0.0 master  
0.0.0.0 slave1  
0.0.0.0 slave2     

Here 0.0.0.0 stands for the IP addresses of your three virtual machines; each entry must be replaced with the corresponding node's actual IP.

After configuring the hosts file, each node can be reached by its mapped hostname instead of its IP address.
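For example, with hypothetical addresses (replace them with your own), the file might contain:

192.168.1.101 master  
192.168.1.102 slave1  
192.168.1.103 slave2  

A quick check that a mapping works:

ping -c 1 master  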

5. Create a data folder

sudo mkdir /data  

Change the owner to the current user:

sudo chown -R zhangyu.zhangyu /data  

6. Modify the Hadoop hadoop-env.sh file configuration

vim  /opt/hadoop/etc/hadoop/hadoop-env.sh  

Modify JAVA_HOME to the directory where java is located:

export JAVA_HOME=/opt/java/  

7. Modify the configuration of hadoop core-site.xml file
Edit the core-site.xml file:

vim  /opt/hadoop/etc/hadoop/core-site.xml  

Replace with the following xml text (of course you can also modify it yourself):

<?xml version="1.0"?>  
<?xml-stylesheet type="text/xsl"   
        href="configuration.xsl"?>  
<configuration>  
<property>  
    <name>hadoop.tmp.dir</name>  
    <value>/data/tmp/hadoop/tmp</value>  
</property>  
<property>  
    <name>fs.defaultFS</name>  
    <value>hdfs://master:9000/</value>  
    <description>NameNode URI</description>  
</property>  
</configuration>  

There are two configurations:

One is hadoop.tmp.dir, which configures the storage location of temporary files during hadoop processing. The directory /data/ here needs to be created in advance. The other is fs.defaultFS, which configures the address of the hadoop HDFS file system.
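If you want to create the full temporary-directory path in advance (a minimal sketch; /data is created in step 5 and is owned by the current user, so sudo is not needed):

mkdir -p /data/tmp/hadoop/tmp  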

Remember that nothing (not even a space) may precede the XML declaration (<?xml version="1.0"?>) at the beginning of the configuration file; the same applies to the configuration files below. Otherwise an illegal-character error will be reported.

8. Modify the configuration of hadoop hdfs-site.xml file
Edit hdfs-site.xml file:

vim  /opt/hadoop/etc/hadoop/hdfs-site.xml

Replace with the following xml text:

<?xml version="1.0"?>  
<?xml-stylesheet type="text/xsl"   
        href="configuration.xsl"?>  
<configuration>  
<property>  
    <name>dfs.namenode.name.dir</name>  
    <value>/data/tmp/hadoop/hdfs/name</value>  
</property>  
<property>  
    <name>dfs.datanode.data.dir</name>  
    <value>/data/tmp/hadoop/hdfs/data</value>  
</property>  
<property>  
     <name>dfs.replication</name>  
     <value>1</value>  
</property>  
<property>  
     <name>dfs.permissions</name>  
     <value>false</value>  
</property>  
</configuration>  

Configuration item description:

dfs.namenode.name.dir configures the storage location of the NameNode metadata; dfs.datanode.data.dir configures the storage location of the actual data blocks; dfs.replication configures the number of replicas of each data block. Because we currently use one node, it is set to 1; if it were set to 2, operations would report an error.
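If you prefer to create the HDFS name and data directories in advance as well (optional; Hadoop normally creates them itself when formatting and starting), a sketch:

mkdir -p /data/tmp/hadoop/hdfs/name /data/tmp/hadoop/hdfs/data  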

9. Modify the hadoop yarn-site.xml file configuration
Edit the yarn-site.xml file:

vim  /opt/hadoop/etc/hadoop/yarn-site.xml  

Replace with the following xml text:

<?xml version="1.0"?>  
<?xml-stylesheet type="text/xsl"  
        href="configuration.xsl"?>  
<configuration>  
<property>  
    <name>yarn.nodemanager.aux-services</name>  
    <value>mapreduce_shuffle</value>  
</property>  
</configuration>  

The configuration here specifies the auxiliary service (mapreduce_shuffle) used by the NodeManager.

10. Modify the hadoop mapred-site.xml file configuration
Create the mapred-site.xml file:

vim  /opt/hadoop/etc/hadoop/mapred-site.xml  

Enter the following xml text:

<?xml version="1.0"?>  
<?xml-stylesheet type="text/xsl"   
        href="configuration.xsl"?>  
<configuration>  
<property>  
    <name>mapreduce.framework.name</name>  
    <value>yarn</value>  
</property>  
</configuration>  

This specifies the framework used for mapreduce task processing.

11. Modify the hadoop slaves file configuration

vim  /opt/hadoop/etc/hadoop/slaves 

Overwrite the file contents with the master node's mapped hostname and the slave nodes' mapped hostnames:

master  
slave1  
slave2  

12. Create a public key
Create a key pair under the zhangyu user:

ssh-keygen  

The following content appears:

Enter file in which to save the key (/home/zhangyu/.ssh/id_rsa):

Use the default option directly, press Enter, and the following content will appear:

Enter passphrase (empty for no passphrase):

Press Enter directly; the following appears:

Enter same passphrase again:

Press Enter directly and the creation is complete.
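As a side note, the same key pair can be generated non-interactively in a single command (a sketch using standard ssh-keygen options; it writes to the default location with an empty passphrase):

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa  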

13. Copy the public key; you need to enter the zhangyu user's password during the process

ssh-copy-id master  

ssh-copy-id slave1  

ssh-copy-id slave2  

Tip: You need to enter "yes" and the password "zhangyu" during command execution. Run the command for each of the three nodes in turn.

Test whether the connection is normal:

ssh master  

Note that the hostname in the prompt changes.

Type exit to quit, then test:

ssh slave1  

Type exit to quit, then test:

ssh slave2  

Type exit to quit the test.

As the tests show, no password is required when connecting to any of the nodes, because the authorized keys have been set up.
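To confirm that the public keys were installed, the authorized keys file on any node can be inspected, for example:

cat ~/.ssh/authorized_keys  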

14. Copy files to all slave nodes

scp -r /opt/java/ /opt/hadoop/ slave1:/tmp/

scp -r /opt/java/ /opt/hadoop/ slave2:/tmp/  

At this point, the master node configuration is complete.

----------------=--------------------------------------------------------------------(_)-------------------------------------------------------------------------------=------------

Cluster setup on slave1

1. Java environment configuration

sudo mv /tmp/java /opt/  

After installing the JDK, configure the environment variables by editing /etc/profile:

sudo vim /etc/profile  

Add the following at the end of the file:

export JAVA_HOME=/opt/java/
export PATH=$JAVA_HOME/bin:$PATH  

Refresh environment variables:

 source /etc/profile  

Verify the installation by checking the Java version number:

 java -version  

2. Modify the hosts file

Edit the /etc/hosts file:

 sudo vim /etc/hosts 

Add a hostname mapping for the local machine's IP address and mappings for the IP addresses of the other nodes:

0.0.0.0 master  
0.0.0.0 slave1  
0.0.0.0 slave2  

Use each node's intranet ("intranet management address") IP address here.

3. Create a public key

Create a public key under the zhangyu user:

 ssh-keygen  

The following content appears:

Enter file in which to save the key (/home/zhangyu/.ssh/id_rsa):

Use the default option directly, press Enter, and the following content will appear:

Enter passphrase (empty for no passphrase)

Press Enter directly; the following appears:

Enter same passphrase again

Press Enter directly to complete the creation.

To copy the public key, you need to enter the user's password ("zhangyu") during the process:

ssh-copy-id master  

ssh-copy-id slave1  

 ssh-copy-id slave2  

After copying, you can test the connection.

4. Hadoop environment configuration

sudo mv /tmp/hadoop /opt/

After installing hadoop, configure environment variables and edit /etc/profile:

sudo vim /etc/profile  

Add the following at the end of the file:

export HADOOP_HOME=/opt/hadoop/  
export PATH=$HADOOP_HOME/bin:$PATH  

Refresh environment variables:

 source /etc/profile 

Verify the installation by checking the Hadoop version number:

hadoop version

5. Create a data folder

sudo mkdir /data

Change the owner to the current user:

 sudo chown -R zhangyu.zhangyu /data  

At this point, slave1 configuration is complete.

The steps for slave2 are the same as for slave1, so they are omitted here.

----------------=--------------------------------------------------------------------(_)-------------------------------------------------------------------------------=------------

The following content will continue after the configuration of all slave nodes is complete!

15. Format the distributed file system

Execute on the Hadoop master node (run this only once); the key command that formats the NameNode is:

/opt/hadoop/bin/hadoop namenode -format   
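Note: in Hadoop 2.x the hadoop namenode script is deprecated; the equivalent form is:

/opt/hadoop/bin/hdfs namenode -format   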

16. Start Hadoop
Execute on the Hadoop master node:

/opt/hadoop/sbin/start-all.sh 

You need to enter "yes" during execution to allow the use of public keys to connect to other machines.

17. View the Hadoop processes
Execute on the Hadoop master node:

jps  

The output of jps must contain 6 processes.
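For reference, assuming the slaves file above (which also lists master), the six processes on the master typically include:

NameNode  
SecondaryNameNode  
ResourceManager  
DataNode  
NodeManager  
Jps  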

Perform the same operation on the hadoop slave node:

jps  

The output must contain 3 processes.
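For reference, the three processes on a slave node are typically:

DataNode  
NodeManager  
Jps  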

18. Enter the following command in the command line to open the Hadoop WebUI management interface:

firefox http://master:50070  

At this point, the fully distributed Hadoop installation is complete!

----------------=--------------------------------------------------------------------(_)-------------------------------------------------------------------------------=------------

Next, test:

19. Test the HDFS cluster and the MapReduce program
Use the WordCount sample program that comes with Hadoop to check the cluster. Perform the following operations on the master node to create the HDFS directories:

hadoop fs -mkdir /zhangyu/  
hadoop fs -mkdir /zhangyu/input  
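The test file /tmp/word.txt is assumed to exist on the master node; if it does not, a small sample file can be created first, for example:

printf "hello hadoop\nhello world\n" > /tmp/word.txt  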

Upload the test file to the Hadoop HDFS cluster directory:

hadoop fs -put /tmp/word.txt /zhangyu/input  

Execute the wordcount program:

cd /opt/hadoop/share/hadoop/mapreduce/  
hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.4.5.jar wordcount /zhangyu/input/ /zhangyu/out/  

View the execution results:

hadoop fs -ls /zhangyu/out/  

If the listing contains a "_SUCCESS" file, the job ran successfully on the cluster.

To view the specific execution results, you can use the following commands:

hadoop fs -text /zhangyu/out/part-r-00000  
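With the hypothetical word.txt sample above, the output would look something like:

hadoop    1  
hello     2  
world     1  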

At this point, the cluster installation is complete.


Origin blog.csdn.net/m0_46202060/article/details/109608724